[2024-07-24 15:44:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json
[2024-07-24 15:44:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300
[2024-07-24 15:44:05 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-24 15:44:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/e62.pth
[2024-07-24 15:44:38 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/e62.pth....................
[2024-07-24 15:44:38 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model:
[2024-07-24 15:44:38 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-24 15:44:38 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/e62.pth' (epoch 62)
[2024-07-24 15:44:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-24 15:44:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][0/625] eta 1:36:46 lr 0.001132 wd 0.0500 time 9.2899 (9.2899) data time 0.7451 (0.7451) model time 0.0000 (0.0000) loss 4.2153 (4.2153) grad_norm 1.6301 (1.6301) loss_scale 8192.0000 (8192.0000) mem 16181MB
[2024-07-24 15:44:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][10/625] eta 0:13:03 lr 0.001132 wd 0.0500 time 0.3404 (1.2737) data time 0.0010 (0.0687) model time 0.0000 (0.0000) loss 2.7438 (3.7118) grad_norm 1.5426 (1.4335) loss_scale 8192.0000 (8192.0000) mem 14258MB
[2024-07-24 15:45:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[63/300][20/625] eta 0:08:21 lr 0.001132 wd 0.0500 time 0.3321 (0.8291) data time 0.0010 (0.0365) model time 0.0000 (0.0000) loss 3.6850 (3.6685) grad_norm 1.3496 (1.4473) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:45:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][30/625] eta 0:06:39 lr 0.001132 wd 0.0500 time 0.3361 (0.6707) data time 0.0008 (0.0250) model time 0.0000 (0.0000) loss 2.4740 (3.6714) grad_norm 1.3929 (1.3848) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:45:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][40/625] eta 0:05:44 lr 0.001132 wd 0.0500 time 0.3355 (0.5896) data time 0.0010 (0.0192) model time 0.0000 (0.0000) loss 3.6677 (3.6188) grad_norm 1.7426 (1.4090) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:45:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][50/625] eta 0:05:10 lr 0.001132 wd 0.0500 time 0.3341 (0.5402) data time 0.0008 (0.0157) model time 0.0000 (0.0000) loss 4.1543 (3.6156) grad_norm 1.1905 (1.4034) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:45:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][60/625] eta 0:04:51 lr 0.001132 wd 0.0500 time 0.3330 (0.5156) data time 0.0012 (0.0132) model time 0.3318 (0.3888) loss 3.3322 (3.5680) grad_norm 2.4788 (1.4420) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:45:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][70/625] eta 0:04:32 lr 0.001132 wd 0.0500 time 0.3488 (0.4906) data time 0.0010 (0.0115) model time 0.3478 (0.3631) loss 3.1291 (3.5160) grad_norm 1.5676 (1.4715) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:45:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][80/625] eta 0:04:17 lr 0.001132 wd 0.0500 time 0.3336 (0.4722) data time 0.0010 (0.0102) model time 0.3325 (0.3554) loss 2.8011 (3.5034) grad_norm 
1.5931 (1.4688) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:45:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][90/625] eta 0:04:04 lr 0.001132 wd 0.0500 time 0.3394 (0.4579) data time 0.0008 (0.0093) model time 0.3386 (0.3519) loss 4.0056 (3.4829) grad_norm 1.6189 (1.4570) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:45:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][100/625] eta 0:03:54 lr 0.001132 wd 0.0500 time 0.3383 (0.4464) data time 0.0008 (0.0084) model time 0.3375 (0.3495) loss 3.4274 (3.4805) grad_norm 1.5448 (1.4470) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:45:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][110/625] eta 0:03:44 lr 0.001132 wd 0.0500 time 0.3486 (0.4369) data time 0.0010 (0.0078) model time 0.3476 (0.3479) loss 2.9531 (3.4697) grad_norm 1.9360 (1.4599) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:45:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][120/625] eta 0:03:36 lr 0.001132 wd 0.0500 time 0.3338 (0.4289) data time 0.0008 (0.0072) model time 0.3330 (0.3467) loss 2.3489 (3.4650) grad_norm 1.4261 (1.4526) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:45:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][130/625] eta 0:03:29 lr 0.001132 wd 0.0500 time 0.3406 (0.4224) data time 0.0011 (0.0067) model time 0.3396 (0.3462) loss 3.4742 (3.4513) grad_norm 1.2778 (1.4322) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:45:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][140/625] eta 0:03:22 lr 0.001132 wd 0.0500 time 0.3390 (0.4166) data time 0.0009 (0.0063) model time 0.3381 (0.3455) loss 3.7330 (3.4442) grad_norm 1.0613 (1.4234) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:45:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[63/300][150/625] eta 0:03:15 lr 0.001131 wd 0.0500 time 0.3493 (0.4116) data time 0.0011 (0.0060) model time 0.3482 (0.3450) loss 2.6711 (3.4408) grad_norm 1.5825 (1.4319) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:45:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][160/625] eta 0:03:09 lr 0.001131 wd 0.0500 time 0.3427 (0.4073) data time 0.0010 (0.0057) model time 0.3416 (0.3447) loss 3.2395 (3.4407) grad_norm 1.0677 (1.4373) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:45:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][170/625] eta 0:03:03 lr 0.001131 wd 0.0500 time 0.3454 (0.4035) data time 0.0010 (0.0054) model time 0.3444 (0.3444) loss 3.5265 (3.4331) grad_norm 1.0938 (1.4307) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:45:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][180/625] eta 0:02:57 lr 0.001131 wd 0.0500 time 0.3374 (0.4000) data time 0.0011 (0.0052) model time 0.3363 (0.3439) loss 3.4671 (3.4210) grad_norm 1.3268 (1.4327) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:45:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][190/625] eta 0:02:52 lr 0.001131 wd 0.0500 time 0.3401 (0.3970) data time 0.0011 (0.0049) model time 0.3390 (0.3437) loss 2.7116 (3.4142) grad_norm 1.4119 (1.4321) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:46:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][200/625] eta 0:02:47 lr 0.001131 wd 0.0500 time 0.3405 (0.3943) data time 0.0011 (0.0047) model time 0.3394 (0.3436) loss 3.2649 (3.4012) grad_norm 2.1414 (1.4357) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:46:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][210/625] eta 0:02:42 lr 0.001131 wd 0.0500 time 0.3449 (0.3918) data time 0.0011 (0.0046) model time 0.3438 (0.3435) loss 3.6183 (3.3918) 
grad_norm 1.8819 (1.4418) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:46:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][220/625] eta 0:02:37 lr 0.001131 wd 0.0500 time 0.3437 (0.3896) data time 0.0008 (0.0044) model time 0.3429 (0.3433) loss 3.6172 (3.3857) grad_norm 1.3973 (1.4366) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:46:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][230/625] eta 0:02:33 lr 0.001131 wd 0.0500 time 0.3390 (0.3875) data time 0.0008 (0.0043) model time 0.3382 (0.3432) loss 2.0207 (3.3829) grad_norm 1.3081 (1.4301) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:46:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][240/625] eta 0:02:28 lr 0.001131 wd 0.0500 time 0.3437 (0.3855) data time 0.0008 (0.0041) model time 0.3429 (0.3430) loss 3.5913 (3.3819) grad_norm 1.1207 (1.4268) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:46:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][250/625] eta 0:02:23 lr 0.001131 wd 0.0500 time 0.3379 (0.3838) data time 0.0008 (0.0040) model time 0.3372 (0.3429) loss 3.5675 (3.3760) grad_norm 2.1533 (1.4275) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:46:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][260/625] eta 0:02:19 lr 0.001131 wd 0.0500 time 0.3359 (0.3822) data time 0.0008 (0.0039) model time 0.3350 (0.3427) loss 3.5820 (3.3688) grad_norm 1.0249 (1.4275) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:46:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][270/625] eta 0:02:15 lr 0.001131 wd 0.0500 time 0.3579 (0.3809) data time 0.0008 (0.0038) model time 0.3571 (0.3429) loss 3.6998 (3.3603) grad_norm 1.3627 (1.4265) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:46:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [63/300][280/625] eta 0:02:11 lr 0.001131 wd 0.0500 time 0.3595 (0.3809) data time 0.0010 (0.0037) model time 0.3584 (0.3445) loss 3.4452 (3.3631) grad_norm 1.5435 (1.4295) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:46:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][290/625] eta 0:02:07 lr 0.001131 wd 0.0500 time 0.3347 (0.3795) data time 0.0009 (0.0036) model time 0.3338 (0.3443) loss 2.4315 (3.3592) grad_norm 1.4338 (1.4276) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:46:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][300/625] eta 0:02:02 lr 0.001131 wd 0.0500 time 0.3381 (0.3784) data time 0.0010 (0.0035) model time 0.3370 (0.3444) loss 3.0556 (3.3474) grad_norm 1.1730 (1.4289) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:46:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][310/625] eta 0:01:58 lr 0.001131 wd 0.0500 time 0.3376 (0.3773) data time 0.0011 (0.0034) model time 0.3366 (0.3443) loss 3.9949 (3.3488) grad_norm 1.6030 (1.4261) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:46:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][320/625] eta 0:01:54 lr 0.001131 wd 0.0500 time 0.3608 (0.3762) data time 0.0008 (0.0034) model time 0.3600 (0.3442) loss 4.0589 (3.3612) grad_norm 1.5657 (1.4240) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:46:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][330/625] eta 0:01:50 lr 0.001131 wd 0.0500 time 0.3438 (0.3752) data time 0.0008 (0.0033) model time 0.3430 (0.3441) loss 2.5369 (3.3592) grad_norm 1.6265 (1.4232) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:46:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][340/625] eta 0:01:46 lr 0.001131 wd 0.0500 time 0.3356 (0.3743) data time 0.0010 (0.0032) model time 0.3346 (0.3440) loss 3.6112 
(3.3619) grad_norm 1.1311 (1.4198) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:46:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][350/625] eta 0:01:42 lr 0.001130 wd 0.0500 time 0.3392 (0.3733) data time 0.0007 (0.0032) model time 0.3385 (0.3439) loss 3.3736 (3.3637) grad_norm 1.7983 (1.4198) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:46:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][360/625] eta 0:01:38 lr 0.001130 wd 0.0500 time 0.3699 (0.3725) data time 0.0008 (0.0031) model time 0.3691 (0.3438) loss 2.8324 (3.3612) grad_norm 1.2930 (1.4193) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:47:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][370/625] eta 0:01:34 lr 0.001130 wd 0.0500 time 0.3346 (0.3717) data time 0.0012 (0.0031) model time 0.3333 (0.3438) loss 2.4462 (3.3584) grad_norm 1.1330 (1.4158) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:47:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][380/625] eta 0:01:30 lr 0.001130 wd 0.0500 time 0.3341 (0.3708) data time 0.0008 (0.0030) model time 0.3333 (0.3436) loss 2.4637 (3.3559) grad_norm 1.7701 (1.4120) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:47:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][390/625] eta 0:01:26 lr 0.001130 wd 0.0500 time 0.3465 (0.3701) data time 0.0008 (0.0029) model time 0.3457 (0.3436) loss 4.3936 (3.3518) grad_norm 1.4430 (1.4147) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:47:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][400/625] eta 0:01:23 lr 0.001130 wd 0.0500 time 0.3349 (0.3695) data time 0.0010 (0.0029) model time 0.3339 (0.3436) loss 3.5876 (3.3562) grad_norm 1.7250 (1.4141) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:47:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 
367): INFO Train: [63/300][410/625] eta 0:01:19 lr 0.001130 wd 0.0500 time 0.3537 (0.3690) data time 0.0011 (0.0029) model time 0.3526 (0.3437) loss 3.9405 (3.3612) grad_norm 4.0367 (1.4262) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:47:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][420/625] eta 0:01:15 lr 0.001130 wd 0.0500 time 0.3434 (0.3684) data time 0.0008 (0.0028) model time 0.3426 (0.3436) loss 3.5418 (3.3620) grad_norm 1.3406 (1.4361) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:47:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][430/625] eta 0:01:11 lr 0.001130 wd 0.0500 time 0.3438 (0.3679) data time 0.0008 (0.0028) model time 0.3430 (0.3437) loss 3.7321 (3.3685) grad_norm 2.0383 (1.4434) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:47:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][440/625] eta 0:01:07 lr 0.001130 wd 0.0500 time 0.3471 (0.3673) data time 0.0007 (0.0027) model time 0.3464 (0.3437) loss 3.9027 (3.3716) grad_norm 1.4066 (1.4415) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:47:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][450/625] eta 0:01:04 lr 0.001130 wd 0.0500 time 0.3533 (0.3668) data time 0.0008 (0.0027) model time 0.3525 (0.3436) loss 3.2628 (3.3704) grad_norm 1.4972 (1.4399) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:47:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][460/625] eta 0:01:00 lr 0.001130 wd 0.0500 time 0.3359 (0.3663) data time 0.0011 (0.0027) model time 0.3347 (0.3436) loss 3.3266 (3.3652) grad_norm 1.9414 (1.4441) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:47:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][470/625] eta 0:00:56 lr 0.001130 wd 0.0500 time 0.3448 (0.3658) data time 0.0010 (0.0026) model time 0.3437 (0.3436) loss 
3.3490 (3.3573) grad_norm 1.2427 (1.4443) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:47:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][480/625] eta 0:00:52 lr 0.001130 wd 0.0500 time 0.3389 (0.3653) data time 0.0009 (0.0026) model time 0.3380 (0.3435) loss 3.7554 (3.3583) grad_norm 1.0610 (1.4386) loss_scale 8192.0000 (8192.0000) mem 14258MB [2024-07-24 15:47:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][490/625] eta 0:00:49 lr 0.001130 wd 0.0500 time 0.3396 (0.3649) data time 0.0011 (0.0026) model time 0.3385 (0.3435) loss 3.9022 (3.3626) grad_norm 1.6678 (1.4414) loss_scale 16384.0000 (8308.7902) mem 14258MB [2024-07-24 15:47:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][500/625] eta 0:00:45 lr 0.001130 wd 0.0500 time 0.3368 (0.3653) data time 0.0010 (0.0026) model time 0.3358 (0.3443) loss 3.6392 (3.3593) grad_norm 1.6227 (1.4441) loss_scale 16384.0000 (8469.9721) mem 14258MB [2024-07-24 15:47:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][510/625] eta 0:00:41 lr 0.001130 wd 0.0500 time 0.3388 (0.3648) data time 0.0009 (0.0025) model time 0.3379 (0.3443) loss 4.0564 (3.3629) grad_norm 1.1143 (1.4395) loss_scale 16384.0000 (8624.8454) mem 14258MB [2024-07-24 15:47:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][520/625] eta 0:00:38 lr 0.001130 wd 0.0500 time 0.3402 (0.3644) data time 0.0008 (0.0025) model time 0.3393 (0.3443) loss 3.3633 (3.3632) grad_norm 1.0437 (1.4346) loss_scale 16384.0000 (8773.7735) mem 14258MB [2024-07-24 15:47:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][530/625] eta 0:00:34 lr 0.001130 wd 0.0500 time 0.3490 (0.3640) data time 0.0012 (0.0025) model time 0.3478 (0.3442) loss 3.3053 (3.3561) grad_norm 1.4947 (1.4347) loss_scale 16384.0000 (8917.0923) mem 14258MB [2024-07-24 15:48:00 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [63/300][540/625] eta 0:00:30 lr 0.001130 wd 0.0500 time 0.3645 (0.3637) data time 0.0010 (0.0025) model time 0.3634 (0.3442) loss 3.2949 (3.3547) grad_norm 1.1509 (1.4375) loss_scale 16384.0000 (9055.1128) mem 14258MB [2024-07-24 15:48:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][550/625] eta 0:00:27 lr 0.001129 wd 0.0500 time 0.3357 (0.3633) data time 0.0010 (0.0024) model time 0.3347 (0.3441) loss 4.0885 (3.3563) grad_norm 1.5554 (1.4360) loss_scale 16384.0000 (9188.1234) mem 14258MB [2024-07-24 15:48:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][560/625] eta 0:00:23 lr 0.001129 wd 0.0500 time 0.3447 (0.3629) data time 0.0010 (0.0024) model time 0.3437 (0.3441) loss 3.6645 (3.3605) grad_norm 1.8105 (1.4376) loss_scale 16384.0000 (9316.3922) mem 14258MB [2024-07-24 15:48:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][570/625] eta 0:00:19 lr 0.001129 wd 0.0500 time 0.3529 (0.3626) data time 0.0011 (0.0024) model time 0.3518 (0.3440) loss 3.7658 (3.3643) grad_norm 1.3481 (1.4377) loss_scale 16384.0000 (9440.1681) mem 14258MB [2024-07-24 15:48:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][580/625] eta 0:00:16 lr 0.001129 wd 0.0500 time 0.3358 (0.3623) data time 0.0009 (0.0024) model time 0.3349 (0.3440) loss 3.9740 (3.3674) grad_norm 1.1873 (1.4380) loss_scale 16384.0000 (9559.6833) mem 14258MB [2024-07-24 15:48:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][590/625] eta 0:00:12 lr 0.001129 wd 0.0500 time 0.3434 (0.3620) data time 0.0011 (0.0024) model time 0.3422 (0.3440) loss 3.7544 (3.3693) grad_norm 1.4953 (1.4394) loss_scale 16384.0000 (9675.1540) mem 14258MB [2024-07-24 15:48:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][600/625] eta 0:00:09 lr 0.001129 wd 0.0500 time 0.3393 (0.3616) data time 0.0010 (0.0024) model 
time 0.3383 (0.3439) loss 2.2232 (3.3686) grad_norm 1.1520 (1.4357) loss_scale 16384.0000 (9786.7820) mem 14258MB [2024-07-24 15:48:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][610/625] eta 0:00:05 lr 0.001129 wd 0.0500 time 0.3343 (0.3613) data time 0.0008 (0.0023) model time 0.3335 (0.3439) loss 3.2978 (3.3663) grad_norm 1.1383 (1.4333) loss_scale 16384.0000 (9894.7561) mem 14258MB [2024-07-24 15:48:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][620/625] eta 0:00:01 lr 0.001129 wd 0.0500 time 0.3294 (0.3608) data time 0.0005 (0.0023) model time 0.3289 (0.3436) loss 3.9510 (3.3680) grad_norm 1.5097 (1.4328) loss_scale 16384.0000 (9999.2528) mem 14258MB [2024-07-24 15:48:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 63 training takes 0:03:45 [2024-07-24 15:48:29 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 15:48:29 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 15:48:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.493 (0.493) Loss 0.7344 (0.7344) Acc@1 84.082 (84.082) Acc@5 97.656 (97.656) Mem 14258MB
[2024-07-24 15:48:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.128) Loss 1.1484 (0.8788) Acc@1 75.391 (82.036) Acc@5 93.408 (96.378) Mem 14258MB
[2024-07-24 15:48:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.087 (0.108) Loss 1.3252 (1.0454) Acc@1 70.361 (77.846) Acc@5 90.723 (94.338) Mem 14258MB
[2024-07-24 15:48:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.479 Acc@5 94.272
[2024-07-24 15:48:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.5%
[2024-07-24 15:48:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.829 (0.829) Loss 0.7109 (0.7109) Acc@1 85.889 (85.889) Acc@5 97.314 (97.314) Mem 14258MB
[2024-07-24 15:48:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.164) Loss 1.1406 (0.8686) Acc@1 73.877 (81.760) Acc@5 93.311 (96.227) Mem 14258MB
[2024-07-24 15:48:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.127) Loss 1.3398 (1.0383) Acc@1 68.945 (77.520) Acc@5 90.479 (94.085) Mem 14258MB
[2024-07-24 15:48:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.207 Acc@5 94.096
[2024-07-24 15:48:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 77.2%
[2024-07-24 15:48:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 77.21%
[2024-07-24 15:48:37 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-24 15:48:38 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 15:48:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][0/625] eta 0:16:55 lr 0.001129 wd 0.0500 time 1.6245 (1.6245) data time 0.4293 (0.4293) model time 0.0000 (0.0000) loss 2.8250 (2.8250) grad_norm 1.4730 (1.4730) loss_scale 16384.0000 (16384.0000) mem 14257MB [2024-07-24 15:48:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][10/625] eta 0:04:41 lr 0.001129 wd 0.0500 time 0.3410 (0.4577) data time 0.0007 (0.0400) model time 0.0000 (0.0000) loss 3.3713 (3.4262) grad_norm 1.3607 (1.4782) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:48:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][20/625] eta 0:04:03 lr 0.001129 wd 0.0500 time 0.3442 (0.4028) data time 0.0010 (0.0216) model time 0.0000 (0.0000) loss 3.5245 (3.2974) grad_norm 1.3049 (1.6382) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:48:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][30/625] eta 0:03:47 lr 0.001129 wd 0.0500 time 0.3392 (0.3831) data time 0.0012 (0.0150) model time 0.0000 (0.0000) loss 2.8625 (3.3214) grad_norm 1.0999 (1.6351) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:48:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][40/625] eta 0:03:41 lr 0.001129 wd 0.0500 time 0.3432 (0.3787) data time 0.0008 (0.0116) model time 0.0000 (0.0000) loss 3.6503 (3.2899) grad_norm 1.0602 (1.5336) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:48:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][50/625] eta 0:03:33 lr 0.001129 wd 0.0500 time 0.3409 (0.3717) data time 0.0008 (0.0095) model time 0.0000 (0.0000) loss 2.9078 (3.3507) grad_norm 1.5257 (1.5264) loss_scale 16384.0000 
(16384.0000) mem 14261MB [2024-07-24 15:49:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][60/625] eta 0:03:27 lr 0.001129 wd 0.0500 time 0.3369 (0.3669) data time 0.0010 (0.0081) model time 0.3359 (0.3416) loss 2.9757 (3.3374) grad_norm 1.2702 (1.5356) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:49:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][70/625] eta 0:03:21 lr 0.001129 wd 0.0500 time 0.3435 (0.3631) data time 0.0008 (0.0071) model time 0.3427 (0.3403) loss 2.8577 (3.3231) grad_norm 1.5003 (1.5456) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:49:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][80/625] eta 0:03:16 lr 0.001129 wd 0.0500 time 0.3350 (0.3604) data time 0.0008 (0.0064) model time 0.3342 (0.3401) loss 4.4927 (3.3332) grad_norm 1.9283 (1.5619) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:49:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][90/625] eta 0:03:11 lr 0.001129 wd 0.0500 time 0.3387 (0.3585) data time 0.0010 (0.0058) model time 0.3378 (0.3406) loss 3.3768 (3.3041) grad_norm 1.6451 (1.5520) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:49:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][100/625] eta 0:03:09 lr 0.001129 wd 0.0500 time 0.3429 (0.3611) data time 0.0008 (0.0053) model time 0.3422 (0.3492) loss 3.5800 (3.2995) grad_norm 1.3523 (1.5545) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:49:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][110/625] eta 0:03:05 lr 0.001129 wd 0.0500 time 0.3437 (0.3595) data time 0.0008 (0.0049) model time 0.3429 (0.3482) loss 3.7056 (3.3314) grad_norm 1.3461 (1.5658) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:49:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][120/625] eta 
0:03:00 lr 0.001128 wd 0.0500 time 0.3336 (0.3582) data time 0.0009 (0.0046) model time 0.3327 (0.3474) loss 3.8802 (3.3361) grad_norm 1.5553 (1.5541) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:49:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][130/625] eta 0:02:56 lr 0.001128 wd 0.0500 time 0.3394 (0.3570) data time 0.0007 (0.0043) model time 0.3386 (0.3466) loss 3.6422 (3.3294) grad_norm 1.2626 (1.5351) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:49:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][140/625] eta 0:02:52 lr 0.001128 wd 0.0500 time 0.3403 (0.3560) data time 0.0010 (0.0041) model time 0.3393 (0.3461) loss 3.8180 (3.3401) grad_norm 1.3439 (1.5543) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:49:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][150/625] eta 0:02:48 lr 0.001128 wd 0.0500 time 0.3449 (0.3550) data time 0.0011 (0.0039) model time 0.3438 (0.3455) loss 3.9375 (3.3427) grad_norm 1.4248 (1.5333) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:49:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][160/625] eta 0:02:44 lr 0.001128 wd 0.0500 time 0.3383 (0.3542) data time 0.0010 (0.0037) model time 0.3373 (0.3451) loss 3.1497 (3.3379) grad_norm 1.8326 (1.5319) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:49:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][170/625] eta 0:02:40 lr 0.001128 wd 0.0500 time 0.3540 (0.3537) data time 0.0007 (0.0036) model time 0.3533 (0.3450) loss 2.4105 (3.3426) grad_norm 1.2246 (1.5398) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:49:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][180/625] eta 0:02:37 lr 0.001128 wd 0.0500 time 0.3464 (0.3532) data time 0.0008 (0.0034) model time 0.3456 (0.3449) loss 2.4267 (3.3395) grad_norm 
2.0507 (1.5480) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:49:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][190/625] eta 0:02:33 lr 0.001128 wd 0.0500 time 0.3381 (0.3526) data time 0.0008 (0.0033) model time 0.3373 (0.3447) loss 3.5338 (3.3275) grad_norm 1.3382 (1.5375) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:49:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][200/625] eta 0:02:29 lr 0.001128 wd 0.0500 time 0.3435 (0.3520) data time 0.0010 (0.0032) model time 0.3425 (0.3444) loss 2.4448 (3.3175) grad_norm 1.0590 (1.5336) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:49:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][210/625] eta 0:02:25 lr 0.001128 wd 0.0500 time 0.3406 (0.3517) data time 0.0008 (0.0031) model time 0.3398 (0.3443) loss 4.0299 (3.3189) grad_norm 1.4405 (1.5264) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:49:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][220/625] eta 0:02:22 lr 0.001128 wd 0.0500 time 0.3417 (0.3513) data time 0.0008 (0.0030) model time 0.3409 (0.3442) loss 3.6706 (3.3120) grad_norm 1.2161 (1.5156) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:49:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][230/625] eta 0:02:18 lr 0.001128 wd 0.0500 time 0.3569 (0.3511) data time 0.0010 (0.0029) model time 0.3558 (0.3442) loss 3.7659 (3.3148) grad_norm 1.4140 (1.5033) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:50:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][240/625] eta 0:02:15 lr 0.001128 wd 0.0500 time 0.3650 (0.3508) data time 0.0011 (0.0028) model time 0.3639 (0.3441) loss 4.0238 (3.3147) grad_norm 1.9041 (1.5041) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:50:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 
367): INFO Train: [64/300][250/625] eta 0:02:11 lr 0.001128 wd 0.0500 time 0.3431 (0.3505) data time 0.0010 (0.0028) model time 0.3421 (0.3440) loss 2.7601 (3.3171) grad_norm 1.4457 (1.5046) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:50:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][260/625] eta 0:02:08 lr 0.001128 wd 0.0500 time 0.3417 (0.3508) data time 0.0008 (0.0027) model time 0.3410 (0.3447) loss 4.3646 (3.3146) grad_norm 1.6214 (1.5108) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:50:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][270/625] eta 0:02:04 lr 0.001128 wd 0.0500 time 0.3406 (0.3506) data time 0.0011 (0.0026) model time 0.3395 (0.3447) loss 3.9296 (3.3130) grad_norm 1.0808 (1.5084) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:50:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][280/625] eta 0:02:00 lr 0.001128 wd 0.0500 time 0.3455 (0.3504) data time 0.0011 (0.0026) model time 0.3445 (0.3446) loss 3.9455 (3.3152) grad_norm 2.2705 (1.5041) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:50:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][290/625] eta 0:01:57 lr 0.001128 wd 0.0500 time 0.3432 (0.3501) data time 0.0008 (0.0025) model time 0.3424 (0.3445) loss 3.8169 (3.3153) grad_norm 1.0839 (1.5019) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:50:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][300/625] eta 0:01:53 lr 0.001128 wd 0.0500 time 0.3495 (0.3499) data time 0.0010 (0.0025) model time 0.3485 (0.3444) loss 3.6413 (3.3244) grad_norm 1.4305 (1.5023) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:50:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][310/625] eta 0:01:50 lr 0.001127 wd 0.0500 time 0.3354 (0.3499) data time 0.0008 (0.0024) model time 0.3346 
(0.3445) loss 2.3260 (3.3250) grad_norm 2.0399 (1.5105) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:50:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][320/625] eta 0:01:46 lr 0.001127 wd 0.0500 time 0.3423 (0.3508) data time 0.0010 (0.0024) model time 0.3413 (0.3458) loss 3.6815 (3.3251) grad_norm 1.7412 (1.5059) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:50:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][330/625] eta 0:01:43 lr 0.001127 wd 0.0500 time 0.3392 (0.3508) data time 0.0008 (0.0023) model time 0.3383 (0.3459) loss 3.8108 (3.3204) grad_norm 1.4543 (1.5037) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:50:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][340/625] eta 0:01:39 lr 0.001127 wd 0.0500 time 0.3385 (0.3504) data time 0.0010 (0.0023) model time 0.3375 (0.3457) loss 3.9212 (3.3212) grad_norm 1.3865 (1.5011) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:50:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][350/625] eta 0:01:36 lr 0.001127 wd 0.0500 time 0.3396 (0.3502) data time 0.0010 (0.0023) model time 0.3385 (0.3455) loss 3.6838 (3.3300) grad_norm 1.3043 (1.4990) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:50:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][360/625] eta 0:01:32 lr 0.001127 wd 0.0500 time 0.3351 (0.3500) data time 0.0010 (0.0022) model time 0.3341 (0.3454) loss 2.3418 (3.3251) grad_norm 1.9808 (1.5049) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:50:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][370/625] eta 0:01:29 lr 0.001127 wd 0.0500 time 0.3416 (0.3498) data time 0.0011 (0.0022) model time 0.3406 (0.3453) loss 3.4152 (3.3269) grad_norm 1.4312 (1.5102) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:50:51 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][380/625] eta 0:01:25 lr 0.001127 wd 0.0500 time 0.3440 (0.3497) data time 0.0010 (0.0022) model time 0.3430 (0.3453) loss 3.0445 (3.3247) grad_norm 1.4668 (1.5054) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:50:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][390/625] eta 0:01:22 lr 0.001127 wd 0.0500 time 0.3436 (0.3495) data time 0.0008 (0.0022) model time 0.3428 (0.3452) loss 2.7766 (3.3306) grad_norm 1.2852 (1.4976) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:50:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][400/625] eta 0:01:18 lr 0.001127 wd 0.0500 time 0.3432 (0.3495) data time 0.0010 (0.0022) model time 0.3422 (0.3452) loss 3.1140 (3.3226) grad_norm 1.3766 (1.4925) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:51:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][410/625] eta 0:01:15 lr 0.001127 wd 0.0500 time 0.3486 (0.3495) data time 0.0009 (0.0021) model time 0.3477 (0.3452) loss 3.8175 (3.3234) grad_norm 1.3876 (1.4909) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:51:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][420/625] eta 0:01:11 lr 0.001127 wd 0.0500 time 0.3419 (0.3493) data time 0.0010 (0.0021) model time 0.3409 (0.3451) loss 3.4514 (3.3284) grad_norm 1.1536 (1.4883) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:51:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][430/625] eta 0:01:08 lr 0.001127 wd 0.0500 time 0.3418 (0.3492) data time 0.0011 (0.0021) model time 0.3407 (0.3450) loss 3.6477 (3.3288) grad_norm 1.1823 (1.4839) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:51:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][440/625] eta 0:01:04 lr 0.001127 wd 0.0500 time 0.3465 
(0.3490) data time 0.0008 (0.0021) model time 0.3458 (0.3449) loss 2.6303 (3.3308) grad_norm 1.3199 (1.4795) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:51:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][450/625] eta 0:01:01 lr 0.001127 wd 0.0500 time 0.3400 (0.3489) data time 0.0009 (0.0020) model time 0.3390 (0.3449) loss 2.6097 (3.3245) grad_norm 1.1289 (1.4788) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:51:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][460/625] eta 0:00:57 lr 0.001127 wd 0.0500 time 0.3488 (0.3488) data time 0.0008 (0.0020) model time 0.3480 (0.3448) loss 3.4062 (3.3266) grad_norm 0.9986 (1.4732) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:51:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][470/625] eta 0:00:54 lr 0.001127 wd 0.0500 time 0.3409 (0.3486) data time 0.0008 (0.0020) model time 0.3401 (0.3447) loss 4.0676 (3.3256) grad_norm 1.4621 (1.4701) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:51:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][480/625] eta 0:00:50 lr 0.001127 wd 0.0500 time 0.3502 (0.3489) data time 0.0010 (0.0020) model time 0.3492 (0.3451) loss 3.3768 (3.3266) grad_norm 1.4571 (1.4691) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:51:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][490/625] eta 0:00:47 lr 0.001127 wd 0.0500 time 0.3405 (0.3488) data time 0.0010 (0.0020) model time 0.3395 (0.3451) loss 3.3455 (3.3295) grad_norm 1.5065 (1.4665) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:51:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][500/625] eta 0:00:43 lr 0.001127 wd 0.0500 time 0.3451 (0.3486) data time 0.0011 (0.0019) model time 0.3441 (0.3449) loss 3.7861 (3.3318) grad_norm 1.1255 (1.4664) loss_scale 16384.0000 
(16384.0000) mem 14261MB [2024-07-24 15:51:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][510/625] eta 0:00:40 lr 0.001126 wd 0.0500 time 0.3392 (0.3485) data time 0.0010 (0.0019) model time 0.3382 (0.3448) loss 3.9656 (3.3323) grad_norm 1.7765 (1.4656) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:51:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][520/625] eta 0:00:36 lr 0.001126 wd 0.0500 time 0.3338 (0.3483) data time 0.0008 (0.0019) model time 0.3330 (0.3447) loss 2.8315 (3.3287) grad_norm 1.8387 (1.4620) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:51:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][530/625] eta 0:00:33 lr 0.001126 wd 0.0500 time 0.3340 (0.3482) data time 0.0009 (0.0019) model time 0.3331 (0.3446) loss 3.5237 (3.3292) grad_norm 1.2418 (1.4645) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:51:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][540/625] eta 0:00:29 lr 0.001126 wd 0.0500 time 0.4768 (0.3488) data time 0.0007 (0.0019) model time 0.4761 (0.3453) loss 3.9718 (3.3249) grad_norm 1.3362 (1.4650) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:51:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][550/625] eta 0:00:26 lr 0.001126 wd 0.0500 time 0.3405 (0.3486) data time 0.0011 (0.0019) model time 0.3394 (0.3452) loss 3.9141 (3.3313) grad_norm 1.2742 (1.4634) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:51:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][560/625] eta 0:00:22 lr 0.001126 wd 0.0500 time 0.3440 (0.3485) data time 0.0010 (0.0019) model time 0.3430 (0.3451) loss 3.1706 (3.3393) grad_norm 1.4373 (1.4676) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:51:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][570/625] eta 
0:00:19 lr 0.001126 wd 0.0500 time 0.3361 (0.3483) data time 0.0011 (0.0018) model time 0.3350 (0.3450) loss 3.3250 (3.3404) grad_norm 1.1455 (1.4641) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:52:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][580/625] eta 0:00:15 lr 0.001126 wd 0.0500 time 0.3367 (0.3482) data time 0.0008 (0.0018) model time 0.3358 (0.3448) loss 3.2666 (3.3440) grad_norm 1.1796 (1.4587) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:52:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][590/625] eta 0:00:12 lr 0.001126 wd 0.0500 time 0.3441 (0.3481) data time 0.0008 (0.0018) model time 0.3433 (0.3448) loss 2.8749 (3.3434) grad_norm 1.5955 (1.4568) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:52:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][600/625] eta 0:00:08 lr 0.001126 wd 0.0500 time 0.3394 (0.3480) data time 0.0009 (0.0018) model time 0.3385 (0.3447) loss 3.0756 (3.3421) grad_norm 1.3294 (1.4563) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:52:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][610/625] eta 0:00:05 lr 0.001126 wd 0.0500 time 0.3349 (0.3478) data time 0.0005 (0.0018) model time 0.3344 (0.3445) loss 2.5718 (3.3405) grad_norm 1.1751 (1.4555) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:52:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][620/625] eta 0:00:01 lr 0.001126 wd 0.0500 time 0.3238 (0.3475) data time 0.0008 (0.0018) model time 0.3230 (0.3442) loss 3.2514 (3.3429) grad_norm 1.2434 (1.4547) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:52:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 64 training takes 0:03:37 [2024-07-24 15:52:15 vssd_mesa_retrain_small_e300] (utils.py 118): INFO 
./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving......
[2024-07-24 15:52:16 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!!
[2024-07-24 15:52:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.480 (0.480) Loss 0.7280 (0.7280) Acc@1 85.791 (85.791) Acc@5 97.266 (97.266) Mem 14261MB
[2024-07-24 15:52:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.126) Loss 1.2324 (0.8865) Acc@1 72.656 (81.809) Acc@5 92.285 (96.320) Mem 14261MB
[2024-07-24 15:52:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.088 (0.107) Loss 1.3008 (1.0566) Acc@1 70.947 (77.860) Acc@5 91.895 (94.278) Mem 14261MB
[2024-07-24 15:52:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.533 Acc@5 94.248
[2024-07-24 15:52:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.5%
[2024-07-24 15:52:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.907 (0.907) Loss 0.7061 (0.7061) Acc@1 85.986 (85.986) Acc@5 97.412 (97.412) Mem 14261MB
[2024-07-24 15:52:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.087 (0.169) Loss 1.1289 (0.8618) Acc@1 74.463 (81.956) Acc@5 93.311 (96.325) Mem 14261MB
[2024-07-24 15:52:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.087 (0.130) Loss 1.3281 (1.0296) Acc@1 69.189 (77.723) Acc@5 90.576 (94.189) Mem 14261MB
[2024-07-24 15:52:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.411 Acc@5 94.184
[2024-07-24 15:52:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 77.4%
[2024-07-24 15:52:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO 
New max accuracy ema: 77.41% [2024-07-24 15:52:22 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 15:52:23 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 15:52:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][0/625] eta 0:08:20 lr 0.001126 wd 0.0500 time 0.8005 (0.8005) data time 0.4749 (0.4749) model time 0.0000 (0.0000) loss 3.2662 (3.2662) grad_norm 1.7358 (1.7358) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:52:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][10/625] eta 0:04:07 lr 0.001126 wd 0.0500 time 0.4186 (0.4030) data time 0.0010 (0.0442) model time 0.0000 (0.0000) loss 3.1352 (3.4395) grad_norm 1.4050 (1.4809) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:52:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][20/625] eta 0:03:51 lr 0.001126 wd 0.0500 time 0.3414 (0.3827) data time 0.0008 (0.0236) model time 0.0000 (0.0000) loss 2.8206 (3.4510) grad_norm 1.8315 (1.4989) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:52:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][30/625] eta 0:03:40 lr 0.001126 wd 0.0500 time 0.3355 (0.3698) data time 0.0008 (0.0163) model time 0.0000 (0.0000) loss 4.0505 (3.5369) grad_norm 1.9907 (1.4609) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:52:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][40/625] eta 0:03:32 lr 0.001126 wd 0.0500 time 0.3441 (0.3632) data time 0.0008 (0.0126) model time 0.0000 (0.0000) loss 3.9492 (3.5380) grad_norm 1.3223 (1.4208) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:52:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[65/300][50/625] eta 0:03:27 lr 0.001126 wd 0.0500 time 0.3350 (0.3606) data time 0.0010 (0.0104) model time 0.0000 (0.0000) loss 3.6475 (3.5239) grad_norm 1.1432 (1.4286) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:52:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][60/625] eta 0:03:22 lr 0.001126 wd 0.0500 time 0.3367 (0.3581) data time 0.0008 (0.0088) model time 0.3359 (0.3448) loss 1.7633 (3.4357) grad_norm 1.3606 (1.4367) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:52:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][70/625] eta 0:03:17 lr 0.001126 wd 0.0500 time 0.3435 (0.3560) data time 0.0010 (0.0077) model time 0.3425 (0.3434) loss 2.0942 (3.3733) grad_norm 1.5160 (1.4342) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:52:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][80/625] eta 0:03:13 lr 0.001125 wd 0.0500 time 0.3505 (0.3546) data time 0.0008 (0.0069) model time 0.3497 (0.3434) loss 3.9331 (3.3818) grad_norm 1.1019 (1.4312) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:52:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][90/625] eta 0:03:08 lr 0.001125 wd 0.0500 time 0.3410 (0.3533) data time 0.0008 (0.0062) model time 0.3402 (0.3429) loss 4.0000 (3.3799) grad_norm 1.1404 (1.4031) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:52:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][100/625] eta 0:03:04 lr 0.001125 wd 0.0500 time 0.3456 (0.3520) data time 0.0011 (0.0057) model time 0.3445 (0.3423) loss 3.9139 (3.3794) grad_norm 1.8361 (1.4110) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:53:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][110/625] eta 0:03:00 lr 0.001125 wd 0.0500 time 0.3433 (0.3512) data time 0.0012 (0.0053) model time 0.3420 (0.3423) loss 4.1952 
(3.3989) grad_norm 1.4621 (1.4057) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:53:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][120/625] eta 0:02:56 lr 0.001125 wd 0.0500 time 0.3365 (0.3505) data time 0.0010 (0.0050) model time 0.3355 (0.3420) loss 3.6097 (3.4045) grad_norm 1.7388 (1.4334) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:53:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][130/625] eta 0:02:53 lr 0.001125 wd 0.0500 time 0.3612 (0.3500) data time 0.0011 (0.0047) model time 0.3601 (0.3422) loss 3.7963 (3.4132) grad_norm 1.1737 (1.4499) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:53:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][140/625] eta 0:02:51 lr 0.001125 wd 0.0500 time 0.3438 (0.3535) data time 0.0009 (0.0044) model time 0.3429 (0.3484) loss 3.4466 (3.4141) grad_norm 1.3912 (1.4504) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:53:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][150/625] eta 0:02:47 lr 0.001125 wd 0.0500 time 0.3407 (0.3528) data time 0.0011 (0.0042) model time 0.3396 (0.3477) loss 3.2466 (3.4116) grad_norm 1.3783 (1.4404) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:53:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][160/625] eta 0:02:43 lr 0.001125 wd 0.0500 time 0.3411 (0.3521) data time 0.0008 (0.0040) model time 0.3403 (0.3471) loss 3.3927 (3.4246) grad_norm 1.1992 (1.4372) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:53:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][170/625] eta 0:02:39 lr 0.001125 wd 0.0500 time 0.3388 (0.3515) data time 0.0011 (0.0039) model time 0.3377 (0.3465) loss 3.1543 (3.4108) grad_norm 1.6145 (1.4365) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:53:27 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [65/300][180/625] eta 0:02:36 lr 0.001125 wd 0.0500 time 0.3451 (0.3510) data time 0.0008 (0.0037) model time 0.3444 (0.3462) loss 3.5796 (3.4018) grad_norm 1.1220 (1.4206) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:53:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][190/625] eta 0:02:32 lr 0.001125 wd 0.0500 time 0.3409 (0.3505) data time 0.0008 (0.0036) model time 0.3402 (0.3458) loss 2.6914 (3.3964) grad_norm 1.3189 (1.4045) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:53:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][200/625] eta 0:02:28 lr 0.001125 wd 0.0500 time 0.3435 (0.3500) data time 0.0009 (0.0034) model time 0.3426 (0.3453) loss 3.6646 (3.3923) grad_norm 1.4759 (1.4051) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:53:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][210/625] eta 0:02:25 lr 0.001125 wd 0.0500 time 0.3357 (0.3495) data time 0.0011 (0.0033) model time 0.3346 (0.3449) loss 3.3102 (3.3981) grad_norm 2.6142 (1.4111) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:53:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][220/625] eta 0:02:21 lr 0.001125 wd 0.0500 time 0.3343 (0.3491) data time 0.0008 (0.0032) model time 0.3335 (0.3446) loss 2.2616 (3.3886) grad_norm 1.1779 (1.4113) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:53:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][230/625] eta 0:02:17 lr 0.001125 wd 0.0500 time 0.3383 (0.3488) data time 0.0008 (0.0031) model time 0.3375 (0.3443) loss 3.3115 (3.3761) grad_norm 2.0411 (1.4155) loss_scale 16384.0000 (16384.0000) mem 14261MB [2024-07-24 15:53:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][240/625] eta 0:02:14 lr 0.001125 wd 0.0500 time 0.3394 (0.3490) data time 0.0010 (0.0030) 
model time 0.3384 (0.3448) loss 3.5026 (3.3577) grad_norm 1.2332 (inf) loss_scale 8192.0000 (16282.0249) mem 14261MB [2024-07-24 15:53:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][250/625] eta 0:02:10 lr 0.001125 wd 0.0500 time 0.3470 (0.3488) data time 0.0010 (0.0029) model time 0.3460 (0.3447) loss 3.0189 (3.3548) grad_norm 1.2395 (inf) loss_scale 8192.0000 (15959.7131) mem 14261MB [2024-07-24 15:53:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][260/625] eta 0:02:07 lr 0.001125 wd 0.0500 time 0.3435 (0.3485) data time 0.0011 (0.0029) model time 0.3424 (0.3446) loss 3.7453 (3.3597) grad_norm 1.9730 (inf) loss_scale 8192.0000 (15662.0996) mem 14261MB [2024-07-24 15:53:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][270/625] eta 0:02:03 lr 0.001124 wd 0.0500 time 0.3370 (0.3483) data time 0.0010 (0.0028) model time 0.3361 (0.3443) loss 3.4931 (3.3669) grad_norm 1.3912 (inf) loss_scale 8192.0000 (15386.4502) mem 14261MB [2024-07-24 15:54:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][280/625] eta 0:02:00 lr 0.001124 wd 0.0500 time 0.3492 (0.3480) data time 0.0008 (0.0027) model time 0.3484 (0.3442) loss 2.2431 (3.3628) grad_norm 1.4870 (inf) loss_scale 8192.0000 (15130.4199) mem 14261MB [2024-07-24 15:54:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][290/625] eta 0:01:56 lr 0.001124 wd 0.0500 time 0.3418 (0.3480) data time 0.0009 (0.0027) model time 0.3409 (0.3442) loss 2.4711 (3.3607) grad_norm 1.4828 (inf) loss_scale 8192.0000 (14891.9863) mem 14261MB [2024-07-24 15:54:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][300/625] eta 0:01:53 lr 0.001124 wd 0.0500 time 0.3493 (0.3481) data time 0.0010 (0.0026) model time 0.3483 (0.3445) loss 3.2881 (3.3576) grad_norm 1.4798 (inf) loss_scale 8192.0000 (14669.3953) mem 14261MB [2024-07-24 15:54:11 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][310/625] eta 0:01:49 lr 0.001124 wd 0.0500 time 0.3395 (0.3478) data time 0.0009 (0.0026) model time 0.3386 (0.3443) loss 2.4313 (3.3535) grad_norm 1.1858 (inf) loss_scale 8192.0000 (14461.1190) mem 14261MB [2024-07-24 15:54:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][320/625] eta 0:01:46 lr 0.001124 wd 0.0500 time 0.3484 (0.3477) data time 0.0008 (0.0025) model time 0.3476 (0.3442) loss 3.5493 (3.3456) grad_norm 1.0734 (inf) loss_scale 8192.0000 (14265.8193) mem 14261MB [2024-07-24 15:54:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][330/625] eta 0:01:42 lr 0.001124 wd 0.0500 time 0.3365 (0.3475) data time 0.0007 (0.0025) model time 0.3358 (0.3441) loss 2.3655 (3.3409) grad_norm 1.0384 (inf) loss_scale 8192.0000 (14082.3202) mem 14261MB [2024-07-24 15:54:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][340/625] eta 0:01:38 lr 0.001124 wd 0.0500 time 0.3339 (0.3474) data time 0.0014 (0.0024) model time 0.3325 (0.3440) loss 3.1066 (3.3477) grad_norm 1.0456 (inf) loss_scale 8192.0000 (13909.5836) mem 14261MB [2024-07-24 15:54:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][350/625] eta 0:01:35 lr 0.001124 wd 0.0500 time 0.3398 (0.3471) data time 0.0008 (0.0024) model time 0.3390 (0.3438) loss 3.8326 (3.3587) grad_norm 1.2968 (inf) loss_scale 8192.0000 (13746.6895) mem 14261MB [2024-07-24 15:54:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][360/625] eta 0:01:32 lr 0.001124 wd 0.0500 time 0.3422 (0.3480) data time 0.0011 (0.0024) model time 0.3411 (0.3449) loss 3.0756 (3.3632) grad_norm 1.1372 (inf) loss_scale 8192.0000 (13592.8199) mem 14261MB [2024-07-24 15:54:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][370/625] eta 0:01:28 lr 0.001124 wd 0.0500 time 0.3333 (0.3478) data time 0.0010 
(0.0023) model time 0.3324 (0.3448) loss 2.7022 (3.3576) grad_norm 1.7626 (inf) loss_scale 8192.0000 (13447.2453) mem 14261MB [2024-07-24 15:54:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][380/625] eta 0:01:25 lr 0.001124 wd 0.0500 time 0.3376 (0.3477) data time 0.0011 (0.0023) model time 0.3365 (0.3447) loss 3.5825 (3.3589) grad_norm 0.9691 (inf) loss_scale 8192.0000 (13309.3123) mem 14261MB [2024-07-24 15:54:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][390/625] eta 0:01:21 lr 0.001124 wd 0.0500 time 0.3499 (0.3476) data time 0.0008 (0.0023) model time 0.3491 (0.3446) loss 3.4934 (3.3571) grad_norm 1.2803 (inf) loss_scale 8192.0000 (13178.4348) mem 14261MB [2024-07-24 15:54:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][400/625] eta 0:01:18 lr 0.001124 wd 0.0500 time 0.3417 (0.3474) data time 0.0010 (0.0022) model time 0.3407 (0.3445) loss 2.5833 (3.3445) grad_norm 1.7427 (inf) loss_scale 8192.0000 (13054.0848) mem 14261MB [2024-07-24 15:54:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][410/625] eta 0:01:14 lr 0.001124 wd 0.0500 time 0.3405 (0.3472) data time 0.0012 (0.0022) model time 0.3393 (0.3443) loss 2.8262 (3.3354) grad_norm 2.4356 (inf) loss_scale 8192.0000 (12935.7859) mem 14261MB [2024-07-24 15:54:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][420/625] eta 0:01:11 lr 0.001124 wd 0.0500 time 0.3379 (0.3471) data time 0.0010 (0.0022) model time 0.3369 (0.3442) loss 3.3742 (3.3372) grad_norm 1.1315 (inf) loss_scale 8192.0000 (12823.1069) mem 14261MB [2024-07-24 15:54:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][430/625] eta 0:01:07 lr 0.001124 wd 0.0500 time 0.3450 (0.3470) data time 0.0010 (0.0021) model time 0.3440 (0.3441) loss 3.9467 (3.3372) grad_norm 1.4411 (inf) loss_scale 8192.0000 (12715.6566) mem 14261MB [2024-07-24 15:54:56 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][440/625] eta 0:01:04 lr 0.001124 wd 0.0500 time 0.3449 (0.3468) data time 0.0008 (0.0021) model time 0.3440 (0.3440) loss 3.1908 (3.3354) grad_norm 1.0425 (inf) loss_scale 8192.0000 (12613.0794) mem 14261MB [2024-07-24 15:54:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][450/625] eta 0:01:00 lr 0.001124 wd 0.0500 time 0.3407 (0.3467) data time 0.0008 (0.0021) model time 0.3399 (0.3439) loss 3.9956 (3.3436) grad_norm 2.0476 (inf) loss_scale 8192.0000 (12515.0510) mem 14261MB [2024-07-24 15:55:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][460/625] eta 0:00:57 lr 0.001123 wd 0.0500 time 0.3381 (0.3471) data time 0.0008 (0.0021) model time 0.3373 (0.3443) loss 3.3893 (3.3443) grad_norm 1.6560 (inf) loss_scale 8192.0000 (12421.2755) mem 14261MB [2024-07-24 15:55:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][470/625] eta 0:00:53 lr 0.001123 wd 0.0500 time 0.3345 (0.3470) data time 0.0008 (0.0020) model time 0.3337 (0.3443) loss 2.4610 (3.3411) grad_norm 1.3385 (inf) loss_scale 8192.0000 (12331.4820) mem 14261MB [2024-07-24 15:55:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][480/625] eta 0:00:50 lr 0.001123 wd 0.0500 time 0.3375 (0.3468) data time 0.0011 (0.0020) model time 0.3365 (0.3442) loss 3.7271 (3.3484) grad_norm 1.1320 (inf) loss_scale 8192.0000 (12245.4220) mem 14261MB [2024-07-24 15:55:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][490/625] eta 0:00:46 lr 0.001123 wd 0.0500 time 0.3382 (0.3467) data time 0.0010 (0.0020) model time 0.3372 (0.3441) loss 3.0049 (3.3463) grad_norm 1.1857 (inf) loss_scale 8192.0000 (12162.8676) mem 14261MB [2024-07-24 15:55:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][500/625] eta 0:00:43 lr 0.001123 wd 0.0500 time 0.3397 (0.3466) data time 0.0011 
(0.0020) model time 0.3386 (0.3440) loss 2.7221 (3.3422) grad_norm 1.6484 (inf) loss_scale 8192.0000 (12083.6088) mem 14261MB [2024-07-24 15:55:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][510/625] eta 0:00:39 lr 0.001123 wd 0.0500 time 0.3507 (0.3465) data time 0.0007 (0.0020) model time 0.3500 (0.3439) loss 4.1038 (3.3419) grad_norm 1.3785 (inf) loss_scale 8192.0000 (12007.4521) mem 14261MB [2024-07-24 15:55:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][520/625] eta 0:00:36 lr 0.001123 wd 0.0500 time 0.3377 (0.3464) data time 0.0009 (0.0020) model time 0.3368 (0.3439) loss 3.5570 (3.3466) grad_norm 1.6643 (inf) loss_scale 8192.0000 (11934.2188) mem 14261MB [2024-07-24 15:55:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][530/625] eta 0:00:32 lr 0.001123 wd 0.0500 time 0.3413 (0.3464) data time 0.0008 (0.0019) model time 0.3405 (0.3438) loss 3.6297 (3.3464) grad_norm 1.6101 (inf) loss_scale 8192.0000 (11863.7439) mem 14261MB [2024-07-24 15:55:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][540/625] eta 0:00:29 lr 0.001123 wd 0.0500 time 0.3524 (0.3463) data time 0.0008 (0.0019) model time 0.3516 (0.3438) loss 3.4884 (3.3433) grad_norm 1.0766 (inf) loss_scale 8192.0000 (11795.8743) mem 14261MB [2024-07-24 15:55:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][550/625] eta 0:00:25 lr 0.001123 wd 0.0500 time 0.3416 (0.3463) data time 0.0011 (0.0019) model time 0.3405 (0.3438) loss 3.1782 (3.3482) grad_norm 0.9477 (inf) loss_scale 8192.0000 (11730.4682) mem 14261MB [2024-07-24 15:55:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][560/625] eta 0:00:22 lr 0.001123 wd 0.0500 time 0.3449 (0.3462) data time 0.0007 (0.0019) model time 0.3442 (0.3437) loss 2.5324 (3.3464) grad_norm 1.2991 (inf) loss_scale 8192.0000 (11667.3939) mem 14261MB [2024-07-24 15:55:41 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][570/625] eta 0:00:19 lr 0.001123 wd 0.0500 time 0.3621 (0.3462) data time 0.0010 (0.0019) model time 0.3611 (0.3437) loss 3.6216 (3.3480) grad_norm 2.1831 (inf) loss_scale 8192.0000 (11606.5289) mem 14261MB [2024-07-24 15:55:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][580/625] eta 0:00:15 lr 0.001123 wd 0.0500 time 0.3347 (0.3468) data time 0.0010 (0.0019) model time 0.3337 (0.3444) loss 3.9876 (3.3523) grad_norm 2.5996 (inf) loss_scale 8192.0000 (11547.7590) mem 14261MB [2024-07-24 15:55:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][590/625] eta 0:00:12 lr 0.001123 wd 0.0500 time 0.3434 (0.3471) data time 0.0008 (0.0019) model time 0.3427 (0.3447) loss 2.7231 (3.3541) grad_norm 1.2805 (inf) loss_scale 8192.0000 (11490.9780) mem 14261MB [2024-07-24 15:55:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][600/625] eta 0:00:08 lr 0.001123 wd 0.0500 time 0.3468 (0.3471) data time 0.0008 (0.0018) model time 0.3460 (0.3448) loss 2.5315 (3.3505) grad_norm 1.3481 (inf) loss_scale 8192.0000 (11436.0865) mem 14261MB [2024-07-24 15:55:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][610/625] eta 0:00:05 lr 0.001123 wd 0.0500 time 0.3279 (0.3469) data time 0.0005 (0.0018) model time 0.3273 (0.3446) loss 3.5907 (3.3510) grad_norm 1.1751 (inf) loss_scale 8192.0000 (11382.9918) mem 14261MB [2024-07-24 15:55:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][620/625] eta 0:00:01 lr 0.001123 wd 0.0500 time 0.3265 (0.3467) data time 0.0007 (0.0018) model time 0.3258 (0.3444) loss 3.2317 (3.3476) grad_norm 1.1787 (inf) loss_scale 8192.0000 (11331.6071) mem 14261MB [2024-07-24 15:56:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 65 training takes 0:03:36 [2024-07-24 15:56:00 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving......
[2024-07-24 15:56:01 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!!
[2024-07-24 15:56:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.496 (0.496) Loss 0.7632 (0.7632) Acc@1 86.475 (86.475) Acc@5 97.412 (97.412) Mem 14261MB
[2024-07-24 15:56:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.130) Loss 1.1963 (0.9142) Acc@1 74.805 (81.849) Acc@5 93.066 (96.351) Mem 14261MB
[2024-07-24 15:56:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.087 (0.109) Loss 1.3066 (1.0803) Acc@1 71.338 (77.830) Acc@5 91.943 (94.445) Mem 14261MB
[2024-07-24 15:56:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.551 Acc@5 94.410
[2024-07-24 15:56:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.6%
[2024-07-24 15:56:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 77.55%
[2024-07-24 15:56:03 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving......
[2024-07-24 15:56:04 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!!
[2024-07-24 15:56:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.475 (0.475) Loss 0.7012 (0.7012) Acc@1 85.986 (85.986) Acc@5 97.607 (97.607) Mem 14261MB [2024-07-24 15:56:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.125) Loss 1.1211 (0.8551) Acc@1 74.463 (82.156) Acc@5 93.506 (96.422) Mem 14261MB [2024-07-24 15:56:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.107) Loss 1.3174 (1.0214) Acc@1 69.434 (77.962) Acc@5 90.820 (94.310) Mem 14261MB [2024-07-24 15:56:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.631 Acc@5 94.294 [2024-07-24 15:56:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 77.6% [2024-07-24 15:56:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 77.63% [2024-07-24 15:56:07 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 15:56:08 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-24 15:56:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][0/625] eta 0:08:21 lr 0.001123 wd 0.0500 time 0.8024 (0.8024) data time 0.4798 (0.4798) model time 0.0000 (0.0000) loss 3.4755 (3.4755) grad_norm 1.2435 (1.2435) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:56:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][10/625] eta 0:04:02 lr 0.001123 wd 0.0500 time 0.3541 (0.3946) data time 0.0008 (0.0445) model time 0.0000 (0.0000) loss 3.2984 (3.5112) grad_norm 1.7565 (1.2924) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:56:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][20/625] eta 0:03:46 lr 0.001123 wd 0.0500 time 0.3388 (0.3736) data time 0.0010 (0.0252) model time 0.0000 (0.0000) loss 3.5638 (3.3098) grad_norm 1.3292 (1.3991) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:56:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][30/625] eta 0:03:37 lr 0.001122 wd 0.0500 time 0.3806 (0.3650) data time 0.0010 (0.0174) model time 0.0000 (0.0000) loss 3.9065 (3.3065) grad_norm 1.4997 (1.4060) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:56:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][40/625] eta 0:03:31 lr 0.001122 wd 0.0500 time 0.3426 (0.3619) data time 0.0008 (0.0134) model time 0.0000 (0.0000) loss 3.1793 (3.2309) grad_norm 1.4455 (1.4920) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:56:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][50/625] eta 0:03:26 lr 0.001122 wd 0.0500 time 0.3395 (0.3587) data time 0.0008 (0.0110) model time 0.0000 (0.0000) loss 2.0800 (3.1797) grad_norm 1.7097 (1.4691) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:56:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][60/625] eta 0:03:21 lr 0.001122 wd 0.0500 time 0.3416 
(0.3564) data time 0.0009 (0.0094) model time 0.3407 (0.3439) loss 2.9484 (3.2018) grad_norm 2.9449 (1.5344) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:56:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][70/625] eta 0:03:16 lr 0.001122 wd 0.0500 time 0.3397 (0.3542) data time 0.0007 (0.0082) model time 0.3390 (0.3419) loss 2.8454 (3.2092) grad_norm 1.7126 (1.5283) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:56:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][80/625] eta 0:03:13 lr 0.001122 wd 0.0500 time 0.3386 (0.3543) data time 0.0008 (0.0073) model time 0.3378 (0.3457) loss 4.0958 (3.1998) grad_norm 1.5738 (1.5051) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:56:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][90/625] eta 0:03:09 lr 0.001122 wd 0.0500 time 0.3431 (0.3536) data time 0.0011 (0.0066) model time 0.3420 (0.3462) loss 2.7110 (3.1920) grad_norm 1.2838 (1.4991) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:56:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][100/625] eta 0:03:05 lr 0.001122 wd 0.0500 time 0.3438 (0.3525) data time 0.0010 (0.0061) model time 0.3427 (0.3450) loss 3.2943 (3.1930) grad_norm 1.1491 (1.4838) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:56:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][110/625] eta 0:03:01 lr 0.001122 wd 0.0500 time 0.3375 (0.3516) data time 0.0010 (0.0057) model time 0.3365 (0.3443) loss 2.4595 (3.1735) grad_norm 1.4243 (1.4710) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:56:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][120/625] eta 0:02:57 lr 0.001122 wd 0.0500 time 0.3443 (0.3507) data time 0.0008 (0.0053) model time 0.3434 (0.3437) loss 3.8316 (3.1869) grad_norm 1.6263 (1.4617) loss_scale 8192.0000 (8192.0000) mem 14261MB 
[2024-07-24 15:56:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][130/625] eta 0:02:53 lr 0.001122 wd 0.0500 time 0.3409 (0.3502) data time 0.0009 (0.0050) model time 0.3400 (0.3436) loss 4.0458 (3.2041) grad_norm 1.1865 (1.4556) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:56:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][140/625] eta 0:02:49 lr 0.001122 wd 0.0500 time 0.3414 (0.3498) data time 0.0008 (0.0047) model time 0.3406 (0.3436) loss 2.8754 (3.2082) grad_norm 1.4440 (1.4646) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:57:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][150/625] eta 0:02:45 lr 0.001122 wd 0.0500 time 0.3382 (0.3494) data time 0.0010 (0.0045) model time 0.3372 (0.3435) loss 3.8351 (3.2242) grad_norm 1.2695 (1.4719) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:57:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][160/625] eta 0:02:42 lr 0.001122 wd 0.0500 time 0.3407 (0.3491) data time 0.0009 (0.0043) model time 0.3397 (0.3436) loss 3.7711 (3.2311) grad_norm 1.2028 (1.4757) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:57:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][170/625] eta 0:02:38 lr 0.001122 wd 0.0500 time 0.3457 (0.3489) data time 0.0008 (0.0041) model time 0.3449 (0.3436) loss 3.0141 (3.2433) grad_norm 1.6370 (1.4670) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:57:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][180/625] eta 0:02:36 lr 0.001122 wd 0.0500 time 0.3452 (0.3506) data time 0.0008 (0.0039) model time 0.3444 (0.3463) loss 3.1014 (3.2504) grad_norm 1.9550 (1.4640) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:57:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][190/625] eta 0:02:32 lr 0.001122 wd 0.0500 time 
0.3463 (0.3502) data time 0.0008 (0.0038) model time 0.3456 (0.3460) loss 3.1089 (3.2439) grad_norm 1.3466 (1.4645) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:57:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][200/625] eta 0:02:29 lr 0.001122 wd 0.0500 time 0.3357 (0.3506) data time 0.0008 (0.0036) model time 0.3350 (0.3467) loss 3.7650 (3.2536) grad_norm 1.4780 (1.4645) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:57:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][210/625] eta 0:02:25 lr 0.001122 wd 0.0500 time 0.3431 (0.3502) data time 0.0009 (0.0035) model time 0.3422 (0.3465) loss 3.7568 (3.2533) grad_norm 1.0942 (1.4614) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:57:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][220/625] eta 0:02:21 lr 0.001121 wd 0.0500 time 0.3425 (0.3500) data time 0.0008 (0.0034) model time 0.3416 (0.3463) loss 3.4886 (3.2655) grad_norm 1.5296 (1.4566) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:57:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][230/625] eta 0:02:18 lr 0.001121 wd 0.0500 time 0.3415 (0.3498) data time 0.0009 (0.0033) model time 0.3406 (0.3461) loss 3.6142 (3.2599) grad_norm 1.3097 (1.4571) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:57:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][240/625] eta 0:02:14 lr 0.001121 wd 0.0500 time 0.3337 (0.3493) data time 0.0009 (0.0032) model time 0.3328 (0.3457) loss 4.0320 (3.2621) grad_norm 1.5241 (1.4639) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:57:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][250/625] eta 0:02:10 lr 0.001121 wd 0.0500 time 0.3399 (0.3490) data time 0.0011 (0.0031) model time 0.3388 (0.3455) loss 3.3784 (3.2614) grad_norm 2.6406 (1.4716) loss_scale 8192.0000 (8192.0000) 
mem 14261MB [2024-07-24 15:57:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][260/625] eta 0:02:07 lr 0.001121 wd 0.0500 time 0.3424 (0.3488) data time 0.0011 (0.0030) model time 0.3414 (0.3453) loss 3.6163 (3.2735) grad_norm 1.1084 (1.4763) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:57:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][270/625] eta 0:02:03 lr 0.001121 wd 0.0500 time 0.3561 (0.3489) data time 0.0011 (0.0030) model time 0.3550 (0.3455) loss 4.0384 (3.2787) grad_norm 1.2421 (1.4722) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:57:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][280/625] eta 0:02:00 lr 0.001121 wd 0.0500 time 0.3348 (0.3487) data time 0.0010 (0.0029) model time 0.3338 (0.3454) loss 3.5438 (3.2826) grad_norm 1.9760 (1.4716) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:57:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][290/625] eta 0:01:56 lr 0.001121 wd 0.0500 time 0.3398 (0.3485) data time 0.0008 (0.0029) model time 0.3390 (0.3452) loss 3.4665 (3.2838) grad_norm 1.6156 (1.4716) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:57:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][300/625] eta 0:01:53 lr 0.001121 wd 0.0500 time 0.3466 (0.3483) data time 0.0011 (0.0028) model time 0.3455 (0.3451) loss 3.9187 (3.2827) grad_norm 1.8415 (1.4810) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:57:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][310/625] eta 0:01:49 lr 0.001121 wd 0.0500 time 0.3384 (0.3481) data time 0.0009 (0.0027) model time 0.3375 (0.3449) loss 4.1008 (3.2857) grad_norm 1.2312 (1.4771) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:58:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][320/625] eta 0:01:46 lr 0.001121 wd 
0.0500 time 0.3387 (0.3479) data time 0.0008 (0.0027) model time 0.3379 (0.3448) loss 2.1621 (3.2895) grad_norm 1.6458 (1.4709) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:58:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][330/625] eta 0:01:42 lr 0.001121 wd 0.0500 time 0.3401 (0.3478) data time 0.0010 (0.0026) model time 0.3390 (0.3447) loss 3.0269 (3.2832) grad_norm 1.7784 (1.4640) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:58:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][340/625] eta 0:01:39 lr 0.001121 wd 0.0500 time 0.3349 (0.3475) data time 0.0011 (0.0026) model time 0.3339 (0.3444) loss 2.9159 (3.2827) grad_norm 2.0149 (1.4641) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:58:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][350/625] eta 0:01:35 lr 0.001121 wd 0.0500 time 0.3475 (0.3473) data time 0.0009 (0.0026) model time 0.3466 (0.3443) loss 4.5205 (3.2770) grad_norm 1.8949 (1.4675) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:58:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][360/625] eta 0:01:32 lr 0.001121 wd 0.0500 time 0.3389 (0.3472) data time 0.0010 (0.0025) model time 0.3379 (0.3442) loss 4.2129 (3.2785) grad_norm 1.1630 (1.4664) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:58:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][370/625] eta 0:01:28 lr 0.001121 wd 0.0500 time 0.3426 (0.3470) data time 0.0009 (0.0025) model time 0.3417 (0.3441) loss 3.7711 (3.2794) grad_norm 1.0349 (1.4641) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:58:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][380/625] eta 0:01:25 lr 0.001121 wd 0.0500 time 0.3452 (0.3470) data time 0.0008 (0.0024) model time 0.3443 (0.3441) loss 3.8478 (3.2819) grad_norm 1.2575 (1.4617) loss_scale 8192.0000 
(8192.0000) mem 14261MB [2024-07-24 15:58:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][390/625] eta 0:01:21 lr 0.001121 wd 0.0500 time 0.3412 (0.3469) data time 0.0010 (0.0024) model time 0.3401 (0.3440) loss 3.0059 (3.2840) grad_norm 0.9796 (1.4535) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:58:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][400/625] eta 0:01:18 lr 0.001121 wd 0.0500 time 0.3397 (0.3477) data time 0.0008 (0.0024) model time 0.3389 (0.3450) loss 3.2088 (3.2879) grad_norm 1.0781 (1.4497) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:58:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][410/625] eta 0:01:14 lr 0.001120 wd 0.0500 time 0.3346 (0.3476) data time 0.0009 (0.0023) model time 0.3337 (0.3449) loss 3.8634 (3.2852) grad_norm 1.5332 (1.4489) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:58:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][420/625] eta 0:01:11 lr 0.001120 wd 0.0500 time 0.3412 (0.3478) data time 0.0011 (0.0023) model time 0.3401 (0.3451) loss 3.3091 (3.2828) grad_norm 1.2648 (1.4515) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:58:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][430/625] eta 0:01:07 lr 0.001120 wd 0.0500 time 0.3349 (0.3476) data time 0.0010 (0.0023) model time 0.3339 (0.3450) loss 3.4146 (3.2803) grad_norm 1.5855 (1.4510) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:58:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][440/625] eta 0:01:04 lr 0.001120 wd 0.0500 time 0.3470 (0.3475) data time 0.0008 (0.0022) model time 0.3462 (0.3449) loss 3.9507 (3.2849) grad_norm 1.4096 (1.4536) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:58:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][450/625] eta 0:01:00 lr 
0.001120 wd 0.0500 time 0.3574 (0.3474) data time 0.0011 (0.0022) model time 0.3563 (0.3449) loss 3.5159 (3.2912) grad_norm 2.4093 (1.4595) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:58:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][460/625] eta 0:00:57 lr 0.001120 wd 0.0500 time 0.3392 (0.3474) data time 0.0008 (0.0022) model time 0.3384 (0.3449) loss 3.5107 (3.2954) grad_norm 1.1897 (1.4591) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:58:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][470/625] eta 0:00:53 lr 0.001120 wd 0.0500 time 0.3372 (0.3474) data time 0.0010 (0.0022) model time 0.3362 (0.3449) loss 3.2702 (3.2950) grad_norm 1.2790 (1.4588) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:58:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][480/625] eta 0:00:50 lr 0.001120 wd 0.0500 time 0.3357 (0.3472) data time 0.0011 (0.0022) model time 0.3346 (0.3448) loss 3.3333 (3.2963) grad_norm 1.5110 (1.4549) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:58:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][490/625] eta 0:00:46 lr 0.001120 wd 0.0500 time 0.3312 (0.3471) data time 0.0010 (0.0021) model time 0.3302 (0.3447) loss 3.7177 (3.2964) grad_norm 1.8115 (1.4541) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:59:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][500/625] eta 0:00:43 lr 0.001120 wd 0.0500 time 0.3454 (0.3470) data time 0.0008 (0.0021) model time 0.3446 (0.3446) loss 3.3602 (3.2959) grad_norm 1.4465 (1.4505) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:59:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][510/625] eta 0:00:39 lr 0.001120 wd 0.0500 time 0.3415 (0.3470) data time 0.0008 (0.0021) model time 0.3407 (0.3446) loss 3.1263 (3.2974) grad_norm 1.3096 (1.4467) loss_scale 
8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:59:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][520/625] eta 0:00:36 lr 0.001120 wd 0.0500 time 0.3397 (0.3469) data time 0.0010 (0.0021) model time 0.3388 (0.3445) loss 3.7699 (3.2940) grad_norm 1.2835 (1.4466) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:59:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][530/625] eta 0:00:32 lr 0.001120 wd 0.0500 time 0.3391 (0.3469) data time 0.0008 (0.0021) model time 0.3383 (0.3445) loss 3.3206 (3.2958) grad_norm 2.2736 (1.4452) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 15:59:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][540/625] eta 0:00:29 lr 0.001120 wd 0.0500 time 0.3512 (0.3468) data time 0.0007 (0.0020) model time 0.3504 (0.3445) loss 2.0952 (3.2913) grad_norm 1.8014 (inf) loss_scale 4096.0000 (8161.7153) mem 14261MB [2024-07-24 15:59:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][550/625] eta 0:00:26 lr 0.001120 wd 0.0500 time 0.3351 (0.3468) data time 0.0010 (0.0020) model time 0.3341 (0.3445) loss 1.9017 (3.2890) grad_norm 1.3614 (inf) loss_scale 4096.0000 (8087.9274) mem 14261MB [2024-07-24 15:59:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][560/625] eta 0:00:22 lr 0.001120 wd 0.0500 time 0.3413 (0.3468) data time 0.0010 (0.0020) model time 0.3403 (0.3445) loss 3.5831 (3.2934) grad_norm 1.1888 (inf) loss_scale 4096.0000 (8016.7701) mem 14261MB [2024-07-24 15:59:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][570/625] eta 0:00:19 lr 0.001120 wd 0.0500 time 0.3381 (0.3467) data time 0.0008 (0.0020) model time 0.3373 (0.3445) loss 3.0731 (3.2967) grad_norm 1.5377 (inf) loss_scale 4096.0000 (7948.1051) mem 14261MB [2024-07-24 15:59:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][580/625] eta 0:00:15 lr 
0.001120 wd 0.0500 time 0.3393 (0.3467) data time 0.0013 (0.0020) model time 0.3380 (0.3444) loss 3.5730 (3.3049) grad_norm 1.6978 (inf) loss_scale 4096.0000 (7881.8038) mem 14261MB [2024-07-24 15:59:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][590/625] eta 0:00:12 lr 0.001119 wd 0.0500 time 0.3389 (0.3466) data time 0.0008 (0.0020) model time 0.3382 (0.3444) loss 2.3296 (3.2969) grad_norm 1.1140 (inf) loss_scale 4096.0000 (7817.7462) mem 14261MB [2024-07-24 15:59:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][600/625] eta 0:00:08 lr 0.001119 wd 0.0500 time 0.3331 (0.3465) data time 0.0009 (0.0019) model time 0.3322 (0.3443) loss 2.9764 (3.2982) grad_norm 0.8720 (inf) loss_scale 4096.0000 (7755.8203) mem 14261MB [2024-07-24 15:59:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][610/625] eta 0:00:05 lr 0.001119 wd 0.0500 time 0.3305 (0.3464) data time 0.0005 (0.0019) model time 0.3300 (0.3442) loss 4.1158 (3.3020) grad_norm 1.1854 (inf) loss_scale 4096.0000 (7695.9214) mem 14261MB [2024-07-24 15:59:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][620/625] eta 0:00:01 lr 0.001119 wd 0.0500 time 0.4943 (0.3468) data time 0.0008 (0.0019) model time 0.4935 (0.3446) loss 2.9467 (3.3034) grad_norm 1.0559 (inf) loss_scale 4096.0000 (7637.9517) mem 14261MB [2024-07-24 15:59:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 66 training takes 0:03:36 [2024-07-24 15:59:45 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 15:59:46 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 15:59:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.492 (0.492) Loss 0.7446 (0.7446) Acc@1 85.840 (85.840) Acc@5 97.949 (97.949) Mem 14261MB [2024-07-24 15:59:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.129) Loss 1.1475 (0.8811) Acc@1 75.293 (82.160) Acc@5 93.115 (96.507) Mem 14261MB [2024-07-24 15:59:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.109) Loss 1.3379 (1.0473) Acc@1 71.094 (78.160) Acc@5 90.771 (94.392) Mem 14261MB [2024-07-24 15:59:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.863 Acc@5 94.326 [2024-07-24 15:59:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.9% [2024-07-24 15:59:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 77.86% [2024-07-24 15:59:49 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-24 15:59:50 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-24 15:59:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.487 (0.487) Loss 0.6948 (0.6948) Acc@1 86.182 (86.182) Acc@5 97.607 (97.607) Mem 14261MB [2024-07-24 15:59:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.127) Loss 1.1113 (0.8485) Acc@1 74.951 (82.320) Acc@5 93.604 (96.449) Mem 14261MB [2024-07-24 15:59:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.108) Loss 1.3037 (1.0129) Acc@1 69.971 (78.169) Acc@5 91.211 (94.399) Mem 14261MB [2024-07-24 15:59:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.845 Acc@5 94.388 [2024-07-24 15:59:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 77.8% [2024-07-24 15:59:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 77.85% [2024-07-24 15:59:52 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 15:59:53 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-24 15:59:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][0/625] eta 0:07:49 lr 0.001119 wd 0.0500 time 0.7510 (0.7510) data time 0.4030 (0.4030) model time 0.0000 (0.0000) loss 3.9738 (3.9738) grad_norm 2.2154 (2.2154) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 15:59:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][10/625] eta 0:03:52 lr 0.001119 wd 0.0500 time 0.3467 (0.3773) data time 0.0009 (0.0376) model time 0.0000 (0.0000) loss 3.9387 (3.2382) grad_norm 1.3349 (1.4672) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:00:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][20/625] eta 0:03:37 lr 0.001119 wd 0.0500 time 0.3423 (0.3599) data time 0.0016 (0.0203) model time 0.0000 (0.0000) loss 3.5699 (3.3314) grad_norm 1.2405 (1.3356) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:00:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][30/625] eta 0:03:32 lr 0.001119 wd 0.0500 time 0.3434 (0.3569) data time 0.0010 (0.0141) model time 0.0000 (0.0000) loss 3.9604 (3.4128) grad_norm 0.9948 (1.5410) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:00:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][40/625] eta 0:03:26 lr 0.001119 wd 0.0500 time 0.3456 (0.3528) data time 0.0008 (0.0109) model time 0.0000 (0.0000) loss 2.6642 (3.3465) grad_norm 1.2103 (1.5123) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:00:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][50/625] eta 0:03:21 lr 0.001119 wd 0.0500 time 0.3499 (0.3505) data time 0.0010 (0.0090) model time 0.0000 (0.0000) loss 3.7177 (3.3801) grad_norm 1.3253 (1.4942) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:00:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][60/625] eta 0:03:17 lr 0.001119 wd 0.0500 time 0.3468 
(0.3490) data time 0.0008 (0.0077) model time 0.3460 (0.3405) loss 4.1611 (3.4529) grad_norm 1.4132 (1.4537) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:00:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][70/625] eta 0:03:13 lr 0.001119 wd 0.0500 time 0.3849 (0.3486) data time 0.0008 (0.0067) model time 0.3841 (0.3427) loss 3.8148 (3.4609) grad_norm 1.8566 (1.4359) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:00:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][80/625] eta 0:03:09 lr 0.001119 wd 0.0500 time 0.3398 (0.3475) data time 0.0011 (0.0060) model time 0.3387 (0.3413) loss 3.3773 (3.4316) grad_norm 1.4390 (1.4352) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:00:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][90/625] eta 0:03:05 lr 0.001119 wd 0.0500 time 0.3378 (0.3469) data time 0.0008 (0.0055) model time 0.3370 (0.3412) loss 3.8729 (3.4215) grad_norm 1.3478 (1.4264) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:00:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][100/625] eta 0:03:01 lr 0.001119 wd 0.0500 time 0.3474 (0.3465) data time 0.0008 (0.0050) model time 0.3466 (0.3415) loss 3.7759 (3.4275) grad_norm 1.6013 (1.4318) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:00:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][110/625] eta 0:02:58 lr 0.001119 wd 0.0500 time 0.3377 (0.3461) data time 0.0008 (0.0047) model time 0.3369 (0.3413) loss 3.6347 (3.3912) grad_norm 1.2877 (1.4148) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:00:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][120/625] eta 0:02:54 lr 0.001119 wd 0.0500 time 0.3406 (0.3460) data time 0.0007 (0.0044) model time 0.3399 (0.3417) loss 2.9490 (3.3905) grad_norm 1.7891 (1.4065) loss_scale 4096.0000 (4096.0000) mem 14261MB 
[2024-07-24 16:00:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][130/625] eta 0:02:51 lr 0.001119 wd 0.0500 time 0.3548 (0.3458) data time 0.0010 (0.0041) model time 0.3538 (0.3418) loss 3.1075 (3.3779) grad_norm 0.9853 (1.4010) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:00:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][140/625] eta 0:02:47 lr 0.001119 wd 0.0500 time 0.3383 (0.3457) data time 0.0008 (0.0039) model time 0.3374 (0.3420) loss 3.9725 (3.3946) grad_norm 1.8936 (1.4197) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:00:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][150/625] eta 0:02:44 lr 0.001118 wd 0.0500 time 0.3553 (0.3456) data time 0.0011 (0.0037) model time 0.3542 (0.3420) loss 2.4647 (3.4010) grad_norm 1.2922 (1.4200) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:00:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][160/625] eta 0:02:41 lr 0.001118 wd 0.0500 time 0.3672 (0.3467) data time 0.0010 (0.0036) model time 0.3662 (0.3438) loss 3.4000 (3.3951) grad_norm 1.9008 (1.4424) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:00:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][170/625] eta 0:02:37 lr 0.001118 wd 0.0500 time 0.3356 (0.3466) data time 0.0010 (0.0034) model time 0.3346 (0.3440) loss 3.9181 (3.4013) grad_norm 0.9682 (1.4577) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:00:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][180/625] eta 0:02:34 lr 0.001118 wd 0.0500 time 0.3333 (0.3463) data time 0.0008 (0.0033) model time 0.3325 (0.3436) loss 3.4618 (3.4096) grad_norm 1.3973 (1.4809) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:00:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][190/625] eta 0:02:30 lr 0.001118 wd 0.0500 time 
0.3351 (0.3461) data time 0.0012 (0.0032) model time 0.3339 (0.3435) loss 3.5995 (3.4073) grad_norm 1.6122 (1.4688) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:01:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][200/625] eta 0:02:27 lr 0.001118 wd 0.0500 time 0.3385 (0.3460) data time 0.0011 (0.0031) model time 0.3374 (0.3434) loss 3.3612 (3.4080) grad_norm 1.5644 (1.4648) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:01:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][210/625] eta 0:02:23 lr 0.001118 wd 0.0500 time 0.3475 (0.3460) data time 0.0011 (0.0030) model time 0.3464 (0.3435) loss 3.4123 (3.4186) grad_norm 1.0760 (1.4527) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:01:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][220/625] eta 0:02:20 lr 0.001118 wd 0.0500 time 0.3435 (0.3477) data time 0.0011 (0.0029) model time 0.3424 (0.3458) loss 3.3313 (3.4197) grad_norm 1.1626 (1.4514) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:01:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][230/625] eta 0:02:17 lr 0.001118 wd 0.0500 time 0.3376 (0.3475) data time 0.0010 (0.0028) model time 0.3366 (0.3456) loss 3.4459 (3.4266) grad_norm 1.4870 (1.4513) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:01:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][240/625] eta 0:02:13 lr 0.001118 wd 0.0500 time 0.3389 (0.3472) data time 0.0011 (0.0027) model time 0.3378 (0.3453) loss 2.4865 (3.4087) grad_norm 1.3664 (1.4470) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:01:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][250/625] eta 0:02:10 lr 0.001118 wd 0.0500 time 0.3740 (0.3473) data time 0.0008 (0.0027) model time 0.3732 (0.3454) loss 3.3366 (3.4055) grad_norm 1.1230 (1.4445) loss_scale 4096.0000 (4096.0000) 
mem 14261MB [2024-07-24 16:01:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][260/625] eta 0:02:06 lr 0.001118 wd 0.0500 time 0.3468 (0.3470) data time 0.0011 (0.0026) model time 0.3457 (0.3451) loss 2.8776 (3.3960) grad_norm 1.0563 (1.4408) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:01:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][270/625] eta 0:02:03 lr 0.001118 wd 0.0500 time 0.3414 (0.3469) data time 0.0011 (0.0025) model time 0.3403 (0.3450) loss 3.5310 (3.4100) grad_norm 1.2898 (1.4385) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:01:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][280/625] eta 0:01:59 lr 0.001118 wd 0.0500 time 0.3535 (0.3468) data time 0.0010 (0.0025) model time 0.3525 (0.3449) loss 3.5838 (3.4063) grad_norm 1.1349 (1.4368) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:01:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][290/625] eta 0:01:56 lr 0.001118 wd 0.0500 time 0.3474 (0.3467) data time 0.0012 (0.0024) model time 0.3462 (0.3448) loss 3.4243 (3.3992) grad_norm 1.0815 (1.4399) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:01:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][300/625] eta 0:01:52 lr 0.001118 wd 0.0500 time 0.3412 (0.3465) data time 0.0010 (0.0024) model time 0.3402 (0.3447) loss 3.4810 (3.3905) grad_norm 1.1509 (1.4336) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:01:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][310/625] eta 0:01:49 lr 0.001118 wd 0.0500 time 0.3453 (0.3464) data time 0.0008 (0.0023) model time 0.3445 (0.3446) loss 3.6908 (3.3889) grad_norm 1.4533 (1.4340) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:01:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][320/625] eta 0:01:45 lr 0.001118 wd 
0.0500 time 0.3396 (0.3462) data time 0.0007 (0.0023) model time 0.3389 (0.3444) loss 3.2220 (3.3873) grad_norm 1.2930 (1.4282) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:01:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][330/625] eta 0:01:42 lr 0.001118 wd 0.0500 time 0.3429 (0.3461) data time 0.0008 (0.0023) model time 0.3421 (0.3442) loss 2.9384 (3.3830) grad_norm 1.1815 (1.4258) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:01:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][340/625] eta 0:01:38 lr 0.001117 wd 0.0500 time 0.3383 (0.3460) data time 0.0010 (0.0022) model time 0.3373 (0.3442) loss 2.7477 (3.3803) grad_norm 1.5518 (1.4259) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:01:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][350/625] eta 0:01:35 lr 0.001117 wd 0.0500 time 0.3476 (0.3460) data time 0.0011 (0.0022) model time 0.3466 (0.3441) loss 2.5504 (3.3680) grad_norm 1.0279 (1.4273) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:01:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][360/625] eta 0:01:31 lr 0.001117 wd 0.0500 time 0.3419 (0.3459) data time 0.0007 (0.0022) model time 0.3412 (0.3440) loss 2.6589 (3.3563) grad_norm 1.1411 (1.4246) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:02:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][370/625] eta 0:01:28 lr 0.001117 wd 0.0500 time 0.3566 (0.3458) data time 0.0008 (0.0021) model time 0.3558 (0.3440) loss 2.8804 (3.3524) grad_norm 1.3681 (1.4228) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:02:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][380/625] eta 0:01:24 lr 0.001117 wd 0.0500 time 0.3360 (0.3463) data time 0.0010 (0.0021) model time 0.3350 (0.3446) loss 3.9131 (3.3494) grad_norm 1.2918 (1.4245) loss_scale 4096.0000 
(4096.0000) mem 14261MB [2024-07-24 16:02:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][390/625] eta 0:01:21 lr 0.001117 wd 0.0500 time 0.3439 (0.3463) data time 0.0010 (0.0021) model time 0.3428 (0.3446) loss 3.1821 (3.3413) grad_norm 1.3927 (1.4264) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:02:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][400/625] eta 0:01:17 lr 0.001117 wd 0.0500 time 0.3404 (0.3462) data time 0.0011 (0.0021) model time 0.3393 (0.3445) loss 3.1683 (3.3456) grad_norm 1.8433 (1.4309) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:02:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][410/625] eta 0:01:14 lr 0.001117 wd 0.0500 time 0.3454 (0.3461) data time 0.0012 (0.0020) model time 0.3442 (0.3444) loss 3.2968 (3.3429) grad_norm 1.8753 (1.4390) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:02:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][420/625] eta 0:01:10 lr 0.001117 wd 0.0500 time 0.3397 (0.3460) data time 0.0008 (0.0020) model time 0.3388 (0.3443) loss 3.0079 (3.3443) grad_norm 1.7042 (1.4392) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:02:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][430/625] eta 0:01:07 lr 0.001117 wd 0.0500 time 0.3558 (0.3460) data time 0.0008 (0.0020) model time 0.3550 (0.3443) loss 3.1935 (3.3503) grad_norm 1.3562 (1.4417) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:02:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][440/625] eta 0:01:04 lr 0.001117 wd 0.0500 time 0.5347 (0.3472) data time 0.0008 (0.0020) model time 0.5339 (0.3458) loss 4.0703 (3.3545) grad_norm 1.2799 (1.4427) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:02:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][450/625] eta 0:01:00 lr 
0.001117 wd 0.0500 time 0.3354 (0.3472) data time 0.0009 (0.0020) model time 0.3346 (0.3458) loss 3.3389 (3.3509) grad_norm 1.9029 (1.4473) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:02:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][460/625] eta 0:00:57 lr 0.001117 wd 0.0500 time 0.3484 (0.3471) data time 0.0012 (0.0019) model time 0.3473 (0.3457) loss 3.7331 (3.3503) grad_norm 1.9129 (1.4520) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:02:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][470/625] eta 0:00:53 lr 0.001117 wd 0.0500 time 0.3254 (0.3472) data time 0.0008 (0.0020) model time 0.3245 (0.3457) loss 3.8395 (3.3517) grad_norm 1.3463 (1.4550) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:02:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][480/625] eta 0:00:50 lr 0.001117 wd 0.0500 time 0.3433 (0.3473) data time 0.0007 (0.0020) model time 0.3426 (0.3458) loss 1.8800 (3.3478) grad_norm 1.6699 (1.4543) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:02:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][490/625] eta 0:00:46 lr 0.001117 wd 0.0500 time 0.3759 (0.3473) data time 0.0011 (0.0020) model time 0.3749 (0.3457) loss 3.7181 (3.3474) grad_norm 0.9795 (1.4508) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:02:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][500/625] eta 0:00:43 lr 0.001117 wd 0.0500 time 0.3361 (0.3474) data time 0.0008 (0.0019) model time 0.3353 (0.3459) loss 2.6643 (3.3452) grad_norm 1.9364 (1.4564) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:02:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][510/625] eta 0:00:39 lr 0.001117 wd 0.0500 time 0.3461 (0.3475) data time 0.0009 (0.0019) model time 0.3452 (0.3460) loss 4.1976 (3.3460) grad_norm 1.6505 (1.4574) loss_scale 
4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:02:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][520/625] eta 0:00:36 lr 0.001116 wd 0.0500 time 0.3805 (0.3476) data time 0.0008 (0.0019) model time 0.3797 (0.3461) loss 4.1823 (3.3486) grad_norm 0.9283 (1.4549) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:02:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][530/625] eta 0:00:33 lr 0.001116 wd 0.0500 time 0.3435 (0.3475) data time 0.0009 (0.0019) model time 0.3426 (0.3461) loss 3.3716 (3.3430) grad_norm 1.2998 (1.4524) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:03:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][540/625] eta 0:00:29 lr 0.001116 wd 0.0500 time 0.3527 (0.3476) data time 0.0009 (0.0019) model time 0.3518 (0.3461) loss 2.3829 (3.3445) grad_norm 1.4208 (1.4518) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:03:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][550/625] eta 0:00:26 lr 0.001116 wd 0.0500 time 0.3382 (0.3475) data time 0.0009 (0.0019) model time 0.3373 (0.3461) loss 3.9862 (3.3466) grad_norm 2.2652 (1.4525) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:03:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][560/625] eta 0:00:22 lr 0.001116 wd 0.0500 time 0.3407 (0.3474) data time 0.0008 (0.0019) model time 0.3400 (0.3460) loss 3.4256 (3.3441) grad_norm 1.0075 (1.4529) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:03:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][570/625] eta 0:00:19 lr 0.001116 wd 0.0500 time 0.3450 (0.3473) data time 0.0010 (0.0018) model time 0.3439 (0.3459) loss 3.2533 (3.3420) grad_norm 1.0831 (1.4524) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:03:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][580/625] eta 
0:00:15 lr 0.001116 wd 0.0500 time 0.3487 (0.3473) data time 0.0008 (0.0018) model time 0.3479 (0.3458) loss 2.9426 (3.3385) grad_norm 1.0863 (1.4488) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:03:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][590/625] eta 0:00:12 lr 0.001116 wd 0.0500 time 0.3467 (0.3472) data time 0.0011 (0.0018) model time 0.3456 (0.3458) loss 3.6322 (3.3364) grad_norm 1.5556 (1.4490) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:03:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][600/625] eta 0:00:08 lr 0.001116 wd 0.0500 time 0.3441 (0.3474) data time 0.0008 (0.0018) model time 0.3433 (0.3459) loss 4.0323 (3.3406) grad_norm 1.4519 (1.4468) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:03:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][610/625] eta 0:00:05 lr 0.001116 wd 0.0500 time 0.3304 (0.3473) data time 0.0008 (0.0018) model time 0.3296 (0.3459) loss 3.3502 (3.3395) grad_norm 1.2344 (1.4430) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:03:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][620/625] eta 0:00:01 lr 0.001116 wd 0.0500 time 0.3265 (0.3470) data time 0.0005 (0.0018) model time 0.3259 (0.3456) loss 3.5658 (3.3404) grad_norm 1.6525 (1.4409) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:03:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 67 training takes 0:03:36 [2024-07-24 16:03:30 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 16:03:31 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 16:03:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.502 (0.502) Loss 0.7339 (0.7339) Acc@1 85.303 (85.303) Acc@5 97.900 (97.900) Mem 14261MB [2024-07-24 16:03:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.087 (0.128) Loss 1.1553 (0.8691) Acc@1 74.170 (82.076) Acc@5 93.262 (96.298) Mem 14261MB [2024-07-24 16:03:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.109) Loss 1.2666 (1.0253) Acc@1 70.605 (78.076) Acc@5 91.943 (94.299) Mem 14261MB [2024-07-24 16:03:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.883 Acc@5 94.296 [2024-07-24 16:03:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.9% [2024-07-24 16:03:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 77.88% [2024-07-24 16:03:34 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-24 16:03:35 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-24 16:03:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.485 (0.485) Loss 0.6914 (0.6914) Acc@1 86.328 (86.328) Acc@5 97.607 (97.607) Mem 14261MB [2024-07-24 16:03:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.126) Loss 1.1025 (0.8429) Acc@1 75.293 (82.511) Acc@5 93.799 (96.498) Mem 14261MB [2024-07-24 16:03:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.087 (0.107) Loss 1.2939 (1.0057) Acc@1 69.922 (78.358) Acc@5 91.260 (94.466) Mem 14261MB [2024-07-24 16:03:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.045 Acc@5 94.458 [2024-07-24 16:03:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.0% [2024-07-24 16:03:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.05% [2024-07-24 16:03:37 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 16:03:38 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-24 16:03:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][0/625] eta 0:07:54 lr 0.001116 wd 0.0500 time 0.7593 (0.7593) data time 0.4376 (0.4376) model time 0.0000 (0.0000) loss 2.5785 (2.5785) grad_norm 1.5842 (1.5842) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:03:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][10/625] eta 0:03:54 lr 0.001116 wd 0.0500 time 0.3488 (0.3814) data time 0.0010 (0.0408) model time 0.0000 (0.0000) loss 3.6160 (3.3092) grad_norm 2.7409 (1.4833) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:03:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][20/625] eta 0:03:39 lr 0.001116 wd 0.0500 time 0.3471 (0.3620) data time 0.0009 (0.0218) model time 0.0000 (0.0000) loss 3.0664 (3.1388) grad_norm 1.3964 (1.5957) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:03:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][30/625] eta 0:03:35 lr 0.001116 wd 0.0500 time 0.5659 (0.3629) data time 0.0008 (0.0155) model time 0.0000 (0.0000) loss 3.7100 (3.2189) grad_norm 1.9401 (1.6812) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:03:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][40/625] eta 0:03:33 lr 0.001116 wd 0.0500 time 0.3388 (0.3656) data time 0.0007 (0.0120) model time 0.0000 (0.0000) loss 3.3017 (3.2201) grad_norm 1.0719 (1.6765) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:03:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][50/625] eta 0:03:27 lr 0.001116 wd 0.0500 time 0.3365 (0.3610) data time 0.0008 (0.0098) model time 0.0000 (0.0000) loss 3.3951 (3.2249) grad_norm 1.2693 (1.6121) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:04:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][60/625] eta 0:03:22 lr 0.001116 wd 0.0500 time 0.3345 
(0.3577) data time 0.0008 (0.0084) model time 0.3337 (0.3395) loss 4.1980 (3.2686) grad_norm 1.0110 (1.5713) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:04:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][70/625] eta 0:03:17 lr 0.001116 wd 0.0500 time 0.3409 (0.3558) data time 0.0008 (0.0074) model time 0.3401 (0.3413) loss 2.8154 (3.2695) grad_norm 1.0498 (1.5617) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:04:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][80/625] eta 0:03:12 lr 0.001115 wd 0.0500 time 0.3316 (0.3538) data time 0.0008 (0.0066) model time 0.3308 (0.3406) loss 4.1340 (3.3193) grad_norm 1.4089 (1.5502) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:04:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][90/625] eta 0:03:08 lr 0.001115 wd 0.0500 time 0.3442 (0.3527) data time 0.0010 (0.0060) model time 0.3432 (0.3410) loss 2.6645 (3.3148) grad_norm 1.1506 (1.5728) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:04:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][100/625] eta 0:03:05 lr 0.001115 wd 0.0500 time 0.3472 (0.3533) data time 0.0008 (0.0055) model time 0.3464 (0.3443) loss 4.3553 (3.3303) grad_norm 1.0634 (1.5479) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:04:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][110/625] eta 0:03:01 lr 0.001115 wd 0.0500 time 0.3433 (0.3525) data time 0.0008 (0.0051) model time 0.3425 (0.3442) loss 3.7311 (3.3523) grad_norm 0.9608 (1.5172) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:04:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][120/625] eta 0:02:57 lr 0.001115 wd 0.0500 time 0.3454 (0.3519) data time 0.0008 (0.0048) model time 0.3447 (0.3442) loss 2.5774 (3.3607) grad_norm 1.2348 (1.5187) loss_scale 4096.0000 (4096.0000) mem 14261MB 
[2024-07-24 16:04:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][130/625] eta 0:02:53 lr 0.001115 wd 0.0500 time 0.3490 (0.3512) data time 0.0010 (0.0045) model time 0.3480 (0.3439) loss 3.4246 (3.3683) grad_norm 1.3968 (1.5180) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:04:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][140/625] eta 0:02:50 lr 0.001115 wd 0.0500 time 0.3398 (0.3506) data time 0.0008 (0.0042) model time 0.3389 (0.3436) loss 3.6274 (3.3747) grad_norm 1.3810 (1.5055) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:04:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][150/625] eta 0:02:46 lr 0.001115 wd 0.0500 time 0.3477 (0.3502) data time 0.0008 (0.0040) model time 0.3469 (0.3436) loss 3.0116 (3.3635) grad_norm 1.0105 (1.4933) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:04:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][160/625] eta 0:02:42 lr 0.001115 wd 0.0500 time 0.3339 (0.3498) data time 0.0007 (0.0038) model time 0.3332 (0.3436) loss 3.4039 (3.3503) grad_norm 1.1389 (1.4861) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:04:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][170/625] eta 0:02:39 lr 0.001115 wd 0.0500 time 0.3393 (0.3499) data time 0.0008 (0.0037) model time 0.3385 (0.3441) loss 4.1932 (3.3644) grad_norm 1.1086 (1.4788) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:04:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][180/625] eta 0:02:35 lr 0.001115 wd 0.0500 time 0.3376 (0.3496) data time 0.0011 (0.0035) model time 0.3365 (0.3440) loss 3.6039 (3.3599) grad_norm 1.3364 (1.4809) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:04:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][190/625] eta 0:02:31 lr 0.001115 wd 0.0500 time 
0.3518 (0.3494) data time 0.0009 (0.0034) model time 0.3510 (0.3441) loss 2.5893 (3.3636) grad_norm 1.3421 (1.4708) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:04:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][200/625] eta 0:02:28 lr 0.001115 wd 0.0500 time 0.3385 (0.3490) data time 0.0010 (0.0033) model time 0.3375 (0.3439) loss 2.4241 (3.3513) grad_norm 2.3215 (1.4692) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:04:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][210/625] eta 0:02:24 lr 0.001115 wd 0.0500 time 0.3844 (0.3492) data time 0.0010 (0.0032) model time 0.3834 (0.3444) loss 3.0124 (3.3433) grad_norm 1.5998 (1.4726) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:04:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][220/625] eta 0:02:21 lr 0.001115 wd 0.0500 time 0.3309 (0.3491) data time 0.0008 (0.0031) model time 0.3300 (0.3445) loss 2.6429 (3.3390) grad_norm 1.3426 (1.4681) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:04:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][230/625] eta 0:02:17 lr 0.001115 wd 0.0500 time 0.3363 (0.3489) data time 0.0008 (0.0030) model time 0.3355 (0.3444) loss 2.9161 (3.3392) grad_norm 1.1433 (1.4646) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:05:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][240/625] eta 0:02:14 lr 0.001115 wd 0.0500 time 0.3509 (0.3492) data time 0.0008 (0.0030) model time 0.3501 (0.3449) loss 3.7230 (3.3437) grad_norm 2.4318 (1.4630) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:05:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][250/625] eta 0:02:10 lr 0.001115 wd 0.0500 time 0.3402 (0.3489) data time 0.0011 (0.0029) model time 0.3391 (0.3448) loss 3.4755 (3.3457) grad_norm 2.0387 (1.4702) loss_scale 4096.0000 (4096.0000) 
mem 14261MB [2024-07-24 16:05:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][260/625] eta 0:02:08 lr 0.001114 wd 0.0500 time 0.3409 (0.3520) data time 0.0010 (0.0030) model time 0.3399 (0.3485) loss 3.5851 (3.3504) grad_norm 0.9795 (1.4630) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:05:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][270/625] eta 0:02:04 lr 0.001114 wd 0.0500 time 0.3397 (0.3516) data time 0.0010 (0.0029) model time 0.3387 (0.3482) loss 3.2905 (3.3525) grad_norm 1.5933 (1.4638) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:05:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][280/625] eta 0:02:01 lr 0.001114 wd 0.0500 time 0.3482 (0.3514) data time 0.0012 (0.0028) model time 0.3469 (0.3480) loss 3.6307 (3.3486) grad_norm 1.4705 (1.4612) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:05:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][290/625] eta 0:01:57 lr 0.001114 wd 0.0500 time 0.3363 (0.3512) data time 0.0010 (0.0028) model time 0.3353 (0.3479) loss 3.6265 (3.3501) grad_norm 1.2235 (1.4553) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:05:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][300/625] eta 0:01:54 lr 0.001114 wd 0.0500 time 0.3404 (0.3513) data time 0.0008 (0.0027) model time 0.3395 (0.3480) loss 2.8836 (3.3529) grad_norm 1.4024 (1.4539) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:05:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][310/625] eta 0:01:50 lr 0.001114 wd 0.0500 time 0.3370 (0.3511) data time 0.0008 (0.0026) model time 0.3362 (0.3479) loss 2.7971 (3.3487) grad_norm 1.3451 (1.4462) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:05:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][320/625] eta 0:01:47 lr 0.001114 wd 
0.0500 time 0.3435 (0.3515) data time 0.0010 (0.0026) model time 0.3425 (0.3485) loss 3.4153 (3.3474) grad_norm 1.9099 (1.4475) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:05:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][330/625] eta 0:01:43 lr 0.001114 wd 0.0500 time 0.3463 (0.3514) data time 0.0008 (0.0026) model time 0.3455 (0.3485) loss 3.8865 (3.3460) grad_norm 1.4765 (1.4562) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:05:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][340/625] eta 0:01:40 lr 0.001114 wd 0.0500 time 0.3393 (0.3512) data time 0.0011 (0.0025) model time 0.3382 (0.3483) loss 2.6727 (3.3410) grad_norm 1.0443 (1.4504) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:05:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][350/625] eta 0:01:36 lr 0.001114 wd 0.0500 time 0.3372 (0.3510) data time 0.0011 (0.0025) model time 0.3362 (0.3481) loss 3.3251 (3.3406) grad_norm 1.4250 (1.4467) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:05:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][360/625] eta 0:01:32 lr 0.001114 wd 0.0500 time 0.3472 (0.3509) data time 0.0010 (0.0024) model time 0.3462 (0.3481) loss 3.2211 (3.3383) grad_norm 1.1322 (1.4486) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:05:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][370/625] eta 0:01:29 lr 0.001114 wd 0.0500 time 0.3300 (0.3507) data time 0.0011 (0.0024) model time 0.3289 (0.3479) loss 3.8339 (3.3403) grad_norm 1.2353 (1.4496) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:05:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][380/625] eta 0:01:25 lr 0.001114 wd 0.0500 time 0.3449 (0.3505) data time 0.0010 (0.0023) model time 0.3439 (0.3477) loss 3.9429 (3.3454) grad_norm 1.2697 (1.4436) loss_scale 4096.0000 
(4096.0000) mem 14261MB [2024-07-24 16:05:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][390/625] eta 0:01:22 lr 0.001114 wd 0.0500 time 0.3348 (0.3504) data time 0.0008 (0.0023) model time 0.3341 (0.3476) loss 3.6268 (3.3430) grad_norm 1.5157 (1.4398) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:05:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][400/625] eta 0:01:18 lr 0.001114 wd 0.0500 time 0.3414 (0.3502) data time 0.0010 (0.0023) model time 0.3404 (0.3475) loss 2.4612 (3.3365) grad_norm 1.5856 (1.4396) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:06:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][410/625] eta 0:01:15 lr 0.001114 wd 0.0500 time 0.3450 (0.3502) data time 0.0009 (0.0023) model time 0.3440 (0.3474) loss 2.8369 (3.3367) grad_norm 1.5815 (1.4349) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:06:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][420/625] eta 0:01:11 lr 0.001114 wd 0.0500 time 0.3368 (0.3500) data time 0.0011 (0.0022) model time 0.3357 (0.3473) loss 3.1278 (3.3332) grad_norm 1.1240 (1.4384) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:06:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][430/625] eta 0:01:08 lr 0.001114 wd 0.0500 time 0.3411 (0.3498) data time 0.0010 (0.0022) model time 0.3401 (0.3472) loss 3.3878 (3.3329) grad_norm 1.3843 (1.4446) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:06:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][440/625] eta 0:01:04 lr 0.001113 wd 0.0500 time 0.3432 (0.3496) data time 0.0008 (0.0022) model time 0.3424 (0.3470) loss 1.9506 (3.3369) grad_norm 1.8048 (1.4486) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:06:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][450/625] eta 0:01:01 lr 
0.001113 wd 0.0500 time 0.3472 (0.3495) data time 0.0010 (0.0021) model time 0.3462 (0.3468) loss 3.9159 (3.3383) grad_norm 1.7828 (1.4512) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:06:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][460/625] eta 0:00:57 lr 0.001113 wd 0.0500 time 0.3495 (0.3494) data time 0.0010 (0.0021) model time 0.3485 (0.3467) loss 3.1896 (3.3416) grad_norm 1.8864 (1.4542) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:06:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][470/625] eta 0:00:54 lr 0.001113 wd 0.0500 time 0.3524 (0.3492) data time 0.0007 (0.0021) model time 0.3517 (0.3466) loss 2.5612 (3.3389) grad_norm 1.1243 (1.4538) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:06:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][480/625] eta 0:00:50 lr 0.001113 wd 0.0500 time 0.4096 (0.3499) data time 0.0008 (0.0021) model time 0.4088 (0.3475) loss 3.2722 (3.3389) grad_norm 1.8665 (1.4503) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:06:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][490/625] eta 0:00:47 lr 0.001113 wd 0.0500 time 0.3390 (0.3503) data time 0.0011 (0.0021) model time 0.3380 (0.3479) loss 3.8423 (3.3428) grad_norm 1.2308 (1.4467) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:06:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][500/625] eta 0:00:43 lr 0.001113 wd 0.0500 time 0.3386 (0.3501) data time 0.0008 (0.0020) model time 0.3378 (0.3477) loss 4.1136 (3.3449) grad_norm 1.6363 (1.4456) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:06:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][510/625] eta 0:00:40 lr 0.001113 wd 0.0500 time 0.3551 (0.3500) data time 0.0010 (0.0020) model time 0.3541 (0.3476) loss 3.5345 (3.3484) grad_norm 1.3879 (1.4523) loss_scale 
4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:06:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][520/625] eta 0:00:36 lr 0.001113 wd 0.0500 time 0.3382 (0.3498) data time 0.0010 (0.0020) model time 0.3372 (0.3475) loss 3.5514 (3.3489) grad_norm 1.5910 (1.4513) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:06:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][530/625] eta 0:00:33 lr 0.001113 wd 0.0500 time 0.3447 (0.3497) data time 0.0012 (0.0020) model time 0.3435 (0.3473) loss 3.8779 (3.3502) grad_norm 1.1443 (1.4497) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:06:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][540/625] eta 0:00:29 lr 0.001113 wd 0.0500 time 0.3161 (0.3498) data time 0.0008 (0.0020) model time 0.3152 (0.3475) loss 3.8871 (3.3531) grad_norm 1.3222 (1.4465) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:06:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][550/625] eta 0:00:26 lr 0.001113 wd 0.0500 time 0.3455 (0.3497) data time 0.0008 (0.0020) model time 0.3447 (0.3474) loss 2.8242 (3.3545) grad_norm 1.0058 (1.4465) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:06:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][560/625] eta 0:00:22 lr 0.001113 wd 0.0500 time 0.3432 (0.3497) data time 0.0008 (0.0019) model time 0.3424 (0.3474) loss 3.4155 (3.3564) grad_norm 1.0645 (1.4433) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:06:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][570/625] eta 0:00:19 lr 0.001113 wd 0.0500 time 0.3549 (0.3495) data time 0.0008 (0.0019) model time 0.3541 (0.3473) loss 2.1502 (3.3540) grad_norm 1.6368 (1.4455) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:07:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][580/625] eta 
0:00:15 lr 0.001113 wd 0.0500 time 0.3449 (0.3494) data time 0.0008 (0.0019) model time 0.3441 (0.3471) loss 2.7337 (3.3517) grad_norm 1.2515 (1.4466) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:07:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][590/625] eta 0:00:12 lr 0.001113 wd 0.0500 time 0.3401 (0.3493) data time 0.0010 (0.0019) model time 0.3391 (0.3470) loss 3.6517 (3.3534) grad_norm 1.3297 (1.4464) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:07:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][600/625] eta 0:00:08 lr 0.001113 wd 0.0500 time 0.3389 (0.3492) data time 0.0010 (0.0019) model time 0.3379 (0.3469) loss 3.0085 (3.3537) grad_norm 1.1645 (1.4452) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:07:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][610/625] eta 0:00:05 lr 0.001113 wd 0.0500 time 0.3293 (0.3490) data time 0.0008 (0.0019) model time 0.3285 (0.3468) loss 2.7075 (3.3531) grad_norm 0.9003 (1.4473) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:07:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][620/625] eta 0:00:01 lr 0.001112 wd 0.0500 time 0.3277 (0.3488) data time 0.0007 (0.0019) model time 0.3269 (0.3465) loss 2.3588 (3.3545) grad_norm 1.4673 (1.4473) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:07:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 68 training takes 0:03:37 [2024-07-24 16:07:16 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 16:07:17 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 16:07:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.496 (0.496) Loss 0.6953 (0.6953) Acc@1 86.621 (86.621) Acc@5 97.510 (97.510) Mem 14261MB [2024-07-24 16:07:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.128) Loss 1.2070 (0.8449) Acc@1 70.947 (81.898) Acc@5 93.457 (96.311) Mem 14261MB [2024-07-24 16:07:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.087 (0.109) Loss 1.2773 (1.0136) Acc@1 71.631 (77.972) Acc@5 91.553 (94.364) Mem 14261MB [2024-07-24 16:07:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.655 Acc@5 94.348 [2024-07-24 16:07:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.7% [2024-07-24 16:07:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.889 (0.889) Loss 0.6880 (0.6880) Acc@1 86.328 (86.328) Acc@5 97.607 (97.607) Mem 14261MB [2024-07-24 16:07:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.167) Loss 1.0947 (0.8384) Acc@1 75.586 (82.631) Acc@5 93.652 (96.515) Mem 14261MB [2024-07-24 16:07:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.129) Loss 1.2842 (0.9992) Acc@1 70.215 (78.516) Acc@5 91.309 (94.515) Mem 14261MB [2024-07-24 16:07:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.197 Acc@5 94.506 [2024-07-24 16:07:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.2% [2024-07-24 16:07:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.20% [2024-07-24 16:07:23 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-24 16:07:24 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 16:07:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][0/625] eta 0:07:37 lr 0.001112 wd 0.0500 time 0.7319 (0.7319) data time 0.4033 (0.4033) model time 0.0000 (0.0000) loss 2.5968 (2.5968) grad_norm 1.7968 (1.7968) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:07:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][10/625] eta 0:03:51 lr 0.001112 wd 0.0500 time 0.3461 (0.3767) data time 0.0009 (0.0376) model time 0.0000 (0.0000) loss 4.4850 (3.3914) grad_norm 1.1406 (1.3689) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:07:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][20/625] eta 0:03:38 lr 0.001112 wd 0.0500 time 0.3450 (0.3609) data time 0.0008 (0.0202) model time 0.0000 (0.0000) loss 3.8767 (3.4101) grad_norm 1.5121 (1.4602) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:07:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][30/625] eta 0:03:31 lr 0.001112 wd 0.0500 time 0.3417 (0.3551) data time 0.0010 (0.0140) model time 0.0000 (0.0000) loss 3.2452 (3.3130) grad_norm 1.7950 (1.5167) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:07:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][40/625] eta 0:03:26 lr 0.001112 wd 0.0500 time 0.3413 (0.3525) data time 0.0009 (0.0109) model time 0.0000 (0.0000) loss 2.8351 (3.3654) grad_norm 1.7151 (1.4962) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:07:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][50/625] eta 0:03:21 lr 0.001112 wd 0.0500 time 0.3359 (0.3510) data time 0.0011 (0.0090) model time 0.0000 (0.0000) loss 3.3826 (3.3584) grad_norm 1.1411 (1.4894) loss_scale 4096.0000 (4096.0000) mem 
14261MB [2024-07-24 16:07:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][60/625] eta 0:03:17 lr 0.001112 wd 0.0500 time 0.3460 (0.3493) data time 0.0009 (0.0077) model time 0.3451 (0.3394) loss 3.6865 (3.3761) grad_norm 1.0822 (1.4555) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:07:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][70/625] eta 0:03:14 lr 0.001112 wd 0.0500 time 0.3378 (0.3503) data time 0.0010 (0.0067) model time 0.3368 (0.3474) loss 3.8619 (3.3283) grad_norm 1.3458 (1.4504) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:07:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][80/625] eta 0:03:15 lr 0.001112 wd 0.0500 time 0.3403 (0.3590) data time 0.0008 (0.0067) model time 0.3395 (0.3696) loss 2.0804 (3.3467) grad_norm 1.1891 (1.4229) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:07:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][90/625] eta 0:03:10 lr 0.001112 wd 0.0500 time 0.3391 (0.3569) data time 0.0008 (0.0061) model time 0.3383 (0.3619) loss 3.0412 (3.3168) grad_norm 1.4865 (1.4158) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:08:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][100/625] eta 0:03:06 lr 0.001112 wd 0.0500 time 0.3400 (0.3556) data time 0.0008 (0.0056) model time 0.3391 (0.3581) loss 3.9373 (3.3300) grad_norm 2.9186 (1.4551) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:08:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][110/625] eta 0:03:02 lr 0.001112 wd 0.0500 time 0.3409 (0.3541) data time 0.0008 (0.0052) model time 0.3401 (0.3548) loss 2.7652 (3.3071) grad_norm 1.1670 (1.4673) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:08:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][120/625] eta 0:02:58 lr 0.001112 wd 0.0500 time 
0.3548 (0.3537) data time 0.0010 (0.0048) model time 0.3538 (0.3538) loss 3.6269 (3.3101) grad_norm 1.0879 (1.4522) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:08:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][130/625] eta 0:02:54 lr 0.001112 wd 0.0500 time 0.3363 (0.3525) data time 0.0008 (0.0045) model time 0.3355 (0.3518) loss 3.7915 (3.2971) grad_norm 2.1009 (1.4649) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:08:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][140/625] eta 0:02:50 lr 0.001112 wd 0.0500 time 0.3522 (0.3517) data time 0.0008 (0.0043) model time 0.3514 (0.3504) loss 3.7221 (3.2957) grad_norm 1.2591 (1.4607) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:08:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][150/625] eta 0:02:46 lr 0.001112 wd 0.0500 time 0.3427 (0.3509) data time 0.0007 (0.0041) model time 0.3420 (0.3493) loss 3.5516 (3.2929) grad_norm 1.0345 (1.4649) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:08:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][160/625] eta 0:02:42 lr 0.001112 wd 0.0500 time 0.3361 (0.3503) data time 0.0008 (0.0039) model time 0.3353 (0.3485) loss 3.6786 (3.2987) grad_norm 1.4025 (1.4787) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:08:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][170/625] eta 0:02:39 lr 0.001112 wd 0.0500 time 0.3465 (0.3499) data time 0.0008 (0.0037) model time 0.3457 (0.3480) loss 3.7139 (3.3115) grad_norm 1.7909 (1.4924) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:08:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][180/625] eta 0:02:35 lr 0.001111 wd 0.0500 time 0.3376 (0.3495) data time 0.0010 (0.0036) model time 0.3366 (0.3475) loss 2.9096 (3.2998) grad_norm 1.0169 (1.4795) loss_scale 4096.0000 (4096.0000) 
mem 14261MB [2024-07-24 16:08:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][190/625] eta 0:02:31 lr 0.001111 wd 0.0500 time 0.3391 (0.3491) data time 0.0009 (0.0034) model time 0.3382 (0.3470) loss 3.8805 (3.2942) grad_norm 1.2140 (1.4715) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:08:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][200/625] eta 0:02:28 lr 0.001111 wd 0.0500 time 0.3400 (0.3486) data time 0.0008 (0.0033) model time 0.3392 (0.3463) loss 3.2705 (3.2906) grad_norm 2.3409 (1.4810) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:08:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][210/625] eta 0:02:24 lr 0.001111 wd 0.0500 time 0.3476 (0.3483) data time 0.0011 (0.0032) model time 0.3466 (0.3461) loss 3.3957 (3.2957) grad_norm 1.5669 (1.4824) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:08:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][220/625] eta 0:02:20 lr 0.001111 wd 0.0500 time 0.3414 (0.3479) data time 0.0008 (0.0031) model time 0.3406 (0.3456) loss 4.0027 (3.3116) grad_norm 0.8948 (1.4783) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:08:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][230/625] eta 0:02:17 lr 0.001111 wd 0.0500 time 0.3441 (0.3477) data time 0.0008 (0.0030) model time 0.3432 (0.3454) loss 2.3318 (3.3093) grad_norm 1.6636 (1.4727) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:08:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][240/625] eta 0:02:13 lr 0.001111 wd 0.0500 time 0.3505 (0.3476) data time 0.0010 (0.0030) model time 0.3495 (0.3453) loss 3.7687 (3.3099) grad_norm 1.3769 (1.4840) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:08:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][250/625] eta 0:02:10 lr 0.001111 wd 
0.0500 time 0.3411 (0.3473) data time 0.0008 (0.0029) model time 0.3403 (0.3450) loss 3.8794 (3.3068) grad_norm 1.5732 (1.4812) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:08:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][260/625] eta 0:02:06 lr 0.001111 wd 0.0500 time 0.3434 (0.3471) data time 0.0008 (0.0028) model time 0.3426 (0.3448) loss 3.8028 (3.3043) grad_norm 2.5067 (1.4957) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:08:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][270/625] eta 0:02:03 lr 0.001111 wd 0.0500 time 0.3450 (0.3469) data time 0.0011 (0.0028) model time 0.3438 (0.3447) loss 3.0673 (3.3053) grad_norm 1.2898 (1.4911) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:09:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][280/625] eta 0:01:59 lr 0.001111 wd 0.0500 time 0.3463 (0.3468) data time 0.0008 (0.0027) model time 0.3455 (0.3446) loss 3.0635 (3.3038) grad_norm 1.4351 (1.4849) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:09:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][290/625] eta 0:01:56 lr 0.001111 wd 0.0500 time 0.3341 (0.3474) data time 0.0011 (0.0026) model time 0.3330 (0.3453) loss 3.3886 (3.3097) grad_norm 1.9535 (1.4825) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:09:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][300/625] eta 0:01:53 lr 0.001111 wd 0.0500 time 0.5238 (0.3498) data time 0.0012 (0.0026) model time 0.5226 (0.3482) loss 3.5656 (3.3182) grad_norm 2.8895 (1.4895) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:09:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][310/625] eta 0:01:50 lr 0.001111 wd 0.0500 time 0.3383 (0.3501) data time 0.0017 (0.0025) model time 0.3366 (0.3486) loss 3.5381 (3.3198) grad_norm 1.2123 (1.4992) loss_scale 4096.0000 
(4096.0000) mem 14261MB [2024-07-24 16:09:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][320/625] eta 0:01:46 lr 0.001111 wd 0.0500 time 0.3385 (0.3498) data time 0.0010 (0.0025) model time 0.3375 (0.3483) loss 3.0986 (3.3087) grad_norm 1.4079 (1.4967) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:09:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][330/625] eta 0:01:43 lr 0.001111 wd 0.0500 time 0.3457 (0.3496) data time 0.0010 (0.0025) model time 0.3447 (0.3480) loss 3.9688 (3.3088) grad_norm 1.2394 (1.4895) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:09:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][340/625] eta 0:01:39 lr 0.001111 wd 0.0500 time 0.3392 (0.3494) data time 0.0008 (0.0024) model time 0.3384 (0.3479) loss 3.5148 (3.3025) grad_norm 1.5267 (1.4797) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:09:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][350/625] eta 0:01:36 lr 0.001111 wd 0.0500 time 0.3470 (0.3493) data time 0.0012 (0.0024) model time 0.3458 (0.3477) loss 3.6837 (3.3039) grad_norm 1.8332 (1.4825) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:09:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][360/625] eta 0:01:32 lr 0.001110 wd 0.0500 time 0.3315 (0.3491) data time 0.0008 (0.0024) model time 0.3307 (0.3475) loss 4.0359 (3.3018) grad_norm 1.3459 (1.4855) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:09:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][370/625] eta 0:01:28 lr 0.001110 wd 0.0500 time 0.3396 (0.3489) data time 0.0008 (0.0023) model time 0.3389 (0.3473) loss 2.6251 (3.2994) grad_norm 1.1039 (1.4801) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:09:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][380/625] eta 0:01:25 lr 
0.001110 wd 0.0500 time 0.3416 (0.3487) data time 0.0008 (0.0023) model time 0.3408 (0.3471) loss 3.1905 (3.3044) grad_norm 4.3370 (1.4917) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:09:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][390/625] eta 0:01:21 lr 0.001110 wd 0.0500 time 0.3605 (0.3486) data time 0.0009 (0.0022) model time 0.3595 (0.3470) loss 2.4399 (3.3039) grad_norm 1.4015 (1.4857) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:09:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][400/625] eta 0:01:18 lr 0.001110 wd 0.0500 time 0.3560 (0.3484) data time 0.0008 (0.0022) model time 0.3552 (0.3468) loss 3.8950 (3.3008) grad_norm 1.0713 (1.4849) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:09:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][410/625] eta 0:01:14 lr 0.001110 wd 0.0500 time 0.3499 (0.3482) data time 0.0016 (0.0022) model time 0.3483 (0.3466) loss 2.3741 (3.3008) grad_norm 1.5550 (1.4858) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:09:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][420/625] eta 0:01:11 lr 0.001110 wd 0.0500 time 0.3417 (0.3481) data time 0.0007 (0.0022) model time 0.3410 (0.3465) loss 4.1869 (3.3072) grad_norm 1.6708 (1.4831) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:09:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][430/625] eta 0:01:07 lr 0.001110 wd 0.0500 time 0.3614 (0.3481) data time 0.0010 (0.0022) model time 0.3604 (0.3465) loss 3.7168 (3.3114) grad_norm 1.6471 (1.4810) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:09:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][440/625] eta 0:01:04 lr 0.001110 wd 0.0500 time 0.3369 (0.3480) data time 0.0008 (0.0021) model time 0.3361 (0.3463) loss 3.7212 (3.3102) grad_norm 1.7306 (1.4781) loss_scale 
4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:10:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][450/625] eta 0:01:00 lr 0.001110 wd 0.0500 time 0.3430 (0.3481) data time 0.0010 (0.0021) model time 0.3421 (0.3465) loss 3.7753 (3.3074) grad_norm 1.5167 (1.4815) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:10:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][460/625] eta 0:00:57 lr 0.001110 wd 0.0500 time 0.3483 (0.3480) data time 0.0011 (0.0021) model time 0.3473 (0.3464) loss 3.5693 (3.3111) grad_norm 1.4063 (1.4812) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:10:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][470/625] eta 0:00:53 lr 0.001110 wd 0.0500 time 0.3345 (0.3480) data time 0.0009 (0.0021) model time 0.3336 (0.3464) loss 3.8140 (3.3121) grad_norm 1.2684 (1.4754) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:10:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][480/625] eta 0:00:50 lr 0.001110 wd 0.0500 time 0.3558 (0.3480) data time 0.0008 (0.0021) model time 0.3550 (0.3464) loss 3.1104 (3.3079) grad_norm 2.3778 (1.4803) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:10:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][490/625] eta 0:00:46 lr 0.001110 wd 0.0500 time 0.3477 (0.3480) data time 0.0011 (0.0020) model time 0.3465 (0.3464) loss 3.4812 (3.3088) grad_norm 1.6313 (1.4804) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:10:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][500/625] eta 0:00:43 lr 0.001110 wd 0.0500 time 0.3423 (0.3480) data time 0.0008 (0.0020) model time 0.3415 (0.3464) loss 4.4462 (3.3094) grad_norm 1.5108 (1.4798) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:10:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][510/625] eta 
0:00:40 lr 0.001110 wd 0.0500 time 0.5239 (0.3482) data time 0.0011 (0.0020) model time 0.5228 (0.3467) loss 2.4383 (3.3122) grad_norm 1.5554 (1.4840) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:10:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][520/625] eta 0:00:36 lr 0.001110 wd 0.0500 time 0.5605 (0.3495) data time 0.0008 (0.0020) model time 0.5597 (0.3482) loss 2.4654 (3.3063) grad_norm 1.2432 (1.4910) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:10:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][530/625] eta 0:00:33 lr 0.001109 wd 0.0500 time 0.3495 (0.3499) data time 0.0011 (0.0020) model time 0.3484 (0.3485) loss 3.2693 (3.3090) grad_norm 1.9407 (1.4913) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:10:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][540/625] eta 0:00:29 lr 0.001109 wd 0.0500 time 0.3381 (0.3497) data time 0.0010 (0.0020) model time 0.3371 (0.3483) loss 3.5188 (3.3081) grad_norm 1.1689 (1.4929) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:10:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][550/625] eta 0:00:26 lr 0.001109 wd 0.0500 time 0.3305 (0.3495) data time 0.0008 (0.0019) model time 0.3297 (0.3482) loss 2.2395 (3.3072) grad_norm 1.3830 (1.4914) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:10:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][560/625] eta 0:00:22 lr 0.001109 wd 0.0500 time 0.3418 (0.3494) data time 0.0010 (0.0019) model time 0.3408 (0.3480) loss 4.0447 (3.3087) grad_norm 1.7518 (1.4940) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:10:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][570/625] eta 0:00:19 lr 0.001109 wd 0.0500 time 0.3407 (0.3493) data time 0.0008 (0.0019) model time 0.3399 (0.3479) loss 3.8170 (3.3146) grad_norm 1.9624 (1.4949) 
loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:10:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][580/625] eta 0:00:15 lr 0.001109 wd 0.0500 time 0.3483 (0.3492) data time 0.0008 (0.0019) model time 0.3475 (0.3478) loss 2.8427 (3.3138) grad_norm 1.7545 (1.4927) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:10:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][590/625] eta 0:00:12 lr 0.001109 wd 0.0500 time 0.3443 (0.3491) data time 0.0010 (0.0019) model time 0.3433 (0.3477) loss 3.6517 (3.3108) grad_norm 1.1678 (1.4886) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:10:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][600/625] eta 0:00:08 lr 0.001109 wd 0.0500 time 0.3464 (0.3491) data time 0.0010 (0.0019) model time 0.3454 (0.3477) loss 3.6675 (3.3078) grad_norm 1.6605 (1.4872) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:10:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][610/625] eta 0:00:05 lr 0.001109 wd 0.0500 time 0.3274 (0.3489) data time 0.0005 (0.0019) model time 0.3269 (0.3475) loss 2.6799 (3.3090) grad_norm 1.5647 (1.4845) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:11:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][620/625] eta 0:00:01 lr 0.001109 wd 0.0500 time 0.3310 (0.3486) data time 0.0008 (0.0018) model time 0.3303 (0.3472) loss 3.8797 (3.3103) grad_norm 1.1341 (1.4825) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:11:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 69 training takes 0:03:37 [2024-07-24 16:11:02 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-24 16:11:03 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 16:11:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.488 (0.488) Loss 0.7129 (0.7129) Acc@1 86.182 (86.182) Acc@5 97.656 (97.656) Mem 14261MB [2024-07-24 16:11:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.087 (0.129) Loss 1.1592 (0.8771) Acc@1 74.707 (81.916) Acc@5 92.969 (96.258) Mem 14261MB [2024-07-24 16:11:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.087 (0.109) Loss 1.3057 (1.0374) Acc@1 69.922 (78.092) Acc@5 92.041 (94.343) Mem 14261MB [2024-07-24 16:11:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.801 Acc@5 94.250 [2024-07-24 16:11:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.8% [2024-07-24 16:11:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.869 (0.869) Loss 0.6836 (0.6836) Acc@1 86.377 (86.377) Acc@5 97.656 (97.656) Mem 14261MB [2024-07-24 16:11:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.087 (0.167) Loss 1.0889 (0.8334) Acc@1 75.732 (82.746) Acc@5 94.043 (96.595) Mem 14261MB [2024-07-24 16:11:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.129) Loss 1.2725 (0.9927) Acc@1 70.459 (78.692) Acc@5 91.553 (94.608) Mem 14261MB [2024-07-24 16:11:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.395 Acc@5 94.598 [2024-07-24 16:11:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.4% [2024-07-24 16:11:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.39% [2024-07-24 16:11:08 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 16:11:09 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 16:11:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][0/625] eta 0:07:39 lr 0.001109 wd 0.0500 time 0.7352 (0.7352) data time 0.4026 (0.4026) model time 0.0000 (0.0000) loss 3.1274 (3.1274) grad_norm 1.0825 (1.0825) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:11:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][10/625] eta 0:03:53 lr 0.001109 wd 0.0500 time 0.3630 (0.3798) data time 0.0008 (0.0375) model time 0.0000 (0.0000) loss 3.0717 (3.1274) grad_norm 1.2749 (1.2887) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:11:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][20/625] eta 0:03:38 lr 0.001109 wd 0.0500 time 0.3473 (0.3613) data time 0.0010 (0.0202) model time 0.0000 (0.0000) loss 3.5160 (3.2423) grad_norm 1.1817 (1.3318) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:11:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][30/625] eta 0:03:31 lr 0.001109 wd 0.0500 time 0.3396 (0.3558) data time 0.0008 (0.0140) model time 0.0000 (0.0000) loss 2.8293 (3.2634) grad_norm 1.1829 (1.3719) loss_scale 4096.0000 (4096.0000) mem 14261MB [2024-07-24 16:11:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][40/625] eta 0:03:29 lr 0.001109 wd 0.0500 time 0.3450 (0.3575) data time 0.0010 (0.0109) model time 0.0000 (0.0000) loss 2.9466 (3.2742) grad_norm 1.0892 (1.3959) loss_scale 8192.0000 (4495.6098) mem 14261MB [2024-07-24 16:11:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][50/625] eta 0:03:23 lr 0.001109 wd 0.0500 time 0.3480 (0.3542) data time 0.0008 (0.0089) model 
time 0.0000 (0.0000) loss 2.3563 (3.2527) grad_norm 2.4389 (1.3754) loss_scale 8192.0000 (5220.3922) mem 14261MB [2024-07-24 16:11:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][60/625] eta 0:03:18 lr 0.001109 wd 0.0500 time 0.3477 (0.3521) data time 0.0009 (0.0076) model time 0.3469 (0.3405) loss 2.9915 (3.2416) grad_norm 1.1178 (1.3921) loss_scale 8192.0000 (5707.5410) mem 14261MB [2024-07-24 16:11:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][70/625] eta 0:03:14 lr 0.001109 wd 0.0500 time 0.3597 (0.3510) data time 0.0010 (0.0067) model time 0.3587 (0.3418) loss 3.3439 (3.2742) grad_norm 1.4625 (1.3990) loss_scale 8192.0000 (6057.4648) mem 14261MB [2024-07-24 16:11:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][80/625] eta 0:03:10 lr 0.001108 wd 0.0500 time 0.3415 (0.3495) data time 0.0008 (0.0060) model time 0.3407 (0.3406) loss 2.6596 (3.2662) grad_norm 1.3715 (1.4084) loss_scale 8192.0000 (6320.9877) mem 14261MB [2024-07-24 16:11:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][90/625] eta 0:03:06 lr 0.001108 wd 0.0500 time 0.3320 (0.3489) data time 0.0009 (0.0055) model time 0.3311 (0.3411) loss 4.1277 (3.2880) grad_norm 1.2437 (1.4158) loss_scale 8192.0000 (6526.5934) mem 14261MB [2024-07-24 16:11:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][100/625] eta 0:03:02 lr 0.001108 wd 0.0500 time 0.3425 (0.3484) data time 0.0011 (0.0050) model time 0.3415 (0.3414) loss 2.6478 (3.3098) grad_norm 1.4917 (1.4073) loss_scale 8192.0000 (6691.4851) mem 14261MB [2024-07-24 16:11:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][110/625] eta 0:03:00 lr 0.001108 wd 0.0500 time 0.5587 (0.3500) data time 0.0008 (0.0047) model time 0.5579 (0.3455) loss 2.7450 (3.3211) grad_norm 1.2721 (1.4006) loss_scale 8192.0000 (6826.6667) mem 14261MB [2024-07-24 16:11:52 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][120/625] eta 0:02:59 lr 0.001108 wd 0.0500 time 0.3467 (0.3555) data time 0.0010 (0.0043) model time 0.3457 (0.3554) loss 3.7329 (3.3014) grad_norm 1.1755 (1.4036) loss_scale 8192.0000 (6939.5041) mem 14261MB [2024-07-24 16:11:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][130/625] eta 0:02:56 lr 0.001108 wd 0.0500 time 0.3381 (0.3559) data time 0.0011 (0.0041) model time 0.3370 (0.3559) loss 3.7279 (3.3061) grad_norm 1.5410 (1.4108) loss_scale 8192.0000 (7035.1145) mem 14261MB [2024-07-24 16:11:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][140/625] eta 0:02:52 lr 0.001108 wd 0.0500 time 0.3425 (0.3550) data time 0.0008 (0.0039) model time 0.3417 (0.3544) loss 3.1441 (3.3032) grad_norm 1.3560 (1.4112) loss_scale 8192.0000 (7117.1631) mem 14261MB [2024-07-24 16:12:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][150/625] eta 0:02:48 lr 0.001108 wd 0.0500 time 0.3410 (0.3542) data time 0.0008 (0.0037) model time 0.3403 (0.3532) loss 3.4395 (3.3006) grad_norm 1.3086 (1.4308) loss_scale 8192.0000 (7188.3444) mem 14261MB [2024-07-24 16:12:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][160/625] eta 0:02:44 lr 0.001108 wd 0.0500 time 0.3394 (0.3535) data time 0.0010 (0.0035) model time 0.3384 (0.3522) loss 2.7029 (3.3092) grad_norm 1.3166 (1.4268) loss_scale 8192.0000 (7250.6832) mem 14261MB [2024-07-24 16:12:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][170/625] eta 0:02:40 lr 0.001108 wd 0.0500 time 0.3391 (0.3528) data time 0.0012 (0.0034) model time 0.3379 (0.3512) loss 4.0784 (3.3036) grad_norm 1.4516 (1.4293) loss_scale 8192.0000 (7305.7310) mem 14261MB [2024-07-24 16:12:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][180/625] eta 0:02:36 lr 0.001108 wd 0.0500 time 0.3653 (0.3525) data time 
0.0011 (0.0033) model time 0.3642 (0.3507) loss 3.4912 (3.3164) grad_norm 1.2851 (1.4209) loss_scale 8192.0000 (7354.6961) mem 14261MB [2024-07-24 16:12:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][190/625] eta 0:02:33 lr 0.001108 wd 0.0500 time 0.3379 (0.3520) data time 0.0010 (0.0032) model time 0.3368 (0.3502) loss 3.6600 (3.3220) grad_norm 1.2784 (1.4197) loss_scale 8192.0000 (7398.5340) mem 14261MB [2024-07-24 16:12:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][200/625] eta 0:02:29 lr 0.001108 wd 0.0500 time 0.3399 (0.3515) data time 0.0008 (0.0031) model time 0.3391 (0.3495) loss 3.4883 (3.3182) grad_norm 1.4280 (1.4061) loss_scale 8192.0000 (7438.0100) mem 14261MB [2024-07-24 16:12:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][210/625] eta 0:02:25 lr 0.001108 wd 0.0500 time 0.3442 (0.3511) data time 0.0009 (0.0030) model time 0.3433 (0.3490) loss 4.3331 (3.3288) grad_norm 1.3503 (1.3983) loss_scale 8192.0000 (7473.7441) mem 14261MB [2024-07-24 16:12:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][220/625] eta 0:02:22 lr 0.001108 wd 0.0500 time 0.3430 (0.3507) data time 0.0010 (0.0029) model time 0.3420 (0.3486) loss 3.4322 (3.3299) grad_norm 1.6981 (1.4086) loss_scale 8192.0000 (7506.2443) mem 14261MB [2024-07-24 16:12:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][230/625] eta 0:02:18 lr 0.001108 wd 0.0500 time 0.3503 (0.3504) data time 0.0008 (0.0028) model time 0.3496 (0.3483) loss 2.3376 (3.3286) grad_norm 1.6717 (1.4069) loss_scale 8192.0000 (7535.9307) mem 14261MB [2024-07-24 16:12:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][240/625] eta 0:02:14 lr 0.001108 wd 0.0500 time 0.3382 (0.3501) data time 0.0008 (0.0027) model time 0.3373 (0.3480) loss 2.2838 (3.3240) grad_norm 2.1618 (1.4166) loss_scale 8192.0000 (7563.1535) mem 14261MB [2024-07-24 
16:12:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][250/625] eta 0:02:11 lr 0.001108 wd 0.0500 time 0.3448 (0.3498) data time 0.0011 (0.0026) model time 0.3437 (0.3477) loss 3.2582 (3.3233) grad_norm 1.2472 (1.4203) loss_scale 8192.0000 (7588.2072) mem 14261MB [2024-07-24 16:12:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][260/625] eta 0:02:07 lr 0.001107 wd 0.0500 time 0.5764 (0.3505) data time 0.0008 (0.0026) model time 0.5757 (0.3486) loss 3.6041 (3.3257) grad_norm 1.4814 (1.4282) loss_scale 8192.0000 (7611.3410) mem 14261MB [2024-07-24 16:12:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][270/625] eta 0:02:04 lr 0.001107 wd 0.0500 time 0.3292 (0.3503) data time 0.0009 (0.0025) model time 0.3282 (0.3483) loss 3.8031 (3.3259) grad_norm 1.5854 (1.4282) loss_scale 8192.0000 (7632.7675) mem 14261MB [2024-07-24 16:12:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][280/625] eta 0:02:00 lr 0.001107 wd 0.0500 time 0.3559 (0.3504) data time 0.0010 (0.0025) model time 0.3549 (0.3485) loss 3.1009 (3.3254) grad_norm 1.4113 (1.4254) loss_scale 8192.0000 (7652.6690) mem 14261MB [2024-07-24 16:12:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][290/625] eta 0:01:57 lr 0.001107 wd 0.0500 time 0.3375 (0.3502) data time 0.0009 (0.0024) model time 0.3366 (0.3483) loss 3.7731 (3.3308) grad_norm 1.3311 (1.4233) loss_scale 8192.0000 (7671.2027) mem 14261MB [2024-07-24 16:12:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][300/625] eta 0:01:53 lr 0.001107 wd 0.0500 time 0.3458 (0.3499) data time 0.0008 (0.0024) model time 0.3451 (0.3480) loss 2.1005 (3.3293) grad_norm 1.7069 (1.4144) loss_scale 8192.0000 (7688.5050) mem 14261MB [2024-07-24 16:12:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][310/625] eta 0:01:50 lr 0.001107 wd 0.0500 time 0.3334 (0.3497) 
data time 0.0011 (0.0023) model time 0.3322 (0.3478) loss 3.4019 (3.3330) grad_norm 1.1279 (1.4083) loss_scale 8192.0000 (7704.6945) mem 14261MB [2024-07-24 16:13:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][320/625] eta 0:01:46 lr 0.001107 wd 0.0500 time 0.3456 (0.3495) data time 0.0010 (0.0023) model time 0.3446 (0.3476) loss 3.2645 (3.3373) grad_norm 1.1180 (1.4045) loss_scale 8192.0000 (7719.8754) mem 14261MB [2024-07-24 16:13:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][330/625] eta 0:01:43 lr 0.001107 wd 0.0500 time 0.3415 (0.3493) data time 0.0008 (0.0023) model time 0.3407 (0.3474) loss 3.4322 (3.3445) grad_norm 1.7611 (1.4040) loss_scale 8192.0000 (7734.1390) mem 14261MB [2024-07-24 16:13:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][340/625] eta 0:01:40 lr 0.001107 wd 0.0500 time 0.3589 (0.3512) data time 0.0011 (0.0022) model time 0.3578 (0.3496) loss 3.6136 (3.3418) grad_norm 1.3854 (1.4026) loss_scale 8192.0000 (7747.5660) mem 14261MB [2024-07-24 16:13:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][350/625] eta 0:01:36 lr 0.001107 wd 0.0500 time 0.3382 (0.3516) data time 0.0008 (0.0022) model time 0.3374 (0.3501) loss 4.1713 (3.3434) grad_norm 1.1947 (1.4004) loss_scale 8192.0000 (7760.2279) mem 14261MB [2024-07-24 16:13:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][360/625] eta 0:01:33 lr 0.001107 wd 0.0500 time 0.3923 (0.3515) data time 0.0010 (0.0022) model time 0.3913 (0.3500) loss 3.8425 (3.3396) grad_norm 1.0313 (1.4025) loss_scale 8192.0000 (7772.1884) mem 14261MB [2024-07-24 16:13:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][370/625] eta 0:01:29 lr 0.001107 wd 0.0500 time 0.3457 (0.3513) data time 0.0010 (0.0022) model time 0.3448 (0.3498) loss 3.2689 (3.3364) grad_norm 1.3090 (1.3987) loss_scale 8192.0000 (7783.5040) mem 14261MB 
[2024-07-24 16:13:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][380/625] eta 0:01:26 lr 0.001107 wd 0.0500 time 0.3429 (0.3511) data time 0.0009 (0.0021) model time 0.3420 (0.3495) loss 2.4581 (3.3359) grad_norm 1.3658 (1.3968) loss_scale 8192.0000 (7794.2257) mem 14261MB [2024-07-24 16:13:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][390/625] eta 0:01:22 lr 0.001107 wd 0.0500 time 0.3464 (0.3509) data time 0.0010 (0.0021) model time 0.3454 (0.3492) loss 3.1838 (3.3351) grad_norm 1.5110 (1.3967) loss_scale 8192.0000 (7804.3990) mem 14261MB [2024-07-24 16:13:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][400/625] eta 0:01:18 lr 0.001107 wd 0.0500 time 0.3369 (0.3507) data time 0.0008 (0.0021) model time 0.3361 (0.3490) loss 4.1116 (3.3305) grad_norm 1.6391 (1.3989) loss_scale 8192.0000 (7814.0648) mem 14261MB [2024-07-24 16:13:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][410/625] eta 0:01:15 lr 0.001107 wd 0.0500 time 0.3396 (0.3506) data time 0.0011 (0.0021) model time 0.3385 (0.3489) loss 3.3160 (3.3286) grad_norm 1.3172 (1.3980) loss_scale 8192.0000 (7823.2603) mem 14261MB [2024-07-24 16:13:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][420/625] eta 0:01:11 lr 0.001107 wd 0.0500 time 0.3381 (0.3504) data time 0.0008 (0.0021) model time 0.3373 (0.3487) loss 2.4189 (3.3308) grad_norm 1.4336 (1.4002) loss_scale 8192.0000 (7832.0190) mem 14261MB [2024-07-24 16:13:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][430/625] eta 0:01:08 lr 0.001106 wd 0.0500 time 0.3360 (0.3502) data time 0.0008 (0.0020) model time 0.3352 (0.3485) loss 3.2486 (3.3333) grad_norm 1.4014 (1.4003) loss_scale 8192.0000 (7840.3712) mem 14261MB [2024-07-24 16:13:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][440/625] eta 0:01:04 lr 0.001106 wd 0.0500 time 
0.3368 (0.3504) data time 0.0010 (0.0020) model time 0.3358 (0.3488) loss 3.6770 (3.3350) grad_norm 1.2789 (1.4016) loss_scale 8192.0000 (7848.3447) mem 14261MB [2024-07-24 16:13:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][450/625] eta 0:01:01 lr 0.001106 wd 0.0500 time 0.3386 (0.3503) data time 0.0009 (0.0020) model time 0.3376 (0.3487) loss 3.7082 (3.3300) grad_norm 1.1171 (1.4048) loss_scale 8192.0000 (7855.9645) mem 14261MB [2024-07-24 16:13:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][460/625] eta 0:00:57 lr 0.001106 wd 0.0500 time 0.3397 (0.3503) data time 0.0011 (0.0020) model time 0.3386 (0.3487) loss 3.3725 (3.3302) grad_norm 1.2977 (1.4114) loss_scale 8192.0000 (7863.2538) mem 14261MB [2024-07-24 16:13:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][470/625] eta 0:00:54 lr 0.001106 wd 0.0500 time 0.3400 (0.3501) data time 0.0008 (0.0020) model time 0.3392 (0.3485) loss 2.1396 (3.3290) grad_norm 1.2205 (1.4105) loss_scale 8192.0000 (7870.2335) mem 14261MB [2024-07-24 16:13:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][480/625] eta 0:00:50 lr 0.001106 wd 0.0500 time 0.3208 (0.3502) data time 0.0008 (0.0019) model time 0.3199 (0.3487) loss 2.5610 (3.3258) grad_norm 2.0770 (1.4151) loss_scale 8192.0000 (7876.9231) mem 14261MB [2024-07-24 16:14:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][490/625] eta 0:00:47 lr 0.001106 wd 0.0500 time 0.3414 (0.3500) data time 0.0010 (0.0019) model time 0.3403 (0.3484) loss 3.0571 (3.3169) grad_norm 1.6662 (1.4143) loss_scale 8192.0000 (7883.3401) mem 14261MB [2024-07-24 16:14:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][500/625] eta 0:00:43 lr 0.001106 wd 0.0500 time 0.3455 (0.3500) data time 0.0010 (0.0019) model time 0.3445 (0.3484) loss 3.0715 (3.3123) grad_norm 0.9793 (1.4096) loss_scale 8192.0000 (7889.5010) 
mem 14261MB [2024-07-24 16:14:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][510/625] eta 0:00:40 lr 0.001106 wd 0.0500 time 0.3417 (0.3499) data time 0.0010 (0.0019) model time 0.3407 (0.3483) loss 3.3736 (3.3094) grad_norm 1.2104 (1.4084) loss_scale 8192.0000 (7895.4207) mem 14261MB [2024-07-24 16:14:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][520/625] eta 0:00:36 lr 0.001106 wd 0.0500 time 0.3441 (0.3498) data time 0.0008 (0.0019) model time 0.3433 (0.3482) loss 2.6910 (3.3065) grad_norm 1.2440 (1.4086) loss_scale 8192.0000 (7901.1132) mem 14261MB [2024-07-24 16:14:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][530/625] eta 0:00:33 lr 0.001106 wd 0.0500 time 0.3370 (0.3499) data time 0.0008 (0.0019) model time 0.3362 (0.3483) loss 3.2757 (3.3063) grad_norm 1.5864 (1.4112) loss_scale 8192.0000 (7906.5913) mem 14261MB [2024-07-24 16:14:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][540/625] eta 0:00:29 lr 0.001106 wd 0.0500 time 0.3479 (0.3498) data time 0.0010 (0.0019) model time 0.3470 (0.3482) loss 3.3811 (3.3065) grad_norm 0.9482 (1.4068) loss_scale 8192.0000 (7911.8669) mem 14261MB [2024-07-24 16:14:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][550/625] eta 0:00:26 lr 0.001106 wd 0.0500 time 0.3612 (0.3498) data time 0.0008 (0.0019) model time 0.3604 (0.3483) loss 2.8124 (3.3053) grad_norm 1.2105 (1.4056) loss_scale 8192.0000 (7916.9510) mem 14261MB [2024-07-24 16:14:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][560/625] eta 0:00:22 lr 0.001106 wd 0.0500 time 0.3389 (0.3507) data time 0.0008 (0.0018) model time 0.3381 (0.3492) loss 4.1392 (3.3112) grad_norm 1.5087 (1.4094) loss_scale 8192.0000 (7921.8538) mem 14261MB [2024-07-24 16:14:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][570/625] eta 0:00:19 lr 0.001106 wd 
0.0500 time 0.3416 (0.3511) data time 0.0008 (0.0018) model time 0.3407 (0.3497) loss 4.0364 (3.3134) grad_norm 1.2643 (1.4083) loss_scale 8192.0000 (7926.5849) mem 14261MB [2024-07-24 16:14:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][580/625] eta 0:00:15 lr 0.001106 wd 0.0500 time 0.3426 (0.3510) data time 0.0011 (0.0018) model time 0.3416 (0.3495) loss 3.7445 (3.3126) grad_norm 2.3506 (1.4127) loss_scale 8192.0000 (7931.1532) mem 14261MB [2024-07-24 16:14:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][590/625] eta 0:00:12 lr 0.001106 wd 0.0500 time 0.3361 (0.3509) data time 0.0011 (0.0018) model time 0.3350 (0.3494) loss 3.8399 (3.3104) grad_norm 1.6243 (1.4142) loss_scale 8192.0000 (7935.5668) mem 14261MB [2024-07-24 16:14:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][600/625] eta 0:00:08 lr 0.001106 wd 0.0500 time 0.3440 (0.3507) data time 0.0010 (0.0018) model time 0.3430 (0.3493) loss 3.4936 (3.3120) grad_norm 1.8180 (1.4115) loss_scale 8192.0000 (7939.8336) mem 14261MB [2024-07-24 16:14:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][610/625] eta 0:00:05 lr 0.001105 wd 0.0500 time 0.3298 (0.3505) data time 0.0005 (0.0018) model time 0.3292 (0.3490) loss 3.1540 (3.3195) grad_norm 1.2413 (1.4116) loss_scale 8192.0000 (7943.9607) mem 14261MB [2024-07-24 16:14:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][620/625] eta 0:00:01 lr 0.001105 wd 0.0500 time 0.3270 (0.3502) data time 0.0005 (0.0018) model time 0.3264 (0.3487) loss 3.7550 (3.3187) grad_norm 1.3938 (1.4132) loss_scale 8192.0000 (7947.9549) mem 14261MB [2024-07-24 16:14:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 70 training takes 0:03:38 [2024-07-24 16:14:48 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-24 16:14:49 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!!
[2024-07-24 16:14:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.506 (0.506) Loss 0.7358 (0.7358) Acc@1 85.938 (85.938) Acc@5 97.266 (97.266) Mem 14261MB
[2024-07-24 16:14:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.087 (0.130) Loss 1.1885 (0.8863) Acc@1 74.707 (82.369) Acc@5 93.457 (96.480) Mem 14261MB
[2024-07-24 16:14:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.109) Loss 1.2656 (1.0479) Acc@1 72.119 (78.376) Acc@5 92.236 (94.568) Mem 14261MB
[2024-07-24 16:14:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.109 Acc@5 94.566
[2024-07-24 16:14:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.1%
[2024-07-24 16:14:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.11%
[2024-07-24 16:14:52 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving......
[2024-07-24 16:14:53 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!!
[2024-07-24 16:14:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.499 (0.499) Loss 0.6807 (0.6807) Acc@1 86.426 (86.426) Acc@5 97.852 (97.852) Mem 14261MB
[2024-07-24 16:14:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.087 (0.128) Loss 1.0830 (0.8296) Acc@1 75.879 (82.910) Acc@5 94.238 (96.635) Mem 14261MB
[2024-07-24 16:14:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.108) Loss 1.2627 (0.9874) Acc@1 70.654 (78.839) Acc@5 91.895 (94.692) Mem 14261MB
[2024-07-24 16:14:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.535 Acc@5 94.682
[2024-07-24 16:14:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.5%
[2024-07-24 16:14:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.53%
[2024-07-24 16:14:55 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-24 16:14:57 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!!
[2024-07-24 16:14:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][0/625] eta 0:07:49 lr 0.001105 wd 0.0500 time 0.7507 (0.7507) data time 0.4116 (0.4116) model time 0.0000 (0.0000) loss 3.2672 (3.2672) grad_norm 1.0503 (1.0503) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:15:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][10/625] eta 0:03:54 lr 0.001105 wd 0.0500 time 0.3559 (0.3811) data time 0.0010 (0.0383) model time 0.0000 (0.0000) loss 3.0824 (3.2245) grad_norm 2.2432 (1.2804) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:15:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][20/625] eta 0:03:40 lr 0.001105 wd 0.0500 time 0.3474 (0.3643) data time 0.0008 (0.0205) model time 0.0000 (0.0000) loss 2.3704 (3.2541) grad_norm 1.5027 (1.3569) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:15:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][30/625] eta 0:03:32 lr 0.001105 wd 0.0500 time 0.3575 (0.3571) data time 0.0008 (0.0142) model time 0.0000 (0.0000) loss 2.8433 (3.2365) grad_norm 1.0244 (1.3893) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:15:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][40/625] eta 0:03:27 lr 0.001105 wd 0.0500 time 0.3466 (0.3539) data time 0.0009 (0.0110) model time 0.0000 (0.0000) loss 3.4060 (3.2294) grad_norm 1.4299 (1.3819) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:15:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][50/625] eta 0:03:22 lr 0.001105 wd 0.0500 time 0.3265 (0.3515) data time 0.0007 (0.0091) model time 0.0000 (0.0000) loss 2.1964 (3.2257) grad_norm 1.1266 (1.3636) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:15:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][60/625] eta 0:03:17 lr 0.001105 wd 0.0500 time 0.3438 
(0.3499) data time 0.0011 (0.0078) model time 0.3426 (0.3412) loss 3.6564 (3.2777) grad_norm 1.0540 (1.3610) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:15:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][70/625] eta 0:03:13 lr 0.001105 wd 0.0500 time 0.3422 (0.3489) data time 0.0008 (0.0068) model time 0.3414 (0.3414) loss 2.1527 (3.2389) grad_norm 1.2796 (1.3642) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:15:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][80/625] eta 0:03:09 lr 0.001105 wd 0.0500 time 0.3378 (0.3480) data time 0.0010 (0.0061) model time 0.3368 (0.3411) loss 3.6915 (3.2451) grad_norm 1.6116 (1.3660) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:15:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][90/625] eta 0:03:05 lr 0.001105 wd 0.0500 time 0.3410 (0.3475) data time 0.0010 (0.0056) model time 0.3400 (0.3414) loss 3.2900 (3.2347) grad_norm 1.6062 (1.4055) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:15:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][100/625] eta 0:03:02 lr 0.001105 wd 0.0500 time 0.3411 (0.3470) data time 0.0010 (0.0051) model time 0.3401 (0.3414) loss 3.1144 (3.2209) grad_norm 1.1925 (1.4225) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:15:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][110/625] eta 0:02:58 lr 0.001105 wd 0.0500 time 0.3379 (0.3463) data time 0.0008 (0.0047) model time 0.3371 (0.3409) loss 2.0164 (3.2322) grad_norm 1.3805 (1.4364) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:15:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][120/625] eta 0:02:54 lr 0.001105 wd 0.0500 time 0.3368 (0.3459) data time 0.0009 (0.0044) model time 0.3359 (0.3408) loss 3.1330 (3.2340) grad_norm 1.3203 (1.4228) loss_scale 8192.0000 (8192.0000) mem 14261MB 
[2024-07-24 16:15:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][130/625] eta 0:02:51 lr 0.001105 wd 0.0500 time 0.3351 (0.3456) data time 0.0012 (0.0042) model time 0.3339 (0.3408) loss 3.9079 (3.2549) grad_norm 2.4741 (1.4384) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:15:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][140/625] eta 0:02:47 lr 0.001105 wd 0.0500 time 0.3570 (0.3454) data time 0.0012 (0.0039) model time 0.3558 (0.3410) loss 3.5318 (3.2694) grad_norm 1.1559 (1.4411) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:15:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][150/625] eta 0:02:44 lr 0.001105 wd 0.0500 time 0.3399 (0.3454) data time 0.0011 (0.0038) model time 0.3388 (0.3413) loss 3.2957 (3.2633) grad_norm 1.7782 (1.4546) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:15:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][160/625] eta 0:02:42 lr 0.001104 wd 0.0500 time 0.5451 (0.3502) data time 0.0010 (0.0036) model time 0.5440 (0.3485) loss 3.3154 (3.2671) grad_norm 1.3373 (1.4613) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:15:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][170/625] eta 0:02:39 lr 0.001104 wd 0.0500 time 0.3591 (0.3501) data time 0.0008 (0.0035) model time 0.3583 (0.3484) loss 2.7078 (3.2679) grad_norm 1.9110 (1.4539) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:16:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][180/625] eta 0:02:35 lr 0.001104 wd 0.0500 time 0.3404 (0.3497) data time 0.0008 (0.0034) model time 0.3397 (0.3479) loss 3.8032 (3.2904) grad_norm 1.0936 (1.4488) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:16:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][190/625] eta 0:02:31 lr 0.001104 wd 0.0500 time 
0.3495 (0.3492) data time 0.0011 (0.0033) model time 0.3484 (0.3473) loss 3.6554 (3.2825) grad_norm 1.4813 (1.4388) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:16:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][200/625] eta 0:02:28 lr 0.001104 wd 0.0500 time 0.3435 (0.3499) data time 0.0009 (0.0032) model time 0.3426 (0.3483) loss 3.7842 (3.2853) grad_norm 1.6747 (1.4418) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:16:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][210/625] eta 0:02:25 lr 0.001104 wd 0.0500 time 0.3446 (0.3497) data time 0.0011 (0.0031) model time 0.3435 (0.3480) loss 3.5656 (3.2937) grad_norm 1.0996 (1.4460) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:16:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][220/625] eta 0:02:21 lr 0.001104 wd 0.0500 time 0.3482 (0.3494) data time 0.0008 (0.0030) model time 0.3474 (0.3476) loss 3.3223 (3.2992) grad_norm 1.1400 (1.4451) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:16:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][230/625] eta 0:02:17 lr 0.001104 wd 0.0500 time 0.3294 (0.3491) data time 0.0010 (0.0029) model time 0.3284 (0.3473) loss 3.2492 (3.2943) grad_norm 1.3751 (1.4418) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:16:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][240/625] eta 0:02:14 lr 0.001104 wd 0.0500 time 0.3400 (0.3488) data time 0.0011 (0.0029) model time 0.3389 (0.3469) loss 3.3939 (3.3089) grad_norm 1.6075 (1.4499) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:16:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][250/625] eta 0:02:10 lr 0.001104 wd 0.0500 time 0.3327 (0.3484) data time 0.0009 (0.0028) model time 0.3318 (0.3464) loss 3.6458 (3.3100) grad_norm 1.5523 (1.4554) loss_scale 8192.0000 (8192.0000) 
mem 14261MB [2024-07-24 16:16:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][260/625] eta 0:02:07 lr 0.001104 wd 0.0500 time 0.3482 (0.3481) data time 0.0008 (0.0027) model time 0.3474 (0.3461) loss 3.9345 (3.3141) grad_norm 1.4968 (1.4534) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:16:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][270/625] eta 0:02:03 lr 0.001104 wd 0.0500 time 0.3364 (0.3479) data time 0.0011 (0.0026) model time 0.3353 (0.3459) loss 3.6553 (3.3139) grad_norm 1.4155 (1.4450) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:16:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][280/625] eta 0:01:59 lr 0.001104 wd 0.0500 time 0.3458 (0.3477) data time 0.0011 (0.0026) model time 0.3447 (0.3458) loss 3.5811 (3.3139) grad_norm 1.8101 (1.4522) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:16:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][290/625] eta 0:01:56 lr 0.001104 wd 0.0500 time 0.3327 (0.3476) data time 0.0010 (0.0025) model time 0.3317 (0.3457) loss 3.4567 (3.3148) grad_norm 1.4534 (1.4487) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:16:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][300/625] eta 0:01:52 lr 0.001104 wd 0.0500 time 0.3404 (0.3475) data time 0.0015 (0.0025) model time 0.3390 (0.3455) loss 3.4061 (3.3140) grad_norm 1.2781 (1.4470) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:16:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][310/625] eta 0:01:49 lr 0.001104 wd 0.0500 time 0.3421 (0.3473) data time 0.0011 (0.0024) model time 0.3410 (0.3454) loss 3.1311 (3.3171) grad_norm 1.1384 (1.4583) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:16:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][320/625] eta 0:01:45 lr 0.001104 wd 
0.0500 time 0.3315 (0.3472) data time 0.0011 (0.0024) model time 0.3304 (0.3453) loss 2.4234 (3.3044) grad_norm 1.3613 (1.4623) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:16:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][330/625] eta 0:01:42 lr 0.001103 wd 0.0500 time 0.3335 (0.3471) data time 0.0010 (0.0024) model time 0.3325 (0.3451) loss 3.1624 (3.3033) grad_norm 1.6479 (1.4553) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:16:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][340/625] eta 0:01:38 lr 0.001103 wd 0.0500 time 0.3347 (0.3469) data time 0.0010 (0.0023) model time 0.3336 (0.3450) loss 3.1326 (3.3043) grad_norm 1.2708 (1.4635) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:16:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][350/625] eta 0:01:35 lr 0.001103 wd 0.0500 time 0.3355 (0.3466) data time 0.0010 (0.0023) model time 0.3344 (0.3447) loss 3.7179 (3.3085) grad_norm 1.8238 (1.4661) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:17:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][360/625] eta 0:01:31 lr 0.001103 wd 0.0500 time 0.3431 (0.3469) data time 0.0010 (0.0022) model time 0.3421 (0.3450) loss 2.6347 (3.3016) grad_norm 1.3790 (1.4639) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:17:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][370/625] eta 0:01:28 lr 0.001103 wd 0.0500 time 0.3463 (0.3469) data time 0.0011 (0.0022) model time 0.3452 (0.3450) loss 3.8201 (3.2986) grad_norm 1.2520 (1.4592) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:17:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][380/625] eta 0:01:25 lr 0.001103 wd 0.0500 time 0.3426 (0.3482) data time 0.0011 (0.0022) model time 0.3414 (0.3466) loss 2.7673 (3.2982) grad_norm 1.1405 (1.4566) loss_scale 8192.0000 
(8192.0000) mem 14261MB [2024-07-24 16:17:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][390/625] eta 0:01:21 lr 0.001103 wd 0.0500 time 0.3385 (0.3488) data time 0.0011 (0.0022) model time 0.3374 (0.3473) loss 2.6960 (3.2926) grad_norm 1.3309 (1.4541) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:17:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][400/625] eta 0:01:18 lr 0.001103 wd 0.0500 time 0.3478 (0.3488) data time 0.0008 (0.0021) model time 0.3470 (0.3473) loss 3.8293 (3.2997) grad_norm 2.2391 (1.4568) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:17:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][410/625] eta 0:01:14 lr 0.001103 wd 0.0500 time 0.3380 (0.3487) data time 0.0012 (0.0021) model time 0.3368 (0.3472) loss 3.0751 (3.2990) grad_norm 1.3971 (1.4545) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:17:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][420/625] eta 0:01:11 lr 0.001103 wd 0.0500 time 0.5829 (0.3492) data time 0.0008 (0.0021) model time 0.5821 (0.3478) loss 2.5855 (3.2997) grad_norm 1.5631 (1.4565) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:17:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][430/625] eta 0:01:08 lr 0.001103 wd 0.0500 time 0.3634 (0.3491) data time 0.0009 (0.0021) model time 0.3625 (0.3476) loss 3.5942 (3.2957) grad_norm 1.8556 (1.4535) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:17:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][440/625] eta 0:01:04 lr 0.001103 wd 0.0500 time 0.3354 (0.3490) data time 0.0010 (0.0020) model time 0.3344 (0.3475) loss 3.1988 (3.2947) grad_norm 1.3033 (1.4500) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:17:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][450/625] eta 0:01:01 lr 
0.001103 wd 0.0500 time 0.3419 (0.3488) data time 0.0008 (0.0020) model time 0.3411 (0.3474) loss 3.2592 (3.2984) grad_norm 2.1380 (1.4523) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:17:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][460/625] eta 0:00:57 lr 0.001103 wd 0.0500 time 0.3394 (0.3487) data time 0.0008 (0.0020) model time 0.3387 (0.3473) loss 3.9928 (3.2994) grad_norm 1.1340 (1.4585) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:17:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][470/625] eta 0:00:54 lr 0.001103 wd 0.0500 time 0.3389 (0.3486) data time 0.0011 (0.0020) model time 0.3377 (0.3471) loss 2.7286 (3.3044) grad_norm 1.6547 (1.4562) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:17:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][480/625] eta 0:00:50 lr 0.001103 wd 0.0500 time 0.3559 (0.3485) data time 0.0008 (0.0019) model time 0.3551 (0.3471) loss 3.5362 (3.3049) grad_norm 1.1103 (1.4542) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:17:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][490/625] eta 0:00:47 lr 0.001103 wd 0.0500 time 0.3426 (0.3484) data time 0.0011 (0.0019) model time 0.3415 (0.3469) loss 3.1410 (3.3054) grad_norm 1.1364 (1.4515) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:17:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][500/625] eta 0:00:43 lr 0.001102 wd 0.0500 time 0.3382 (0.3482) data time 0.0009 (0.0019) model time 0.3374 (0.3468) loss 3.2593 (3.3031) grad_norm 1.0951 (1.4465) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:17:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][510/625] eta 0:00:40 lr 0.001102 wd 0.0500 time 0.3475 (0.3482) data time 0.0008 (0.0019) model time 0.3467 (0.3467) loss 3.7715 (3.3044) grad_norm 1.2635 (1.4426) loss_scale 
8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:17:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][520/625] eta 0:00:36 lr 0.001102 wd 0.0500 time 0.3706 (0.3482) data time 0.0011 (0.0019) model time 0.3694 (0.3467) loss 3.7878 (3.3040) grad_norm 1.5667 (1.4417) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:18:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][530/625] eta 0:00:33 lr 0.001102 wd 0.0500 time 0.3469 (0.3482) data time 0.0009 (0.0019) model time 0.3459 (0.3467) loss 3.2868 (3.3011) grad_norm 1.1947 (1.4491) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:18:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][540/625] eta 0:00:29 lr 0.001102 wd 0.0500 time 0.3350 (0.3481) data time 0.0009 (0.0018) model time 0.3341 (0.3467) loss 3.6617 (3.2983) grad_norm 1.6714 (1.4554) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:18:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][550/625] eta 0:00:26 lr 0.001102 wd 0.0500 time 0.3480 (0.3481) data time 0.0010 (0.0018) model time 0.3470 (0.3467) loss 3.0429 (3.2918) grad_norm 1.8156 (1.4557) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:18:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][560/625] eta 0:00:22 lr 0.001102 wd 0.0500 time 0.3395 (0.3481) data time 0.0008 (0.0018) model time 0.3387 (0.3467) loss 2.4833 (3.2884) grad_norm 1.5450 (1.4535) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:18:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][570/625] eta 0:00:19 lr 0.001102 wd 0.0500 time 0.3376 (0.3481) data time 0.0008 (0.0018) model time 0.3368 (0.3467) loss 2.4719 (3.2925) grad_norm 1.6429 (1.4576) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:18:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][580/625] eta 
0:00:15 lr 0.001102 wd 0.0500 time 0.3515 (0.3481) data time 0.0009 (0.0018) model time 0.3506 (0.3467) loss 4.2409 (3.2918) grad_norm 1.0485 (1.4547) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:18:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][590/625] eta 0:00:12 lr 0.001102 wd 0.0500 time 0.3417 (0.3481) data time 0.0008 (0.0018) model time 0.3409 (0.3466) loss 3.6207 (3.2944) grad_norm 1.2952 (1.4562) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:18:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][600/625] eta 0:00:08 lr 0.001102 wd 0.0500 time 0.3332 (0.3483) data time 0.0007 (0.0018) model time 0.3325 (0.3469) loss 2.4167 (3.2940) grad_norm 1.1830 (1.4583) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:18:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][610/625] eta 0:00:05 lr 0.001102 wd 0.0500 time 0.3331 (0.3485) data time 0.0008 (0.0018) model time 0.3323 (0.3472) loss 2.7317 (3.2933) grad_norm 1.7210 (1.4572) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:18:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][620/625] eta 0:00:01 lr 0.001102 wd 0.0500 time 0.3291 (0.3482) data time 0.0005 (0.0018) model time 0.3286 (0.3468) loss 2.8004 (3.2917) grad_norm 1.8724 (1.4563) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:18:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 71 training takes 0:03:37 [2024-07-24 16:18:34 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 16:18:35 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 16:18:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.493 (0.493) Loss 0.6943 (0.6943) Acc@1 85.986 (85.986) Acc@5 97.607 (97.607) Mem 14261MB [2024-07-24 16:18:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.126) Loss 1.1572 (0.8576) Acc@1 74.805 (82.129) Acc@5 93.066 (96.418) Mem 14261MB [2024-07-24 16:18:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.107) Loss 1.2803 (1.0267) Acc@1 70.947 (77.960) Acc@5 91.211 (94.443) Mem 14261MB [2024-07-24 16:18:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.671 Acc@5 94.394 [2024-07-24 16:18:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.7% [2024-07-24 16:18:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.846 (0.846) Loss 0.6777 (0.6777) Acc@1 86.621 (86.621) Acc@5 97.852 (97.852) Mem 14261MB [2024-07-24 16:18:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.163) Loss 1.0752 (0.8258) Acc@1 76.270 (83.110) Acc@5 94.238 (96.666) Mem 14261MB [2024-07-24 16:18:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.127) Loss 1.2529 (0.9819) Acc@1 71.045 (79.048) Acc@5 92.188 (94.759) Mem 14261MB [2024-07-24 16:18:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.745 Acc@5 94.750 [2024-07-24 16:18:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.7% [2024-07-24 16:18:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.74% [2024-07-24 16:18:41 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-24 16:18:42 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 16:18:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][0/625] eta 0:09:10 lr 0.001102 wd 0.0500 time 0.8806 (0.8806) data time 0.5569 (0.5569) model time 0.0000 (0.0000) loss 3.3901 (3.3901) grad_norm 2.3887 (2.3887) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:18:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][10/625] eta 0:04:01 lr 0.001102 wd 0.0500 time 0.3359 (0.3931) data time 0.0008 (0.0520) model time 0.0000 (0.0000) loss 3.9518 (3.5995) grad_norm 1.4340 (1.8053) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:18:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][20/625] eta 0:03:43 lr 0.001102 wd 0.0500 time 0.3427 (0.3697) data time 0.0011 (0.0277) model time 0.0000 (0.0000) loss 3.7359 (3.4484) grad_norm 1.2306 (1.6032) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:18:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][30/625] eta 0:03:35 lr 0.001102 wd 0.0500 time 0.3490 (0.3627) data time 0.0008 (0.0191) model time 0.0000 (0.0000) loss 3.3522 (3.3740) grad_norm 1.1535 (1.5230) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:18:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][40/625] eta 0:03:30 lr 0.001102 wd 0.0500 time 0.3638 (0.3591) data time 0.0008 (0.0147) model time 0.0000 (0.0000) loss 4.1554 (3.3382) grad_norm 1.2554 (1.4988) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:19:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][50/625] eta 0:03:25 lr 0.001101 wd 0.0500 time 0.3427 (0.3567) data time 0.0008 (0.0120) model time 0.0000 (0.0000) loss 3.1589 (3.3252) grad_norm 1.3575 (1.4823) loss_scale 8192.0000 (8192.0000) mem 
14261MB [2024-07-24 16:19:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][60/625] eta 0:03:20 lr 0.001101 wd 0.0500 time 0.3456 (0.3555) data time 0.0011 (0.0102) model time 0.3445 (0.3487) loss 3.4415 (3.3413) grad_norm 1.2900 (1.4342) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:19:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][70/625] eta 0:03:16 lr 0.001101 wd 0.0500 time 0.3704 (0.3545) data time 0.0008 (0.0091) model time 0.3696 (0.3470) loss 3.3812 (3.3420) grad_norm 1.4944 (1.4173) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:19:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][80/625] eta 0:03:12 lr 0.001101 wd 0.0500 time 0.3410 (0.3526) data time 0.0009 (0.0081) model time 0.3401 (0.3442) loss 3.4829 (3.3599) grad_norm 1.2815 (1.4109) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:19:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][90/625] eta 0:03:08 lr 0.001101 wd 0.0500 time 0.3408 (0.3515) data time 0.0010 (0.0074) model time 0.3397 (0.3434) loss 3.4885 (3.3364) grad_norm 1.8437 (1.4236) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:19:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][100/625] eta 0:03:04 lr 0.001101 wd 0.0500 time 0.3417 (0.3508) data time 0.0008 (0.0067) model time 0.3409 (0.3435) loss 3.7876 (3.3140) grad_norm 1.6157 (1.4460) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:19:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][110/625] eta 0:03:00 lr 0.001101 wd 0.0500 time 0.3404 (0.3505) data time 0.0009 (0.0062) model time 0.3396 (0.3439) loss 3.4030 (3.2898) grad_norm 1.2120 (1.4766) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:19:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][120/625] eta 0:02:56 lr 0.001101 wd 0.0500 time 
0.3381 (0.3499) data time 0.0007 (0.0058) model time 0.3374 (0.3438) loss 3.8514 (3.2863) grad_norm 2.0442 (1.4766) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:19:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][130/625] eta 0:02:52 lr 0.001101 wd 0.0500 time 0.3454 (0.3493) data time 0.0010 (0.0054) model time 0.3444 (0.3434) loss 2.9418 (3.2903) grad_norm 1.7467 (1.4752) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:19:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][140/625] eta 0:02:49 lr 0.001101 wd 0.0500 time 0.3365 (0.3491) data time 0.0015 (0.0051) model time 0.3350 (0.3436) loss 3.2258 (3.2943) grad_norm 1.0794 (1.4596) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:19:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][150/625] eta 0:02:45 lr 0.001101 wd 0.0500 time 0.3382 (0.3484) data time 0.0012 (0.0048) model time 0.3371 (0.3430) loss 3.1065 (3.2901) grad_norm 1.1828 (1.4964) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:19:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][160/625] eta 0:02:41 lr 0.001101 wd 0.0500 time 0.3380 (0.3478) data time 0.0010 (0.0046) model time 0.3370 (0.3426) loss 3.7520 (3.2881) grad_norm 1.1400 (1.5142) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:19:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][170/625] eta 0:02:38 lr 0.001101 wd 0.0500 time 0.3409 (0.3475) data time 0.0010 (0.0044) model time 0.3399 (0.3424) loss 3.4254 (3.2760) grad_norm 1.0574 (1.5156) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:19:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][180/625] eta 0:02:34 lr 0.001101 wd 0.0500 time 0.3345 (0.3470) data time 0.0011 (0.0042) model time 0.3334 (0.3421) loss 3.7422 (3.2813) grad_norm 1.8982 (1.5114) loss_scale 8192.0000 (8192.0000) 
mem 14261MB [2024-07-24 16:19:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][190/625] eta 0:02:31 lr 0.001101 wd 0.0500 time 0.5263 (0.3484) data time 0.0011 (0.0041) model time 0.5253 (0.3442) loss 2.5856 (3.2778) grad_norm 1.4751 (1.5075) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:19:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][200/625] eta 0:02:29 lr 0.001101 wd 0.0500 time 0.5733 (0.3507) data time 0.0010 (0.0039) model time 0.5723 (0.3475) loss 3.4259 (3.2963) grad_norm 1.5023 (1.5001) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:19:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][210/625] eta 0:02:25 lr 0.001100 wd 0.0500 time 0.3429 (0.3513) data time 0.0008 (0.0038) model time 0.3421 (0.3484) loss 4.2675 (3.3062) grad_norm 1.5785 (1.4989) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:20:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][220/625] eta 0:02:22 lr 0.001100 wd 0.0500 time 0.3478 (0.3509) data time 0.0011 (0.0036) model time 0.3467 (0.3481) loss 3.1897 (3.3041) grad_norm 1.3303 (1.4961) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:20:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][230/625] eta 0:02:18 lr 0.001100 wd 0.0500 time 0.3425 (0.3505) data time 0.0010 (0.0035) model time 0.3415 (0.3476) loss 3.4408 (3.3015) grad_norm 1.1167 (1.4873) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:20:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][240/625] eta 0:02:14 lr 0.001100 wd 0.0500 time 0.3336 (0.3501) data time 0.0009 (0.0034) model time 0.3326 (0.3472) loss 3.7962 (3.3186) grad_norm 1.0934 (1.4814) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:20:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][250/625] eta 0:02:11 lr 0.001100 wd 
0.0500 time 0.3473 (0.3499) data time 0.0010 (0.0033) model time 0.3463 (0.3470) loss 3.6241 (3.3338) grad_norm 1.1911 (1.4700) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:20:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][260/625] eta 0:02:07 lr 0.001100 wd 0.0500 time 0.3294 (0.3495) data time 0.0010 (0.0032) model time 0.3284 (0.3467) loss 3.4080 (3.3365) grad_norm 1.6058 (1.4657) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:20:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][270/625] eta 0:02:03 lr 0.001100 wd 0.0500 time 0.3323 (0.3492) data time 0.0009 (0.0032) model time 0.3314 (0.3463) loss 2.4197 (3.3270) grad_norm 1.1129 (1.4632) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:20:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][280/625] eta 0:02:00 lr 0.001100 wd 0.0500 time 0.3388 (0.3490) data time 0.0008 (0.0031) model time 0.3379 (0.3462) loss 3.1374 (3.3192) grad_norm 1.4272 (1.4607) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:20:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][290/625] eta 0:01:56 lr 0.001100 wd 0.0500 time 0.3417 (0.3488) data time 0.0010 (0.0030) model time 0.3407 (0.3460) loss 3.4720 (3.3215) grad_norm 1.3288 (1.4596) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:20:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][300/625] eta 0:01:53 lr 0.001100 wd 0.0500 time 0.3358 (0.3486) data time 0.0009 (0.0030) model time 0.3350 (0.3458) loss 3.8783 (3.3208) grad_norm 1.7546 (1.4660) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:20:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][310/625] eta 0:01:49 lr 0.001100 wd 0.0500 time 0.3465 (0.3483) data time 0.0008 (0.0029) model time 0.3457 (0.3455) loss 3.9653 (3.3249) grad_norm 1.2103 (1.4732) loss_scale 8192.0000 
(8192.0000) mem 14261MB [2024-07-24 16:20:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][320/625] eta 0:01:46 lr 0.001100 wd 0.0500 time 0.3486 (0.3481) data time 0.0010 (0.0029) model time 0.3475 (0.3454) loss 2.4840 (3.3255) grad_norm 1.1232 (1.4713) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:20:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][330/625] eta 0:01:42 lr 0.001100 wd 0.0500 time 0.3448 (0.3480) data time 0.0011 (0.0028) model time 0.3437 (0.3453) loss 3.1516 (3.3247) grad_norm 1.2810 (1.4653) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:20:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][340/625] eta 0:01:39 lr 0.001100 wd 0.0500 time 0.3405 (0.3480) data time 0.0008 (0.0028) model time 0.3398 (0.3452) loss 4.0675 (3.3200) grad_norm 1.5874 (1.4668) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:20:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][350/625] eta 0:01:35 lr 0.001100 wd 0.0500 time 0.3425 (0.3478) data time 0.0009 (0.0028) model time 0.3416 (0.3451) loss 3.0716 (3.3207) grad_norm 1.5793 (1.4709) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:20:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][360/625] eta 0:01:32 lr 0.001100 wd 0.0500 time 0.3379 (0.3479) data time 0.0008 (0.0027) model time 0.3371 (0.3453) loss 3.4044 (3.3249) grad_norm 1.4992 (1.4764) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:20:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][370/625] eta 0:01:28 lr 0.001100 wd 0.0500 time 0.3389 (0.3478) data time 0.0011 (0.0027) model time 0.3378 (0.3451) loss 3.4036 (3.3260) grad_norm 1.9721 (1.4824) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:20:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][380/625] eta 0:01:25 lr 
0.001099 wd 0.0500 time 0.3357 (0.3476) data time 0.0014 (0.0026) model time 0.3344 (0.3450) loss 2.7929 (3.3242) grad_norm 1.2079 (1.4943) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:20:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][390/625] eta 0:01:21 lr 0.001099 wd 0.0500 time 0.3427 (0.3475) data time 0.0010 (0.0026) model time 0.3418 (0.3449) loss 2.7866 (3.3294) grad_norm 0.9515 (1.5029) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:21:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][400/625] eta 0:01:18 lr 0.001099 wd 0.0500 time 0.3535 (0.3474) data time 0.0008 (0.0026) model time 0.3527 (0.3449) loss 3.9689 (3.3301) grad_norm 1.0950 (1.5019) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:21:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][410/625] eta 0:01:14 lr 0.001099 wd 0.0500 time 0.5559 (0.3483) data time 0.0011 (0.0025) model time 0.5547 (0.3460) loss 3.9372 (3.3224) grad_norm 1.3490 (1.4972) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:21:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][420/625] eta 0:01:11 lr 0.001099 wd 0.0500 time 0.3434 (0.3492) data time 0.0008 (0.0025) model time 0.3426 (0.3470) loss 3.9327 (3.3245) grad_norm 1.2529 (1.4911) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:21:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][430/625] eta 0:01:08 lr 0.001099 wd 0.0500 time 0.3464 (0.3500) data time 0.0010 (0.0024) model time 0.3454 (0.3479) loss 4.0141 (3.3245) grad_norm 1.3430 (1.4872) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:21:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][440/625] eta 0:01:04 lr 0.001099 wd 0.0500 time 0.3472 (0.3497) data time 0.0008 (0.0024) model time 0.3465 (0.3476) loss 3.7963 (3.3268) grad_norm 0.9614 (1.4818) loss_scale 
8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:21:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][450/625] eta 0:01:01 lr 0.001099 wd 0.0500 time 0.3455 (0.3496) data time 0.0011 (0.0024) model time 0.3444 (0.3475) loss 3.5700 (3.3317) grad_norm 1.5734 (1.4809) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:21:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][460/625] eta 0:00:57 lr 0.001099 wd 0.0500 time 0.3572 (0.3495) data time 0.0011 (0.0024) model time 0.3561 (0.3474) loss 3.8046 (3.3330) grad_norm 1.3911 (1.4763) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:21:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][470/625] eta 0:00:54 lr 0.001099 wd 0.0500 time 0.3421 (0.3494) data time 0.0010 (0.0023) model time 0.3411 (0.3473) loss 3.3645 (3.3327) grad_norm 1.4333 (1.4741) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:21:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][480/625] eta 0:00:50 lr 0.001099 wd 0.0500 time 0.3401 (0.3492) data time 0.0010 (0.0023) model time 0.3391 (0.3472) loss 2.9278 (3.3307) grad_norm 1.3880 (1.4790) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:21:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][490/625] eta 0:00:47 lr 0.001099 wd 0.0500 time 0.3391 (0.3491) data time 0.0012 (0.0023) model time 0.3380 (0.3471) loss 3.2369 (3.3231) grad_norm 1.6027 (1.4827) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:21:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][500/625] eta 0:00:43 lr 0.001099 wd 0.0500 time 0.3412 (0.3490) data time 0.0008 (0.0023) model time 0.3404 (0.3470) loss 2.8355 (3.3208) grad_norm 1.6045 (1.4868) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:21:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][510/625] eta 
0:00:40 lr 0.001099 wd 0.0500 time 0.3397 (0.3489) data time 0.0009 (0.0022) model time 0.3388 (0.3469) loss 2.4283 (3.3204) grad_norm 1.7688 (1.4863) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:21:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][520/625] eta 0:00:36 lr 0.001099 wd 0.0500 time 0.3417 (0.3488) data time 0.0010 (0.0022) model time 0.3407 (0.3468) loss 3.8850 (3.3165) grad_norm 2.0617 (1.4860) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:21:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][530/625] eta 0:00:33 lr 0.001099 wd 0.0500 time 0.3410 (0.3487) data time 0.0010 (0.0022) model time 0.3399 (0.3467) loss 3.4719 (3.3167) grad_norm 1.5784 (1.4874) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:21:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][540/625] eta 0:00:29 lr 0.001099 wd 0.0500 time 0.3421 (0.3487) data time 0.0007 (0.0022) model time 0.3414 (0.3467) loss 3.9531 (3.3169) grad_norm 1.1933 (1.4847) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:21:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][550/625] eta 0:00:26 lr 0.001098 wd 0.0500 time 0.3432 (0.3486) data time 0.0010 (0.0021) model time 0.3422 (0.3466) loss 3.0799 (3.3201) grad_norm 1.2031 (1.4845) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:21:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][560/625] eta 0:00:22 lr 0.001098 wd 0.0500 time 0.3406 (0.3485) data time 0.0010 (0.0021) model time 0.3397 (0.3466) loss 3.5725 (3.3214) grad_norm 1.4771 (1.4812) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:22:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][570/625] eta 0:00:19 lr 0.001098 wd 0.0500 time 0.3392 (0.3485) data time 0.0008 (0.0021) model time 0.3384 (0.3465) loss 3.6837 (3.3215) grad_norm 1.3658 (1.4827) 
loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:22:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][580/625] eta 0:00:15 lr 0.001098 wd 0.0500 time 0.3519 (0.3484) data time 0.0010 (0.0021) model time 0.3509 (0.3465) loss 3.1500 (3.3227) grad_norm 1.7776 (1.4817) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:22:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][590/625] eta 0:00:12 lr 0.001098 wd 0.0500 time 0.3385 (0.3483) data time 0.0010 (0.0021) model time 0.3375 (0.3464) loss 3.3002 (3.3152) grad_norm 1.0395 (1.4831) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:22:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][600/625] eta 0:00:08 lr 0.001098 wd 0.0500 time 0.3354 (0.3482) data time 0.0008 (0.0021) model time 0.3346 (0.3463) loss 2.7525 (3.3151) grad_norm 1.6514 (1.4846) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:22:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][610/625] eta 0:00:05 lr 0.001098 wd 0.0500 time 0.3332 (0.3482) data time 0.0005 (0.0020) model time 0.3327 (0.3462) loss 4.0280 (3.3143) grad_norm 1.3648 (1.4835) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:22:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][620/625] eta 0:00:01 lr 0.001098 wd 0.0500 time 0.3314 (0.3479) data time 0.0005 (0.0020) model time 0.3308 (0.3460) loss 4.0486 (3.3167) grad_norm 1.1727 (1.4794) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:22:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 72 training takes 0:03:37 [2024-07-24 16:22:19 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-24 16:22:21 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 16:22:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.542 (0.542) Loss 0.7065 (0.7065) Acc@1 86.621 (86.621) Acc@5 97.754 (97.754) Mem 14261MB [2024-07-24 16:22:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.132) Loss 1.1914 (0.8545) Acc@1 73.682 (82.382) Acc@5 92.969 (96.427) Mem 14261MB [2024-07-24 16:22:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.087 (0.110) Loss 1.2588 (1.0108) Acc@1 71.436 (78.509) Acc@5 91.797 (94.538) Mem 14261MB [2024-07-24 16:22:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.157 Acc@5 94.534 [2024-07-24 16:22:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.2% [2024-07-24 16:22:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.16% [2024-07-24 16:22:23 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-24 16:22:24 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-24 16:22:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.500 (0.500) Loss 0.6743 (0.6743) Acc@1 86.719 (86.719) Acc@5 97.803 (97.803) Mem 14261MB [2024-07-24 16:22:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.130) Loss 1.0703 (0.8219) Acc@1 76.562 (83.225) Acc@5 94.482 (96.742) Mem 14261MB [2024-07-24 16:22:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.109) Loss 1.2451 (0.9766) Acc@1 71.338 (79.215) Acc@5 92.236 (94.850) Mem 14261MB [2024-07-24 16:22:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.907 Acc@5 94.832 [2024-07-24 16:22:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.9% [2024-07-24 16:22:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.91% [2024-07-24 16:22:27 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 16:22:28 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-24 16:22:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][0/625] eta 0:08:15 lr 0.001098 wd 0.0500 time 0.7931 (0.7931) data time 0.4676 (0.4676) model time 0.0000 (0.0000) loss 3.9689 (3.9689) grad_norm 1.5236 (1.5236) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:22:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][10/625] eta 0:04:06 lr 0.001098 wd 0.0500 time 0.3378 (0.4013) data time 0.0008 (0.0436) model time 0.0000 (0.0000) loss 3.6570 (3.3733) grad_norm 0.9695 (1.3599) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:22:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][20/625] eta 0:03:56 lr 0.001098 wd 0.0500 time 0.3423 (0.3915) data time 0.0008 (0.0233) model time 0.0000 (0.0000) loss 3.6640 (3.4603) grad_norm 1.7666 (1.4207) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:22:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][30/625] eta 0:03:47 lr 0.001098 wd 0.0500 time 0.3421 (0.3816) data time 0.0009 (0.0161) model time 0.0000 (0.0000) loss 3.7702 (3.3853) grad_norm 2.3990 (1.4725) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:22:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][40/625] eta 0:03:37 lr 0.001098 wd 0.0500 time 0.3371 (0.3718) data time 0.0008 (0.0124) model time 0.0000 (0.0000) loss 3.9572 (3.3976) grad_norm 1.0989 (1.4223) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:22:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][50/625] eta 0:03:30 lr 0.001098 wd 0.0500 time 0.3442 (0.3662) data time 0.0010 (0.0102) model time 0.0000 (0.0000) loss 3.6205 (3.4446) grad_norm 1.3150 (1.3830) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:22:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][60/625] eta 0:03:24 lr 0.001098 wd 0.0500 time 0.3306 
(0.3621) data time 0.0009 (0.0087) model time 0.3298 (0.3397) loss 4.3293 (3.4944) grad_norm 1.2695 (1.3566) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:22:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][70/625] eta 0:03:19 lr 0.001098 wd 0.0500 time 0.3353 (0.3594) data time 0.0008 (0.0076) model time 0.3344 (0.3408) loss 3.0919 (3.4610) grad_norm 1.3233 (1.3443) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:22:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][80/625] eta 0:03:14 lr 0.001098 wd 0.0500 time 0.3376 (0.3571) data time 0.0010 (0.0068) model time 0.3366 (0.3406) loss 3.2317 (3.4366) grad_norm 1.2292 (1.3461) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:23:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][90/625] eta 0:03:10 lr 0.001097 wd 0.0500 time 0.3421 (0.3553) data time 0.0008 (0.0062) model time 0.3413 (0.3404) loss 2.1132 (3.4135) grad_norm 1.4024 (1.3542) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:23:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][100/625] eta 0:03:05 lr 0.001097 wd 0.0500 time 0.3382 (0.3541) data time 0.0008 (0.0057) model time 0.3373 (0.3406) loss 2.6045 (3.4053) grad_norm 1.3778 (1.3757) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:23:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][110/625] eta 0:03:01 lr 0.001097 wd 0.0500 time 0.3328 (0.3529) data time 0.0010 (0.0053) model time 0.3319 (0.3405) loss 3.8796 (3.3953) grad_norm 1.1114 (1.3705) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:23:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][120/625] eta 0:02:57 lr 0.001097 wd 0.0500 time 0.3411 (0.3521) data time 0.0011 (0.0050) model time 0.3400 (0.3406) loss 3.2486 (3.3906) grad_norm 0.9505 (1.3728) loss_scale 8192.0000 (8192.0000) mem 14261MB 
[2024-07-24 16:23:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][130/625] eta 0:02:53 lr 0.001097 wd 0.0500 time 0.3413 (0.3514) data time 0.0008 (0.0047) model time 0.3405 (0.3406) loss 3.9113 (3.4046) grad_norm 1.0646 (1.3849) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:23:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][140/625] eta 0:02:50 lr 0.001097 wd 0.0500 time 0.3355 (0.3509) data time 0.0009 (0.0045) model time 0.3346 (0.3410) loss 4.1253 (3.3991) grad_norm 1.8006 (1.3964) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:23:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][150/625] eta 0:02:47 lr 0.001097 wd 0.0500 time 0.3465 (0.3517) data time 0.0015 (0.0043) model time 0.3450 (0.3431) loss 2.5239 (3.3958) grad_norm 1.1538 (1.4029) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:23:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][160/625] eta 0:02:43 lr 0.001097 wd 0.0500 time 0.3371 (0.3512) data time 0.0011 (0.0042) model time 0.3360 (0.3429) loss 2.9130 (3.3855) grad_norm 1.8375 (1.4064) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:23:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][170/625] eta 0:02:39 lr 0.001097 wd 0.0500 time 0.3388 (0.3509) data time 0.0007 (0.0040) model time 0.3380 (0.3430) loss 3.9439 (3.4034) grad_norm 1.9002 (1.4142) loss_scale 16384.0000 (8623.1579) mem 14261MB [2024-07-24 16:23:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][180/625] eta 0:02:36 lr 0.001097 wd 0.0500 time 0.3428 (0.3509) data time 0.0011 (0.0039) model time 0.3417 (0.3435) loss 3.3980 (3.3915) grad_norm 1.6016 (1.4186) loss_scale 16384.0000 (9051.9337) mem 14261MB [2024-07-24 16:23:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][190/625] eta 0:02:32 lr 0.001097 wd 0.0500 time 
0.3438 (0.3506) data time 0.0007 (0.0037) model time 0.3430 (0.3435) loss 3.5534 (3.3891) grad_norm 1.0009 (1.4139) loss_scale 16384.0000 (9435.8115) mem 14261MB [2024-07-24 16:23:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][200/625] eta 0:02:28 lr 0.001097 wd 0.0500 time 0.3323 (0.3501) data time 0.0008 (0.0036) model time 0.3315 (0.3433) loss 3.8372 (3.3713) grad_norm inf (inf) loss_scale 8192.0000 (9740.7363) mem 14261MB [2024-07-24 16:23:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][210/625] eta 0:02:25 lr 0.001097 wd 0.0500 time 0.3391 (0.3501) data time 0.0009 (0.0035) model time 0.3382 (0.3436) loss 2.7906 (3.3575) grad_norm 0.9706 (inf) loss_scale 8192.0000 (9667.3365) mem 14261MB [2024-07-24 16:23:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][220/625] eta 0:02:21 lr 0.001097 wd 0.0500 time 0.3431 (0.3497) data time 0.0008 (0.0034) model time 0.3423 (0.3434) loss 2.7601 (3.3464) grad_norm 1.3270 (inf) loss_scale 8192.0000 (9600.5792) mem 14261MB [2024-07-24 16:23:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][230/625] eta 0:02:18 lr 0.001097 wd 0.0500 time 0.3488 (0.3508) data time 0.0008 (0.0033) model time 0.3480 (0.3451) loss 3.4725 (3.3388) grad_norm 1.6098 (inf) loss_scale 8192.0000 (9539.6017) mem 14261MB [2024-07-24 16:23:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][240/625] eta 0:02:15 lr 0.001097 wd 0.0500 time 0.3368 (0.3520) data time 0.0008 (0.0032) model time 0.3360 (0.3468) loss 3.7178 (3.3372) grad_norm 1.3940 (inf) loss_scale 8192.0000 (9483.6846) mem 14261MB [2024-07-24 16:23:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][250/625] eta 0:02:12 lr 0.001097 wd 0.0500 time 0.3442 (0.3523) data time 0.0010 (0.0031) model time 0.3431 (0.3475) loss 3.3208 (3.3277) grad_norm 1.4160 (inf) loss_scale 8192.0000 (9432.2231) mem 14261MB 
[2024-07-24 16:24:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][260/625] eta 0:02:08 lr 0.001096 wd 0.0500 time 0.3419 (0.3523) data time 0.0011 (0.0030) model time 0.3409 (0.3476) loss 3.8699 (3.3346) grad_norm 1.4263 (inf) loss_scale 8192.0000 (9384.7050) mem 14261MB [2024-07-24 16:24:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][270/625] eta 0:02:05 lr 0.001096 wd 0.0500 time 0.3423 (0.3522) data time 0.0007 (0.0030) model time 0.3416 (0.3477) loss 2.5371 (3.3284) grad_norm 0.9943 (inf) loss_scale 8192.0000 (9340.6937) mem 14261MB [2024-07-24 16:24:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][280/625] eta 0:02:01 lr 0.001096 wd 0.0500 time 0.3477 (0.3518) data time 0.0011 (0.0029) model time 0.3467 (0.3474) loss 2.5918 (3.3296) grad_norm 1.7171 (inf) loss_scale 8192.0000 (9299.8149) mem 14261MB [2024-07-24 16:24:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][290/625] eta 0:01:57 lr 0.001096 wd 0.0500 time 0.3570 (0.3517) data time 0.0007 (0.0028) model time 0.3563 (0.3473) loss 3.7915 (3.3360) grad_norm 1.6516 (inf) loss_scale 8192.0000 (9261.7457) mem 14261MB [2024-07-24 16:24:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][300/625] eta 0:01:54 lr 0.001096 wd 0.0500 time 0.3477 (0.3513) data time 0.0007 (0.0028) model time 0.3470 (0.3470) loss 3.9908 (3.3388) grad_norm 1.1373 (inf) loss_scale 8192.0000 (9226.2060) mem 14261MB [2024-07-24 16:24:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][310/625] eta 0:01:50 lr 0.001096 wd 0.0500 time 0.3388 (0.3510) data time 0.0008 (0.0027) model time 0.3380 (0.3468) loss 3.8023 (3.3362) grad_norm 1.2389 (inf) loss_scale 8192.0000 (9192.9518) mem 14261MB [2024-07-24 16:24:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][320/625] eta 0:01:46 lr 0.001096 wd 0.0500 time 0.3443 (0.3508) data 
time 0.0010 (0.0027) model time 0.3432 (0.3466) loss 3.7246 (3.3392) grad_norm 1.2643 (inf) loss_scale 8192.0000 (9161.7695) mem 14261MB [2024-07-24 16:24:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][330/625] eta 0:01:43 lr 0.001096 wd 0.0500 time 0.3378 (0.3505) data time 0.0008 (0.0026) model time 0.3369 (0.3464) loss 2.4606 (3.3329) grad_norm 1.2960 (inf) loss_scale 8192.0000 (9132.4713) mem 14261MB [2024-07-24 16:24:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][340/625] eta 0:01:39 lr 0.001096 wd 0.0500 time 0.3365 (0.3501) data time 0.0009 (0.0026) model time 0.3356 (0.3461) loss 3.4273 (3.3365) grad_norm 1.6894 (inf) loss_scale 8192.0000 (9104.8915) mem 14261MB [2024-07-24 16:24:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][350/625] eta 0:01:36 lr 0.001096 wd 0.0500 time 0.3401 (0.3498) data time 0.0008 (0.0025) model time 0.3393 (0.3458) loss 2.5332 (3.3365) grad_norm 1.0964 (inf) loss_scale 8192.0000 (9078.8832) mem 14261MB [2024-07-24 16:24:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][360/625] eta 0:01:32 lr 0.001096 wd 0.0500 time 0.3490 (0.3496) data time 0.0011 (0.0025) model time 0.3480 (0.3456) loss 3.4967 (3.3300) grad_norm 1.5958 (inf) loss_scale 8192.0000 (9054.3158) mem 14261MB [2024-07-24 16:24:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][370/625] eta 0:01:29 lr 0.001096 wd 0.0500 time 0.3353 (0.3494) data time 0.0008 (0.0025) model time 0.3345 (0.3455) loss 4.0130 (3.3353) grad_norm 1.4507 (inf) loss_scale 8192.0000 (9031.0728) mem 14261MB [2024-07-24 16:24:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][380/625] eta 0:01:25 lr 0.001096 wd 0.0500 time 0.3429 (0.3495) data time 0.0008 (0.0024) model time 0.3421 (0.3457) loss 2.3718 (3.3276) grad_norm 1.2513 (inf) loss_scale 8192.0000 (9009.0499) mem 14261MB [2024-07-24 16:24:45 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][390/625] eta 0:01:22 lr 0.001096 wd 0.0500 time 0.3405 (0.3493) data time 0.0007 (0.0024) model time 0.3397 (0.3456) loss 2.6964 (3.3285) grad_norm 1.2029 (inf) loss_scale 8192.0000 (8988.1535) mem 14261MB [2024-07-24 16:24:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][400/625] eta 0:01:18 lr 0.001096 wd 0.0500 time 0.3438 (0.3492) data time 0.0008 (0.0024) model time 0.3431 (0.3455) loss 2.8244 (3.3278) grad_norm 1.5155 (inf) loss_scale 8192.0000 (8968.2993) mem 14261MB [2024-07-24 16:24:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][410/625] eta 0:01:15 lr 0.001096 wd 0.0500 time 0.3402 (0.3490) data time 0.0008 (0.0023) model time 0.3394 (0.3454) loss 3.3230 (3.3267) grad_norm 1.2583 (inf) loss_scale 8192.0000 (8949.4112) mem 14261MB [2024-07-24 16:24:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][420/625] eta 0:01:11 lr 0.001096 wd 0.0500 time 0.3318 (0.3488) data time 0.0008 (0.0023) model time 0.3309 (0.3452) loss 2.4639 (3.3258) grad_norm 1.0838 (inf) loss_scale 8192.0000 (8931.4204) mem 14261MB [2024-07-24 16:24:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][430/625] eta 0:01:07 lr 0.001095 wd 0.0500 time 0.3402 (0.3486) data time 0.0010 (0.0023) model time 0.3392 (0.3450) loss 3.8064 (3.3188) grad_norm 1.2859 (inf) loss_scale 8192.0000 (8914.2645) mem 14261MB [2024-07-24 16:25:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][440/625] eta 0:01:04 lr 0.001095 wd 0.0500 time 0.3476 (0.3484) data time 0.0009 (0.0022) model time 0.3468 (0.3449) loss 1.9668 (3.3148) grad_norm 1.2237 (inf) loss_scale 8192.0000 (8897.8866) mem 14261MB [2024-07-24 16:25:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][450/625] eta 0:01:00 lr 0.001095 wd 0.0500 time 0.3519 (0.3483) data time 0.0011 (0.0022) 
model time 0.3509 (0.3449) loss 2.3998 (3.3196) grad_norm 1.5726 (inf) loss_scale 8192.0000 (8882.2350) mem 14261MB [2024-07-24 16:25:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][460/625] eta 0:00:57 lr 0.001095 wd 0.0500 time 0.5001 (0.3493) data time 0.0008 (0.0022) model time 0.4993 (0.3460) loss 3.5193 (3.3227) grad_norm 1.5250 (inf) loss_scale 8192.0000 (8867.2625) mem 14261MB [2024-07-24 16:25:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][470/625] eta 0:00:54 lr 0.001095 wd 0.0500 time 0.3436 (0.3499) data time 0.0009 (0.0022) model time 0.3427 (0.3467) loss 3.5137 (3.3282) grad_norm 1.6044 (inf) loss_scale 8192.0000 (8852.9257) mem 14261MB [2024-07-24 16:25:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][480/625] eta 0:00:50 lr 0.001095 wd 0.0500 time 0.3465 (0.3499) data time 0.0010 (0.0021) model time 0.3455 (0.3467) loss 3.7456 (3.3289) grad_norm 1.8537 (inf) loss_scale 8192.0000 (8839.1850) mem 14261MB [2024-07-24 16:25:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][490/625] eta 0:00:47 lr 0.001095 wd 0.0500 time 0.3388 (0.3498) data time 0.0008 (0.0021) model time 0.3380 (0.3467) loss 3.5932 (3.3271) grad_norm 1.5979 (inf) loss_scale 8192.0000 (8826.0041) mem 14261MB [2024-07-24 16:25:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][500/625] eta 0:00:43 lr 0.001095 wd 0.0500 time 0.3434 (0.3497) data time 0.0009 (0.0021) model time 0.3426 (0.3467) loss 3.8985 (3.3194) grad_norm 2.2451 (inf) loss_scale 8192.0000 (8813.3493) mem 14261MB [2024-07-24 16:25:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][510/625] eta 0:00:40 lr 0.001095 wd 0.0500 time 0.3380 (0.3496) data time 0.0012 (0.0021) model time 0.3368 (0.3466) loss 3.3997 (3.3191) grad_norm 1.3262 (inf) loss_scale 8192.0000 (8801.1898) mem 14261MB [2024-07-24 16:25:30 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [73/300][520/625] eta 0:00:36 lr 0.001095 wd 0.0500 time 0.3413 (0.3495) data time 0.0010 (0.0020) model time 0.3403 (0.3465) loss 2.7658 (3.3128) grad_norm 2.6211 (inf) loss_scale 8192.0000 (8789.4971) mem 14261MB [2024-07-24 16:25:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][530/625] eta 0:00:33 lr 0.001095 wd 0.0500 time 0.3435 (0.3493) data time 0.0010 (0.0020) model time 0.3425 (0.3464) loss 3.6098 (3.3179) grad_norm 1.3695 (inf) loss_scale 8192.0000 (8778.2448) mem 14261MB [2024-07-24 16:25:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][540/625] eta 0:00:29 lr 0.001095 wd 0.0500 time 0.3500 (0.3492) data time 0.0008 (0.0020) model time 0.3492 (0.3463) loss 4.0196 (3.3208) grad_norm 1.0424 (inf) loss_scale 8192.0000 (8767.4085) mem 14261MB [2024-07-24 16:25:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][550/625] eta 0:00:26 lr 0.001095 wd 0.0500 time 0.3324 (0.3490) data time 0.0008 (0.0020) model time 0.3316 (0.3461) loss 3.4104 (3.3215) grad_norm 1.3025 (inf) loss_scale 8192.0000 (8756.9655) mem 14261MB [2024-07-24 16:25:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][560/625] eta 0:00:22 lr 0.001095 wd 0.0500 time 0.3418 (0.3489) data time 0.0008 (0.0020) model time 0.3410 (0.3460) loss 3.8180 (3.3231) grad_norm 2.3650 (inf) loss_scale 8192.0000 (8746.8948) mem 14261MB [2024-07-24 16:25:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][570/625] eta 0:00:19 lr 0.001095 wd 0.0500 time 0.3461 (0.3488) data time 0.0008 (0.0020) model time 0.3452 (0.3459) loss 3.3431 (3.3223) grad_norm 1.0423 (inf) loss_scale 8192.0000 (8737.1769) mem 14261MB [2024-07-24 16:25:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][580/625] eta 0:00:15 lr 0.001095 wd 0.0500 time 0.3426 (0.3486) data time 0.0012 (0.0019) model time 0.3414 (0.3458) loss 
3.3645 (3.3240) grad_norm 0.9760 (inf) loss_scale 8192.0000 (8727.7935) mem 14261MB [2024-07-24 16:25:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][590/625] eta 0:00:12 lr 0.001094 wd 0.0500 time 0.3393 (0.3485) data time 0.0008 (0.0019) model time 0.3385 (0.3457) loss 2.5420 (3.3193) grad_norm 1.3679 (inf) loss_scale 8192.0000 (8718.7276) mem 14261MB [2024-07-24 16:25:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][600/625] eta 0:00:08 lr 0.001094 wd 0.0500 time 0.3465 (0.3487) data time 0.0007 (0.0019) model time 0.3457 (0.3459) loss 2.5819 (3.3163) grad_norm 1.4921 (inf) loss_scale 8192.0000 (8709.9634) mem 14261MB [2024-07-24 16:26:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][610/625] eta 0:00:05 lr 0.001094 wd 0.0500 time 0.3315 (0.3486) data time 0.0005 (0.0019) model time 0.3310 (0.3458) loss 3.7391 (3.3120) grad_norm 0.9132 (inf) loss_scale 8192.0000 (8701.4861) mem 14261MB [2024-07-24 16:26:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][620/625] eta 0:00:01 lr 0.001094 wd 0.0500 time 0.3321 (0.3483) data time 0.0005 (0.0019) model time 0.3315 (0.3456) loss 3.1839 (3.3113) grad_norm 1.7303 (inf) loss_scale 8192.0000 (8693.2818) mem 14261MB [2024-07-24 16:26:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 73 training takes 0:03:37 [2024-07-24 16:26:06 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 16:26:07 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 16:26:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.557 (0.557) Loss 0.7026 (0.7026) Acc@1 87.012 (87.012) Acc@5 97.607 (97.607) Mem 14261MB [2024-07-24 16:26:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.136) Loss 1.1270 (0.8323) Acc@1 74.268 (82.724) Acc@5 93.457 (96.578) Mem 14261MB [2024-07-24 16:26:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.087 (0.113) Loss 1.2256 (0.9878) Acc@1 72.314 (78.776) Acc@5 91.504 (94.722) Mem 14261MB [2024-07-24 16:26:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.499 Acc@5 94.714 [2024-07-24 16:26:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.5% [2024-07-24 16:26:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.50% [2024-07-24 16:26:09 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-24 16:26:10 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-24 16:26:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.505 (0.505) Loss 0.6714 (0.6714) Acc@1 86.816 (86.816) Acc@5 97.900 (97.900) Mem 14261MB [2024-07-24 16:26:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.130) Loss 1.0654 (0.8188) Acc@1 76.367 (83.345) Acc@5 94.629 (96.808) Mem 14261MB [2024-07-24 16:26:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.110) Loss 1.2363 (0.9718) Acc@1 71.338 (79.336) Acc@5 92.334 (94.947) Mem 14261MB [2024-07-24 16:26:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.011 Acc@5 94.928 [2024-07-24 16:26:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.0% [2024-07-24 16:26:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.01% [2024-07-24 16:26:13 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 16:26:14 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-24 16:26:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][0/625] eta 0:07:57 lr 0.001094 wd 0.0500 time 0.7632 (0.7632) data time 0.4360 (0.4360) model time 0.0000 (0.0000) loss 2.9558 (2.9558) grad_norm 1.2379 (1.2379) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:26:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][10/625] eta 0:03:56 lr 0.001094 wd 0.0500 time 0.3446 (0.3844) data time 0.0009 (0.0407) model time 0.0000 (0.0000) loss 3.8296 (3.2055) grad_norm 1.9127 (1.6015) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:26:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][20/625] eta 0:03:39 lr 0.001094 wd 0.0500 time 0.3427 (0.3629) data time 0.0011 (0.0218) model time 0.0000 (0.0000) loss 3.6653 (3.2377) grad_norm 1.4830 (1.6405) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:26:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][30/625] eta 0:03:31 lr 0.001094 wd 0.0500 time 0.3423 (0.3554) data time 0.0007 (0.0151) model time 0.0000 (0.0000) loss 2.3117 (3.1855) grad_norm 1.1761 (1.5009) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:26:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][40/625] eta 0:03:26 lr 0.001094 wd 0.0500 time 0.3403 (0.3526) data time 0.0010 (0.0117) model time 0.0000 (0.0000) loss 2.9135 (3.1868) grad_norm 1.3745 (1.4778) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:26:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][50/625] eta 0:03:24 lr 0.001094 wd 0.0500 time 0.3419 (0.3549) data time 0.0010 (0.0096) model time 0.0000 (0.0000) loss 2.6542 (3.1828) grad_norm 1.1356 (1.4431) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:26:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][60/625] eta 0:03:24 lr 0.001094 wd 0.0500 time 0.3376 
(0.3616) data time 0.0008 (0.0082) model time 0.3368 (0.3948) loss 3.7867 (3.2052) grad_norm 1.3001 (1.4214) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:26:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][70/625] eta 0:03:20 lr 0.001094 wd 0.0500 time 0.3326 (0.3618) data time 0.0012 (0.0072) model time 0.3314 (0.3784) loss 3.6478 (3.1910) grad_norm 1.3566 (1.4452) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:26:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][80/625] eta 0:03:15 lr 0.001094 wd 0.0500 time 0.3413 (0.3592) data time 0.0009 (0.0064) model time 0.3405 (0.3656) loss 3.6705 (3.1743) grad_norm 1.0947 (1.4387) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:26:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][90/625] eta 0:03:11 lr 0.001094 wd 0.0500 time 0.3429 (0.3571) data time 0.0013 (0.0059) model time 0.3416 (0.3590) loss 2.9457 (3.1954) grad_norm 1.3169 (1.4186) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:26:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][100/625] eta 0:03:07 lr 0.001094 wd 0.0500 time 0.3383 (0.3577) data time 0.0010 (0.0054) model time 0.3373 (0.3595) loss 3.7528 (3.1969) grad_norm 1.2214 (1.4020) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:26:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][110/625] eta 0:03:03 lr 0.001094 wd 0.0500 time 0.3454 (0.3564) data time 0.0010 (0.0050) model time 0.3444 (0.3566) loss 2.6842 (3.2028) grad_norm 1.2192 (1.4439) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:26:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][120/625] eta 0:02:59 lr 0.001094 wd 0.0500 time 0.3336 (0.3551) data time 0.0009 (0.0047) model time 0.3327 (0.3541) loss 2.6953 (3.1992) grad_norm 1.4359 (1.4897) loss_scale 8192.0000 (8192.0000) mem 14261MB 
[2024-07-24 16:27:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][130/625] eta 0:02:55 lr 0.001093 wd 0.0500 time 0.3490 (0.3541) data time 0.0010 (0.0044) model time 0.3480 (0.3525) loss 2.7186 (3.1755) grad_norm 1.3010 (1.4861) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:27:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][140/625] eta 0:02:51 lr 0.001093 wd 0.0500 time 0.3429 (0.3533) data time 0.0011 (0.0042) model time 0.3418 (0.3514) loss 3.8393 (3.1917) grad_norm 1.2701 (1.4795) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:27:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][150/625] eta 0:02:47 lr 0.001093 wd 0.0500 time 0.3403 (0.3526) data time 0.0010 (0.0040) model time 0.3392 (0.3503) loss 3.6642 (3.1936) grad_norm 0.9633 (1.4864) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:27:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][160/625] eta 0:02:43 lr 0.001093 wd 0.0500 time 0.3468 (0.3520) data time 0.0008 (0.0038) model time 0.3460 (0.3496) loss 3.8129 (3.2043) grad_norm 1.1718 (1.4952) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:27:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][170/625] eta 0:02:39 lr 0.001093 wd 0.0500 time 0.3412 (0.3514) data time 0.0011 (0.0036) model time 0.3401 (0.3488) loss 3.7828 (3.1972) grad_norm 1.8513 (1.4968) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:27:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][180/625] eta 0:02:36 lr 0.001093 wd 0.0500 time 0.3400 (0.3510) data time 0.0011 (0.0035) model time 0.3390 (0.3484) loss 3.6425 (3.1957) grad_norm 1.1500 (1.4882) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:27:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][190/625] eta 0:02:32 lr 0.001093 wd 0.0500 time 
0.3542 (0.3506) data time 0.0008 (0.0034) model time 0.3534 (0.3480) loss 3.7206 (3.2114) grad_norm 1.5748 (1.4766) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:27:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][200/625] eta 0:02:28 lr 0.001093 wd 0.0500 time 0.3354 (0.3501) data time 0.0008 (0.0032) model time 0.3346 (0.3474) loss 2.8624 (3.2135) grad_norm 1.3210 (1.4680) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:27:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][210/625] eta 0:02:25 lr 0.001093 wd 0.0500 time 0.3433 (0.3498) data time 0.0012 (0.0031) model time 0.3421 (0.3471) loss 2.8661 (3.2078) grad_norm 1.1052 (1.4662) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:27:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][220/625] eta 0:02:21 lr 0.001093 wd 0.0500 time 0.3426 (0.3495) data time 0.0008 (0.0030) model time 0.3418 (0.3468) loss 2.9909 (3.2125) grad_norm 1.6240 (1.4790) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:27:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][230/625] eta 0:02:17 lr 0.001093 wd 0.0500 time 0.3407 (0.3491) data time 0.0008 (0.0030) model time 0.3399 (0.3464) loss 3.4150 (3.2190) grad_norm 1.6308 (1.4967) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:27:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][240/625] eta 0:02:14 lr 0.001093 wd 0.0500 time 0.3488 (0.3490) data time 0.0010 (0.0029) model time 0.3477 (0.3463) loss 3.4048 (3.2208) grad_norm 1.1089 (1.5017) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:27:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][250/625] eta 0:02:10 lr 0.001093 wd 0.0500 time 0.3396 (0.3489) data time 0.0008 (0.0028) model time 0.3388 (0.3463) loss 4.4157 (3.2244) grad_norm 1.4654 (1.4997) loss_scale 8192.0000 (8192.0000) 
mem 14261MB [2024-07-24 16:27:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][260/625] eta 0:02:07 lr 0.001093 wd 0.0500 time 0.3374 (0.3487) data time 0.0010 (0.0027) model time 0.3364 (0.3462) loss 3.3431 (3.2254) grad_norm 1.7997 (1.5113) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:27:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][270/625] eta 0:02:03 lr 0.001093 wd 0.0500 time 0.3498 (0.3486) data time 0.0008 (0.0027) model time 0.3490 (0.3461) loss 3.7905 (3.2274) grad_norm 2.0686 (1.5129) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:27:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][280/625] eta 0:02:00 lr 0.001093 wd 0.0500 time 0.3391 (0.3503) data time 0.0008 (0.0026) model time 0.3383 (0.3482) loss 3.8539 (3.2429) grad_norm 1.7882 (1.5150) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:27:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][290/625] eta 0:01:57 lr 0.001093 wd 0.0500 time 0.3445 (0.3508) data time 0.0010 (0.0026) model time 0.3435 (0.3488) loss 2.7339 (3.2513) grad_norm 1.6713 (1.5205) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:28:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][300/625] eta 0:01:53 lr 0.001092 wd 0.0500 time 0.3380 (0.3506) data time 0.0008 (0.0025) model time 0.3372 (0.3486) loss 3.5913 (3.2567) grad_norm 1.3102 (1.5207) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:28:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][310/625] eta 0:01:50 lr 0.001092 wd 0.0500 time 0.3433 (0.3503) data time 0.0008 (0.0025) model time 0.3425 (0.3484) loss 3.0532 (3.2481) grad_norm 2.4203 (1.5200) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:28:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][320/625] eta 0:01:46 lr 0.001092 wd 
0.0500 time 0.3151 (0.3508) data time 0.0009 (0.0024) model time 0.3143 (0.3489) loss 4.0643 (3.2497) grad_norm 1.7238 (1.5220) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:28:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][330/625] eta 0:01:43 lr 0.001092 wd 0.0500 time 0.3447 (0.3506) data time 0.0008 (0.0024) model time 0.3439 (0.3488) loss 2.7981 (3.2528) grad_norm 1.7681 (1.5215) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:28:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][340/625] eta 0:01:39 lr 0.001092 wd 0.0500 time 0.3463 (0.3504) data time 0.0008 (0.0023) model time 0.3455 (0.3486) loss 3.4377 (3.2517) grad_norm 1.3239 (1.5217) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:28:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][350/625] eta 0:01:36 lr 0.001092 wd 0.0500 time 0.3446 (0.3503) data time 0.0008 (0.0023) model time 0.3438 (0.3484) loss 3.7116 (3.2555) grad_norm 1.4706 (1.5159) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:28:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][360/625] eta 0:01:32 lr 0.001092 wd 0.0500 time 0.3377 (0.3500) data time 0.0008 (0.0023) model time 0.3368 (0.3482) loss 2.6583 (3.2568) grad_norm 1.2008 (1.5112) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:28:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][370/625] eta 0:01:29 lr 0.001092 wd 0.0500 time 0.3310 (0.3498) data time 0.0010 (0.0022) model time 0.3300 (0.3480) loss 2.7063 (3.2606) grad_norm 1.1658 (1.5060) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:28:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][380/625] eta 0:01:25 lr 0.001092 wd 0.0500 time 0.3580 (0.3497) data time 0.0011 (0.0022) model time 0.3569 (0.3479) loss 3.3224 (3.2600) grad_norm 1.3504 (1.5007) loss_scale 8192.0000 
(8192.0000) mem 14261MB [2024-07-24 16:28:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][390/625] eta 0:01:22 lr 0.001092 wd 0.0500 time 0.3401 (0.3496) data time 0.0010 (0.0022) model time 0.3391 (0.3477) loss 3.7279 (3.2659) grad_norm 2.1589 (1.5123) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:28:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][400/625] eta 0:01:18 lr 0.001092 wd 0.0500 time 0.3619 (0.3495) data time 0.0010 (0.0021) model time 0.3609 (0.3476) loss 2.2676 (3.2672) grad_norm 2.0847 (1.5157) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:28:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][410/625] eta 0:01:15 lr 0.001092 wd 0.0500 time 0.3376 (0.3493) data time 0.0011 (0.0021) model time 0.3365 (0.3475) loss 3.5832 (3.2612) grad_norm 1.7368 (1.5171) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:28:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][420/625] eta 0:01:11 lr 0.001092 wd 0.0500 time 0.3385 (0.3492) data time 0.0009 (0.0021) model time 0.3376 (0.3473) loss 3.2779 (3.2598) grad_norm 1.1824 (1.5104) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:28:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][430/625] eta 0:01:08 lr 0.001092 wd 0.0500 time 0.3432 (0.3490) data time 0.0010 (0.0021) model time 0.3423 (0.3472) loss 2.9160 (3.2611) grad_norm 0.9795 (1.5052) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:28:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][440/625] eta 0:01:04 lr 0.001092 wd 0.0500 time 0.3391 (0.3489) data time 0.0011 (0.0020) model time 0.3380 (0.3470) loss 2.7313 (3.2607) grad_norm 1.1346 (1.5006) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:28:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][450/625] eta 0:01:01 lr 
0.001092 wd 0.0500 time 0.3590 (0.3488) data time 0.0009 (0.0020) model time 0.3582 (0.3469) loss 2.3692 (3.2600) grad_norm 1.2864 (1.4995) loss_scale 8192.0000 (8192.0000) mem 14261MB [2024-07-24 16:28:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][460/625] eta 0:00:57 lr 0.001091 wd 0.0500 time 0.3419 (0.3486) data time 0.0010 (0.0020) model time 0.3409 (0.3468) loss 3.3471 (3.2578) grad_norm 1.0886 (inf) loss_scale 4096.0000 (8112.0347) mem 14261MB [2024-07-24 16:28:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][470/625] eta 0:00:54 lr 0.001091 wd 0.0500 time 0.3374 (0.3486) data time 0.0008 (0.0020) model time 0.3366 (0.3468) loss 4.0513 (3.2645) grad_norm 1.0836 (inf) loss_scale 4096.0000 (8026.7686) mem 14261MB [2024-07-24 16:29:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][480/625] eta 0:00:50 lr 0.001091 wd 0.0500 time 0.3406 (0.3484) data time 0.0011 (0.0020) model time 0.3396 (0.3466) loss 4.0488 (3.2670) grad_norm 1.5383 (inf) loss_scale 4096.0000 (7945.0478) mem 14261MB [2024-07-24 16:29:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][490/625] eta 0:00:47 lr 0.001091 wd 0.0500 time 0.3480 (0.3484) data time 0.0008 (0.0019) model time 0.3472 (0.3466) loss 3.2167 (3.2666) grad_norm 1.2662 (inf) loss_scale 4096.0000 (7866.6558) mem 14261MB [2024-07-24 16:29:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][500/625] eta 0:00:43 lr 0.001091 wd 0.0500 time 0.3325 (0.3492) data time 0.0008 (0.0019) model time 0.3318 (0.3475) loss 3.7782 (3.2676) grad_norm 1.1138 (inf) loss_scale 4096.0000 (7791.3932) mem 14261MB [2024-07-24 16:29:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][510/625] eta 0:00:40 lr 0.001091 wd 0.0500 time 0.3352 (0.3500) data time 0.0008 (0.0019) model time 0.3343 (0.3484) loss 3.1655 (3.2680) grad_norm 0.9570 (inf) loss_scale 4096.0000 
(7719.0763) mem 14261MB [2024-07-24 16:29:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][520/625] eta 0:00:36 lr 0.001091 wd 0.0500 time 0.3505 (0.3499) data time 0.0007 (0.0019) model time 0.3498 (0.3483) loss 2.3190 (3.2684) grad_norm 1.7915 (inf) loss_scale 4096.0000 (7649.5355) mem 14261MB [2024-07-24 16:29:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][530/625] eta 0:00:33 lr 0.001091 wd 0.0500 time 0.3453 (0.3498) data time 0.0011 (0.0019) model time 0.3442 (0.3482) loss 3.1867 (3.2637) grad_norm 2.2695 (inf) loss_scale 4096.0000 (7582.6139) mem 14261MB [2024-07-24 16:29:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][540/625] eta 0:00:29 lr 0.001091 wd 0.0500 time 0.3443 (0.3496) data time 0.0010 (0.0019) model time 0.3433 (0.3481) loss 3.4466 (3.2643) grad_norm 1.1428 (inf) loss_scale 4096.0000 (7518.1664) mem 14261MB [2024-07-24 16:29:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][550/625] eta 0:00:26 lr 0.001091 wd 0.0500 time 0.3414 (0.3497) data time 0.0010 (0.0018) model time 0.3404 (0.3482) loss 4.0363 (3.2691) grad_norm 1.1782 (inf) loss_scale 4096.0000 (7456.0581) mem 14261MB [2024-07-24 16:29:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][560/625] eta 0:00:22 lr 0.001091 wd 0.0500 time 0.3440 (0.3497) data time 0.0008 (0.0018) model time 0.3433 (0.3482) loss 2.8362 (3.2671) grad_norm 1.1212 (inf) loss_scale 4096.0000 (7396.1640) mem 14261MB [2024-07-24 16:29:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][570/625] eta 0:00:19 lr 0.001091 wd 0.0500 time 0.3631 (0.3497) data time 0.0008 (0.0018) model time 0.3623 (0.3481) loss 3.1085 (3.2641) grad_norm 1.1727 (inf) loss_scale 4096.0000 (7338.3678) mem 14261MB [2024-07-24 16:29:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][580/625] eta 0:00:15 lr 0.001091 wd 0.0500 
time 0.3466 (0.3496) data time 0.0012 (0.0018) model time 0.3453 (0.3481) loss 3.6494 (3.2647) grad_norm 1.2861 (inf) loss_scale 4096.0000 (7282.5611) mem 14261MB [2024-07-24 16:29:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][590/625] eta 0:00:12 lr 0.001091 wd 0.0500 time 0.3390 (0.3495) data time 0.0011 (0.0018) model time 0.3379 (0.3479) loss 3.4218 (3.2667) grad_norm 1.7729 (inf) loss_scale 4096.0000 (7228.6430) mem 14261MB [2024-07-24 16:29:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][600/625] eta 0:00:08 lr 0.001091 wd 0.0500 time 0.3363 (0.3495) data time 0.0010 (0.0018) model time 0.3353 (0.3480) loss 3.0721 (3.2694) grad_norm 1.2615 (inf) loss_scale 4096.0000 (7176.5191) mem 14261MB [2024-07-24 16:29:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][610/625] eta 0:00:05 lr 0.001091 wd 0.0500 time 0.3294 (0.3494) data time 0.0005 (0.0018) model time 0.3288 (0.3479) loss 3.0354 (3.2700) grad_norm 1.1877 (inf) loss_scale 4096.0000 (7126.1015) mem 14261MB [2024-07-24 16:29:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][620/625] eta 0:00:01 lr 0.001090 wd 0.0500 time 0.3303 (0.3492) data time 0.0006 (0.0018) model time 0.3297 (0.3476) loss 2.3787 (3.2683) grad_norm 1.5576 (inf) loss_scale 4096.0000 (7077.3076) mem 14261MB [2024-07-24 16:29:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 74 training takes 0:03:38 [2024-07-24 16:29:52 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 16:29:53 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 16:29:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.517 (0.517) Loss 0.6704 (0.6704) Acc@1 85.449 (85.449) Acc@5 97.900 (97.900) Mem 14261MB [2024-07-24 16:29:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.087 (0.131) Loss 1.1445 (0.8390) Acc@1 74.805 (82.764) Acc@5 93.896 (96.604) Mem 14261MB [2024-07-24 16:29:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.110) Loss 1.2539 (1.0088) Acc@1 71.729 (78.627) Acc@5 92.188 (94.592) Mem 14261MB [2024-07-24 16:29:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.419 Acc@5 94.580 [2024-07-24 16:29:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.4% [2024-07-24 16:29:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.839 (0.839) Loss 0.6689 (0.6689) Acc@1 86.914 (86.914) Acc@5 97.900 (97.900) Mem 14261MB [2024-07-24 16:29:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.087 (0.165) Loss 1.0596 (0.8146) Acc@1 76.758 (83.496) Acc@5 94.678 (96.839) Mem 14261MB [2024-07-24 16:29:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.087 (0.128) Loss 1.2285 (0.9661) Acc@1 71.484 (79.506) Acc@5 92.432 (95.038) Mem 14261MB [2024-07-24 16:29:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.175 Acc@5 95.028 [2024-07-24 16:29:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.2% [2024-07-24 16:29:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.17% [2024-07-24 16:29:59 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-24 16:30:00 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 16:30:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][0/625] eta 0:38:06 lr 0.001090 wd 0.0500 time 3.6585 (3.6585) data time 2.9561 (2.9561) model time 0.0000 (0.0000) loss 8.3370 (8.3370) grad_norm 3.0809 (3.0809) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:30:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][10/625] eta 0:07:16 lr 0.001090 wd 0.0500 time 0.4160 (0.7104) data time 0.0010 (0.2697) model time 0.0000 (0.0000) loss 9.7721 (8.3551) grad_norm 1.8194 (2.2359) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:30:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][20/625] eta 0:05:44 lr 0.001090 wd 0.0500 time 0.4261 (0.5693) data time 0.0010 (0.1419) model time 0.0000 (0.0000) loss 7.4387 (8.2920) grad_norm 1.7060 (2.1587) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:30:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][30/625] eta 0:05:09 lr 0.001090 wd 0.0500 time 0.4099 (0.5199) data time 0.0008 (0.0964) model time 0.0000 (0.0000) loss 8.8642 (8.3304) grad_norm 1.7769 (2.3169) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:30:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][40/625] eta 0:04:49 lr 0.001090 wd 0.0500 time 0.4200 (0.4946) data time 0.0007 (0.0732) model time 0.0000 (0.0000) loss 9.2423 (8.2422) grad_norm 1.9278 (2.2512) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:30:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][50/625] eta 0:04:35 lr 0.001090 wd 0.0500 time 0.4127 (0.4793) data time 0.0010 (0.0590) model time 0.0000 (0.0000) loss 9.1782 (8.3147) grad_norm 3.0952 (2.2717) loss_scale 4096.0000 (4096.0000) mem 
14941MB [2024-07-24 16:30:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][60/625] eta 0:04:24 lr 0.001090 wd 0.0500 time 0.4191 (0.4688) data time 0.0008 (0.0495) model time 0.4183 (0.4144) loss 7.5012 (8.2078) grad_norm 1.5778 (2.2501) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:30:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][70/625] eta 0:04:16 lr 0.001090 wd 0.0500 time 0.4091 (0.4619) data time 0.0008 (0.0428) model time 0.4083 (0.4164) loss 6.3510 (8.1384) grad_norm 1.8229 (2.2192) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:30:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][80/625] eta 0:04:09 lr 0.001090 wd 0.0500 time 0.4060 (0.4579) data time 0.0009 (0.0376) model time 0.4051 (0.4204) loss 7.9323 (8.1855) grad_norm 3.0530 (2.2334) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:30:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][90/625] eta 0:04:03 lr 0.001090 wd 0.0500 time 0.4202 (0.4558) data time 0.0011 (0.0336) model time 0.4191 (0.4246) loss 8.3529 (8.2031) grad_norm 2.3879 (2.2322) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:30:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][100/625] eta 0:04:00 lr 0.001090 wd 0.0500 time 0.4211 (0.4577) data time 0.0011 (0.0304) model time 0.4200 (0.4346) loss 9.1780 (8.1749) grad_norm 2.3549 (2.2870) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:30:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][110/625] eta 0:03:54 lr 0.001090 wd 0.0500 time 0.4148 (0.4558) data time 0.0009 (0.0277) model time 0.4139 (0.4347) loss 8.8402 (8.1699) grad_norm 1.9674 (2.2945) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:30:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][120/625] eta 0:03:48 lr 0.001090 wd 0.0500 time 
0.4064 (0.4522) data time 0.0011 (0.0255) model time 0.4052 (0.4314) loss 8.8344 (8.1341) grad_norm 1.6177 (2.2821) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:30:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][130/625] eta 0:03:42 lr 0.001090 wd 0.0500 time 0.4173 (0.4500) data time 0.0008 (0.0236) model time 0.4165 (0.4303) loss 7.9009 (8.1384) grad_norm 2.0132 (2.2586) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:31:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][140/625] eta 0:03:37 lr 0.001090 wd 0.0500 time 0.4160 (0.4477) data time 0.0011 (0.0220) model time 0.4149 (0.4287) loss 7.6602 (8.1416) grad_norm 3.1487 (2.2503) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:31:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][150/625] eta 0:03:31 lr 0.001090 wd 0.0500 time 0.4159 (0.4457) data time 0.0008 (0.0206) model time 0.4151 (0.4275) loss 6.9103 (8.1405) grad_norm 1.7423 (2.2737) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:31:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][160/625] eta 0:03:26 lr 0.001089 wd 0.0500 time 0.4166 (0.4437) data time 0.0011 (0.0194) model time 0.4155 (0.4262) loss 7.0338 (8.1353) grad_norm 2.3496 (2.2687) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:31:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][170/625] eta 0:03:21 lr 0.001089 wd 0.0500 time 0.4187 (0.4422) data time 0.0010 (0.0184) model time 0.4178 (0.4254) loss 6.8810 (8.1604) grad_norm 1.6341 (2.2516) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:31:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][180/625] eta 0:03:16 lr 0.001089 wd 0.0500 time 0.4249 (0.4410) data time 0.0010 (0.0174) model time 0.4239 (0.4250) loss 8.1392 (8.1411) grad_norm 1.8737 (2.2373) loss_scale 4096.0000 (4096.0000) 
mem 14941MB [2024-07-24 16:31:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][190/625] eta 0:03:11 lr 0.001089 wd 0.0500 time 0.4181 (0.4401) data time 0.0010 (0.0165) model time 0.4171 (0.4247) loss 10.1449 (8.1591) grad_norm 2.5358 (2.2302) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:31:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][200/625] eta 0:03:06 lr 0.001089 wd 0.0500 time 0.4080 (0.4387) data time 0.0010 (0.0158) model time 0.4070 (0.4238) loss 8.9240 (8.1585) grad_norm 1.7274 (2.2200) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:31:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][210/625] eta 0:03:01 lr 0.001089 wd 0.0500 time 0.4145 (0.4376) data time 0.0010 (0.0151) model time 0.4135 (0.4233) loss 8.5640 (8.1748) grad_norm 2.3927 (2.2424) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:31:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][220/625] eta 0:02:56 lr 0.001089 wd 0.0500 time 0.4274 (0.4366) data time 0.0012 (0.0144) model time 0.4262 (0.4227) loss 6.4849 (8.1354) grad_norm 2.1024 (2.2187) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:31:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][230/625] eta 0:02:52 lr 0.001089 wd 0.0500 time 0.4144 (0.4356) data time 0.0010 (0.0139) model time 0.4134 (0.4222) loss 8.8583 (8.1512) grad_norm 1.7481 (2.2117) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:31:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][240/625] eta 0:02:47 lr 0.001089 wd 0.0500 time 0.4200 (0.4348) data time 0.0010 (0.0133) model time 0.4190 (0.4218) loss 7.7757 (8.1354) grad_norm 1.9719 (2.2083) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:31:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][250/625] eta 0:02:42 lr 0.001089 wd 
0.0500 time 0.4145 (0.4340) data time 0.0010 (0.0128) model time 0.4134 (0.4214) loss 7.6080 (8.1336) grad_norm 2.4684 (2.1917) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:31:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][260/625] eta 0:02:38 lr 0.001089 wd 0.0500 time 0.4171 (0.4334) data time 0.0008 (0.0124) model time 0.4163 (0.4213) loss 8.3897 (8.1425) grad_norm 2.1091 (2.1964) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:31:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][270/625] eta 0:02:33 lr 0.001089 wd 0.0500 time 0.4147 (0.4328) data time 0.0008 (0.0120) model time 0.4140 (0.4209) loss 8.3067 (8.1352) grad_norm 1.7252 (2.1964) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:32:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][280/625] eta 0:02:29 lr 0.001089 wd 0.0500 time 0.4365 (0.4323) data time 0.0011 (0.0116) model time 0.4354 (0.4208) loss 7.5156 (8.1340) grad_norm 1.7558 (2.1838) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:32:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][290/625] eta 0:02:24 lr 0.001089 wd 0.0500 time 0.4149 (0.4317) data time 0.0008 (0.0112) model time 0.4141 (0.4206) loss 7.7023 (8.1412) grad_norm 3.0813 (2.1816) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:32:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][300/625] eta 0:02:20 lr 0.001089 wd 0.0500 time 0.4181 (0.4318) data time 0.0008 (0.0109) model time 0.4174 (0.4211) loss 9.1678 (8.1378) grad_norm 2.0622 (2.1906) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:32:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][310/625] eta 0:02:16 lr 0.001089 wd 0.0500 time 0.4126 (0.4319) data time 0.0011 (0.0106) model time 0.4115 (0.4215) loss 9.0021 (8.1377) grad_norm 2.6560 (2.1846) loss_scale 4096.0000 
(4096.0000) mem 14941MB [2024-07-24 16:32:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][320/625] eta 0:02:12 lr 0.001088 wd 0.0500 time 0.6106 (0.4342) data time 0.0010 (0.0103) model time 0.6096 (0.4246) loss 9.1268 (8.1382) grad_norm 3.0086 (2.1713) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:32:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][330/625] eta 0:02:08 lr 0.001088 wd 0.0500 time 0.4098 (0.4343) data time 0.0008 (0.0100) model time 0.4090 (0.4251) loss 7.0896 (8.1253) grad_norm 2.2347 (2.1709) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:32:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][340/625] eta 0:02:03 lr 0.001088 wd 0.0500 time 0.4200 (0.4339) data time 0.0008 (0.0097) model time 0.4192 (0.4248) loss 9.2159 (8.1219) grad_norm 2.7169 (2.1627) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:32:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][350/625] eta 0:01:59 lr 0.001088 wd 0.0500 time 0.4301 (0.4334) data time 0.0011 (0.0095) model time 0.4290 (0.4245) loss 7.1431 (8.1309) grad_norm 1.9597 (2.1529) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:32:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][360/625] eta 0:01:54 lr 0.001088 wd 0.0500 time 0.4131 (0.4330) data time 0.0008 (0.0092) model time 0.4123 (0.4243) loss 6.5835 (8.1067) grad_norm 1.7572 (2.1479) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:32:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][370/625] eta 0:01:50 lr 0.001088 wd 0.0500 time 0.4178 (0.4325) data time 0.0010 (0.0090) model time 0.4168 (0.4240) loss 8.7671 (8.1213) grad_norm 1.7828 (2.1468) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:32:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][380/625] eta 0:01:45 lr 
0.001088 wd 0.0500 time 0.4137 (0.4320) data time 0.0007 (0.0088) model time 0.4129 (0.4237) loss 8.5432 (8.1147) grad_norm 2.9327 (2.1472) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:32:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][390/625] eta 0:01:41 lr 0.001088 wd 0.0500 time 0.4138 (0.4316) data time 0.0010 (0.0086) model time 0.4128 (0.4234) loss 9.4643 (8.1134) grad_norm 1.6883 (2.1476) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:32:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][400/625] eta 0:01:37 lr 0.001088 wd 0.0500 time 0.4044 (0.4312) data time 0.0010 (0.0084) model time 0.4034 (0.4231) loss 8.6013 (8.1228) grad_norm 3.0612 (2.1504) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:32:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][410/625] eta 0:01:32 lr 0.001088 wd 0.0500 time 0.4264 (0.4308) data time 0.0011 (0.0082) model time 0.4253 (0.4229) loss 8.1066 (8.1262) grad_norm 1.6799 (2.1436) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:33:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][420/625] eta 0:01:28 lr 0.001088 wd 0.0500 time 0.4129 (0.4305) data time 0.0010 (0.0081) model time 0.4119 (0.4227) loss 7.1787 (8.1145) grad_norm 1.9604 (2.1400) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:33:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][430/625] eta 0:01:23 lr 0.001088 wd 0.0500 time 0.4142 (0.4301) data time 0.0009 (0.0079) model time 0.4132 (0.4225) loss 7.6101 (8.1140) grad_norm 1.6450 (2.1364) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:33:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][440/625] eta 0:01:19 lr 0.001088 wd 0.0500 time 0.4215 (0.4299) data time 0.0008 (0.0078) model time 0.4206 (0.4224) loss 8.5447 (8.1149) grad_norm 1.9775 (2.1290) loss_scale 
4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:33:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][450/625] eta 0:01:15 lr 0.001088 wd 0.0500 time 0.4106 (0.4296) data time 0.0011 (0.0076) model time 0.4095 (0.4222) loss 7.9991 (8.1211) grad_norm 1.9137 (2.1294) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:33:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][460/625] eta 0:01:10 lr 0.001088 wd 0.0500 time 0.4193 (0.4293) data time 0.0010 (0.0075) model time 0.4184 (0.4220) loss 8.4507 (8.1215) grad_norm 1.5608 (2.1257) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:33:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][470/625] eta 0:01:06 lr 0.001088 wd 0.0500 time 0.4224 (0.4290) data time 0.0007 (0.0073) model time 0.4217 (0.4219) loss 7.4419 (8.1141) grad_norm 1.7743 (2.1204) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:33:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][480/625] eta 0:01:02 lr 0.001087 wd 0.0500 time 0.4112 (0.4287) data time 0.0011 (0.0072) model time 0.4101 (0.4217) loss 8.2446 (8.1118) grad_norm 1.4978 (2.1110) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:33:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][490/625] eta 0:00:57 lr 0.001087 wd 0.0500 time 0.4075 (0.4284) data time 0.0008 (0.0071) model time 0.4067 (0.4215) loss 6.3010 (8.1077) grad_norm 1.9894 (2.1073) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:33:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][500/625] eta 0:00:53 lr 0.001087 wd 0.0500 time 0.4163 (0.4282) data time 0.0008 (0.0070) model time 0.4155 (0.4214) loss 8.2899 (8.1057) grad_norm 2.2089 (2.1051) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:33:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][510/625] eta 
0:00:49 lr 0.001087 wd 0.0500 time 0.4237 (0.4279) data time 0.0010 (0.0068) model time 0.4228 (0.4212) loss 8.4548 (8.1040) grad_norm 2.2362 (2.1006) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:33:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][520/625] eta 0:00:44 lr 0.001087 wd 0.0500 time 0.4269 (0.4281) data time 0.0008 (0.0067) model time 0.4261 (0.4215) loss 6.6228 (8.1038) grad_norm 2.5030 (2.1102) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:33:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][530/625] eta 0:00:40 lr 0.001087 wd 0.0500 time 0.6384 (0.4286) data time 0.0010 (0.0066) model time 0.6374 (0.4221) loss 7.1344 (8.0988) grad_norm 1.4805 (2.1083) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:33:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][540/625] eta 0:00:36 lr 0.001087 wd 0.0500 time 0.4116 (0.4289) data time 0.0010 (0.0065) model time 0.4106 (0.4227) loss 7.8750 (8.1006) grad_norm 3.5704 (2.1057) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:33:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][550/625] eta 0:00:32 lr 0.001087 wd 0.0500 time 0.4128 (0.4298) data time 0.0008 (0.0064) model time 0.4120 (0.4237) loss 7.1806 (8.0944) grad_norm 2.5721 (2.1145) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:34:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][560/625] eta 0:00:27 lr 0.001087 wd 0.0500 time 0.4158 (0.4296) data time 0.0008 (0.0063) model time 0.4150 (0.4236) loss 7.9761 (8.0893) grad_norm 1.9881 (2.1171) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:34:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][570/625] eta 0:00:23 lr 0.001087 wd 0.0500 time 0.4130 (0.4294) data time 0.0008 (0.0063) model time 0.4122 (0.4234) loss 8.7921 (8.0951) grad_norm 1.6869 (2.1141) 
loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:34:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][580/625] eta 0:00:19 lr 0.001087 wd 0.0500 time 0.4135 (0.4292) data time 0.0010 (0.0062) model time 0.4125 (0.4233) loss 8.0960 (8.0975) grad_norm 2.5679 (2.1095) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:34:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][590/625] eta 0:00:15 lr 0.001087 wd 0.0500 time 0.4160 (0.4290) data time 0.0010 (0.0061) model time 0.4150 (0.4232) loss 8.9118 (8.1002) grad_norm 2.6118 (2.1068) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:34:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][600/625] eta 0:00:10 lr 0.001087 wd 0.0500 time 0.4166 (0.4288) data time 0.0010 (0.0060) model time 0.4156 (0.4230) loss 8.4909 (8.1021) grad_norm 3.7396 (2.1176) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:34:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][610/625] eta 0:00:06 lr 0.001087 wd 0.0500 time 0.4136 (0.4286) data time 0.0006 (0.0059) model time 0.4130 (0.4229) loss 6.6226 (8.0965) grad_norm 2.2564 (2.1199) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:34:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][620/625] eta 0:00:02 lr 0.001087 wd 0.0500 time 0.4105 (0.4283) data time 0.0008 (0.0058) model time 0.4097 (0.4227) loss 9.3506 (8.0971) grad_norm 1.6560 (2.1172) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:34:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 75 training takes 0:04:27 [2024-07-24 16:34:27 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-24 16:34:28 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 16:34:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.511 (0.511) Loss 0.6748 (0.6748) Acc@1 86.035 (86.035) Acc@5 97.998 (97.998) Mem 14941MB [2024-07-24 16:34:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.087 (0.130) Loss 1.0938 (0.8229) Acc@1 75.391 (82.706) Acc@5 93.945 (96.791) Mem 14941MB [2024-07-24 16:34:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.087 (0.110) Loss 1.2275 (0.9823) Acc@1 71.289 (78.711) Acc@5 91.797 (94.743) Mem 14941MB [2024-07-24 16:34:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.337 Acc@5 94.696 [2024-07-24 16:34:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.3% [2024-07-24 16:34:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.960 (0.960) Loss 0.6670 (0.6670) Acc@1 86.963 (86.963) Acc@5 97.852 (97.852) Mem 14941MB [2024-07-24 16:34:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.087 (0.177) Loss 1.0566 (0.8115) Acc@1 76.611 (83.540) Acc@5 94.482 (96.813) Mem 14941MB [2024-07-24 16:34:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.087 (0.134) Loss 1.2236 (0.9626) Acc@1 71.338 (79.532) Acc@5 92.578 (95.071) Mem 14941MB [2024-07-24 16:34:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.219 Acc@5 95.056 [2024-07-24 16:34:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.2% [2024-07-24 16:34:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.22% [2024-07-24 16:34:34 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 16:34:35 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 16:34:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][0/625] eta 0:13:16 lr 0.001087 wd 0.0500 time 1.2747 (1.2747) data time 0.8823 (0.8823) model time 0.0000 (0.0000) loss 8.9365 (8.9365) grad_norm 1.6144 (1.6144) loss_scale 4096.0000 (4096.0000) mem 14940MB [2024-07-24 16:34:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][10/625] eta 0:05:09 lr 0.001086 wd 0.0500 time 0.4121 (0.5028) data time 0.0008 (0.0812) model time 0.0000 (0.0000) loss 7.1888 (8.1035) grad_norm 2.6307 (2.0206) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:34:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][20/625] eta 0:04:40 lr 0.001086 wd 0.0500 time 0.4233 (0.4641) data time 0.0010 (0.0431) model time 0.0000 (0.0000) loss 9.0846 (8.2449) grad_norm 1.4553 (1.8714) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:34:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][30/625] eta 0:04:26 lr 0.001086 wd 0.0500 time 0.4096 (0.4483) data time 0.0008 (0.0295) model time 0.0000 (0.0000) loss 9.5422 (8.1441) grad_norm 1.5584 (1.8179) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:34:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][40/625] eta 0:04:17 lr 0.001086 wd 0.0500 time 0.4162 (0.4401) data time 0.0008 (0.0226) model time 0.0000 (0.0000) loss 7.7565 (8.0868) grad_norm 2.6799 (1.8337) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:34:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][50/625] eta 0:04:12 lr 0.001086 wd 0.0500 time 0.4100 (0.4398) data time 0.0008 (0.0183) model 
time 0.0000 (0.0000) loss 7.5173 (8.0911) grad_norm 1.4829 (1.7959) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:35:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][60/625] eta 0:04:06 lr 0.001086 wd 0.0500 time 0.4181 (0.4361) data time 0.0010 (0.0155) model time 0.4171 (0.4161) loss 8.7827 (8.0878) grad_norm 1.7593 (1.8466) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:35:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][70/625] eta 0:04:00 lr 0.001086 wd 0.0500 time 0.4143 (0.4338) data time 0.0008 (0.0135) model time 0.4135 (0.4175) loss 7.8385 (8.0699) grad_norm 2.3932 (1.8729) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:35:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][80/625] eta 0:03:55 lr 0.001086 wd 0.0500 time 0.4393 (0.4321) data time 0.0009 (0.0119) model time 0.4384 (0.4179) loss 7.2813 (8.0826) grad_norm 1.7127 (1.8620) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:35:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][90/625] eta 0:03:50 lr 0.001086 wd 0.0500 time 0.4107 (0.4299) data time 0.0008 (0.0107) model time 0.4099 (0.4163) loss 8.2453 (8.0707) grad_norm 2.2901 (1.8606) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:35:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][100/625] eta 0:03:44 lr 0.001086 wd 0.0500 time 0.4210 (0.4285) data time 0.0008 (0.0098) model time 0.4203 (0.4159) loss 7.7855 (8.0626) grad_norm 1.6518 (1.8455) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:35:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][110/625] eta 0:03:40 lr 0.001086 wd 0.0500 time 0.4579 (0.4277) data time 0.0010 (0.0090) model time 0.4569 (0.4165) loss 6.2502 (8.0145) grad_norm 3.6670 (1.8916) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:35:27 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][120/625] eta 0:03:35 lr 0.001086 wd 0.0500 time 0.4101 (0.4266) data time 0.0011 (0.0083) model time 0.4091 (0.4160) loss 8.1950 (8.0235) grad_norm 1.9718 (1.8818) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:35:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][130/625] eta 0:03:32 lr 0.001086 wd 0.0500 time 0.4134 (0.4285) data time 0.0008 (0.0078) model time 0.4126 (0.4203) loss 7.6397 (8.0260) grad_norm 2.1032 (1.8618) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:35:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][140/625] eta 0:03:30 lr 0.001086 wd 0.0500 time 0.6083 (0.4344) data time 0.0011 (0.0073) model time 0.6072 (0.4304) loss 9.4418 (8.0547) grad_norm 2.2509 (1.8885) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:35:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][150/625] eta 0:03:26 lr 0.001086 wd 0.0500 time 0.4152 (0.4347) data time 0.0010 (0.0069) model time 0.4142 (0.4311) loss 8.2002 (8.0575) grad_norm 1.7809 (1.8893) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:35:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][160/625] eta 0:03:21 lr 0.001086 wd 0.0500 time 0.4150 (0.4334) data time 0.0008 (0.0065) model time 0.4142 (0.4295) loss 7.0381 (8.0523) grad_norm 2.1162 (1.8805) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:35:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][170/625] eta 0:03:16 lr 0.001085 wd 0.0500 time 0.4159 (0.4323) data time 0.0007 (0.0062) model time 0.4152 (0.4281) loss 7.2314 (8.0423) grad_norm 1.7241 (1.8955) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:35:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][180/625] eta 0:03:11 lr 0.001085 wd 0.0500 time 0.4205 (0.4314) data time 
0.0010 (0.0059) model time 0.4195 (0.4271) loss 6.8443 (8.0297) grad_norm 3.3814 (1.9093) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:35:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][190/625] eta 0:03:07 lr 0.001085 wd 0.0500 time 0.4154 (0.4305) data time 0.0009 (0.0056) model time 0.4144 (0.4260) loss 7.0998 (8.0182) grad_norm 1.7679 (1.9077) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:36:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][200/625] eta 0:03:02 lr 0.001085 wd 0.0500 time 0.4207 (0.4297) data time 0.0009 (0.0054) model time 0.4198 (0.4253) loss 8.4261 (8.0302) grad_norm 2.0712 (1.9025) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:36:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][210/625] eta 0:02:58 lr 0.001085 wd 0.0500 time 0.4478 (0.4292) data time 0.0008 (0.0052) model time 0.4470 (0.4248) loss 7.6357 (8.0277) grad_norm 2.9414 (1.9241) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:36:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][220/625] eta 0:02:53 lr 0.001085 wd 0.0500 time 0.4073 (0.4285) data time 0.0008 (0.0050) model time 0.4066 (0.4241) loss 6.5178 (7.9988) grad_norm 1.5902 (1.9345) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:36:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][230/625] eta 0:02:49 lr 0.001085 wd 0.0500 time 0.4184 (0.4280) data time 0.0010 (0.0048) model time 0.4173 (0.4236) loss 8.0679 (8.0092) grad_norm 1.4365 (1.9205) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:36:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][240/625] eta 0:02:44 lr 0.001085 wd 0.0500 time 0.4282 (0.4276) data time 0.0011 (0.0047) model time 0.4272 (0.4233) loss 9.5695 (8.0204) grad_norm 1.8325 (1.9136) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 
16:36:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][250/625] eta 0:02:40 lr 0.001085 wd 0.0500 time 0.4081 (0.4272) data time 0.0009 (0.0045) model time 0.4072 (0.4230) loss 7.4549 (8.0319) grad_norm 1.3491 (1.9218) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:36:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][260/625] eta 0:02:35 lr 0.001085 wd 0.0500 time 0.4182 (0.4268) data time 0.0007 (0.0044) model time 0.4175 (0.4226) loss 8.3672 (8.0447) grad_norm 2.9652 (1.9473) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:36:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][270/625] eta 0:02:31 lr 0.001085 wd 0.0500 time 0.4096 (0.4270) data time 0.0009 (0.0043) model time 0.4088 (0.4230) loss 8.5601 (8.0590) grad_norm 1.9060 (1.9544) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:36:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][280/625] eta 0:02:27 lr 0.001085 wd 0.0500 time 0.4098 (0.4266) data time 0.0009 (0.0042) model time 0.4089 (0.4226) loss 7.2454 (8.0612) grad_norm 1.6206 (1.9497) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:36:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][290/625] eta 0:02:22 lr 0.001085 wd 0.0500 time 0.4161 (0.4262) data time 0.0008 (0.0041) model time 0.4153 (0.4223) loss 6.5294 (8.0487) grad_norm 1.3940 (1.9419) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:36:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][300/625] eta 0:02:19 lr 0.001085 wd 0.0500 time 0.4497 (0.4294) data time 0.0011 (0.0040) model time 0.4486 (0.4262) loss 6.3780 (8.0460) grad_norm 1.7290 (1.9434) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:36:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][310/625] eta 0:02:15 lr 0.001085 wd 0.0500 time 0.4188 (0.4289) 
data time 0.0011 (0.0039) model time 0.4177 (0.4257) loss 8.0015 (8.0414) grad_norm 2.9652 (1.9554) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:36:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][320/625] eta 0:02:10 lr 0.001085 wd 0.0500 time 0.4092 (0.4284) data time 0.0012 (0.0038) model time 0.4081 (0.4252) loss 8.7590 (8.0485) grad_norm 1.3740 (1.9466) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:36:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][330/625] eta 0:02:06 lr 0.001084 wd 0.0500 time 0.4102 (0.4280) data time 0.0010 (0.0037) model time 0.4093 (0.4249) loss 8.1175 (8.0543) grad_norm 1.5533 (1.9390) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:37:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][340/625] eta 0:02:01 lr 0.001084 wd 0.0500 time 0.4201 (0.4277) data time 0.0010 (0.0036) model time 0.4191 (0.4245) loss 9.1818 (8.0555) grad_norm 2.2563 (1.9351) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:37:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][350/625] eta 0:01:57 lr 0.001084 wd 0.0500 time 0.4137 (0.4286) data time 0.0011 (0.0035) model time 0.4126 (0.4257) loss 8.7511 (8.0494) grad_norm 1.5221 (1.9271) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:37:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][360/625] eta 0:01:54 lr 0.001084 wd 0.0500 time 0.6331 (0.4306) data time 0.0012 (0.0035) model time 0.6319 (0.4280) loss 6.6975 (8.0388) grad_norm 1.5617 (1.9160) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:37:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][370/625] eta 0:01:49 lr 0.001084 wd 0.0500 time 0.4222 (0.4308) data time 0.0010 (0.0035) model time 0.4212 (0.4282) loss 8.9619 (8.0582) grad_norm 1.4743 (1.9111) loss_scale 4096.0000 (4096.0000) mem 14941MB 
[2024-07-24 16:37:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][380/625] eta 0:01:45 lr 0.001084 wd 0.0500 time 0.4185 (0.4303) data time 0.0008 (0.0034) model time 0.4177 (0.4278) loss 7.6214 (8.0613) grad_norm 1.4585 (1.9033) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:37:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][390/625] eta 0:01:41 lr 0.001084 wd 0.0500 time 0.4266 (0.4300) data time 0.0008 (0.0034) model time 0.4258 (0.4275) loss 8.3632 (8.0581) grad_norm 2.3660 (1.9049) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:37:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][400/625] eta 0:01:36 lr 0.001084 wd 0.0500 time 0.4205 (0.4297) data time 0.0010 (0.0033) model time 0.4195 (0.4271) loss 8.4354 (8.0588) grad_norm 1.8088 (1.9018) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:37:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][410/625] eta 0:01:32 lr 0.001084 wd 0.0500 time 0.4266 (0.4295) data time 0.0010 (0.0033) model time 0.4256 (0.4269) loss 9.0649 (8.0626) grad_norm 1.5259 (1.9088) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:37:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][420/625] eta 0:01:27 lr 0.001084 wd 0.0500 time 0.4121 (0.4292) data time 0.0008 (0.0032) model time 0.4113 (0.4266) loss 8.9229 (8.0721) grad_norm 1.9263 (1.9268) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:37:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][430/625] eta 0:01:23 lr 0.001084 wd 0.0500 time 0.4162 (0.4289) data time 0.0007 (0.0031) model time 0.4155 (0.4264) loss 9.4274 (8.0757) grad_norm 1.4158 (1.9150) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:37:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][440/625] eta 0:01:19 lr 0.001084 wd 0.0500 time 
0.4222 (0.4287) data time 0.0010 (0.0031) model time 0.4212 (0.4262) loss 8.0470 (8.0728) grad_norm 4.6207 (1.9250) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:37:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][450/625] eta 0:01:15 lr 0.001084 wd 0.0500 time 0.4261 (0.4286) data time 0.0008 (0.0031) model time 0.4253 (0.4260) loss 6.8524 (8.0730) grad_norm 1.6621 (1.9263) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:37:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][460/625] eta 0:01:10 lr 0.001084 wd 0.0500 time 0.4192 (0.4284) data time 0.0010 (0.0030) model time 0.4182 (0.4259) loss 7.7917 (8.0636) grad_norm 1.8151 (1.9225) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:37:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][470/625] eta 0:01:06 lr 0.001084 wd 0.0500 time 0.4150 (0.4281) data time 0.0010 (0.0030) model time 0.4140 (0.4256) loss 8.5918 (8.0745) grad_norm 1.3978 (1.9172) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:38:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][480/625] eta 0:01:02 lr 0.001084 wd 0.0500 time 0.4167 (0.4279) data time 0.0008 (0.0029) model time 0.4159 (0.4254) loss 7.5248 (8.0736) grad_norm 1.7469 (1.9142) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:38:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][490/625] eta 0:00:57 lr 0.001083 wd 0.0500 time 0.4168 (0.4280) data time 0.0010 (0.0029) model time 0.4158 (0.4256) loss 7.3257 (8.0741) grad_norm 1.3947 (1.9138) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:38:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][500/625] eta 0:00:53 lr 0.001083 wd 0.0500 time 0.4170 (0.4278) data time 0.0011 (0.0029) model time 0.4159 (0.4253) loss 9.5334 (8.0811) grad_norm 1.6250 (1.9272) loss_scale 4096.0000 (4096.0000) 
mem 14941MB [2024-07-24 16:38:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][510/625] eta 0:00:49 lr 0.001083 wd 0.0500 time 0.4227 (0.4275) data time 0.0008 (0.0028) model time 0.4219 (0.4251) loss 7.7518 (8.0756) grad_norm 2.2287 (1.9242) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:38:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][520/625] eta 0:00:44 lr 0.001083 wd 0.0500 time 0.4138 (0.4272) data time 0.0008 (0.0028) model time 0.4130 (0.4248) loss 9.3669 (8.0768) grad_norm 1.3819 (1.9236) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:38:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][530/625] eta 0:00:40 lr 0.001083 wd 0.0500 time 0.4173 (0.4271) data time 0.0010 (0.0028) model time 0.4163 (0.4246) loss 7.4405 (8.0730) grad_norm 1.9956 (1.9207) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:38:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][540/625] eta 0:00:36 lr 0.001083 wd 0.0500 time 0.4174 (0.4269) data time 0.0008 (0.0027) model time 0.4166 (0.4245) loss 8.6863 (8.0712) grad_norm 1.5571 (1.9162) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:38:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][550/625] eta 0:00:32 lr 0.001083 wd 0.0500 time 0.4088 (0.4267) data time 0.0012 (0.0027) model time 0.4076 (0.4243) loss 8.5138 (8.0677) grad_norm 1.4749 (1.9175) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:38:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][560/625] eta 0:00:27 lr 0.001083 wd 0.0500 time 0.4119 (0.4266) data time 0.0008 (0.0027) model time 0.4111 (0.4241) loss 8.5359 (8.0620) grad_norm 1.8368 (1.9153) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:38:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][570/625] eta 0:00:23 lr 0.001083 wd 
0.0500 time 0.6239 (0.4279) data time 0.0011 (0.0027) model time 0.6228 (0.4256) loss 8.1419 (8.0586) grad_norm 1.7886 (1.9161) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:38:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][580/625] eta 0:00:19 lr 0.001083 wd 0.0500 time 0.4089 (0.4290) data time 0.0010 (0.0027) model time 0.4079 (0.4268) loss 7.7083 (8.0658) grad_norm 1.3592 (1.9127) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:38:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][590/625] eta 0:00:15 lr 0.001083 wd 0.0500 time 0.4139 (0.4297) data time 0.0011 (0.0026) model time 0.4129 (0.4276) loss 8.7620 (8.0606) grad_norm 2.3670 (1.9115) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:38:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][600/625] eta 0:00:10 lr 0.001083 wd 0.0500 time 0.4127 (0.4294) data time 0.0012 (0.0026) model time 0.4116 (0.4273) loss 8.8090 (8.0645) grad_norm 1.5540 (1.9106) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:38:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][610/625] eta 0:00:06 lr 0.001083 wd 0.0500 time 0.4115 (0.4292) data time 0.0005 (0.0026) model time 0.4109 (0.4271) loss 8.5472 (8.0623) grad_norm 1.8642 (1.9102) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:39:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][620/625] eta 0:00:02 lr 0.001083 wd 0.0500 time 0.4160 (0.4290) data time 0.0007 (0.0026) model time 0.4153 (0.4269) loss 7.1118 (8.0680) grad_norm 1.8520 (1.9094) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:39:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 76 training takes 0:04:28 [2024-07-24 16:39:03 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-24 16:39:04 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!!
[2024-07-24 16:39:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.521 (0.521) Loss 0.6758 (0.6758) Acc@1 85.840 (85.840) Acc@5 97.607 (97.607) Mem 14941MB
[2024-07-24 16:39:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.087 (0.134) Loss 1.0996 (0.8223) Acc@1 75.830 (82.892) Acc@5 93.799 (96.618) Mem 14941MB
[2024-07-24 16:39:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.111) Loss 1.2217 (0.9813) Acc@1 72.217 (78.997) Acc@5 92.432 (94.738) Mem 14941MB
[2024-07-24 16:39:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.641 Acc@5 94.710
[2024-07-24 16:39:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.6%
[2024-07-24 16:39:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.64%
[2024-07-24 16:39:07 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving......
[2024-07-24 16:39:08 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!!
[2024-07-24 16:39:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.556 (0.556) Loss 0.6626 (0.6626) Acc@1 86.963 (86.963) Acc@5 97.754 (97.754) Mem 14941MB
[2024-07-24 16:39:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.087 (0.133) Loss 1.0518 (0.8079) Acc@1 76.660 (83.536) Acc@5 94.434 (96.800) Mem 14941MB
[2024-07-24 16:39:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.111) Loss 1.2197 (0.9587) Acc@1 71.631 (79.599) Acc@5 92.627 (95.052) Mem 14941MB
[2024-07-24 16:39:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.277 Acc@5 95.036
[2024-07-24 16:39:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.3%
[2024-07-24 16:39:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.28%
[2024-07-24 16:39:10 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-24 16:39:12 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!!
[2024-07-24 16:39:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][0/625] eta 0:12:39 lr 0.001083 wd 0.0500 time 1.2147 (1.2147) data time 0.8049 (0.8049) model time 0.0000 (0.0000) loss 7.9490 (7.9490) grad_norm 1.4512 (1.4512) loss_scale 4096.0000 (4096.0000) mem 14940MB [2024-07-24 16:39:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][10/625] eta 0:04:59 lr 0.001083 wd 0.0500 time 0.4155 (0.4873) data time 0.0010 (0.0741) model time 0.0000 (0.0000) loss 7.9813 (8.2092) grad_norm 1.5469 (1.5088) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:39:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][20/625] eta 0:04:33 lr 0.001082 wd 0.0500 time 0.4186 (0.4521) data time 0.0008 (0.0393) model time 0.0000 (0.0000) loss 8.1082 (8.0697) grad_norm 1.3051 (1.5563) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:39:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][30/625] eta 0:04:22 lr 0.001082 wd 0.0500 time 0.4201 (0.4407) data time 0.0010 (0.0269) model time 0.0000 (0.0000) loss 8.3579 (8.0900) grad_norm 1.4525 (1.5907) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:39:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][40/625] eta 0:04:14 lr 0.001082 wd 0.0500 time 0.4093 (0.4353) data time 0.0010 (0.0206) model time 0.0000 (0.0000) loss 8.6673 (8.1370) grad_norm 1.4456 (1.7310) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:39:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][50/625] eta 0:04:08 lr 0.001082 wd 0.0500 time 0.4147 (0.4322) data time 0.0007 (0.0168) model time 0.0000 (0.0000) loss 8.5494 (8.1504) grad_norm 2.0623 (1.8107) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:39:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][60/625] eta 0:04:02 lr 0.001082 wd 0.0500 time 0.4105 
(0.4298) data time 0.0010 (0.0142) model time 0.4094 (0.4168) loss 7.9090 (8.0719) grad_norm 1.3310 (1.7640) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:39:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][70/625] eta 0:03:57 lr 0.001082 wd 0.0500 time 0.4097 (0.4278) data time 0.0011 (0.0123) model time 0.4087 (0.4157) loss 7.3257 (8.0746) grad_norm 2.0561 (1.7463) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:39:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][80/625] eta 0:03:52 lr 0.001082 wd 0.0500 time 0.4258 (0.4264) data time 0.0008 (0.0109) model time 0.4250 (0.4155) loss 8.3391 (8.1030) grad_norm 1.7953 (1.7372) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:39:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][90/625] eta 0:03:47 lr 0.001082 wd 0.0500 time 0.4135 (0.4253) data time 0.0010 (0.0099) model time 0.4125 (0.4154) loss 8.6966 (8.0931) grad_norm 1.4649 (1.7568) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:39:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][100/625] eta 0:03:42 lr 0.001082 wd 0.0500 time 0.4111 (0.4245) data time 0.0010 (0.0090) model time 0.4101 (0.4156) loss 8.8718 (8.0829) grad_norm 1.9233 (1.8069) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:39:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][110/625] eta 0:03:38 lr 0.001082 wd 0.0500 time 0.4083 (0.4234) data time 0.0012 (0.0083) model time 0.4070 (0.4149) loss 8.4730 (8.0719) grad_norm 2.0349 (1.8022) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:40:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][120/625] eta 0:03:33 lr 0.001082 wd 0.0500 time 0.4094 (0.4227) data time 0.0012 (0.0077) model time 0.4082 (0.4146) loss 8.1451 (8.0359) grad_norm 2.2167 (1.8127) loss_scale 4096.0000 (4096.0000) mem 14941MB 
[2024-07-24 16:40:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][130/625] eta 0:03:28 lr 0.001082 wd 0.0500 time 0.4184 (0.4221) data time 0.0008 (0.0072) model time 0.4176 (0.4146) loss 6.2536 (8.0029) grad_norm 1.7275 (1.8152) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:40:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][140/625] eta 0:03:24 lr 0.001082 wd 0.0500 time 0.4125 (0.4217) data time 0.0009 (0.0068) model time 0.4116 (0.4147) loss 6.9601 (7.9889) grad_norm 1.5509 (1.8187) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:40:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][150/625] eta 0:03:20 lr 0.001082 wd 0.0500 time 0.4174 (0.4213) data time 0.0010 (0.0064) model time 0.4163 (0.4147) loss 8.9647 (7.9825) grad_norm 2.7110 (1.8221) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:40:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][160/625] eta 0:03:16 lr 0.001082 wd 0.0500 time 0.4253 (0.4216) data time 0.0009 (0.0061) model time 0.4243 (0.4157) loss 8.4063 (7.9912) grad_norm 3.0076 (1.8508) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:40:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][170/625] eta 0:03:14 lr 0.001082 wd 0.0500 time 0.5650 (0.4266) data time 0.0008 (0.0058) model time 0.5642 (0.4231) loss 7.5981 (8.0048) grad_norm 1.4130 (1.8616) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:40:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][180/625] eta 0:03:12 lr 0.001081 wd 0.0500 time 0.4201 (0.4319) data time 0.0008 (0.0056) model time 0.4194 (0.4305) loss 9.4525 (8.0221) grad_norm 2.4292 (1.8613) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:40:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][190/625] eta 0:03:07 lr 0.001081 wd 0.0500 time 
0.4093 (0.4313) data time 0.0014 (0.0054) model time 0.4079 (0.4298) loss 7.3786 (8.0250) grad_norm 1.2699 (1.8494) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:40:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][200/625] eta 0:03:03 lr 0.001081 wd 0.0500 time 0.4172 (0.4307) data time 0.0008 (0.0052) model time 0.4165 (0.4290) loss 7.6706 (8.0091) grad_norm 1.7286 (1.8356) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:40:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][210/625] eta 0:02:58 lr 0.001081 wd 0.0500 time 0.4020 (0.4302) data time 0.0008 (0.0050) model time 0.4012 (0.4283) loss 8.2105 (8.0116) grad_norm 2.0990 (1.8254) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:40:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][220/625] eta 0:02:54 lr 0.001081 wd 0.0500 time 0.4097 (0.4301) data time 0.0010 (0.0049) model time 0.4087 (0.4281) loss 9.0289 (8.0337) grad_norm 2.3300 (1.8328) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:40:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][230/625] eta 0:02:51 lr 0.001081 wd 0.0500 time 0.4127 (0.4354) data time 0.0008 (0.0047) model time 0.4119 (0.4349) loss 9.4282 (8.0187) grad_norm 1.4606 (1.8505) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:40:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][240/625] eta 0:02:47 lr 0.001081 wd 0.0500 time 0.4133 (0.4346) data time 0.0010 (0.0046) model time 0.4123 (0.4339) loss 9.2181 (8.0329) grad_norm 2.0857 (1.8625) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:41:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][250/625] eta 0:02:42 lr 0.001081 wd 0.0500 time 0.4235 (0.4341) data time 0.0008 (0.0045) model time 0.4227 (0.4332) loss 6.1004 (8.0018) grad_norm 2.0742 (1.8743) loss_scale 4096.0000 (4096.0000) 
mem 14941MB [2024-07-24 16:41:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][260/625] eta 0:02:38 lr 0.001081 wd 0.0500 time 0.4162 (0.4333) data time 0.0008 (0.0043) model time 0.4155 (0.4323) loss 8.0437 (7.9751) grad_norm 2.6209 (1.8770) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:41:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][270/625] eta 0:02:33 lr 0.001081 wd 0.0500 time 0.4140 (0.4330) data time 0.0008 (0.0043) model time 0.4132 (0.4318) loss 8.6318 (7.9819) grad_norm 1.2648 (1.8616) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:41:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][280/625] eta 0:02:29 lr 0.001081 wd 0.0500 time 0.4143 (0.4324) data time 0.0010 (0.0042) model time 0.4133 (0.4311) loss 8.7752 (7.9768) grad_norm 1.8739 (1.8493) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:41:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][290/625] eta 0:02:24 lr 0.001081 wd 0.0500 time 0.4128 (0.4318) data time 0.0008 (0.0041) model time 0.4120 (0.4304) loss 8.5064 (7.9737) grad_norm 1.6999 (1.8560) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:41:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][300/625] eta 0:02:20 lr 0.001081 wd 0.0500 time 0.4309 (0.4313) data time 0.0010 (0.0040) model time 0.4299 (0.4298) loss 7.1205 (7.9542) grad_norm 2.2315 (1.8516) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:41:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][310/625] eta 0:02:15 lr 0.001081 wd 0.0500 time 0.4101 (0.4309) data time 0.0008 (0.0039) model time 0.4093 (0.4292) loss 8.5594 (7.9541) grad_norm 2.1583 (1.8475) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:41:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][320/625] eta 0:02:11 lr 0.001081 wd 
0.0500 time 0.4171 (0.4304) data time 0.0008 (0.0038) model time 0.4163 (0.4287) loss 6.3393 (7.9547) grad_norm 2.2284 (1.8521) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:41:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][330/625] eta 0:02:06 lr 0.001080 wd 0.0500 time 0.4145 (0.4299) data time 0.0007 (0.0037) model time 0.4138 (0.4282) loss 9.2042 (7.9639) grad_norm 1.5153 (1.8520) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:41:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][340/625] eta 0:02:02 lr 0.001080 wd 0.0500 time 0.4171 (0.4295) data time 0.0008 (0.0037) model time 0.4164 (0.4277) loss 8.5602 (7.9619) grad_norm 1.5976 (1.8421) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:41:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][350/625] eta 0:01:58 lr 0.001080 wd 0.0500 time 0.4385 (0.4292) data time 0.0008 (0.0036) model time 0.4377 (0.4274) loss 7.1559 (7.9568) grad_norm 1.6722 (1.8347) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:41:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][360/625] eta 0:01:53 lr 0.001080 wd 0.0500 time 0.4179 (0.4289) data time 0.0011 (0.0035) model time 0.4167 (0.4270) loss 8.6299 (7.9675) grad_norm 2.4288 (1.8292) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:41:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][370/625] eta 0:01:49 lr 0.001080 wd 0.0500 time 0.4164 (0.4286) data time 0.0008 (0.0035) model time 0.4156 (0.4267) loss 7.0559 (7.9599) grad_norm 2.0822 (1.8290) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:41:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][380/625] eta 0:01:44 lr 0.001080 wd 0.0500 time 0.4103 (0.4283) data time 0.0008 (0.0034) model time 0.4095 (0.4263) loss 7.5886 (7.9580) grad_norm 1.3407 (1.8282) loss_scale 4096.0000 
(4096.0000) mem 14941MB [2024-07-24 16:42:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][390/625] eta 0:01:41 lr 0.001080 wd 0.0500 time 0.4299 (0.4299) data time 0.0009 (0.0034) model time 0.4290 (0.4282) loss 8.1148 (7.9534) grad_norm 1.8984 (1.8192) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:42:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][400/625] eta 0:01:37 lr 0.001080 wd 0.0500 time 0.6311 (0.4330) data time 0.0008 (0.0033) model time 0.6303 (0.4318) loss 7.4362 (7.9533) grad_norm 2.8222 (1.8162) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:42:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][410/625] eta 0:01:33 lr 0.001080 wd 0.0500 time 0.4250 (0.4326) data time 0.0009 (0.0032) model time 0.4242 (0.4313) loss 7.4312 (7.9502) grad_norm 1.9657 (1.8166) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:42:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][420/625] eta 0:01:28 lr 0.001080 wd 0.0500 time 0.4119 (0.4322) data time 0.0010 (0.0032) model time 0.4109 (0.4309) loss 9.4887 (7.9577) grad_norm 1.4773 (1.8217) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:42:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][430/625] eta 0:01:24 lr 0.001080 wd 0.0500 time 0.4174 (0.4320) data time 0.0010 (0.0031) model time 0.4164 (0.4306) loss 6.5945 (7.9412) grad_norm 2.0931 (1.8225) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:42:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][440/625] eta 0:01:19 lr 0.001080 wd 0.0500 time 0.3930 (0.4321) data time 0.0009 (0.0031) model time 0.3920 (0.4308) loss 6.6235 (7.9392) grad_norm 1.3027 (1.8219) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:42:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][450/625] eta 0:01:15 lr 
0.001080 wd 0.0500 time 0.4100 (0.4317) data time 0.0008 (0.0031) model time 0.4091 (0.4304) loss 8.0233 (7.9464) grad_norm 3.1490 (1.8335) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:42:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][460/625] eta 0:01:11 lr 0.001080 wd 0.0500 time 0.4110 (0.4314) data time 0.0010 (0.0030) model time 0.4100 (0.4300) loss 9.3081 (7.9527) grad_norm 1.5691 (1.8327) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:42:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][470/625] eta 0:01:06 lr 0.001080 wd 0.0500 time 0.4199 (0.4310) data time 0.0011 (0.0030) model time 0.4189 (0.4296) loss 7.4544 (7.9597) grad_norm 1.5429 (1.8305) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:42:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][480/625] eta 0:01:02 lr 0.001080 wd 0.0500 time 0.4130 (0.4307) data time 0.0008 (0.0029) model time 0.4122 (0.4292) loss 9.6895 (7.9578) grad_norm 2.1015 (1.8348) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:42:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][490/625] eta 0:00:58 lr 0.001079 wd 0.0500 time 0.4202 (0.4304) data time 0.0010 (0.0029) model time 0.4192 (0.4289) loss 7.8239 (7.9546) grad_norm 1.2960 (1.8335) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:42:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][500/625] eta 0:00:53 lr 0.001079 wd 0.0500 time 0.4132 (0.4301) data time 0.0010 (0.0029) model time 0.4122 (0.4286) loss 8.4949 (7.9550) grad_norm 1.5467 (1.8329) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:42:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][510/625] eta 0:00:49 lr 0.001079 wd 0.0500 time 0.4104 (0.4298) data time 0.0010 (0.0028) model time 0.4094 (0.4283) loss 8.8431 (7.9544) grad_norm 1.8015 (1.8322) loss_scale 
4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:42:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][520/625] eta 0:00:45 lr 0.001079 wd 0.0500 time 0.4112 (0.4296) data time 0.0008 (0.0028) model time 0.4104 (0.4280) loss 6.8029 (7.9556) grad_norm 2.1476 (1.8335) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:43:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][530/625] eta 0:00:40 lr 0.001079 wd 0.0500 time 0.4238 (0.4294) data time 0.0010 (0.0028) model time 0.4228 (0.4278) loss 9.5273 (7.9611) grad_norm 2.1895 (1.8324) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:43:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][540/625] eta 0:00:36 lr 0.001079 wd 0.0500 time 0.4097 (0.4291) data time 0.0008 (0.0027) model time 0.4089 (0.4275) loss 6.6781 (7.9592) grad_norm 3.2492 (1.8351) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:43:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][550/625] eta 0:00:32 lr 0.001079 wd 0.0500 time 0.4117 (0.4289) data time 0.0010 (0.0027) model time 0.4107 (0.4272) loss 8.9676 (7.9596) grad_norm 2.4933 (1.8387) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:43:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][560/625] eta 0:00:27 lr 0.001079 wd 0.0500 time 0.4119 (0.4286) data time 0.0010 (0.0027) model time 0.4109 (0.4270) loss 8.4912 (7.9628) grad_norm 2.1847 (1.8348) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:43:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][570/625] eta 0:00:23 lr 0.001079 wd 0.0500 time 0.4225 (0.4283) data time 0.0011 (0.0026) model time 0.4214 (0.4267) loss 7.8672 (7.9636) grad_norm 2.4783 (1.8344) loss_scale 4096.0000 (4096.0000) mem 14941MB [2024-07-24 16:43:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][580/625] eta 
0:00:19 lr 0.001079 wd 0.0500 time 0.4119 (0.4281) data time 0.0010 (0.0026) model time 0.4109 (0.4265) loss 7.0961 (7.9567) grad_norm 1.9616 (1.8315) loss_scale 8192.0000 (4124.1997) mem 14941MB [2024-07-24 16:43:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][590/625] eta 0:00:14 lr 0.001079 wd 0.0500 time 0.4153 (0.4279) data time 0.0008 (0.0026) model time 0.4145 (0.4262) loss 8.1016 (7.9577) grad_norm 1.5390 (1.8265) loss_scale 8192.0000 (4193.0288) mem 14941MB [2024-07-24 16:43:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][600/625] eta 0:00:10 lr 0.001079 wd 0.0500 time 0.4262 (0.4278) data time 0.0021 (0.0026) model time 0.4241 (0.4261) loss 8.6018 (7.9524) grad_norm 1.6546 (1.8247) loss_scale 8192.0000 (4259.5674) mem 14941MB [2024-07-24 16:43:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][610/625] eta 0:00:06 lr 0.001079 wd 0.0500 time 0.4173 (0.4289) data time 0.0005 (0.0025) model time 0.4168 (0.4274) loss 9.9050 (7.9567) grad_norm 1.8342 (1.8207) loss_scale 8192.0000 (4323.9280) mem 14941MB [2024-07-24 16:43:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][620/625] eta 0:00:02 lr 0.001079 wd 0.0500 time 0.4102 (0.4302) data time 0.0007 (0.0025) model time 0.4095 (0.4288) loss 8.7016 (7.9647) grad_norm 2.4109 (1.8188) loss_scale 8192.0000 (4386.2158) mem 14941MB [2024-07-24 16:43:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 77 training takes 0:04:28 [2024-07-24 16:43:41 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 16:43:42 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 16:43:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.676 (0.676) Loss 0.6616 (0.6616) Acc@1 87.256 (87.256) Acc@5 97.852 (97.852) Mem 14941MB [2024-07-24 16:43:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.143) Loss 1.0957 (0.8182) Acc@1 75.488 (83.043) Acc@5 94.385 (96.680) Mem 14941MB [2024-07-24 16:43:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.116) Loss 1.2305 (0.9755) Acc@1 71.729 (79.176) Acc@5 92.334 (94.757) Mem 14941MB [2024-07-24 16:43:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.865 Acc@5 94.736 [2024-07-24 16:43:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.9% [2024-07-24 16:43:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.86% [2024-07-24 16:43:45 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-24 16:43:46 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-24 16:43:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.697 (0.697) Loss 0.6587 (0.6587) Acc@1 86.914 (86.914) Acc@5 97.803 (97.803) Mem 14941MB [2024-07-24 16:43:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.087 (0.145) Loss 1.0488 (0.8046) Acc@1 76.709 (83.540) Acc@5 94.434 (96.791) Mem 14941MB [2024-07-24 16:43:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.117) Loss 1.2148 (0.9551) Acc@1 71.533 (79.636) Acc@5 92.822 (95.057) Mem 14941MB [2024-07-24 16:43:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.311 Acc@5 95.052 [2024-07-24 16:43:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.3% [2024-07-24 16:43:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.31% [2024-07-24 16:43:48 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 16:43:49 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-24 16:43:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][0/625] eta 0:11:13 lr 0.001079 wd 0.0500 time 1.0779 (1.0779) data time 0.6853 (0.6853) model time 0.0000 (0.0000) loss 7.7792 (7.7792) grad_norm 1.7175 (1.7175) loss_scale 8192.0000 (8192.0000) mem 14940MB [2024-07-24 16:43:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][10/625] eta 0:04:51 lr 0.001079 wd 0.0500 time 0.4053 (0.4747) data time 0.0007 (0.0632) model time 0.0000 (0.0000) loss 8.4042 (8.0659) grad_norm 1.4336 (2.1191) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:43:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][20/625] eta 0:04:31 lr 0.001078 wd 0.0500 time 0.4112 (0.4480) data time 0.0011 (0.0336) model time 0.0000 (0.0000) loss 8.2124 (8.2057) grad_norm 1.3101 (1.8868) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:44:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][30/625] eta 0:04:20 lr 0.001078 wd 0.0500 time 0.4117 (0.4373) data time 0.0009 (0.0231) model time 0.0000 (0.0000) loss 8.7043 (8.0333) grad_norm 1.6069 (1.8739) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:44:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][40/625] eta 0:04:12 lr 0.001078 wd 0.0500 time 0.4147 (0.4314) data time 0.0007 (0.0177) model time 0.0000 (0.0000) loss 8.9059 (8.1395) grad_norm 1.8979 (1.8420) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:44:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][50/625] eta 0:04:06 lr 0.001078 wd 0.0500 time 0.4168 (0.4284) data time 0.0008 (0.0144) model time 0.0000 (0.0000) loss 7.2122 (8.1463) grad_norm 1.7606 (1.9298) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:44:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][60/625] eta 0:04:01 lr 0.001078 wd 0.0500 time 0.4202 
(0.4268) data time 0.0010 (0.0122) model time 0.4192 (0.4175) loss 8.2877 (8.1125) grad_norm 1.5913 (1.9570) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:44:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][70/625] eta 0:03:56 lr 0.001078 wd 0.0500 time 0.4184 (0.4257) data time 0.0007 (0.0107) model time 0.4176 (0.4179) loss 8.6843 (8.1218) grad_norm 1.7908 (1.9838) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:44:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][80/625] eta 0:03:51 lr 0.001078 wd 0.0500 time 0.4093 (0.4245) data time 0.0012 (0.0095) model time 0.4081 (0.4168) loss 7.9188 (8.0829) grad_norm 2.0736 (1.9560) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:44:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][90/625] eta 0:03:46 lr 0.001078 wd 0.0500 time 0.4123 (0.4235) data time 0.0008 (0.0085) model time 0.4115 (0.4162) loss 9.1031 (8.0765) grad_norm 1.6247 (1.9071) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:44:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][100/625] eta 0:03:41 lr 0.001078 wd 0.0500 time 0.4189 (0.4228) data time 0.0010 (0.0078) model time 0.4179 (0.4161) loss 8.3009 (8.0639) grad_norm 1.3405 (1.8641) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:44:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][110/625] eta 0:03:37 lr 0.001078 wd 0.0500 time 0.4100 (0.4222) data time 0.0012 (0.0072) model time 0.4088 (0.4158) loss 7.9487 (8.0865) grad_norm 1.5394 (1.8362) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:44:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][120/625] eta 0:03:32 lr 0.001078 wd 0.0500 time 0.4118 (0.4216) data time 0.0010 (0.0067) model time 0.4109 (0.4155) loss 8.1720 (8.0964) grad_norm 1.4813 (1.8201) loss_scale 8192.0000 (8192.0000) mem 14941MB 
[2024-07-24 16:44:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][130/625] eta 0:03:28 lr 0.001078 wd 0.0500 time 0.4143 (0.4210) data time 0.0008 (0.0063) model time 0.4135 (0.4153) loss 7.4425 (8.0456) grad_norm 1.6673 (1.8431) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:44:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][140/625] eta 0:03:23 lr 0.001078 wd 0.0500 time 0.4215 (0.4205) data time 0.0009 (0.0059) model time 0.4206 (0.4151) loss 6.9643 (7.9973) grad_norm 1.4127 (1.8594) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:44:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][150/625] eta 0:03:19 lr 0.001078 wd 0.0500 time 0.4180 (0.4201) data time 0.0010 (0.0056) model time 0.4169 (0.4149) loss 8.2659 (7.9874) grad_norm 2.3496 (1.8800) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:44:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][160/625] eta 0:03:15 lr 0.001078 wd 0.0500 time 0.4128 (0.4198) data time 0.0008 (0.0053) model time 0.4120 (0.4149) loss 8.1456 (8.0203) grad_norm 1.2118 (1.8976) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:45:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][170/625] eta 0:03:11 lr 0.001078 wd 0.0500 time 0.6272 (0.4208) data time 0.0009 (0.0050) model time 0.6263 (0.4165) loss 6.6075 (8.0068) grad_norm 1.8317 (1.8879) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:45:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][180/625] eta 0:03:07 lr 0.001077 wd 0.0500 time 0.4211 (0.4204) data time 0.0010 (0.0048) model time 0.4201 (0.4162) loss 9.1453 (8.0190) grad_norm 1.7836 (1.8782) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:45:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][190/625] eta 0:03:02 lr 0.001077 wd 0.0500 time 
0.4105 (0.4200) data time 0.0007 (0.0046) model time 0.4098 (0.4160) loss 10.2699 (8.0120) grad_norm 2.2051 (1.8747) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:45:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][200/625] eta 0:02:59 lr 0.001077 wd 0.0500 time 0.4196 (0.4217) data time 0.0008 (0.0044) model time 0.4188 (0.4184) loss 7.7506 (8.0215) grad_norm 1.7951 (1.8837) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:45:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][210/625] eta 0:02:56 lr 0.001077 wd 0.0500 time 0.6314 (0.4254) data time 0.0009 (0.0043) model time 0.6306 (0.4234) loss 8.3034 (8.0187) grad_norm 2.3225 (1.8813) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:45:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][220/625] eta 0:02:53 lr 0.001077 wd 0.0500 time 0.4122 (0.4286) data time 0.0008 (0.0041) model time 0.4114 (0.4277) loss 8.8641 (8.0114) grad_norm 1.8226 (1.8722) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:45:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][230/625] eta 0:02:49 lr 0.001077 wd 0.0500 time 0.4288 (0.4283) data time 0.0010 (0.0040) model time 0.4279 (0.4273) loss 7.1071 (8.0004) grad_norm 1.3235 (1.8604) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:45:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][240/625] eta 0:02:44 lr 0.001077 wd 0.0500 time 0.4109 (0.4279) data time 0.0008 (0.0039) model time 0.4101 (0.4268) loss 6.3711 (7.9827) grad_norm 2.3437 (1.8525) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:45:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][250/625] eta 0:02:40 lr 0.001077 wd 0.0500 time 0.4326 (0.4275) data time 0.0009 (0.0037) model time 0.4317 (0.4263) loss 8.9443 (7.9900) grad_norm 1.3753 (1.8719) loss_scale 8192.0000 (8192.0000) 
mem 14941MB [2024-07-24 16:45:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][260/625] eta 0:02:35 lr 0.001077 wd 0.0500 time 0.4215 (0.4271) data time 0.0008 (0.0036) model time 0.4207 (0.4258) loss 8.0621 (7.9907) grad_norm 1.9486 (1.8839) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:45:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][270/625] eta 0:02:31 lr 0.001077 wd 0.0500 time 0.4092 (0.4267) data time 0.0010 (0.0035) model time 0.4083 (0.4253) loss 7.8201 (7.9718) grad_norm 1.9456 (1.8943) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:45:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][280/625] eta 0:02:27 lr 0.001077 wd 0.0500 time 0.4126 (0.4264) data time 0.0010 (0.0035) model time 0.4116 (0.4250) loss 8.1288 (7.9772) grad_norm 1.8156 (1.8974) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:45:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][290/625] eta 0:02:22 lr 0.001077 wd 0.0500 time 0.4120 (0.4260) data time 0.0010 (0.0034) model time 0.4110 (0.4245) loss 9.2253 (7.9847) grad_norm 2.0457 (1.9034) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:45:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][300/625] eta 0:02:18 lr 0.001077 wd 0.0500 time 0.4287 (0.4257) data time 0.0010 (0.0033) model time 0.4277 (0.4241) loss 9.4036 (8.0051) grad_norm 1.4196 (1.9034) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:46:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][310/625] eta 0:02:13 lr 0.001077 wd 0.0500 time 0.4123 (0.4254) data time 0.0007 (0.0032) model time 0.4115 (0.4238) loss 7.2662 (8.0032) grad_norm 1.4134 (1.8946) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:46:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][320/625] eta 0:02:09 lr 0.001077 wd 
0.0500 time 0.4202 (0.4250) data time 0.0010 (0.0032) model time 0.4192 (0.4234) loss 8.2017 (7.9972) grad_norm 1.4449 (1.8851) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:46:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][330/625] eta 0:02:05 lr 0.001076 wd 0.0500 time 0.4203 (0.4248) data time 0.0007 (0.0031) model time 0.4196 (0.4231) loss 7.5934 (7.9969) grad_norm 1.5731 (1.8771) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:46:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][340/625] eta 0:02:01 lr 0.001076 wd 0.0500 time 0.4218 (0.4246) data time 0.0010 (0.0030) model time 0.4208 (0.4229) loss 7.9364 (7.9992) grad_norm 1.6085 (1.8707) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:46:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][350/625] eta 0:01:56 lr 0.001076 wd 0.0500 time 0.4312 (0.4244) data time 0.0011 (0.0030) model time 0.4301 (0.4227) loss 7.9690 (7.9842) grad_norm 1.7547 (1.8682) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:46:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][360/625] eta 0:01:52 lr 0.001076 wd 0.0500 time 0.4231 (0.4242) data time 0.0011 (0.0029) model time 0.4220 (0.4225) loss 8.9162 (7.9810) grad_norm 2.0679 (1.8633) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:46:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][370/625] eta 0:01:48 lr 0.001076 wd 0.0500 time 0.4165 (0.4241) data time 0.0010 (0.0029) model time 0.4155 (0.4224) loss 8.1528 (7.9854) grad_norm 1.4132 (1.8635) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:46:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][380/625] eta 0:01:43 lr 0.001076 wd 0.0500 time 0.4233 (0.4240) data time 0.0010 (0.0028) model time 0.4223 (0.4223) loss 8.9968 (7.9836) grad_norm 2.0666 (1.8639) loss_scale 8192.0000 
(8192.0000) mem 14941MB [2024-07-24 16:46:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][390/625] eta 0:01:39 lr 0.001076 wd 0.0500 time 0.5895 (0.4243) data time 0.0008 (0.0028) model time 0.5887 (0.4226) loss 7.3692 (7.9792) grad_norm 2.0783 (1.8602) loss_scale 8192.0000 (8192.0000) mem 14941MB [2024-07-24 16:46:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-24 16:46:37 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 16:46:37 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 16:48:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-24 16:48:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-24 16:48:35 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-24 16:48:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-24 16:48:59 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... 
[2024-07-24 16:49:00 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-24 16:49:00 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-24 16:49:00 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 78) [2024-07-24 16:49:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-24 16:49:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][400/625] eta 0:06:02 lr 0.001076 wd 0.0500 time 0.4074 (1.6110) data time 0.0010 (0.1029) model time 0.4063 (1.5081) loss 8.4119 (8.6214) grad_norm 1.6852 (1.7739) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:49:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][410/625] eta 0:03:12 lr 0.001076 wd 0.0500 time 0.3969 (0.8969) data time 0.0008 (0.0429) model time 0.3961 (0.8539) loss 7.2642 (8.4524) grad_norm 1.5852 (1.8108) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:49:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][420/625] eta 0:02:26 lr 0.001076 wd 0.0500 time 0.4083 (0.7125) data time 0.0007 (0.0274) model time 0.4076 (0.6852) loss 9.1165 (8.4015) grad_norm 2.4672 (1.9187) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:49:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][430/625] eta 0:02:02 lr 0.001076 wd 0.0500 time 0.4006 (0.6280) data time 0.0008 (0.0203) model time 0.3997 (0.6077) loss 7.8647 (8.3842) grad_norm 2.4112 (1.9475) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:49:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][440/625] eta 0:01:47 lr 0.001076 wd 0.0500 time 0.4084 (0.5797) data time 0.0006 (0.0162) model time 0.4078 (0.5635) loss 7.9785 (8.3236) grad_norm 1.7459 (2.0313) loss_scale 8192.0000 (8192.0000) mem 
14931MB [2024-07-24 16:49:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][450/625] eta 0:01:37 lr 0.001076 wd 0.0500 time 0.4024 (0.5568) data time 0.0008 (0.0135) model time 0.4016 (0.5434) loss 8.5830 (8.3062) grad_norm 1.5317 (1.9825) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:49:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][460/625] eta 0:01:28 lr 0.001076 wd 0.0500 time 0.4021 (0.5337) data time 0.0009 (0.0116) model time 0.4012 (0.5221) loss 8.0027 (8.2360) grad_norm 1.6022 (1.9256) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:49:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][470/625] eta 0:01:20 lr 0.001076 wd 0.0500 time 0.3951 (0.5163) data time 0.0009 (0.0102) model time 0.3943 (0.5061) loss 8.3900 (8.1757) grad_norm 1.7239 (1.8870) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:49:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][480/625] eta 0:01:12 lr 0.001075 wd 0.0500 time 0.3984 (0.5031) data time 0.0008 (0.0091) model time 0.3975 (0.4940) loss 7.5378 (8.1346) grad_norm 1.4724 (1.8528) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:49:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][490/625] eta 0:01:06 lr 0.001075 wd 0.0500 time 0.4180 (0.4928) data time 0.0008 (0.0083) model time 0.4172 (0.4845) loss 8.6524 (8.1525) grad_norm 1.4857 (1.8286) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:49:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][500/625] eta 0:01:00 lr 0.001075 wd 0.0500 time 0.4141 (0.4842) data time 0.0006 (0.0076) model time 0.4135 (0.4766) loss 7.8155 (8.1776) grad_norm 2.0170 (1.8168) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:50:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][510/625] eta 0:00:54 lr 0.001075 wd 0.0500 
time 0.3995 (0.4772) data time 0.0009 (0.0070) model time 0.3986 (0.4702) loss 8.7656 (8.1700) grad_norm 1.5137 (1.8007) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:50:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][520/625] eta 0:00:49 lr 0.001075 wd 0.0500 time 0.3996 (0.4714) data time 0.0009 (0.0065) model time 0.3988 (0.4648) loss 8.2911 (8.1449) grad_norm 2.2937 (1.8356) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:50:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][530/625] eta 0:00:44 lr 0.001075 wd 0.0500 time 0.3888 (0.4665) data time 0.0006 (0.0061) model time 0.3882 (0.4604) loss 6.8359 (8.1011) grad_norm 1.2541 (1.8243) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:50:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][540/625] eta 0:00:39 lr 0.001075 wd 0.0500 time 0.3954 (0.4624) data time 0.0008 (0.0058) model time 0.3946 (0.4566) loss 8.0755 (8.0924) grad_norm 1.3359 (1.8076) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:50:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][550/625] eta 0:00:34 lr 0.001075 wd 0.0500 time 0.4010 (0.4585) data time 0.0008 (0.0054) model time 0.4001 (0.4531) loss 6.9290 (8.0790) grad_norm 2.8748 (1.8401) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:50:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][560/625] eta 0:00:29 lr 0.001075 wd 0.0500 time 0.4014 (0.4553) data time 0.0008 (0.0052) model time 0.4006 (0.4501) loss 7.5938 (8.0797) grad_norm 2.4335 (1.8380) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:50:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][570/625] eta 0:00:24 lr 0.001075 wd 0.0500 time 0.4048 (0.4524) data time 0.0008 (0.0049) model time 0.4041 (0.4475) loss 7.9823 (8.0596) grad_norm 2.1115 (1.8362) loss_scale 8192.0000 
(8192.0000) mem 14931MB [2024-07-24 16:50:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][580/625] eta 0:00:20 lr 0.001075 wd 0.0500 time 0.4119 (0.4500) data time 0.0006 (0.0047) model time 0.4113 (0.4453) loss 7.5791 (8.0337) grad_norm 1.2385 (1.8333) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:50:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][590/625] eta 0:00:15 lr 0.001075 wd 0.0500 time 0.3889 (0.4478) data time 0.0007 (0.0045) model time 0.3882 (0.4432) loss 8.7439 (8.0238) grad_norm 2.7851 (1.8409) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:50:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][600/625] eta 0:00:11 lr 0.001075 wd 0.0500 time 0.4054 (0.4457) data time 0.0006 (0.0043) model time 0.4048 (0.4413) loss 7.7842 (8.0064) grad_norm 1.8823 (1.8577) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:50:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][610/625] eta 0:00:06 lr 0.001075 wd 0.0500 time 2.6774 (0.4543) data time 0.0004 (0.0042) model time 2.6770 (0.4501) loss 8.3668 (7.9989) grad_norm 1.4321 (1.8606) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:50:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][620/625] eta 0:00:02 lr 0.001075 wd 0.0500 time 0.4008 (0.4520) data time 0.0006 (0.0040) model time 0.4002 (0.4479) loss 8.0111 (7.9959) grad_norm 1.6594 (1.8691) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 16:50:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 78 training takes 0:01:44 [2024-07-24 16:50:49 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-24 16:50:53 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 16:50:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.438 (0.438) Loss 0.6826 (0.6826) Acc@1 86.182 (86.182) Acc@5 97.900 (97.900) Mem 14931MB [2024-07-24 16:50:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.118) Loss 1.1074 (0.8106) Acc@1 75.684 (82.906) Acc@5 93.359 (96.737) Mem 14931MB [2024-07-24 16:50:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.103) Loss 1.2363 (0.9709) Acc@1 70.947 (79.099) Acc@5 91.992 (94.831) Mem 14931MB [2024-07-24 16:50:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.841 Acc@5 94.846 [2024-07-24 16:50:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.8% [2024-07-24 16:50:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.966 (0.966) Loss 0.6562 (0.6562) Acc@1 87.012 (87.012) Acc@5 97.754 (97.754) Mem 14931MB [2024-07-24 16:50:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.167) Loss 1.0459 (0.8014) Acc@1 77.100 (83.625) Acc@5 94.434 (96.791) Mem 14931MB [2024-07-24 16:50:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.128) Loss 1.2080 (0.9512) Acc@1 71.484 (79.725) Acc@5 92.773 (95.085) Mem 14931MB [2024-07-24 16:51:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.411 Acc@5 95.070 [2024-07-24 16:51:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.4% [2024-07-24 16:51:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.41% [2024-07-24 16:51:00 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 16:51:05 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 16:51:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][0/625] eta 0:10:27 lr 0.001075 wd 0.0500 time 1.0040 (1.0040) data time 0.3960 (0.3960) model time 0.0000 (0.0000) loss 8.0429 (8.0429) grad_norm 2.1872 (2.1872) loss_scale 8192.0000 (8192.0000) mem 14938MB [2024-07-24 16:51:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][10/625] eta 0:04:43 lr 0.001074 wd 0.0500 time 0.4099 (0.4604) data time 0.0006 (0.0367) model time 0.0000 (0.0000) loss 7.8717 (7.9025) grad_norm 2.4212 (2.3014) loss_scale 8192.0000 (8192.0000) mem 14939MB [2024-07-24 16:51:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][20/625] eta 0:04:23 lr 0.001074 wd 0.0500 time 0.3993 (0.4353) data time 0.0007 (0.0202) model time 0.0000 (0.0000) loss 8.1461 (7.7768) grad_norm 1.8003 (2.2548) loss_scale 8192.0000 (8192.0000) mem 14939MB [2024-07-24 16:51:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][30/625] eta 0:04:12 lr 0.001074 wd 0.0500 time 0.4042 (0.4248) data time 0.0009 (0.0141) model time 0.0000 (0.0000) loss 7.9323 (7.6325) grad_norm 1.4925 (2.0363) loss_scale 8192.0000 (8192.0000) mem 14939MB [2024-07-24 16:51:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][40/625] eta 0:04:05 lr 0.001074 wd 0.0500 time 0.4111 (0.4200) data time 0.0008 (0.0108) model time 0.0000 (0.0000) loss 7.7053 (7.6638) grad_norm 1.7673 (1.9565) loss_scale 8192.0000 (8192.0000) mem 14939MB [2024-07-24 16:51:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][50/625] eta 0:04:04 lr 0.001074 wd 0.0500 time 0.4245 (0.4250) data time 0.0009 (0.0089) model 
time 0.0000 (0.0000) loss 7.0504 (7.7292) grad_norm 1.7537 (1.9938) loss_scale 8192.0000 (8192.0000) mem 14939MB [2024-07-24 16:51:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][60/625] eta 0:03:58 lr 0.001074 wd 0.0500 time 0.4034 (0.4216) data time 0.0008 (0.0076) model time 0.4026 (0.4037) loss 7.3157 (7.7384) grad_norm 1.5390 (1.9401) loss_scale 8192.0000 (8192.0000) mem 14939MB [2024-07-24 16:51:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][70/625] eta 0:03:52 lr 0.001074 wd 0.0500 time 0.4033 (0.4189) data time 0.0008 (0.0066) model time 0.4025 (0.4025) loss 7.6867 (7.6995) grad_norm 2.5764 (1.9931) loss_scale 8192.0000 (8192.0000) mem 14939MB [2024-07-24 16:51:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][80/625] eta 0:03:47 lr 0.001074 wd 0.0500 time 0.4035 (0.4169) data time 0.0008 (0.0059) model time 0.4027 (0.4022) loss 8.1401 (7.7271) grad_norm 1.2902 (inf) loss_scale 4096.0000 (7787.4568) mem 14939MB [2024-07-24 16:51:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][90/625] eta 0:03:42 lr 0.001074 wd 0.0500 time 0.4022 (0.4156) data time 0.0009 (0.0054) model time 0.4013 (0.4027) loss 8.4511 (7.8366) grad_norm 1.7660 (inf) loss_scale 4096.0000 (7381.8022) mem 14939MB [2024-07-24 16:51:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][100/625] eta 0:03:37 lr 0.001074 wd 0.0500 time 0.4051 (0.4145) data time 0.0009 (0.0049) model time 0.4042 (0.4030) loss 7.9217 (7.8254) grad_norm 1.5545 (inf) loss_scale 4096.0000 (7056.4752) mem 14939MB [2024-07-24 16:51:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][110/625] eta 0:03:32 lr 0.001074 wd 0.0500 time 0.3964 (0.4134) data time 0.0007 (0.0046) model time 0.3956 (0.4028) loss 8.3621 (7.8671) grad_norm 1.4944 (inf) loss_scale 4096.0000 (6789.7658) mem 14939MB [2024-07-24 16:51:55 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [79/300][120/625] eta 0:03:28 lr 0.001074 wd 0.0500 time 0.4045 (0.4126) data time 0.0008 (0.0043) model time 0.4037 (0.4025) loss 9.4626 (7.8812) grad_norm 1.3931 (inf) loss_scale 4096.0000 (6567.1405) mem 14939MB [2024-07-24 16:51:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][130/625] eta 0:03:24 lr 0.001074 wd 0.0500 time 0.4052 (0.4122) data time 0.0007 (0.0041) model time 0.4046 (0.4031) loss 9.1458 (7.8819) grad_norm 1.5487 (inf) loss_scale 4096.0000 (6378.5038) mem 14939MB [2024-07-24 16:52:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][140/625] eta 0:03:19 lr 0.001074 wd 0.0500 time 0.4045 (0.4118) data time 0.0009 (0.0038) model time 0.4036 (0.4033) loss 8.3612 (7.8694) grad_norm 1.5958 (inf) loss_scale 4096.0000 (6216.6241) mem 14939MB [2024-07-24 16:52:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][150/625] eta 0:03:15 lr 0.001074 wd 0.0500 time 0.4046 (0.4113) data time 0.0006 (0.0036) model time 0.4040 (0.4035) loss 8.7743 (7.8769) grad_norm 2.1177 (inf) loss_scale 4096.0000 (6076.1854) mem 14939MB [2024-07-24 16:52:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][160/625] eta 0:03:11 lr 0.001073 wd 0.0500 time 0.4032 (0.4109) data time 0.0006 (0.0035) model time 0.4026 (0.4034) loss 7.2788 (7.8638) grad_norm 1.6822 (inf) loss_scale 4096.0000 (5953.1925) mem 14939MB [2024-07-24 16:52:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][170/625] eta 0:03:06 lr 0.001073 wd 0.0500 time 0.4035 (0.4106) data time 0.0007 (0.0033) model time 0.4029 (0.4035) loss 8.0642 (7.8945) grad_norm 1.1800 (inf) loss_scale 4096.0000 (5844.5848) mem 14939MB [2024-07-24 16:52:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][180/625] eta 0:03:02 lr 0.001073 wd 0.0500 time 0.4098 (0.4103) data time 0.0008 (0.0032) model time 0.4089 (0.4036) loss 
8.3684 (7.9026) grad_norm 2.3374 (inf) loss_scale 4096.0000 (5747.9779) mem 14939MB [2024-07-24 16:52:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][190/625] eta 0:02:58 lr 0.001073 wd 0.0500 time 0.3925 (0.4100) data time 0.0009 (0.0031) model time 0.3916 (0.4037) loss 8.3180 (7.9081) grad_norm 1.6125 (inf) loss_scale 4096.0000 (5661.4869) mem 14939MB [2024-07-24 16:52:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][200/625] eta 0:02:54 lr 0.001073 wd 0.0500 time 0.4001 (0.4096) data time 0.0007 (0.0029) model time 0.3995 (0.4035) loss 6.8236 (7.9228) grad_norm 2.4536 (inf) loss_scale 4096.0000 (5583.6020) mem 14939MB [2024-07-24 16:52:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][210/625] eta 0:02:49 lr 0.001073 wd 0.0500 time 0.4049 (0.4095) data time 0.0007 (0.0028) model time 0.4042 (0.4036) loss 7.2946 (7.9371) grad_norm 1.8880 (inf) loss_scale 4096.0000 (5513.0995) mem 14939MB [2024-07-24 16:52:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][220/625] eta 0:02:46 lr 0.001073 wd 0.0500 time 0.4031 (0.4099) data time 0.0009 (0.0028) model time 0.4022 (0.4045) loss 6.7200 (7.9199) grad_norm 3.0641 (inf) loss_scale 4096.0000 (5448.9774) mem 14939MB [2024-07-24 16:52:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][230/625] eta 0:02:41 lr 0.001073 wd 0.0500 time 0.4034 (0.4095) data time 0.0007 (0.0027) model time 0.4028 (0.4043) loss 6.5418 (7.8933) grad_norm 2.1680 (inf) loss_scale 4096.0000 (5390.4069) mem 14939MB [2024-07-24 16:52:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][240/625] eta 0:02:37 lr 0.001073 wd 0.0500 time 0.3995 (0.4092) data time 0.0008 (0.0026) model time 0.3987 (0.4040) loss 7.3937 (7.8768) grad_norm 1.3829 (inf) loss_scale 4096.0000 (5336.6971) mem 14939MB [2024-07-24 16:52:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO 
Train: [79/300][250/625] eta 0:02:33 lr 0.001073 wd 0.0500 time 0.4060 (0.4089) data time 0.0007 (0.0025) model time 0.4053 (0.4040) loss 9.2105 (7.8805) grad_norm 1.5630 (inf) loss_scale 4096.0000 (5287.2669) mem 14939MB [2024-07-24 16:52:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][260/625] eta 0:02:29 lr 0.001073 wd 0.0500 time 0.4031 (0.4088) data time 0.0008 (0.0025) model time 0.4023 (0.4039) loss 9.6022 (7.8993) grad_norm 2.0921 (inf) loss_scale 4096.0000 (5241.6245) mem 14939MB [2024-07-24 16:52:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][270/625] eta 0:02:25 lr 0.001073 wd 0.0500 time 0.4019 (0.4098) data time 0.0009 (0.0024) model time 0.4010 (0.4054) loss 7.8589 (7.8831) grad_norm 1.4323 (inf) loss_scale 4096.0000 (5199.3506) mem 14939MB [2024-07-24 16:53:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][280/625] eta 0:02:21 lr 0.001073 wd 0.0500 time 0.4004 (0.4096) data time 0.0007 (0.0023) model time 0.3997 (0.4053) loss 8.6517 (7.9046) grad_norm 1.8780 (inf) loss_scale 4096.0000 (5160.0854) mem 14939MB [2024-07-24 16:53:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][290/625] eta 0:02:17 lr 0.001073 wd 0.0500 time 0.3997 (0.4093) data time 0.0006 (0.0023) model time 0.3992 (0.4051) loss 7.1333 (7.9058) grad_norm 1.5069 (inf) loss_scale 4096.0000 (5123.5189) mem 14939MB [2024-07-24 16:53:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][300/625] eta 0:02:12 lr 0.001073 wd 0.0500 time 0.4019 (0.4091) data time 0.0008 (0.0022) model time 0.4010 (0.4049) loss 8.3003 (7.8956) grad_norm 1.5249 (inf) loss_scale 4096.0000 (5089.3821) mem 14939MB [2024-07-24 16:53:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][310/625] eta 0:02:08 lr 0.001072 wd 0.0500 time 0.4106 (0.4089) data time 0.0007 (0.0022) model time 0.4099 (0.4049) loss 6.7751 (7.8914) grad_norm 
1.7567 (inf) loss_scale 4096.0000 (5057.4405) mem 14939MB [2024-07-24 16:53:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][320/625] eta 0:02:04 lr 0.001072 wd 0.0500 time 0.4074 (0.4087) data time 0.0006 (0.0022) model time 0.4068 (0.4047) loss 8.9570 (7.8993) grad_norm 1.7528 (inf) loss_scale 4096.0000 (5027.4891) mem 14939MB [2024-07-24 16:53:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][330/625] eta 0:02:00 lr 0.001072 wd 0.0500 time 0.4070 (0.4085) data time 0.0006 (0.0021) model time 0.4063 (0.4046) loss 8.4393 (7.9090) grad_norm 1.9082 (inf) loss_scale 4096.0000 (4999.3474) mem 14939MB [2024-07-24 16:53:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][340/625] eta 0:01:56 lr 0.001072 wd 0.0500 time 0.4006 (0.4084) data time 0.0008 (0.0021) model time 0.3998 (0.4046) loss 7.2963 (7.9149) grad_norm 1.3048 (inf) loss_scale 4096.0000 (4972.8563) mem 14939MB [2024-07-24 16:53:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][350/625] eta 0:01:52 lr 0.001072 wd 0.0500 time 0.4046 (0.4083) data time 0.0007 (0.0020) model time 0.4039 (0.4046) loss 7.6217 (7.9168) grad_norm 1.9722 (inf) loss_scale 4096.0000 (4947.8746) mem 14939MB [2024-07-24 16:53:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][360/625] eta 0:01:48 lr 0.001072 wd 0.0500 time 0.4108 (0.4082) data time 0.0010 (0.0020) model time 0.4098 (0.4046) loss 8.7805 (7.9268) grad_norm 2.5716 (inf) loss_scale 4096.0000 (4924.2770) mem 14939MB [2024-07-24 16:53:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][370/625] eta 0:01:44 lr 0.001072 wd 0.0500 time 0.4046 (0.4081) data time 0.0006 (0.0020) model time 0.4039 (0.4046) loss 7.2414 (7.9208) grad_norm 1.3250 (inf) loss_scale 4096.0000 (4901.9515) mem 14939MB [2024-07-24 16:53:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][380/625] eta 
0:01:39 lr 0.001072 wd 0.0500 time 0.4064 (0.4080) data time 0.0009 (0.0020) model time 0.4055 (0.4045) loss 7.0052 (7.9165) grad_norm 1.6139 (inf) loss_scale 4096.0000 (4880.7979) mem 14939MB [2024-07-24 16:53:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][390/625] eta 0:01:35 lr 0.001072 wd 0.0500 time 0.4108 (0.4080) data time 0.0011 (0.0019) model time 0.4097 (0.4045) loss 8.1970 (7.9273) grad_norm 3.5187 (inf) loss_scale 4096.0000 (4860.7263) mem 14939MB [2024-07-24 16:53:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][400/625] eta 0:01:31 lr 0.001072 wd 0.0500 time 0.3989 (0.4079) data time 0.0008 (0.0019) model time 0.3980 (0.4045) loss 9.1582 (7.9295) grad_norm 1.6613 (inf) loss_scale 4096.0000 (4841.6559) mem 14939MB [2024-07-24 16:53:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][410/625] eta 0:01:27 lr 0.001072 wd 0.0500 time 0.4125 (0.4078) data time 0.0009 (0.0019) model time 0.4116 (0.4045) loss 6.5001 (7.9208) grad_norm 1.7534 (inf) loss_scale 4096.0000 (4823.5134) mem 14939MB [2024-07-24 16:53:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][420/625] eta 0:01:23 lr 0.001072 wd 0.0500 time 0.4062 (0.4077) data time 0.0008 (0.0019) model time 0.4054 (0.4044) loss 6.7110 (7.9231) grad_norm 1.5157 (inf) loss_scale 4096.0000 (4806.2328) mem 14939MB [2024-07-24 16:54:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][430/625] eta 0:01:19 lr 0.001072 wd 0.0500 time 0.4040 (0.4076) data time 0.0009 (0.0018) model time 0.4031 (0.4044) loss 8.3809 (7.9089) grad_norm 1.5980 (inf) loss_scale 4096.0000 (4789.7541) mem 14939MB [2024-07-24 16:54:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][440/625] eta 0:01:15 lr 0.001072 wd 0.0500 time 0.5658 (0.4080) data time 0.0009 (0.0018) model time 0.5649 (0.4049) loss 7.0849 (7.9113) grad_norm 1.8832 (inf) loss_scale 4096.0000 
(4774.0227) mem 14939MB [2024-07-24 16:54:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][450/625] eta 0:01:11 lr 0.001072 wd 0.0500 time 0.4146 (0.4079) data time 0.0011 (0.0018) model time 0.4136 (0.4048) loss 6.9999 (7.9078) grad_norm 1.7982 (inf) loss_scale 4096.0000 (4758.9889) mem 14939MB [2024-07-24 16:54:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][460/625] eta 0:01:07 lr 0.001072 wd 0.0500 time 0.4004 (0.4104) data time 0.0008 (0.0018) model time 0.3996 (0.4077) loss 8.9459 (7.8983) grad_norm 2.0133 (inf) loss_scale 4096.0000 (4744.6074) mem 14939MB [2024-07-24 16:54:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][470/625] eta 0:01:03 lr 0.001071 wd 0.0500 time 0.4023 (0.4102) data time 0.0006 (0.0018) model time 0.4017 (0.4075) loss 6.6412 (7.8926) grad_norm 1.5975 (inf) loss_scale 4096.0000 (4730.8365) mem 14939MB [2024-07-24 16:54:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][480/625] eta 0:00:59 lr 0.001071 wd 0.0500 time 0.4010 (0.4100) data time 0.0007 (0.0017) model time 0.4003 (0.4073) loss 6.2703 (7.8909) grad_norm 2.3873 (inf) loss_scale 4096.0000 (4717.6383) mem 14939MB [2024-07-24 16:54:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][490/625] eta 0:00:55 lr 0.001071 wd 0.0500 time 0.4043 (0.4107) data time 0.0006 (0.0017) model time 0.4037 (0.4082) loss 7.9446 (7.8900) grad_norm 1.4330 (inf) loss_scale 4096.0000 (4704.9776) mem 14939MB [2024-07-24 16:54:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][500/625] eta 0:00:51 lr 0.001071 wd 0.0500 time 0.4010 (0.4106) data time 0.0009 (0.0017) model time 0.4001 (0.4081) loss 8.7569 (7.8898) grad_norm 3.2701 (inf) loss_scale 4096.0000 (4692.8224) mem 14939MB [2024-07-24 16:54:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][510/625] eta 0:00:47 lr 0.001071 wd 0.0500 
time 0.4072 (0.4106) data time 0.0009 (0.0017) model time 0.4063 (0.4080) loss 6.4436 (7.9000) grad_norm 1.4920 (inf) loss_scale 4096.0000 (4681.1429) mem 14939MB [2024-07-24 16:54:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][520/625] eta 0:00:43 lr 0.001071 wd 0.0500 time 0.4062 (0.4104) data time 0.0006 (0.0017) model time 0.4055 (0.4080) loss 8.6762 (7.8987) grad_norm 1.4973 (inf) loss_scale 4096.0000 (4669.9117) mem 14939MB [2024-07-24 16:54:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][530/625] eta 0:00:38 lr 0.001071 wd 0.0500 time 0.4141 (0.4104) data time 0.0006 (0.0017) model time 0.4135 (0.4079) loss 8.8514 (7.9016) grad_norm 2.0265 (inf) loss_scale 4096.0000 (4659.1036) mem 14939MB [2024-07-24 16:54:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][540/625] eta 0:00:34 lr 0.001071 wd 0.0500 time 0.3984 (0.4102) data time 0.0007 (0.0017) model time 0.3977 (0.4078) loss 8.4271 (7.9030) grad_norm 1.4071 (inf) loss_scale 4096.0000 (4648.6950) mem 14939MB [2024-07-24 16:54:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][550/625] eta 0:00:30 lr 0.001071 wd 0.0500 time 0.4006 (0.4101) data time 0.0007 (0.0016) model time 0.3999 (0.4077) loss 7.4726 (7.9030) grad_norm 1.8911 (inf) loss_scale 4096.0000 (4638.6642) mem 14939MB [2024-07-24 16:54:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][560/625] eta 0:00:26 lr 0.001071 wd 0.0500 time 0.3958 (0.4099) data time 0.0008 (0.0016) model time 0.3949 (0.4075) loss 8.7540 (7.9002) grad_norm 2.4663 (inf) loss_scale 4096.0000 (4628.9911) mem 14939MB [2024-07-24 16:54:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][570/625] eta 0:00:22 lr 0.001071 wd 0.0500 time 0.3977 (0.4098) data time 0.0008 (0.0016) model time 0.3969 (0.4074) loss 6.9134 (7.9049) grad_norm 1.3161 (inf) loss_scale 4096.0000 (4619.6567) mem 14939MB 
[2024-07-24 16:55:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][580/625] eta 0:00:18 lr 0.001071 wd 0.0500 time 0.4007 (0.4097) data time 0.0008 (0.0016) model time 0.3999 (0.4073) loss 9.1013 (7.9000) grad_norm 1.9716 (inf) loss_scale 4096.0000 (4610.6437) mem 14939MB [2024-07-24 16:55:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][590/625] eta 0:00:14 lr 0.001071 wd 0.0500 time 0.4019 (0.4096) data time 0.0008 (0.0016) model time 0.4011 (0.4072) loss 6.3744 (7.8971) grad_norm 2.0442 (inf) loss_scale 4096.0000 (4601.9357) mem 14939MB [2024-07-24 16:55:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][600/625] eta 0:00:10 lr 0.001071 wd 0.0500 time 0.4044 (0.4094) data time 0.0006 (0.0016) model time 0.4038 (0.4071) loss 7.5464 (7.8876) grad_norm 2.5037 (inf) loss_scale 4096.0000 (4593.5175) mem 14939MB [2024-07-24 16:55:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][610/625] eta 0:00:06 lr 0.001071 wd 0.0500 time 0.4036 (0.4094) data time 0.0006 (0.0016) model time 0.4030 (0.4071) loss 6.8083 (7.8844) grad_norm 1.2457 (inf) loss_scale 4096.0000 (4585.3748) mem 14939MB [2024-07-24 16:55:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][620/625] eta 0:00:02 lr 0.001070 wd 0.0500 time 0.4023 (0.4093) data time 0.0004 (0.0015) model time 0.4019 (0.4070) loss 6.9204 (7.8874) grad_norm 1.2815 (inf) loss_scale 4096.0000 (4577.4944) mem 14939MB [2024-07-24 16:55:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 79 training takes 0:04:15 [2024-07-24 16:55:20 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 16:55:21 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 16:55:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.667 (0.667) Loss 0.6484 (0.6484) Acc@1 87.354 (87.354) Acc@5 97.803 (97.803) Mem 14939MB [2024-07-24 16:55:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.139) Loss 1.0615 (0.8026) Acc@1 76.611 (83.145) Acc@5 94.092 (96.640) Mem 14939MB [2024-07-24 16:55:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.114) Loss 1.2217 (0.9594) Acc@1 72.656 (79.311) Acc@5 91.992 (94.799) Mem 14939MB [2024-07-24 16:55:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.879 Acc@5 94.808 [2024-07-24 16:55:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.9% [2024-07-24 16:55:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.88% [2024-07-24 16:55:24 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-24 16:55:26 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-24 16:55:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.587 (0.587) Loss 0.6548 (0.6548) Acc@1 86.914 (86.914) Acc@5 97.803 (97.803) Mem 14939MB [2024-07-24 16:55:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.132) Loss 1.0400 (0.7987) Acc@1 77.148 (83.589) Acc@5 94.434 (96.817) Mem 14939MB [2024-07-24 16:55:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.110) Loss 1.2031 (0.9481) Acc@1 71.436 (79.732) Acc@5 92.871 (95.115) Mem 14939MB [2024-07-24 16:55:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.427 Acc@5 95.102 [2024-07-24 16:55:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.4% [2024-07-24 16:55:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.43% [2024-07-24 16:55:29 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 16:55:29 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-24 16:55:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][0/625] eta 0:24:17 lr 0.001070 wd 0.0500 time 2.3321 (2.3321) data time 1.9596 (1.9596) model time 0.0000 (0.0000) loss 7.9388 (7.9388) grad_norm 1.5551 (1.5551) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:55:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][10/625] eta 0:05:55 lr 0.001070 wd 0.0500 time 0.4025 (0.5788) data time 0.0007 (0.1790) model time 0.0000 (0.0000) loss 6.8275 (7.4983) grad_norm 1.3363 (1.5949) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:55:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][20/625] eta 0:05:00 lr 0.001070 wd 0.0500 time 0.4069 (0.4962) data time 0.0006 (0.0941) model time 0.0000 (0.0000) loss 6.2508 (7.6502) grad_norm 1.9439 (1.6718) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:55:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][30/625] eta 0:04:37 lr 0.001070 wd 0.0500 time 0.4056 (0.4662) data time 0.0008 (0.0640) model time 0.0000 (0.0000) loss 7.5950 (7.7382) grad_norm 2.8579 (1.8092) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:55:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][40/625] eta 0:04:23 lr 0.001070 wd 0.0500 time 0.4033 (0.4511) data time 0.0007 (0.0486) model time 0.0000 (0.0000) loss 6.7699 (7.7511) grad_norm 1.3387 (1.8292) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:55:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][50/625] eta 0:04:13 lr 0.001070 wd 0.0500 time 0.3987 (0.4415) data time 0.0006 (0.0392) model time 0.0000 (0.0000) loss 6.3084 (7.7505) grad_norm 1.2668 (1.7845) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:55:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][60/625] eta 0:04:05 lr 0.001070 wd 0.0500 time 0.4039 
(0.4352) data time 0.0008 (0.0329) model time 0.4031 (0.4023) loss 7.4959 (7.7660) grad_norm 1.7438 (1.7734) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:56:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][70/625] eta 0:03:58 lr 0.001070 wd 0.0500 time 0.4009 (0.4305) data time 0.0009 (0.0284) model time 0.4000 (0.4017) loss 7.9226 (7.8087) grad_norm 1.5039 (1.7499) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:56:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][80/625] eta 0:03:53 lr 0.001070 wd 0.0500 time 0.4113 (0.4290) data time 0.0008 (0.0250) model time 0.4105 (0.4070) loss 8.0705 (7.8107) grad_norm 2.3494 (1.7485) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:56:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][90/625] eta 0:03:51 lr 0.001070 wd 0.0500 time 0.4053 (0.4322) data time 0.0006 (0.0224) model time 0.4046 (0.4194) loss 6.6171 (7.7951) grad_norm 1.4981 (1.7272) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:56:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][100/625] eta 0:03:45 lr 0.001070 wd 0.0500 time 0.4061 (0.4294) data time 0.0008 (0.0202) model time 0.4053 (0.4163) loss 7.7384 (7.7567) grad_norm 2.1492 (1.7599) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:56:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][110/625] eta 0:03:39 lr 0.001070 wd 0.0500 time 0.4001 (0.4271) data time 0.0009 (0.0185) model time 0.3993 (0.4139) loss 8.0829 (7.7708) grad_norm 1.6500 (1.7545) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:56:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][120/625] eta 0:03:34 lr 0.001070 wd 0.0500 time 0.4081 (0.4252) data time 0.0008 (0.0170) model time 0.4073 (0.4124) loss 6.5905 (7.8022) grad_norm 1.3884 (1.7637) loss_scale 4096.0000 (4096.0000) mem 14939MB 
[2024-07-24 16:56:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][130/625] eta 0:03:29 lr 0.001070 wd 0.0500 time 0.4018 (0.4234) data time 0.0006 (0.0158) model time 0.4011 (0.4111) loss 9.2442 (7.8006) grad_norm 1.7608 (1.7707) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:56:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][140/625] eta 0:03:24 lr 0.001069 wd 0.0500 time 0.4083 (0.4221) data time 0.0007 (0.0148) model time 0.4076 (0.4102) loss 6.3827 (7.8051) grad_norm 1.6644 (1.7592) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:56:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][150/625] eta 0:03:19 lr 0.001069 wd 0.0500 time 0.4113 (0.4209) data time 0.0009 (0.0139) model time 0.4104 (0.4094) loss 9.1719 (7.8124) grad_norm 2.3507 (1.7481) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:56:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][160/625] eta 0:03:15 lr 0.001069 wd 0.0500 time 0.4018 (0.4207) data time 0.0009 (0.0131) model time 0.4010 (0.4101) loss 6.3521 (7.8444) grad_norm 1.4674 (1.7375) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:56:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][170/625] eta 0:03:10 lr 0.001069 wd 0.0500 time 0.4044 (0.4197) data time 0.0008 (0.0124) model time 0.4035 (0.4096) loss 8.1679 (7.8162) grad_norm 2.1573 (1.7693) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:56:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][180/625] eta 0:03:06 lr 0.001069 wd 0.0500 time 0.4041 (0.4189) data time 0.0008 (0.0117) model time 0.4033 (0.4090) loss 7.9900 (7.8110) grad_norm 1.7286 (1.7850) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:56:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][190/625] eta 0:03:01 lr 0.001069 wd 0.0500 time 
0.4033 (0.4181) data time 0.0006 (0.0112) model time 0.4026 (0.4086) loss 7.1257 (7.8214) grad_norm 1.8783 (1.7924) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:56:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][200/625] eta 0:02:57 lr 0.001069 wd 0.0500 time 0.4028 (0.4173) data time 0.0009 (0.0106) model time 0.4019 (0.4082) loss 8.7956 (7.8438) grad_norm 1.6345 (1.8074) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:56:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][210/625] eta 0:02:52 lr 0.001069 wd 0.0500 time 0.4049 (0.4167) data time 0.0008 (0.0102) model time 0.4041 (0.4079) loss 7.3948 (7.8463) grad_norm 2.1106 (1.8373) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:57:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][220/625] eta 0:02:48 lr 0.001069 wd 0.0500 time 0.4257 (0.4163) data time 0.0009 (0.0098) model time 0.4248 (0.4077) loss 9.0021 (7.8331) grad_norm 1.7755 (1.8464) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:57:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][230/625] eta 0:02:44 lr 0.001069 wd 0.0500 time 0.4001 (0.4158) data time 0.0006 (0.0094) model time 0.3995 (0.4075) loss 7.2171 (7.8222) grad_norm 1.6510 (1.8390) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:57:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][240/625] eta 0:02:39 lr 0.001069 wd 0.0500 time 0.4072 (0.4152) data time 0.0006 (0.0091) model time 0.4065 (0.4072) loss 9.7441 (7.8161) grad_norm 2.3269 (1.8387) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:57:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][250/625] eta 0:02:35 lr 0.001069 wd 0.0500 time 0.4014 (0.4147) data time 0.0006 (0.0087) model time 0.4007 (0.4069) loss 8.3364 (7.8150) grad_norm 1.5779 (1.8403) loss_scale 4096.0000 (4096.0000) 
mem 14939MB [2024-07-24 16:57:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][260/625] eta 0:02:31 lr 0.001069 wd 0.0500 time 0.4021 (0.4143) data time 0.0008 (0.0084) model time 0.4013 (0.4067) loss 8.3971 (7.8236) grad_norm 1.5538 (1.8353) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:57:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][270/625] eta 0:02:26 lr 0.001069 wd 0.0500 time 0.4032 (0.4138) data time 0.0008 (0.0082) model time 0.4024 (0.4064) loss 7.9630 (7.8362) grad_norm 1.3666 (1.8253) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 16:57:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-24 16:57:23 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 16:57:23 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 17:09:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-24 17:09:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-24 17:09:41 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-24 17:09:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-24 17:09:53 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... 
[2024-07-24 17:09:53 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-24 17:09:53 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-24 17:09:53 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 80) [2024-07-24 17:09:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-24 17:10:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][280/625] eta 0:09:29 lr 0.001069 wd 0.0500 time 0.3965 (1.6494) data time 0.0006 (0.1113) model time 0.3959 (1.5380) loss 7.6376 (8.4866) grad_norm 1.3882 (1.8658) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:10:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][290/625] eta 0:05:18 lr 0.001068 wd 0.0500 time 0.3911 (0.9522) data time 0.0006 (0.0499) model time 0.3905 (0.9022) loss 9.0333 (8.3165) grad_norm 1.8424 (1.8082) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:10:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][300/625] eta 0:04:04 lr 0.001068 wd 0.0500 time 0.3935 (0.7528) data time 0.0008 (0.0324) model time 0.3927 (0.7204) loss 9.3157 (8.4500) grad_norm 2.3494 (1.8906) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:10:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][310/625] eta 0:03:27 lr 0.001068 wd 0.0500 time 0.3925 (0.6586) data time 0.0008 (0.0241) model time 0.3917 (0.6346) loss 8.2257 (8.3051) grad_norm 2.1933 (1.8385) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:10:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][320/625] eta 0:03:04 lr 0.001068 wd 0.0500 time 0.3927 (0.6039) data time 0.0007 (0.0192) model time 0.3920 (0.5846) loss 7.9105 (8.1631) grad_norm 1.3421 (1.7988) loss_scale 4096.0000 (4096.0000) mem 
14931MB [2024-07-24 17:10:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][330/625] eta 0:02:49 lr 0.001068 wd 0.0500 time 0.3940 (0.5757) data time 0.0006 (0.0160) model time 0.3934 (0.5597) loss 7.0481 (8.1493) grad_norm 2.0385 (1.8984) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:10:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][340/625] eta 0:02:36 lr 0.001068 wd 0.0500 time 0.3949 (0.5496) data time 0.0006 (0.0138) model time 0.3943 (0.5358) loss 6.7833 (8.0714) grad_norm 3.7903 (2.0021) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:10:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][350/625] eta 0:02:25 lr 0.001068 wd 0.0500 time 0.3924 (0.5300) data time 0.0007 (0.0121) model time 0.3918 (0.5179) loss 7.1408 (8.0439) grad_norm 1.7816 (2.0211) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:10:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][360/625] eta 0:02:16 lr 0.001068 wd 0.0500 time 0.4021 (0.5148) data time 0.0007 (0.0109) model time 0.4014 (0.5039) loss 9.0540 (8.0380) grad_norm 2.0063 (2.0166) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:10:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][370/625] eta 0:02:08 lr 0.001068 wd 0.0500 time 0.3963 (0.5026) data time 0.0006 (0.0098) model time 0.3958 (0.4928) loss 9.4335 (8.0746) grad_norm 1.6769 (2.0060) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:10:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][380/625] eta 0:02:00 lr 0.001068 wd 0.0500 time 0.3948 (0.4927) data time 0.0006 (0.0090) model time 0.3941 (0.4838) loss 6.4416 (8.0676) grad_norm 1.4704 (1.9671) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:10:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][390/625] eta 0:01:53 lr 0.001068 wd 0.0500 
time 0.4019 (0.4846) data time 0.0007 (0.0083) model time 0.4012 (0.4763) loss 7.7462 (8.0491) grad_norm 1.5458 (1.9401) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:10:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][400/625] eta 0:01:47 lr 0.001068 wd 0.0500 time 0.3970 (0.4777) data time 0.0006 (0.0077) model time 0.3964 (0.4700) loss 7.3552 (8.0248) grad_norm 1.4790 (1.9207) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:11:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][410/625] eta 0:01:41 lr 0.001068 wd 0.0500 time 0.3964 (0.4718) data time 0.0007 (0.0072) model time 0.3957 (0.4646) loss 7.3685 (8.0016) grad_norm 1.4340 (1.8928) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:11:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][420/625] eta 0:01:35 lr 0.001068 wd 0.0500 time 0.3951 (0.4670) data time 0.0007 (0.0068) model time 0.3943 (0.4602) loss 8.0048 (7.9830) grad_norm 1.6491 (1.9413) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:11:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][430/625] eta 0:01:30 lr 0.001068 wd 0.0500 time 0.3936 (0.4627) data time 0.0005 (0.0064) model time 0.3931 (0.4563) loss 7.0964 (7.9784) grad_norm 3.0130 (1.9799) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:11:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][440/625] eta 0:01:24 lr 0.001067 wd 0.0500 time 0.3990 (0.4589) data time 0.0009 (0.0061) model time 0.3982 (0.4529) loss 9.4165 (7.9929) grad_norm 1.5766 (1.9802) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:11:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][450/625] eta 0:01:19 lr 0.001067 wd 0.0500 time 0.3929 (0.4554) data time 0.0007 (0.0058) model time 0.3923 (0.4496) loss 7.4999 (7.9721) grad_norm 2.3074 (1.9770) loss_scale 4096.0000 
(4096.0000) mem 14931MB [2024-07-24 17:11:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][460/625] eta 0:01:14 lr 0.001067 wd 0.0500 time 0.3902 (0.4526) data time 0.0007 (0.0055) model time 0.3896 (0.4471) loss 7.9295 (7.9709) grad_norm 2.2816 (1.9656) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:11:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][470/625] eta 0:01:09 lr 0.001067 wd 0.0500 time 0.4046 (0.4498) data time 0.0009 (0.0053) model time 0.4037 (0.4445) loss 6.8366 (7.9564) grad_norm 1.3584 (1.9542) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:11:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][480/625] eta 0:01:04 lr 0.001067 wd 0.0500 time 0.4032 (0.4473) data time 0.0007 (0.0050) model time 0.4025 (0.4422) loss 7.9390 (7.9400) grad_norm 1.5358 (1.9486) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:11:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][490/625] eta 0:01:00 lr 0.001067 wd 0.0500 time 0.4145 (0.4451) data time 0.0008 (0.0049) model time 0.4137 (0.4402) loss 7.0495 (7.9246) grad_norm 1.3779 (1.9309) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:11:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][500/625] eta 0:00:55 lr 0.001067 wd 0.0500 time 0.3978 (0.4430) data time 0.0008 (0.0047) model time 0.3970 (0.4383) loss 8.8062 (7.9313) grad_norm 1.4759 (1.9239) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:11:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][510/625] eta 0:00:50 lr 0.001067 wd 0.0500 time 0.3975 (0.4411) data time 0.0006 (0.0045) model time 0.3969 (0.4366) loss 8.6701 (7.9268) grad_norm 1.6009 (1.9304) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:11:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][520/625] eta 0:00:46 lr 
0.001067 wd 0.0500 time 0.3987 (0.4393) data time 0.0008 (0.0044) model time 0.3979 (0.4349) loss 7.1401 (7.9334) grad_norm 1.5596 (1.9310) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:11:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][530/625] eta 0:00:41 lr 0.001067 wd 0.0500 time 0.3975 (0.4376) data time 0.0008 (0.0042) model time 0.3967 (0.4334) loss 8.1444 (7.9223) grad_norm 1.6963 (1.9381) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:11:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][540/625] eta 0:00:37 lr 0.001067 wd 0.0500 time 0.3962 (0.4361) data time 0.0007 (0.0041) model time 0.3955 (0.4320) loss 7.9493 (7.9121) grad_norm 1.8430 (1.9391) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:11:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][550/625] eta 0:00:32 lr 0.001067 wd 0.0500 time 0.3939 (0.4366) data time 0.0008 (0.0040) model time 0.3931 (0.4327) loss 6.2918 (7.9170) grad_norm 1.4795 (1.9327) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:12:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][560/625] eta 0:00:28 lr 0.001067 wd 0.0500 time 0.4227 (0.4355) data time 0.0007 (0.0039) model time 0.4220 (0.4316) loss 9.4067 (7.9237) grad_norm 2.6695 (1.9424) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:12:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][570/625] eta 0:00:23 lr 0.001067 wd 0.0500 time 0.4038 (0.4342) data time 0.0008 (0.0038) model time 0.4030 (0.4304) loss 7.6919 (7.8939) grad_norm 1.1452 (1.9404) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:12:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][580/625] eta 0:00:19 lr 0.001067 wd 0.0500 time 0.3956 (0.4330) data time 0.0006 (0.0037) model time 0.3950 (0.4293) loss 6.4544 (7.8825) grad_norm 1.2019 (1.9319) loss_scale 
4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:12:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][590/625] eta 0:00:15 lr 0.001066 wd 0.0500 time 0.3931 (0.4318) data time 0.0008 (0.0036) model time 0.3923 (0.4283) loss 7.9561 (7.8996) grad_norm 1.5603 (1.9170) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:12:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][600/625] eta 0:00:10 lr 0.001066 wd 0.0500 time 0.3871 (0.4308) data time 0.0006 (0.0035) model time 0.3865 (0.4273) loss 8.6633 (7.9170) grad_norm 2.7422 (1.9106) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:12:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][610/625] eta 0:00:06 lr 0.001066 wd 0.0500 time 0.4062 (0.4299) data time 0.0005 (0.0034) model time 0.4057 (0.4264) loss 9.0274 (7.9111) grad_norm 1.2228 (1.9178) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:12:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][620/625] eta 0:00:02 lr 0.001066 wd 0.0500 time 0.3959 (0.4289) data time 0.0004 (0.0033) model time 0.3955 (0.4256) loss 6.9904 (7.9121) grad_norm 2.6799 (1.9177) loss_scale 4096.0000 (4096.0000) mem 14931MB [2024-07-24 17:12:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 80 training takes 0:02:30 [2024-07-24 17:12:28 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 17:12:30 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 17:12:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.439 (0.439) Loss 0.6519 (0.6519) Acc@1 86.768 (86.768) Acc@5 97.998 (97.998) Mem 14931MB
[2024-07-24 17:12:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.117) Loss 1.0342 (0.7982) Acc@1 75.781 (83.057) Acc@5 94.385 (96.795) Mem 14931MB
[2024-07-24 17:12:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.102) Loss 1.2041 (0.9473) Acc@1 72.314 (79.181) Acc@5 91.943 (94.922) Mem 14931MB
[2024-07-24 17:12:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.907 Acc@5 94.882
[2024-07-24 17:12:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.9%
[2024-07-24 17:12:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.91%
[2024-07-24 17:12:35 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving......
[2024-07-24 17:12:36 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!!
[2024-07-24 17:12:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.851 (0.851) Loss 0.6523 (0.6523) Acc@1 87.109 (87.109) Acc@5 97.852 (97.852) Mem 14931MB
[2024-07-24 17:12:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.155) Loss 1.0371 (0.7963) Acc@1 77.246 (83.638) Acc@5 94.434 (96.857) Mem 14931MB
[2024-07-24 17:12:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 1.2002 (0.9450) Acc@1 71.533 (79.781) Acc@5 92.969 (95.147) Mem 14931MB
[2024-07-24 17:12:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.477 Acc@5 95.136
[2024-07-24 17:12:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.5%
[2024-07-24 17:12:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.48%
[2024-07-24 17:12:39 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-24 17:12:41 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!!
[2024-07-24 17:12:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][0/625] eta 0:09:44 lr 0.001066 wd 0.0500 time 0.9350 (0.9350) data time 0.3857 (0.3857) model time 0.0000 (0.0000) loss 8.8294 (8.8294) grad_norm 1.3509 (1.3509) loss_scale 4096.0000 (4096.0000) mem 14938MB [2024-07-24 17:12:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][10/625] eta 0:04:34 lr 0.001066 wd 0.0500 time 0.3993 (0.4457) data time 0.0007 (0.0359) model time 0.0000 (0.0000) loss 7.6186 (8.0625) grad_norm 1.4822 (1.5712) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:12:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][20/625] eta 0:04:15 lr 0.001066 wd 0.0500 time 0.3997 (0.4226) data time 0.0006 (0.0192) model time 0.0000 (0.0000) loss 7.8179 (7.8003) grad_norm 1.5511 (1.7218) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:12:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][30/625] eta 0:04:06 lr 0.001066 wd 0.0500 time 0.4106 (0.4148) data time 0.0008 (0.0133) model time 0.0000 (0.0000) loss 6.4790 (7.7150) grad_norm 1.5443 (1.6652) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:12:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][40/625] eta 0:04:00 lr 0.001066 wd 0.0500 time 0.3995 (0.4105) data time 0.0008 (0.0102) model time 0.0000 (0.0000) loss 8.9980 (7.7674) grad_norm 2.5534 (1.6826) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:13:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][50/625] eta 0:03:55 lr 0.001066 wd 0.0500 time 0.3970 (0.4101) data time 0.0008 (0.0084) model time 0.0000 (0.0000) loss 7.0202 (7.8446) grad_norm 1.7651 (1.6523) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:13:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][60/625] eta 0:03:50 lr 0.001066 wd 0.0500 time 0.3995 
(0.4081) data time 0.0009 (0.0071) model time 0.3986 (0.3970) loss 8.2137 (7.8577) grad_norm 1.9679 (1.6795) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:13:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][70/625] eta 0:03:45 lr 0.001066 wd 0.0500 time 0.3962 (0.4069) data time 0.0005 (0.0062) model time 0.3956 (0.3981) loss 7.2427 (7.8292) grad_norm 1.9101 (1.7166) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:13:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][80/625] eta 0:03:41 lr 0.001066 wd 0.0500 time 0.3956 (0.4059) data time 0.0006 (0.0056) model time 0.3950 (0.3980) loss 8.8580 (7.9124) grad_norm 2.4483 (1.7346) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:13:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][90/625] eta 0:03:36 lr 0.001066 wd 0.0500 time 0.3959 (0.4049) data time 0.0006 (0.0050) model time 0.3953 (0.3975) loss 7.3710 (7.9246) grad_norm 1.4394 (1.7562) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:13:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][100/625] eta 0:03:32 lr 0.001066 wd 0.0500 time 0.4012 (0.4041) data time 0.0006 (0.0046) model time 0.4006 (0.3973) loss 6.5257 (7.8800) grad_norm 2.6612 (1.7565) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:13:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][110/625] eta 0:03:27 lr 0.001065 wd 0.0500 time 0.4110 (0.4035) data time 0.0006 (0.0043) model time 0.4104 (0.3972) loss 6.8688 (7.8480) grad_norm 1.7201 (1.7560) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:13:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][120/625] eta 0:03:23 lr 0.001065 wd 0.0500 time 0.3967 (0.4029) data time 0.0005 (0.0040) model time 0.3962 (0.3969) loss 8.9008 (7.8207) grad_norm 2.9008 (1.7533) loss_scale 4096.0000 (4096.0000) mem 14939MB 
[2024-07-24 17:13:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][130/625] eta 0:03:19 lr 0.001065 wd 0.0500 time 0.4077 (0.4026) data time 0.0006 (0.0037) model time 0.4071 (0.3971) loss 9.1577 (7.8350) grad_norm 1.5707 (1.7495) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:13:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][140/625] eta 0:03:15 lr 0.001065 wd 0.0500 time 0.3916 (0.4023) data time 0.0009 (0.0035) model time 0.3906 (0.3971) loss 6.8326 (7.8472) grad_norm 1.4810 (1.7433) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:13:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][150/625] eta 0:03:12 lr 0.001065 wd 0.0500 time 0.3983 (0.4054) data time 0.0006 (0.0033) model time 0.3977 (0.4022) loss 7.0307 (7.8544) grad_norm 3.7425 (1.7710) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:13:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][160/625] eta 0:03:08 lr 0.001065 wd 0.0500 time 0.4009 (0.4049) data time 0.0006 (0.0032) model time 0.4002 (0.4017) loss 8.2108 (7.9064) grad_norm 1.6020 (1.7668) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:13:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][170/625] eta 0:03:04 lr 0.001065 wd 0.0500 time 0.3908 (0.4046) data time 0.0009 (0.0030) model time 0.3899 (0.4014) loss 7.9203 (7.8882) grad_norm 1.5608 (1.7605) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:13:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][180/625] eta 0:02:59 lr 0.001065 wd 0.0500 time 0.4006 (0.4042) data time 0.0006 (0.0029) model time 0.4000 (0.4010) loss 8.7282 (7.8896) grad_norm 1.3219 (1.7536) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:13:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][190/625] eta 0:02:55 lr 0.001065 wd 0.0500 time 
0.3992 (0.4038) data time 0.0008 (0.0028) model time 0.3984 (0.4007) loss 8.0243 (7.8820) grad_norm 2.9314 (1.7609) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:14:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][200/625] eta 0:02:51 lr 0.001065 wd 0.0500 time 0.3959 (0.4035) data time 0.0006 (0.0027) model time 0.3954 (0.4004) loss 8.7821 (7.8884) grad_norm 1.7186 (1.7633) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:14:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][210/625] eta 0:02:47 lr 0.001065 wd 0.0500 time 0.3998 (0.4031) data time 0.0008 (0.0026) model time 0.3990 (0.4002) loss 7.5515 (7.8919) grad_norm 1.3803 (1.7821) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:14:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][220/625] eta 0:02:43 lr 0.001065 wd 0.0500 time 0.4093 (0.4030) data time 0.0006 (0.0025) model time 0.4087 (0.4001) loss 7.2682 (7.8886) grad_norm 1.6391 (1.7976) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:14:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][230/625] eta 0:02:39 lr 0.001065 wd 0.0500 time 0.3954 (0.4030) data time 0.0008 (0.0025) model time 0.3946 (0.4002) loss 8.0608 (7.9001) grad_norm 1.8989 (1.7961) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:14:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][240/625] eta 0:02:35 lr 0.001065 wd 0.0500 time 0.3943 (0.4029) data time 0.0007 (0.0024) model time 0.3936 (0.4001) loss 6.7257 (7.9080) grad_norm 2.3740 (1.8013) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:14:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][250/625] eta 0:02:30 lr 0.001065 wd 0.0500 time 0.3942 (0.4026) data time 0.0009 (0.0023) model time 0.3933 (0.4000) loss 7.3114 (7.8976) grad_norm 1.5904 (1.8161) loss_scale 4096.0000 (4096.0000) 
mem 14939MB [2024-07-24 17:14:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][260/625] eta 0:02:26 lr 0.001064 wd 0.0500 time 0.4083 (0.4026) data time 0.0007 (0.0023) model time 0.4075 (0.4001) loss 7.9975 (7.9042) grad_norm 1.9141 (1.8237) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:14:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][270/625] eta 0:02:23 lr 0.001064 wd 0.0500 time 0.3956 (0.4029) data time 0.0006 (0.0022) model time 0.3951 (0.4004) loss 8.2413 (7.9100) grad_norm 1.3090 (1.8235) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:14:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][280/625] eta 0:02:18 lr 0.001064 wd 0.0500 time 0.4012 (0.4027) data time 0.0006 (0.0022) model time 0.4006 (0.4002) loss 9.0392 (7.9120) grad_norm 2.0401 (1.8293) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:14:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][290/625] eta 0:02:14 lr 0.001064 wd 0.0500 time 0.3967 (0.4025) data time 0.0007 (0.0021) model time 0.3960 (0.4001) loss 9.1275 (7.8944) grad_norm 1.3823 (1.8305) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:14:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][300/625] eta 0:02:10 lr 0.001064 wd 0.0500 time 0.3951 (0.4023) data time 0.0006 (0.0021) model time 0.3944 (0.3999) loss 8.1286 (7.8855) grad_norm 2.4399 (1.8346) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:14:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][310/625] eta 0:02:06 lr 0.001064 wd 0.0500 time 0.3973 (0.4022) data time 0.0008 (0.0020) model time 0.3965 (0.3998) loss 8.0183 (7.8645) grad_norm 1.4470 (1.8322) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:14:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][320/625] eta 0:02:02 lr 0.001064 wd 
0.0500 time 0.3941 (0.4020) data time 0.0006 (0.0020) model time 0.3934 (0.3997) loss 8.0861 (7.8722) grad_norm 1.5902 (1.8267) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:14:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][330/625] eta 0:01:58 lr 0.001064 wd 0.0500 time 0.3964 (0.4021) data time 0.0006 (0.0020) model time 0.3958 (0.3998) loss 7.6398 (7.8728) grad_norm 1.3092 (1.8185) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:14:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][340/625] eta 0:01:54 lr 0.001064 wd 0.0500 time 0.4005 (0.4019) data time 0.0007 (0.0019) model time 0.3998 (0.3997) loss 8.3090 (7.8648) grad_norm 1.5932 (1.8122) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:15:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][350/625] eta 0:01:50 lr 0.001064 wd 0.0500 time 0.3955 (0.4018) data time 0.0006 (0.0019) model time 0.3949 (0.3997) loss 9.2445 (7.8605) grad_norm 1.7401 (1.8119) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:15:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][360/625] eta 0:01:46 lr 0.001064 wd 0.0500 time 0.3935 (0.4018) data time 0.0006 (0.0019) model time 0.3929 (0.3997) loss 8.0731 (7.8584) grad_norm 2.3648 (1.8268) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:15:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][370/625] eta 0:01:42 lr 0.001064 wd 0.0500 time 0.3954 (0.4033) data time 0.0006 (0.0018) model time 0.3948 (0.4015) loss 8.4853 (7.8633) grad_norm 2.0648 (1.8257) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:15:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][380/625] eta 0:01:38 lr 0.001064 wd 0.0500 time 0.3970 (0.4034) data time 0.0007 (0.0018) model time 0.3963 (0.4015) loss 8.3202 (7.8638) grad_norm 2.4300 (1.8325) loss_scale 4096.0000 
(4096.0000) mem 14939MB [2024-07-24 17:15:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][390/625] eta 0:01:34 lr 0.001064 wd 0.0500 time 0.3987 (0.4033) data time 0.0009 (0.0018) model time 0.3978 (0.4015) loss 7.6103 (7.8704) grad_norm 1.5903 (1.8315) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:15:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][400/625] eta 0:01:30 lr 0.001064 wd 0.0500 time 0.4022 (0.4032) data time 0.0008 (0.0018) model time 0.4013 (0.4013) loss 6.6675 (7.8716) grad_norm 1.8439 (1.8334) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:15:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][410/625] eta 0:01:26 lr 0.001063 wd 0.0500 time 0.3959 (0.4030) data time 0.0007 (0.0017) model time 0.3952 (0.4012) loss 7.6427 (7.8771) grad_norm 1.9443 (1.8372) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:15:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][420/625] eta 0:01:22 lr 0.001063 wd 0.0500 time 0.3976 (0.4028) data time 0.0006 (0.0017) model time 0.3969 (0.4011) loss 8.7272 (7.8820) grad_norm 2.5478 (1.8469) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:15:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][430/625] eta 0:01:18 lr 0.001063 wd 0.0500 time 0.4051 (0.4028) data time 0.0008 (0.0017) model time 0.4043 (0.4010) loss 9.0462 (7.8838) grad_norm 2.5425 (1.8576) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:15:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][440/625] eta 0:01:14 lr 0.001063 wd 0.0500 time 0.9292 (0.4038) data time 0.5570 (0.0029) model time 0.3722 (0.4008) loss 8.6046 (7.8831) grad_norm 1.4958 (1.8544) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:15:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][450/625] eta 0:01:10 lr 
0.001063 wd 0.0500 time 0.3932 (0.4039) data time 0.0009 (0.0029) model time 0.3924 (0.4009) loss 6.9811 (7.8808) grad_norm 1.7221 (1.8531) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:15:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][460/625] eta 0:01:06 lr 0.001063 wd 0.0500 time 0.3915 (0.4038) data time 0.0007 (0.0029) model time 0.3908 (0.4009) loss 6.9721 (7.8712) grad_norm 1.7256 (1.8528) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:15:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][470/625] eta 0:01:02 lr 0.001063 wd 0.0500 time 0.4011 (0.4061) data time 0.0008 (0.0028) model time 0.4003 (0.4035) loss 6.9402 (7.8721) grad_norm 2.0511 (1.8505) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:15:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][480/625] eta 0:00:58 lr 0.001063 wd 0.0500 time 0.3991 (0.4059) data time 0.0006 (0.0028) model time 0.3985 (0.4033) loss 7.8239 (7.8665) grad_norm 2.6156 (1.8538) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:16:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][490/625] eta 0:00:54 lr 0.001063 wd 0.0500 time 0.4022 (0.4061) data time 0.0006 (0.0027) model time 0.4016 (0.4036) loss 9.4255 (7.8643) grad_norm 1.5875 (1.8546) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:16:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][500/625] eta 0:00:50 lr 0.001063 wd 0.0500 time 0.3936 (0.4059) data time 0.0006 (0.0027) model time 0.3929 (0.4034) loss 6.6703 (7.8622) grad_norm 1.3467 (1.8553) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:16:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][510/625] eta 0:00:46 lr 0.001063 wd 0.0500 time 0.4005 (0.4058) data time 0.0008 (0.0027) model time 0.3998 (0.4033) loss 5.7754 (7.8607) grad_norm 2.2882 (1.8591) loss_scale 
4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:16:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][520/625] eta 0:00:42 lr 0.001063 wd 0.0500 time 0.4007 (0.4056) data time 0.0006 (0.0026) model time 0.4001 (0.4031) loss 8.0453 (7.8595) grad_norm 2.4422 (1.8736) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:16:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][530/625] eta 0:00:38 lr 0.001063 wd 0.0500 time 0.4013 (0.4055) data time 0.0009 (0.0026) model time 0.4004 (0.4030) loss 7.6494 (7.8473) grad_norm 2.2409 (1.8877) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:16:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][540/625] eta 0:00:34 lr 0.001063 wd 0.0500 time 0.3968 (0.4053) data time 0.0006 (0.0026) model time 0.3963 (0.4029) loss 7.9775 (7.8478) grad_norm 2.2108 (1.8883) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:16:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][550/625] eta 0:00:30 lr 0.001062 wd 0.0500 time 0.4063 (0.4052) data time 0.0008 (0.0025) model time 0.4055 (0.4028) loss 8.7044 (7.8500) grad_norm 1.5763 (1.8855) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:16:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][560/625] eta 0:00:26 lr 0.001062 wd 0.0500 time 0.3964 (0.4051) data time 0.0006 (0.0025) model time 0.3958 (0.4027) loss 8.8874 (7.8468) grad_norm 1.4501 (1.8830) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:16:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][570/625] eta 0:00:22 lr 0.001062 wd 0.0500 time 0.3927 (0.4050) data time 0.0008 (0.0025) model time 0.3919 (0.4026) loss 7.9584 (7.8512) grad_norm 1.9734 (1.8779) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:16:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][580/625] eta 
0:00:18 lr 0.001062 wd 0.0500 time 0.4013 (0.4050) data time 0.0006 (0.0024) model time 0.4007 (0.4026) loss 7.3570 (7.8563) grad_norm 1.8969 (1.8804) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:16:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][590/625] eta 0:00:14 lr 0.001062 wd 0.0500 time 0.3959 (0.4055) data time 0.0007 (0.0024) model time 0.3952 (0.4033) loss 6.8440 (7.8507) grad_norm 2.0792 (1.8881) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:16:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][600/625] eta 0:00:10 lr 0.001062 wd 0.0500 time 0.3966 (0.4054) data time 0.0006 (0.0024) model time 0.3960 (0.4032) loss 7.3333 (7.8447) grad_norm 1.9346 (1.8948) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:16:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][610/625] eta 0:00:06 lr 0.001062 wd 0.0500 time 0.3910 (0.4053) data time 0.0006 (0.0024) model time 0.3905 (0.4030) loss 9.2355 (7.8462) grad_norm 1.7735 (1.8923) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:16:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][620/625] eta 0:00:02 lr 0.001062 wd 0.0500 time 0.3950 (0.4051) data time 0.0004 (0.0023) model time 0.3947 (0.4029) loss 8.2038 (7.8535) grad_norm 1.2967 (1.8868) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:16:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 81 training takes 0:04:13 [2024-07-24 17:16:55 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 17:16:55 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 17:16:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.708 (0.708) Loss 0.6523 (0.6523) Acc@1 87.549 (87.549) Acc@5 97.607 (97.607) Mem 14939MB
[2024-07-24 17:16:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.084 (0.145) Loss 1.0361 (0.7851) Acc@1 76.758 (83.407) Acc@5 94.043 (96.768) Mem 14939MB
[2024-07-24 17:16:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.117) Loss 1.2002 (0.9455) Acc@1 71.387 (79.253) Acc@5 92.529 (94.954) Mem 14939MB
[2024-07-24 17:16:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.951 Acc@5 94.924
[2024-07-24 17:16:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.0%
[2024-07-24 17:16:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.95%
[2024-07-24 17:16:58 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving......
[2024-07-24 17:16:59 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!!
[2024-07-24 17:17:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.792 (0.792) Loss 0.6484 (0.6484) Acc@1 87.256 (87.256) Acc@5 97.900 (97.900) Mem 14939MB
[2024-07-24 17:17:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.150) Loss 1.0352 (0.7934) Acc@1 77.344 (83.722) Acc@5 94.531 (96.893) Mem 14939MB
[2024-07-24 17:17:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.119) Loss 1.1973 (0.9417) Acc@1 71.777 (79.846) Acc@5 92.920 (95.171) Mem 14939MB
[2024-07-24 17:17:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.531 Acc@5 95.162
[2024-07-24 17:17:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.5%
[2024-07-24 17:17:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.53%
[2024-07-24 17:17:02 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-24 17:17:03 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!!
[2024-07-24 17:17:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][0/625] eta 0:08:08 lr 0.001062 wd 0.0500 time 0.7816 (0.7816) data time 0.4098 (0.4098) model time 0.0000 (0.0000) loss 6.8641 (6.8641) grad_norm 1.6462 (1.6462) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:17:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][10/625] eta 0:04:25 lr 0.001062 wd 0.0500 time 0.3955 (0.4319) data time 0.0007 (0.0380) model time 0.0000 (0.0000) loss 6.8526 (7.9770) grad_norm 1.7467 (2.1312) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:17:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][20/625] eta 0:04:13 lr 0.001062 wd 0.0500 time 0.3981 (0.4194) data time 0.0008 (0.0203) model time 0.0000 (0.0000) loss 7.7339 (7.9738) grad_norm 1.7595 (2.0755) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:17:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][30/625] eta 0:04:06 lr 0.001062 wd 0.0500 time 0.3958 (0.4140) data time 0.0006 (0.0140) model time 0.0000 (0.0000) loss 9.0897 (7.8785) grad_norm 2.0304 (2.0964) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:17:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][40/625] eta 0:03:59 lr 0.001062 wd 0.0500 time 0.4039 (0.4101) data time 0.0006 (0.0108) model time 0.0000 (0.0000) loss 6.6882 (7.8761) grad_norm 2.6740 (2.1318) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:17:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][50/625] eta 0:03:54 lr 0.001062 wd 0.0500 time 0.4064 (0.4076) data time 0.0007 (0.0088) model time 0.0000 (0.0000) loss 6.7560 (7.7509) grad_norm 2.2132 (2.0393) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:17:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][60/625] eta 0:03:49 lr 0.001062 wd 0.0500 time 0.3930 
(0.4060) data time 0.0007 (0.0075) model time 0.3923 (0.3970) loss 7.6919 (7.7490) grad_norm 1.8677 (2.0496) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:17:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][70/625] eta 0:03:44 lr 0.001062 wd 0.0500 time 0.3942 (0.4048) data time 0.0009 (0.0066) model time 0.3933 (0.3970) loss 8.1047 (7.7530) grad_norm 1.3589 (1.9769) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:17:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][80/625] eta 0:03:40 lr 0.001061 wd 0.0500 time 0.3930 (0.4037) data time 0.0007 (0.0059) model time 0.3922 (0.3964) loss 8.9551 (7.8313) grad_norm 1.8886 (1.9427) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:17:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][90/625] eta 0:03:35 lr 0.001061 wd 0.0500 time 0.3957 (0.4030) data time 0.0007 (0.0053) model time 0.3950 (0.3963) loss 8.1532 (7.8285) grad_norm 2.5975 (1.9326) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:17:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][100/625] eta 0:03:31 lr 0.001061 wd 0.0500 time 0.3968 (0.4024) data time 0.0008 (0.0049) model time 0.3960 (0.3963) loss 8.8671 (7.8115) grad_norm 1.7462 (1.9014) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:17:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][110/625] eta 0:03:27 lr 0.001061 wd 0.0500 time 0.4108 (0.4020) data time 0.0006 (0.0045) model time 0.4102 (0.3965) loss 7.3942 (7.7982) grad_norm 1.2965 (1.8842) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:17:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][120/625] eta 0:03:22 lr 0.001061 wd 0.0500 time 0.3993 (0.4017) data time 0.0006 (0.0042) model time 0.3987 (0.3965) loss 6.2629 (7.7903) grad_norm 1.5997 (1.8669) loss_scale 4096.0000 (4096.0000) mem 14939MB 
[2024-07-24 17:17:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][130/625] eta 0:03:18 lr 0.001061 wd 0.0500 time 0.3984 (0.4014) data time 0.0006 (0.0040) model time 0.3978 (0.3966) loss 6.8335 (7.7932) grad_norm 1.3718 (1.8532) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:17:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][140/625] eta 0:03:14 lr 0.001061 wd 0.0500 time 0.3968 (0.4011) data time 0.0007 (0.0037) model time 0.3962 (0.3965) loss 9.7371 (7.8277) grad_norm 2.8004 (1.8431) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:18:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][150/625] eta 0:03:10 lr 0.001061 wd 0.0500 time 0.3993 (0.4008) data time 0.0009 (0.0035) model time 0.3985 (0.3966) loss 6.7551 (7.8173) grad_norm 2.4753 (1.8611) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:18:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][160/625] eta 0:03:06 lr 0.001061 wd 0.0500 time 0.3999 (0.4007) data time 0.0008 (0.0034) model time 0.3992 (0.3966) loss 8.5010 (7.8142) grad_norm 1.1871 (1.8677) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:18:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][170/625] eta 0:03:02 lr 0.001061 wd 0.0500 time 0.3958 (0.4005) data time 0.0007 (0.0032) model time 0.3951 (0.3966) loss 8.1480 (7.8218) grad_norm 1.4611 (1.8622) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:18:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][180/625] eta 0:02:58 lr 0.001061 wd 0.0500 time 0.3974 (0.4003) data time 0.0007 (0.0031) model time 0.3966 (0.3966) loss 6.1888 (7.8030) grad_norm 3.0910 (1.8716) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:18:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][190/625] eta 0:02:54 lr 0.001061 wd 0.0500 time 
0.4101 (0.4016) data time 0.0006 (0.0030) model time 0.4096 (0.3986) loss 8.7642 (7.7862) grad_norm 1.4820 (1.8652) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 17:18:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][200/625] eta 0:02:50 lr 0.001061 wd 0.0500 time 0.4085 (0.4016) data time 0.0008 (0.0029) model time 0.4077 (0.3987) loss 9.2442 (7.8290) grad_norm 1.5222 (1.8764) loss_scale 8192.0000 (4157.1343) mem 14939MB [2024-07-24 17:18:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][210/625] eta 0:02:46 lr 0.001061 wd 0.0500 time 0.4032 (0.4018) data time 0.0007 (0.0028) model time 0.4025 (0.3991) loss 8.3112 (7.8562) grad_norm 2.0861 (1.8780) loss_scale 8192.0000 (4348.3602) mem 14939MB [2024-07-24 17:18:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][220/625] eta 0:02:42 lr 0.001060 wd 0.0500 time 0.3981 (0.4024) data time 0.0007 (0.0027) model time 0.3974 (0.3999) loss 8.0613 (7.8629) grad_norm 2.2354 (1.8738) loss_scale 8192.0000 (4522.2805) mem 14939MB [2024-07-24 17:18:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][230/625] eta 0:02:38 lr 0.001060 wd 0.0500 time 0.4039 (0.4023) data time 0.0008 (0.0026) model time 0.4031 (0.4000) loss 7.4309 (7.8737) grad_norm 2.7282 (1.8784) loss_scale 8192.0000 (4681.1429) mem 14939MB [2024-07-24 17:18:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][240/625] eta 0:02:34 lr 0.001060 wd 0.0500 time 0.4061 (0.4024) data time 0.0006 (0.0025) model time 0.4056 (0.4002) loss 7.4328 (7.8721) grad_norm 1.5844 (1.8779) loss_scale 8192.0000 (4826.8216) mem 14939MB [2024-07-24 17:18:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][250/625] eta 0:02:30 lr 0.001060 wd 0.0500 time 0.4043 (0.4025) data time 0.0006 (0.0025) model time 0.4037 (0.4004) loss 5.4905 (7.8613) grad_norm 1.6978 (1.8739) loss_scale 8192.0000 (4960.8924) 
mem 14939MB [2024-07-24 17:18:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][260/625] eta 0:02:27 lr 0.001060 wd 0.0500 time 0.4042 (0.4027) data time 0.0008 (0.0024) model time 0.4034 (0.4007) loss 8.2419 (7.8642) grad_norm 2.2044 (1.8712) loss_scale 8192.0000 (5084.6897) mem 14939MB [2024-07-24 17:18:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][270/625] eta 0:02:23 lr 0.001060 wd 0.0500 time 0.4096 (0.4028) data time 0.0008 (0.0023) model time 0.4089 (0.4009) loss 6.1918 (7.8691) grad_norm 1.4477 (1.8711) loss_scale 8192.0000 (5199.3506) mem 14939MB [2024-07-24 17:18:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][280/625] eta 0:02:18 lr 0.001060 wd 0.0500 time 0.4070 (0.4029) data time 0.0006 (0.0023) model time 0.4064 (0.4010) loss 8.5709 (7.8623) grad_norm 1.7949 (1.8666) loss_scale 8192.0000 (5305.8505) mem 14939MB [2024-07-24 17:19:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][290/625] eta 0:02:14 lr 0.001060 wd 0.0500 time 0.4047 (0.4030) data time 0.0006 (0.0022) model time 0.4041 (0.4012) loss 7.0858 (7.8741) grad_norm 1.5334 (1.8824) loss_scale 8192.0000 (5405.0309) mem 14939MB [2024-07-24 17:19:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][300/625] eta 0:02:10 lr 0.001060 wd 0.0500 time 0.4039 (0.4029) data time 0.0006 (0.0022) model time 0.4033 (0.4011) loss 9.2011 (7.8761) grad_norm 2.1766 (1.8896) loss_scale 8192.0000 (5497.6213) mem 14939MB [2024-07-24 17:19:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][310/625] eta 0:02:06 lr 0.001060 wd 0.0500 time 0.3955 (0.4027) data time 0.0007 (0.0021) model time 0.3948 (0.4010) loss 8.6219 (7.8837) grad_norm 2.3780 (1.9031) loss_scale 8192.0000 (5584.2572) mem 14939MB [2024-07-24 17:19:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][320/625] eta 0:02:02 lr 0.001060 wd 
0.0500 time 0.3960 (0.4026) data time 0.0006 (0.0021) model time 0.3954 (0.4008) loss 7.2123 (7.8808) grad_norm 1.7553 (1.9027) loss_scale 8192.0000 (5665.4953) mem 14939MB [2024-07-24 17:19:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][330/625] eta 0:01:58 lr 0.001060 wd 0.0500 time 0.3996 (0.4025) data time 0.0008 (0.0021) model time 0.3989 (0.4007) loss 7.5259 (7.8792) grad_norm 1.8665 (1.8992) loss_scale 8192.0000 (5741.8248) mem 14939MB [2024-07-24 17:19:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][340/625] eta 0:01:54 lr 0.001060 wd 0.0500 time 0.4008 (0.4023) data time 0.0008 (0.0020) model time 0.4000 (0.4006) loss 7.4337 (7.8641) grad_norm 1.4443 (1.8920) loss_scale 8192.0000 (5813.6774) mem 14939MB [2024-07-24 17:19:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][350/625] eta 0:01:50 lr 0.001060 wd 0.0500 time 0.3972 (0.4022) data time 0.0006 (0.0020) model time 0.3966 (0.4005) loss 8.8062 (7.8581) grad_norm 1.2660 (1.8987) loss_scale 8192.0000 (5881.4359) mem 14939MB [2024-07-24 17:19:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][360/625] eta 0:01:46 lr 0.001060 wd 0.0500 time 0.4030 (0.4021) data time 0.0008 (0.0020) model time 0.4022 (0.4004) loss 8.2230 (7.8565) grad_norm 2.0286 (1.9069) loss_scale 8192.0000 (5945.4404) mem 14939MB [2024-07-24 17:19:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][370/625] eta 0:01:42 lr 0.001059 wd 0.0500 time 0.3953 (0.4020) data time 0.0008 (0.0019) model time 0.3945 (0.4002) loss 9.0154 (7.8669) grad_norm 1.5360 (1.9026) loss_scale 8192.0000 (6005.9946) mem 14939MB [2024-07-24 17:19:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][380/625] eta 0:01:38 lr 0.001059 wd 0.0500 time 0.3948 (0.4019) data time 0.0007 (0.0019) model time 0.3941 (0.4001) loss 8.2824 (7.8732) grad_norm 1.7341 (1.8937) loss_scale 8192.0000 
(6063.3701) mem 14939MB [2024-07-24 17:19:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][390/625] eta 0:01:34 lr 0.001059 wd 0.0500 time 0.3974 (0.4017) data time 0.0006 (0.0019) model time 0.3968 (0.4000) loss 7.4412 (7.8868) grad_norm 1.1747 (1.8863) loss_scale 8192.0000 (6117.8107) mem 14939MB [2024-07-24 17:19:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][400/625] eta 0:01:30 lr 0.001059 wd 0.0500 time 0.3968 (0.4016) data time 0.0008 (0.0018) model time 0.3960 (0.3999) loss 7.7130 (7.8909) grad_norm 1.5866 (1.8910) loss_scale 8192.0000 (6169.5362) mem 14939MB [2024-07-24 17:19:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][410/625] eta 0:01:26 lr 0.001059 wd 0.0500 time 0.3952 (0.4028) data time 0.0006 (0.0018) model time 0.3946 (0.4013) loss 6.5238 (7.8950) grad_norm 1.5425 (1.8905) loss_scale 8192.0000 (6218.7445) mem 14939MB [2024-07-24 17:19:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][420/625] eta 0:01:22 lr 0.001059 wd 0.0500 time 0.3967 (0.4027) data time 0.0008 (0.0018) model time 0.3959 (0.4012) loss 8.8697 (7.9000) grad_norm 2.2286 (1.8924) loss_scale 8192.0000 (6265.6152) mem 14939MB [2024-07-24 17:19:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][430/625] eta 0:01:18 lr 0.001059 wd 0.0500 time 0.3954 (0.4025) data time 0.0006 (0.0018) model time 0.3948 (0.4010) loss 7.5779 (7.9047) grad_norm 1.3698 (1.8908) loss_scale 8192.0000 (6310.3109) mem 14939MB [2024-07-24 17:20:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][440/625] eta 0:01:14 lr 0.001059 wd 0.0500 time 0.3704 (0.4029) data time 0.0009 (0.0018) model time 0.3695 (0.4015) loss 7.4387 (7.8993) grad_norm 1.5464 (1.8910) loss_scale 8192.0000 (6352.9796) mem 14939MB [2024-07-24 17:20:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][450/625] eta 0:01:10 lr 
0.001059 wd 0.0500 time 0.3985 (0.4028) data time 0.0008 (0.0017) model time 0.3977 (0.4013) loss 7.2987 (7.8959) grad_norm 1.2573 (1.8850) loss_scale 8192.0000 (6393.7561) mem 14939MB [2024-07-24 17:20:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][460/625] eta 0:01:06 lr 0.001059 wd 0.0500 time 0.3971 (0.4026) data time 0.0008 (0.0017) model time 0.3964 (0.4012) loss 6.8318 (7.9017) grad_norm 2.1609 (1.8834) loss_scale 8192.0000 (6432.7636) mem 14939MB [2024-07-24 17:20:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][470/625] eta 0:01:02 lr 0.001059 wd 0.0500 time 0.3988 (0.4025) data time 0.0008 (0.0017) model time 0.3981 (0.4011) loss 6.4307 (7.8974) grad_norm 1.3739 (1.8889) loss_scale 8192.0000 (6470.1146) mem 14939MB [2024-07-24 17:20:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][480/625] eta 0:00:58 lr 0.001059 wd 0.0500 time 0.3964 (0.4025) data time 0.0006 (0.0017) model time 0.3958 (0.4011) loss 8.5239 (7.8998) grad_norm 2.5137 (1.8933) loss_scale 8192.0000 (6505.9127) mem 14939MB [2024-07-24 17:20:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-24 17:20:19 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 17:20:19 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 17:28:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-24 17:28:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-24 17:29:06 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-24 17:29:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-24 17:29:16 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... [2024-07-24 17:29:16 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-24 17:29:16 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-24 17:29:16 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 82) [2024-07-24 17:29:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-24 17:29:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][490/625] eta 0:05:15 lr 0.001059 wd 0.0500 time 0.3900 (2.3386) data time 0.0007 (0.1473) model time 0.3893 (2.1913) loss 9.0001 (8.3401) grad_norm 2.2695 (2.1221) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 17:29:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][500/625] eta 0:01:58 lr 0.001059 wd 0.0500 time 0.3931 (0.9492) data time 0.0006 (0.0427) model time 0.3925 (0.9065) loss 7.8440 (8.1914) grad_norm 2.1339 (2.0944) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 17:29:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [82/300][510/625] eta 0:01:22 lr 0.001058 wd 0.0500 time 0.3935 (0.7177) data time 0.0009 (0.0253) model time 0.3927 (0.6924) loss 7.1981 (8.1291) grad_norm 1.5612 (2.0507) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 17:29:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][520/625] eta 0:01:05 lr 0.001058 wd 0.0500 time 0.3903 (0.6223) data time 0.0007 (0.0181) model time 0.3897 (0.6042) loss 6.9686 (8.1730) grad_norm 2.2054 (1.9448) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 17:29:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][530/625] eta 0:00:54 lr 0.001058 wd 0.0500 time 0.3953 (0.5707) data time 0.0006 (0.0142) model time 0.3947 (0.5565) loss 8.2213 (8.0937) grad_norm 2.9890 (1.9191) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 17:29:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][540/625] eta 0:00:46 lr 0.001058 wd 0.0500 time 0.3921 (0.5418) data time 0.0006 (0.0117) model time 0.3915 (0.5301) loss 8.6188 (8.0794) grad_norm 2.0338 (1.9437) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 17:29:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][550/625] eta 0:00:39 lr 0.001058 wd 0.0500 time 0.3957 (0.5233) data time 0.0006 (0.0100) model time 0.3951 (0.5133) loss 6.9070 (7.9723) grad_norm 1.6003 (1.9724) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 17:29:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][560/625] eta 0:00:32 lr 0.001058 wd 0.0500 time 0.3936 (0.5062) data time 0.0006 (0.0088) model time 0.3930 (0.4974) loss 8.4160 (7.9564) grad_norm 2.2413 (1.9521) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 17:30:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][570/625] eta 0:00:27 lr 0.001058 wd 0.0500 time 0.3964 (0.4936) data time 0.0009 (0.0079) model time 0.3955 (0.4857) loss 9.2499 
(7.9218) grad_norm 1.6111 (1.9073) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 17:30:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][580/625] eta 0:00:21 lr 0.001058 wd 0.0500 time 0.3999 (0.4836) data time 0.0009 (0.0071) model time 0.3990 (0.4765) loss 7.0489 (7.8962) grad_norm 2.7009 (1.8983) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 17:30:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][590/625] eta 0:00:16 lr 0.001058 wd 0.0500 time 0.3976 (0.4753) data time 0.0010 (0.0065) model time 0.3967 (0.4688) loss 7.3437 (7.9516) grad_norm 1.5289 (1.9009) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 17:30:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][600/625] eta 0:00:11 lr 0.001058 wd 0.0500 time 0.3991 (0.4685) data time 0.0009 (0.0060) model time 0.3983 (0.4624) loss 8.8239 (7.9603) grad_norm 1.2684 (1.8937) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 17:30:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][610/625] eta 0:00:06 lr 0.001058 wd 0.0500 time 0.3980 (0.4627) data time 0.0006 (0.0057) model time 0.3974 (0.4571) loss 6.6961 (7.9491) grad_norm 1.7063 (1.9114) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 17:30:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][620/625] eta 0:00:02 lr 0.001058 wd 0.0500 time 0.3950 (0.4579) data time 0.0006 (0.0053) model time 0.3943 (0.4526) loss 8.1842 (7.9387) grad_norm 2.5779 (1.9058) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 17:30:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 82 training takes 0:01:02 [2024-07-24 17:30:24 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-24 17:30:26 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 17:30:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.425 (0.425) Loss 0.6543 (0.6543) Acc@1 87.305 (87.305) Acc@5 98.047 (98.047) Mem 14931MB [2024-07-24 17:30:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.118) Loss 1.0693 (0.8090) Acc@1 76.221 (83.265) Acc@5 94.092 (96.764) Mem 14931MB [2024-07-24 17:30:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.102) Loss 1.2227 (0.9604) Acc@1 71.729 (79.241) Acc@5 92.041 (94.915) Mem 14931MB [2024-07-24 17:30:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.923 Acc@5 94.874 [2024-07-24 17:30:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.9% [2024-07-24 17:30:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.885 (0.885) Loss 0.6460 (0.6460) Acc@1 87.256 (87.256) Acc@5 97.900 (97.900) Mem 14931MB [2024-07-24 17:30:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.163) Loss 1.0312 (0.7909) Acc@1 77.393 (83.816) Acc@5 94.629 (96.906) Mem 14931MB [2024-07-24 17:30:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.126) Loss 1.1914 (0.9389) Acc@1 71.777 (79.920) Acc@5 92.969 (95.201) Mem 14931MB [2024-07-24 17:30:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.613 Acc@5 95.196 [2024-07-24 17:30:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.6% [2024-07-24 17:30:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.61% [2024-07-24 17:30:33 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 17:30:37 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 17:30:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][0/625] eta 0:20:35 lr 0.001058 wd 0.0500 time 1.9772 (1.9772) data time 1.4205 (1.4205) model time 0.0000 (0.0000) loss 8.4818 (8.4818) grad_norm 1.8246 (1.8246) loss_scale 8192.0000 (8192.0000) mem 14938MB [2024-07-24 17:30:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-24 17:30:40 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 17:30:42 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 21:31:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json
[2024-07-24 21:31:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300
[2024-07-24 21:33:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json
[2024-07-24 21:33:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300
[2024-07-24 21:37:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json
[2024-07-24 21:37:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300
[2024-07-24 21:37:49 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-24 21:37:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth
[2024-07-24 21:37:59 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth....................
[2024-07-24 21:38:00 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-24 21:38:00 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-24 21:38:00 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 83) [2024-07-24 21:38:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-24 21:38:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][10/625] eta 0:17:41 lr 0.001058 wd 0.0500 time 0.3913 (1.7264) data time 0.0010 (0.6069) model time 0.0000 (0.0000) loss 8.4939 (8.3875) grad_norm 1.5235 (1.4645) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:38:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][20/625] eta 0:10:41 lr 0.001058 wd 0.0500 time 0.3954 (1.0608) data time 0.0007 (0.3039) model time 0.0000 (0.0000) loss 8.0956 (8.2083) grad_norm 1.6656 (1.6305) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:38:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][30/625] eta 0:08:19 lr 0.001057 wd 0.0500 time 0.3915 (0.8389) data time 0.0008 (0.2029) model time 0.0000 (0.0000) loss 9.0260 (8.3446) grad_norm 1.3970 (1.6720) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:38:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][40/625] eta 0:07:06 lr 0.001057 wd 0.0500 time 0.3935 (0.7283) data time 0.0007 (0.1524) model time 0.0000 (0.0000) loss 7.7876 (8.2164) grad_norm 2.0717 (1.7126) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:38:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][50/625] eta 0:06:20 lr 0.001057 wd 0.0500 time 0.3990 (0.6620) data time 0.0008 (0.1221) model time 0.0000 (0.0000) loss 7.3957 (8.1871) grad_norm 2.5427 (1.8874) loss_scale 8192.0000 (8192.0000) mem 14931MB 
[2024-07-24 21:38:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][60/625] eta 0:05:53 lr 0.001057 wd 0.0500 time 0.3997 (0.6251) data time 0.0007 (0.1019) model time 0.3990 (0.4399) loss 8.1379 (8.1443) grad_norm 1.9529 (1.9370) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:38:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][70/625] eta 0:05:28 lr 0.001057 wd 0.0500 time 0.3941 (0.5924) data time 0.0006 (0.0875) model time 0.3934 (0.4174) loss 6.7942 (8.0613) grad_norm 1.6317 (1.8736) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:38:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][80/625] eta 0:05:09 lr 0.001057 wd 0.0500 time 0.3931 (0.5677) data time 0.0010 (0.0767) model time 0.3921 (0.4095) loss 9.0218 (8.0628) grad_norm 1.9763 (1.8515) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:38:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][90/625] eta 0:04:53 lr 0.001057 wd 0.0500 time 0.3984 (0.5486) data time 0.0006 (0.0683) model time 0.3977 (0.4059) loss 8.7053 (8.0358) grad_norm 1.5706 (1.8343) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:38:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][100/625] eta 0:04:39 lr 0.001057 wd 0.0500 time 0.3981 (0.5333) data time 0.0007 (0.0615) model time 0.3973 (0.4036) loss 8.3025 (8.0516) grad_norm 1.8475 (1.8295) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:39:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][110/625] eta 0:04:28 lr 0.001057 wd 0.0500 time 0.3953 (0.5209) data time 0.0011 (0.0560) model time 0.3943 (0.4023) loss 7.3567 (8.0729) grad_norm 1.6182 (1.8262) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:39:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][120/625] eta 0:04:17 lr 0.001057 wd 0.0500 time 0.3961 
(0.5106) data time 0.0006 (0.0514) model time 0.3955 (0.4015) loss 9.1765 (8.0856) grad_norm 1.4742 (1.8254) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:39:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][130/625] eta 0:04:08 lr 0.001057 wd 0.0500 time 0.3988 (0.5019) data time 0.0006 (0.0475) model time 0.3982 (0.4010) loss 7.6784 (8.0403) grad_norm 1.3385 (1.8291) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:39:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][140/625] eta 0:03:59 lr 0.001057 wd 0.0500 time 0.4031 (0.4945) data time 0.0006 (0.0442) model time 0.4025 (0.4005) loss 6.4968 (8.0130) grad_norm 1.8142 (1.8233) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:39:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][150/625] eta 0:03:51 lr 0.001057 wd 0.0500 time 0.3992 (0.4880) data time 0.0009 (0.0413) model time 0.3983 (0.4000) loss 8.4295 (8.0015) grad_norm 1.7314 (1.8220) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:39:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][160/625] eta 0:03:44 lr 0.001057 wd 0.0500 time 0.3982 (0.4823) data time 0.0008 (0.0388) model time 0.3974 (0.3997) loss 8.3139 (7.9883) grad_norm 1.5205 (1.8552) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:39:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][170/625] eta 0:03:37 lr 0.001057 wd 0.0500 time 0.3984 (0.4774) data time 0.0007 (0.0366) model time 0.3977 (0.3995) loss 6.5716 (7.9852) grad_norm 2.6490 (1.9180) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:39:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][180/625] eta 0:03:30 lr 0.001056 wd 0.0500 time 0.4093 (0.4730) data time 0.0007 (0.0346) model time 0.4085 (0.3993) loss 6.6510 (7.9564) grad_norm 1.5080 (1.9143) loss_scale 8192.0000 (8192.0000) mem 
14931MB [2024-07-24 21:39:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][190/625] eta 0:03:24 lr 0.001056 wd 0.0500 time 0.4059 (0.4695) data time 0.0007 (0.0328) model time 0.4053 (0.3998) loss 7.2795 (7.9601) grad_norm 1.3227 (1.8998) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:39:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][200/625] eta 0:03:18 lr 0.001056 wd 0.0500 time 0.3967 (0.4659) data time 0.0008 (0.0312) model time 0.3959 (0.3996) loss 7.8994 (7.9340) grad_norm 1.4491 (1.8913) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:39:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][210/625] eta 0:03:11 lr 0.001056 wd 0.0500 time 0.3947 (0.4626) data time 0.0006 (0.0298) model time 0.3940 (0.3994) loss 8.4619 (7.9166) grad_norm 2.4021 (1.8920) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:39:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][220/625] eta 0:03:06 lr 0.001056 wd 0.0500 time 0.3958 (0.4597) data time 0.0009 (0.0285) model time 0.3949 (0.3993) loss 7.1271 (7.8985) grad_norm 4.3001 (1.9274) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:39:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][230/625] eta 0:03:00 lr 0.001056 wd 0.0500 time 0.3991 (0.4571) data time 0.0010 (0.0273) model time 0.3981 (0.3992) loss 8.5640 (7.9035) grad_norm 1.8212 (1.9300) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:39:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][240/625] eta 0:02:55 lr 0.001056 wd 0.0500 time 0.4000 (0.4547) data time 0.0010 (0.0262) model time 0.3990 (0.3993) loss 8.3532 (7.8956) grad_norm 2.0624 (1.9331) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:39:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][250/625] eta 0:02:49 lr 0.001056 wd 0.0500 
time 0.3987 (0.4524) data time 0.0006 (0.0252) model time 0.3981 (0.3991) loss 6.2053 (7.8907) grad_norm 1.8835 (1.9327) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:40:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][260/625] eta 0:02:44 lr 0.001056 wd 0.0500 time 0.3993 (0.4504) data time 0.0008 (0.0242) model time 0.3985 (0.3990) loss 8.1203 (7.8786) grad_norm 2.4184 (1.9288) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:40:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][270/625] eta 0:02:39 lr 0.001056 wd 0.0500 time 0.3960 (0.4485) data time 0.0007 (0.0234) model time 0.3953 (0.3990) loss 8.8084 (7.8680) grad_norm 2.5205 (1.9354) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:40:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][280/625] eta 0:02:34 lr 0.001056 wd 0.0500 time 0.3989 (0.4487) data time 0.0009 (0.0226) model time 0.3980 (0.4014) loss 8.2547 (7.8853) grad_norm 1.6383 (1.9360) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:40:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][290/625] eta 0:02:29 lr 0.001056 wd 0.0500 time 0.4089 (0.4470) data time 0.0009 (0.0218) model time 0.4081 (0.4013) loss 6.9955 (7.8845) grad_norm 2.4120 (1.9324) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:40:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][300/625] eta 0:02:24 lr 0.001056 wd 0.0500 time 0.3970 (0.4454) data time 0.0007 (0.0211) model time 0.3963 (0.4012) loss 6.7485 (7.8622) grad_norm 2.2467 (1.9439) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:40:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][310/625] eta 0:02:19 lr 0.001056 wd 0.0500 time 0.4037 (0.4439) data time 0.0008 (0.0205) model time 0.4029 (0.4010) loss 7.9332 (7.8542) grad_norm 1.6609 (1.9336) loss_scale 8192.0000 
(8192.0000) mem 14931MB [2024-07-24 21:40:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][320/625] eta 0:02:14 lr 0.001055 wd 0.0500 time 0.3975 (0.4425) data time 0.0008 (0.0199) model time 0.3967 (0.4010) loss 6.9674 (7.8773) grad_norm 1.6595 (1.9273) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:40:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][330/625] eta 0:02:10 lr 0.001055 wd 0.0500 time 0.3999 (0.4412) data time 0.0007 (0.0193) model time 0.3992 (0.4009) loss 8.0707 (7.8729) grad_norm 1.5669 (1.9179) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:40:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][340/625] eta 0:02:05 lr 0.001055 wd 0.0500 time 0.4010 (0.4400) data time 0.0009 (0.0188) model time 0.4001 (0.4008) loss 7.6192 (7.8716) grad_norm 1.8939 (1.9158) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:40:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][350/625] eta 0:02:00 lr 0.001055 wd 0.0500 time 0.4004 (0.4392) data time 0.0008 (0.0182) model time 0.3996 (0.4011) loss 7.0546 (7.8720) grad_norm 1.6853 (1.9171) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:40:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][360/625] eta 0:01:56 lr 0.001055 wd 0.0500 time 0.4003 (0.4381) data time 0.0005 (0.0178) model time 0.3997 (0.4011) loss 9.3390 (7.8741) grad_norm 1.6587 (1.9202) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:40:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][370/625] eta 0:01:51 lr 0.001055 wd 0.0500 time 0.3997 (0.4371) data time 0.0008 (0.0173) model time 0.3989 (0.4011) loss 8.7000 (7.8718) grad_norm 2.3513 (1.9144) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:40:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][380/625] eta 0:01:46 lr 
0.001055 wd 0.0500 time 0.4038 (0.4361) data time 0.0007 (0.0169) model time 0.4032 (0.4010) loss 7.6448 (7.8690) grad_norm 1.6911 (1.9079) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:40:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][390/625] eta 0:01:42 lr 0.001055 wd 0.0500 time 0.4027 (0.4352) data time 0.0009 (0.0165) model time 0.4018 (0.4009) loss 5.8326 (7.8555) grad_norm 1.9867 (1.9033) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:40:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][400/625] eta 0:01:37 lr 0.001055 wd 0.0500 time 0.3949 (0.4343) data time 0.0008 (0.0161) model time 0.3941 (0.4008) loss 7.5690 (7.8592) grad_norm 2.0754 (1.9100) loss_scale 8192.0000 (8192.0000) mem 14931MB [2024-07-24 21:41:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][410/625] eta 0:01:33 lr 0.001055 wd 0.0500 time 0.4065 (0.4334) data time 0.0006 (0.0157) model time 0.4059 (0.4007) loss 7.9001 (7.8625) grad_norm 1.4100 (inf) loss_scale 4096.0000 (8162.0293) mem 14931MB [2024-07-24 21:41:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][420/625] eta 0:01:28 lr 0.001055 wd 0.0500 time 0.4088 (0.4326) data time 0.0007 (0.0154) model time 0.4081 (0.4006) loss 8.0912 (7.8565) grad_norm 1.6819 (inf) loss_scale 4096.0000 (8065.2190) mem 14931MB [2024-07-24 21:41:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][430/625] eta 0:01:24 lr 0.001055 wd 0.0500 time 0.3944 (0.4318) data time 0.0008 (0.0150) model time 0.3936 (0.4005) loss 9.1853 (7.8653) grad_norm 1.7882 (inf) loss_scale 4096.0000 (7972.9116) mem 14931MB [2024-07-24 21:41:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][440/625] eta 0:01:19 lr 0.001055 wd 0.0500 time 0.4155 (0.4311) data time 0.0008 (0.0147) model time 0.4147 (0.4005) loss 6.8571 (7.8693) grad_norm 2.1135 (inf) loss_scale 4096.0000 
(7884.8000) mem 14931MB [2024-07-24 21:41:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][450/625] eta 0:01:15 lr 0.001055 wd 0.0500 time 0.4098 (0.4305) data time 0.0007 (0.0144) model time 0.4092 (0.4007) loss 7.2073 (7.8675) grad_norm 2.6454 (inf) loss_scale 4096.0000 (7800.6044) mem 14931MB [2024-07-24 21:41:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][460/625] eta 0:01:10 lr 0.001054 wd 0.0500 time 0.3933 (0.4298) data time 0.0006 (0.0141) model time 0.3927 (0.4006) loss 7.5457 (7.8580) grad_norm 1.5127 (inf) loss_scale 4096.0000 (7720.0696) mem 14931MB [2024-07-24 21:41:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][470/625] eta 0:01:06 lr 0.001054 wd 0.0500 time 0.3995 (0.4292) data time 0.0008 (0.0138) model time 0.3988 (0.4006) loss 7.2474 (7.8431) grad_norm 2.6020 (inf) loss_scale 4096.0000 (7642.9617) mem 14931MB [2024-07-24 21:41:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][480/625] eta 0:01:02 lr 0.001054 wd 0.0500 time 0.3987 (0.4286) data time 0.0008 (0.0135) model time 0.3979 (0.4005) loss 7.7333 (7.8390) grad_norm 2.8924 (inf) loss_scale 4096.0000 (7569.0667) mem 14931MB [2024-07-24 21:41:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][490/625] eta 0:00:57 lr 0.001054 wd 0.0500 time 0.3960 (0.4280) data time 0.0006 (0.0133) model time 0.3954 (0.4004) loss 7.1446 (7.8470) grad_norm 1.5929 (inf) loss_scale 4096.0000 (7498.1878) mem 14931MB [2024-07-24 21:41:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][500/625] eta 0:00:53 lr 0.001054 wd 0.0500 time 0.3992 (0.4284) data time 0.0006 (0.0130) model time 0.3986 (0.4015) loss 6.8807 (7.8419) grad_norm 1.7704 (inf) loss_scale 4096.0000 (7430.1440) mem 14931MB [2024-07-24 21:41:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][510/625] eta 0:00:49 lr 0.001054 wd 0.0500 
time 0.4081 (0.4278) data time 0.0008 (0.0128) model time 0.4073 (0.4014) loss 8.0990 (7.8527) grad_norm 2.0160 (inf) loss_scale 4096.0000 (7364.7686) mem 14931MB [2024-07-24 21:41:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][520/625] eta 0:00:44 lr 0.001054 wd 0.0500 time 0.3977 (0.4273) data time 0.0007 (0.0126) model time 0.3969 (0.4014) loss 7.2631 (7.8564) grad_norm 2.4648 (inf) loss_scale 4096.0000 (7301.9077) mem 14931MB [2024-07-24 21:41:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][530/625] eta 0:00:40 lr 0.001054 wd 0.0500 time 0.3967 (0.4267) data time 0.0008 (0.0123) model time 0.3960 (0.4013) loss 8.3205 (7.8457) grad_norm 2.0893 (inf) loss_scale 4096.0000 (7241.4189) mem 14931MB [2024-07-24 21:41:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][540/625] eta 0:00:36 lr 0.001054 wd 0.0500 time 0.4043 (0.4262) data time 0.0006 (0.0121) model time 0.4037 (0.4012) loss 8.2438 (7.8433) grad_norm 1.5373 (inf) loss_scale 4096.0000 (7183.1704) mem 14931MB [2024-07-24 21:41:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][550/625] eta 0:00:31 lr 0.001054 wd 0.0500 time 0.3940 (0.4257) data time 0.0009 (0.0119) model time 0.3932 (0.4012) loss 6.3841 (7.8389) grad_norm 2.5922 (inf) loss_scale 4096.0000 (7127.0400) mem 14931MB [2024-07-24 21:42:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][560/625] eta 0:00:27 lr 0.001054 wd 0.0500 time 0.3990 (0.4253) data time 0.0008 (0.0117) model time 0.3981 (0.4012) loss 8.6343 (7.8466) grad_norm 2.1864 (inf) loss_scale 4096.0000 (7072.9143) mem 14931MB [2024-07-24 21:42:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][570/625] eta 0:00:23 lr 0.001054 wd 0.0500 time 0.3977 (0.4248) data time 0.0009 (0.0115) model time 0.3968 (0.4011) loss 8.0757 (7.8543) grad_norm 2.1860 (inf) loss_scale 4096.0000 (7020.6877) mem 14931MB 
[2024-07-24 21:42:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][580/625] eta 0:00:19 lr 0.001054 wd 0.0500 time 0.4095 (0.4244) data time 0.0006 (0.0114) model time 0.4089 (0.4011) loss 6.9929 (7.8539) grad_norm 3.2051 (inf) loss_scale 4096.0000 (6970.2621) mem 14931MB [2024-07-24 21:42:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][590/625] eta 0:00:14 lr 0.001054 wd 0.0500 time 0.3945 (0.4239) data time 0.0006 (0.0112) model time 0.3939 (0.4010) loss 7.7515 (7.8537) grad_norm 2.3433 (inf) loss_scale 4096.0000 (6921.5458) mem 14931MB [2024-07-24 21:42:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][600/625] eta 0:00:10 lr 0.001053 wd 0.0500 time 0.3967 (0.4236) data time 0.0010 (0.0110) model time 0.3958 (0.4010) loss 8.2112 (7.8556) grad_norm 1.5074 (inf) loss_scale 4096.0000 (6874.4533) mem 14931MB [2024-07-24 21:42:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][610/625] eta 0:00:06 lr 0.001053 wd 0.0500 time 0.3948 (0.4232) data time 0.0004 (0.0108) model time 0.3944 (0.4009) loss 7.6813 (7.8514) grad_norm 2.4275 (inf) loss_scale 4096.0000 (6828.9049) mem 14931MB [2024-07-24 21:42:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][620/625] eta 0:00:02 lr 0.001053 wd 0.0500 time 0.3985 (0.4227) data time 0.0006 (0.0107) model time 0.3979 (0.4008) loss 7.9893 (7.8538) grad_norm 1.8751 (inf) loss_scale 4096.0000 (6784.8258) mem 14931MB [2024-07-24 21:42:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 83 training takes 0:04:23 [2024-07-24 21:42:29 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 21:42:32 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 21:42:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.489 (0.489) Loss 0.6426 (0.6426) Acc@1 86.914 (86.914) Acc@5 97.705 (97.705) Mem 14931MB
[2024-07-24 21:42:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.123) Loss 1.0391 (0.7949) Acc@1 75.830 (83.256) Acc@5 94.385 (96.786) Mem 14931MB
[2024-07-24 21:42:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 1.2285 (0.9474) Acc@1 72.168 (79.402) Acc@5 91.846 (94.875) Mem 14931MB
[2024-07-24 21:42:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.067 Acc@5 94.854
[2024-07-24 21:42:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.1%
[2024-07-24 21:42:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.07%
[2024-07-24 21:42:37 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving......
[2024-07-24 21:42:37 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!!
[2024-07-24 21:42:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.519 (0.519) Loss 0.6436 (0.6436) Acc@1 87.256 (87.256) Acc@5 97.998 (97.998) Mem 14931MB
[2024-07-24 21:42:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.124) Loss 1.0293 (0.7884) Acc@1 77.588 (83.869) Acc@5 94.629 (96.946) Mem 14931MB
[2024-07-24 21:42:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.106) Loss 1.1875 (0.9360) Acc@1 71.826 (79.946) Acc@5 93.018 (95.236) Mem 14931MB
[2024-07-24 21:42:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.639 Acc@5 95.222
[2024-07-24 21:42:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.6%
[2024-07-24 21:42:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.64%
[2024-07-24 21:42:40 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-24 21:42:41 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!!
[2024-07-24 21:42:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][0/625] eta 0:14:18 lr 0.001053 wd 0.0500 time 1.3742 (1.3742) data time 0.3613 (0.3613) model time 0.0000 (0.0000) loss 7.7029 (7.7029) grad_norm 2.6856 (2.6856) loss_scale 4096.0000 (4096.0000) mem 14942MB [2024-07-24 21:42:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][10/625] eta 0:05:01 lr 0.001053 wd 0.0500 time 0.3999 (0.4894) data time 0.0008 (0.0337) model time 0.0000 (0.0000) loss 6.5874 (7.7480) grad_norm 2.9498 (2.1699) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:42:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][20/625] eta 0:04:29 lr 0.001053 wd 0.0500 time 0.3965 (0.4455) data time 0.0006 (0.0181) model time 0.0000 (0.0000) loss 6.4308 (7.4495) grad_norm 1.8866 (2.1623) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:42:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][30/625] eta 0:04:15 lr 0.001053 wd 0.0500 time 0.3930 (0.4298) data time 0.0007 (0.0125) model time 0.0000 (0.0000) loss 7.4870 (7.5612) grad_norm 1.4119 (1.9170) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:42:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][40/625] eta 0:04:06 lr 0.001053 wd 0.0500 time 0.3977 (0.4219) data time 0.0007 (0.0098) model time 0.0000 (0.0000) loss 6.7660 (7.4770) grad_norm 1.6260 (1.8561) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:43:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][50/625] eta 0:04:00 lr 0.001053 wd 0.0500 time 0.3970 (0.4174) data time 0.0006 (0.0080) model time 0.0000 (0.0000) loss 7.4878 (7.5497) grad_norm 2.7026 (1.9031) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:43:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][60/625] eta 0:03:53 lr 0.001053 wd 0.0500 time 0.3955 
(0.4142) data time 0.0008 (0.0069) model time 0.3947 (0.3965) loss 7.7401 (7.5710) grad_norm 1.8954 (1.8843) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:43:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][70/625] eta 0:03:48 lr 0.001053 wd 0.0500 time 0.3942 (0.4118) data time 0.0009 (0.0060) model time 0.3933 (0.3964) loss 7.5750 (7.5974) grad_norm 2.3493 (1.8668) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:43:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][80/625] eta 0:03:43 lr 0.001053 wd 0.0500 time 0.3970 (0.4099) data time 0.0006 (0.0054) model time 0.3964 (0.3963) loss 9.2255 (7.6088) grad_norm 1.7208 (1.8860) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:43:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][90/625] eta 0:03:38 lr 0.001053 wd 0.0500 time 0.4151 (0.4091) data time 0.0007 (0.0049) model time 0.4144 (0.3975) loss 8.4050 (7.5773) grad_norm 1.5939 (1.8509) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:43:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][100/625] eta 0:03:37 lr 0.001053 wd 0.0500 time 0.3940 (0.4134) data time 0.0007 (0.0045) model time 0.3933 (0.4085) loss 7.5997 (7.5699) grad_norm 2.6552 (1.8610) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:43:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][110/625] eta 0:03:32 lr 0.001053 wd 0.0500 time 0.3992 (0.4124) data time 0.0007 (0.0042) model time 0.3985 (0.4071) loss 7.6111 (7.6256) grad_norm 1.5203 (1.8618) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:43:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][120/625] eta 0:03:27 lr 0.001052 wd 0.0500 time 0.3977 (0.4113) data time 0.0008 (0.0039) model time 0.3969 (0.4059) loss 8.0863 (7.6652) grad_norm 1.5891 (1.8622) loss_scale 4096.0000 (4096.0000) mem 14939MB 
[2024-07-24 21:43:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][130/625] eta 0:03:23 lr 0.001052 wd 0.0500 time 0.3977 (0.4106) data time 0.0008 (0.0037) model time 0.3970 (0.4053) loss 6.6024 (7.6572) grad_norm 1.4142 (1.8341) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:43:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][140/625] eta 0:03:18 lr 0.001052 wd 0.0500 time 0.4110 (0.4099) data time 0.0007 (0.0035) model time 0.4103 (0.4047) loss 8.1251 (7.7116) grad_norm 1.6654 (1.8492) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:43:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][150/625] eta 0:03:14 lr 0.001052 wd 0.0500 time 0.4002 (0.4092) data time 0.0008 (0.0033) model time 0.3995 (0.4041) loss 6.7486 (7.7260) grad_norm 1.5672 (1.8562) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:43:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][160/625] eta 0:03:09 lr 0.001052 wd 0.0500 time 0.3959 (0.4086) data time 0.0008 (0.0032) model time 0.3951 (0.4035) loss 7.2136 (7.7363) grad_norm 1.6686 (1.8519) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:43:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][170/625] eta 0:03:05 lr 0.001052 wd 0.0500 time 0.3925 (0.4081) data time 0.0008 (0.0030) model time 0.3917 (0.4032) loss 7.2913 (7.7333) grad_norm 1.4551 (1.8364) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:43:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][180/625] eta 0:03:01 lr 0.001052 wd 0.0500 time 0.3967 (0.4075) data time 0.0006 (0.0029) model time 0.3961 (0.4027) loss 9.6439 (7.7261) grad_norm 2.3868 (1.8541) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:43:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][190/625] eta 0:02:57 lr 0.001052 wd 0.0500 time 
0.3942 (0.4070) data time 0.0007 (0.0028) model time 0.3936 (0.4023) loss 8.5053 (7.6935) grad_norm 1.9985 (1.8530) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:44:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][200/625] eta 0:02:52 lr 0.001052 wd 0.0500 time 0.4021 (0.4065) data time 0.0006 (0.0027) model time 0.4015 (0.4019) loss 7.0945 (7.7022) grad_norm 1.6452 (1.8613) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:44:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][210/625] eta 0:02:49 lr 0.001052 wd 0.0500 time 0.4018 (0.4086) data time 0.0008 (0.0026) model time 0.4010 (0.4048) loss 7.0054 (7.6876) grad_norm 1.7027 (1.8476) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:44:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][220/625] eta 0:02:45 lr 0.001052 wd 0.0500 time 0.3957 (0.4081) data time 0.0006 (0.0026) model time 0.3951 (0.4044) loss 6.4390 (7.6857) grad_norm 1.5470 (1.8387) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:44:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][230/625] eta 0:02:41 lr 0.001052 wd 0.0500 time 0.4034 (0.4084) data time 0.0006 (0.0025) model time 0.4028 (0.4049) loss 8.1148 (7.7010) grad_norm 1.7973 (1.8345) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:44:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][240/625] eta 0:02:37 lr 0.001052 wd 0.0500 time 0.4011 (0.4080) data time 0.0008 (0.0024) model time 0.4003 (0.4046) loss 7.7981 (7.6936) grad_norm 2.2413 (1.8491) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:44:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][250/625] eta 0:02:32 lr 0.001052 wd 0.0500 time 0.3978 (0.4077) data time 0.0006 (0.0024) model time 0.3972 (0.4043) loss 6.7770 (7.7046) grad_norm 1.7276 (1.8445) loss_scale 4096.0000 (4096.0000) 
mem 14939MB [2024-07-24 21:44:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][260/625] eta 0:02:28 lr 0.001051 wd 0.0500 time 0.4029 (0.4073) data time 0.0006 (0.0023) model time 0.4023 (0.4040) loss 6.9325 (7.6986) grad_norm 2.4877 (1.8472) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:44:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][270/625] eta 0:02:24 lr 0.001051 wd 0.0500 time 0.3989 (0.4070) data time 0.0008 (0.0022) model time 0.3981 (0.4037) loss 7.5689 (7.7033) grad_norm 1.6733 (1.8491) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:44:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][280/625] eta 0:02:20 lr 0.001051 wd 0.0500 time 0.3977 (0.4067) data time 0.0006 (0.0022) model time 0.3972 (0.4034) loss 8.2226 (7.7201) grad_norm 1.4323 (1.8465) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:44:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][290/625] eta 0:02:16 lr 0.001051 wd 0.0500 time 0.4042 (0.4064) data time 0.0006 (0.0021) model time 0.4036 (0.4032) loss 8.7214 (7.7166) grad_norm 2.1741 (1.8426) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:44:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][300/625] eta 0:02:12 lr 0.001051 wd 0.0500 time 0.3928 (0.4062) data time 0.0008 (0.0021) model time 0.3920 (0.4030) loss 8.1156 (7.7376) grad_norm 2.0805 (1.8401) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:44:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][310/625] eta 0:02:07 lr 0.001051 wd 0.0500 time 0.3941 (0.4059) data time 0.0007 (0.0021) model time 0.3934 (0.4027) loss 6.4122 (7.7465) grad_norm 1.3524 (1.8442) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:44:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][320/625] eta 0:02:04 lr 0.001051 wd 
0.0500 time 0.4059 (0.4074) data time 0.0008 (0.0021) model time 0.4051 (0.4045) loss 8.0776 (7.7430) grad_norm 1.6937 (1.8422) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:44:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][330/625] eta 0:02:00 lr 0.001051 wd 0.0500 time 0.3970 (0.4071) data time 0.0010 (0.0020) model time 0.3960 (0.4043) loss 8.6640 (7.7356) grad_norm 2.1493 (1.8354) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:45:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][340/625] eta 0:01:55 lr 0.001051 wd 0.0500 time 0.3912 (0.4068) data time 0.0009 (0.0020) model time 0.3903 (0.4040) loss 7.1427 (7.7385) grad_norm 2.1748 (1.8337) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:45:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][350/625] eta 0:01:51 lr 0.001051 wd 0.0500 time 0.3994 (0.4067) data time 0.0006 (0.0020) model time 0.3988 (0.4039) loss 8.5239 (7.7584) grad_norm 2.3286 (1.8403) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:45:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][360/625] eta 0:01:47 lr 0.001051 wd 0.0500 time 0.3992 (0.4064) data time 0.0007 (0.0019) model time 0.3985 (0.4037) loss 7.8387 (7.7632) grad_norm 3.2339 (1.8499) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:45:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][370/625] eta 0:01:43 lr 0.001051 wd 0.0500 time 0.3943 (0.4063) data time 0.0006 (0.0019) model time 0.3937 (0.4036) loss 7.3052 (7.7593) grad_norm 2.0987 (1.8577) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:45:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][380/625] eta 0:01:39 lr 0.001051 wd 0.0500 time 0.3985 (0.4061) data time 0.0006 (0.0019) model time 0.3979 (0.4035) loss 6.2384 (7.7527) grad_norm 1.8922 (1.8635) loss_scale 4096.0000 
(4096.0000) mem 14939MB [2024-07-24 21:45:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][390/625] eta 0:01:35 lr 0.001051 wd 0.0500 time 0.4442 (0.4060) data time 0.0006 (0.0019) model time 0.4436 (0.4034) loss 8.4293 (7.7732) grad_norm 1.4282 (1.8609) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:45:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][400/625] eta 0:01:31 lr 0.001051 wd 0.0500 time 0.4040 (0.4059) data time 0.0008 (0.0018) model time 0.4033 (0.4032) loss 6.1520 (7.7547) grad_norm 2.1059 (1.8636) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:45:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][410/625] eta 0:01:27 lr 0.001050 wd 0.0500 time 0.3979 (0.4057) data time 0.0007 (0.0018) model time 0.3972 (0.4031) loss 8.1093 (7.7540) grad_norm 1.6539 (1.8603) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:45:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][420/625] eta 0:01:23 lr 0.001050 wd 0.0500 time 0.3995 (0.4055) data time 0.0008 (0.0018) model time 0.3987 (0.4029) loss 7.1551 (7.7565) grad_norm 1.3742 (1.8540) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:45:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][430/625] eta 0:01:19 lr 0.001050 wd 0.0500 time 0.3978 (0.4063) data time 0.0007 (0.0018) model time 0.3972 (0.4038) loss 9.0064 (7.7638) grad_norm 1.5439 (1.8476) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:45:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][440/625] eta 0:01:15 lr 0.001050 wd 0.0500 time 0.3982 (0.4061) data time 0.0006 (0.0018) model time 0.3976 (0.4037) loss 8.7008 (7.7728) grad_norm 1.9532 (1.8514) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:45:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][450/625] eta 0:01:11 lr 
0.001050 wd 0.0500 time 0.3968 (0.4063) data time 0.0009 (0.0017) model time 0.3959 (0.4039) loss 6.9425 (7.7694) grad_norm 2.1344 (1.8488) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:45:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][460/625] eta 0:01:07 lr 0.001050 wd 0.0500 time 0.3994 (0.4061) data time 0.0006 (0.0017) model time 0.3988 (0.4038) loss 6.7493 (7.7703) grad_norm 1.2813 (1.8462) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:45:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][470/625] eta 0:01:02 lr 0.001050 wd 0.0500 time 0.3937 (0.4059) data time 0.0008 (0.0017) model time 0.3929 (0.4036) loss 6.5116 (7.7692) grad_norm 2.2593 (1.8462) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:45:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][480/625] eta 0:00:58 lr 0.001050 wd 0.0500 time 0.3926 (0.4058) data time 0.0007 (0.0017) model time 0.3919 (0.4035) loss 7.1677 (7.7753) grad_norm 1.9409 (1.8477) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:46:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][490/625] eta 0:00:54 lr 0.001050 wd 0.0500 time 0.3992 (0.4057) data time 0.0006 (0.0017) model time 0.3986 (0.4034) loss 8.3076 (7.7749) grad_norm 2.7572 (1.8516) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:46:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][500/625] eta 0:00:50 lr 0.001050 wd 0.0500 time 0.3971 (0.4056) data time 0.0007 (0.0016) model time 0.3964 (0.4033) loss 9.0091 (7.7727) grad_norm 1.8764 (1.8561) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:46:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][510/625] eta 0:00:46 lr 0.001050 wd 0.0500 time 0.3979 (0.4055) data time 0.0008 (0.0016) model time 0.3971 (0.4032) loss 6.6569 (7.7715) grad_norm 2.0932 (1.8557) loss_scale 
4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:46:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][520/625] eta 0:00:42 lr 0.001050 wd 0.0500 time 0.3980 (0.4055) data time 0.0008 (0.0016) model time 0.3972 (0.4033) loss 6.9879 (7.7751) grad_norm 1.4447 (1.8528) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:46:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][530/625] eta 0:00:38 lr 0.001050 wd 0.0500 time 0.3997 (0.4054) data time 0.0008 (0.0016) model time 0.3989 (0.4032) loss 8.6097 (7.7708) grad_norm 2.4381 (1.8530) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:46:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][540/625] eta 0:00:34 lr 0.001050 wd 0.0500 time 0.4032 (0.4063) data time 0.0008 (0.0016) model time 0.4024 (0.4042) loss 7.0346 (7.7620) grad_norm 2.7592 (1.8658) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:46:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][550/625] eta 0:00:30 lr 0.001049 wd 0.0500 time 0.3940 (0.4062) data time 0.0007 (0.0016) model time 0.3933 (0.4041) loss 8.5812 (7.7754) grad_norm 2.3693 (1.8708) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:46:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][560/625] eta 0:00:26 lr 0.001049 wd 0.0500 time 0.3997 (0.4061) data time 0.0007 (0.0016) model time 0.3991 (0.4040) loss 8.5059 (7.7885) grad_norm 1.8825 (1.8692) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:46:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][570/625] eta 0:00:22 lr 0.001049 wd 0.0500 time 0.4001 (0.4059) data time 0.0006 (0.0016) model time 0.3995 (0.4039) loss 7.2036 (7.7880) grad_norm 1.5739 (1.8665) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:46:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][580/625] eta 
0:00:18 lr 0.001049 wd 0.0500 time 0.4105 (0.4058) data time 0.0006 (0.0015) model time 0.4098 (0.4038) loss 8.4011 (7.7932) grad_norm 1.4173 (1.8630) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:46:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][590/625] eta 0:00:14 lr 0.001049 wd 0.0500 time 0.3980 (0.4057) data time 0.0007 (0.0015) model time 0.3973 (0.4036) loss 8.4767 (7.7962) grad_norm 1.8455 (1.8655) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:46:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][600/625] eta 0:00:10 lr 0.001049 wd 0.0500 time 0.4024 (0.4055) data time 0.0008 (0.0015) model time 0.4016 (0.4035) loss 9.2705 (7.7969) grad_norm 1.7055 (1.8732) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:46:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][610/625] eta 0:00:06 lr 0.001049 wd 0.0500 time 0.3947 (0.4054) data time 0.0006 (0.0015) model time 0.3942 (0.4034) loss 7.7601 (7.7918) grad_norm 1.8312 (1.8695) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:46:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][620/625] eta 0:00:02 lr 0.001049 wd 0.0500 time 0.3948 (0.4053) data time 0.0006 (0.0015) model time 0.3942 (0.4033) loss 6.9699 (7.7941) grad_norm 1.4228 (1.8684) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:46:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 84 training takes 0:04:13 [2024-07-24 21:46:54 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 21:46:55 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 21:46:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.424 (0.424) Loss 0.6694 (0.6694) Acc@1 87.305 (87.305) Acc@5 98.047 (98.047) Mem 14939MB [2024-07-24 21:46:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.117) Loss 1.0908 (0.8308) Acc@1 74.805 (83.296) Acc@5 94.141 (96.764) Mem 14939MB [2024-07-24 21:46:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.102) Loss 1.2432 (0.9791) Acc@1 72.119 (79.492) Acc@5 92.236 (94.950) Mem 14939MB [2024-07-24 21:46:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.149 Acc@5 94.888 [2024-07-24 21:46:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.1% [2024-07-24 21:46:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.15% [2024-07-24 21:46:58 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-24 21:46:59 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-24 21:46:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.425 (0.425) Loss 0.6411 (0.6411) Acc@1 87.207 (87.207) Acc@5 97.998 (97.998) Mem 14939MB [2024-07-24 21:47:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.117) Loss 1.0283 (0.7860) Acc@1 77.539 (83.869) Acc@5 94.580 (96.955) Mem 14939MB [2024-07-24 21:47:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.102) Loss 1.1855 (0.9334) Acc@1 72.021 (79.990) Acc@5 93.115 (95.264) Mem 14939MB [2024-07-24 21:47:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.675 Acc@5 95.252 [2024-07-24 21:47:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.7% [2024-07-24 21:47:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.67% [2024-07-24 21:47:01 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 21:47:02 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-24 21:47:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][0/625] eta 0:10:46 lr 0.001049 wd 0.0500 time 1.0341 (1.0341) data time 0.6604 (0.6604) model time 0.0000 (0.0000) loss 7.0366 (7.0366) grad_norm 2.8759 (2.8759) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:47:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][10/625] eta 0:04:40 lr 0.001049 wd 0.0500 time 0.3963 (0.4556) data time 0.0009 (0.0610) model time 0.0000 (0.0000) loss 8.7332 (8.0471) grad_norm 2.8535 (2.2758) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:47:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][20/625] eta 0:04:19 lr 0.001049 wd 0.0500 time 0.3998 (0.4288) data time 0.0009 (0.0323) model time 0.0000 (0.0000) loss 7.6660 (8.1505) grad_norm 1.7855 (2.0904) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:47:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][30/625] eta 0:04:09 lr 0.001049 wd 0.0500 time 0.4037 (0.4195) data time 0.0007 (0.0222) model time 0.0000 (0.0000) loss 8.4682 (8.0986) grad_norm 1.7423 (1.9644) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:47:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][40/625] eta 0:04:02 lr 0.001049 wd 0.0500 time 0.3951 (0.4150) data time 0.0008 (0.0170) model time 0.0000 (0.0000) loss 7.3051 (8.0639) grad_norm 2.1528 (1.9901) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:47:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][50/625] eta 0:03:56 lr 0.001049 wd 0.0500 time 0.3996 (0.4122) data time 0.0008 (0.0138) model time 0.0000 (0.0000) loss 8.3166 (8.0531) grad_norm 2.0326 (1.9639) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:47:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][60/625] eta 0:03:51 lr 0.001048 wd 0.0500 time 0.3977 
(0.4097) data time 0.0007 (0.0117) model time 0.3970 (0.3965) loss 9.0392 (8.0195) grad_norm 1.6591 (1.9519) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:47:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][70/625] eta 0:03:46 lr 0.001048 wd 0.0500 time 0.3959 (0.4086) data time 0.0006 (0.0110) model time 0.3953 (0.3959) loss 7.3569 (7.9024) grad_norm 1.8978 (1.9881) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:47:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][80/625] eta 0:03:41 lr 0.001048 wd 0.0500 time 0.3948 (0.4071) data time 0.0008 (0.0097) model time 0.3940 (0.3957) loss 7.8278 (7.8834) grad_norm 2.3161 (1.9796) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:47:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][90/625] eta 0:03:47 lr 0.001048 wd 0.0500 time 0.3935 (0.4261) data time 0.0008 (0.0087) model time 0.3927 (0.4417) loss 8.2775 (7.8986) grad_norm 3.0137 (2.0148) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:47:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][100/625] eta 0:03:42 lr 0.001048 wd 0.0500 time 0.3949 (0.4235) data time 0.0006 (0.0080) model time 0.3944 (0.4330) loss 8.5476 (7.9136) grad_norm 1.9097 (1.9954) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:47:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][110/625] eta 0:03:36 lr 0.001048 wd 0.0500 time 0.3996 (0.4213) data time 0.0008 (0.0073) model time 0.3988 (0.4274) loss 9.1163 (7.9318) grad_norm 2.0588 (1.9629) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:47:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][120/625] eta 0:03:31 lr 0.001048 wd 0.0500 time 0.3998 (0.4197) data time 0.0008 (0.0068) model time 0.3990 (0.4235) loss 7.8053 (7.9544) grad_norm 1.6683 (1.9515) loss_scale 4096.0000 (4096.0000) mem 14939MB 
[2024-07-24 21:47:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][130/625] eta 0:03:26 lr 0.001048 wd 0.0500 time 0.3990 (0.4180) data time 0.0008 (0.0063) model time 0.3982 (0.4202) loss 8.0712 (7.9559) grad_norm 1.9388 (1.9754) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:48:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][140/625] eta 0:03:24 lr 0.001048 wd 0.0500 time 0.4093 (0.4206) data time 0.0006 (0.0060) model time 0.4087 (0.4239) loss 8.6199 (7.9493) grad_norm 1.2705 (1.9899) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:48:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][150/625] eta 0:03:19 lr 0.001048 wd 0.0500 time 0.3965 (0.4194) data time 0.0008 (0.0056) model time 0.3957 (0.4216) loss 6.8263 (7.9064) grad_norm 1.8073 (2.0106) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:48:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][160/625] eta 0:03:14 lr 0.001048 wd 0.0500 time 0.4410 (0.4184) data time 0.0009 (0.0053) model time 0.4402 (0.4198) loss 8.6571 (7.9306) grad_norm 2.1810 (2.0007) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:48:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][170/625] eta 0:03:09 lr 0.001048 wd 0.0500 time 0.3989 (0.4171) data time 0.0008 (0.0051) model time 0.3981 (0.4179) loss 8.7313 (7.9212) grad_norm 2.9334 (1.9932) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:48:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][180/625] eta 0:03:05 lr 0.001048 wd 0.0500 time 0.3955 (0.4170) data time 0.0006 (0.0048) model time 0.3948 (0.4176) loss 8.1426 (7.9121) grad_norm 2.0320 (2.0087) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:48:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][190/625] eta 0:03:01 lr 0.001048 wd 0.0500 time 
0.3952 (0.4161) data time 0.0008 (0.0046) model time 0.3944 (0.4162) loss 9.1259 (7.9070) grad_norm 1.8389 (1.9980) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:48:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][200/625] eta 0:02:56 lr 0.001047 wd 0.0500 time 0.3992 (0.4152) data time 0.0007 (0.0044) model time 0.3985 (0.4150) loss 7.3397 (7.8907) grad_norm 1.9158 (2.0037) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:48:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][210/625] eta 0:02:52 lr 0.001047 wd 0.0500 time 0.3937 (0.4145) data time 0.0007 (0.0043) model time 0.3931 (0.4140) loss 8.9972 (7.9039) grad_norm 2.4515 (1.9959) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:48:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][220/625] eta 0:02:47 lr 0.001047 wd 0.0500 time 0.3958 (0.4137) data time 0.0008 (0.0041) model time 0.3950 (0.4130) loss 8.1796 (7.9029) grad_norm 1.3336 (1.9823) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:48:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][230/625] eta 0:02:43 lr 0.001047 wd 0.0500 time 0.3997 (0.4130) data time 0.0006 (0.0040) model time 0.3991 (0.4121) loss 6.4729 (7.8798) grad_norm 2.1653 (2.0053) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:48:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][240/625] eta 0:02:38 lr 0.001047 wd 0.0500 time 0.4053 (0.4125) data time 0.0008 (0.0039) model time 0.4045 (0.4114) loss 6.1835 (7.8454) grad_norm 2.2832 (2.0163) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:48:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][250/625] eta 0:02:34 lr 0.001047 wd 0.0500 time 0.4085 (0.4120) data time 0.0007 (0.0037) model time 0.4078 (0.4107) loss 8.4317 (7.8356) grad_norm 2.0721 (2.0094) loss_scale 4096.0000 (4096.0000) 
mem 14939MB [2024-07-24 21:48:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][260/625] eta 0:02:30 lr 0.001047 wd 0.0500 time 0.3979 (0.4116) data time 0.0009 (0.0036) model time 0.3970 (0.4102) loss 6.3178 (7.8400) grad_norm 1.6812 (2.0033) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:48:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][270/625] eta 0:02:25 lr 0.001047 wd 0.0500 time 0.3973 (0.4111) data time 0.0006 (0.0035) model time 0.3967 (0.4097) loss 7.6222 (7.8545) grad_norm 3.2007 (2.0079) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:48:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][280/625] eta 0:02:21 lr 0.001047 wd 0.0500 time 0.3980 (0.4106) data time 0.0008 (0.0034) model time 0.3971 (0.4091) loss 7.3513 (7.8523) grad_norm 1.9032 (2.0058) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:49:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][290/625] eta 0:02:17 lr 0.001047 wd 0.0500 time 0.3958 (0.4102) data time 0.0006 (0.0033) model time 0.3952 (0.4087) loss 8.9969 (7.8553) grad_norm 1.4455 (1.9999) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:49:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][300/625] eta 0:02:13 lr 0.001047 wd 0.0500 time 0.3985 (0.4098) data time 0.0009 (0.0033) model time 0.3976 (0.4082) loss 7.4783 (7.8392) grad_norm 2.4332 (1.9961) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:49:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][310/625] eta 0:02:08 lr 0.001047 wd 0.0500 time 0.3981 (0.4094) data time 0.0008 (0.0032) model time 0.3972 (0.4078) loss 7.3821 (7.8469) grad_norm 1.6815 (1.9948) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:49:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][320/625] eta 0:02:04 lr 0.001047 wd 
0.0500 time 0.3988 (0.4092) data time 0.0006 (0.0031) model time 0.3982 (0.4075) loss 7.1876 (7.8299) grad_norm 2.0058 (1.9952) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:49:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][330/625] eta 0:02:00 lr 0.001047 wd 0.0500 time 0.3976 (0.4089) data time 0.0008 (0.0030) model time 0.3968 (0.4072) loss 6.7117 (7.8247) grad_norm 1.7547 (1.9872) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:49:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][340/625] eta 0:01:56 lr 0.001046 wd 0.0500 time 0.3986 (0.4087) data time 0.0007 (0.0030) model time 0.3979 (0.4070) loss 7.8815 (7.8240) grad_norm 2.2736 (1.9882) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:49:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][350/625] eta 0:01:52 lr 0.001046 wd 0.0500 time 0.4049 (0.4084) data time 0.0007 (0.0029) model time 0.4042 (0.4067) loss 8.6063 (7.8294) grad_norm 1.9527 (1.9895) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:49:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][360/625] eta 0:01:48 lr 0.001046 wd 0.0500 time 0.4005 (0.4098) data time 0.0006 (0.0029) model time 0.3998 (0.4084) loss 6.5586 (7.8317) grad_norm 2.0502 (1.9900) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:49:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][370/625] eta 0:01:44 lr 0.001046 wd 0.0500 time 0.3990 (0.4095) data time 0.0007 (0.0028) model time 0.3982 (0.4081) loss 7.3567 (7.8283) grad_norm 1.8176 (1.9816) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:49:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][380/625] eta 0:01:40 lr 0.001046 wd 0.0500 time 0.4124 (0.4093) data time 0.0008 (0.0028) model time 0.4116 (0.4078) loss 8.0742 (7.8225) grad_norm 1.5460 (1.9766) loss_scale 4096.0000 
(4096.0000) mem 14939MB [2024-07-24 21:49:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][390/625] eta 0:01:36 lr 0.001046 wd 0.0500 time 0.3993 (0.4092) data time 0.0008 (0.0027) model time 0.3985 (0.4077) loss 6.2457 (7.8215) grad_norm 1.6845 (1.9700) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:49:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][400/625] eta 0:01:32 lr 0.001046 wd 0.0500 time 0.4004 (0.4092) data time 0.0008 (0.0027) model time 0.3996 (0.4077) loss 7.5189 (7.8081) grad_norm 2.0782 (1.9700) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:49:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][410/625] eta 0:01:27 lr 0.001046 wd 0.0500 time 0.3969 (0.4090) data time 0.0008 (0.0026) model time 0.3961 (0.4075) loss 8.1679 (7.7947) grad_norm 2.2646 (1.9673) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:49:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][420/625] eta 0:01:23 lr 0.001046 wd 0.0500 time 0.3946 (0.4087) data time 0.0006 (0.0026) model time 0.3940 (0.4072) loss 7.6121 (7.7886) grad_norm 2.0679 (1.9755) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:49:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][430/625] eta 0:01:19 lr 0.001046 wd 0.0500 time 1.0558 (0.4100) data time 0.0006 (0.0025) model time 1.0553 (0.4087) loss 7.8609 (7.7934) grad_norm 1.4322 (1.9665) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:50:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][440/625] eta 0:01:15 lr 0.001046 wd 0.0500 time 0.3975 (0.4097) data time 0.0006 (0.0025) model time 0.3969 (0.4084) loss 6.9902 (7.7991) grad_norm 1.7423 (1.9671) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:50:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][450/625] eta 0:01:11 lr 
0.001046 wd 0.0500 time 0.3981 (0.4095) data time 0.0009 (0.0025) model time 0.3973 (0.4081) loss 8.9412 (7.8112) grad_norm 1.6437 (1.9675) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:50:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][460/625] eta 0:01:07 lr 0.001046 wd 0.0500 time 0.3943 (0.4093) data time 0.0009 (0.0024) model time 0.3934 (0.4079) loss 6.7925 (7.8135) grad_norm 1.6718 (1.9695) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:50:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][470/625] eta 0:01:03 lr 0.001046 wd 0.0500 time 0.4028 (0.4093) data time 0.0008 (0.0024) model time 0.4019 (0.4079) loss 7.6169 (7.8160) grad_norm 1.3322 (1.9676) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:50:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][480/625] eta 0:00:59 lr 0.001045 wd 0.0500 time 0.4093 (0.4091) data time 0.0006 (0.0024) model time 0.4086 (0.4077) loss 9.3435 (7.8221) grad_norm 1.3688 (1.9606) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:50:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][490/625] eta 0:00:55 lr 0.001045 wd 0.0500 time 0.4123 (0.4089) data time 0.0008 (0.0023) model time 0.4115 (0.4075) loss 7.8311 (7.8190) grad_norm 2.7798 (1.9624) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:50:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][500/625] eta 0:00:51 lr 0.001045 wd 0.0500 time 0.3924 (0.4087) data time 0.0009 (0.0023) model time 0.3915 (0.4073) loss 7.1982 (7.8177) grad_norm 2.9610 (1.9675) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:50:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][510/625] eta 0:00:46 lr 0.001045 wd 0.0500 time 0.3973 (0.4085) data time 0.0006 (0.0023) model time 0.3966 (0.4071) loss 9.4774 (7.8163) grad_norm 1.5034 (1.9614) loss_scale 
4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:50:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][520/625] eta 0:00:42 lr 0.001045 wd 0.0500 time 0.4148 (0.4083) data time 0.0006 (0.0023) model time 0.4142 (0.4069) loss 7.0617 (7.8205) grad_norm 2.0934 (1.9587) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:50:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][530/625] eta 0:00:38 lr 0.001045 wd 0.0500 time 0.3985 (0.4082) data time 0.0008 (0.0022) model time 0.3976 (0.4067) loss 8.1912 (7.8209) grad_norm 1.7277 (1.9604) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:50:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][540/625] eta 0:00:34 lr 0.001045 wd 0.0500 time 0.3953 (0.4080) data time 0.0008 (0.0022) model time 0.3945 (0.4065) loss 8.5811 (7.8201) grad_norm 1.9134 (1.9595) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:50:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][550/625] eta 0:00:30 lr 0.001045 wd 0.0500 time 0.4012 (0.4078) data time 0.0006 (0.0022) model time 0.4005 (0.4063) loss 8.3602 (7.8280) grad_norm 1.6700 (1.9558) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:50:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][560/625] eta 0:00:26 lr 0.001045 wd 0.0500 time 0.3949 (0.4076) data time 0.0007 (0.0022) model time 0.3942 (0.4061) loss 6.1022 (7.8243) grad_norm 1.4270 (1.9532) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:50:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][570/625] eta 0:00:22 lr 0.001045 wd 0.0500 time 0.4030 (0.4074) data time 0.0008 (0.0022) model time 0.4022 (0.4060) loss 9.1042 (7.8229) grad_norm 1.7572 (1.9492) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:50:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][580/625] eta 
0:00:18 lr 0.001045 wd 0.0500 time 0.3998 (0.4082) data time 0.0008 (0.0021) model time 0.3990 (0.4068) loss 8.3436 (7.8302) grad_norm 1.7452 (1.9501) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 21:51:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][590/625] eta 0:00:14 lr 0.001045 wd 0.0500 time 0.4013 (0.4081) data time 0.0008 (0.0021) model time 0.4005 (0.4067) loss 9.3601 (7.8332) grad_norm 1.9617 (inf) loss_scale 2048.0000 (4071.7428) mem 14939MB [2024-07-24 21:51:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][600/625] eta 0:00:10 lr 0.001045 wd 0.0500 time 0.4030 (0.4079) data time 0.0008 (0.0021) model time 0.4022 (0.4065) loss 7.7687 (7.8273) grad_norm 1.9849 (inf) loss_scale 2048.0000 (4038.0699) mem 14939MB [2024-07-24 21:51:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][610/625] eta 0:00:06 lr 0.001045 wd 0.0500 time 0.3982 (0.4078) data time 0.0006 (0.0021) model time 0.3976 (0.4064) loss 8.2963 (7.8242) grad_norm 1.5472 (inf) loss_scale 2048.0000 (4005.4992) mem 14939MB [2024-07-24 21:51:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][620/625] eta 0:00:02 lr 0.001044 wd 0.0500 time 0.3947 (0.4078) data time 0.0004 (0.0021) model time 0.3944 (0.4064) loss 9.1829 (7.8195) grad_norm 3.1478 (inf) loss_scale 2048.0000 (3973.9775) mem 14939MB [2024-07-24 21:51:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 85 training takes 0:04:14 [2024-07-24 21:51:17 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 21:51:18 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 21:51:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.436 (0.436) Loss 0.6338 (0.6338) Acc@1 87.402 (87.402) Acc@5 98.193 (98.193) Mem 14939MB [2024-07-24 21:51:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.118) Loss 1.0449 (0.7878) Acc@1 76.318 (83.057) Acc@5 94.043 (96.813) Mem 14939MB [2024-07-24 21:51:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.103) Loss 1.1504 (0.9355) Acc@1 72.852 (79.329) Acc@5 92.578 (95.017) Mem 14939MB [2024-07-24 21:51:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.007 Acc@5 94.974 [2024-07-24 21:51:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.0% [2024-07-24 21:51:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.794 (0.794) Loss 0.6401 (0.6401) Acc@1 87.109 (87.109) Acc@5 98.047 (98.047) Mem 14939MB [2024-07-24 21:51:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.153) Loss 1.0254 (0.7837) Acc@1 77.637 (83.900) Acc@5 94.629 (96.986) Mem 14939MB [2024-07-24 21:51:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.121) Loss 1.1807 (0.9307) Acc@1 72.021 (80.043) Acc@5 93.262 (95.287) Mem 14939MB [2024-07-24 21:51:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.730 Acc@5 95.264 [2024-07-24 21:51:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.7% [2024-07-24 21:51:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.73% [2024-07-24 21:51:23 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-24 21:51:24 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 21:51:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][0/625] eta 0:14:29 lr 0.001044 wd 0.0500 time 1.3912 (1.3912) data time 1.0212 (1.0212) model time 0.0000 (0.0000) loss 9.0115 (9.0115) grad_norm 2.0586 (2.0586) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:51:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][10/625] eta 0:05:00 lr 0.001044 wd 0.0500 time 0.3968 (0.4880) data time 0.0006 (0.0936) model time 0.0000 (0.0000) loss 8.7595 (8.3682) grad_norm 2.3221 (2.2051) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:51:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][20/625] eta 0:04:30 lr 0.001044 wd 0.0500 time 0.3934 (0.4471) data time 0.0010 (0.0495) model time 0.0000 (0.0000) loss 7.1198 (7.9731) grad_norm 4.0577 (2.3044) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:51:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][30/625] eta 0:04:16 lr 0.001044 wd 0.0500 time 0.3991 (0.4313) data time 0.0009 (0.0338) model time 0.0000 (0.0000) loss 8.9491 (7.8916) grad_norm 1.9692 (2.1799) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:51:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][40/625] eta 0:04:07 lr 0.001044 wd 0.0500 time 0.3965 (0.4232) data time 0.0008 (0.0258) model time 0.0000 (0.0000) loss 6.3118 (7.7458) grad_norm 1.8660 (2.1335) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:51:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][50/625] eta 0:04:00 lr 0.001044 wd 0.0500 time 0.3973 (0.4183) data time 0.0006 (0.0209) model time 0.0000 (0.0000) loss 7.6952 (7.6592) grad_norm 1.3735 (2.0192) loss_scale 2048.0000 (2048.0000) mem 
14939MB [2024-07-24 21:51:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][60/625] eta 0:03:54 lr 0.001044 wd 0.0500 time 0.4088 (0.4151) data time 0.0008 (0.0176) model time 0.4080 (0.3977) loss 8.2736 (7.6787) grad_norm 1.9246 (1.9491) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:51:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][70/625] eta 0:03:49 lr 0.001044 wd 0.0500 time 0.3973 (0.4127) data time 0.0006 (0.0153) model time 0.3966 (0.3975) loss 8.0087 (7.6499) grad_norm 1.7153 (1.9489) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:51:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][80/625] eta 0:03:43 lr 0.001044 wd 0.0500 time 0.3972 (0.4109) data time 0.0006 (0.0135) model time 0.3967 (0.3973) loss 7.3640 (7.6267) grad_norm 2.2876 (2.0314) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:52:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][90/625] eta 0:03:39 lr 0.001044 wd 0.0500 time 0.3968 (0.4095) data time 0.0007 (0.0121) model time 0.3961 (0.3973) loss 8.1968 (7.6432) grad_norm 1.5030 (2.0138) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:52:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][100/625] eta 0:03:34 lr 0.001044 wd 0.0500 time 0.4013 (0.4084) data time 0.0009 (0.0110) model time 0.4004 (0.3974) loss 7.8483 (7.6428) grad_norm 1.5622 (1.9943) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:52:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][110/625] eta 0:03:29 lr 0.001044 wd 0.0500 time 0.3998 (0.4075) data time 0.0007 (0.0101) model time 0.3991 (0.3975) loss 8.6582 (7.6326) grad_norm 1.9038 (1.9628) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:52:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][120/625] eta 0:03:25 lr 0.001044 wd 0.0500 time 
0.4004 (0.4069) data time 0.0007 (0.0093) model time 0.3997 (0.3978) loss 6.5288 (7.6149) grad_norm 1.3670 (1.9965) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:52:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][130/625] eta 0:03:21 lr 0.001044 wd 0.0500 time 0.3957 (0.4061) data time 0.0006 (0.0087) model time 0.3952 (0.3975) loss 8.2710 (7.6335) grad_norm 2.4272 (2.0030) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:52:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][140/625] eta 0:03:16 lr 0.001043 wd 0.0500 time 0.4005 (0.4060) data time 0.0007 (0.0081) model time 0.3998 (0.3981) loss 7.1491 (7.6627) grad_norm 2.7627 (2.0327) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:52:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][150/625] eta 0:03:13 lr 0.001043 wd 0.0500 time 0.4020 (0.4066) data time 0.0008 (0.0077) model time 0.4012 (0.3997) loss 8.5457 (7.6875) grad_norm 2.0652 (2.0397) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:52:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][160/625] eta 0:03:08 lr 0.001043 wd 0.0500 time 0.3970 (0.4060) data time 0.0006 (0.0072) model time 0.3964 (0.3994) loss 6.9536 (7.7141) grad_norm 2.0682 (2.0405) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:52:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][170/625] eta 0:03:05 lr 0.001043 wd 0.0500 time 0.4001 (0.4067) data time 0.0008 (0.0069) model time 0.3993 (0.4008) loss 7.8701 (7.7242) grad_norm 1.5494 (2.0293) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:52:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][180/625] eta 0:03:01 lr 0.001043 wd 0.0500 time 0.3999 (0.4087) data time 0.0005 (0.0065) model time 0.3994 (0.4041) loss 6.6413 (7.7317) grad_norm 1.6608 (2.0409) loss_scale 2048.0000 (2048.0000) 
mem 14939MB [2024-07-24 21:52:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][190/625] eta 0:02:57 lr 0.001043 wd 0.0500 time 0.4013 (0.4082) data time 0.0006 (0.0062) model time 0.4007 (0.4036) loss 7.6375 (7.7152) grad_norm 1.3368 (2.0401) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:52:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][200/625] eta 0:02:53 lr 0.001043 wd 0.0500 time 0.3963 (0.4076) data time 0.0007 (0.0060) model time 0.3956 (0.4031) loss 7.4915 (7.7287) grad_norm 1.9630 (2.0333) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:52:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][210/625] eta 0:02:48 lr 0.001043 wd 0.0500 time 0.4009 (0.4072) data time 0.0007 (0.0057) model time 0.4002 (0.4027) loss 7.6669 (7.7174) grad_norm 3.5869 (2.0459) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:52:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][220/625] eta 0:02:44 lr 0.001043 wd 0.0500 time 0.3998 (0.4069) data time 0.0005 (0.0055) model time 0.3993 (0.4025) loss 9.1065 (7.7373) grad_norm 1.6584 (2.0307) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:52:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][230/625] eta 0:02:40 lr 0.001043 wd 0.0500 time 0.3989 (0.4065) data time 0.0007 (0.0053) model time 0.3982 (0.4023) loss 8.8965 (7.7326) grad_norm 1.2966 (2.0252) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:53:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][240/625] eta 0:02:36 lr 0.001043 wd 0.0500 time 0.3984 (0.4061) data time 0.0008 (0.0051) model time 0.3976 (0.4020) loss 9.4442 (7.7480) grad_norm 1.5431 (2.0299) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:53:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][250/625] eta 0:02:32 lr 0.001043 wd 
0.0500 time 0.3959 (0.4059) data time 0.0006 (0.0049) model time 0.3953 (0.4019) loss 7.8680 (7.7507) grad_norm 1.4974 (2.0225) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:53:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][260/625] eta 0:02:28 lr 0.001043 wd 0.0500 time 0.3973 (0.4056) data time 0.0007 (0.0048) model time 0.3965 (0.4016) loss 7.7969 (7.7589) grad_norm 1.5220 (2.0110) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:53:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][270/625] eta 0:02:23 lr 0.001042 wd 0.0500 time 0.3952 (0.4053) data time 0.0006 (0.0046) model time 0.3946 (0.4014) loss 7.6442 (7.7572) grad_norm 1.4727 (1.9999) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:53:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][280/625] eta 0:02:19 lr 0.001042 wd 0.0500 time 0.3955 (0.4050) data time 0.0008 (0.0045) model time 0.3947 (0.4012) loss 7.7442 (7.7612) grad_norm 2.3462 (1.9913) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:53:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][290/625] eta 0:02:15 lr 0.001042 wd 0.0500 time 0.3956 (0.4047) data time 0.0006 (0.0044) model time 0.3951 (0.4010) loss 7.8020 (7.7631) grad_norm 2.3988 (1.9946) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:53:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][300/625] eta 0:02:11 lr 0.001042 wd 0.0500 time 0.4058 (0.4047) data time 0.0007 (0.0043) model time 0.4051 (0.4011) loss 6.2632 (7.7555) grad_norm 1.9997 (2.0009) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:53:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][310/625] eta 0:02:07 lr 0.001042 wd 0.0500 time 0.3968 (0.4045) data time 0.0009 (0.0041) model time 0.3959 (0.4009) loss 9.2530 (7.7608) grad_norm 3.0151 (2.0226) loss_scale 2048.0000 
(2048.0000) mem 14939MB [2024-07-24 21:53:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][320/625] eta 0:02:03 lr 0.001042 wd 0.0500 time 0.3949 (0.4043) data time 0.0008 (0.0040) model time 0.3940 (0.4008) loss 7.3705 (7.7799) grad_norm 2.4930 (2.0289) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:53:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][330/625] eta 0:01:59 lr 0.001042 wd 0.0500 time 0.4007 (0.4041) data time 0.0007 (0.0039) model time 0.3999 (0.4007) loss 5.9086 (7.7671) grad_norm 1.3741 (2.0346) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:53:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][340/625] eta 0:01:55 lr 0.001042 wd 0.0500 time 0.4024 (0.4039) data time 0.0008 (0.0039) model time 0.4016 (0.4006) loss 7.9071 (7.7697) grad_norm 1.6171 (2.0352) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:53:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][350/625] eta 0:01:51 lr 0.001042 wd 0.0500 time 0.3985 (0.4038) data time 0.0006 (0.0038) model time 0.3979 (0.4004) loss 6.3566 (7.7557) grad_norm 1.8058 (2.0224) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:53:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][360/625] eta 0:01:46 lr 0.001042 wd 0.0500 time 0.3968 (0.4036) data time 0.0008 (0.0037) model time 0.3960 (0.4003) loss 9.5919 (7.7524) grad_norm 2.0812 (2.0166) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:53:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][370/625] eta 0:01:42 lr 0.001042 wd 0.0500 time 0.4003 (0.4037) data time 0.0008 (0.0036) model time 0.3996 (0.4005) loss 7.1353 (7.7524) grad_norm 1.4910 (2.0063) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:53:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][380/625] eta 0:01:38 lr 
0.001042 wd 0.0500 time 0.4031 (0.4036) data time 0.0006 (0.0035) model time 0.4025 (0.4004) loss 7.1636 (7.7539) grad_norm 1.5412 (2.0107) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:54:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][390/625] eta 0:01:35 lr 0.001042 wd 0.0500 time 0.6076 (0.4045) data time 0.0005 (0.0035) model time 0.6071 (0.4016) loss 9.8194 (7.7610) grad_norm 1.7103 (2.0154) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:54:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][400/625] eta 0:01:31 lr 0.001042 wd 0.0500 time 0.3947 (0.4059) data time 0.0008 (0.0034) model time 0.3939 (0.4032) loss 9.2101 (7.7662) grad_norm 1.2975 (2.0062) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:54:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][410/625] eta 0:01:27 lr 0.001041 wd 0.0500 time 0.3956 (0.4056) data time 0.0007 (0.0033) model time 0.3948 (0.4030) loss 6.3763 (7.7607) grad_norm 1.5529 (2.0005) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:54:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][420/625] eta 0:01:23 lr 0.001041 wd 0.0500 time 0.4151 (0.4055) data time 0.0006 (0.0033) model time 0.4144 (0.4029) loss 6.7249 (7.7582) grad_norm 1.9836 (1.9960) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:54:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][430/625] eta 0:01:19 lr 0.001041 wd 0.0500 time 0.3975 (0.4054) data time 0.0006 (0.0032) model time 0.3969 (0.4028) loss 7.1437 (7.7559) grad_norm 2.1883 (1.9960) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:54:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][440/625] eta 0:01:14 lr 0.001041 wd 0.0500 time 0.3964 (0.4052) data time 0.0008 (0.0032) model time 0.3956 (0.4026) loss 7.1053 (7.7583) grad_norm 1.6102 (1.9899) loss_scale 
2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:54:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][450/625] eta 0:01:10 lr 0.001041 wd 0.0500 time 0.3979 (0.4050) data time 0.0009 (0.0031) model time 0.3970 (0.4025) loss 8.9493 (7.7694) grad_norm 1.9743 (1.9842) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 21:54:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-24 21:54:29 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 21:54:30 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 22:14:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-24 22:14:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-24 22:14:25 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-24 22:14:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-24 22:14:34 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... 
[2024-07-24 22:14:34 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-24 22:14:34 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-24 22:14:34 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 86) [2024-07-24 22:14:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-24 22:14:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][460/625] eta 0:08:44 lr 0.001041 wd 0.0500 time 0.3962 (3.1787) data time 0.0007 (0.2316) model time 0.3955 (2.9471) loss 5.9631 (7.7518) grad_norm 1.5857 (1.9156) loss_scale 2048.0000 (2048.0000) mem 14931MB [2024-07-24 22:14:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][470/625] eta 0:02:41 lr 0.001041 wd 0.0500 time 0.3954 (1.0391) data time 0.0008 (0.0542) model time 0.3946 (0.9849) loss 8.9186 (8.0696) grad_norm 2.3684 (2.3411) loss_scale 2048.0000 (2048.0000) mem 14931MB [2024-07-24 22:14:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][480/625] eta 0:01:50 lr 0.001041 wd 0.0500 time 0.3992 (0.7592) data time 0.0007 (0.0311) model time 0.3985 (0.7282) loss 8.8363 (8.1795) grad_norm 1.6082 (2.2585) loss_scale 2048.0000 (2048.0000) mem 14931MB [2024-07-24 22:15:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][490/625] eta 0:01:27 lr 0.001041 wd 0.0500 time 0.3939 (0.6486) data time 0.0008 (0.0219) model time 0.3932 (0.6266) loss 8.6962 (8.1455) grad_norm 1.7819 (2.1349) loss_scale 2048.0000 (2048.0000) mem 14931MB [2024-07-24 22:15:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][500/625] eta 0:01:13 lr 0.001041 wd 0.0500 time 0.3939 (0.5898) data time 0.0009 (0.0171) model time 0.3929 (0.5727) loss 8.2430 (8.0689) grad_norm 1.4905 (2.0834) loss_scale 2048.0000 (2048.0000) mem 
14931MB [2024-07-24 22:15:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][510/625] eta 0:01:04 lr 0.001041 wd 0.0500 time 0.3946 (0.5574) data time 0.0008 (0.0141) model time 0.3938 (0.5434) loss 8.2551 (8.0730) grad_norm 1.4038 (1.9891) loss_scale 2048.0000 (2048.0000) mem 14931MB [2024-07-24 22:15:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][520/625] eta 0:00:56 lr 0.001041 wd 0.0500 time 0.3954 (0.5357) data time 0.0010 (0.0120) model time 0.3944 (0.5237) loss 6.9612 (7.9873) grad_norm 1.6168 (1.9571) loss_scale 2048.0000 (2048.0000) mem 14931MB [2024-07-24 22:15:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][530/625] eta 0:00:49 lr 0.001041 wd 0.0500 time 0.3964 (0.5164) data time 0.0010 (0.0105) model time 0.3954 (0.5059) loss 9.0073 (7.9514) grad_norm 2.8654 (1.9951) loss_scale 2048.0000 (2048.0000) mem 14931MB [2024-07-24 22:15:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][540/625] eta 0:00:42 lr 0.001041 wd 0.0500 time 0.4022 (0.5020) data time 0.0007 (0.0093) model time 0.4016 (0.4926) loss 5.9705 (7.9339) grad_norm 1.6621 (2.0121) loss_scale 2048.0000 (2048.0000) mem 14931MB [2024-07-24 22:15:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][550/625] eta 0:00:36 lr 0.001040 wd 0.0500 time 0.3944 (0.4906) data time 0.0007 (0.0084) model time 0.3937 (0.4822) loss 8.9428 (7.9086) grad_norm 3.4414 (2.0478) loss_scale 2048.0000 (2048.0000) mem 14931MB [2024-07-24 22:15:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][560/625] eta 0:00:31 lr 0.001040 wd 0.0500 time 0.3972 (0.4814) data time 0.0007 (0.0077) model time 0.3966 (0.4737) loss 9.6435 (7.9765) grad_norm 2.2748 (2.0321) loss_scale 2048.0000 (2048.0000) mem 14931MB [2024-07-24 22:15:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][570/625] eta 0:00:26 lr 0.001040 wd 0.0500 
time 0.3962 (0.4740) data time 0.0007 (0.0071) model time 0.3955 (0.4669) loss 7.5133 (7.9481) grad_norm 2.4569 (2.0281) loss_scale 2048.0000 (2048.0000) mem 14931MB [2024-07-24 22:15:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][580/625] eta 0:00:21 lr 0.001040 wd 0.0500 time 0.3964 (0.4678) data time 0.0009 (0.0066) model time 0.3955 (0.4612) loss 8.4029 (7.9434) grad_norm 2.0603 (2.0102) loss_scale 2048.0000 (2048.0000) mem 14931MB [2024-07-24 22:15:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][590/625] eta 0:00:16 lr 0.001040 wd 0.0500 time 0.4112 (0.4626) data time 0.0009 (0.0061) model time 0.4102 (0.4565) loss 8.3076 (7.9294) grad_norm 1.5729 (1.9859) loss_scale 2048.0000 (2048.0000) mem 14931MB [2024-07-24 22:15:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][600/625] eta 0:00:11 lr 0.001040 wd 0.0500 time 0.3940 (0.4582) data time 0.0009 (0.0058) model time 0.3931 (0.4524) loss 9.1970 (7.9309) grad_norm 1.8294 (1.9995) loss_scale 2048.0000 (2048.0000) mem 14931MB [2024-07-24 22:15:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][610/625] eta 0:00:06 lr 0.001040 wd 0.0500 time 0.3986 (0.4545) data time 0.0004 (0.0055) model time 0.3982 (0.4490) loss 7.5314 (7.9227) grad_norm 2.0498 (2.0160) loss_scale 2048.0000 (2048.0000) mem 14931MB [2024-07-24 22:15:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][620/625] eta 0:00:02 lr 0.001040 wd 0.0500 time 0.4107 (0.4511) data time 0.0004 (0.0052) model time 0.4103 (0.4460) loss 7.2352 (7.9253) grad_norm 1.2724 (1.9980) loss_scale 2048.0000 (2048.0000) mem 14931MB [2024-07-24 22:15:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 86 training takes 0:01:15 [2024-07-24 22:15:54 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-24 22:15:57 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 22:15:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.438 (0.438) Loss 0.6479 (0.6479) Acc@1 87.305 (87.305) Acc@5 97.949 (97.949) Mem 14931MB [2024-07-24 22:15:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.118) Loss 1.1016 (0.8126) Acc@1 76.562 (83.021) Acc@5 93.408 (96.529) Mem 14931MB [2024-07-24 22:16:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.2549 (0.9741) Acc@1 71.582 (79.097) Acc@5 91.455 (94.717) Mem 14931MB [2024-07-24 22:16:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.741 Acc@5 94.740 [2024-07-24 22:16:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.7% [2024-07-24 22:16:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.490 (1.490) Loss 0.6377 (0.6377) Acc@1 87.109 (87.109) Acc@5 98.096 (98.096) Mem 14931MB [2024-07-24 22:16:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.219) Loss 1.0215 (0.7818) Acc@1 77.832 (83.953) Acc@5 94.580 (96.995) Mem 14931MB [2024-07-24 22:16:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.155) Loss 1.1748 (0.9279) Acc@1 72.070 (80.078) Acc@5 93.164 (95.306) Mem 14931MB [2024-07-24 22:16:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.764 Acc@5 95.282 [2024-07-24 22:16:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.8% [2024-07-24 22:16:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.76% [2024-07-24 22:16:05 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 22:16:09 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 22:16:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][0/625] eta 0:10:59 lr 0.001040 wd 0.0500 time 1.0558 (1.0558) data time 0.4421 (0.4421) model time 0.0000 (0.0000) loss 7.9970 (7.9970) grad_norm 2.1906 (2.1906) loss_scale 2048.0000 (2048.0000) mem 14938MB [2024-07-24 22:16:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][10/625] eta 0:04:40 lr 0.001040 wd 0.0500 time 0.3964 (0.4560) data time 0.0006 (0.0410) model time 0.0000 (0.0000) loss 7.4557 (7.3249) grad_norm 1.6178 (1.9399) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:16:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][20/625] eta 0:04:19 lr 0.001040 wd 0.0500 time 0.3962 (0.4285) data time 0.0007 (0.0219) model time 0.0000 (0.0000) loss 8.4345 (7.5190) grad_norm 2.4293 (1.9061) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:16:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][30/625] eta 0:04:08 lr 0.001040 wd 0.0500 time 0.4013 (0.4185) data time 0.0008 (0.0152) model time 0.0000 (0.0000) loss 7.0962 (7.4917) grad_norm 1.4298 (2.0552) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:16:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][40/625] eta 0:04:04 lr 0.001040 wd 0.0500 time 0.5523 (0.4172) data time 0.0009 (0.0117) model time 0.0000 (0.0000) loss 7.3031 (7.4451) grad_norm 1.9313 (1.9532) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:16:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][50/625] eta 0:03:57 lr 0.001040 wd 0.0500 time 0.3971 (0.4129) data time 0.0008 (0.0096) model 
time 0.0000 (0.0000) loss 6.8238 (7.4087) grad_norm 2.0561 (1.9545) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:16:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][60/625] eta 0:03:51 lr 0.001039 wd 0.0500 time 0.3964 (0.4104) data time 0.0009 (0.0082) model time 0.3956 (0.3964) loss 7.8641 (7.5036) grad_norm 1.9566 (1.9435) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:16:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][70/625] eta 0:03:46 lr 0.001039 wd 0.0500 time 0.3968 (0.4087) data time 0.0006 (0.0071) model time 0.3962 (0.3971) loss 8.7640 (7.5473) grad_norm 1.3359 (1.9378) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:16:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][80/625] eta 0:03:42 lr 0.001039 wd 0.0500 time 0.3968 (0.4074) data time 0.0009 (0.0064) model time 0.3959 (0.3972) loss 6.3182 (7.5485) grad_norm 1.5510 (1.9168) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:16:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][90/625] eta 0:03:37 lr 0.001039 wd 0.0500 time 0.3980 (0.4065) data time 0.0009 (0.0058) model time 0.3971 (0.3974) loss 7.9051 (7.5432) grad_norm 2.0556 (1.9219) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:16:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][100/625] eta 0:03:33 lr 0.001039 wd 0.0500 time 0.3989 (0.4058) data time 0.0007 (0.0053) model time 0.3982 (0.3976) loss 8.5185 (7.5626) grad_norm 1.8811 (1.9315) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:16:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][110/625] eta 0:03:29 lr 0.001039 wd 0.0500 time 0.3987 (0.4072) data time 0.0009 (0.0049) model time 0.3978 (0.4013) loss 6.4467 (7.6121) grad_norm 2.3641 (1.9394) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:16:59 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][120/625] eta 0:03:25 lr 0.001039 wd 0.0500 time 0.3970 (0.4064) data time 0.0007 (0.0046) model time 0.3963 (0.4007) loss 9.1359 (7.6486) grad_norm 1.5094 (1.9507) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:17:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][130/625] eta 0:03:20 lr 0.001039 wd 0.0500 time 0.3993 (0.4059) data time 0.0009 (0.0043) model time 0.3984 (0.4004) loss 6.8178 (7.6072) grad_norm 1.9279 (1.9407) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:17:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][140/625] eta 0:03:16 lr 0.001039 wd 0.0500 time 0.3973 (0.4055) data time 0.0007 (0.0041) model time 0.3966 (0.4003) loss 7.2242 (7.6215) grad_norm 3.7065 (2.0037) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:17:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][150/625] eta 0:03:12 lr 0.001039 wd 0.0500 time 0.3977 (0.4050) data time 0.0009 (0.0039) model time 0.3969 (0.4000) loss 7.5465 (7.6726) grad_norm 2.0427 (2.0403) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:17:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][160/625] eta 0:03:08 lr 0.001039 wd 0.0500 time 0.4043 (0.4046) data time 0.0005 (0.0037) model time 0.4038 (0.3998) loss 8.4570 (7.7253) grad_norm 1.9013 (2.0276) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:17:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][170/625] eta 0:03:03 lr 0.001039 wd 0.0500 time 0.3957 (0.4042) data time 0.0009 (0.0035) model time 0.3949 (0.3996) loss 8.9353 (7.7270) grad_norm 2.2914 (2.0250) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:17:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][180/625] eta 0:02:59 lr 0.001039 wd 0.0500 time 0.3976 (0.4039) data time 
0.0007 (0.0034) model time 0.3969 (0.3994) loss 7.3989 (7.7329) grad_norm 1.6725 (2.0113) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:17:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][190/625] eta 0:02:55 lr 0.001039 wd 0.0500 time 0.4045 (0.4037) data time 0.0009 (0.0033) model time 0.4036 (0.3993) loss 8.3405 (7.7279) grad_norm 1.5355 (2.0152) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:17:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][200/625] eta 0:02:51 lr 0.001038 wd 0.0500 time 0.4003 (0.4034) data time 0.0006 (0.0031) model time 0.3997 (0.3993) loss 6.7545 (7.7187) grad_norm 1.7540 (2.0351) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:17:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][210/625] eta 0:02:47 lr 0.001038 wd 0.0500 time 0.3985 (0.4032) data time 0.0008 (0.0030) model time 0.3978 (0.3992) loss 8.5030 (7.7276) grad_norm 2.6316 (2.0306) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:17:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][220/625] eta 0:02:43 lr 0.001038 wd 0.0500 time 0.3969 (0.4031) data time 0.0007 (0.0029) model time 0.3962 (0.3992) loss 6.4214 (7.7115) grad_norm 1.4904 (2.0183) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:17:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][230/625] eta 0:02:39 lr 0.001038 wd 0.0500 time 0.4006 (0.4029) data time 0.0006 (0.0029) model time 0.3999 (0.3992) loss 7.8591 (7.7173) grad_norm 2.4656 (2.0309) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:17:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][240/625] eta 0:02:35 lr 0.001038 wd 0.0500 time 0.3987 (0.4028) data time 0.0009 (0.0028) model time 0.3978 (0.3991) loss 8.7115 (7.7351) grad_norm 1.3247 (2.0382) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 
22:17:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][250/625] eta 0:02:30 lr 0.001038 wd 0.0500 time 0.3970 (0.4026) data time 0.0008 (0.0027) model time 0.3962 (0.3990) loss 8.7903 (7.7299) grad_norm 1.3791 (2.0347) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:17:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][260/625] eta 0:02:26 lr 0.001038 wd 0.0500 time 0.3951 (0.4024) data time 0.0007 (0.0026) model time 0.3944 (0.3989) loss 8.2124 (7.7457) grad_norm 2.2181 (2.0279) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:17:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][270/625] eta 0:02:23 lr 0.001038 wd 0.0500 time 0.4005 (0.4030) data time 0.0010 (0.0026) model time 0.3994 (0.3998) loss 8.2963 (7.7540) grad_norm 2.1078 (2.0207) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:18:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][280/625] eta 0:02:18 lr 0.001038 wd 0.0500 time 0.3987 (0.4028) data time 0.0008 (0.0025) model time 0.3979 (0.3996) loss 6.9733 (7.7527) grad_norm 1.3547 (2.0105) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:18:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][290/625] eta 0:02:14 lr 0.001038 wd 0.0500 time 0.3976 (0.4026) data time 0.0006 (0.0025) model time 0.3970 (0.3995) loss 7.1202 (7.7372) grad_norm 2.1221 (2.0034) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:18:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][300/625] eta 0:02:10 lr 0.001038 wd 0.0500 time 0.3950 (0.4025) data time 0.0009 (0.0024) model time 0.3941 (0.3994) loss 6.4077 (7.7245) grad_norm 1.5359 (1.9963) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:18:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][310/625] eta 0:02:06 lr 0.001038 wd 0.0500 time 0.3970 (0.4024) 
data time 0.0009 (0.0024) model time 0.3961 (0.3994) loss 6.6811 (7.7180) grad_norm 1.6027 (2.0012) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:18:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][320/625] eta 0:02:02 lr 0.001038 wd 0.0500 time 0.3985 (0.4022) data time 0.0009 (0.0023) model time 0.3976 (0.3993) loss 7.9781 (7.7326) grad_norm 1.8879 (2.0045) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:18:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][330/625] eta 0:01:58 lr 0.001038 wd 0.0500 time 0.3981 (0.4026) data time 0.0006 (0.0023) model time 0.3975 (0.3997) loss 7.8672 (7.7338) grad_norm 1.7821 (1.9955) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:18:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][340/625] eta 0:01:54 lr 0.001037 wd 0.0500 time 0.3973 (0.4025) data time 0.0007 (0.0022) model time 0.3966 (0.3998) loss 8.7595 (7.7354) grad_norm 2.3699 (1.9944) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:18:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][350/625] eta 0:01:50 lr 0.001037 wd 0.0500 time 0.3986 (0.4024) data time 0.0008 (0.0022) model time 0.3978 (0.3997) loss 8.1134 (7.7491) grad_norm 1.6734 (1.9926) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:18:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][360/625] eta 0:01:46 lr 0.001037 wd 0.0500 time 0.3988 (0.4023) data time 0.0006 (0.0022) model time 0.3981 (0.3996) loss 6.7070 (7.7398) grad_norm 3.1092 (2.0057) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:18:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][370/625] eta 0:01:42 lr 0.001037 wd 0.0500 time 0.4310 (0.4023) data time 0.0007 (0.0021) model time 0.4303 (0.3997) loss 9.0890 (7.7352) grad_norm 1.3966 (2.0021) loss_scale 2048.0000 (2048.0000) mem 14939MB 
[2024-07-24 22:18:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][380/625] eta 0:01:38 lr 0.001037 wd 0.0500 time 0.3996 (0.4023) data time 0.0007 (0.0022) model time 0.3989 (0.3996) loss 7.4231 (7.7334) grad_norm 2.1516 (2.0061) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:18:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][390/625] eta 0:01:34 lr 0.001037 wd 0.0500 time 0.4000 (0.4022) data time 0.0009 (0.0022) model time 0.3991 (0.3995) loss 8.5690 (7.7484) grad_norm 4.8092 (2.0132) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:18:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][400/625] eta 0:01:30 lr 0.001037 wd 0.0500 time 0.4007 (0.4021) data time 0.0007 (0.0021) model time 0.4000 (0.3995) loss 8.3301 (7.7576) grad_norm 2.1704 (2.0348) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:18:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][410/625] eta 0:01:26 lr 0.001037 wd 0.0500 time 0.3993 (0.4021) data time 0.0011 (0.0021) model time 0.3983 (0.3995) loss 9.2064 (7.7630) grad_norm 1.8766 (2.0300) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:18:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][420/625] eta 0:01:22 lr 0.001037 wd 0.0500 time 0.3970 (0.4020) data time 0.0009 (0.0021) model time 0.3961 (0.3994) loss 8.7191 (7.7741) grad_norm 1.7039 (2.0318) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:19:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][430/625] eta 0:01:18 lr 0.001037 wd 0.0500 time 0.4012 (0.4019) data time 0.0008 (0.0021) model time 0.4004 (0.3994) loss 6.4332 (7.7695) grad_norm 2.1225 (2.0459) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:19:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][440/625] eta 0:01:14 lr 0.001037 wd 0.0500 time 
0.3946 (0.4018) data time 0.0009 (0.0020) model time 0.3937 (0.3993) loss 7.1004 (7.7653) grad_norm 2.5766 (2.0473) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:19:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][450/625] eta 0:01:10 lr 0.001037 wd 0.0500 time 0.3977 (0.4018) data time 0.0011 (0.0020) model time 0.3966 (0.3993) loss 7.3433 (7.7665) grad_norm 1.3922 (2.0418) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:19:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][460/625] eta 0:01:06 lr 0.001037 wd 0.0500 time 0.4006 (0.4017) data time 0.0008 (0.0020) model time 0.3998 (0.3993) loss 6.8824 (7.7689) grad_norm 2.7228 (2.0349) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:19:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][470/625] eta 0:01:02 lr 0.001036 wd 0.0500 time 0.4012 (0.4017) data time 0.0010 (0.0020) model time 0.4002 (0.3993) loss 7.8413 (7.7761) grad_norm 1.3687 (2.0256) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:19:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][480/625] eta 0:00:58 lr 0.001036 wd 0.0500 time 0.3785 (0.4020) data time 0.0009 (0.0019) model time 0.3776 (0.3996) loss 7.8553 (7.7720) grad_norm 3.3009 (2.0282) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:19:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][490/625] eta 0:00:54 lr 0.001036 wd 0.0500 time 0.3986 (0.4019) data time 0.0006 (0.0019) model time 0.3980 (0.3996) loss 6.6184 (7.7638) grad_norm 3.1972 (2.0260) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:19:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][500/625] eta 0:00:50 lr 0.001036 wd 0.0500 time 0.3971 (0.4018) data time 0.0006 (0.0019) model time 0.3965 (0.3995) loss 7.7498 (7.7613) grad_norm 1.4205 (2.0257) loss_scale 2048.0000 (2048.0000) 
mem 14939MB [2024-07-24 22:19:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][510/625] eta 0:00:46 lr 0.001036 wd 0.0500 time 0.3983 (0.4018) data time 0.0008 (0.0019) model time 0.3975 (0.3995) loss 8.2486 (7.7637) grad_norm 2.1305 (2.0281) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:19:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][520/625] eta 0:00:42 lr 0.001036 wd 0.0500 time 0.3981 (0.4017) data time 0.0007 (0.0019) model time 0.3974 (0.3995) loss 6.5887 (7.7579) grad_norm 1.4227 (2.0192) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:19:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][530/625] eta 0:00:38 lr 0.001036 wd 0.0500 time 0.4063 (0.4017) data time 0.0007 (0.0018) model time 0.4056 (0.3995) loss 8.1884 (7.7458) grad_norm 1.4807 (2.0135) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:19:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][540/625] eta 0:00:34 lr 0.001036 wd 0.0500 time 0.3966 (0.4017) data time 0.0008 (0.0018) model time 0.3957 (0.3995) loss 7.8010 (7.7475) grad_norm 2.5430 (2.0068) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:19:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][550/625] eta 0:00:30 lr 0.001036 wd 0.0500 time 0.3983 (0.4019) data time 0.0006 (0.0018) model time 0.3977 (0.3998) loss 8.5140 (7.7401) grad_norm 1.9534 (2.0033) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:19:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][560/625] eta 0:00:26 lr 0.001036 wd 0.0500 time 0.3999 (0.4019) data time 0.0008 (0.0018) model time 0.3991 (0.3997) loss 7.8294 (7.7344) grad_norm 2.1548 (1.9997) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:19:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][570/625] eta 0:00:22 lr 0.001036 wd 
0.0500 time 0.3974 (0.4018) data time 0.0008 (0.0018) model time 0.3966 (0.3997) loss 8.6882 (7.7472) grad_norm 1.7946 (1.9990) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:20:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][580/625] eta 0:00:18 lr 0.001036 wd 0.0500 time 0.3972 (0.4018) data time 0.0007 (0.0018) model time 0.3966 (0.3997) loss 6.1529 (7.7459) grad_norm 1.5758 (1.9927) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:20:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][590/625] eta 0:00:14 lr 0.001036 wd 0.0500 time 0.4007 (0.4017) data time 0.0006 (0.0018) model time 0.4001 (0.3996) loss 8.3019 (7.7456) grad_norm 2.4187 (2.0003) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:20:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][600/625] eta 0:00:10 lr 0.001036 wd 0.0500 time 0.3983 (0.4016) data time 0.0009 (0.0017) model time 0.3974 (0.3996) loss 7.9117 (7.7497) grad_norm 1.8623 (1.9964) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:20:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][610/625] eta 0:00:06 lr 0.001035 wd 0.0500 time 0.3998 (0.4016) data time 0.0004 (0.0017) model time 0.3994 (0.3995) loss 9.8456 (7.7575) grad_norm 1.8394 (1.9942) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:20:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][620/625] eta 0:00:02 lr 0.001035 wd 0.0500 time 0.3983 (0.4015) data time 0.0004 (0.0017) model time 0.3979 (0.3995) loss 7.0016 (7.7545) grad_norm 1.8817 (1.9937) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:20:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 87 training takes 0:04:10 [2024-07-24 22:20:20 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-24 22:20:21 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 22:20:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.454 (0.454) Loss 0.6445 (0.6445) Acc@1 86.816 (86.816) Acc@5 98.291 (98.291) Mem 14939MB [2024-07-24 22:20:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 1.0566 (0.8037) Acc@1 77.002 (83.461) Acc@5 94.336 (96.826) Mem 14939MB [2024-07-24 22:20:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.1582 (0.9513) Acc@1 73.535 (79.641) Acc@5 93.115 (95.045) Mem 14939MB [2024-07-24 22:20:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.395 Acc@5 95.036 [2024-07-24 22:20:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.4% [2024-07-24 22:20:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.39% [2024-07-24 22:20:24 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-24 22:20:28 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-24 22:20:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.426 (0.426) Loss 0.6362 (0.6362) Acc@1 87.207 (87.207) Acc@5 98.047 (98.047) Mem 14939MB [2024-07-24 22:20:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.118) Loss 1.0215 (0.7796) Acc@1 77.734 (83.980) Acc@5 94.678 (97.004) Mem 14939MB [2024-07-24 22:20:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.102) Loss 1.1719 (0.9254) Acc@1 72.217 (80.134) Acc@5 93.213 (95.338) Mem 14939MB [2024-07-24 22:20:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.818 Acc@5 95.317 [2024-07-24 22:20:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.8% [2024-07-24 22:20:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.82% [2024-07-24 22:20:30 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 22:20:31 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-24 22:20:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][0/625] eta 0:07:35 lr 0.001035 wd 0.0500 time 0.7282 (0.7282) data time 0.3464 (0.3464) model time 0.0000 (0.0000) loss 8.0487 (8.0487) grad_norm 1.9807 (1.9807) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:20:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][10/625] eta 0:04:22 lr 0.001035 wd 0.0500 time 0.3967 (0.4275) data time 0.0009 (0.0324) model time 0.0000 (0.0000) loss 6.6615 (7.6897) grad_norm 1.6853 (1.8791) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:20:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][20/625] eta 0:04:10 lr 0.001035 wd 0.0500 time 0.3949 (0.4137) data time 0.0006 (0.0173) model time 0.0000 (0.0000) loss 5.8216 (7.3888) grad_norm 1.6242 (1.7823) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:20:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][30/625] eta 0:04:02 lr 0.001035 wd 0.0500 time 0.3978 (0.4083) data time 0.0009 (0.0121) model time 0.0000 (0.0000) loss 8.0442 (7.6396) grad_norm 1.8535 (1.7580) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:20:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][40/625] eta 0:03:57 lr 0.001035 wd 0.0500 time 0.4072 (0.4058) data time 0.0007 (0.0093) model time 0.0000 (0.0000) loss 7.7393 (7.5674) grad_norm 1.6777 (1.7541) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:20:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][50/625] eta 0:03:52 lr 0.001035 wd 0.0500 time 0.3946 (0.4043) data time 0.0006 (0.0077) model time 0.0000 (0.0000) loss 8.3390 (7.6055) grad_norm 1.5821 (1.7779) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:20:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][60/625] eta 0:03:47 lr 0.001035 wd 0.0500 time 0.3989 
(0.4033) data time 0.0006 (0.0066) model time 0.3982 (0.3976) loss 8.0298 (7.6234) grad_norm 1.8243 (1.7806) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:21:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][70/625] eta 0:03:43 lr 0.001035 wd 0.0500 time 0.4062 (0.4028) data time 0.0008 (0.0058) model time 0.4054 (0.3982) loss 6.1940 (7.6103) grad_norm 1.3480 (1.7984) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:21:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][80/625] eta 0:03:39 lr 0.001035 wd 0.0500 time 0.3930 (0.4026) data time 0.0007 (0.0052) model time 0.3923 (0.3990) loss 7.6543 (7.6363) grad_norm 1.2607 (1.8152) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:21:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][90/625] eta 0:03:35 lr 0.001035 wd 0.0500 time 0.4007 (0.4024) data time 0.0008 (0.0047) model time 0.3999 (0.3991) loss 9.0948 (7.6150) grad_norm 1.8003 (1.8218) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:21:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][100/625] eta 0:03:31 lr 0.001035 wd 0.0500 time 0.4034 (0.4021) data time 0.0006 (0.0043) model time 0.4027 (0.3991) loss 7.5090 (7.6329) grad_norm 1.9339 (1.8289) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:21:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][110/625] eta 0:03:26 lr 0.001035 wd 0.0500 time 0.3958 (0.4018) data time 0.0009 (0.0040) model time 0.3949 (0.3988) loss 9.0149 (7.6630) grad_norm 1.8483 (1.8406) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:21:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][120/625] eta 0:03:22 lr 0.001034 wd 0.0500 time 0.4016 (0.4016) data time 0.0007 (0.0038) model time 0.4009 (0.3987) loss 9.0520 (7.6417) grad_norm 1.5417 (1.8419) loss_scale 2048.0000 (2048.0000) mem 14939MB 
[2024-07-24 22:21:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][130/625] eta 0:03:18 lr 0.001034 wd 0.0500 time 0.4029 (0.4015) data time 0.0009 (0.0035) model time 0.4020 (0.3989) loss 7.5878 (7.6807) grad_norm 2.4925 (1.8514) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:21:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][140/625] eta 0:03:14 lr 0.001034 wd 0.0500 time 0.3937 (0.4014) data time 0.0007 (0.0033) model time 0.3929 (0.3989) loss 7.6289 (7.7436) grad_norm 1.6909 (1.8651) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:21:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][150/625] eta 0:03:11 lr 0.001034 wd 0.0500 time 0.3980 (0.4026) data time 0.0009 (0.0032) model time 0.3971 (0.4009) loss 7.1478 (7.7121) grad_norm 1.7077 (1.8680) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:21:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][160/625] eta 0:03:07 lr 0.001034 wd 0.0500 time 0.3981 (0.4025) data time 0.0007 (0.0030) model time 0.3974 (0.4008) loss 7.2689 (7.6846) grad_norm 1.2574 (1.8900) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:21:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][170/625] eta 0:03:03 lr 0.001034 wd 0.0500 time 0.3990 (0.4023) data time 0.0008 (0.0029) model time 0.3982 (0.4006) loss 8.0432 (7.6806) grad_norm 1.9026 (1.8858) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:21:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][180/625] eta 0:02:58 lr 0.001034 wd 0.0500 time 0.3947 (0.4022) data time 0.0007 (0.0028) model time 0.3940 (0.4005) loss 7.8311 (7.7137) grad_norm 1.6501 (1.9115) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:21:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][190/625] eta 0:02:54 lr 0.001034 wd 0.0500 time 
0.4032 (0.4022) data time 0.0006 (0.0027) model time 0.4026 (0.4006) loss 6.4975 (7.7117) grad_norm 2.3620 (1.9231) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:21:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][200/625] eta 0:02:50 lr 0.001034 wd 0.0500 time 0.3961 (0.4022) data time 0.0007 (0.0026) model time 0.3953 (0.4006) loss 6.8494 (7.7188) grad_norm 1.5026 (1.9683) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:21:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][210/625] eta 0:02:46 lr 0.001034 wd 0.0500 time 0.3972 (0.4020) data time 0.0007 (0.0025) model time 0.3965 (0.4004) loss 8.2150 (7.7168) grad_norm 1.3242 (1.9778) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:22:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][220/625] eta 0:02:43 lr 0.001034 wd 0.0500 time 0.3968 (0.4027) data time 0.0008 (0.0025) model time 0.3960 (0.4014) loss 8.2790 (7.7364) grad_norm 3.6291 (1.9910) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:22:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][230/625] eta 0:02:39 lr 0.001034 wd 0.0500 time 0.3995 (0.4026) data time 0.0007 (0.0024) model time 0.3988 (0.4013) loss 6.4194 (7.7221) grad_norm 3.4171 (2.0038) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:22:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][240/625] eta 0:02:34 lr 0.001034 wd 0.0500 time 0.4070 (0.4025) data time 0.0006 (0.0023) model time 0.4064 (0.4012) loss 6.5406 (7.7047) grad_norm 1.5875 (1.9975) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:22:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][250/625] eta 0:02:30 lr 0.001033 wd 0.0500 time 0.4001 (0.4024) data time 0.0006 (0.0023) model time 0.3995 (0.4010) loss 8.2128 (7.7273) grad_norm 1.9235 (1.9827) loss_scale 2048.0000 (2048.0000) 
mem 14939MB [2024-07-24 22:22:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][260/625] eta 0:02:26 lr 0.001033 wd 0.0500 time 0.4140 (0.4025) data time 0.0008 (0.0022) model time 0.4132 (0.4011) loss 8.5340 (7.7197) grad_norm 1.5647 (1.9740) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:22:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][270/625] eta 0:02:22 lr 0.001033 wd 0.0500 time 0.4020 (0.4025) data time 0.0006 (0.0022) model time 0.4013 (0.4012) loss 7.9063 (7.7270) grad_norm 1.3686 (1.9556) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:22:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][280/625] eta 0:02:18 lr 0.001033 wd 0.0500 time 0.3972 (0.4025) data time 0.0008 (0.0021) model time 0.3964 (0.4012) loss 7.0887 (7.7098) grad_norm 1.4070 (1.9460) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:22:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][290/625] eta 0:02:14 lr 0.001033 wd 0.0500 time 0.4027 (0.4025) data time 0.0009 (0.0021) model time 0.4017 (0.4012) loss 8.7015 (7.7170) grad_norm 1.9503 (1.9458) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:22:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][300/625] eta 0:02:10 lr 0.001033 wd 0.0500 time 0.3961 (0.4025) data time 0.0008 (0.0020) model time 0.3953 (0.4012) loss 7.9950 (7.7109) grad_norm 1.4878 (1.9439) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:22:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][310/625] eta 0:02:06 lr 0.001033 wd 0.0500 time 0.4031 (0.4024) data time 0.0009 (0.0020) model time 0.4022 (0.4011) loss 8.4781 (7.7146) grad_norm 1.4042 (1.9293) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:22:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][320/625] eta 0:02:02 lr 0.001033 wd 
0.0500 time 0.3979 (0.4022) data time 0.0007 (0.0020) model time 0.3972 (0.4010) loss 8.0731 (7.7229) grad_norm 3.8832 (1.9429) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:22:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][330/625] eta 0:01:58 lr 0.001033 wd 0.0500 time 0.3991 (0.4022) data time 0.0009 (0.0019) model time 0.3982 (0.4009) loss 7.7721 (7.7311) grad_norm 2.0109 (1.9438) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:22:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][340/625] eta 0:01:54 lr 0.001033 wd 0.0500 time 0.4010 (0.4022) data time 0.0008 (0.0019) model time 0.4002 (0.4009) loss 7.3477 (7.7362) grad_norm 1.5363 (1.9411) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:22:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][350/625] eta 0:01:50 lr 0.001033 wd 0.0500 time 0.4043 (0.4023) data time 0.0007 (0.0019) model time 0.4036 (0.4011) loss 7.9398 (7.7400) grad_norm 3.0675 (1.9433) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:22:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][360/625] eta 0:01:46 lr 0.001033 wd 0.0500 time 0.3972 (0.4023) data time 0.0007 (0.0019) model time 0.3966 (0.4010) loss 8.1566 (7.7429) grad_norm 2.1788 (1.9525) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:23:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][370/625] eta 0:01:42 lr 0.001033 wd 0.0500 time 0.3978 (0.4036) data time 0.0008 (0.0018) model time 0.3970 (0.4025) loss 5.9362 (7.7353) grad_norm 1.4314 (1.9536) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:23:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][380/625] eta 0:01:38 lr 0.001033 wd 0.0500 time 0.4004 (0.4035) data time 0.0007 (0.0018) model time 0.3997 (0.4025) loss 8.0617 (7.7361) grad_norm 3.6260 (1.9557) loss_scale 2048.0000 
(2048.0000) mem 14939MB [2024-07-24 22:23:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][390/625] eta 0:01:34 lr 0.001032 wd 0.0500 time 0.3991 (0.4034) data time 0.0007 (0.0018) model time 0.3984 (0.4024) loss 8.9147 (7.7570) grad_norm 1.8684 (1.9544) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:23:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][400/625] eta 0:01:30 lr 0.001032 wd 0.0500 time 0.3979 (0.4033) data time 0.0010 (0.0018) model time 0.3969 (0.4023) loss 8.3427 (7.7644) grad_norm 1.8993 (1.9492) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:23:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][410/625] eta 0:01:26 lr 0.001032 wd 0.0500 time 0.3997 (0.4033) data time 0.0007 (0.0018) model time 0.3990 (0.4022) loss 9.0114 (7.7695) grad_norm 1.7872 (1.9480) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:23:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][420/625] eta 0:01:22 lr 0.001032 wd 0.0500 time 0.4063 (0.4032) data time 0.0007 (0.0017) model time 0.4056 (0.4021) loss 7.5201 (7.7684) grad_norm 3.4733 (1.9659) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:23:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][430/625] eta 0:01:18 lr 0.001032 wd 0.0500 time 0.4005 (0.4031) data time 0.0010 (0.0017) model time 0.3995 (0.4021) loss 8.3013 (7.7638) grad_norm 2.5691 (1.9725) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:23:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][440/625] eta 0:01:14 lr 0.001032 wd 0.0500 time 0.4030 (0.4033) data time 0.0007 (0.0017) model time 0.4023 (0.4023) loss 7.8634 (7.7620) grad_norm 1.5559 (1.9722) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:23:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][450/625] eta 0:01:10 lr 
0.001032 wd 0.0500 time 0.4010 (0.4033) data time 0.0009 (0.0017) model time 0.4001 (0.4022) loss 8.0858 (7.7653) grad_norm 1.8393 (1.9656) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:23:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][460/625] eta 0:01:06 lr 0.001032 wd 0.0500 time 0.3996 (0.4032) data time 0.0008 (0.0017) model time 0.3987 (0.4022) loss 8.5706 (7.7573) grad_norm 1.6774 (1.9694) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:23:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][470/625] eta 0:01:02 lr 0.001032 wd 0.0500 time 0.3987 (0.4032) data time 0.0010 (0.0016) model time 0.3977 (0.4021) loss 9.5276 (7.7577) grad_norm 1.5881 (1.9830) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:23:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][480/625] eta 0:00:58 lr 0.001032 wd 0.0500 time 0.3983 (0.4031) data time 0.0008 (0.0016) model time 0.3975 (0.4021) loss 7.6726 (7.7602) grad_norm 1.9236 (1.9834) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:23:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][490/625] eta 0:00:54 lr 0.001032 wd 0.0500 time 0.4040 (0.4031) data time 0.0009 (0.0016) model time 0.4032 (0.4021) loss 8.6720 (7.7773) grad_norm 1.5314 (1.9796) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:23:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][500/625] eta 0:00:50 lr 0.001032 wd 0.0500 time 0.4138 (0.4031) data time 0.0008 (0.0016) model time 0.4130 (0.4021) loss 8.9216 (7.7797) grad_norm 1.8313 (1.9723) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:23:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][510/625] eta 0:00:46 lr 0.001032 wd 0.0500 time 0.3990 (0.4031) data time 0.0008 (0.0016) model time 0.3982 (0.4020) loss 9.0266 (7.7795) grad_norm 2.2194 (1.9723) loss_scale 
2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:24:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][520/625] eta 0:00:42 lr 0.001031 wd 0.0500 time 0.3968 (0.4030) data time 0.0008 (0.0016) model time 0.3960 (0.4020) loss 6.2559 (7.7709) grad_norm 1.6337 (1.9693) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:24:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][530/625] eta 0:00:38 lr 0.001031 wd 0.0500 time 0.3968 (0.4030) data time 0.0008 (0.0016) model time 0.3960 (0.4019) loss 8.1008 (7.7650) grad_norm 3.4591 (1.9694) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:24:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][540/625] eta 0:00:34 lr 0.001031 wd 0.0500 time 0.3972 (0.4029) data time 0.0008 (0.0015) model time 0.3964 (0.4019) loss 7.8738 (7.7652) grad_norm 1.9830 (1.9677) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:24:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][550/625] eta 0:00:30 lr 0.001031 wd 0.0500 time 0.3989 (0.4029) data time 0.0009 (0.0015) model time 0.3980 (0.4018) loss 6.9438 (7.7670) grad_norm 1.6484 (1.9666) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:24:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][560/625] eta 0:00:26 lr 0.001031 wd 0.0500 time 0.4019 (0.4028) data time 0.0006 (0.0015) model time 0.4013 (0.4018) loss 7.5121 (7.7728) grad_norm 1.7418 (1.9638) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:24:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][570/625] eta 0:00:22 lr 0.001031 wd 0.0500 time 0.3997 (0.4027) data time 0.0008 (0.0015) model time 0.3989 (0.4017) loss 8.8176 (7.7840) grad_norm 1.9595 (1.9654) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:24:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][580/625] eta 
0:00:18 lr 0.001031 wd 0.0500 time 0.3992 (0.4027) data time 0.0006 (0.0015) model time 0.3986 (0.4016) loss 8.6827 (7.7867) grad_norm 1.7996 (1.9607) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:24:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][590/625] eta 0:00:14 lr 0.001031 wd 0.0500 time 0.3977 (0.4036) data time 0.0007 (0.0015) model time 0.3970 (0.4026) loss 9.5113 (7.7875) grad_norm 1.7337 (1.9609) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:24:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][600/625] eta 0:00:10 lr 0.001031 wd 0.0500 time 0.3995 (0.4035) data time 0.0008 (0.0015) model time 0.3987 (0.4025) loss 7.3639 (7.7818) grad_norm 1.7277 (1.9583) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:24:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][610/625] eta 0:00:06 lr 0.001031 wd 0.0500 time 0.3988 (0.4034) data time 0.0004 (0.0015) model time 0.3984 (0.4024) loss 8.5782 (7.7848) grad_norm 1.9181 (1.9624) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:24:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][620/625] eta 0:00:02 lr 0.001031 wd 0.0500 time 0.3989 (0.4033) data time 0.0006 (0.0015) model time 0.3983 (0.4024) loss 8.5655 (7.7879) grad_norm 2.7127 (1.9651) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:24:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 88 training takes 0:04:12 [2024-07-24 22:24:43 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 22:24:44 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 22:24:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.448 (0.448) Loss 0.6558 (0.6558) Acc@1 86.426 (86.426) Acc@5 97.510 (97.510) Mem 14939MB [2024-07-24 22:24:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 1.0625 (0.7881) Acc@1 76.025 (83.469) Acc@5 94.238 (96.804) Mem 14939MB [2024-07-24 22:24:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.1924 (0.9409) Acc@1 72.559 (79.590) Acc@5 92.725 (95.117) Mem 14939MB [2024-07-24 22:24:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.279 Acc@5 95.098 [2024-07-24 22:24:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.3% [2024-07-24 22:24:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.751 (0.751) Loss 0.6333 (0.6333) Acc@1 87.354 (87.354) Acc@5 98.047 (98.047) Mem 14939MB [2024-07-24 22:24:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.152) Loss 1.0186 (0.7775) Acc@1 77.783 (84.029) Acc@5 94.629 (97.004) Mem 14939MB [2024-07-24 22:24:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.121) Loss 1.1670 (0.9228) Acc@1 72.412 (80.178) Acc@5 93.164 (95.347) Mem 14939MB [2024-07-24 22:24:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.852 Acc@5 95.325 [2024-07-24 22:24:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.9% [2024-07-24 22:24:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.85% [2024-07-24 22:24:50 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-24 22:24:51 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 22:24:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][0/625] eta 0:08:09 lr 0.001031 wd 0.0500 time 0.7838 (0.7838) data time 0.4007 (0.4007) model time 0.0000 (0.0000) loss 8.1685 (8.1685) grad_norm 1.8409 (1.8409) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:24:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][10/625] eta 0:04:26 lr 0.001031 wd 0.0500 time 0.4008 (0.4340) data time 0.0009 (0.0373) model time 0.0000 (0.0000) loss 7.3071 (7.6818) grad_norm 1.8977 (1.9137) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:24:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][20/625] eta 0:04:12 lr 0.001031 wd 0.0500 time 0.4001 (0.4176) data time 0.0008 (0.0200) model time 0.0000 (0.0000) loss 8.0801 (7.8510) grad_norm 2.0450 (1.8111) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:25:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][30/625] eta 0:04:04 lr 0.001030 wd 0.0500 time 0.3948 (0.4117) data time 0.0009 (0.0138) model time 0.0000 (0.0000) loss 7.2409 (7.9008) grad_norm 2.5190 (1.9105) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:25:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][40/625] eta 0:03:59 lr 0.001030 wd 0.0500 time 0.3980 (0.4090) data time 0.0007 (0.0107) model time 0.0000 (0.0000) loss 9.0317 (7.9369) grad_norm 1.5569 (1.9166) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:25:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][50/625] eta 0:03:54 lr 0.001030 wd 0.0500 time 0.3956 (0.4072) data time 0.0009 (0.0088) model time 0.0000 (0.0000) loss 8.4860 (7.9222) grad_norm 2.0911 (1.9269) loss_scale 2048.0000 (2048.0000) mem 
14939MB [2024-07-24 22:25:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][60/625] eta 0:03:49 lr 0.001030 wd 0.0500 time 0.4050 (0.4061) data time 0.0007 (0.0075) model time 0.4042 (0.3993) loss 8.0404 (7.8628) grad_norm 1.5694 (1.9125) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:25:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][70/625] eta 0:03:44 lr 0.001030 wd 0.0500 time 0.4033 (0.4053) data time 0.0007 (0.0065) model time 0.4026 (0.3994) loss 7.4056 (7.7685) grad_norm 2.7386 (1.9621) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:25:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][80/625] eta 0:03:40 lr 0.001030 wd 0.0500 time 0.3994 (0.4044) data time 0.0007 (0.0058) model time 0.3988 (0.3988) loss 7.4039 (7.7016) grad_norm 2.7644 (2.0282) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:25:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][90/625] eta 0:03:36 lr 0.001030 wd 0.0500 time 0.3998 (0.4040) data time 0.0008 (0.0053) model time 0.3990 (0.3989) loss 8.0192 (7.7443) grad_norm 1.8282 (2.0269) loss_scale 4096.0000 (2205.5385) mem 14939MB [2024-07-24 22:25:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][100/625] eta 0:03:31 lr 0.001030 wd 0.0500 time 0.4007 (0.4036) data time 0.0008 (0.0049) model time 0.3999 (0.3991) loss 8.4370 (7.7386) grad_norm 2.1589 (2.0670) loss_scale 4096.0000 (2392.7129) mem 14939MB [2024-07-24 22:25:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][110/625] eta 0:03:27 lr 0.001030 wd 0.0500 time 0.4011 (0.4033) data time 0.0006 (0.0045) model time 0.4005 (0.3992) loss 6.1124 (7.7677) grad_norm 1.4185 (2.0802) loss_scale 4096.0000 (2546.1622) mem 14939MB [2024-07-24 22:25:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][120/625] eta 0:03:23 lr 0.001030 wd 0.0500 time 
0.3998 (0.4031) data time 0.0009 (0.0042) model time 0.3990 (0.3992) loss 8.7488 (7.7619) grad_norm 1.3931 (2.0713) loss_scale 4096.0000 (2674.2479) mem 14939MB [2024-07-24 22:25:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][130/625] eta 0:03:19 lr 0.001030 wd 0.0500 time 0.4005 (0.4028) data time 0.0006 (0.0040) model time 0.3999 (0.3991) loss 8.9524 (7.7599) grad_norm 2.5766 (2.0738) loss_scale 4096.0000 (2782.7786) mem 14939MB [2024-07-24 22:25:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][140/625] eta 0:03:15 lr 0.001030 wd 0.0500 time 0.4018 (0.4026) data time 0.0006 (0.0038) model time 0.4012 (0.3991) loss 7.1152 (7.7518) grad_norm 1.7836 (2.0787) loss_scale 4096.0000 (2875.9149) mem 14939MB [2024-07-24 22:25:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][150/625] eta 0:03:11 lr 0.001030 wd 0.0500 time 0.3981 (0.4025) data time 0.0007 (0.0036) model time 0.3974 (0.3992) loss 6.3104 (7.6970) grad_norm 1.9057 (2.0826) loss_scale 4096.0000 (2956.7152) mem 14939MB [2024-07-24 22:25:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][160/625] eta 0:03:07 lr 0.001030 wd 0.0500 time 0.4000 (0.4023) data time 0.0009 (0.0034) model time 0.3991 (0.3992) loss 8.8084 (7.7093) grad_norm 1.8677 (2.0799) loss_scale 4096.0000 (3027.4783) mem 14939MB [2024-07-24 22:25:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][170/625] eta 0:03:02 lr 0.001029 wd 0.0500 time 0.3965 (0.4021) data time 0.0007 (0.0033) model time 0.3958 (0.3990) loss 8.6066 (7.7084) grad_norm 1.5399 (2.0839) loss_scale 4096.0000 (3089.9649) mem 14939MB [2024-07-24 22:26:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][180/625] eta 0:02:58 lr 0.001029 wd 0.0500 time 0.3944 (0.4018) data time 0.0009 (0.0031) model time 0.3935 (0.3988) loss 8.6628 (7.7280) grad_norm 1.6634 (2.0886) loss_scale 4096.0000 (3145.5470) 
mem 14939MB [2024-07-24 22:26:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][190/625] eta 0:02:56 lr 0.001029 wd 0.0500 time 0.3969 (0.4049) data time 0.0008 (0.0030) model time 0.3961 (0.4031) loss 6.6356 (7.7577) grad_norm 1.5564 (2.0772) loss_scale 4096.0000 (3195.3089) mem 14939MB [2024-07-24 22:26:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][200/625] eta 0:02:52 lr 0.001029 wd 0.0500 time 0.4003 (0.4055) data time 0.0007 (0.0029) model time 0.3996 (0.4041) loss 8.8579 (7.7786) grad_norm 2.3924 (inf) loss_scale 2048.0000 (3158.6070) mem 14939MB [2024-07-24 22:26:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][210/625] eta 0:02:48 lr 0.001029 wd 0.0500 time 0.4043 (0.4053) data time 0.0007 (0.0028) model time 0.4036 (0.4038) loss 6.8574 (7.7669) grad_norm 2.0404 (inf) loss_scale 2048.0000 (3105.9716) mem 14939MB [2024-07-24 22:26:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][220/625] eta 0:02:44 lr 0.001029 wd 0.0500 time 0.3985 (0.4051) data time 0.0006 (0.0027) model time 0.3979 (0.4035) loss 7.7438 (7.7881) grad_norm 1.6960 (inf) loss_scale 2048.0000 (3058.0995) mem 14939MB [2024-07-24 22:26:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][230/625] eta 0:02:39 lr 0.001029 wd 0.0500 time 0.3985 (0.4049) data time 0.0009 (0.0027) model time 0.3976 (0.4033) loss 6.7935 (7.7543) grad_norm 1.8601 (inf) loss_scale 2048.0000 (3014.3723) mem 14939MB [2024-07-24 22:26:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][240/625] eta 0:02:35 lr 0.001029 wd 0.0500 time 0.4033 (0.4047) data time 0.0006 (0.0026) model time 0.4027 (0.4031) loss 6.9052 (7.7370) grad_norm 2.7112 (inf) loss_scale 2048.0000 (2974.2739) mem 14939MB [2024-07-24 22:26:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][250/625] eta 0:02:31 lr 0.001029 wd 0.0500 time 0.3958 
(0.4044) data time 0.0007 (0.0025) model time 0.3951 (0.4028) loss 6.3007 (7.7321) grad_norm 1.7606 (inf) loss_scale 2048.0000 (2937.3705) mem 14939MB [2024-07-24 22:26:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][260/625] eta 0:02:27 lr 0.001029 wd 0.0500 time 0.3993 (0.4042) data time 0.0008 (0.0024) model time 0.3985 (0.4026) loss 8.8477 (7.7337) grad_norm 1.9707 (inf) loss_scale 2048.0000 (2903.2950) mem 14939MB [2024-07-24 22:26:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][270/625] eta 0:02:23 lr 0.001029 wd 0.0500 time 0.3961 (0.4040) data time 0.0008 (0.0024) model time 0.3953 (0.4024) loss 7.1756 (7.7366) grad_norm 1.7486 (inf) loss_scale 2048.0000 (2871.7343) mem 14939MB [2024-07-24 22:26:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][280/625] eta 0:02:19 lr 0.001029 wd 0.0500 time 0.3934 (0.4038) data time 0.0007 (0.0023) model time 0.3927 (0.4021) loss 9.0450 (7.7482) grad_norm 1.7182 (inf) loss_scale 2048.0000 (2842.4199) mem 14939MB [2024-07-24 22:26:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][290/625] eta 0:02:15 lr 0.001029 wd 0.0500 time 0.4114 (0.4036) data time 0.0007 (0.0023) model time 0.4107 (0.4020) loss 7.8770 (7.7652) grad_norm 1.5898 (inf) loss_scale 2048.0000 (2815.1203) mem 14939MB [2024-07-24 22:26:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][300/625] eta 0:02:11 lr 0.001028 wd 0.0500 time 0.3997 (0.4035) data time 0.0007 (0.0022) model time 0.3990 (0.4018) loss 6.4274 (7.7618) grad_norm 1.7533 (inf) loss_scale 2048.0000 (2789.6346) mem 14939MB [2024-07-24 22:26:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][310/625] eta 0:02:07 lr 0.001028 wd 0.0500 time 0.3960 (0.4033) data time 0.0009 (0.0022) model time 0.3951 (0.4017) loss 8.9039 (7.7627) grad_norm 1.6703 (inf) loss_scale 2048.0000 (2765.7878) mem 14939MB [2024-07-24 
22:27:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][320/625] eta 0:02:03 lr 0.001028 wd 0.0500 time 0.4011 (0.4033) data time 0.0008 (0.0022) model time 0.4003 (0.4017) loss 7.2890 (7.7576) grad_norm 1.7031 (inf) loss_scale 2048.0000 (2743.4268) mem 14939MB [2024-07-24 22:27:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][330/625] eta 0:01:58 lr 0.001028 wd 0.0500 time 0.3970 (0.4033) data time 0.0007 (0.0021) model time 0.3963 (0.4016) loss 8.7762 (7.7526) grad_norm 1.8788 (inf) loss_scale 2048.0000 (2722.4169) mem 14939MB [2024-07-24 22:27:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][340/625] eta 0:01:54 lr 0.001028 wd 0.0500 time 0.3963 (0.4031) data time 0.0007 (0.0021) model time 0.3956 (0.4015) loss 7.3716 (7.7434) grad_norm 2.0580 (inf) loss_scale 2048.0000 (2702.6393) mem 14939MB [2024-07-24 22:27:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][350/625] eta 0:01:50 lr 0.001028 wd 0.0500 time 0.3988 (0.4030) data time 0.0007 (0.0021) model time 0.3981 (0.4014) loss 8.0237 (7.7606) grad_norm 2.0489 (inf) loss_scale 2048.0000 (2683.9886) mem 14939MB [2024-07-24 22:27:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][360/625] eta 0:01:46 lr 0.001028 wd 0.0500 time 0.3990 (0.4030) data time 0.0008 (0.0020) model time 0.3982 (0.4014) loss 6.2750 (7.7630) grad_norm 1.8109 (inf) loss_scale 2048.0000 (2666.3712) mem 14939MB [2024-07-24 22:27:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][370/625] eta 0:01:42 lr 0.001028 wd 0.0500 time 0.4036 (0.4029) data time 0.0008 (0.0020) model time 0.4028 (0.4013) loss 7.2145 (7.7561) grad_norm 2.1610 (inf) loss_scale 2048.0000 (2649.7035) mem 14939MB [2024-07-24 22:27:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][380/625] eta 0:01:38 lr 0.001028 wd 0.0500 time 0.4005 (0.4029) data time 0.0007 
(0.0020) model time 0.3998 (0.4013) loss 7.9539 (7.7742) grad_norm 1.7967 (inf) loss_scale 2048.0000 (2633.9108) mem 14939MB [2024-07-24 22:27:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][390/625] eta 0:01:34 lr 0.001028 wd 0.0500 time 0.4064 (0.4028) data time 0.0008 (0.0019) model time 0.4056 (0.4013) loss 8.2292 (7.7700) grad_norm 2.0877 (inf) loss_scale 2048.0000 (2618.9258) mem 14939MB [2024-07-24 22:27:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][400/625] eta 0:01:30 lr 0.001028 wd 0.0500 time 0.3951 (0.4028) data time 0.0009 (0.0019) model time 0.3942 (0.4012) loss 8.1144 (7.7644) grad_norm 2.4764 (inf) loss_scale 2048.0000 (2604.6883) mem 14939MB [2024-07-24 22:27:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][410/625] eta 0:01:26 lr 0.001028 wd 0.0500 time 0.3962 (0.4036) data time 0.0009 (0.0019) model time 0.3953 (0.4022) loss 8.1398 (7.7719) grad_norm 1.5712 (inf) loss_scale 2048.0000 (2591.1436) mem 14939MB [2024-07-24 22:27:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][420/625] eta 0:01:22 lr 0.001028 wd 0.0500 time 0.3989 (0.4040) data time 0.0007 (0.0019) model time 0.3982 (0.4026) loss 6.6431 (7.7731) grad_norm 1.6785 (inf) loss_scale 2048.0000 (2578.2423) mem 14939MB [2024-07-24 22:27:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][430/625] eta 0:01:18 lr 0.001027 wd 0.0500 time 0.3977 (0.4039) data time 0.0008 (0.0018) model time 0.3968 (0.4026) loss 7.7389 (7.7675) grad_norm 1.3810 (inf) loss_scale 2048.0000 (2565.9397) mem 14939MB [2024-07-24 22:27:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][440/625] eta 0:01:14 lr 0.001027 wd 0.0500 time 0.3965 (0.4038) data time 0.0008 (0.0018) model time 0.3957 (0.4025) loss 8.4921 (7.7576) grad_norm 1.7821 (inf) loss_scale 2048.0000 (2554.1950) mem 14939MB [2024-07-24 22:27:53 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][450/625] eta 0:01:10 lr 0.001027 wd 0.0500 time 0.4030 (0.4038) data time 0.0009 (0.0018) model time 0.4021 (0.4025) loss 8.0604 (7.7514) grad_norm 2.5082 (inf) loss_scale 2048.0000 (2542.9712) mem 14939MB [2024-07-24 22:27:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][460/625] eta 0:01:06 lr 0.001027 wd 0.0500 time 0.3993 (0.4038) data time 0.0009 (0.0018) model time 0.3984 (0.4024) loss 6.5716 (7.7635) grad_norm 1.2219 (inf) loss_scale 2048.0000 (2532.2343) mem 14939MB [2024-07-24 22:28:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][470/625] eta 0:01:02 lr 0.001027 wd 0.0500 time 0.4054 (0.4037) data time 0.0007 (0.0018) model time 0.4048 (0.4024) loss 8.4677 (7.7674) grad_norm 1.6999 (inf) loss_scale 2048.0000 (2521.9533) mem 14939MB [2024-07-24 22:28:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][480/625] eta 0:00:58 lr 0.001027 wd 0.0500 time 0.3993 (0.4036) data time 0.0008 (0.0017) model time 0.3985 (0.4023) loss 6.8361 (7.7553) grad_norm 1.6349 (inf) loss_scale 2048.0000 (2512.0998) mem 14939MB [2024-07-24 22:28:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][490/625] eta 0:00:54 lr 0.001027 wd 0.0500 time 0.3973 (0.4035) data time 0.0009 (0.0017) model time 0.3964 (0.4022) loss 7.8689 (7.7650) grad_norm 1.5299 (inf) loss_scale 2048.0000 (2502.6477) mem 14939MB [2024-07-24 22:28:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][500/625] eta 0:00:50 lr 0.001027 wd 0.0500 time 0.4011 (0.4035) data time 0.0008 (0.0017) model time 0.4003 (0.4022) loss 6.5365 (7.7536) grad_norm 4.5858 (inf) loss_scale 2048.0000 (2493.5729) mem 14939MB [2024-07-24 22:28:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][510/625] eta 0:00:46 lr 0.001027 wd 0.0500 time 0.3953 (0.4035) data time 0.0009 (0.0017) 
model time 0.3945 (0.4021) loss 8.3553 (7.7486) grad_norm 3.2841 (inf) loss_scale 2048.0000 (2484.8532) mem 14939MB [2024-07-24 22:28:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][520/625] eta 0:00:42 lr 0.001027 wd 0.0500 time 0.3985 (0.4034) data time 0.0006 (0.0017) model time 0.3979 (0.4021) loss 6.3936 (7.7445) grad_norm 1.8726 (inf) loss_scale 2048.0000 (2476.4683) mem 14939MB [2024-07-24 22:28:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][530/625] eta 0:00:38 lr 0.001027 wd 0.0500 time 0.3982 (0.4033) data time 0.0009 (0.0017) model time 0.3974 (0.4020) loss 6.1652 (7.7382) grad_norm 2.4063 (inf) loss_scale 2048.0000 (2468.3992) mem 14939MB [2024-07-24 22:28:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][540/625] eta 0:00:34 lr 0.001027 wd 0.0500 time 0.4022 (0.4033) data time 0.0009 (0.0017) model time 0.4013 (0.4020) loss 8.0042 (7.7403) grad_norm 1.8411 (inf) loss_scale 2048.0000 (2460.6285) mem 14939MB [2024-07-24 22:28:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][550/625] eta 0:00:30 lr 0.001027 wd 0.0500 time 0.3979 (0.4033) data time 0.0006 (0.0016) model time 0.3973 (0.4019) loss 6.6494 (7.7358) grad_norm 2.2921 (inf) loss_scale 2048.0000 (2453.1397) mem 14939MB [2024-07-24 22:28:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][560/625] eta 0:00:26 lr 0.001027 wd 0.0500 time 0.3981 (0.4032) data time 0.0006 (0.0016) model time 0.3975 (0.4019) loss 9.4291 (7.7447) grad_norm 1.5080 (inf) loss_scale 2048.0000 (2445.9180) mem 14939MB [2024-07-24 22:28:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][570/625] eta 0:00:22 lr 0.001026 wd 0.0500 time 0.3984 (0.4031) data time 0.0007 (0.0016) model time 0.3977 (0.4018) loss 8.5028 (7.7361) grad_norm 1.6083 (inf) loss_scale 2048.0000 (2438.9492) mem 14939MB [2024-07-24 22:28:45 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [89/300][580/625] eta 0:00:18 lr 0.001026 wd 0.0500 time 0.3979 (0.4031) data time 0.0006 (0.0016) model time 0.3974 (0.4018) loss 6.6585 (7.7328) grad_norm 1.8249 (inf) loss_scale 2048.0000 (2432.2203) mem 14939MB [2024-07-24 22:28:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][590/625] eta 0:00:14 lr 0.001026 wd 0.0500 time 0.4000 (0.4030) data time 0.0010 (0.0016) model time 0.3990 (0.4017) loss 8.6353 (7.7440) grad_norm 1.2522 (inf) loss_scale 2048.0000 (2425.7191) mem 14939MB [2024-07-24 22:28:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][600/625] eta 0:00:10 lr 0.001026 wd 0.0500 time 0.4031 (0.4030) data time 0.0009 (0.0016) model time 0.4022 (0.4017) loss 8.5510 (7.7445) grad_norm 2.5110 (inf) loss_scale 2048.0000 (2419.4343) mem 14939MB [2024-07-24 22:28:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][610/625] eta 0:00:06 lr 0.001026 wd 0.0500 time 0.3935 (0.4030) data time 0.0005 (0.0016) model time 0.3930 (0.4017) loss 7.1239 (7.7477) grad_norm 2.5509 (inf) loss_scale 2048.0000 (2413.3552) mem 14939MB [2024-07-24 22:29:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][620/625] eta 0:00:02 lr 0.001026 wd 0.0500 time 0.3983 (0.4029) data time 0.0006 (0.0016) model time 0.3977 (0.4016) loss 7.5026 (7.7493) grad_norm 2.8036 (inf) loss_scale 2048.0000 (2407.4718) mem 14939MB [2024-07-24 22:29:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 89 training takes 0:04:11 [2024-07-24 22:29:03 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 22:29:04 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 22:29:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.443 (0.443) Loss 0.6445 (0.6445) Acc@1 87.061 (87.061) Acc@5 97.656 (97.656) Mem 14939MB [2024-07-24 22:29:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 1.0703 (0.7888) Acc@1 74.023 (83.296) Acc@5 94.287 (96.746) Mem 14939MB [2024-07-24 22:29:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.1895 (0.9363) Acc@1 71.875 (79.520) Acc@5 92.578 (95.013) Mem 14939MB [2024-07-24 22:29:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.223 Acc@5 94.994 [2024-07-24 22:29:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.2% [2024-07-24 22:29:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.751 (0.751) Loss 0.6313 (0.6313) Acc@1 87.402 (87.402) Acc@5 98.096 (98.096) Mem 14939MB [2024-07-24 22:29:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.152) Loss 1.0156 (0.7754) Acc@1 77.832 (84.069) Acc@5 94.727 (97.026) Mem 14939MB [2024-07-24 22:29:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.121) Loss 1.1650 (0.9204) Acc@1 72.461 (80.234) Acc@5 93.164 (95.382) Mem 14939MB [2024-07-24 22:29:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.922 Acc@5 95.355 [2024-07-24 22:29:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.9% [2024-07-24 22:29:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.92% [2024-07-24 22:29:09 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-24 22:29:10 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 22:29:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][0/625] eta 0:08:16 lr 0.001026 wd 0.0500 time 0.7936 (0.7936) data time 0.4174 (0.4174) model time 0.0000 (0.0000) loss 8.2721 (8.2721) grad_norm 1.6444 (1.6444) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:29:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][10/625] eta 0:04:35 lr 0.001026 wd 0.0500 time 0.3967 (0.4483) data time 0.0007 (0.0387) model time 0.0000 (0.0000) loss 8.1886 (7.7709) grad_norm 3.1876 (2.5948) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:29:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][20/625] eta 0:04:17 lr 0.001026 wd 0.0500 time 0.4026 (0.4250) data time 0.0009 (0.0207) model time 0.0000 (0.0000) loss 7.5454 (7.5400) grad_norm 2.1553 (2.5034) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:29:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][30/625] eta 0:04:08 lr 0.001026 wd 0.0500 time 0.4004 (0.4179) data time 0.0007 (0.0143) model time 0.0000 (0.0000) loss 7.6659 (7.7301) grad_norm 1.6219 (2.3079) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:29:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][40/625] eta 0:04:02 lr 0.001026 wd 0.0500 time 0.3941 (0.4137) data time 0.0008 (0.0110) model time 0.0000 (0.0000) loss 7.8440 (7.6600) grad_norm 1.3464 (2.2057) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:29:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][50/625] eta 0:03:56 lr 0.001026 wd 0.0500 time 0.4057 (0.4110) data time 0.0007 (0.0091) model time 0.0000 (0.0000) loss 9.4795 (7.7164) grad_norm 1.5790 (2.0961) loss_scale 2048.0000 (2048.0000) mem 
14939MB [2024-07-24 22:29:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][60/625] eta 0:03:51 lr 0.001026 wd 0.0500 time 0.4025 (0.4093) data time 0.0009 (0.0077) model time 0.4016 (0.3999) loss 8.0079 (7.6925) grad_norm 2.9193 (2.2300) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:29:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][70/625] eta 0:03:46 lr 0.001025 wd 0.0500 time 0.3986 (0.4079) data time 0.0009 (0.0068) model time 0.3977 (0.3991) loss 7.2248 (7.6724) grad_norm 3.7262 (2.2939) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:29:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][80/625] eta 0:03:41 lr 0.001025 wd 0.0500 time 0.3994 (0.4069) data time 0.0009 (0.0060) model time 0.3985 (0.3990) loss 8.4216 (7.6824) grad_norm 1.8462 (2.2441) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:29:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][90/625] eta 0:03:37 lr 0.001025 wd 0.0500 time 0.3980 (0.4061) data time 0.0009 (0.0055) model time 0.3971 (0.3988) loss 8.8459 (7.7240) grad_norm 1.5181 (2.1790) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:29:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][100/625] eta 0:03:32 lr 0.001025 wd 0.0500 time 0.3993 (0.4056) data time 0.0007 (0.0050) model time 0.3985 (0.3990) loss 5.6424 (7.7215) grad_norm 2.5272 (2.1621) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:29:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][110/625] eta 0:03:28 lr 0.001025 wd 0.0500 time 0.3966 (0.4051) data time 0.0008 (0.0047) model time 0.3958 (0.3990) loss 6.8658 (7.7229) grad_norm 1.8034 (2.1693) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:29:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][120/625] eta 0:03:24 lr 0.001025 wd 0.0500 time 
0.4037 (0.4045) data time 0.0007 (0.0044) model time 0.4030 (0.3989) loss 8.0158 (7.7239) grad_norm 2.8967 (2.2108) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:30:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][130/625] eta 0:03:20 lr 0.001025 wd 0.0500 time 0.4025 (0.4041) data time 0.0007 (0.0041) model time 0.4018 (0.3988) loss 6.7592 (7.7345) grad_norm 1.3502 (2.1972) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:30:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][140/625] eta 0:03:15 lr 0.001025 wd 0.0500 time 0.3963 (0.4039) data time 0.0009 (0.0039) model time 0.3955 (0.3990) loss 6.8411 (7.7511) grad_norm 2.2809 (2.1736) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:30:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][150/625] eta 0:03:11 lr 0.001025 wd 0.0500 time 0.4099 (0.4037) data time 0.0006 (0.0037) model time 0.4092 (0.3991) loss 8.6595 (7.7879) grad_norm 2.5373 (2.1761) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:30:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][160/625] eta 0:03:07 lr 0.001025 wd 0.0500 time 0.4072 (0.4034) data time 0.0008 (0.0035) model time 0.4064 (0.3990) loss 8.7058 (7.7714) grad_norm 1.3844 (2.1602) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:30:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][170/625] eta 0:03:04 lr 0.001025 wd 0.0500 time 0.3801 (0.4045) data time 0.0010 (0.0034) model time 0.3791 (0.4008) loss 6.8440 (7.7523) grad_norm 2.1781 (2.1533) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:30:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][180/625] eta 0:02:59 lr 0.001025 wd 0.0500 time 0.4004 (0.4044) data time 0.0009 (0.0032) model time 0.3995 (0.4008) loss 7.0218 (7.7321) grad_norm 3.1387 (2.1384) loss_scale 2048.0000 (2048.0000) 
mem 14939MB [2024-07-24 22:30:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][190/625] eta 0:02:55 lr 0.001025 wd 0.0500 time 0.3999 (0.4042) data time 0.0006 (0.0031) model time 0.3993 (0.4007) loss 7.5208 (7.7200) grad_norm 3.1117 (2.1391) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:30:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][200/625] eta 0:02:51 lr 0.001025 wd 0.0500 time 0.4006 (0.4040) data time 0.0010 (0.0030) model time 0.3996 (0.4006) loss 7.3856 (7.7146) grad_norm 3.7067 (2.1590) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:30:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][210/625] eta 0:02:47 lr 0.001024 wd 0.0500 time 0.3996 (0.4038) data time 0.0006 (0.0029) model time 0.3989 (0.4006) loss 6.8707 (7.7018) grad_norm 2.3256 (2.1475) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:30:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][220/625] eta 0:02:44 lr 0.001024 wd 0.0500 time 0.6269 (0.4056) data time 0.0006 (0.0028) model time 0.6263 (0.4030) loss 7.3285 (7.7076) grad_norm 2.9759 (2.1558) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:30:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][230/625] eta 0:02:40 lr 0.001024 wd 0.0500 time 0.4003 (0.4053) data time 0.0007 (0.0027) model time 0.3997 (0.4028) loss 7.8573 (7.7187) grad_norm 1.8318 (2.1378) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:30:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][240/625] eta 0:02:36 lr 0.001024 wd 0.0500 time 0.4019 (0.4052) data time 0.0008 (0.0027) model time 0.4011 (0.4027) loss 6.7527 (7.7224) grad_norm 1.6024 (2.1301) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:30:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][250/625] eta 0:02:31 lr 0.001024 wd 
0.0500 time 0.3994 (0.4049) data time 0.0006 (0.0026) model time 0.3988 (0.4024) loss 6.8320 (7.7175) grad_norm 2.0104 (2.1172) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:30:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][260/625] eta 0:02:27 lr 0.001024 wd 0.0500 time 0.3980 (0.4047) data time 0.0007 (0.0025) model time 0.3973 (0.4023) loss 8.2004 (7.7068) grad_norm 1.3852 (2.1098) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:31:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][270/625] eta 0:02:23 lr 0.001024 wd 0.0500 time 0.3948 (0.4045) data time 0.0009 (0.0025) model time 0.3940 (0.4021) loss 8.3751 (7.7142) grad_norm 1.4932 (2.1132) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:31:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][280/625] eta 0:02:19 lr 0.001024 wd 0.0500 time 0.3988 (0.4044) data time 0.0009 (0.0024) model time 0.3979 (0.4020) loss 8.9065 (7.7242) grad_norm 2.6036 (2.1004) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:31:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][290/625] eta 0:02:15 lr 0.001024 wd 0.0500 time 0.3942 (0.4042) data time 0.0009 (0.0024) model time 0.3933 (0.4018) loss 6.3379 (7.7328) grad_norm 1.5891 (2.0858) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:31:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][300/625] eta 0:02:11 lr 0.001024 wd 0.0500 time 0.4001 (0.4041) data time 0.0008 (0.0023) model time 0.3993 (0.4018) loss 8.0697 (7.7381) grad_norm 1.4798 (2.0759) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:31:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][310/625] eta 0:02:07 lr 0.001024 wd 0.0500 time 0.4031 (0.4040) data time 0.0009 (0.0023) model time 0.4022 (0.4017) loss 8.0470 (7.7422) grad_norm 2.2412 (2.0717) loss_scale 2048.0000 
(2048.0000) mem 14939MB [2024-07-24 22:31:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][320/625] eta 0:02:03 lr 0.001024 wd 0.0500 time 0.3962 (0.4038) data time 0.0008 (0.0022) model time 0.3953 (0.4015) loss 7.4626 (7.7369) grad_norm 2.3674 (2.0745) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:31:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][330/625] eta 0:01:59 lr 0.001024 wd 0.0500 time 0.3988 (0.4037) data time 0.0009 (0.0022) model time 0.3980 (0.4015) loss 8.8838 (7.7377) grad_norm 1.9473 (2.0617) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:31:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][340/625] eta 0:01:55 lr 0.001023 wd 0.0500 time 0.4030 (0.4036) data time 0.0008 (0.0022) model time 0.4022 (0.4014) loss 7.7915 (7.7405) grad_norm 3.2205 (2.0549) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:31:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][350/625] eta 0:01:50 lr 0.001023 wd 0.0500 time 0.3984 (0.4035) data time 0.0009 (0.0021) model time 0.3974 (0.4013) loss 7.5175 (7.7264) grad_norm 1.4848 (2.0483) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:31:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][360/625] eta 0:01:46 lr 0.001023 wd 0.0500 time 0.3969 (0.4034) data time 0.0008 (0.0021) model time 0.3961 (0.4012) loss 8.3428 (7.7187) grad_norm 2.9264 (2.0431) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:31:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][370/625] eta 0:01:42 lr 0.001023 wd 0.0500 time 0.3998 (0.4033) data time 0.0007 (0.0020) model time 0.3991 (0.4011) loss 8.6254 (7.7211) grad_norm 1.3995 (2.0373) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:31:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][380/625] eta 0:01:38 lr 
0.001023 wd 0.0500 time 0.3971 (0.4032) data time 0.0008 (0.0020) model time 0.3963 (0.4010) loss 8.9843 (7.7120) grad_norm 2.0742 (2.0406) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:31:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][390/625] eta 0:01:34 lr 0.001023 wd 0.0500 time 0.3981 (0.4034) data time 0.0008 (0.0020) model time 0.3973 (0.4014) loss 7.8324 (7.7203) grad_norm 3.0410 (2.0430) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:31:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][400/625] eta 0:01:30 lr 0.001023 wd 0.0500 time 0.3987 (0.4034) data time 0.0010 (0.0020) model time 0.3978 (0.4013) loss 7.6894 (7.7259) grad_norm 2.0566 (2.0360) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:31:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][410/625] eta 0:01:26 lr 0.001023 wd 0.0500 time 0.3994 (0.4034) data time 0.0007 (0.0019) model time 0.3987 (0.4013) loss 8.5839 (7.7370) grad_norm 2.6711 (2.0403) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:32:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][420/625] eta 0:01:22 lr 0.001023 wd 0.0500 time 0.3978 (0.4033) data time 0.0006 (0.0019) model time 0.3972 (0.4013) loss 6.8008 (7.7304) grad_norm 2.2961 (2.0403) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:32:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][430/625] eta 0:01:18 lr 0.001023 wd 0.0500 time 0.3930 (0.4032) data time 0.0008 (0.0019) model time 0.3921 (0.4012) loss 7.2655 (7.7285) grad_norm 1.5492 (2.0442) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:32:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][440/625] eta 0:01:14 lr 0.001023 wd 0.0500 time 0.6057 (0.4044) data time 0.0008 (0.0019) model time 0.6049 (0.4026) loss 9.1852 (7.7319) grad_norm 2.3168 (2.0441) loss_scale 
2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:32:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][450/625] eta 0:01:10 lr 0.001023 wd 0.0500 time 0.3959 (0.4047) data time 0.0009 (0.0018) model time 0.3949 (0.4030) loss 7.4133 (7.7363) grad_norm 1.7971 (2.0383) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:32:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][460/625] eta 0:01:06 lr 0.001023 wd 0.0500 time 0.3961 (0.4046) data time 0.0007 (0.0018) model time 0.3953 (0.4029) loss 7.7591 (7.7275) grad_norm 2.7620 (2.0402) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:32:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][470/625] eta 0:01:02 lr 0.001022 wd 0.0500 time 0.4018 (0.4045) data time 0.0008 (0.0018) model time 0.4010 (0.4028) loss 6.5029 (7.7254) grad_norm 1.7797 (2.0422) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:32:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][480/625] eta 0:00:58 lr 0.001022 wd 0.0500 time 0.4161 (0.4045) data time 0.0007 (0.0018) model time 0.4154 (0.4028) loss 6.4164 (7.7297) grad_norm 3.9785 (2.0567) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:32:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][490/625] eta 0:00:54 lr 0.001022 wd 0.0500 time 0.4000 (0.4045) data time 0.0006 (0.0018) model time 0.3994 (0.4028) loss 6.1290 (7.7342) grad_norm 2.2130 (2.0731) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:32:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][500/625] eta 0:00:50 lr 0.001022 wd 0.0500 time 0.3999 (0.4045) data time 0.0007 (0.0018) model time 0.3992 (0.4028) loss 8.8749 (7.7374) grad_norm 1.5034 (2.0700) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:32:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][510/625] eta 
0:00:46 lr 0.001022 wd 0.0500 time 0.3970 (0.4044) data time 0.0009 (0.0017) model time 0.3961 (0.4027) loss 8.7728 (7.7443) grad_norm 2.3365 (2.0699) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:32:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][520/625] eta 0:00:42 lr 0.001022 wd 0.0500 time 0.3994 (0.4043) data time 0.0009 (0.0017) model time 0.3986 (0.4026) loss 8.3517 (7.7540) grad_norm 2.0367 (2.0631) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:32:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][530/625] eta 0:00:38 lr 0.001022 wd 0.0500 time 0.4155 (0.4043) data time 0.0007 (0.0017) model time 0.4148 (0.4026) loss 7.6542 (7.7568) grad_norm 2.0537 (2.0593) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:32:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][540/625] eta 0:00:34 lr 0.001022 wd 0.0500 time 0.3967 (0.4042) data time 0.0009 (0.0017) model time 0.3958 (0.4025) loss 6.9812 (7.7537) grad_norm 2.1383 (2.0517) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:32:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][550/625] eta 0:00:30 lr 0.001022 wd 0.0500 time 0.3962 (0.4041) data time 0.0007 (0.0017) model time 0.3955 (0.4025) loss 8.6224 (7.7571) grad_norm 1.8429 (2.0498) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:32:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][560/625] eta 0:00:26 lr 0.001022 wd 0.0500 time 0.3982 (0.4040) data time 0.0007 (0.0017) model time 0.3975 (0.4024) loss 7.5556 (7.7538) grad_norm 2.3233 (2.0547) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:33:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][570/625] eta 0:00:22 lr 0.001022 wd 0.0500 time 0.4044 (0.4039) data time 0.0008 (0.0017) model time 0.4036 (0.4023) loss 7.1171 (7.7468) grad_norm 2.0365 (2.0658) 
loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:33:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][580/625] eta 0:00:18 lr 0.001022 wd 0.0500 time 0.3965 (0.4038) data time 0.0009 (0.0016) model time 0.3956 (0.4022) loss 8.4013 (7.7534) grad_norm 1.4758 (2.0680) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:33:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][590/625] eta 0:00:14 lr 0.001022 wd 0.0500 time 0.3990 (0.4038) data time 0.0006 (0.0016) model time 0.3984 (0.4021) loss 8.7870 (7.7550) grad_norm 2.3138 (2.0632) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:33:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][600/625] eta 0:00:10 lr 0.001021 wd 0.0500 time 0.4018 (0.4037) data time 0.0008 (0.0016) model time 0.4010 (0.4021) loss 6.9299 (7.7595) grad_norm 1.4477 (2.0603) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:33:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][610/625] eta 0:00:06 lr 0.001021 wd 0.0500 time 0.3944 (0.4039) data time 0.0004 (0.0016) model time 0.3940 (0.4023) loss 6.8777 (7.7545) grad_norm 2.8325 (2.0576) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:33:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][620/625] eta 0:00:02 lr 0.001021 wd 0.0500 time 0.3975 (0.4038) data time 0.0004 (0.0016) model time 0.3972 (0.4022) loss 7.1014 (7.7627) grad_norm 1.4462 (2.0537) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:33:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 90 training takes 0:04:12 [2024-07-24 22:33:22 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-24 22:33:23 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 22:33:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.442 (0.442) Loss 0.6650 (0.6650) Acc@1 86.963 (86.963) Acc@5 98.096 (98.096) Mem 14939MB [2024-07-24 22:33:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.118) Loss 1.0586 (0.7933) Acc@1 77.100 (83.718) Acc@5 94.531 (96.897) Mem 14939MB [2024-07-24 22:33:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.2100 (0.9562) Acc@1 72.461 (79.648) Acc@5 92.480 (95.080) Mem 14939MB [2024-07-24 22:33:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.281 Acc@5 95.054 [2024-07-24 22:33:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.3% [2024-07-24 22:33:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.750 (0.750) Loss 0.6299 (0.6299) Acc@1 87.402 (87.402) Acc@5 98.096 (98.096) Mem 14939MB [2024-07-24 22:33:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.153) Loss 1.0137 (0.7738) Acc@1 77.832 (84.082) Acc@5 94.727 (97.035) Mem 14939MB [2024-07-24 22:33:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.121) Loss 1.1621 (0.9185) Acc@1 72.559 (80.283) Acc@5 93.262 (95.403) Mem 14939MB [2024-07-24 22:33:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.966 Acc@5 95.375 [2024-07-24 22:33:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.0% [2024-07-24 22:33:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.97% [2024-07-24 22:33:29 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 22:33:30 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 22:33:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][0/625] eta 0:08:20 lr 0.001021 wd 0.0500 time 0.8013 (0.8013) data time 0.4227 (0.4227) model time 0.0000 (0.0000) loss 7.7897 (7.7897) grad_norm 1.4803 (1.4803) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:33:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][10/625] eta 0:04:26 lr 0.001021 wd 0.0500 time 0.3961 (0.4338) data time 0.0008 (0.0394) model time 0.0000 (0.0000) loss 7.5619 (7.9666) grad_norm 2.4608 (1.6713) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:33:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][20/625] eta 0:04:12 lr 0.001021 wd 0.0500 time 0.3934 (0.4166) data time 0.0009 (0.0210) model time 0.0000 (0.0000) loss 8.0414 (7.8881) grad_norm 1.6043 (1.8375) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:33:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][30/625] eta 0:04:04 lr 0.001021 wd 0.0500 time 0.4023 (0.4106) data time 0.0009 (0.0145) model time 0.0000 (0.0000) loss 7.6037 (7.8896) grad_norm 4.0900 (2.0843) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:33:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][40/625] eta 0:04:06 lr 0.001021 wd 0.0500 time 0.3970 (0.4221) data time 0.0006 (0.0112) model time 0.0000 (0.0000) loss 7.2527 (7.9582) grad_norm 1.7236 (2.0261) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:33:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][50/625] eta 0:04:00 lr 0.001021 wd 0.0500 time 0.3981 (0.4175) data time 0.0007 (0.0092) model 
time 0.0000 (0.0000) loss 8.3411 (7.9188) grad_norm 1.4207 (2.0495) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:33:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][60/625] eta 0:03:54 lr 0.001021 wd 0.0500 time 0.3989 (0.4149) data time 0.0008 (0.0078) model time 0.3981 (0.4003) loss 8.7502 (7.9294) grad_norm 1.8675 (2.0334) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:33:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][70/625] eta 0:03:48 lr 0.001021 wd 0.0500 time 0.3987 (0.4126) data time 0.0009 (0.0069) model time 0.3978 (0.3990) loss 5.8536 (7.8417) grad_norm 1.9890 (2.0333) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:34:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][80/625] eta 0:03:44 lr 0.001021 wd 0.0500 time 0.3998 (0.4110) data time 0.0008 (0.0061) model time 0.3990 (0.3990) loss 7.6049 (7.8280) grad_norm 2.2061 (2.0326) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:34:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][90/625] eta 0:03:39 lr 0.001021 wd 0.0500 time 0.4011 (0.4099) data time 0.0005 (0.0056) model time 0.4006 (0.3993) loss 6.9186 (7.7892) grad_norm 2.1486 (2.0284) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:34:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][100/625] eta 0:03:34 lr 0.001021 wd 0.0500 time 0.3994 (0.4090) data time 0.0009 (0.0051) model time 0.3985 (0.3993) loss 8.2324 (7.8192) grad_norm 1.9723 (2.0309) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:34:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][110/625] eta 0:03:30 lr 0.001020 wd 0.0500 time 0.4063 (0.4081) data time 0.0006 (0.0047) model time 0.4056 (0.3992) loss 7.3869 (7.8145) grad_norm 1.8445 (2.0148) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:34:19 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][120/625] eta 0:03:25 lr 0.001020 wd 0.0500 time 0.4012 (0.4073) data time 0.0007 (0.0044) model time 0.4005 (0.3990) loss 8.3522 (7.7843) grad_norm 3.4884 (2.0249) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:34:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][130/625] eta 0:03:21 lr 0.001020 wd 0.0500 time 0.3975 (0.4069) data time 0.0006 (0.0041) model time 0.3969 (0.3991) loss 9.3178 (7.7729) grad_norm 2.2045 (2.0414) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:34:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][140/625] eta 0:03:17 lr 0.001020 wd 0.0500 time 0.3779 (0.4072) data time 0.0009 (0.0039) model time 0.3770 (0.4005) loss 7.7647 (7.7630) grad_norm 2.0269 (2.0224) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:34:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][150/625] eta 0:03:13 lr 0.001020 wd 0.0500 time 0.4008 (0.4067) data time 0.0009 (0.0037) model time 0.3999 (0.4002) loss 7.3083 (7.7798) grad_norm 1.4605 (2.0121) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:34:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][160/625] eta 0:03:08 lr 0.001020 wd 0.0500 time 0.3974 (0.4062) data time 0.0007 (0.0035) model time 0.3966 (0.4001) loss 8.2462 (7.7726) grad_norm 1.5964 (1.9927) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:34:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][170/625] eta 0:03:04 lr 0.001020 wd 0.0500 time 0.4012 (0.4059) data time 0.0008 (0.0034) model time 0.4003 (0.4000) loss 7.2849 (7.7725) grad_norm 1.7485 (2.0011) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:34:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][180/625] eta 0:03:00 lr 0.001020 wd 0.0500 time 0.4014 (0.4056) data time 
0.0008 (0.0032) model time 0.4006 (0.4000) loss 6.8645 (7.7392) grad_norm 3.9011 (2.0260) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:34:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][190/625] eta 0:02:56 lr 0.001020 wd 0.0500 time 0.3965 (0.4052) data time 0.0009 (0.0031) model time 0.3956 (0.3998) loss 6.5957 (7.7065) grad_norm 1.5637 (2.0125) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:34:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][200/625] eta 0:02:52 lr 0.001020 wd 0.0500 time 0.4003 (0.4049) data time 0.0007 (0.0030) model time 0.3997 (0.3998) loss 7.1819 (7.6782) grad_norm 1.6179 (1.9919) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:34:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][210/625] eta 0:02:47 lr 0.001020 wd 0.0500 time 0.3998 (0.4047) data time 0.0006 (0.0029) model time 0.3993 (0.3997) loss 6.7121 (7.6868) grad_norm 1.9362 (1.9872) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:34:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][220/625] eta 0:02:43 lr 0.001020 wd 0.0500 time 0.4002 (0.4046) data time 0.0009 (0.0028) model time 0.3993 (0.3998) loss 6.9998 (7.6565) grad_norm 2.3417 (2.0001) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:35:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][230/625] eta 0:02:39 lr 0.001020 wd 0.0500 time 0.4070 (0.4045) data time 0.0007 (0.0027) model time 0.4063 (0.3998) loss 7.2494 (7.6631) grad_norm 1.2219 (2.0072) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:35:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][240/625] eta 0:02:35 lr 0.001019 wd 0.0500 time 0.3988 (0.4043) data time 0.0009 (0.0027) model time 0.3979 (0.3998) loss 9.5115 (7.6732) grad_norm 2.2075 (1.9992) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 
22:35:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][250/625] eta 0:02:31 lr 0.001019 wd 0.0500 time 0.3977 (0.4040) data time 0.0006 (0.0026) model time 0.3971 (0.3997) loss 6.7394 (7.6762) grad_norm 1.9440 (1.9850) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:35:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][260/625] eta 0:02:28 lr 0.001019 wd 0.0500 time 0.3968 (0.4064) data time 0.0009 (0.0025) model time 0.3959 (0.4028) loss 8.1890 (7.6705) grad_norm 1.8041 (1.9762) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:35:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][270/625] eta 0:02:24 lr 0.001019 wd 0.0500 time 0.3969 (0.4061) data time 0.0009 (0.0025) model time 0.3960 (0.4025) loss 7.0616 (7.6704) grad_norm 2.5166 (1.9748) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:35:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][280/625] eta 0:02:20 lr 0.001019 wd 0.0500 time 0.3979 (0.4059) data time 0.0009 (0.0024) model time 0.3970 (0.4024) loss 7.4063 (7.6648) grad_norm 1.8196 (1.9839) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:35:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][290/625] eta 0:02:15 lr 0.001019 wd 0.0500 time 0.3960 (0.4057) data time 0.0006 (0.0024) model time 0.3954 (0.4023) loss 8.2710 (7.6556) grad_norm 2.2226 (1.9832) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:35:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][300/625] eta 0:02:11 lr 0.001019 wd 0.0500 time 0.4051 (0.4055) data time 0.0008 (0.0023) model time 0.4043 (0.4021) loss 7.4691 (7.6618) grad_norm 1.9433 (1.9899) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:35:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][310/625] eta 0:02:07 lr 0.001019 wd 0.0500 time 0.4014 (0.4053) 
data time 0.0008 (0.0023) model time 0.4006 (0.4020) loss 8.1840 (7.6767) grad_norm 1.7122 (1.9856) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:35:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][320/625] eta 0:02:03 lr 0.001019 wd 0.0500 time 0.3972 (0.4051) data time 0.0011 (0.0022) model time 0.3962 (0.4019) loss 8.4375 (7.6682) grad_norm 2.2578 (1.9894) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:35:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][330/625] eta 0:01:59 lr 0.001019 wd 0.0500 time 0.3947 (0.4050) data time 0.0009 (0.0022) model time 0.3938 (0.4018) loss 7.7969 (7.6827) grad_norm 1.5845 (1.9851) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:35:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][340/625] eta 0:01:55 lr 0.001019 wd 0.0500 time 0.4024 (0.4049) data time 0.0007 (0.0021) model time 0.4017 (0.4018) loss 6.5320 (7.6792) grad_norm 1.7820 (1.9909) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:35:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][350/625] eta 0:01:51 lr 0.001019 wd 0.0500 time 0.4048 (0.4048) data time 0.0007 (0.0021) model time 0.4041 (0.4017) loss 8.1679 (7.6958) grad_norm 1.4862 (1.9872) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:35:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][360/625] eta 0:01:47 lr 0.001019 wd 0.0500 time 0.3804 (0.4051) data time 0.0009 (0.0021) model time 0.3795 (0.4022) loss 7.2599 (7.6889) grad_norm 3.6922 (2.0222) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:36:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][370/625] eta 0:01:43 lr 0.001018 wd 0.0500 time 0.4003 (0.4050) data time 0.0009 (0.0020) model time 0.3995 (0.4021) loss 8.4059 (7.6954) grad_norm 2.1869 (2.0252) loss_scale 2048.0000 (2048.0000) mem 14939MB 
[2024-07-24 22:36:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][380/625] eta 0:01:39 lr 0.001018 wd 0.0500 time 0.3968 (0.4049) data time 0.0008 (0.0020) model time 0.3960 (0.4020) loss 8.9439 (7.6950) grad_norm 2.8127 (2.0249) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:36:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][390/625] eta 0:01:35 lr 0.001018 wd 0.0500 time 0.4134 (0.4048) data time 0.0007 (0.0020) model time 0.4127 (0.4020) loss 7.8058 (7.6991) grad_norm 1.7443 (2.0158) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:36:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][400/625] eta 0:01:31 lr 0.001018 wd 0.0500 time 0.4052 (0.4048) data time 0.0008 (0.0020) model time 0.4044 (0.4020) loss 7.7187 (7.6858) grad_norm 2.0136 (2.0099) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:36:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][410/625] eta 0:01:26 lr 0.001018 wd 0.0500 time 0.4011 (0.4046) data time 0.0009 (0.0019) model time 0.4003 (0.4019) loss 7.0009 (7.6853) grad_norm 1.8979 (2.0060) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:36:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][420/625] eta 0:01:22 lr 0.001018 wd 0.0500 time 0.4001 (0.4046) data time 0.0008 (0.0019) model time 0.3993 (0.4019) loss 7.9921 (7.6839) grad_norm 2.6984 (2.0156) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:36:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][430/625] eta 0:01:18 lr 0.001018 wd 0.0500 time 0.4017 (0.4045) data time 0.0009 (0.0019) model time 0.4008 (0.4018) loss 7.3884 (7.6885) grad_norm 2.3970 (2.0165) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:36:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][440/625] eta 0:01:14 lr 0.001018 wd 0.0500 time 
0.3987 (0.4044) data time 0.0008 (0.0019) model time 0.3979 (0.4017) loss 8.6672 (7.6853) grad_norm 1.5269 (2.0149) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:36:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][450/625] eta 0:01:10 lr 0.001018 wd 0.0500 time 0.3991 (0.4042) data time 0.0009 (0.0018) model time 0.3983 (0.4016) loss 8.3699 (7.6871) grad_norm 1.4189 (2.0084) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:36:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][460/625] eta 0:01:06 lr 0.001018 wd 0.0500 time 0.4001 (0.4041) data time 0.0007 (0.0018) model time 0.3993 (0.4015) loss 6.5316 (7.6804) grad_norm 2.1216 (2.0042) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:36:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][470/625] eta 0:01:02 lr 0.001018 wd 0.0500 time 0.4000 (0.4040) data time 0.0006 (0.0018) model time 0.3994 (0.4014) loss 6.4623 (7.6787) grad_norm 1.6642 (1.9978) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:36:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][480/625] eta 0:00:58 lr 0.001018 wd 0.0500 time 0.3987 (0.4055) data time 0.0008 (0.0018) model time 0.3978 (0.4031) loss 9.4179 (7.6768) grad_norm 1.6798 (1.9918) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:36:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][490/625] eta 0:00:54 lr 0.001018 wd 0.0500 time 0.3984 (0.4053) data time 0.0008 (0.0018) model time 0.3976 (0.4030) loss 6.1025 (7.6732) grad_norm 2.1853 (1.9883) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:36:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][500/625] eta 0:00:50 lr 0.001017 wd 0.0500 time 0.3972 (0.4052) data time 0.0007 (0.0018) model time 0.3966 (0.4029) loss 7.8976 (7.6749) grad_norm 1.6932 (1.9866) loss_scale 2048.0000 (2048.0000) 
mem 14939MB [2024-07-24 22:36:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][510/625] eta 0:00:46 lr 0.001017 wd 0.0500 time 0.4095 (0.4052) data time 0.0008 (0.0017) model time 0.4087 (0.4029) loss 8.9568 (7.6716) grad_norm 1.6249 (1.9933) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:37:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][520/625] eta 0:00:42 lr 0.001017 wd 0.0500 time 0.3969 (0.4051) data time 0.0010 (0.0017) model time 0.3960 (0.4028) loss 8.9310 (7.6711) grad_norm 3.2190 (1.9905) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:37:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][530/625] eta 0:00:38 lr 0.001017 wd 0.0500 time 0.4008 (0.4050) data time 0.0008 (0.0017) model time 0.4000 (0.4027) loss 7.4275 (7.6759) grad_norm 1.7333 (1.9942) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:37:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][540/625] eta 0:00:34 lr 0.001017 wd 0.0500 time 0.3969 (0.4049) data time 0.0010 (0.0017) model time 0.3958 (0.4027) loss 7.0241 (7.6885) grad_norm 1.4947 (1.9872) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:37:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][550/625] eta 0:00:30 lr 0.001017 wd 0.0500 time 0.3993 (0.4048) data time 0.0010 (0.0017) model time 0.3982 (0.4026) loss 7.1382 (7.6835) grad_norm 1.3495 (1.9839) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:37:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][560/625] eta 0:00:26 lr 0.001017 wd 0.0500 time 0.4001 (0.4047) data time 0.0008 (0.0017) model time 0.3993 (0.4025) loss 7.2630 (7.6851) grad_norm 2.1141 (1.9802) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:37:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][570/625] eta 0:00:22 lr 0.001017 wd 
0.0500 time 0.3983 (0.4046) data time 0.0011 (0.0017) model time 0.3973 (0.4025) loss 8.2524 (7.6994) grad_norm 1.2631 (1.9786) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:37:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][580/625] eta 0:00:18 lr 0.001017 wd 0.0500 time 0.3969 (0.4046) data time 0.0008 (0.0016) model time 0.3961 (0.4024) loss 8.6991 (7.7026) grad_norm 1.7389 (1.9789) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:37:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][590/625] eta 0:00:14 lr 0.001017 wd 0.0500 time 0.3974 (0.4048) data time 0.0006 (0.0016) model time 0.3967 (0.4027) loss 8.1018 (7.7031) grad_norm 2.1769 (1.9769) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:37:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][600/625] eta 0:00:10 lr 0.001017 wd 0.0500 time 0.4020 (0.4047) data time 0.0007 (0.0016) model time 0.4014 (0.4026) loss 6.6009 (7.7043) grad_norm 1.2933 (1.9748) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:37:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][610/625] eta 0:00:06 lr 0.001017 wd 0.0500 time 0.3938 (0.4046) data time 0.0005 (0.0016) model time 0.3934 (0.4025) loss 7.5226 (7.7042) grad_norm 1.9795 (1.9844) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:37:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][620/625] eta 0:00:02 lr 0.001017 wd 0.0500 time 0.3962 (0.4045) data time 0.0004 (0.0016) model time 0.3957 (0.4024) loss 10.4551 (7.7072) grad_norm 3.1068 (1.9905) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:37:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 91 training takes 0:04:12 [2024-07-24 22:37:42 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-24 22:37:43 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 22:37:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.460 (0.460) Loss 0.6646 (0.6646) Acc@1 86.963 (86.963) Acc@5 97.803 (97.803) Mem 14939MB [2024-07-24 22:37:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 1.0537 (0.8018) Acc@1 76.611 (83.376) Acc@5 94.189 (96.839) Mem 14939MB [2024-07-24 22:37:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.2148 (0.9588) Acc@1 72.461 (79.434) Acc@5 92.773 (95.022) Mem 14939MB [2024-07-24 22:37:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.171 Acc@5 94.954 [2024-07-24 22:37:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.2% [2024-07-24 22:37:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.808 (0.808) Loss 0.6270 (0.6270) Acc@1 87.451 (87.451) Acc@5 98.145 (98.145) Mem 14939MB [2024-07-24 22:37:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.154) Loss 1.0127 (0.7719) Acc@1 77.832 (84.109) Acc@5 94.678 (97.035) Mem 14939MB [2024-07-24 22:37:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.121) Loss 1.1592 (0.9164) Acc@1 72.607 (80.304) Acc@5 93.164 (95.424) Mem 14939MB [2024-07-24 22:37:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.982 Acc@5 95.393 [2024-07-24 22:37:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.0% [2024-07-24 22:37:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.98% [2024-07-24 22:37:49 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 22:37:50 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 22:37:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][0/625] eta 0:08:29 lr 0.001016 wd 0.0500 time 0.8157 (0.8157) data time 0.4402 (0.4402) model time 0.0000 (0.0000) loss 8.4883 (8.4883) grad_norm 2.1915 (2.1915) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:37:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][10/625] eta 0:04:28 lr 0.001016 wd 0.0500 time 0.3968 (0.4367) data time 0.0006 (0.0408) model time 0.0000 (0.0000) loss 8.1954 (7.9836) grad_norm 2.0236 (1.9206) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:37:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][20/625] eta 0:04:12 lr 0.001016 wd 0.0500 time 0.3951 (0.4180) data time 0.0009 (0.0218) model time 0.0000 (0.0000) loss 7.6913 (7.8571) grad_norm 2.4660 (1.8121) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:38:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][30/625] eta 0:04:04 lr 0.001016 wd 0.0500 time 0.3996 (0.4116) data time 0.0007 (0.0151) model time 0.0000 (0.0000) loss 6.2602 (7.7710) grad_norm 2.8544 (1.8424) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:38:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][40/625] eta 0:03:58 lr 0.001016 wd 0.0500 time 0.4004 (0.4085) data time 0.0009 (0.0116) model time 0.0000 (0.0000) loss 6.9541 (7.7189) grad_norm 1.5312 (1.9050) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:38:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][50/625] eta 0:03:54 lr 0.001016 wd 0.0500 time 0.3972 (0.4073) data time 0.0008 (0.0095) model 
time 0.0000 (0.0000) loss 7.7606 (7.6558) grad_norm 2.6153 (1.9810) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:38:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][60/625] eta 0:03:49 lr 0.001016 wd 0.0500 time 0.4009 (0.4064) data time 0.0007 (0.0081) model time 0.4002 (0.4010) loss 8.9131 (7.6905) grad_norm 1.9481 (2.0075) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:38:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][70/625] eta 0:03:46 lr 0.001016 wd 0.0500 time 0.5488 (0.4079) data time 0.0008 (0.0071) model time 0.5480 (0.4084) loss 8.2247 (7.7183) grad_norm 2.1854 (2.0255) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:38:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][80/625] eta 0:03:46 lr 0.001016 wd 0.0500 time 0.3942 (0.4151) data time 0.0006 (0.0063) model time 0.3936 (0.4274) loss 6.7087 (7.6862) grad_norm 2.9278 (2.0083) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:38:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][90/625] eta 0:03:41 lr 0.001016 wd 0.0500 time 0.3959 (0.4133) data time 0.0007 (0.0057) model time 0.3953 (0.4200) loss 8.7974 (7.7221) grad_norm 2.5281 (2.0683) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:38:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][100/625] eta 0:03:36 lr 0.001016 wd 0.0500 time 0.3959 (0.4118) data time 0.0006 (0.0053) model time 0.3953 (0.4155) loss 6.9109 (7.7510) grad_norm 3.8076 (2.1444) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:38:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][110/625] eta 0:03:32 lr 0.001016 wd 0.0500 time 0.4129 (0.4122) data time 0.0008 (0.0049) model time 0.4121 (0.4155) loss 7.9482 (7.7267) grad_norm 3.6580 (2.1561) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:38:39 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][120/625] eta 0:03:27 lr 0.001016 wd 0.0500 time 0.3933 (0.4111) data time 0.0006 (0.0045) model time 0.3926 (0.4129) loss 6.4124 (7.7137) grad_norm 1.4815 (2.1351) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:38:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][130/625] eta 0:03:23 lr 0.001015 wd 0.0500 time 0.3972 (0.4101) data time 0.0009 (0.0043) model time 0.3963 (0.4110) loss 8.6203 (7.7424) grad_norm 1.8059 (2.0873) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:38:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][140/625] eta 0:03:18 lr 0.001015 wd 0.0500 time 0.4017 (0.4095) data time 0.0009 (0.0041) model time 0.4008 (0.4097) loss 8.4396 (7.7242) grad_norm 1.9765 (2.0797) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:38:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][150/625] eta 0:03:14 lr 0.001015 wd 0.0500 time 0.3966 (0.4087) data time 0.0010 (0.0039) model time 0.3956 (0.4084) loss 8.3231 (7.6967) grad_norm 1.7497 (2.0486) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:38:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][160/625] eta 0:03:09 lr 0.001015 wd 0.0500 time 0.4053 (0.4082) data time 0.0007 (0.0037) model time 0.4046 (0.4076) loss 8.2993 (7.6943) grad_norm 1.8604 (2.0371) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:38:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][170/625] eta 0:03:05 lr 0.001015 wd 0.0500 time 0.3961 (0.4077) data time 0.0007 (0.0035) model time 0.3955 (0.4068) loss 8.5188 (7.6924) grad_norm 2.5061 (2.0253) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:39:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][180/625] eta 0:03:01 lr 0.001015 wd 0.0500 time 0.3969 (0.4072) data time 
0.0008 (0.0034) model time 0.3961 (0.4062) loss 8.2399 (7.6606) grad_norm 2.1292 (2.0136) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:39:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][190/625] eta 0:02:56 lr 0.001015 wd 0.0500 time 0.4002 (0.4068) data time 0.0007 (0.0033) model time 0.3995 (0.4056) loss 6.8072 (7.6567) grad_norm 2.0438 (2.0174) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:39:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][200/625] eta 0:02:52 lr 0.001015 wd 0.0500 time 0.3983 (0.4064) data time 0.0006 (0.0031) model time 0.3977 (0.4052) loss 7.7189 (7.6589) grad_norm 1.3881 (1.9995) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:39:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][210/625] eta 0:02:48 lr 0.001015 wd 0.0500 time 0.4008 (0.4061) data time 0.0007 (0.0030) model time 0.4001 (0.4048) loss 7.9810 (7.6624) grad_norm 2.9053 (1.9991) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:39:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][220/625] eta 0:02:44 lr 0.001015 wd 0.0500 time 0.4045 (0.4059) data time 0.0006 (0.0029) model time 0.4039 (0.4045) loss 8.1121 (7.6548) grad_norm 1.4415 (1.9916) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:39:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][230/625] eta 0:02:40 lr 0.001015 wd 0.0500 time 0.4013 (0.4057) data time 0.0006 (0.0028) model time 0.4007 (0.4043) loss 9.0017 (7.6522) grad_norm 2.1899 (1.9883) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:39:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][240/625] eta 0:02:36 lr 0.001015 wd 0.0500 time 0.4080 (0.4055) data time 0.0006 (0.0028) model time 0.4074 (0.4041) loss 9.0479 (7.6600) grad_norm 3.5113 (2.0095) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 
22:39:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][250/625] eta 0:02:32 lr 0.001015 wd 0.0500 time 0.3997 (0.4054) data time 0.0007 (0.0027) model time 0.3990 (0.4039) loss 6.6373 (7.6534) grad_norm 1.9860 (2.0124) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:39:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][260/625] eta 0:02:27 lr 0.001014 wd 0.0500 time 0.3966 (0.4051) data time 0.0007 (0.0026) model time 0.3960 (0.4036) loss 7.4428 (7.6520) grad_norm 1.9491 (2.0087) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:39:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][270/625] eta 0:02:23 lr 0.001014 wd 0.0500 time 0.3991 (0.4049) data time 0.0006 (0.0026) model time 0.3985 (0.4033) loss 8.6407 (7.6681) grad_norm 1.6578 (2.0034) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:39:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][280/625] eta 0:02:19 lr 0.001014 wd 0.0500 time 0.4005 (0.4047) data time 0.0009 (0.0025) model time 0.3996 (0.4032) loss 7.6794 (7.6659) grad_norm 1.6651 (2.0005) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:39:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][290/625] eta 0:02:15 lr 0.001014 wd 0.0500 time 0.3981 (0.4045) data time 0.0007 (0.0024) model time 0.3974 (0.4030) loss 7.3659 (7.6771) grad_norm 1.5875 (2.0119) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:39:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][300/625] eta 0:02:12 lr 0.001014 wd 0.0500 time 0.3965 (0.4062) data time 0.0007 (0.0024) model time 0.3958 (0.4051) loss 6.0582 (7.6895) grad_norm 1.5938 (2.0046) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:39:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][310/625] eta 0:02:07 lr 0.001014 wd 0.0500 time 0.4012 (0.4060) 
data time 0.0007 (0.0023) model time 0.4005 (0.4048) loss 6.1353 (7.6810) grad_norm 1.6132 (2.0066) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 22:40:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][320/625] eta 0:02:03 lr 0.001014 wd 0.0500 time 0.3959 (0.4058) data time 0.0009 (0.0023) model time 0.3951 (0.4045) loss 7.4717 (7.6984) grad_norm 1.9216 (1.9967) loss_scale 4096.0000 (2067.1402) mem 14939MB [2024-07-24 22:40:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][330/625] eta 0:01:59 lr 0.001014 wd 0.0500 time 0.3973 (0.4060) data time 0.0007 (0.0023) model time 0.3966 (0.4048) loss 7.5264 (7.7152) grad_norm 1.5217 (1.9829) loss_scale 4096.0000 (2128.4350) mem 14939MB [2024-07-24 22:40:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][340/625] eta 0:01:55 lr 0.001014 wd 0.0500 time 0.3977 (0.4058) data time 0.0009 (0.0022) model time 0.3968 (0.4046) loss 7.6033 (7.7239) grad_norm 1.7848 (1.9813) loss_scale 4096.0000 (2186.1349) mem 14939MB [2024-07-24 22:40:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][350/625] eta 0:01:51 lr 0.001014 wd 0.0500 time 0.3950 (0.4056) data time 0.0007 (0.0022) model time 0.3944 (0.4044) loss 8.9787 (7.7230) grad_norm 2.2364 (1.9934) loss_scale 4096.0000 (2240.5470) mem 14939MB [2024-07-24 22:40:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][360/625] eta 0:01:47 lr 0.001014 wd 0.0500 time 0.3995 (0.4054) data time 0.0008 (0.0022) model time 0.3987 (0.4042) loss 8.1074 (7.7318) grad_norm 2.2890 (2.0045) loss_scale 4096.0000 (2291.9446) mem 14939MB [2024-07-24 22:40:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][370/625] eta 0:01:43 lr 0.001014 wd 0.0500 time 0.4006 (0.4053) data time 0.0007 (0.0021) model time 0.3998 (0.4040) loss 7.8821 (7.7291) grad_norm 2.6640 (2.0258) loss_scale 4096.0000 (2340.5714) mem 14939MB 
[2024-07-24 22:40:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][380/625] eta 0:01:39 lr 0.001014 wd 0.0500 time 0.4018 (0.4052) data time 0.0008 (0.0021) model time 0.4011 (0.4039) loss 7.8568 (7.7351) grad_norm 2.1089 (2.0274) loss_scale 4096.0000 (2386.6457) mem 14939MB [2024-07-24 22:40:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][390/625] eta 0:01:35 lr 0.001013 wd 0.0500 time 0.4049 (0.4051) data time 0.0008 (0.0021) model time 0.4041 (0.4037) loss 8.4833 (7.7400) grad_norm 1.5091 (2.0495) loss_scale 4096.0000 (2430.3632) mem 14939MB [2024-07-24 22:40:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][400/625] eta 0:01:31 lr 0.001013 wd 0.0500 time 0.4030 (0.4050) data time 0.0010 (0.0021) model time 0.4020 (0.4036) loss 7.1074 (7.7380) grad_norm 1.7340 (2.0502) loss_scale 4096.0000 (2471.9002) mem 14939MB [2024-07-24 22:40:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][410/625] eta 0:01:27 lr 0.001013 wd 0.0500 time 0.3995 (0.4049) data time 0.0007 (0.0021) model time 0.3988 (0.4035) loss 7.8831 (7.7377) grad_norm 2.1485 (2.0522) loss_scale 4096.0000 (2511.4161) mem 14939MB [2024-07-24 22:40:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][420/625] eta 0:01:22 lr 0.001013 wd 0.0500 time 0.3989 (0.4048) data time 0.0007 (0.0020) model time 0.3982 (0.4035) loss 8.7860 (7.7378) grad_norm 2.5344 (2.0537) loss_scale 4096.0000 (2549.0546) mem 14939MB [2024-07-24 22:40:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][430/625] eta 0:01:18 lr 0.001013 wd 0.0500 time 0.4004 (0.4047) data time 0.0009 (0.0020) model time 0.3995 (0.4033) loss 8.7768 (7.7442) grad_norm 1.7371 (2.0498) loss_scale 4096.0000 (2584.9466) mem 14939MB [2024-07-24 22:40:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][440/625] eta 0:01:14 lr 0.001013 wd 0.0500 time 
0.3975 (0.4046) data time 0.0008 (0.0020) model time 0.3967 (0.4032) loss 7.2617 (7.7495) grad_norm 2.6727 (2.0546) loss_scale 4096.0000 (2619.2109) mem 14939MB [2024-07-24 22:40:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][450/625] eta 0:01:10 lr 0.001013 wd 0.0500 time 0.3990 (0.4045) data time 0.0009 (0.0020) model time 0.3981 (0.4031) loss 8.7515 (7.7546) grad_norm 2.2188 (2.0597) loss_scale 4096.0000 (2651.9557) mem 14939MB [2024-07-24 22:40:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][460/625] eta 0:01:06 lr 0.001013 wd 0.0500 time 0.4009 (0.4044) data time 0.0008 (0.0019) model time 0.4001 (0.4030) loss 8.3946 (7.7565) grad_norm 1.8030 (2.0641) loss_scale 4096.0000 (2683.2798) mem 14939MB [2024-07-24 22:41:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][470/625] eta 0:01:02 lr 0.001013 wd 0.0500 time 0.4015 (0.4043) data time 0.0008 (0.0019) model time 0.4007 (0.4030) loss 7.1149 (7.7600) grad_norm 1.6655 (2.0627) loss_scale 4096.0000 (2713.2739) mem 14939MB [2024-07-24 22:41:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][480/625] eta 0:00:58 lr 0.001013 wd 0.0500 time 0.3964 (0.4042) data time 0.0007 (0.0019) model time 0.3957 (0.4029) loss 6.1493 (7.7595) grad_norm 1.4623 (2.0560) loss_scale 4096.0000 (2742.0208) mem 14939MB [2024-07-24 22:41:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][490/625] eta 0:00:54 lr 0.001013 wd 0.0500 time 0.4001 (0.4042) data time 0.0009 (0.0019) model time 0.3992 (0.4028) loss 8.6906 (7.7584) grad_norm 1.5164 (2.0470) loss_scale 4096.0000 (2769.5967) mem 14939MB [2024-07-24 22:41:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][500/625] eta 0:00:50 lr 0.001013 wd 0.0500 time 0.4107 (0.4042) data time 0.0009 (0.0019) model time 0.4099 (0.4028) loss 7.7026 (7.7660) grad_norm 2.3370 (2.0447) loss_scale 4096.0000 (2796.0719) 
mem 14939MB [2024-07-24 22:41:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][510/625] eta 0:00:46 lr 0.001013 wd 0.0500 time 0.5938 (0.4046) data time 0.0009 (0.0019) model time 0.5929 (0.4032) loss 8.2067 (7.7682) grad_norm 1.3856 (2.0371) loss_scale 4096.0000 (2821.5108) mem 14939MB [2024-07-24 22:41:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][520/625] eta 0:00:42 lr 0.001012 wd 0.0500 time 0.3973 (0.4055) data time 0.0009 (0.0019) model time 0.3964 (0.4042) loss 6.1850 (7.7585) grad_norm 2.2034 (2.0359) loss_scale 4096.0000 (2845.9731) mem 14939MB [2024-07-24 22:41:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][530/625] eta 0:00:38 lr 0.001012 wd 0.0500 time 0.3988 (0.4054) data time 0.0009 (0.0019) model time 0.3979 (0.4041) loss 8.6584 (7.7566) grad_norm 2.0325 (2.0377) loss_scale 4096.0000 (2869.5141) mem 14939MB [2024-07-24 22:41:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][540/625] eta 0:00:34 lr 0.001012 wd 0.0500 time 0.3979 (0.4053) data time 0.0009 (0.0019) model time 0.3970 (0.4040) loss 7.8421 (7.7644) grad_norm 1.3864 (2.0387) loss_scale 4096.0000 (2892.1848) mem 14939MB [2024-07-24 22:41:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][550/625] eta 0:00:30 lr 0.001012 wd 0.0500 time 0.5511 (0.4054) data time 0.0007 (0.0019) model time 0.5504 (0.4042) loss 8.8585 (7.7606) grad_norm 1.7570 (2.0401) loss_scale 4096.0000 (2914.0327) mem 14939MB [2024-07-24 22:41:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][560/625] eta 0:00:26 lr 0.001012 wd 0.0500 time 0.4154 (0.4054) data time 0.0009 (0.0018) model time 0.4146 (0.4041) loss 8.3035 (7.7669) grad_norm 2.7210 (2.0420) loss_scale 4096.0000 (2935.1016) mem 14939MB [2024-07-24 22:41:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][570/625] eta 0:00:22 lr 0.001012 wd 
0.0500 time 0.4005 (0.4053) data time 0.0008 (0.0018) model time 0.3997 (0.4040) loss 8.2286 (7.7620) grad_norm 1.6098 (2.0393) loss_scale 4096.0000 (2955.4326) mem 14939MB [2024-07-24 22:41:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][580/625] eta 0:00:18 lr 0.001012 wd 0.0500 time 0.3990 (0.4052) data time 0.0007 (0.0018) model time 0.3983 (0.4039) loss 5.8783 (7.7624) grad_norm 2.1274 (2.0348) loss_scale 4096.0000 (2975.0637) mem 14939MB [2024-07-24 22:41:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][590/625] eta 0:00:14 lr 0.001012 wd 0.0500 time 0.4120 (0.4051) data time 0.0007 (0.0018) model time 0.4113 (0.4038) loss 6.9186 (7.7545) grad_norm 1.6700 (2.0393) loss_scale 4096.0000 (2994.0305) mem 14939MB [2024-07-24 22:41:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][600/625] eta 0:00:10 lr 0.001012 wd 0.0500 time 0.3976 (0.4050) data time 0.0006 (0.0018) model time 0.3969 (0.4037) loss 7.6419 (7.7529) grad_norm 1.5397 (2.0366) loss_scale 4096.0000 (3012.3661) mem 14939MB [2024-07-24 22:41:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][610/625] eta 0:00:06 lr 0.001012 wd 0.0500 time 0.3958 (0.4049) data time 0.0004 (0.0018) model time 0.3954 (0.4036) loss 9.0049 (7.7522) grad_norm 1.6779 (2.0389) loss_scale 4096.0000 (3030.1015) mem 14939MB [2024-07-24 22:42:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][620/625] eta 0:00:02 lr 0.001012 wd 0.0500 time 0.4000 (0.4048) data time 0.0004 (0.0017) model time 0.3996 (0.4035) loss 8.1152 (7.7492) grad_norm 1.5585 (2.0366) loss_scale 4096.0000 (3047.2657) mem 14939MB [2024-07-24 22:42:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 92 training takes 0:04:12 [2024-07-24 22:42:02 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-24 22:42:03 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 22:42:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.454 (0.454) Loss 0.6558 (0.6558) Acc@1 87.305 (87.305) Acc@5 98.047 (98.047) Mem 14939MB [2024-07-24 22:42:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 1.0576 (0.8153) Acc@1 77.100 (83.323) Acc@5 94.141 (96.862) Mem 14939MB [2024-07-24 22:42:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.2031 (0.9671) Acc@1 72.607 (79.432) Acc@5 92.432 (95.029) Mem 14939MB [2024-07-24 22:42:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.167 Acc@5 94.974 [2024-07-24 22:42:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.2% [2024-07-24 22:42:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.733 (0.733) Loss 0.6250 (0.6250) Acc@1 87.451 (87.451) Acc@5 98.145 (98.145) Mem 14939MB [2024-07-24 22:42:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.150) Loss 1.0107 (0.7702) Acc@1 77.881 (84.140) Acc@5 94.678 (97.035) Mem 14939MB [2024-07-24 22:42:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.120) Loss 1.1572 (0.9144) Acc@1 72.607 (80.357) Acc@5 93.359 (95.433) Mem 14939MB [2024-07-24 22:42:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.026 Acc@5 95.397 [2024-07-24 22:42:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.0% [2024-07-24 22:42:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.03% [2024-07-24 22:42:09 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 22:42:10 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 22:42:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][0/625] eta 0:08:00 lr 0.001012 wd 0.0500 time 0.7684 (0.7684) data time 0.3882 (0.3882) model time 0.0000 (0.0000) loss 8.7232 (8.7232) grad_norm 2.8387 (2.8387) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:42:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][10/625] eta 0:04:25 lr 0.001012 wd 0.0500 time 0.3969 (0.4323) data time 0.0008 (0.0360) model time 0.0000 (0.0000) loss 8.0324 (7.3865) grad_norm 2.2251 (1.8067) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:42:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][20/625] eta 0:04:11 lr 0.001011 wd 0.0500 time 0.3971 (0.4163) data time 0.0006 (0.0193) model time 0.0000 (0.0000) loss 7.6172 (7.4449) grad_norm 1.7617 (1.9010) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:42:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][30/625] eta 0:04:04 lr 0.001011 wd 0.0500 time 0.4043 (0.4113) data time 0.0009 (0.0133) model time 0.0000 (0.0000) loss 6.2827 (7.3875) grad_norm 1.4899 (1.8706) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:42:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][40/625] eta 0:03:58 lr 0.001011 wd 0.0500 time 0.3991 (0.4083) data time 0.0008 (0.0103) model time 0.0000 (0.0000) loss 7.6601 (7.4740) grad_norm 1.6182 (1.8652) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:42:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][50/625] eta 0:03:53 lr 0.001011 wd 0.0500 time 0.3984 (0.4068) data time 0.0008 (0.0085) model 
time 0.0000 (0.0000) loss 7.3172 (7.5271) grad_norm 2.3331 (1.9057) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:42:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][60/625] eta 0:03:49 lr 0.001011 wd 0.0500 time 0.4007 (0.4057) data time 0.0009 (0.0072) model time 0.3998 (0.3989) loss 7.8400 (7.5635) grad_norm 1.7616 (1.9175) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:42:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][70/625] eta 0:03:44 lr 0.001011 wd 0.0500 time 0.3992 (0.4050) data time 0.0006 (0.0063) model time 0.3986 (0.3995) loss 8.2375 (7.5860) grad_norm 1.9313 (1.9196) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:42:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][80/625] eta 0:03:40 lr 0.001011 wd 0.0500 time 0.3993 (0.4044) data time 0.0007 (0.0057) model time 0.3986 (0.3993) loss 6.8097 (7.5906) grad_norm 1.6374 (1.9234) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:42:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][90/625] eta 0:03:37 lr 0.001011 wd 0.0500 time 0.3983 (0.4060) data time 0.0007 (0.0052) model time 0.3976 (0.4040) loss 7.6899 (7.5813) grad_norm 1.8799 (1.9527) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:42:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][100/625] eta 0:03:32 lr 0.001011 wd 0.0500 time 0.3943 (0.4053) data time 0.0009 (0.0047) model time 0.3934 (0.4027) loss 8.2497 (7.5921) grad_norm 1.5824 (1.9527) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:42:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][110/625] eta 0:03:30 lr 0.001011 wd 0.0500 time 0.5986 (0.4095) data time 0.0009 (0.0044) model time 0.5977 (0.4108) loss 7.5765 (7.5973) grad_norm 2.6849 (1.9718) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:43:00 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][120/625] eta 0:03:28 lr 0.001011 wd 0.0500 time 0.4023 (0.4120) data time 0.0006 (0.0041) model time 0.4017 (0.4148) loss 9.3316 (7.6222) grad_norm 2.1347 (1.9738) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:43:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][130/625] eta 0:03:23 lr 0.001011 wd 0.0500 time 0.4007 (0.4110) data time 0.0007 (0.0039) model time 0.4000 (0.4128) loss 7.7975 (7.6586) grad_norm 2.4902 (1.9970) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:43:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][140/625] eta 0:03:18 lr 0.001011 wd 0.0500 time 0.3962 (0.4103) data time 0.0009 (0.0037) model time 0.3953 (0.4113) loss 6.4961 (7.6894) grad_norm 1.8543 (2.0533) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:43:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][150/625] eta 0:03:14 lr 0.001010 wd 0.0500 time 0.3999 (0.4096) data time 0.0009 (0.0035) model time 0.3990 (0.4100) loss 7.9699 (7.6633) grad_norm 1.5672 (2.0519) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:43:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][160/625] eta 0:03:10 lr 0.001010 wd 0.0500 time 0.4012 (0.4089) data time 0.0009 (0.0033) model time 0.4003 (0.4089) loss 7.1717 (7.6600) grad_norm 1.8170 (2.0381) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:43:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][170/625] eta 0:03:05 lr 0.001010 wd 0.0500 time 0.3977 (0.4083) data time 0.0007 (0.0032) model time 0.3971 (0.4080) loss 8.6007 (7.6474) grad_norm 1.8465 (2.0279) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:43:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][180/625] eta 0:03:01 lr 0.001010 wd 0.0500 time 0.4013 (0.4079) data time 
0.0007 (0.0031) model time 0.4006 (0.4074) loss 6.2298 (7.6429) grad_norm 1.3842 (2.0166) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:43:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][190/625] eta 0:02:57 lr 0.001010 wd 0.0500 time 0.3997 (0.4074) data time 0.0008 (0.0030) model time 0.3989 (0.4066) loss 7.6422 (7.6172) grad_norm 1.7933 (2.0016) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:43:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][200/625] eta 0:02:52 lr 0.001010 wd 0.0500 time 0.3944 (0.4069) data time 0.0006 (0.0029) model time 0.3937 (0.4060) loss 7.5175 (7.6303) grad_norm 1.5824 (1.9876) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:43:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][210/625] eta 0:02:48 lr 0.001010 wd 0.0500 time 0.4038 (0.4066) data time 0.0009 (0.0028) model time 0.4029 (0.4056) loss 8.0134 (7.6246) grad_norm 2.5157 (1.9916) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:43:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][220/625] eta 0:02:44 lr 0.001010 wd 0.0500 time 0.3980 (0.4062) data time 0.0009 (0.0027) model time 0.3972 (0.4051) loss 8.3293 (7.6251) grad_norm 1.6349 (1.9921) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:43:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][230/625] eta 0:02:40 lr 0.001010 wd 0.0500 time 0.3984 (0.4058) data time 0.0007 (0.0026) model time 0.3977 (0.4046) loss 7.4415 (7.6294) grad_norm 1.5086 (2.0043) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:43:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][240/625] eta 0:02:36 lr 0.001010 wd 0.0500 time 0.3979 (0.4055) data time 0.0007 (0.0025) model time 0.3972 (0.4043) loss 7.2027 (7.6454) grad_norm 1.9398 (2.0040) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 
22:43:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][250/625] eta 0:02:31 lr 0.001010 wd 0.0500 time 0.4020 (0.4053) data time 0.0007 (0.0025) model time 0.4013 (0.4040) loss 6.8519 (7.6525) grad_norm 1.5455 (2.0010) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:43:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][260/625] eta 0:02:27 lr 0.001010 wd 0.0500 time 0.4019 (0.4051) data time 0.0009 (0.0024) model time 0.4011 (0.4038) loss 8.6399 (7.6613) grad_norm 1.6052 (1.9956) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:43:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][270/625] eta 0:02:23 lr 0.001010 wd 0.0500 time 0.4008 (0.4049) data time 0.0009 (0.0024) model time 0.3999 (0.4035) loss 7.3818 (7.6635) grad_norm 1.4850 (1.9813) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:44:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][280/625] eta 0:02:19 lr 0.001009 wd 0.0500 time 0.3998 (0.4048) data time 0.0007 (0.0023) model time 0.3991 (0.4034) loss 6.1726 (7.6583) grad_norm 1.9027 (1.9737) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:44:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][290/625] eta 0:02:15 lr 0.001009 wd 0.0500 time 0.3989 (0.4047) data time 0.0006 (0.0023) model time 0.3983 (0.4033) loss 7.1016 (7.6599) grad_norm 2.4424 (1.9766) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:44:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][300/625] eta 0:02:11 lr 0.001009 wd 0.0500 time 0.3991 (0.4045) data time 0.0007 (0.0022) model time 0.3984 (0.4031) loss 6.6291 (7.6594) grad_norm 1.7301 (1.9811) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:44:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][310/625] eta 0:02:07 lr 0.001009 wd 0.0500 time 0.4015 (0.4048) 
data time 0.0007 (0.0022) model time 0.4008 (0.4035) loss 5.7455 (7.6623) grad_norm 1.6063 (1.9819) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:44:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][320/625] eta 0:02:03 lr 0.001009 wd 0.0500 time 0.4027 (0.4047) data time 0.0008 (0.0021) model time 0.4018 (0.4034) loss 7.4502 (7.6609) grad_norm 1.8528 (1.9834) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:44:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][330/625] eta 0:01:59 lr 0.001009 wd 0.0500 time 0.5874 (0.4062) data time 0.0009 (0.0021) model time 0.5866 (0.4051) loss 8.9796 (7.6702) grad_norm 1.9290 (1.9791) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:44:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][340/625] eta 0:01:55 lr 0.001009 wd 0.0500 time 0.3979 (0.4070) data time 0.0008 (0.0021) model time 0.3971 (0.4061) loss 8.1007 (7.6766) grad_norm 3.7896 (1.9829) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:44:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][350/625] eta 0:01:51 lr 0.001009 wd 0.0500 time 0.3958 (0.4067) data time 0.0009 (0.0020) model time 0.3950 (0.4058) loss 8.3110 (7.6701) grad_norm 3.0576 (1.9989) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:44:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][360/625] eta 0:01:47 lr 0.001009 wd 0.0500 time 0.4020 (0.4065) data time 0.0007 (0.0020) model time 0.4013 (0.4056) loss 7.2501 (7.6681) grad_norm 1.9455 (2.0030) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:44:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][370/625] eta 0:01:43 lr 0.001009 wd 0.0500 time 0.3947 (0.4063) data time 0.0009 (0.0020) model time 0.3938 (0.4053) loss 7.0541 (7.6674) grad_norm 2.4083 (2.0154) loss_scale 4096.0000 (4096.0000) mem 14939MB 
[2024-07-24 22:44:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][380/625] eta 0:01:39 lr 0.001009 wd 0.0500 time 0.3943 (0.4061) data time 0.0009 (0.0019) model time 0.3934 (0.4050) loss 7.7568 (7.6768) grad_norm 2.4046 (2.0218) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:44:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][390/625] eta 0:01:35 lr 0.001009 wd 0.0500 time 0.3965 (0.4059) data time 0.0009 (0.0019) model time 0.3956 (0.4048) loss 8.1352 (7.6682) grad_norm 2.5990 (2.0196) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:44:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][400/625] eta 0:01:31 lr 0.001009 wd 0.0500 time 0.3993 (0.4057) data time 0.0008 (0.0019) model time 0.3984 (0.4046) loss 7.8051 (7.6801) grad_norm 2.4615 (2.0170) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:44:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][410/625] eta 0:01:27 lr 0.001008 wd 0.0500 time 0.3994 (0.4056) data time 0.0009 (0.0019) model time 0.3985 (0.4045) loss 7.1647 (7.6848) grad_norm 1.6376 (2.0125) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:45:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][420/625] eta 0:01:23 lr 0.001008 wd 0.0500 time 0.4026 (0.4054) data time 0.0007 (0.0018) model time 0.4019 (0.4043) loss 6.6858 (7.6794) grad_norm 2.0002 (2.0187) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:45:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][430/625] eta 0:01:19 lr 0.001008 wd 0.0500 time 0.4003 (0.4053) data time 0.0007 (0.0018) model time 0.3996 (0.4041) loss 6.4851 (7.6822) grad_norm 3.8409 (2.0276) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:45:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][440/625] eta 0:01:14 lr 0.001008 wd 0.0500 time 
0.4041 (0.4051) data time 0.0008 (0.0018) model time 0.4032 (0.4040) loss 8.5085 (7.6901) grad_norm 1.5018 (2.0276) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:45:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][450/625] eta 0:01:10 lr 0.001008 wd 0.0500 time 0.4015 (0.4050) data time 0.0006 (0.0018) model time 0.4008 (0.4039) loss 7.5155 (7.6954) grad_norm 2.0197 (2.0264) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:45:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][460/625] eta 0:01:06 lr 0.001008 wd 0.0500 time 0.4009 (0.4049) data time 0.0009 (0.0018) model time 0.4000 (0.4037) loss 6.5841 (7.6950) grad_norm 1.5903 (2.0191) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:45:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][470/625] eta 0:01:02 lr 0.001008 wd 0.0500 time 0.3984 (0.4047) data time 0.0006 (0.0017) model time 0.3978 (0.4035) loss 7.0534 (7.6919) grad_norm 1.8531 (2.0139) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:45:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][480/625] eta 0:00:58 lr 0.001008 wd 0.0500 time 0.3983 (0.4046) data time 0.0007 (0.0017) model time 0.3976 (0.4034) loss 8.1971 (7.7039) grad_norm 2.7904 (2.0134) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:45:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][490/625] eta 0:00:54 lr 0.001008 wd 0.0500 time 0.4003 (0.4045) data time 0.0009 (0.0017) model time 0.3995 (0.4033) loss 8.2852 (7.7086) grad_norm 1.3347 (2.0115) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:45:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][500/625] eta 0:00:50 lr 0.001008 wd 0.0500 time 0.4013 (0.4043) data time 0.0007 (0.0017) model time 0.4006 (0.4031) loss 6.6630 (7.7032) grad_norm 1.8057 (2.0024) loss_scale 4096.0000 (4096.0000) 
mem 14939MB [2024-07-24 22:45:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][510/625] eta 0:00:46 lr 0.001008 wd 0.0500 time 0.4019 (0.4043) data time 0.0007 (0.0017) model time 0.4012 (0.4030) loss 5.7313 (7.6999) grad_norm 2.9071 (2.0084) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:45:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][520/625] eta 0:00:42 lr 0.001008 wd 0.0500 time 0.4208 (0.4043) data time 0.0008 (0.0017) model time 0.4199 (0.4031) loss 8.3988 (7.7046) grad_norm 2.3553 (2.0027) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:45:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][530/625] eta 0:00:38 lr 0.001008 wd 0.0500 time 0.4000 (0.4044) data time 0.0009 (0.0017) model time 0.3992 (0.4032) loss 8.4946 (7.7110) grad_norm 1.3036 (2.0036) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:45:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][540/625] eta 0:00:34 lr 0.001007 wd 0.0500 time 0.3999 (0.4043) data time 0.0009 (0.0016) model time 0.3990 (0.4032) loss 9.1164 (7.7056) grad_norm 3.4099 (2.0080) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:45:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][550/625] eta 0:00:30 lr 0.001007 wd 0.0500 time 0.3980 (0.4049) data time 0.0010 (0.0016) model time 0.3970 (0.4038) loss 8.5852 (7.7148) grad_norm 2.6667 (2.0161) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:45:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][560/625] eta 0:00:26 lr 0.001007 wd 0.0500 time 0.3989 (0.4051) data time 0.0009 (0.0016) model time 0.3980 (0.4040) loss 8.2921 (7.7163) grad_norm 1.7310 (2.0166) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:46:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][570/625] eta 0:00:22 lr 0.001007 wd 
0.0500 time 0.4011 (0.4051) data time 0.0007 (0.0016) model time 0.4004 (0.4040) loss 7.7386 (7.7163) grad_norm 2.2163 (2.0294) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:46:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][580/625] eta 0:00:18 lr 0.001007 wd 0.0500 time 0.3973 (0.4050) data time 0.0007 (0.0016) model time 0.3967 (0.4039) loss 9.0490 (7.7167) grad_norm 1.9337 (2.0332) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:46:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][590/625] eta 0:00:14 lr 0.001007 wd 0.0500 time 0.4003 (0.4049) data time 0.0006 (0.0016) model time 0.3996 (0.4038) loss 8.0793 (7.7149) grad_norm 1.3994 (2.0312) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:46:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][600/625] eta 0:00:10 lr 0.001007 wd 0.0500 time 0.3998 (0.4048) data time 0.0007 (0.0016) model time 0.3991 (0.4037) loss 9.1077 (7.7178) grad_norm 1.4436 (2.0285) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:46:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][610/625] eta 0:00:06 lr 0.001007 wd 0.0500 time 0.4001 (0.4047) data time 0.0005 (0.0016) model time 0.3996 (0.4036) loss 8.7856 (7.7192) grad_norm 1.5045 (2.0223) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:46:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][620/625] eta 0:00:02 lr 0.001007 wd 0.0500 time 0.3988 (0.4046) data time 0.0006 (0.0015) model time 0.3982 (0.4035) loss 7.4267 (7.7176) grad_norm 1.9918 (2.0172) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:46:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 93 training takes 0:04:12 [2024-07-24 22:46:23 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-24 22:46:23 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 22:46:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.447 (0.447) Loss 0.6187 (0.6187) Acc@1 87.988 (87.988) Acc@5 98.145 (98.145) Mem 14939MB [2024-07-24 22:46:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 1.0225 (0.7846) Acc@1 77.441 (83.638) Acc@5 94.092 (96.777) Mem 14939MB [2024-07-24 22:46:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.1465 (0.9323) Acc@1 73.242 (79.983) Acc@5 92.773 (95.024) Mem 14939MB [2024-07-24 22:46:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.615 Acc@5 95.030 [2024-07-24 22:46:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.6% [2024-07-24 22:46:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.61% [2024-07-24 22:46:26 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-24 22:46:27 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-24 22:46:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.455 (0.455) Loss 0.6240 (0.6240) Acc@1 87.451 (87.451) Acc@5 98.096 (98.096) Mem 14939MB [2024-07-24 22:46:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 1.0078 (0.7682) Acc@1 77.979 (84.162) Acc@5 94.629 (97.030) Mem 14939MB [2024-07-24 22:46:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.1533 (0.9120) Acc@1 72.559 (80.399) Acc@5 93.408 (95.429) Mem 14939MB [2024-07-24 22:46:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.062 Acc@5 95.397 [2024-07-24 22:46:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.1% [2024-07-24 22:46:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.06% [2024-07-24 22:46:29 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 22:46:30 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-24 22:46:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][0/625] eta 0:20:37 lr 0.001007 wd 0.0500 time 1.9805 (1.9805) data time 1.3630 (1.3630) model time 0.0000 (0.0000) loss 7.2188 (7.2188) grad_norm 2.4892 (2.4892) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:46:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][10/625] eta 0:05:33 lr 0.001007 wd 0.0500 time 0.3956 (0.5424) data time 0.0009 (0.1247) model time 0.0000 (0.0000) loss 9.0018 (7.9244) grad_norm 2.8838 (2.1175) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:46:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][20/625] eta 0:04:47 lr 0.001007 wd 0.0500 time 0.3988 (0.4747) data time 0.0006 (0.0658) model time 0.0000 (0.0000) loss 8.1265 (7.9582) grad_norm 1.7920 (1.9880) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:46:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][30/625] eta 0:04:30 lr 0.001007 wd 0.0500 time 0.4063 (0.4547) data time 0.0009 (0.0448) model time 0.0000 (0.0000) loss 7.1235 (7.8809) grad_norm 2.8373 (2.2192) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:46:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][40/625] eta 0:04:18 lr 0.001006 wd 0.0500 time 0.3976 (0.4414) data time 0.0008 (0.0341) model time 0.0000 (0.0000) loss 7.9190 (7.8942) grad_norm 2.4513 (2.3000) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:46:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][50/625] eta 0:04:09 lr 0.001006 wd 0.0500 time 0.3983 (0.4332) data time 0.0007 (0.0276) model time 0.0000 (0.0000) loss 6.4215 (7.8833) grad_norm 1.5422 (2.1711) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:46:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][60/625] eta 0:04:01 lr 0.001006 wd 0.0500 time 0.4001 
(0.4279) data time 0.0007 (0.0232) model time 0.3994 (0.3994) loss 7.9121 (7.8217) grad_norm 2.3548 (2.0963) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:47:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][70/625] eta 0:03:55 lr 0.001006 wd 0.0500 time 0.3978 (0.4240) data time 0.0009 (0.0201) model time 0.3969 (0.3995) loss 7.9812 (7.7451) grad_norm 1.4599 (2.0469) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:47:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][80/625] eta 0:03:49 lr 0.001006 wd 0.0500 time 0.3978 (0.4210) data time 0.0007 (0.0177) model time 0.3971 (0.3991) loss 8.7762 (7.7733) grad_norm 1.6102 (2.0406) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:47:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][90/625] eta 0:03:44 lr 0.001006 wd 0.0500 time 0.4033 (0.4188) data time 0.0008 (0.0159) model time 0.4025 (0.3993) loss 7.6511 (7.7822) grad_norm 2.1347 (2.0344) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:47:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][100/625] eta 0:03:38 lr 0.001006 wd 0.0500 time 0.3959 (0.4168) data time 0.0006 (0.0144) model time 0.3953 (0.3992) loss 6.8996 (7.8035) grad_norm 2.2258 (2.0268) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:47:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][110/625] eta 0:03:33 lr 0.001006 wd 0.0500 time 0.3986 (0.4155) data time 0.0009 (0.0132) model time 0.3978 (0.3994) loss 6.7660 (7.7697) grad_norm 2.2952 (2.0229) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:47:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][120/625] eta 0:03:29 lr 0.001006 wd 0.0500 time 0.4001 (0.4144) data time 0.0008 (0.0122) model time 0.3994 (0.3997) loss 7.0998 (7.7836) grad_norm 1.5942 (2.0202) loss_scale 4096.0000 (4096.0000) mem 14939MB 
[2024-07-24 22:47:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][130/625] eta 0:03:24 lr 0.001006 wd 0.0500 time 0.3979 (0.4133) data time 0.0007 (0.0113) model time 0.3972 (0.3997) loss 6.7344 (7.7722) grad_norm 1.6172 (1.9996) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:47:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][140/625] eta 0:03:20 lr 0.001006 wd 0.0500 time 0.3999 (0.4124) data time 0.0009 (0.0106) model time 0.3990 (0.3997) loss 8.5018 (7.7838) grad_norm 1.9790 (2.0033) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:47:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][150/625] eta 0:03:18 lr 0.001006 wd 0.0500 time 0.6018 (0.4174) data time 0.0008 (0.0099) model time 0.6009 (0.4083) loss 8.2804 (7.8103) grad_norm 1.7866 (1.9880) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:47:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][160/625] eta 0:03:13 lr 0.001005 wd 0.0500 time 0.4015 (0.4163) data time 0.0008 (0.0094) model time 0.4007 (0.4075) loss 7.6890 (7.8228) grad_norm 1.7095 (1.9917) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:47:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][170/625] eta 0:03:08 lr 0.001005 wd 0.0500 time 0.3982 (0.4153) data time 0.0009 (0.0089) model time 0.3973 (0.4068) loss 7.9585 (7.8034) grad_norm 1.5492 (1.9823) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:47:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][180/625] eta 0:03:04 lr 0.001005 wd 0.0500 time 0.4005 (0.4145) data time 0.0009 (0.0084) model time 0.3996 (0.4062) loss 7.5187 (7.8155) grad_norm 1.5298 (1.9818) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:47:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][190/625] eta 0:02:59 lr 0.001005 wd 0.0500 time 
0.3966 (0.4137) data time 0.0009 (0.0080) model time 0.3957 (0.4057) loss 8.8530 (7.8008) grad_norm 1.5542 (1.9618) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:47:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][200/625] eta 0:02:55 lr 0.001005 wd 0.0500 time 0.3985 (0.4130) data time 0.0007 (0.0077) model time 0.3978 (0.4052) loss 8.1543 (7.8112) grad_norm 1.5352 (1.9462) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:47:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][210/625] eta 0:02:51 lr 0.001005 wd 0.0500 time 0.3999 (0.4124) data time 0.0010 (0.0074) model time 0.3990 (0.4048) loss 7.0942 (7.7824) grad_norm 2.1815 (1.9369) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:48:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][220/625] eta 0:02:46 lr 0.001005 wd 0.0500 time 0.3978 (0.4119) data time 0.0009 (0.0071) model time 0.3969 (0.4045) loss 8.1026 (7.7885) grad_norm 2.1233 (1.9313) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:48:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][230/625] eta 0:02:42 lr 0.001005 wd 0.0500 time 0.4000 (0.4115) data time 0.0006 (0.0068) model time 0.3994 (0.4044) loss 6.0865 (7.7668) grad_norm 1.8290 (1.9245) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:48:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][240/625] eta 0:02:38 lr 0.001005 wd 0.0500 time 0.4039 (0.4110) data time 0.0008 (0.0066) model time 0.4031 (0.4041) loss 6.2684 (7.7541) grad_norm 2.6371 (1.9217) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:48:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][250/625] eta 0:02:34 lr 0.001005 wd 0.0500 time 0.3997 (0.4114) data time 0.0009 (0.0063) model time 0.3988 (0.4049) loss 8.9406 (7.7650) grad_norm 1.9251 (1.9166) loss_scale 4096.0000 (4096.0000) 
mem 14939MB [2024-07-24 22:48:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][260/625] eta 0:02:30 lr 0.001005 wd 0.0500 time 0.3979 (0.4110) data time 0.0007 (0.0061) model time 0.3973 (0.4047) loss 7.5707 (7.7605) grad_norm 2.0299 (1.9308) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:48:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][270/625] eta 0:02:25 lr 0.001005 wd 0.0500 time 0.3991 (0.4106) data time 0.0008 (0.0059) model time 0.3983 (0.4044) loss 8.0147 (7.7559) grad_norm 1.7980 (1.9328) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:48:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][280/625] eta 0:02:21 lr 0.001005 wd 0.0500 time 0.4002 (0.4102) data time 0.0006 (0.0058) model time 0.3996 (0.4041) loss 7.0580 (7.7514) grad_norm 2.8486 (1.9372) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:48:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][290/625] eta 0:02:17 lr 0.001004 wd 0.0500 time 0.4018 (0.4098) data time 0.0007 (0.0056) model time 0.4011 (0.4039) loss 7.7159 (7.7459) grad_norm 2.2296 (1.9402) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:48:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][300/625] eta 0:02:13 lr 0.001004 wd 0.0500 time 0.4106 (0.4095) data time 0.0006 (0.0054) model time 0.4099 (0.4038) loss 9.2455 (7.7525) grad_norm 2.3194 (1.9471) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:48:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][310/625] eta 0:02:08 lr 0.001004 wd 0.0500 time 0.4018 (0.4092) data time 0.0008 (0.0053) model time 0.4011 (0.4036) loss 8.5400 (7.7448) grad_norm 2.9553 (1.9698) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:48:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][320/625] eta 0:02:04 lr 0.001004 wd 
0.0500 time 0.4026 (0.4090) data time 0.0006 (0.0052) model time 0.4019 (0.4035) loss 7.3923 (7.7293) grad_norm 1.7040 (1.9722) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:48:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][330/625] eta 0:02:00 lr 0.001004 wd 0.0500 time 0.3977 (0.4087) data time 0.0007 (0.0050) model time 0.3970 (0.4033) loss 7.7422 (7.7133) grad_norm 1.4430 (1.9622) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:48:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][340/625] eta 0:01:56 lr 0.001004 wd 0.0500 time 0.4001 (0.4085) data time 0.0008 (0.0049) model time 0.3993 (0.4032) loss 8.7985 (7.7226) grad_norm 1.4386 (1.9508) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:48:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][350/625] eta 0:01:52 lr 0.001004 wd 0.0500 time 0.4053 (0.4083) data time 0.0008 (0.0048) model time 0.4045 (0.4031) loss 8.0799 (7.7269) grad_norm 1.6370 (1.9515) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:48:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][360/625] eta 0:01:48 lr 0.001004 wd 0.0500 time 0.6379 (0.4087) data time 0.0009 (0.0047) model time 0.6370 (0.4037) loss 8.4851 (7.7173) grad_norm 2.5514 (1.9585) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:49:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][370/625] eta 0:01:44 lr 0.001004 wd 0.0500 time 0.5975 (0.4105) data time 0.0009 (0.0046) model time 0.5965 (0.4060) loss 8.1585 (7.7132) grad_norm 2.2060 (1.9671) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:49:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][380/625] eta 0:01:40 lr 0.001004 wd 0.0500 time 0.4006 (0.4102) data time 0.0008 (0.0045) model time 0.3998 (0.4058) loss 7.4816 (7.7122) grad_norm 3.0391 (1.9735) loss_scale 4096.0000 
(4096.0000) mem 14939MB [2024-07-24 22:49:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][390/625] eta 0:01:36 lr 0.001004 wd 0.0500 time 0.3976 (0.4100) data time 0.0007 (0.0044) model time 0.3969 (0.4056) loss 9.3305 (7.7112) grad_norm 2.3465 (1.9747) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:49:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][400/625] eta 0:01:32 lr 0.001004 wd 0.0500 time 0.4056 (0.4097) data time 0.0008 (0.0043) model time 0.4048 (0.4054) loss 8.1214 (7.7192) grad_norm 1.7191 (1.9740) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:49:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][410/625] eta 0:01:28 lr 0.001004 wd 0.0500 time 0.4015 (0.4095) data time 0.0007 (0.0042) model time 0.4008 (0.4052) loss 8.4662 (7.7129) grad_norm 2.8739 (1.9739) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:49:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][420/625] eta 0:01:23 lr 0.001003 wd 0.0500 time 0.4018 (0.4093) data time 0.0007 (0.0042) model time 0.4011 (0.4051) loss 6.9626 (7.7079) grad_norm 1.6442 (1.9748) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:49:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][430/625] eta 0:01:19 lr 0.001003 wd 0.0500 time 0.3965 (0.4091) data time 0.0010 (0.0041) model time 0.3955 (0.4050) loss 8.3491 (7.7064) grad_norm 1.5632 (1.9751) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:49:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][440/625] eta 0:01:15 lr 0.001003 wd 0.0500 time 0.4011 (0.4090) data time 0.0006 (0.0040) model time 0.4005 (0.4048) loss 7.8887 (7.7146) grad_norm 1.3426 (1.9689) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:49:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][450/625] eta 0:01:11 lr 
0.001003 wd 0.0500 time 0.3976 (0.4088) data time 0.0009 (0.0039) model time 0.3967 (0.4047) loss 7.5858 (7.7188) grad_norm 1.5934 (1.9790) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:49:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][460/625] eta 0:01:07 lr 0.001003 wd 0.0500 time 0.3955 (0.4085) data time 0.0007 (0.0039) model time 0.3949 (0.4045) loss 7.3245 (7.7263) grad_norm 2.3770 (1.9843) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:49:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][470/625] eta 0:01:03 lr 0.001003 wd 0.0500 time 0.4059 (0.4088) data time 0.0008 (0.0038) model time 0.4051 (0.4049) loss 8.7180 (7.7305) grad_norm 1.6915 (1.9771) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:49:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][480/625] eta 0:00:59 lr 0.001003 wd 0.0500 time 0.3979 (0.4086) data time 0.0009 (0.0038) model time 0.3970 (0.4047) loss 7.2105 (7.7310) grad_norm 1.4515 (1.9772) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:49:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][490/625] eta 0:00:55 lr 0.001003 wd 0.0500 time 0.4012 (0.4084) data time 0.0007 (0.0037) model time 0.4004 (0.4046) loss 7.0381 (7.7290) grad_norm 2.2544 (1.9792) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:49:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][500/625] eta 0:00:51 lr 0.001003 wd 0.0500 time 0.3975 (0.4082) data time 0.0010 (0.0036) model time 0.3966 (0.4044) loss 6.7955 (7.7242) grad_norm 1.6285 (1.9753) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:49:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][510/625] eta 0:00:46 lr 0.001003 wd 0.0500 time 0.4003 (0.4080) data time 0.0007 (0.0036) model time 0.3996 (0.4043) loss 7.7781 (7.7236) grad_norm 1.5736 (1.9722) loss_scale 
4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:50:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][520/625] eta 0:00:42 lr 0.001003 wd 0.0500 time 0.3979 (0.4078) data time 0.0007 (0.0035) model time 0.3973 (0.4041) loss 7.4299 (7.7249) grad_norm 2.2213 (1.9728) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:50:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][530/625] eta 0:00:38 lr 0.001003 wd 0.0500 time 0.4099 (0.4077) data time 0.0009 (0.0035) model time 0.4090 (0.4041) loss 6.9398 (7.7217) grad_norm 2.2294 (1.9719) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:50:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][540/625] eta 0:00:34 lr 0.001002 wd 0.0500 time 0.4012 (0.4076) data time 0.0006 (0.0034) model time 0.4006 (0.4040) loss 8.7836 (7.7239) grad_norm 2.1937 (1.9786) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:50:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][550/625] eta 0:00:30 lr 0.001002 wd 0.0500 time 0.4000 (0.4074) data time 0.0009 (0.0034) model time 0.3992 (0.4038) loss 7.4617 (7.7180) grad_norm 1.8302 (1.9747) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:50:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][560/625] eta 0:00:26 lr 0.001002 wd 0.0500 time 0.3975 (0.4073) data time 0.0010 (0.0034) model time 0.3965 (0.4037) loss 8.5768 (7.7208) grad_norm 2.3617 (1.9714) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:50:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][570/625] eta 0:00:22 lr 0.001002 wd 0.0500 time 0.3968 (0.4071) data time 0.0008 (0.0033) model time 0.3960 (0.4036) loss 8.4146 (7.7228) grad_norm 1.8444 (1.9815) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:50:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][580/625] eta 
0:00:18 lr 0.001002 wd 0.0500 time 0.4074 (0.4070) data time 0.0008 (0.0033) model time 0.4066 (0.4035) loss 7.3029 (7.7147) grad_norm 1.3696 (1.9752) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:50:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][590/625] eta 0:00:14 lr 0.001002 wd 0.0500 time 0.3979 (0.4084) data time 0.0009 (0.0032) model time 0.3970 (0.4052) loss 8.8434 (7.7217) grad_norm 1.2679 (1.9732) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:50:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][600/625] eta 0:00:10 lr 0.001002 wd 0.0500 time 0.3989 (0.4083) data time 0.0009 (0.0032) model time 0.3980 (0.4051) loss 7.9567 (7.7265) grad_norm 1.4837 (1.9703) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:50:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][610/625] eta 0:00:06 lr 0.001002 wd 0.0500 time 0.3984 (0.4082) data time 0.0006 (0.0032) model time 0.3978 (0.4050) loss 6.4312 (7.7215) grad_norm 2.0552 (1.9688) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:50:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][620/625] eta 0:00:02 lr 0.001002 wd 0.0500 time 0.3972 (0.4080) data time 0.0004 (0.0031) model time 0.3968 (0.4048) loss 9.4817 (7.7273) grad_norm 2.3265 (1.9716) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:50:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 94 training takes 0:04:14 [2024-07-24 22:50:45 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 22:50:46 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 22:50:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.444 (0.444) Loss 0.6187 (0.6187) Acc@1 87.061 (87.061) Acc@5 97.900 (97.900) Mem 14939MB [2024-07-24 22:50:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 1.0352 (0.7658) Acc@1 76.611 (83.545) Acc@5 94.287 (96.933) Mem 14939MB [2024-07-24 22:50:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.103) Loss 1.1445 (0.9178) Acc@1 72.852 (79.822) Acc@5 93.311 (95.240) Mem 14939MB [2024-07-24 22:50:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.549 Acc@5 95.232 [2024-07-24 22:50:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.5% [2024-07-24 22:50:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.735 (0.735) Loss 0.6226 (0.6226) Acc@1 87.451 (87.451) Acc@5 98.096 (98.096) Mem 14939MB [2024-07-24 22:50:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.157) Loss 1.0049 (0.7665) Acc@1 78.027 (84.193) Acc@5 94.629 (97.017) Mem 14939MB [2024-07-24 22:50:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 1.1523 (0.9099) Acc@1 72.705 (80.415) Acc@5 93.506 (95.433) Mem 14939MB [2024-07-24 22:50:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.072 Acc@5 95.405 [2024-07-24 22:50:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.1% [2024-07-24 22:50:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.07% [2024-07-24 22:50:51 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-24 22:50:52 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 22:50:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][0/625] eta 0:10:10 lr 0.001002 wd 0.0500 time 0.9761 (0.9761) data time 0.3728 (0.3728) model time 0.0000 (0.0000) loss 6.4297 (6.4297) grad_norm 1.8502 (1.8502) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:50:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][10/625] eta 0:04:38 lr 0.001002 wd 0.0500 time 0.4149 (0.4524) data time 0.0010 (0.0349) model time 0.0000 (0.0000) loss 7.8344 (7.5151) grad_norm 1.5224 (2.2042) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:51:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][20/625] eta 0:04:18 lr 0.001002 wd 0.0500 time 0.4054 (0.4277) data time 0.0008 (0.0188) model time 0.0000 (0.0000) loss 6.5887 (7.5985) grad_norm 2.1704 (2.3252) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:51:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][30/625] eta 0:04:09 lr 0.001002 wd 0.0500 time 0.4007 (0.4189) data time 0.0009 (0.0131) model time 0.0000 (0.0000) loss 8.1483 (7.6942) grad_norm 2.3738 (2.3018) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:51:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][40/625] eta 0:04:02 lr 0.001001 wd 0.0500 time 0.4013 (0.4141) data time 0.0007 (0.0101) model time 0.0000 (0.0000) loss 9.3137 (7.7006) grad_norm 1.9331 (2.1606) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:51:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][50/625] eta 0:03:56 lr 0.001001 wd 0.0500 time 0.4049 (0.4114) data time 0.0008 (0.0083) model time 0.0000 (0.0000) loss 7.2926 (7.7886) grad_norm 1.2548 (2.0569) loss_scale 4096.0000 (4096.0000) mem 
14939MB [2024-07-24 22:51:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][60/625] eta 0:03:51 lr 0.001001 wd 0.0500 time 0.4030 (0.4096) data time 0.0007 (0.0071) model time 0.4024 (0.3993) loss 8.0611 (7.7692) grad_norm 1.6182 (1.9803) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:51:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][70/625] eta 0:03:46 lr 0.001001 wd 0.0500 time 0.4055 (0.4082) data time 0.0007 (0.0062) model time 0.4048 (0.3990) loss 8.3712 (7.7722) grad_norm 1.3527 (2.0715) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:51:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][80/625] eta 0:03:41 lr 0.001001 wd 0.0500 time 0.3960 (0.4071) data time 0.0006 (0.0056) model time 0.3954 (0.3989) loss 6.6870 (7.7696) grad_norm 1.9889 (2.1340) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:51:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][90/625] eta 0:03:37 lr 0.001001 wd 0.0500 time 0.4163 (0.4064) data time 0.0008 (0.0051) model time 0.4155 (0.3990) loss 8.7685 (7.7764) grad_norm 1.5793 (2.1108) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:51:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][100/625] eta 0:03:33 lr 0.001001 wd 0.0500 time 0.3936 (0.4058) data time 0.0007 (0.0046) model time 0.3929 (0.3992) loss 6.5140 (7.7739) grad_norm 1.9119 (2.1343) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:51:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][110/625] eta 0:03:28 lr 0.001001 wd 0.0500 time 0.4020 (0.4055) data time 0.0009 (0.0043) model time 0.4011 (0.3996) loss 8.6225 (7.7768) grad_norm 2.1226 (2.1556) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:51:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][120/625] eta 0:03:24 lr 0.001001 wd 0.0500 time 
0.3969 (0.4049) data time 0.0006 (0.0040) model time 0.3962 (0.3993) loss 7.8130 (7.7952) grad_norm 1.9151 (2.1300) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:51:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][130/625] eta 0:03:20 lr 0.001001 wd 0.0500 time 0.3984 (0.4044) data time 0.0006 (0.0038) model time 0.3978 (0.3991) loss 7.2941 (7.7790) grad_norm 1.6056 (2.1114) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:51:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][140/625] eta 0:03:15 lr 0.001001 wd 0.0500 time 0.3961 (0.4040) data time 0.0009 (0.0036) model time 0.3952 (0.3989) loss 8.0150 (7.7813) grad_norm 1.8582 (2.1097) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:51:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][150/625] eta 0:03:11 lr 0.001001 wd 0.0500 time 0.3969 (0.4037) data time 0.0008 (0.0034) model time 0.3961 (0.3989) loss 8.0743 (7.7516) grad_norm 2.0658 (2.0885) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:51:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][160/625] eta 0:03:07 lr 0.001001 wd 0.0500 time 0.4054 (0.4035) data time 0.0007 (0.0033) model time 0.4047 (0.3990) loss 9.0440 (7.7399) grad_norm 1.8146 (2.0715) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:52:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][170/625] eta 0:03:03 lr 0.001000 wd 0.0500 time 0.3966 (0.4032) data time 0.0009 (0.0031) model time 0.3957 (0.3988) loss 8.3687 (7.7226) grad_norm 1.5903 (2.0457) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:52:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][180/625] eta 0:03:00 lr 0.001000 wd 0.0500 time 0.5867 (0.4062) data time 0.0008 (0.0030) model time 0.5859 (0.4033) loss 7.7346 (7.7459) grad_norm 1.7279 (2.0419) loss_scale 4096.0000 (4096.0000) 
mem 14939MB [2024-07-24 22:52:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][190/625] eta 0:02:57 lr 0.001000 wd 0.0500 time 0.4018 (0.4082) data time 0.0007 (0.0029) model time 0.4012 (0.4060) loss 6.4580 (7.7281) grad_norm 2.0822 (2.0271) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:52:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][200/625] eta 0:02:53 lr 0.001000 wd 0.0500 time 0.4024 (0.4078) data time 0.0007 (0.0028) model time 0.4017 (0.4057) loss 7.8369 (7.7182) grad_norm 1.9369 (2.0217) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:52:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][210/625] eta 0:02:49 lr 0.001000 wd 0.0500 time 0.4026 (0.4076) data time 0.0007 (0.0027) model time 0.4019 (0.4054) loss 6.9617 (7.7225) grad_norm 4.5211 (2.0583) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:52:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][220/625] eta 0:02:44 lr 0.001000 wd 0.0500 time 0.4028 (0.4072) data time 0.0008 (0.0026) model time 0.4020 (0.4051) loss 8.2975 (7.7108) grad_norm 1.6352 (2.0790) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:52:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][230/625] eta 0:02:40 lr 0.001000 wd 0.0500 time 0.3981 (0.4075) data time 0.0010 (0.0026) model time 0.3971 (0.4054) loss 8.6622 (7.7215) grad_norm 1.8240 (2.0665) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:52:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][240/625] eta 0:02:36 lr 0.001000 wd 0.0500 time 0.3969 (0.4071) data time 0.0008 (0.0025) model time 0.3961 (0.4051) loss 8.7651 (7.7242) grad_norm 1.5238 (2.0653) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:52:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][250/625] eta 0:02:32 lr 0.001000 wd 
0.0500 time 0.4005 (0.4069) data time 0.0008 (0.0024) model time 0.3996 (0.4048) loss 9.2292 (7.7317) grad_norm 5.1672 (2.0802) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:52:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][260/625] eta 0:02:28 lr 0.001000 wd 0.0500 time 0.3971 (0.4068) data time 0.0007 (0.0024) model time 0.3965 (0.4048) loss 8.4188 (7.7236) grad_norm 2.1135 (2.0783) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:52:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][270/625] eta 0:02:24 lr 0.001000 wd 0.0500 time 0.4009 (0.4066) data time 0.0008 (0.0023) model time 0.4001 (0.4045) loss 8.7066 (7.7289) grad_norm 1.5210 (2.0744) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:52:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][280/625] eta 0:02:20 lr 0.001000 wd 0.0500 time 0.3999 (0.4064) data time 0.0006 (0.0023) model time 0.3992 (0.4043) loss 9.0599 (7.7333) grad_norm 1.5115 (2.1059) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:52:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][290/625] eta 0:02:16 lr 0.000999 wd 0.0500 time 0.4020 (0.4061) data time 0.0007 (0.0022) model time 0.4014 (0.4041) loss 9.2607 (7.7390) grad_norm 2.3293 (2.1069) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:52:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][300/625] eta 0:02:11 lr 0.000999 wd 0.0500 time 0.3998 (0.4061) data time 0.0010 (0.0022) model time 0.3988 (0.4041) loss 7.0997 (7.7544) grad_norm 1.4553 (2.1053) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:52:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][310/625] eta 0:02:07 lr 0.000999 wd 0.0500 time 0.4029 (0.4059) data time 0.0007 (0.0021) model time 0.4023 (0.4039) loss 7.4850 (7.7492) grad_norm 1.4617 (2.0984) loss_scale 4096.0000 
(4096.0000) mem 14939MB [2024-07-24 22:53:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][320/625] eta 0:02:03 lr 0.000999 wd 0.0500 time 0.4011 (0.4058) data time 0.0009 (0.0021) model time 0.4001 (0.4038) loss 7.5084 (7.7404) grad_norm 2.1111 (2.0868) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:53:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][330/625] eta 0:01:59 lr 0.000999 wd 0.0500 time 0.3998 (0.4056) data time 0.0007 (0.0021) model time 0.3991 (0.4036) loss 6.5174 (7.7397) grad_norm 1.6090 (2.0756) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:53:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][340/625] eta 0:01:55 lr 0.000999 wd 0.0500 time 0.4038 (0.4055) data time 0.0007 (0.0020) model time 0.4030 (0.4035) loss 8.2788 (7.7349) grad_norm 1.7409 (2.0710) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:53:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][350/625] eta 0:01:51 lr 0.000999 wd 0.0500 time 0.4024 (0.4053) data time 0.0008 (0.0020) model time 0.4016 (0.4034) loss 8.4149 (7.7269) grad_norm 2.4742 (2.0705) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:53:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][360/625] eta 0:01:47 lr 0.000999 wd 0.0500 time 0.4014 (0.4052) data time 0.0009 (0.0020) model time 0.4005 (0.4033) loss 7.0952 (7.7328) grad_norm 2.5972 (2.0771) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:53:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][370/625] eta 0:01:43 lr 0.000999 wd 0.0500 time 0.4196 (0.4052) data time 0.0008 (0.0019) model time 0.4188 (0.4033) loss 7.8216 (7.7278) grad_norm 2.0238 (2.0874) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:53:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][380/625] eta 0:01:39 lr 
0.000999 wd 0.0500 time 0.3982 (0.4051) data time 0.0009 (0.0019) model time 0.3973 (0.4032) loss 7.5900 (7.7175) grad_norm 1.7279 (2.0914) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:53:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][390/625] eta 0:01:35 lr 0.000999 wd 0.0500 time 0.3988 (0.4050) data time 0.0006 (0.0019) model time 0.3981 (0.4031) loss 7.4980 (7.7142) grad_norm 1.8651 (2.0871) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:53:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][400/625] eta 0:01:31 lr 0.000999 wd 0.0500 time 0.3976 (0.4062) data time 0.0009 (0.0019) model time 0.3967 (0.4045) loss 6.6879 (7.7158) grad_norm 1.4914 (2.0839) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:53:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][410/625] eta 0:01:27 lr 0.000999 wd 0.0500 time 0.4056 (0.4073) data time 0.0009 (0.0018) model time 0.4046 (0.4058) loss 8.2032 (7.7167) grad_norm 2.1018 (2.0817) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:53:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][420/625] eta 0:01:23 lr 0.000998 wd 0.0500 time 0.4108 (0.4072) data time 0.0008 (0.0018) model time 0.4100 (0.4057) loss 7.9849 (7.7195) grad_norm 2.9057 (2.0862) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:53:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][430/625] eta 0:01:19 lr 0.000998 wd 0.0500 time 0.4067 (0.4071) data time 0.0007 (0.0018) model time 0.4060 (0.4056) loss 7.4555 (7.7183) grad_norm 2.3431 (2.0968) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:53:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][440/625] eta 0:01:15 lr 0.000998 wd 0.0500 time 0.4010 (0.4069) data time 0.0007 (0.0018) model time 0.4003 (0.4054) loss 8.9986 (7.7199) grad_norm 1.9822 (2.0990) loss_scale 
4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:53:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][450/625] eta 0:01:11 lr 0.000998 wd 0.0500 time 0.3972 (0.4071) data time 0.0009 (0.0018) model time 0.3963 (0.4056) loss 8.2072 (7.7155) grad_norm 2.2474 (2.0949) loss_scale 8192.0000 (4168.6563) mem 14939MB [2024-07-24 22:54:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][460/625] eta 0:01:07 lr 0.000998 wd 0.0500 time 0.4295 (0.4070) data time 0.0006 (0.0017) model time 0.4289 (0.4055) loss 9.2149 (7.7208) grad_norm 1.8607 (inf) loss_scale 4096.0000 (4184.8503) mem 14939MB [2024-07-24 22:54:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][470/625] eta 0:01:03 lr 0.000998 wd 0.0500 time 0.3998 (0.4068) data time 0.0008 (0.0017) model time 0.3990 (0.4053) loss 7.6543 (7.7272) grad_norm 2.1381 (inf) loss_scale 4096.0000 (4182.9639) mem 14939MB [2024-07-24 22:54:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][480/625] eta 0:00:58 lr 0.000998 wd 0.0500 time 0.4036 (0.4067) data time 0.0009 (0.0017) model time 0.4027 (0.4052) loss 6.7085 (7.7205) grad_norm 1.9894 (inf) loss_scale 4096.0000 (4181.1559) mem 14939MB [2024-07-24 22:54:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][490/625] eta 0:00:54 lr 0.000998 wd 0.0500 time 0.4010 (0.4065) data time 0.0007 (0.0017) model time 0.4003 (0.4050) loss 8.0545 (7.7242) grad_norm 2.2176 (inf) loss_scale 4096.0000 (4179.4216) mem 14939MB [2024-07-24 22:54:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][500/625] eta 0:00:50 lr 0.000998 wd 0.0500 time 0.3977 (0.4065) data time 0.0007 (0.0017) model time 0.3970 (0.4050) loss 6.7595 (7.7195) grad_norm 2.4190 (inf) loss_scale 4096.0000 (4177.7565) mem 14939MB [2024-07-24 22:54:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][510/625] eta 0:00:46 lr 0.000998 
wd 0.0500 time 0.4018 (0.4064) data time 0.0007 (0.0017) model time 0.4011 (0.4049) loss 7.5319 (7.7223) grad_norm 2.1933 (inf) loss_scale 4096.0000 (4176.1566) mem 14939MB [2024-07-24 22:54:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][520/625] eta 0:00:42 lr 0.000998 wd 0.0500 time 0.4067 (0.4076) data time 0.0006 (0.0017) model time 0.4061 (0.4063) loss 8.0844 (7.7306) grad_norm 2.3864 (inf) loss_scale 4096.0000 (4174.6180) mem 14939MB [2024-07-24 22:54:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][530/625] eta 0:00:38 lr 0.000998 wd 0.0500 time 0.3950 (0.4075) data time 0.0009 (0.0016) model time 0.3940 (0.4062) loss 9.0615 (7.7355) grad_norm 3.2691 (inf) loss_scale 4096.0000 (4173.1375) mem 14939MB [2024-07-24 22:54:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][540/625] eta 0:00:34 lr 0.000997 wd 0.0500 time 0.3932 (0.4075) data time 0.0007 (0.0016) model time 0.3925 (0.4061) loss 8.6182 (7.7411) grad_norm 1.8735 (inf) loss_scale 4096.0000 (4171.7116) mem 14939MB [2024-07-24 22:54:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][550/625] eta 0:00:30 lr 0.000997 wd 0.0500 time 0.4033 (0.4074) data time 0.0006 (0.0016) model time 0.4027 (0.4060) loss 6.6901 (7.7326) grad_norm 1.9346 (inf) loss_scale 4096.0000 (4170.3376) mem 14939MB [2024-07-24 22:54:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][560/625] eta 0:00:26 lr 0.000997 wd 0.0500 time 0.4103 (0.4073) data time 0.0006 (0.0016) model time 0.4097 (0.4059) loss 7.9412 (7.7334) grad_norm 1.4484 (inf) loss_scale 4096.0000 (4169.0125) mem 14939MB [2024-07-24 22:54:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][570/625] eta 0:00:22 lr 0.000997 wd 0.0500 time 0.4038 (0.4072) data time 0.0009 (0.0016) model time 0.4030 (0.4059) loss 8.4596 (7.7314) grad_norm 1.6358 (inf) loss_scale 4096.0000 (4167.7338) mem 
14939MB [2024-07-24 22:54:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][580/625] eta 0:00:18 lr 0.000997 wd 0.0500 time 0.4036 (0.4071) data time 0.0008 (0.0016) model time 0.4028 (0.4058) loss 8.7365 (7.7321) grad_norm 2.1802 (inf) loss_scale 4096.0000 (4166.4991) mem 14939MB [2024-07-24 22:54:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][590/625] eta 0:00:14 lr 0.000997 wd 0.0500 time 0.4022 (0.4071) data time 0.0006 (0.0016) model time 0.4016 (0.4057) loss 7.0257 (7.7285) grad_norm 2.0224 (inf) loss_scale 4096.0000 (4165.3063) mem 14939MB [2024-07-24 22:54:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][600/625] eta 0:00:10 lr 0.000997 wd 0.0500 time 0.4076 (0.4070) data time 0.0008 (0.0016) model time 0.4068 (0.4056) loss 6.9730 (7.7289) grad_norm 2.1744 (inf) loss_scale 4096.0000 (4164.1531) mem 14939MB [2024-07-24 22:55:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][610/625] eta 0:00:06 lr 0.000997 wd 0.0500 time 0.4051 (0.4069) data time 0.0006 (0.0016) model time 0.4045 (0.4055) loss 6.1361 (7.7221) grad_norm 2.5344 (inf) loss_scale 4096.0000 (4163.0376) mem 14939MB [2024-07-24 22:55:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][620/625] eta 0:00:02 lr 0.000997 wd 0.0500 time 0.3949 (0.4075) data time 0.0004 (0.0015) model time 0.3945 (0.4062) loss 6.4685 (7.7199) grad_norm 2.0409 (inf) loss_scale 4096.0000 (4161.9581) mem 14939MB [2024-07-24 22:55:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 95 training takes 0:04:14 [2024-07-24 22:55:07 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 22:55:08 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 22:55:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.610 (0.610) Loss 0.6338 (0.6338) Acc@1 87.646 (87.646) Acc@5 97.998 (97.998) Mem 14939MB
[2024-07-24 22:55:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.134) Loss 1.0625 (0.7751) Acc@1 77.197 (83.825) Acc@5 94.043 (96.919) Mem 14939MB
[2024-07-24 22:55:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.111) Loss 1.2070 (0.9278) Acc@1 71.094 (80.008) Acc@5 92.627 (95.129) Mem 14939MB
[2024-07-24 22:55:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.687 Acc@5 95.136
[2024-07-24 22:55:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.7%
[2024-07-24 22:55:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.69%
[2024-07-24 22:55:11 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving......
[2024-07-24 22:55:12 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!!
[2024-07-24 22:55:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.504 (0.504) Loss 0.6201 (0.6201) Acc@1 87.549 (87.549) Acc@5 98.145 (98.145) Mem 14939MB
[2024-07-24 22:55:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.125) Loss 1.0049 (0.7648) Acc@1 78.076 (84.251) Acc@5 94.629 (97.044) Mem 14939MB
[2024-07-24 22:55:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.106) Loss 1.1475 (0.9079) Acc@1 72.852 (80.497) Acc@5 93.652 (95.450) Mem 14939MB
[2024-07-24 22:55:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.156 Acc@5 95.419
[2024-07-24 22:55:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.2%
[2024-07-24 22:55:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.16%
[2024-07-24 22:55:15 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-24 22:55:15 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!!
[2024-07-24 22:55:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][0/625] eta 0:12:34 lr 0.000997 wd 0.0500 time 1.2067 (1.2067) data time 0.8278 (0.8278) model time 0.0000 (0.0000) loss 7.9454 (7.9454) grad_norm 3.1425 (3.1425) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:55:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][10/625] eta 0:05:01 lr 0.000997 wd 0.0500 time 0.3969 (0.4908) data time 0.0007 (0.0762) model time 0.0000 (0.0000) loss 7.6984 (7.7288) grad_norm 1.6341 (2.0757) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:55:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][20/625] eta 0:04:32 lr 0.000997 wd 0.0500 time 0.4158 (0.4501) data time 0.0012 (0.0403) model time 0.0000 (0.0000) loss 6.7177 (7.6465) grad_norm 2.2198 (2.0035) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:55:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][30/625] eta 0:04:18 lr 0.000997 wd 0.0500 time 0.4039 (0.4341) data time 0.0008 (0.0276) model time 0.0000 (0.0000) loss 8.4026 (7.7742) grad_norm 3.1682 (2.0146) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:55:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][40/625] eta 0:04:09 lr 0.000996 wd 0.0500 time 0.4021 (0.4263) data time 0.0008 (0.0211) model time 0.0000 (0.0000) loss 7.5566 (7.8059) grad_norm 2.1969 (2.0503) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:55:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][50/625] eta 0:04:02 lr 0.000996 wd 0.0500 time 0.4031 (0.4217) data time 0.0008 (0.0172) model time 0.0000 (0.0000) loss 8.1997 (7.7723) grad_norm 1.4951 (2.0399) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:55:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][60/625] eta 0:03:56 lr 0.000996 wd 0.0500 time 0.4014 
(0.4187) data time 0.0009 (0.0145) model time 0.4005 (0.4026) loss 6.3743 (7.7653) grad_norm 1.5276 (2.0748) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:55:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][70/625] eta 0:03:51 lr 0.000996 wd 0.0500 time 0.4031 (0.4165) data time 0.0009 (0.0126) model time 0.4022 (0.4022) loss 8.0469 (7.7859) grad_norm 1.7161 (2.0557) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:55:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][80/625] eta 0:03:46 lr 0.000996 wd 0.0500 time 0.4055 (0.4147) data time 0.0006 (0.0112) model time 0.4049 (0.4018) loss 7.6028 (7.8235) grad_norm 2.8219 (2.0698) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:55:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][90/625] eta 0:03:41 lr 0.000996 wd 0.0500 time 0.4061 (0.4133) data time 0.0008 (0.0100) model time 0.4053 (0.4016) loss 7.5745 (7.8143) grad_norm 2.8824 (2.0861) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:55:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][100/625] eta 0:03:36 lr 0.000996 wd 0.0500 time 0.4095 (0.4119) data time 0.0006 (0.0091) model time 0.4088 (0.4011) loss 8.4698 (7.8205) grad_norm 3.1806 (2.0660) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:56:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][110/625] eta 0:03:35 lr 0.000996 wd 0.0500 time 0.4189 (0.4193) data time 0.0007 (0.0084) model time 0.4182 (0.4164) loss 8.2453 (7.7671) grad_norm 2.2476 (2.0567) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:56:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][120/625] eta 0:03:30 lr 0.000996 wd 0.0500 time 0.3983 (0.4178) data time 0.0007 (0.0078) model time 0.3977 (0.4140) loss 7.8549 (7.7758) grad_norm 1.7595 (2.0686) loss_scale 4096.0000 (4096.0000) mem 14939MB 
[2024-07-24 22:56:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][130/625] eta 0:03:26 lr 0.000996 wd 0.0500 time 0.3990 (0.4171) data time 0.0010 (0.0073) model time 0.3980 (0.4132) loss 7.0541 (7.7929) grad_norm 1.7565 (2.0894) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:56:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][140/625] eta 0:03:21 lr 0.000996 wd 0.0500 time 0.4120 (0.4160) data time 0.0008 (0.0068) model time 0.4111 (0.4118) loss 8.0453 (7.7933) grad_norm 1.7992 (2.0880) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:56:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][150/625] eta 0:03:17 lr 0.000996 wd 0.0500 time 0.3984 (0.4150) data time 0.0008 (0.0064) model time 0.3976 (0.4106) loss 7.9476 (7.7996) grad_norm 1.9910 (2.0858) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:56:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][160/625] eta 0:03:12 lr 0.000996 wd 0.0500 time 0.3954 (0.4140) data time 0.0009 (0.0061) model time 0.3945 (0.4095) loss 7.1047 (7.7918) grad_norm 2.1562 (2.0856) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:56:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][170/625] eta 0:03:08 lr 0.000995 wd 0.0500 time 0.3954 (0.4132) data time 0.0010 (0.0058) model time 0.3944 (0.4087) loss 8.7452 (7.7785) grad_norm 2.3999 (2.1258) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:56:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][180/625] eta 0:03:03 lr 0.000995 wd 0.0500 time 0.4028 (0.4126) data time 0.0009 (0.0055) model time 0.4019 (0.4080) loss 8.3169 (7.7851) grad_norm 1.7879 (2.1057) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:56:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][190/625] eta 0:02:59 lr 0.000995 wd 0.0500 time 
0.4069 (0.4130) data time 0.0008 (0.0053) model time 0.4061 (0.4088) loss 6.8075 (7.7908) grad_norm 1.3710 (2.0898) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:56:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][200/625] eta 0:02:55 lr 0.000995 wd 0.0500 time 0.3952 (0.4123) data time 0.0007 (0.0051) model time 0.3945 (0.4081) loss 7.0468 (7.7885) grad_norm 1.3178 (2.0736) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:56:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][210/625] eta 0:02:51 lr 0.000995 wd 0.0500 time 0.3988 (0.4128) data time 0.0009 (0.0049) model time 0.3979 (0.4090) loss 7.5316 (7.8040) grad_norm 2.5400 (2.0575) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:56:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][220/625] eta 0:02:47 lr 0.000995 wd 0.0500 time 0.3966 (0.4144) data time 0.0007 (0.0047) model time 0.3959 (0.4113) loss 9.2844 (7.8097) grad_norm 3.5294 (2.0747) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:56:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][230/625] eta 0:02:43 lr 0.000995 wd 0.0500 time 0.4103 (0.4147) data time 0.0011 (0.0045) model time 0.4092 (0.4118) loss 7.8212 (7.8036) grad_norm 2.0650 (2.0714) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:56:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][240/625] eta 0:02:39 lr 0.000995 wd 0.0500 time 0.3995 (0.4141) data time 0.0009 (0.0044) model time 0.3986 (0.4111) loss 7.2658 (7.7737) grad_norm 2.0573 (2.0632) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:56:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][250/625] eta 0:02:35 lr 0.000995 wd 0.0500 time 0.3994 (0.4135) data time 0.0009 (0.0043) model time 0.3986 (0.4105) loss 7.5416 (7.7688) grad_norm 2.1440 (2.0590) loss_scale 4096.0000 (4096.0000) 
mem 14939MB [2024-07-24 22:57:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][260/625] eta 0:02:30 lr 0.000995 wd 0.0500 time 0.4034 (0.4131) data time 0.0007 (0.0041) model time 0.4027 (0.4101) loss 8.3152 (7.7703) grad_norm 2.0431 (2.0604) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:57:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][270/625] eta 0:02:26 lr 0.000995 wd 0.0500 time 0.4162 (0.4127) data time 0.0008 (0.0040) model time 0.4154 (0.4097) loss 9.2967 (7.7586) grad_norm 2.0081 (2.0654) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:57:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][280/625] eta 0:02:22 lr 0.000995 wd 0.0500 time 0.3975 (0.4123) data time 0.0010 (0.0039) model time 0.3965 (0.4093) loss 7.2920 (7.7558) grad_norm 1.9556 (2.0615) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:57:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][290/625] eta 0:02:17 lr 0.000994 wd 0.0500 time 0.4035 (0.4119) data time 0.0008 (0.0038) model time 0.4027 (0.4089) loss 9.1271 (7.7549) grad_norm 5.3910 (2.0821) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:57:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][300/625] eta 0:02:13 lr 0.000994 wd 0.0500 time 0.4024 (0.4116) data time 0.0008 (0.0037) model time 0.4016 (0.4085) loss 7.9538 (7.7611) grad_norm 1.6526 (2.0806) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:57:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][310/625] eta 0:02:09 lr 0.000994 wd 0.0500 time 0.4062 (0.4112) data time 0.0009 (0.0036) model time 0.4053 (0.4082) loss 8.3273 (7.7552) grad_norm 1.7133 (2.0830) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:57:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][320/625] eta 0:02:05 lr 0.000994 wd 
0.0500 time 0.4030 (0.4109) data time 0.0009 (0.0035) model time 0.4021 (0.4079) loss 6.6123 (7.7463) grad_norm 1.6500 (2.0736) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:57:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][330/625] eta 0:02:01 lr 0.000994 wd 0.0500 time 0.4008 (0.4107) data time 0.0007 (0.0035) model time 0.4001 (0.4077) loss 6.9122 (7.7477) grad_norm 1.6233 (2.0699) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:57:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][340/625] eta 0:01:56 lr 0.000994 wd 0.0500 time 0.3964 (0.4104) data time 0.0007 (0.0034) model time 0.3957 (0.4074) loss 8.1919 (7.7523) grad_norm 1.6752 (2.0736) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:57:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][350/625] eta 0:01:52 lr 0.000994 wd 0.0500 time 0.4019 (0.4101) data time 0.0006 (0.0033) model time 0.4013 (0.4072) loss 7.2157 (7.7317) grad_norm 3.1440 (2.0804) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:57:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][360/625] eta 0:01:48 lr 0.000994 wd 0.0500 time 0.4016 (0.4100) data time 0.0006 (0.0032) model time 0.4010 (0.4071) loss 9.5523 (7.7340) grad_norm 2.3701 (2.0872) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:57:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][370/625] eta 0:01:44 lr 0.000994 wd 0.0500 time 0.4097 (0.4097) data time 0.0010 (0.0032) model time 0.4087 (0.4069) loss 7.7980 (7.7277) grad_norm 3.1020 (2.0966) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:57:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][380/625] eta 0:01:40 lr 0.000994 wd 0.0500 time 0.4007 (0.4095) data time 0.0008 (0.0031) model time 0.4000 (0.4067) loss 8.5267 (7.7335) grad_norm 1.4327 (2.0867) loss_scale 4096.0000 
(4096.0000) mem 14939MB [2024-07-24 22:57:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][390/625] eta 0:01:36 lr 0.000994 wd 0.0500 time 0.4037 (0.4093) data time 0.0006 (0.0031) model time 0.4031 (0.4065) loss 5.7213 (7.7304) grad_norm 2.5024 (2.0788) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:57:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][400/625] eta 0:01:32 lr 0.000994 wd 0.0500 time 0.4022 (0.4091) data time 0.0007 (0.0030) model time 0.4016 (0.4063) loss 6.7667 (7.7311) grad_norm 2.1349 (2.0720) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:58:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][410/625] eta 0:01:27 lr 0.000994 wd 0.0500 time 0.4018 (0.4092) data time 0.0011 (0.0030) model time 0.4008 (0.4065) loss 8.3025 (7.7366) grad_norm 1.5025 (2.0651) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:58:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][420/625] eta 0:01:23 lr 0.000993 wd 0.0500 time 0.4068 (0.4091) data time 0.0009 (0.0029) model time 0.4058 (0.4064) loss 7.1559 (7.7180) grad_norm 3.4244 (2.0717) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:58:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][430/625] eta 0:01:19 lr 0.000993 wd 0.0500 time 0.5249 (0.4091) data time 0.0010 (0.0029) model time 0.5239 (0.4065) loss 7.9197 (7.7185) grad_norm 1.6916 (2.0686) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:58:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][440/625] eta 0:01:15 lr 0.000993 wd 0.0500 time 0.3969 (0.4106) data time 0.0009 (0.0028) model time 0.3960 (0.4082) loss 8.3943 (7.7090) grad_norm 3.1936 (2.0792) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:58:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][450/625] eta 0:01:11 lr 
0.000993 wd 0.0500 time 0.3995 (0.4112) data time 0.0009 (0.0028) model time 0.3986 (0.4089) loss 8.5489 (7.7187) grad_norm 2.3494 (2.0844) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:58:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][460/625] eta 0:01:07 lr 0.000993 wd 0.0500 time 0.4066 (0.4110) data time 0.0006 (0.0027) model time 0.4060 (0.4087) loss 8.3898 (7.7295) grad_norm 2.1050 (2.0835) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:58:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][470/625] eta 0:01:03 lr 0.000993 wd 0.0500 time 0.3955 (0.4108) data time 0.0009 (0.0027) model time 0.3946 (0.4085) loss 7.8241 (7.7275) grad_norm 4.8721 (2.0998) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:58:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][480/625] eta 0:00:59 lr 0.000993 wd 0.0500 time 0.4004 (0.4106) data time 0.0010 (0.0027) model time 0.3994 (0.4083) loss 8.1452 (7.7331) grad_norm 1.6180 (2.1117) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:58:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][490/625] eta 0:00:55 lr 0.000993 wd 0.0500 time 0.4020 (0.4104) data time 0.0006 (0.0026) model time 0.4014 (0.4081) loss 8.0493 (7.7289) grad_norm 2.4688 (2.1125) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:58:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][500/625] eta 0:00:51 lr 0.000993 wd 0.0500 time 0.3945 (0.4102) data time 0.0007 (0.0026) model time 0.3938 (0.4079) loss 8.7198 (7.7380) grad_norm 1.5835 (2.1105) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:58:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][510/625] eta 0:00:47 lr 0.000993 wd 0.0500 time 0.4019 (0.4100) data time 0.0009 (0.0026) model time 0.4010 (0.4078) loss 8.6207 (7.7441) grad_norm 2.1535 (2.1074) loss_scale 
4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:58:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][520/625] eta 0:00:43 lr 0.000993 wd 0.0500 time 0.4051 (0.4099) data time 0.0009 (0.0025) model time 0.4042 (0.4076) loss 7.9249 (7.7579) grad_norm 2.7752 (2.1101) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:58:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][530/625] eta 0:00:38 lr 0.000993 wd 0.0500 time 0.4017 (0.4097) data time 0.0010 (0.0025) model time 0.4008 (0.4075) loss 8.7603 (7.7632) grad_norm 1.9850 (2.1114) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:58:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][540/625] eta 0:00:34 lr 0.000992 wd 0.0500 time 0.4007 (0.4095) data time 0.0009 (0.0025) model time 0.3998 (0.4073) loss 7.6960 (7.7580) grad_norm 1.9108 (2.1184) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:59:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][550/625] eta 0:00:30 lr 0.000992 wd 0.0500 time 0.3986 (0.4093) data time 0.0006 (0.0024) model time 0.3980 (0.4071) loss 7.6196 (7.7573) grad_norm 1.9323 (2.1200) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:59:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][560/625] eta 0:00:26 lr 0.000992 wd 0.0500 time 0.3944 (0.4092) data time 0.0007 (0.0024) model time 0.3937 (0.4070) loss 8.2533 (7.7541) grad_norm 1.4290 (2.1181) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:59:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][570/625] eta 0:00:22 lr 0.000992 wd 0.0500 time 0.3993 (0.4090) data time 0.0008 (0.0024) model time 0.3985 (0.4068) loss 7.7585 (7.7559) grad_norm 2.0980 (2.1221) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:59:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][580/625] eta 
0:00:18 lr 0.000992 wd 0.0500 time 0.4038 (0.4090) data time 0.0008 (0.0024) model time 0.4030 (0.4068) loss 6.7196 (7.7587) grad_norm 2.1791 (2.1298) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:59:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][590/625] eta 0:00:14 lr 0.000992 wd 0.0500 time 0.3989 (0.4089) data time 0.0008 (0.0023) model time 0.3981 (0.4067) loss 6.7561 (7.7601) grad_norm 2.3426 (2.1352) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:59:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][600/625] eta 0:00:10 lr 0.000992 wd 0.0500 time 0.4004 (0.4088) data time 0.0009 (0.0023) model time 0.3995 (0.4066) loss 6.9081 (7.7652) grad_norm 2.3211 (2.1408) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:59:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][610/625] eta 0:00:06 lr 0.000992 wd 0.0500 time 0.3996 (0.4087) data time 0.0004 (0.0023) model time 0.3991 (0.4065) loss 7.1071 (7.7601) grad_norm 1.4426 (2.1429) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:59:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][620/625] eta 0:00:02 lr 0.000992 wd 0.0500 time 0.4007 (0.4085) data time 0.0006 (0.0023) model time 0.4001 (0.4064) loss 8.7008 (7.7616) grad_norm 3.3423 (2.1416) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:59:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 96 training takes 0:04:15 [2024-07-24 22:59:31 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 22:59:32 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 22:59:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.587 (0.587) Loss 0.6440 (0.6440) Acc@1 87.646 (87.646) Acc@5 98.096 (98.096) Mem 14939MB [2024-07-24 22:59:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.132) Loss 1.0508 (0.7998) Acc@1 76.904 (83.594) Acc@5 94.824 (96.937) Mem 14939MB [2024-07-24 22:59:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.110) Loss 1.1768 (0.9574) Acc@1 73.926 (79.962) Acc@5 93.359 (95.157) Mem 14939MB [2024-07-24 22:59:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.595 Acc@5 95.126 [2024-07-24 22:59:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.6% [2024-07-24 22:59:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.033 (1.033) Loss 0.6196 (0.6196) Acc@1 87.695 (87.695) Acc@5 98.145 (98.145) Mem 14939MB [2024-07-24 22:59:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.176) Loss 1.0029 (0.7638) Acc@1 78.271 (84.304) Acc@5 94.629 (97.053) Mem 14939MB [2024-07-24 22:59:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.133) Loss 1.1445 (0.9061) Acc@1 72.852 (80.543) Acc@5 93.604 (95.461) Mem 14939MB [2024-07-24 22:59:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.190 Acc@5 95.427 [2024-07-24 22:59:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.2% [2024-07-24 22:59:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.19% [2024-07-24 22:59:37 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-24 22:59:38 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 22:59:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][0/625] eta 0:10:17 lr 0.000992 wd 0.0500 time 0.9878 (0.9878) data time 0.5921 (0.5921) model time 0.0000 (0.0000) loss 9.2356 (9.2356) grad_norm 1.5076 (1.5076) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:59:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][10/625] eta 0:04:39 lr 0.000992 wd 0.0500 time 0.3960 (0.4542) data time 0.0007 (0.0549) model time 0.0000 (0.0000) loss 7.0887 (7.9749) grad_norm 1.9246 (2.3418) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:59:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][20/625] eta 0:04:19 lr 0.000992 wd 0.0500 time 0.4105 (0.4292) data time 0.0006 (0.0297) model time 0.0000 (0.0000) loss 7.8675 (7.7386) grad_norm 2.3046 (2.2123) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:59:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][30/625] eta 0:04:20 lr 0.000992 wd 0.0500 time 0.4034 (0.4381) data time 0.0008 (0.0204) model time 0.0000 (0.0000) loss 8.9491 (7.7319) grad_norm 3.6600 (2.3643) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 22:59:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][40/625] eta 0:04:26 lr 0.000991 wd 0.0500 time 0.5477 (0.4550) data time 0.0007 (0.0157) model time 0.0000 (0.0000) loss 6.7748 (7.7099) grad_norm 3.6037 (2.4703) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:00:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][50/625] eta 0:04:17 lr 0.000991 wd 0.0500 time 0.3992 (0.4482) data time 0.0006 (0.0128) model time 0.0000 (0.0000) loss 7.8208 (7.6597) grad_norm 2.2480 (2.4726) loss_scale 4096.0000 (4096.0000) mem 
14939MB [2024-07-24 23:00:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][60/625] eta 0:04:08 lr 0.000991 wd 0.0500 time 0.3991 (0.4402) data time 0.0008 (0.0109) model time 0.3982 (0.3985) loss 8.6527 (7.6003) grad_norm 1.9700 (2.3414) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:00:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][70/625] eta 0:04:01 lr 0.000991 wd 0.0500 time 0.3991 (0.4345) data time 0.0010 (0.0095) model time 0.3981 (0.3983) loss 8.7797 (7.6338) grad_norm 1.5128 (2.2953) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:00:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][80/625] eta 0:03:54 lr 0.000991 wd 0.0500 time 0.3942 (0.4303) data time 0.0012 (0.0084) model time 0.3930 (0.3990) loss 6.9014 (7.5575) grad_norm 1.6300 (2.2509) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:00:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][90/625] eta 0:03:48 lr 0.000991 wd 0.0500 time 0.3988 (0.4270) data time 0.0007 (0.0076) model time 0.3981 (0.3989) loss 7.1950 (7.5800) grad_norm 1.6313 (2.2661) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:00:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][100/625] eta 0:03:42 lr 0.000991 wd 0.0500 time 0.4062 (0.4244) data time 0.0007 (0.0069) model time 0.4055 (0.3991) loss 5.7511 (7.5851) grad_norm 1.9195 (2.2418) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:00:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][110/625] eta 0:03:37 lr 0.000991 wd 0.0500 time 0.4040 (0.4223) data time 0.0008 (0.0064) model time 0.4032 (0.3993) loss 8.6533 (7.6043) grad_norm 1.5297 (2.2067) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:00:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][120/625] eta 0:03:32 lr 0.000991 wd 0.0500 time 
0.4008 (0.4205) data time 0.0009 (0.0060) model time 0.4000 (0.3993) loss 7.3258 (7.6212) grad_norm 2.9274 (2.2693) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:00:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][130/625] eta 0:03:27 lr 0.000991 wd 0.0500 time 0.4008 (0.4189) data time 0.0007 (0.0056) model time 0.4001 (0.3992) loss 8.5676 (7.6368) grad_norm 3.2289 (2.2703) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:00:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][140/625] eta 0:03:22 lr 0.000991 wd 0.0500 time 0.4105 (0.4176) data time 0.0009 (0.0052) model time 0.4096 (0.3993) loss 7.1392 (7.6507) grad_norm 1.6917 (2.2320) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:00:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][150/625] eta 0:03:17 lr 0.000991 wd 0.0500 time 0.3958 (0.4163) data time 0.0007 (0.0050) model time 0.3952 (0.3991) loss 8.3864 (7.6740) grad_norm 2.6965 (2.2150) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:00:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][160/625] eta 0:03:13 lr 0.000990 wd 0.0500 time 0.3940 (0.4160) data time 0.0008 (0.0047) model time 0.3932 (0.4001) loss 7.1203 (7.6709) grad_norm 1.4807 (2.2212) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:00:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][170/625] eta 0:03:11 lr 0.000990 wd 0.0500 time 0.4152 (0.4205) data time 0.0007 (0.0045) model time 0.4145 (0.4078) loss 8.8172 (7.6947) grad_norm 1.7805 (2.2098) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:00:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][180/625] eta 0:03:06 lr 0.000990 wd 0.0500 time 0.3996 (0.4193) data time 0.0009 (0.0043) model time 0.3987 (0.4071) loss 7.9496 (7.6816) grad_norm 2.6588 (2.2035) loss_scale 4096.0000 (4096.0000) 
mem 14939MB [2024-07-24 23:00:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][190/625] eta 0:03:01 lr 0.000990 wd 0.0500 time 0.4023 (0.4183) data time 0.0007 (0.0041) model time 0.4017 (0.4065) loss 7.5315 (7.6690) grad_norm 1.4325 (2.1731) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:01:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][200/625] eta 0:02:57 lr 0.000990 wd 0.0500 time 0.3983 (0.4174) data time 0.0009 (0.0040) model time 0.3974 (0.4060) loss 7.3900 (7.6724) grad_norm 1.6332 (2.1713) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:01:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][210/625] eta 0:02:52 lr 0.000990 wd 0.0500 time 0.3966 (0.4167) data time 0.0007 (0.0038) model time 0.3959 (0.4056) loss 8.1155 (7.6952) grad_norm 1.8113 (2.1562) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:01:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][220/625] eta 0:02:48 lr 0.000990 wd 0.0500 time 0.3973 (0.4159) data time 0.0010 (0.0037) model time 0.3962 (0.4052) loss 8.2169 (7.6876) grad_norm 2.1933 (2.1460) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:01:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][230/625] eta 0:02:44 lr 0.000990 wd 0.0500 time 0.3986 (0.4152) data time 0.0009 (0.0036) model time 0.3977 (0.4049) loss 7.5478 (7.6882) grad_norm 1.4186 (2.1408) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:01:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][240/625] eta 0:02:39 lr 0.000990 wd 0.0500 time 0.3973 (0.4146) data time 0.0008 (0.0035) model time 0.3965 (0.4047) loss 8.9493 (7.6790) grad_norm 1.5952 (2.1649) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:01:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][250/625] eta 0:02:35 lr 0.000990 wd 
0.0500 time 0.4055 (0.4153) data time 0.0007 (0.0034) model time 0.4048 (0.4060) loss 7.7077 (7.6714) grad_norm 1.6189 (2.1756) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:01:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][260/625] eta 0:02:32 lr 0.000990 wd 0.0500 time 0.5944 (0.4176) data time 0.0010 (0.0033) model time 0.5934 (0.4092) loss 7.0204 (7.6579) grad_norm 2.4685 (2.1799) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:01:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][270/625] eta 0:02:28 lr 0.000990 wd 0.0500 time 0.4018 (0.4178) data time 0.0006 (0.0032) model time 0.4012 (0.4098) loss 8.4137 (7.6574) grad_norm 2.0478 (2.1695) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:01:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][280/625] eta 0:02:23 lr 0.000989 wd 0.0500 time 0.4031 (0.4171) data time 0.0008 (0.0031) model time 0.4023 (0.4093) loss 8.3555 (7.6527) grad_norm 2.0855 (2.1511) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:01:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][290/625] eta 0:02:19 lr 0.000989 wd 0.0500 time 0.3967 (0.4166) data time 0.0009 (0.0030) model time 0.3958 (0.4089) loss 7.3538 (7.6670) grad_norm 1.7821 (2.1422) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:01:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][300/625] eta 0:02:15 lr 0.000989 wd 0.0500 time 0.4040 (0.4160) data time 0.0009 (0.0030) model time 0.4031 (0.4085) loss 8.4436 (7.6793) grad_norm 1.4665 (2.1308) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:01:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][310/625] eta 0:02:10 lr 0.000989 wd 0.0500 time 0.4138 (0.4155) data time 0.0008 (0.0029) model time 0.4130 (0.4082) loss 8.0656 (7.6930) grad_norm 1.6774 (2.1384) loss_scale 4096.0000 
(4096.0000) mem 14939MB [2024-07-24 23:01:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][320/625] eta 0:02:06 lr 0.000989 wd 0.0500 time 0.4052 (0.4151) data time 0.0007 (0.0028) model time 0.4046 (0.4079) loss 7.1991 (7.6991) grad_norm 2.1766 (2.1582) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:01:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][330/625] eta 0:02:02 lr 0.000989 wd 0.0500 time 0.3953 (0.4147) data time 0.0010 (0.0028) model time 0.3944 (0.4076) loss 6.3114 (7.6764) grad_norm 1.7833 (2.1559) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:02:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][340/625] eta 0:01:58 lr 0.000989 wd 0.0500 time 0.4020 (0.4143) data time 0.0008 (0.0027) model time 0.4013 (0.4074) loss 6.4378 (7.6783) grad_norm 1.4132 (2.1442) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:02:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][350/625] eta 0:01:53 lr 0.000989 wd 0.0500 time 0.4091 (0.4139) data time 0.0006 (0.0027) model time 0.4084 (0.4071) loss 6.4176 (7.6754) grad_norm 1.9198 (2.1347) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:02:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][360/625] eta 0:01:49 lr 0.000989 wd 0.0500 time 0.4022 (0.4137) data time 0.0008 (0.0026) model time 0.4014 (0.4071) loss 7.6656 (7.6861) grad_norm 2.0733 (inf) loss_scale 2048.0000 (4078.9806) mem 14939MB [2024-07-24 23:02:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][370/625] eta 0:01:45 lr 0.000989 wd 0.0500 time 0.4009 (0.4133) data time 0.0008 (0.0026) model time 0.4001 (0.4068) loss 8.3141 (7.6898) grad_norm 2.3451 (inf) loss_scale 2048.0000 (4024.2372) mem 14939MB [2024-07-24 23:02:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][380/625] eta 0:01:41 lr 0.000989 
wd 0.0500 time 0.4028 (0.4134) data time 0.0007 (0.0025) model time 0.4021 (0.4071) loss 6.5713 (7.7013) grad_norm 2.5521 (inf) loss_scale 2048.0000 (3972.3675) mem 14939MB [2024-07-24 23:02:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][390/625] eta 0:01:37 lr 0.000989 wd 0.0500 time 0.4048 (0.4131) data time 0.0007 (0.0025) model time 0.4041 (0.4069) loss 8.0367 (7.6949) grad_norm 2.3662 (inf) loss_scale 2048.0000 (3923.1509) mem 14939MB [2024-07-24 23:02:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][400/625] eta 0:01:32 lr 0.000989 wd 0.0500 time 0.3969 (0.4128) data time 0.0007 (0.0024) model time 0.3962 (0.4067) loss 9.0231 (7.6959) grad_norm 1.6582 (inf) loss_scale 2048.0000 (3876.3890) mem 14939MB [2024-07-24 23:02:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][410/625] eta 0:01:28 lr 0.000988 wd 0.0500 time 0.3988 (0.4125) data time 0.0009 (0.0024) model time 0.3978 (0.4065) loss 8.6531 (7.6927) grad_norm 1.8166 (inf) loss_scale 2048.0000 (3831.9027) mem 14939MB [2024-07-24 23:02:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][420/625] eta 0:01:24 lr 0.000988 wd 0.0500 time 0.4005 (0.4122) data time 0.0006 (0.0024) model time 0.3999 (0.4063) loss 8.6377 (7.6834) grad_norm 2.5758 (inf) loss_scale 2048.0000 (3789.5297) mem 14939MB [2024-07-24 23:02:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][430/625] eta 0:01:20 lr 0.000988 wd 0.0500 time 0.4014 (0.4119) data time 0.0007 (0.0024) model time 0.4007 (0.4061) loss 7.5362 (7.6829) grad_norm 1.6011 (inf) loss_scale 2048.0000 (3749.1230) mem 14939MB [2024-07-24 23:02:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][440/625] eta 0:01:16 lr 0.000988 wd 0.0500 time 0.3986 (0.4117) data time 0.0006 (0.0023) model time 0.3979 (0.4060) loss 7.1190 (7.6636) grad_norm 3.0327 (inf) loss_scale 2048.0000 (3710.5488) mem 
14939MB [2024-07-24 23:02:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][450/625] eta 0:01:12 lr 0.000988 wd 0.0500 time 0.3977 (0.4115) data time 0.0010 (0.0023) model time 0.3967 (0.4059) loss 6.8802 (7.6646) grad_norm 2.1335 (inf) loss_scale 2048.0000 (3673.6851) mem 14939MB [2024-07-24 23:02:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][460/625] eta 0:01:07 lr 0.000988 wd 0.0500 time 0.4030 (0.4113) data time 0.0011 (0.0023) model time 0.4020 (0.4057) loss 8.5753 (7.6691) grad_norm 1.8358 (inf) loss_scale 2048.0000 (3638.4208) mem 14939MB [2024-07-24 23:02:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][470/625] eta 0:01:03 lr 0.000988 wd 0.0500 time 0.5928 (0.4122) data time 0.0009 (0.0022) model time 0.5919 (0.4068) loss 5.9934 (7.6654) grad_norm 1.5655 (inf) loss_scale 2048.0000 (3604.6539) mem 14939MB [2024-07-24 23:02:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][480/625] eta 0:01:00 lr 0.000988 wd 0.0500 time 0.5580 (0.4141) data time 0.0008 (0.0022) model time 0.5571 (0.4091) loss 7.2010 (7.6679) grad_norm 1.5626 (inf) loss_scale 2048.0000 (3572.2911) mem 14939MB [2024-07-24 23:03:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][490/625] eta 0:00:55 lr 0.000988 wd 0.0500 time 0.3923 (0.4146) data time 0.0008 (0.0022) model time 0.3915 (0.4097) loss 8.8118 (7.6659) grad_norm 2.0570 (inf) loss_scale 2048.0000 (3541.2464) mem 14939MB [2024-07-24 23:03:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][500/625] eta 0:00:51 lr 0.000988 wd 0.0500 time 0.4011 (0.4143) data time 0.0009 (0.0022) model time 0.4002 (0.4095) loss 7.8016 (7.6589) grad_norm 2.3594 (inf) loss_scale 2048.0000 (3511.4411) mem 14939MB [2024-07-24 23:03:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][510/625] eta 0:00:47 lr 0.000988 wd 0.0500 time 0.4025 
(0.4140) data time 0.0006 (0.0021) model time 0.4019 (0.4093) loss 7.3200 (7.6631) grad_norm 1.6876 (inf) loss_scale 2048.0000 (3482.8023) mem 14939MB [2024-07-24 23:03:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][520/625] eta 0:00:43 lr 0.000988 wd 0.0500 time 0.4012 (0.4138) data time 0.0010 (0.0021) model time 0.4002 (0.4091) loss 8.0105 (7.6693) grad_norm 3.5861 (inf) loss_scale 2048.0000 (3455.2630) mem 14939MB [2024-07-24 23:03:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][530/625] eta 0:00:39 lr 0.000987 wd 0.0500 time 0.4078 (0.4136) data time 0.0008 (0.0021) model time 0.4070 (0.4089) loss 6.6928 (7.6691) grad_norm 3.6453 (inf) loss_scale 2048.0000 (3428.7608) mem 14939MB [2024-07-24 23:03:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][540/625] eta 0:00:35 lr 0.000987 wd 0.0500 time 0.4010 (0.4133) data time 0.0008 (0.0021) model time 0.4003 (0.4087) loss 6.4529 (7.6613) grad_norm 1.8091 (inf) loss_scale 2048.0000 (3403.2384) mem 14939MB [2024-07-24 23:03:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][550/625] eta 0:00:30 lr 0.000987 wd 0.0500 time 0.4032 (0.4131) data time 0.0008 (0.0021) model time 0.4024 (0.4085) loss 7.8178 (7.6675) grad_norm 2.0766 (inf) loss_scale 2048.0000 (3378.6425) mem 14939MB [2024-07-24 23:03:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][560/625] eta 0:00:26 lr 0.000987 wd 0.0500 time 0.3970 (0.4129) data time 0.0006 (0.0020) model time 0.3964 (0.4084) loss 7.5706 (7.6642) grad_norm 2.2375 (inf) loss_scale 2048.0000 (3354.9234) mem 14939MB [2024-07-24 23:03:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][570/625] eta 0:00:22 lr 0.000987 wd 0.0500 time 0.4083 (0.4127) data time 0.0008 (0.0020) model time 0.4075 (0.4083) loss 8.0990 (7.6600) grad_norm 1.8200 (inf) loss_scale 2048.0000 (3332.0350) mem 14939MB [2024-07-24 
23:03:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][580/625] eta 0:00:18 lr 0.000987 wd 0.0500 time 0.4000 (0.4126) data time 0.0006 (0.0020) model time 0.3994 (0.4082) loss 7.2599 (7.6579) grad_norm 1.8398 (inf) loss_scale 2048.0000 (3309.9346) mem 14939MB [2024-07-24 23:03:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][590/625] eta 0:00:14 lr 0.000987 wd 0.0500 time 0.4040 (0.4124) data time 0.0007 (0.0020) model time 0.4032 (0.4081) loss 7.3530 (7.6457) grad_norm 1.8612 (inf) loss_scale 2048.0000 (3288.5821) mem 14939MB [2024-07-24 23:03:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][600/625] eta 0:00:10 lr 0.000987 wd 0.0500 time 0.4002 (0.4126) data time 0.0009 (0.0020) model time 0.3994 (0.4083) loss 8.1044 (7.6513) grad_norm 2.7245 (inf) loss_scale 2048.0000 (3267.9401) mem 14939MB [2024-07-24 23:03:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][610/625] eta 0:00:06 lr 0.000987 wd 0.0500 time 0.3993 (0.4124) data time 0.0006 (0.0020) model time 0.3987 (0.4082) loss 8.0024 (7.6490) grad_norm 2.0524 (inf) loss_scale 2048.0000 (3247.9738) mem 14939MB [2024-07-24 23:03:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][620/625] eta 0:00:02 lr 0.000987 wd 0.0500 time 0.3997 (0.4122) data time 0.0004 (0.0019) model time 0.3993 (0.4080) loss 8.7061 (7.6516) grad_norm 1.5882 (inf) loss_scale 2048.0000 (3228.6506) mem 14939MB [2024-07-24 23:03:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 97 training takes 0:04:17 [2024-07-24 23:03:56 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 23:03:57 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 23:03:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.457 (0.457) Loss 0.6152 (0.6152) Acc@1 87.451 (87.451) Acc@5 97.900 (97.900) Mem 14939MB [2024-07-24 23:03:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 1.0430 (0.7701) Acc@1 77.002 (83.616) Acc@5 94.238 (96.893) Mem 14939MB [2024-07-24 23:03:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.1270 (0.9164) Acc@1 73.828 (79.962) Acc@5 93.164 (95.266) Mem 14939MB [2024-07-24 23:03:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.625 Acc@5 95.244 [2024-07-24 23:03:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.6% [2024-07-24 23:04:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.805 (0.805) Loss 0.6187 (0.6187) Acc@1 87.744 (87.744) Acc@5 98.145 (98.145) Mem 14939MB [2024-07-24 23:04:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.154) Loss 1.0010 (0.7626) Acc@1 78.467 (84.335) Acc@5 94.678 (97.075) Mem 14939MB [2024-07-24 23:04:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 1.1406 (0.9046) Acc@1 73.096 (80.592) Acc@5 93.652 (95.494) Mem 14939MB [2024-07-24 23:04:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.238 Acc@5 95.463 [2024-07-24 23:04:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.2% [2024-07-24 23:04:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.24% [2024-07-24 23:04:02 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-24 23:04:03 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 23:04:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][0/625] eta 0:14:45 lr 0.000987 wd 0.0500 time 1.4161 (1.4161) data time 1.0386 (1.0386) model time 0.0000 (0.0000) loss 6.0903 (6.0903) grad_norm 8.7653 (8.7653) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:04:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][10/625] eta 0:05:02 lr 0.000987 wd 0.0500 time 0.3992 (0.4922) data time 0.0009 (0.0953) model time 0.0000 (0.0000) loss 7.3309 (7.8529) grad_norm 2.1257 (3.3462) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:04:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][20/625] eta 0:04:31 lr 0.000987 wd 0.0500 time 0.3951 (0.4481) data time 0.0008 (0.0504) model time 0.0000 (0.0000) loss 8.2354 (7.8618) grad_norm 1.6319 (2.8185) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:04:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][30/625] eta 0:04:18 lr 0.000986 wd 0.0500 time 0.4087 (0.4345) data time 0.0009 (0.0345) model time 0.0000 (0.0000) loss 7.4658 (7.7934) grad_norm 1.4720 (2.5569) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:04:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][40/625] eta 0:04:09 lr 0.000986 wd 0.0500 time 0.4003 (0.4262) data time 0.0010 (0.0263) model time 0.0000 (0.0000) loss 7.0462 (7.6671) grad_norm 1.5418 (2.3778) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:04:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][50/625] eta 0:04:02 lr 0.000986 wd 0.0500 time 0.3973 (0.4211) data time 0.0010 (0.0214) model time 0.0000 (0.0000) loss 8.3075 (7.7517) grad_norm 2.3853 (2.3425) loss_scale 2048.0000 (2048.0000) mem 
14939MB [2024-07-24 23:04:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][60/625] eta 0:03:56 lr 0.000986 wd 0.0500 time 0.4040 (0.4177) data time 0.0006 (0.0180) model time 0.4034 (0.3996) loss 7.6000 (7.7274) grad_norm 2.0909 (2.3113) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:04:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][70/625] eta 0:03:54 lr 0.000986 wd 0.0500 time 0.5750 (0.4227) data time 0.0008 (0.0156) model time 0.5742 (0.4259) loss 6.9552 (7.6619) grad_norm 2.7219 (2.2874) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:04:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][80/625] eta 0:03:54 lr 0.000986 wd 0.0500 time 0.6352 (0.4298) data time 0.0007 (0.0138) model time 0.6344 (0.4437) loss 6.7624 (7.6409) grad_norm 2.5016 (2.2795) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:04:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][90/625] eta 0:03:48 lr 0.000986 wd 0.0500 time 0.4075 (0.4265) data time 0.0008 (0.0124) model time 0.4068 (0.4324) loss 7.9803 (7.6702) grad_norm 3.3320 (2.3446) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:04:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][100/625] eta 0:03:42 lr 0.000986 wd 0.0500 time 0.3990 (0.4238) data time 0.0008 (0.0113) model time 0.3982 (0.4256) loss 7.4196 (7.6532) grad_norm 2.1891 (2.3568) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:04:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][110/625] eta 0:03:38 lr 0.000986 wd 0.0500 time 0.4033 (0.4243) data time 0.0008 (0.0103) model time 0.4025 (0.4261) loss 8.3710 (7.6973) grad_norm 1.5531 (2.3373) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:04:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][120/625] eta 0:03:33 lr 0.000986 wd 0.0500 time 
0.4012 (0.4227) data time 0.0009 (0.0096) model time 0.4003 (0.4229) loss 7.7507 (7.7546) grad_norm 3.7165 (2.3501) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:04:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][130/625] eta 0:03:28 lr 0.000986 wd 0.0500 time 0.3962 (0.4221) data time 0.0011 (0.0089) model time 0.3951 (0.4218) loss 6.8466 (7.7438) grad_norm 1.8234 (2.3679) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:05:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][140/625] eta 0:03:24 lr 0.000986 wd 0.0500 time 0.4030 (0.4207) data time 0.0007 (0.0083) model time 0.4024 (0.4195) loss 8.8870 (7.7346) grad_norm 2.2979 (2.3769) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:05:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][150/625] eta 0:03:19 lr 0.000985 wd 0.0500 time 0.4000 (0.4193) data time 0.0006 (0.0079) model time 0.3994 (0.4174) loss 6.2372 (7.7086) grad_norm 2.0767 (2.3306) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:05:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][160/625] eta 0:03:14 lr 0.000985 wd 0.0500 time 0.3995 (0.4181) data time 0.0007 (0.0074) model time 0.3988 (0.4158) loss 7.5073 (7.7200) grad_norm 2.1983 (2.3002) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:05:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][170/625] eta 0:03:09 lr 0.000985 wd 0.0500 time 0.3984 (0.4171) data time 0.0009 (0.0071) model time 0.3975 (0.4144) loss 7.1981 (7.6881) grad_norm 1.3923 (2.2753) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:05:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][180/625] eta 0:03:05 lr 0.000985 wd 0.0500 time 0.4023 (0.4163) data time 0.0010 (0.0068) model time 0.4013 (0.4134) loss 8.6586 (7.7051) grad_norm 1.6693 (2.2674) loss_scale 2048.0000 (2048.0000) 
mem 14939MB [2024-07-24 23:05:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][190/625] eta 0:03:00 lr 0.000985 wd 0.0500 time 0.4020 (0.4155) data time 0.0007 (0.0065) model time 0.4013 (0.4125) loss 7.5787 (7.6991) grad_norm 2.0908 (2.2858) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:05:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][200/625] eta 0:02:56 lr 0.000985 wd 0.0500 time 0.3995 (0.4148) data time 0.0006 (0.0062) model time 0.3989 (0.4116) loss 8.0285 (7.7116) grad_norm 1.6702 (2.2737) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:05:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][210/625] eta 0:02:51 lr 0.000985 wd 0.0500 time 0.4000 (0.4142) data time 0.0006 (0.0059) model time 0.3994 (0.4110) loss 6.4091 (7.7046) grad_norm 2.6772 (2.2578) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:05:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][220/625] eta 0:02:47 lr 0.000985 wd 0.0500 time 0.3955 (0.4135) data time 0.0006 (0.0057) model time 0.3948 (0.4102) loss 8.8840 (7.7204) grad_norm 1.6076 (2.2391) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:05:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][230/625] eta 0:02:43 lr 0.000985 wd 0.0500 time 0.3974 (0.4141) data time 0.0006 (0.0055) model time 0.3968 (0.4111) loss 8.7051 (7.7338) grad_norm 1.5168 (2.2193) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:05:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][240/625] eta 0:02:39 lr 0.000985 wd 0.0500 time 0.3979 (0.4135) data time 0.0007 (0.0053) model time 0.3972 (0.4105) loss 7.3770 (7.7187) grad_norm 1.7304 (2.1997) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:05:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][250/625] eta 0:02:34 lr 0.000985 wd 
0.0500 time 0.4358 (0.4132) data time 0.0011 (0.0051) model time 0.4348 (0.4101) loss 7.4686 (7.7155) grad_norm 1.4943 (2.1922) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:05:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][260/625] eta 0:02:30 lr 0.000985 wd 0.0500 time 0.4003 (0.4127) data time 0.0006 (0.0050) model time 0.3997 (0.4097) loss 7.0577 (7.7155) grad_norm 1.6122 (2.1847) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:05:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][270/625] eta 0:02:26 lr 0.000984 wd 0.0500 time 0.4046 (0.4123) data time 0.0006 (0.0048) model time 0.4039 (0.4093) loss 7.5596 (7.7177) grad_norm 1.8251 (2.1743) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:05:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][280/625] eta 0:02:22 lr 0.000984 wd 0.0500 time 0.3996 (0.4120) data time 0.0007 (0.0047) model time 0.3990 (0.4090) loss 7.3900 (7.7014) grad_norm 2.0380 (2.1666) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:06:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][290/625] eta 0:02:18 lr 0.000984 wd 0.0500 time 0.4124 (0.4132) data time 0.0008 (0.0045) model time 0.4116 (0.4106) loss 7.5862 (7.6875) grad_norm 1.7946 (2.1777) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:06:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][300/625] eta 0:02:15 lr 0.000984 wd 0.0500 time 0.4042 (0.4164) data time 0.0008 (0.0044) model time 0.4034 (0.4144) loss 7.4625 (7.6889) grad_norm 2.8525 (2.1853) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:06:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][310/625] eta 0:02:11 lr 0.000984 wd 0.0500 time 0.4090 (0.4159) data time 0.0007 (0.0043) model time 0.4083 (0.4139) loss 7.4162 (7.7001) grad_norm 1.8693 (2.1729) loss_scale 2048.0000 
(2048.0000) mem 14939MB [2024-07-24 23:06:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][320/625] eta 0:02:06 lr 0.000984 wd 0.0500 time 0.3967 (0.4155) data time 0.0010 (0.0042) model time 0.3957 (0.4135) loss 7.0657 (7.7030) grad_norm 1.4682 (2.1675) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:06:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][330/625] eta 0:02:02 lr 0.000984 wd 0.0500 time 0.3977 (0.4151) data time 0.0007 (0.0041) model time 0.3970 (0.4130) loss 6.9349 (7.6881) grad_norm 1.6722 (2.1730) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:06:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][340/625] eta 0:01:58 lr 0.000984 wd 0.0500 time 0.4020 (0.4148) data time 0.0007 (0.0040) model time 0.4014 (0.4128) loss 6.9802 (7.6929) grad_norm 1.6615 (2.1600) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:06:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][350/625] eta 0:01:54 lr 0.000984 wd 0.0500 time 0.4008 (0.4148) data time 0.0006 (0.0039) model time 0.4002 (0.4128) loss 7.9263 (7.7038) grad_norm 2.6846 (2.1608) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:06:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][360/625] eta 0:01:49 lr 0.000984 wd 0.0500 time 0.3967 (0.4144) data time 0.0008 (0.0038) model time 0.3959 (0.4124) loss 6.5269 (7.6860) grad_norm 1.6567 (2.1708) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:06:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][370/625] eta 0:01:45 lr 0.000984 wd 0.0500 time 0.4045 (0.4141) data time 0.0007 (0.0038) model time 0.4039 (0.4120) loss 8.7549 (7.6957) grad_norm 1.9924 (2.1676) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:06:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][380/625] eta 0:01:41 lr 
0.000984 wd 0.0500 time 0.4010 (0.4153) data time 0.0007 (0.0037) model time 0.4003 (0.4134) loss 6.9632 (7.6928) grad_norm 2.0665 (2.1589) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:06:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][390/625] eta 0:01:37 lr 0.000983 wd 0.0500 time 0.3998 (0.4149) data time 0.0008 (0.0036) model time 0.3990 (0.4130) loss 8.4694 (7.6970) grad_norm 1.8650 (2.1638) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:06:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][400/625] eta 0:01:33 lr 0.000983 wd 0.0500 time 0.4009 (0.4168) data time 0.0007 (0.0036) model time 0.4002 (0.4153) loss 6.3900 (7.6856) grad_norm 1.5902 (2.1584) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:06:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][410/625] eta 0:01:29 lr 0.000983 wd 0.0500 time 0.3998 (0.4165) data time 0.0007 (0.0035) model time 0.3991 (0.4148) loss 8.4764 (7.6817) grad_norm 1.9876 (2.1675) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:06:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][420/625] eta 0:01:25 lr 0.000983 wd 0.0500 time 0.3977 (0.4161) data time 0.0009 (0.0034) model time 0.3968 (0.4144) loss 7.5573 (7.6830) grad_norm 3.1944 (2.1636) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:07:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][430/625] eta 0:01:21 lr 0.000983 wd 0.0500 time 0.4003 (0.4157) data time 0.0007 (0.0034) model time 0.3996 (0.4140) loss 7.7946 (7.6855) grad_norm 1.9091 (2.1554) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:07:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][440/625] eta 0:01:17 lr 0.000983 wd 0.0500 time 0.4129 (0.4178) data time 0.0012 (0.0033) model time 0.4117 (0.4164) loss 7.2267 (7.6873) grad_norm 1.4295 (2.1510) loss_scale 
2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:07:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][450/625] eta 0:01:13 lr 0.000983 wd 0.0500 time 0.3933 (0.4173) data time 0.0009 (0.0033) model time 0.3924 (0.4159) loss 6.8120 (7.6771) grad_norm 1.7698 (2.1413) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:07:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][460/625] eta 0:01:08 lr 0.000983 wd 0.0500 time 0.3986 (0.4170) data time 0.0008 (0.0032) model time 0.3978 (0.4155) loss 7.9451 (7.6802) grad_norm 3.3698 (2.1497) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:07:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][470/625] eta 0:01:04 lr 0.000983 wd 0.0500 time 0.3952 (0.4166) data time 0.0007 (0.0032) model time 0.3945 (0.4151) loss 6.9994 (7.6805) grad_norm 2.2418 (2.1498) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:07:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][480/625] eta 0:01:00 lr 0.000983 wd 0.0500 time 0.4065 (0.4168) data time 0.0007 (0.0031) model time 0.4058 (0.4153) loss 8.3594 (7.6793) grad_norm 1.7665 (2.1438) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:07:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][490/625] eta 0:00:56 lr 0.000983 wd 0.0500 time 0.3955 (0.4165) data time 0.0010 (0.0031) model time 0.3945 (0.4150) loss 7.3644 (7.6777) grad_norm 1.9435 (2.1429) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:07:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][500/625] eta 0:00:52 lr 0.000983 wd 0.0500 time 0.3930 (0.4162) data time 0.0009 (0.0030) model time 0.3921 (0.4147) loss 8.0590 (7.6789) grad_norm 1.5414 (2.1396) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:07:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][510/625] eta 
0:00:48 lr 0.000982 wd 0.0500 time 0.3990 (0.4184) data time 0.0009 (0.0030) model time 0.3981 (0.4171) loss 8.6890 (7.6810) grad_norm 1.5671 (2.1433) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:07:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][520/625] eta 0:00:44 lr 0.000982 wd 0.0500 time 0.3959 (0.4202) data time 0.0010 (0.0030) model time 0.3949 (0.4191) loss 6.6814 (7.6683) grad_norm 1.6003 (2.1338) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:07:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][530/625] eta 0:00:39 lr 0.000982 wd 0.0500 time 0.4056 (0.4202) data time 0.0008 (0.0029) model time 0.4048 (0.4191) loss 7.5729 (7.6648) grad_norm 2.4899 (2.1347) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:07:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][540/625] eta 0:00:35 lr 0.000982 wd 0.0500 time 0.4022 (0.4198) data time 0.0009 (0.0029) model time 0.4013 (0.4187) loss 8.4043 (7.6654) grad_norm 1.5676 (2.1311) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:07:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][550/625] eta 0:00:31 lr 0.000982 wd 0.0500 time 0.4052 (0.4194) data time 0.0005 (0.0028) model time 0.4047 (0.4183) loss 6.7660 (7.6646) grad_norm 2.4454 (2.1250) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:08:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][560/625] eta 0:00:27 lr 0.000982 wd 0.0500 time 0.3959 (0.4233) data time 0.0007 (0.0028) model time 0.3951 (0.4226) loss 6.4894 (7.6597) grad_norm 1.8852 (2.1171) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:08:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][570/625] eta 0:00:23 lr 0.000982 wd 0.0500 time 0.3960 (0.4233) data time 0.0007 (0.0028) model time 0.3952 (0.4226) loss 6.7010 (7.6608) grad_norm 1.9952 (2.1225) 
loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:08:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][580/625] eta 0:00:19 lr 0.000982 wd 0.0500 time 0.3983 (0.4234) data time 0.0008 (0.0027) model time 0.3975 (0.4227) loss 8.3452 (7.6624) grad_norm 2.2637 (2.1229) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:08:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][590/625] eta 0:00:14 lr 0.000982 wd 0.0500 time 0.4012 (0.4230) data time 0.0010 (0.0027) model time 0.4002 (0.4223) loss 8.1018 (7.6701) grad_norm 1.5700 (2.1314) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:08:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][600/625] eta 0:00:10 lr 0.000982 wd 0.0500 time 0.4005 (0.4227) data time 0.0007 (0.0027) model time 0.3999 (0.4219) loss 6.5639 (7.6663) grad_norm 1.8657 (2.1301) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:08:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][610/625] eta 0:00:06 lr 0.000982 wd 0.0500 time 0.4085 (0.4224) data time 0.0005 (0.0027) model time 0.4080 (0.4215) loss 6.5512 (7.6670) grad_norm 1.3744 (2.1219) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:08:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][620/625] eta 0:00:02 lr 0.000982 wd 0.0500 time 0.4000 (0.4220) data time 0.0003 (0.0026) model time 0.3996 (0.4211) loss 9.2154 (7.6701) grad_norm 1.9146 (2.1174) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:08:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 98 training takes 0:04:23 [2024-07-24 23:08:27 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-24 23:08:27 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!!
[2024-07-24 23:08:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.620 (0.620) Loss 0.6709 (0.6709) Acc@1 86.865 (86.865) Acc@5 97.754 (97.754) Mem 14939MB
[2024-07-24 23:08:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.135) Loss 1.0791 (0.8016) Acc@1 77.002 (84.095) Acc@5 93.848 (96.884) Mem 14939MB
[2024-07-24 23:08:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.112) Loss 1.1719 (0.9492) Acc@1 73.779 (80.190) Acc@5 93.555 (95.208) Mem 14939MB
[2024-07-24 23:08:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.872 Acc@5 95.172
[2024-07-24 23:08:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.9%
[2024-07-24 23:08:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.87%
[2024-07-24 23:08:30 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving......
[2024-07-24 23:08:31 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!!
[2024-07-24 23:08:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.570 (0.570) Loss 0.6167 (0.6167) Acc@1 87.744 (87.744) Acc@5 98.145 (98.145) Mem 14939MB
[2024-07-24 23:08:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.131) Loss 0.9990 (0.7612) Acc@1 78.662 (84.379) Acc@5 94.727 (97.101) Mem 14939MB
[2024-07-24 23:08:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.109) Loss 1.1357 (0.9025) Acc@1 73.145 (80.627) Acc@5 93.604 (95.522) Mem 14939MB
[2024-07-24 23:08:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.274 Acc@5 95.495
[2024-07-24 23:08:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.3%
[2024-07-24 23:08:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.27%
[2024-07-24 23:08:34 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-24 23:08:34 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!!
[2024-07-24 23:08:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][0/625] eta 0:12:43 lr 0.000982 wd 0.0500 time 1.2213 (1.2213) data time 0.8453 (0.8453) model time 0.0000 (0.0000) loss 7.3606 (7.3606) grad_norm 2.4698 (2.4698) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:08:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][10/625] eta 0:04:52 lr 0.000981 wd 0.0500 time 0.3924 (0.4757) data time 0.0007 (0.0778) model time 0.0000 (0.0000) loss 6.4908 (7.3840) grad_norm 1.8014 (2.5068) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:08:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][20/625] eta 0:04:26 lr 0.000981 wd 0.0500 time 0.3985 (0.4406) data time 0.0007 (0.0414) model time 0.0000 (0.0000) loss 6.3392 (7.4762) grad_norm 1.7075 (2.3606) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:08:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][30/625] eta 0:04:33 lr 0.000981 wd 0.0500 time 1.3868 (0.4593) data time 0.0006 (0.0284) model time 0.0000 (0.0000) loss 7.9125 (7.6095) grad_norm 2.1820 (2.2072) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:08:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][40/625] eta 0:04:20 lr 0.000981 wd 0.0500 time 0.4015 (0.4446) data time 0.0006 (0.0217) model time 0.0000 (0.0000) loss 9.2700 (7.6847) grad_norm 2.4530 (2.1615) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:08:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][50/625] eta 0:04:10 lr 0.000981 wd 0.0500 time 0.4072 (0.4357) data time 0.0007 (0.0176) model time 0.0000 (0.0000) loss 6.4107 (7.6109) grad_norm 1.5058 (2.0650) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:09:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][60/625] eta 0:04:02 lr 0.000981 wd 0.0500 time 0.3988 
(0.4299) data time 0.0006 (0.0149) model time 0.3982 (0.3994) loss 8.4395 (7.5804) grad_norm 1.9165 (2.0495) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:09:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][70/625] eta 0:04:13 lr 0.000981 wd 0.0500 time 0.3941 (0.4566) data time 0.0009 (0.0185) model time 0.3932 (0.4891) loss 7.9520 (7.6014) grad_norm 2.2579 (2.0937) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:09:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][80/625] eta 0:04:06 lr 0.000981 wd 0.0500 time 0.4070 (0.4521) data time 0.0006 (0.0164) model time 0.4064 (0.4656) loss 7.5858 (7.5890) grad_norm 2.7846 (2.1863) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:09:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][90/625] eta 0:03:58 lr 0.000981 wd 0.0500 time 0.3950 (0.4463) data time 0.0009 (0.0147) model time 0.3941 (0.4489) loss 6.6397 (7.5808) grad_norm 1.7124 (2.1907) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:09:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][100/625] eta 0:03:51 lr 0.000981 wd 0.0500 time 0.3956 (0.4418) data time 0.0008 (0.0133) model time 0.3949 (0.4390) loss 8.0976 (7.6084) grad_norm 3.1932 (2.1506) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:09:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][110/625] eta 0:03:47 lr 0.000981 wd 0.0500 time 0.5846 (0.4427) data time 0.0009 (0.0122) model time 0.5838 (0.4409) loss 6.3941 (7.5887) grad_norm 2.7657 (2.1677) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:09:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][120/625] eta 0:03:45 lr 0.000981 wd 0.0500 time 0.4026 (0.4471) data time 0.0009 (0.0113) model time 0.4017 (0.4487) loss 6.1300 (7.5866) grad_norm 3.4170 (2.2087) loss_scale 2048.0000 (2048.0000) mem 14939MB 
[2024-07-24 23:09:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][130/625] eta 0:03:39 lr 0.000980 wd 0.0500 time 0.4003 (0.4435) data time 0.0006 (0.0105) model time 0.3997 (0.4426) loss 6.3678 (7.6198) grad_norm 2.2094 (2.2103) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:09:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][140/625] eta 0:03:33 lr 0.000980 wd 0.0500 time 0.3947 (0.4404) data time 0.0010 (0.0098) model time 0.3937 (0.4377) loss 8.7115 (7.6328) grad_norm 1.9906 (2.1880) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:09:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][150/625] eta 0:03:28 lr 0.000980 wd 0.0500 time 0.3983 (0.4379) data time 0.0009 (0.0092) model time 0.3973 (0.4341) loss 7.6619 (7.6381) grad_norm 2.4488 (2.1639) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:09:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][160/625] eta 0:03:22 lr 0.000980 wd 0.0500 time 0.4074 (0.4357) data time 0.0006 (0.0087) model time 0.4067 (0.4311) loss 7.8039 (7.6220) grad_norm 3.2057 (2.1632) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:09:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][170/625] eta 0:03:17 lr 0.000980 wd 0.0500 time 0.3994 (0.4336) data time 0.0007 (0.0082) model time 0.3987 (0.4285) loss 6.0168 (7.6382) grad_norm 2.0513 (2.1638) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:09:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][180/625] eta 0:03:12 lr 0.000980 wd 0.0500 time 0.4075 (0.4319) data time 0.0010 (0.0078) model time 0.4065 (0.4264) loss 7.7120 (7.6494) grad_norm 1.5977 (2.1603) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:09:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][190/625] eta 0:03:07 lr 0.000980 wd 0.0500 time 
0.3988 (0.4303) data time 0.0007 (0.0075) model time 0.3981 (0.4245) loss 7.5722 (7.6359) grad_norm 1.9217 (2.1419) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:10:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][200/625] eta 0:03:02 lr 0.000980 wd 0.0500 time 0.4058 (0.4291) data time 0.0008 (0.0071) model time 0.4050 (0.4232) loss 8.4052 (7.6528) grad_norm 1.7646 (2.1173) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:10:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][210/625] eta 0:02:57 lr 0.000980 wd 0.0500 time 0.3992 (0.4278) data time 0.0007 (0.0068) model time 0.3985 (0.4219) loss 6.1820 (7.6416) grad_norm 1.8593 (2.1175) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:10:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][220/625] eta 0:02:52 lr 0.000980 wd 0.0500 time 0.3993 (0.4266) data time 0.0011 (0.0066) model time 0.3982 (0.4206) loss 6.4011 (7.6383) grad_norm 2.9487 (2.1194) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:10:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][230/625] eta 0:02:48 lr 0.000980 wd 0.0500 time 0.3989 (0.4255) data time 0.0009 (0.0063) model time 0.3981 (0.4194) loss 7.4400 (7.6530) grad_norm 1.9306 (2.1187) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:10:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][240/625] eta 0:02:43 lr 0.000980 wd 0.0500 time 0.4042 (0.4244) data time 0.0006 (0.0061) model time 0.4036 (0.4184) loss 7.5539 (7.6627) grad_norm 2.2646 (2.1079) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:10:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][250/625] eta 0:02:38 lr 0.000979 wd 0.0500 time 0.3984 (0.4235) data time 0.0006 (0.0059) model time 0.3978 (0.4174) loss 6.3803 (7.6479) grad_norm 2.1102 (2.1088) loss_scale 2048.0000 (2048.0000) 
mem 14939MB [2024-07-24 23:10:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][260/625] eta 0:02:34 lr 0.000979 wd 0.0500 time 0.3993 (0.4226) data time 0.0007 (0.0057) model time 0.3986 (0.4166) loss 6.7169 (7.6460) grad_norm 3.0172 (2.1338) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:10:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][270/625] eta 0:02:29 lr 0.000979 wd 0.0500 time 0.4006 (0.4218) data time 0.0009 (0.0055) model time 0.3997 (0.4159) loss 7.7232 (7.6493) grad_norm 2.6788 (2.1442) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:10:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][280/625] eta 0:02:25 lr 0.000979 wd 0.0500 time 0.3987 (0.4211) data time 0.0007 (0.0054) model time 0.3980 (0.4152) loss 7.2829 (7.6486) grad_norm 1.8255 (2.1414) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:10:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][290/625] eta 0:02:20 lr 0.000979 wd 0.0500 time 0.4000 (0.4204) data time 0.0008 (0.0052) model time 0.3992 (0.4146) loss 8.0070 (7.6397) grad_norm 2.0741 (2.1372) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:10:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][300/625] eta 0:02:16 lr 0.000979 wd 0.0500 time 0.3936 (0.4204) data time 0.0009 (0.0051) model time 0.3927 (0.4148) loss 7.5150 (7.6283) grad_norm 1.5702 (2.1326) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:10:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][310/625] eta 0:02:12 lr 0.000979 wd 0.0500 time 0.4021 (0.4201) data time 0.0010 (0.0049) model time 0.4011 (0.4145) loss 7.7272 (7.6327) grad_norm 1.2921 (2.1210) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:10:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][320/625] eta 0:02:07 lr 0.000979 wd 
0.0500 time 0.3984 (0.4194) data time 0.0009 (0.0048) model time 0.3975 (0.4140) loss 8.1497 (7.6312) grad_norm 2.0390 (2.1285) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:10:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][330/625] eta 0:02:04 lr 0.000979 wd 0.0500 time 0.3958 (0.4206) data time 0.0007 (0.0047) model time 0.3951 (0.4155) loss 6.7945 (7.6252) grad_norm 1.9444 (2.1277) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:10:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][340/625] eta 0:02:00 lr 0.000979 wd 0.0500 time 0.4017 (0.4232) data time 0.0008 (0.0046) model time 0.4009 (0.4187) loss 5.9711 (7.6226) grad_norm 2.1258 (2.1219) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:11:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][350/625] eta 0:01:56 lr 0.000979 wd 0.0500 time 0.3977 (0.4226) data time 0.0007 (0.0045) model time 0.3970 (0.4181) loss 8.9335 (7.6217) grad_norm 2.1307 (2.1173) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:11:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][360/625] eta 0:01:51 lr 0.000979 wd 0.0500 time 0.4037 (0.4220) data time 0.0006 (0.0044) model time 0.4030 (0.4176) loss 8.5714 (7.6137) grad_norm 1.9781 (2.1065) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:11:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][370/625] eta 0:01:47 lr 0.000978 wd 0.0500 time 0.3992 (0.4215) data time 0.0009 (0.0043) model time 0.3984 (0.4171) loss 6.4360 (7.6145) grad_norm 1.9254 (2.1040) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:11:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][380/625] eta 0:01:43 lr 0.000978 wd 0.0500 time 0.3951 (0.4209) data time 0.0008 (0.0042) model time 0.3943 (0.4165) loss 6.2797 (7.6081) grad_norm 1.8253 (2.1001) loss_scale 2048.0000 
(2048.0000) mem 14939MB [2024-07-24 23:11:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][390/625] eta 0:01:38 lr 0.000978 wd 0.0500 time 0.4100 (0.4205) data time 0.0008 (0.0041) model time 0.4091 (0.4161) loss 8.3625 (7.6007) grad_norm 1.9232 (2.1156) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:11:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][400/625] eta 0:01:34 lr 0.000978 wd 0.0500 time 0.4010 (0.4200) data time 0.0006 (0.0040) model time 0.4004 (0.4156) loss 7.8154 (7.6011) grad_norm 2.4335 (2.1167) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:11:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][410/625] eta 0:01:30 lr 0.000978 wd 0.0500 time 0.4032 (0.4213) data time 0.0007 (0.0040) model time 0.4025 (0.4172) loss 9.2931 (7.6137) grad_norm 2.8908 (2.1361) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:11:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][420/625] eta 0:01:26 lr 0.000978 wd 0.0500 time 0.4016 (0.4208) data time 0.0008 (0.0039) model time 0.4009 (0.4168) loss 7.7692 (7.6153) grad_norm 2.0193 (2.1308) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:11:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][430/625] eta 0:01:21 lr 0.000978 wd 0.0500 time 0.3981 (0.4203) data time 0.0007 (0.0038) model time 0.3974 (0.4163) loss 6.5546 (7.6241) grad_norm 1.5055 (2.1237) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:11:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][440/625] eta 0:01:17 lr 0.000978 wd 0.0500 time 0.3934 (0.4199) data time 0.0009 (0.0038) model time 0.3925 (0.4159) loss 6.5169 (7.6102) grad_norm 2.7488 (2.1322) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:11:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][450/625] eta 0:01:13 lr 
0.000978 wd 0.0500 time 0.3973 (0.4195) data time 0.0009 (0.0037) model time 0.3964 (0.4155) loss 8.2085 (7.6037) grad_norm 1.8609 (2.1270) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:11:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][460/625] eta 0:01:09 lr 0.000978 wd 0.0500 time 0.4029 (0.4191) data time 0.0007 (0.0036) model time 0.4022 (0.4151) loss 7.5406 (7.6109) grad_norm 1.7646 (2.1208) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:11:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][470/625] eta 0:01:04 lr 0.000978 wd 0.0500 time 0.3984 (0.4187) data time 0.0007 (0.0036) model time 0.3977 (0.4148) loss 8.4981 (7.6096) grad_norm 1.9480 (2.1125) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:11:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][480/625] eta 0:01:00 lr 0.000978 wd 0.0500 time 0.3988 (0.4184) data time 0.0008 (0.0035) model time 0.3980 (0.4145) loss 6.9981 (7.6085) grad_norm 2.0389 (2.1087) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:12:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][490/625] eta 0:00:56 lr 0.000977 wd 0.0500 time 0.4058 (0.4180) data time 0.0006 (0.0035) model time 0.4052 (0.4141) loss 8.1959 (7.6143) grad_norm 2.2192 (2.1128) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:12:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][500/625] eta 0:00:52 lr 0.000977 wd 0.0500 time 0.3966 (0.4176) data time 0.0008 (0.0034) model time 0.3958 (0.4137) loss 8.3656 (7.6097) grad_norm 2.9459 (2.1145) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:12:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][510/625] eta 0:00:48 lr 0.000977 wd 0.0500 time 0.3991 (0.4219) data time 0.0008 (0.0034) model time 0.3982 (0.4186) loss 8.2647 (7.6140) grad_norm 1.5460 (2.1102) loss_scale 
2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:12:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][520/625] eta 0:00:44 lr 0.000977 wd 0.0500 time 0.4048 (0.4218) data time 0.0008 (0.0033) model time 0.4040 (0.4185) loss 6.5969 (7.6126) grad_norm 1.4450 (2.1007) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:12:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][530/625] eta 0:00:40 lr 0.000977 wd 0.0500 time 0.3965 (0.4214) data time 0.0007 (0.0033) model time 0.3957 (0.4181) loss 7.0265 (7.6088) grad_norm 1.5339 (2.0953) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:12:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][540/625] eta 0:00:35 lr 0.000977 wd 0.0500 time 0.4032 (0.4211) data time 0.0007 (0.0032) model time 0.4025 (0.4178) loss 6.6631 (7.6140) grad_norm 2.7796 (2.0949) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:12:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][550/625] eta 0:00:31 lr 0.000977 wd 0.0500 time 0.4004 (0.4219) data time 0.0009 (0.0032) model time 0.3995 (0.4187) loss 7.4393 (7.6099) grad_norm 1.8344 (2.0998) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:12:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][560/625] eta 0:00:27 lr 0.000977 wd 0.0500 time 0.3980 (0.4235) data time 0.0009 (0.0032) model time 0.3971 (0.4206) loss 7.8323 (7.6113) grad_norm 1.5891 (2.1127) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:12:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][570/625] eta 0:00:23 lr 0.000977 wd 0.0500 time 0.4004 (0.4231) data time 0.0007 (0.0031) model time 0.3998 (0.4201) loss 6.9764 (7.6036) grad_norm 1.8523 (2.1094) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:12:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][580/625] eta 
0:00:19 lr 0.000977 wd 0.0500 time 0.8885 (0.4236) data time 0.0006 (0.0031) model time 0.8878 (0.4207) loss 6.2914 (7.6059) grad_norm 1.2996 (2.1043) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:12:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][590/625] eta 0:00:14 lr 0.000977 wd 0.0500 time 0.3984 (0.4233) data time 0.0009 (0.0031) model time 0.3975 (0.4204) loss 8.2622 (7.6134) grad_norm 1.6858 (2.0991) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:12:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][600/625] eta 0:00:10 lr 0.000977 wd 0.0500 time 0.4022 (0.4229) data time 0.0006 (0.0030) model time 0.4016 (0.4200) loss 6.7169 (7.6200) grad_norm 1.8414 (2.1006) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:12:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][610/625] eta 0:00:06 lr 0.000976 wd 0.0500 time 0.3964 (0.4225) data time 0.0004 (0.0030) model time 0.3960 (0.4196) loss 7.1549 (7.6268) grad_norm 3.6864 (2.1078) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:12:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][620/625] eta 0:00:02 lr 0.000976 wd 0.0500 time 0.3963 (0.4221) data time 0.0005 (0.0030) model time 0.3958 (0.4192) loss 7.0762 (7.6261) grad_norm 1.9740 (2.1056) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:12:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 99 training takes 0:04:23 [2024-07-24 23:12:58 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 23:12:59 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 23:13:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 6.423 (6.423) Loss 0.6367 (0.6367) Acc@1 87.744 (87.744) Acc@5 97.949 (97.949) Mem 14939MB [2024-07-24 23:13:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.666) Loss 1.0557 (0.7890) Acc@1 76.904 (83.518) Acc@5 93.896 (96.995) Mem 14939MB [2024-07-24 23:13:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.389) Loss 1.1406 (0.9335) Acc@1 73.486 (79.948) Acc@5 93.408 (95.338) Mem 14939MB [2024-07-24 23:13:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.637 Acc@5 95.284 [2024-07-24 23:13:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.6% [2024-07-24 23:13:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 13.727 (13.727) Loss 0.6152 (0.6152) Acc@1 87.744 (87.744) Acc@5 98.193 (98.193) Mem 14939MB [2024-07-24 23:13:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (1.326) Loss 0.9956 (0.7597) Acc@1 78.613 (84.393) Acc@5 94.580 (97.088) Mem 14939MB [2024-07-24 23:13:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.735) Loss 1.1328 (0.9005) Acc@1 73.340 (80.669) Acc@5 93.652 (95.524) Mem 14939MB [2024-07-24 23:13:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.302 Acc@5 95.507 [2024-07-24 23:13:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.3% [2024-07-24 23:13:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.30% [2024-07-24 23:13:23 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-24 23:13:24 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 23:13:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][0/625] eta 1:09:00 lr 0.000976 wd 0.0500 time 6.6246 (6.6246) data time 5.3659 (5.3659) model time 0.0000 (0.0000) loss 8.0042 (8.0042) grad_norm 2.0448 (2.0448) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:13:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][10/625] eta 0:09:52 lr 0.000976 wd 0.0500 time 0.3991 (0.9636) data time 0.0007 (0.4886) model time 0.0000 (0.0000) loss 6.8184 (7.4625) grad_norm 1.4562 (2.0405) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:13:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][20/625] eta 0:07:00 lr 0.000976 wd 0.0500 time 0.3981 (0.6943) data time 0.0008 (0.2564) model time 0.0000 (0.0000) loss 7.6994 (7.4198) grad_norm 1.6367 (2.0326) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:13:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][30/625] eta 0:06:37 lr 0.000976 wd 0.0500 time 0.3974 (0.6683) data time 0.0008 (0.2364) model time 0.0000 (0.0000) loss 8.8089 (7.4880) grad_norm 1.8526 (1.9266) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:13:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][40/625] eta 0:05:52 lr 0.000976 wd 0.0500 time 0.3993 (0.6025) data time 0.0007 (0.1790) model time 0.0000 (0.0000) loss 7.5624 (7.5050) grad_norm 1.4396 (1.9373) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:13:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][50/625] eta 0:05:25 lr 0.000976 wd 0.0500 time 0.3918 (0.5661) data time 0.0007 (0.1441) model time 0.0000 (0.0000) loss 8.3374 (7.5761) grad_norm 1.6266 (1.9065) loss_scale 2048.0000 
(2048.0000) mem 14939MB [2024-07-24 23:13:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][60/625] eta 0:05:04 lr 0.000976 wd 0.0500 time 0.3934 (0.5389) data time 0.0008 (0.1206) model time 0.3925 (0.3990) loss 8.9416 (7.5846) grad_norm 2.0182 (1.8796) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:14:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][70/625] eta 0:04:48 lr 0.000976 wd 0.0500 time 0.3934 (0.5191) data time 0.0007 (0.1037) model time 0.3927 (0.3983) loss 8.4970 (7.6390) grad_norm 3.2897 (1.9418) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:14:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][80/625] eta 0:04:34 lr 0.000976 wd 0.0500 time 0.4027 (0.5044) data time 0.0009 (0.0911) model time 0.4018 (0.3986) loss 8.5251 (7.6205) grad_norm 2.2048 (2.0107) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:14:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][90/625] eta 0:04:23 lr 0.000976 wd 0.0500 time 0.4018 (0.4928) data time 0.0008 (0.0811) model time 0.4010 (0.3984) loss 7.8983 (7.6104) grad_norm 1.8781 (2.0423) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:14:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][100/625] eta 0:04:14 lr 0.000976 wd 0.0500 time 0.4064 (0.4839) data time 0.0009 (0.0732) model time 0.4055 (0.3991) loss 7.3542 (7.6280) grad_norm 1.9619 (2.0765) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:14:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][110/625] eta 0:04:05 lr 0.000975 wd 0.0500 time 0.3973 (0.4768) data time 0.0007 (0.0667) model time 0.3966 (0.3999) loss 8.4240 (7.6296) grad_norm 1.4398 (2.0595) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:14:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][120/625] eta 0:03:57 lr 
0.000975 wd 0.0500 time 0.3999 (0.4707) data time 0.0010 (0.0613) model time 0.3990 (0.4002) loss 7.7225 (7.6510) grad_norm 3.0429 (2.0718) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:14:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][130/625] eta 0:03:50 lr 0.000975 wd 0.0500 time 0.3936 (0.4654) data time 0.0009 (0.0567) model time 0.3928 (0.4003) loss 8.4661 (7.6338) grad_norm 1.5519 (2.0469) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:14:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][140/625] eta 0:03:44 lr 0.000975 wd 0.0500 time 0.5488 (0.4619) data time 0.0007 (0.0527) model time 0.5481 (0.4019) loss 7.0975 (7.6249) grad_norm 1.4863 (2.0242) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:14:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][150/625] eta 0:03:39 lr 0.000975 wd 0.0500 time 0.5916 (0.4628) data time 0.0007 (0.0493) model time 0.5910 (0.4092) loss 7.9842 (7.6342) grad_norm 2.5626 (2.0631) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:14:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][160/625] eta 0:03:38 lr 0.000975 wd 0.0500 time 0.4029 (0.4708) data time 0.0007 (0.0463) model time 0.4023 (0.4258) loss 7.2543 (7.6529) grad_norm 2.0600 (2.0593) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:14:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][170/625] eta 0:03:32 lr 0.000975 wd 0.0500 time 0.4066 (0.4668) data time 0.0006 (0.0436) model time 0.4059 (0.4237) loss 6.4358 (7.6499) grad_norm 1.6071 (2.0444) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:14:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][180/625] eta 0:03:26 lr 0.000975 wd 0.0500 time 0.4003 (0.4632) data time 0.0009 (0.0413) model time 0.3994 (0.4219) loss 7.8604 (7.6398) grad_norm 1.6294 (2.0469) 
loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:14:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][190/625] eta 0:03:20 lr 0.000975 wd 0.0500 time 0.3958 (0.4600) data time 0.0007 (0.0392) model time 0.3950 (0.4204) loss 6.4002 (7.6347) grad_norm 2.6404 (2.0379) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:14:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][200/625] eta 0:03:14 lr 0.000975 wd 0.0500 time 0.3980 (0.4571) data time 0.0008 (0.0373) model time 0.3972 (0.4191) loss 8.6226 (7.6469) grad_norm 1.5724 (2.0366) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:15:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][210/625] eta 0:03:08 lr 0.000975 wd 0.0500 time 0.3990 (0.4544) data time 0.0008 (0.0355) model time 0.3982 (0.4179) loss 7.3682 (7.6589) grad_norm 3.8207 (2.0441) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:15:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][220/625] eta 0:03:03 lr 0.000975 wd 0.0500 time 0.3975 (0.4520) data time 0.0009 (0.0340) model time 0.3966 (0.4168) loss 6.4663 (7.6724) grad_norm 1.8752 (2.0586) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:15:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][230/625] eta 0:02:57 lr 0.000974 wd 0.0500 time 0.3981 (0.4500) data time 0.0007 (0.0325) model time 0.3974 (0.4161) loss 6.7393 (7.6639) grad_norm 2.5475 (2.0656) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:15:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][240/625] eta 0:02:52 lr 0.000974 wd 0.0500 time 0.4010 (0.4480) data time 0.0006 (0.0312) model time 0.4004 (0.4154) loss 6.8597 (7.6604) grad_norm 1.4459 (2.0538) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:15:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[100/300][250/625] eta 0:02:48 lr 0.000974 wd 0.0500 time 0.4063 (0.4487) data time 0.0008 (0.0300) model time 0.4054 (0.4178) loss 7.6310 (7.6693) grad_norm 2.1698 (2.0437) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:15:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][260/625] eta 0:02:43 lr 0.000974 wd 0.0500 time 0.4005 (0.4469) data time 0.0006 (0.0289) model time 0.4000 (0.4170) loss 8.3896 (7.6719) grad_norm 3.1653 (2.0718) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:15:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][270/625] eta 0:02:40 lr 0.000974 wd 0.0500 time 1.7839 (0.4510) data time 0.0014 (0.0279) model time 1.7826 (0.4234) loss 8.3721 (7.6685) grad_norm 1.8485 (2.0899) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:15:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][280/625] eta 0:02:34 lr 0.000974 wd 0.0500 time 0.4046 (0.4492) data time 0.0009 (0.0269) model time 0.4038 (0.4224) loss 7.8886 (7.6637) grad_norm 1.4305 (2.0897) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:15:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][290/625] eta 0:02:29 lr 0.000974 wd 0.0500 time 0.3979 (0.4475) data time 0.0007 (0.0260) model time 0.3973 (0.4214) loss 7.3103 (7.6661) grad_norm 1.5482 (2.0822) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:15:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][300/625] eta 0:02:24 lr 0.000974 wd 0.0500 time 0.4000 (0.4460) data time 0.0008 (0.0252) model time 0.3991 (0.4206) loss 8.9186 (7.6763) grad_norm 1.8666 (2.0712) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:15:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][310/625] eta 0:02:20 lr 0.000974 wd 0.0500 time 0.3971 (0.4447) data time 0.0007 (0.0244) model time 0.3964 (0.4199) loss 7.9816 
(7.6830) grad_norm 1.7651 (2.0691) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:15:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][320/625] eta 0:02:15 lr 0.000974 wd 0.0500 time 0.3998 (0.4434) data time 0.0009 (0.0237) model time 0.3989 (0.4193) loss 6.6711 (7.6784) grad_norm 2.4052 (2.0617) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:15:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][330/625] eta 0:02:10 lr 0.000974 wd 0.0500 time 0.3966 (0.4420) data time 0.0007 (0.0230) model time 0.3959 (0.4185) loss 8.8384 (7.6824) grad_norm 2.8986 (2.0647) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:15:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][340/625] eta 0:02:05 lr 0.000974 wd 0.0500 time 0.3932 (0.4411) data time 0.0008 (0.0223) model time 0.3924 (0.4181) loss 6.6893 (7.6861) grad_norm 2.4212 (2.0655) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:15:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][350/625] eta 0:02:00 lr 0.000973 wd 0.0500 time 0.4034 (0.4399) data time 0.0007 (0.0217) model time 0.4027 (0.4175) loss 7.8622 (7.6833) grad_norm 1.7047 (2.0691) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:16:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][360/625] eta 0:01:56 lr 0.000973 wd 0.0500 time 0.4279 (0.4389) data time 0.0010 (0.0212) model time 0.4269 (0.4171) loss 8.7002 (7.6801) grad_norm 2.0712 (2.0640) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:16:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][370/625] eta 0:01:53 lr 0.000973 wd 0.0500 time 0.5720 (0.4460) data time 0.0006 (0.0206) model time 0.5713 (0.4260) loss 8.0751 (7.6722) grad_norm 1.6550 (2.0593) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:16:14 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [100/300][380/625] eta 0:01:49 lr 0.000973 wd 0.0500 time 0.4011 (0.4468) data time 0.0008 (0.0201) model time 0.4004 (0.4275) loss 7.4388 (7.6720) grad_norm 1.8217 (2.0604) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:16:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][390/625] eta 0:01:44 lr 0.000973 wd 0.0500 time 0.4158 (0.4457) data time 0.0009 (0.0196) model time 0.4149 (0.4267) loss 8.3081 (7.6713) grad_norm 2.3398 (2.0614) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:16:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][400/625] eta 0:01:40 lr 0.000973 wd 0.0500 time 0.4138 (0.4447) data time 0.0008 (0.0191) model time 0.4131 (0.4261) loss 6.9661 (7.6596) grad_norm 3.1431 (2.0619) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:16:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][410/625] eta 0:01:35 lr 0.000973 wd 0.0500 time 0.3939 (0.4437) data time 0.0008 (0.0187) model time 0.3931 (0.4254) loss 8.6559 (7.6650) grad_norm 2.3344 (2.0755) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:16:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][420/625] eta 0:01:30 lr 0.000973 wd 0.0500 time 0.4018 (0.4427) data time 0.0006 (0.0183) model time 0.4012 (0.4247) loss 7.0171 (7.6631) grad_norm 7.2434 (2.1226) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:16:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][430/625] eta 0:01:26 lr 0.000973 wd 0.0500 time 0.4006 (0.4417) data time 0.0008 (0.0179) model time 0.3998 (0.4241) loss 7.2650 (7.6691) grad_norm 3.3892 (2.1424) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:16:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][440/625] eta 0:01:21 lr 0.000973 wd 0.0500 time 0.3973 (0.4410) data time 0.0009 (0.0175) model 
time 0.3964 (0.4236) loss 7.6825 (7.6744) grad_norm 2.4006 (2.1492) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:16:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][450/625] eta 0:01:17 lr 0.000973 wd 0.0500 time 0.4053 (0.4401) data time 0.0007 (0.0171) model time 0.4046 (0.4231) loss 7.0516 (7.6868) grad_norm 1.4961 (2.1393) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:16:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][460/625] eta 0:01:12 lr 0.000973 wd 0.0500 time 0.3962 (0.4393) data time 0.0007 (0.0168) model time 0.3955 (0.4226) loss 7.6031 (7.6853) grad_norm 2.1194 (2.1415) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:16:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][470/625] eta 0:01:07 lr 0.000972 wd 0.0500 time 0.4004 (0.4384) data time 0.0006 (0.0164) model time 0.3998 (0.4220) loss 7.2941 (7.6902) grad_norm 2.0585 (2.1430) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:16:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][480/625] eta 0:01:03 lr 0.000972 wd 0.0500 time 0.3987 (0.4376) data time 0.0008 (0.0161) model time 0.3979 (0.4214) loss 8.7089 (7.6907) grad_norm 2.2686 (2.1450) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:16:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][490/625] eta 0:00:59 lr 0.000972 wd 0.0500 time 0.4050 (0.4387) data time 0.0008 (0.0158) model time 0.4042 (0.4230) loss 8.2323 (7.6907) grad_norm 1.7913 (2.1394) loss_scale 4096.0000 (2081.3686) mem 14939MB [2024-07-24 23:17:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][500/625] eta 0:00:54 lr 0.000972 wd 0.0500 time 0.3978 (0.4381) data time 0.0007 (0.0155) model time 0.3971 (0.4226) loss 8.5179 (7.6930) grad_norm 1.8202 (2.1353) loss_scale 4096.0000 (2121.5808) mem 14939MB [2024-07-24 23:17:07 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][510/625] eta 0:00:50 lr 0.000972 wd 0.0500 time 0.4003 (0.4374) data time 0.0008 (0.0152) model time 0.3995 (0.4222) loss 8.1035 (7.6907) grad_norm 2.9511 (2.1345) loss_scale 4096.0000 (2160.2192) mem 14939MB [2024-07-24 23:17:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][520/625] eta 0:00:45 lr 0.000972 wd 0.0500 time 0.4039 (0.4367) data time 0.0008 (0.0150) model time 0.4031 (0.4217) loss 8.2695 (7.6839) grad_norm 2.0348 (2.1335) loss_scale 4096.0000 (2197.3743) mem 14939MB [2024-07-24 23:17:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][530/625] eta 0:00:41 lr 0.000972 wd 0.0500 time 0.4014 (0.4360) data time 0.0008 (0.0147) model time 0.4006 (0.4213) loss 7.4747 (7.6883) grad_norm 1.4942 (2.1302) loss_scale 4096.0000 (2233.1299) mem 14939MB [2024-07-24 23:17:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][540/625] eta 0:00:37 lr 0.000972 wd 0.0500 time 0.4172 (0.4354) data time 0.0008 (0.0144) model time 0.4164 (0.4209) loss 7.0389 (7.6878) grad_norm 1.5739 (2.1314) loss_scale 4096.0000 (2267.5638) mem 14939MB [2024-07-24 23:17:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][550/625] eta 0:00:32 lr 0.000972 wd 0.0500 time 0.4029 (0.4350) data time 0.0007 (0.0142) model time 0.4022 (0.4206) loss 7.6377 (7.6923) grad_norm 1.6437 (2.1366) loss_scale 4096.0000 (2300.7477) mem 14939MB [2024-07-24 23:17:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][560/625] eta 0:00:28 lr 0.000972 wd 0.0500 time 0.3984 (0.4344) data time 0.0008 (0.0140) model time 0.3977 (0.4203) loss 7.0116 (7.6934) grad_norm 1.8058 (2.1370) loss_scale 4096.0000 (2332.7487) mem 14939MB [2024-07-24 23:17:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][570/625] eta 0:00:23 lr 0.000972 wd 0.0500 time 0.3949 (0.4338) 
data time 0.0008 (0.0137) model time 0.3941 (0.4199) loss 8.5799 (7.6943) grad_norm 3.3739 (2.1384) loss_scale 4096.0000 (2363.6287) mem 14939MB [2024-07-24 23:17:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][580/625] eta 0:00:19 lr 0.000971 wd 0.0500 time 0.4008 (0.4333) data time 0.0008 (0.0135) model time 0.4000 (0.4195) loss 6.5045 (7.6858) grad_norm 2.0181 (2.1392) loss_scale 4096.0000 (2393.4458) mem 14939MB [2024-07-24 23:17:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][590/625] eta 0:00:15 lr 0.000971 wd 0.0500 time 0.3986 (0.4342) data time 0.0006 (0.0133) model time 0.3980 (0.4208) loss 7.6491 (7.6886) grad_norm 1.5130 (2.1334) loss_scale 4096.0000 (2422.2538) mem 14939MB [2024-07-24 23:17:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][600/625] eta 0:00:10 lr 0.000971 wd 0.0500 time 0.3980 (0.4353) data time 0.0007 (0.0131) model time 0.3973 (0.4222) loss 6.8735 (7.6843) grad_norm 1.7131 (2.1266) loss_scale 4096.0000 (2450.1032) mem 14939MB [2024-07-24 23:17:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][610/625] eta 0:00:06 lr 0.000971 wd 0.0500 time 0.4036 (0.4348) data time 0.0006 (0.0129) model time 0.4030 (0.4219) loss 5.6747 (7.6774) grad_norm 3.7558 (2.1258) loss_scale 4096.0000 (2477.0409) mem 14939MB [2024-07-24 23:17:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][620/625] eta 0:00:02 lr 0.000971 wd 0.0500 time 0.4051 (0.4342) data time 0.0004 (0.0127) model time 0.4047 (0.4215) loss 7.3711 (7.6791) grad_norm 2.4334 (2.1265) loss_scale 4096.0000 (2503.1111) mem 14939MB [2024-07-24 23:17:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 100 training takes 0:04:31 [2024-07-24 23:17:55 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-24 23:17:56 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 23:17:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.887 (0.887) Loss 0.6421 (0.6421) Acc@1 87.939 (87.939) Acc@5 98.047 (98.047) Mem 14939MB [2024-07-24 23:17:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.159) Loss 1.0430 (0.8020) Acc@1 77.832 (83.887) Acc@5 94.385 (96.950) Mem 14939MB [2024-07-24 23:17:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.124) Loss 1.1748 (0.9455) Acc@1 73.193 (80.294) Acc@5 93.262 (95.366) Mem 14939MB [2024-07-24 23:17:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.862 Acc@5 95.333 [2024-07-24 23:17:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.9% [2024-07-24 23:18:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.031 (1.031) Loss 0.6138 (0.6138) Acc@1 87.744 (87.744) Acc@5 98.193 (98.193) Mem 14939MB [2024-07-24 23:18:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.179) Loss 0.9937 (0.7583) Acc@1 78.564 (84.375) Acc@5 94.678 (97.110) Mem 14939MB [2024-07-24 23:18:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.135) Loss 1.1299 (0.8989) Acc@1 73.438 (80.699) Acc@5 93.701 (95.543) Mem 14939MB [2024-07-24 23:18:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.346 Acc@5 95.537 [2024-07-24 23:18:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.3% [2024-07-24 23:18:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.35% [2024-07-24 23:18:02 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 23:18:03 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 23:18:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][0/625] eta 0:10:28 lr 0.000971 wd 0.0500 time 1.0056 (1.0056) data time 0.6143 (0.6143) model time 0.0000 (0.0000) loss 7.8822 (7.8822) grad_norm 3.1999 (3.1999) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:18:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][10/625] eta 0:04:40 lr 0.000971 wd 0.0500 time 0.4017 (0.4567) data time 0.0006 (0.0567) model time 0.0000 (0.0000) loss 7.0682 (7.7561) grad_norm 2.1220 (2.6002) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:18:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][20/625] eta 0:04:24 lr 0.000971 wd 0.0500 time 0.3990 (0.4367) data time 0.0008 (0.0301) model time 0.0000 (0.0000) loss 8.6238 (7.9496) grad_norm 2.1851 (2.4945) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:18:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][30/625] eta 0:04:13 lr 0.000971 wd 0.0500 time 0.4198 (0.4260) data time 0.0008 (0.0207) model time 0.0000 (0.0000) loss 7.9906 (7.8054) grad_norm 1.5144 (2.3683) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:18:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][40/625] eta 0:04:05 lr 0.000971 wd 0.0500 time 0.4054 (0.4204) data time 0.0011 (0.0159) model time 0.0000 (0.0000) loss 7.7296 (7.7082) grad_norm 1.6059 (2.2600) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:18:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][50/625] eta 0:03:59 lr 0.000971 wd 0.0500 time 0.4048 (0.4169) data time 0.0007 (0.0130) 
model time 0.0000 (0.0000) loss 8.4038 (7.8437) grad_norm 2.8854 (2.2675) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:18:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][60/625] eta 0:03:54 lr 0.000971 wd 0.0500 time 0.3993 (0.4142) data time 0.0008 (0.0110) model time 0.3986 (0.3991) loss 8.7542 (7.8031) grad_norm 1.4196 (2.2470) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:18:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][70/625] eta 0:03:48 lr 0.000971 wd 0.0500 time 0.4019 (0.4125) data time 0.0007 (0.0096) model time 0.4012 (0.4003) loss 8.4976 (7.7794) grad_norm 1.5611 (2.2076) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:18:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][80/625] eta 0:03:44 lr 0.000970 wd 0.0500 time 0.4012 (0.4110) data time 0.0008 (0.0085) model time 0.4005 (0.4000) loss 6.2230 (7.7245) grad_norm 1.3114 (2.1702) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:18:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][90/625] eta 0:03:39 lr 0.000970 wd 0.0500 time 0.3977 (0.4098) data time 0.0008 (0.0077) model time 0.3969 (0.3999) loss 7.0635 (7.6715) grad_norm 2.1324 (2.1481) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:18:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][100/625] eta 0:03:46 lr 0.000970 wd 0.0500 time 0.8240 (0.4313) data time 0.0933 (0.0079) model time 0.7306 (0.4431) loss 8.4433 (7.6744) grad_norm 3.1669 (2.1226) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:18:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][110/625] eta 0:03:50 lr 0.000970 wd 0.0500 time 0.3977 (0.4471) data time 0.0006 (0.0259) model time 0.3970 (0.4358) loss 7.8344 (7.6557) grad_norm 1.6200 (2.1142) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:18:56 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][120/625] eta 0:03:43 lr 0.000970 wd 0.0500 time 0.4088 (0.4435) data time 0.0007 (0.0239) model time 0.4082 (0.4311) loss 6.4714 (7.6545) grad_norm 1.7778 (2.0764) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:19:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][130/625] eta 0:03:38 lr 0.000970 wd 0.0500 time 0.4014 (0.4406) data time 0.0007 (0.0221) model time 0.4007 (0.4278) loss 6.8484 (7.6374) grad_norm 1.7304 (2.0441) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:19:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][140/625] eta 0:03:32 lr 0.000970 wd 0.0500 time 0.4050 (0.4381) data time 0.0006 (0.0206) model time 0.4044 (0.4251) loss 7.2582 (7.6286) grad_norm 2.2131 (2.0323) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:19:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][150/625] eta 0:03:26 lr 0.000970 wd 0.0500 time 0.4068 (0.4355) data time 0.0008 (0.0193) model time 0.4059 (0.4225) loss 7.2814 (7.6229) grad_norm 3.2862 (2.0719) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:19:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][160/625] eta 0:03:24 lr 0.000970 wd 0.0500 time 0.7331 (0.4400) data time 0.0008 (0.0182) model time 0.7323 (0.4301) loss 6.1347 (7.6302) grad_norm 2.1025 (2.0760) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:19:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][170/625] eta 0:03:24 lr 0.000970 wd 0.0500 time 0.4023 (0.4494) data time 0.0007 (0.0261) model time 0.4016 (0.4315) loss 6.5524 (7.6129) grad_norm 1.7521 (2.0951) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:19:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][180/625] eta 0:03:19 lr 0.000970 wd 0.0500 time 0.4196 (0.4489) 
data time 0.0009 (0.0247) model time 0.4187 (0.4321) loss 7.0884 (7.6136) grad_norm 2.4584 (2.0899) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:19:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][190/625] eta 0:03:17 lr 0.000970 wd 0.0500 time 0.5979 (0.4539) data time 0.0008 (0.0234) model time 0.5971 (0.4402) loss 7.6081 (7.6290) grad_norm 1.9454 (2.0958) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:19:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][200/625] eta 0:03:12 lr 0.000969 wd 0.0500 time 0.3962 (0.4531) data time 0.0010 (0.0223) model time 0.3952 (0.4400) loss 6.3612 (7.5992) grad_norm 3.5792 (2.0959) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:19:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][210/625] eta 0:03:07 lr 0.000969 wd 0.0500 time 0.3954 (0.4507) data time 0.0009 (0.0213) model time 0.3945 (0.4374) loss 7.6021 (7.5971) grad_norm 3.8491 (2.0949) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:19:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][220/625] eta 0:03:01 lr 0.000969 wd 0.0500 time 0.4051 (0.4484) data time 0.0008 (0.0204) model time 0.4043 (0.4352) loss 7.1866 (7.5973) grad_norm 1.4071 (2.0998) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:19:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][230/625] eta 0:02:56 lr 0.000969 wd 0.0500 time 0.4034 (0.4464) data time 0.0008 (0.0195) model time 0.4026 (0.4333) loss 6.7318 (7.5982) grad_norm 1.8202 (2.0950) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:19:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][240/625] eta 0:02:51 lr 0.000969 wd 0.0500 time 0.3959 (0.4451) data time 0.0007 (0.0188) model time 0.3952 (0.4323) loss 6.9671 (7.5819) grad_norm 1.8709 (2.0963) loss_scale 4096.0000 (4096.0000) mem 14939MB 
[2024-07-24 23:19:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][250/625] eta 0:02:46 lr 0.000969 wd 0.0500 time 0.4031 (0.4433) data time 0.0009 (0.0181) model time 0.4022 (0.4307) loss 8.3719 (7.5977) grad_norm 1.3710 (2.0932) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:19:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][260/625] eta 0:02:41 lr 0.000969 wd 0.0500 time 0.4003 (0.4417) data time 0.0006 (0.0174) model time 0.3997 (0.4292) loss 6.6178 (7.5840) grad_norm 5.1875 (2.1285) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:20:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][270/625] eta 0:02:36 lr 0.000969 wd 0.0500 time 0.4012 (0.4402) data time 0.0009 (0.0168) model time 0.4003 (0.4279) loss 7.3299 (7.5731) grad_norm 2.2262 (2.1561) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:20:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][280/625] eta 0:02:31 lr 0.000969 wd 0.0500 time 0.3952 (0.4388) data time 0.0010 (0.0162) model time 0.3941 (0.4266) loss 9.3719 (7.5871) grad_norm 1.9731 (2.1545) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:20:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][290/625] eta 0:02:26 lr 0.000969 wd 0.0500 time 0.4003 (0.4375) data time 0.0009 (0.0157) model time 0.3994 (0.4256) loss 8.9314 (7.5886) grad_norm 3.3057 (2.1593) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:20:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][300/625] eta 0:02:21 lr 0.000969 wd 0.0500 time 0.4076 (0.4364) data time 0.0008 (0.0152) model time 0.4067 (0.4246) loss 8.0126 (7.5988) grad_norm 1.7016 (2.1536) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:20:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][310/625] eta 0:02:17 lr 0.000969 wd 0.0500 
time 0.4005 (0.4352) data time 0.0007 (0.0148) model time 0.3997 (0.4237) loss 7.8017 (7.5980) grad_norm 1.7719 (2.1490) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:20:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][320/625] eta 0:02:12 lr 0.000968 wd 0.0500 time 0.3972 (0.4343) data time 0.0007 (0.0143) model time 0.3966 (0.4230) loss 8.4646 (7.5920) grad_norm 2.0143 (2.1501) loss_scale 4096.0000 (4096.0000) mem 14939MB [2024-07-24 23:20:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][330/625] eta 0:02:07 lr 0.000968 wd 0.0500 time 0.3982 (0.4332) data time 0.0009 (0.0139) model time 0.3973 (0.4221) loss 7.5201 (7.5872) grad_norm 2.1743 (inf) loss_scale 2048.0000 (4071.2508) mem 14939MB [2024-07-24 23:20:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][340/625] eta 0:02:03 lr 0.000968 wd 0.0500 time 0.4011 (0.4323) data time 0.0008 (0.0135) model time 0.4004 (0.4214) loss 7.2572 (7.5926) grad_norm 1.5232 (inf) loss_scale 2048.0000 (4011.9179) mem 14939MB [2024-07-24 23:20:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][350/625] eta 0:01:58 lr 0.000968 wd 0.0500 time 0.3993 (0.4315) data time 0.0009 (0.0132) model time 0.3984 (0.4207) loss 7.9907 (7.6011) grad_norm 2.7016 (inf) loss_scale 2048.0000 (3955.9658) mem 14939MB [2024-07-24 23:20:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][360/625] eta 0:01:54 lr 0.000968 wd 0.0500 time 0.4010 (0.4307) data time 0.0007 (0.0128) model time 0.4003 (0.4201) loss 8.3480 (7.5986) grad_norm 1.8386 (inf) loss_scale 2048.0000 (3903.1136) mem 14939MB [2024-07-24 23:20:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][370/625] eta 0:01:49 lr 0.000968 wd 0.0500 time 0.3994 (0.4299) data time 0.0008 (0.0125) model time 0.3986 (0.4195) loss 7.1174 (7.6023) grad_norm 2.0695 (inf) loss_scale 2048.0000 (3853.1105) mem 
14939MB [2024-07-24 23:20:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][380/625] eta 0:01:45 lr 0.000968 wd 0.0500 time 0.4009 (0.4291) data time 0.0006 (0.0122) model time 0.4003 (0.4189) loss 6.5391 (7.6010) grad_norm 2.0337 (inf) loss_scale 2048.0000 (3805.7323) mem 14939MB [2024-07-24 23:20:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][390/625] eta 0:01:40 lr 0.000968 wd 0.0500 time 0.3992 (0.4284) data time 0.0009 (0.0119) model time 0.3983 (0.4184) loss 7.6962 (7.6086) grad_norm 1.8301 (inf) loss_scale 2048.0000 (3760.7775) mem 14939MB [2024-07-24 23:20:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][400/625] eta 0:01:36 lr 0.000968 wd 0.0500 time 0.5853 (0.4286) data time 0.0009 (0.0117) model time 0.5844 (0.4189) loss 8.4847 (7.6009) grad_norm 2.2161 (inf) loss_scale 2048.0000 (3718.0648) mem 14939MB [2024-07-24 23:21:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][410/625] eta 0:01:32 lr 0.000968 wd 0.0500 time 0.5958 (0.4302) data time 0.0012 (0.0114) model time 0.5946 (0.4210) loss 6.3594 (7.5897) grad_norm 1.5234 (inf) loss_scale 2048.0000 (3677.4307) mem 14939MB [2024-07-24 23:21:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][420/625] eta 0:01:28 lr 0.000968 wd 0.0500 time 0.3956 (0.4305) data time 0.0009 (0.0111) model time 0.3947 (0.4214) loss 8.6067 (7.5861) grad_norm 1.7284 (inf) loss_scale 2048.0000 (3638.7268) mem 14939MB [2024-07-24 23:21:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][430/625] eta 0:01:23 lr 0.000967 wd 0.0500 time 0.4037 (0.4298) data time 0.0011 (0.0109) model time 0.4026 (0.4209) loss 8.0272 (7.5781) grad_norm 2.1206 (inf) loss_scale 2048.0000 (3601.8190) mem 14939MB [2024-07-24 23:21:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][440/625] eta 0:01:19 lr 0.000967 wd 0.0500 time 0.4077 
(0.4292) data time 0.0009 (0.0107) model time 0.4069 (0.4204) loss 7.4093 (7.5736) grad_norm 3.9135 (inf) loss_scale 2048.0000 (3566.5850) mem 14939MB [2024-07-24 23:21:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][450/625] eta 0:01:15 lr 0.000967 wd 0.0500 time 0.3972 (0.4286) data time 0.0007 (0.0105) model time 0.3965 (0.4200) loss 8.6983 (7.5859) grad_norm 1.5262 (inf) loss_scale 2048.0000 (3532.9135) mem 14939MB [2024-07-24 23:21:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][460/625] eta 0:01:10 lr 0.000967 wd 0.0500 time 0.3823 (0.4284) data time 0.0008 (0.0103) model time 0.3815 (0.4199) loss 8.1717 (7.5818) grad_norm 2.0593 (inf) loss_scale 2048.0000 (3500.7028) mem 14939MB [2024-07-24 23:21:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][470/625] eta 0:01:06 lr 0.000967 wd 0.0500 time 0.4003 (0.4278) data time 0.0008 (0.0101) model time 0.3995 (0.4194) loss 7.1495 (7.5871) grad_norm 4.0427 (inf) loss_scale 2048.0000 (3469.8599) mem 14939MB [2024-07-24 23:21:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][480/625] eta 0:01:01 lr 0.000967 wd 0.0500 time 0.3982 (0.4273) data time 0.0006 (0.0099) model time 0.3976 (0.4190) loss 7.8109 (7.5972) grad_norm 1.8333 (inf) loss_scale 2048.0000 (3440.2994) mem 14939MB [2024-07-24 23:21:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][490/625] eta 0:00:57 lr 0.000967 wd 0.0500 time 0.3967 (0.4268) data time 0.0008 (0.0097) model time 0.3960 (0.4186) loss 8.7253 (7.6028) grad_norm 2.0257 (inf) loss_scale 2048.0000 (3411.9430) mem 14939MB [2024-07-24 23:21:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][500/625] eta 0:00:53 lr 0.000967 wd 0.0500 time 0.4004 (0.4263) data time 0.0007 (0.0095) model time 0.3998 (0.4182) loss 8.0732 (7.6077) grad_norm 2.0217 (inf) loss_scale 2048.0000 (3384.7186) mem 14939MB [2024-07-24 
23:21:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][510/625] eta 0:00:48 lr 0.000967 wd 0.0500 time 0.4010 (0.4258) data time 0.0006 (0.0094) model time 0.4004 (0.4178) loss 6.7789 (7.6130) grad_norm 1.6995 (inf) loss_scale 2048.0000 (3358.5597) mem 14939MB [2024-07-24 23:21:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][520/625] eta 0:00:44 lr 0.000967 wd 0.0500 time 0.3985 (0.4253) data time 0.0009 (0.0092) model time 0.3976 (0.4175) loss 8.4327 (7.6180) grad_norm 1.4695 (inf) loss_scale 2048.0000 (3333.4050) mem 14939MB [2024-07-24 23:21:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][530/625] eta 0:00:40 lr 0.000967 wd 0.0500 time 0.3998 (0.4249) data time 0.0007 (0.0090) model time 0.3991 (0.4171) loss 8.2889 (7.6218) grad_norm 3.0083 (inf) loss_scale 2048.0000 (3309.1977) mem 14939MB [2024-07-24 23:21:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][540/625] eta 0:00:36 lr 0.000967 wd 0.0500 time 0.3978 (0.4247) data time 0.0006 (0.0089) model time 0.3972 (0.4171) loss 7.7435 (7.6145) grad_norm 1.6845 (inf) loss_scale 2048.0000 (3285.8854) mem 14939MB [2024-07-24 23:21:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][550/625] eta 0:00:31 lr 0.000966 wd 0.0500 time 0.4036 (0.4243) data time 0.0007 (0.0087) model time 0.4029 (0.4167) loss 6.4778 (7.6186) grad_norm 1.3666 (inf) loss_scale 2048.0000 (3263.4192) mem 14939MB [2024-07-24 23:22:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][560/625] eta 0:00:27 lr 0.000966 wd 0.0500 time 0.4047 (0.4239) data time 0.0009 (0.0086) model time 0.4037 (0.4164) loss 6.2403 (7.6256) grad_norm 1.9966 (inf) loss_scale 2048.0000 (3241.7540) mem 14939MB [2024-07-24 23:22:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][570/625] eta 0:00:23 lr 0.000966 wd 0.0500 time 0.4005 (0.4236) data time 
0.0008 (0.0085) model time 0.3997 (0.4162) loss 8.3660 (7.6334) grad_norm 2.8805 (inf) loss_scale 2048.0000 (3220.8476) mem 14939MB [2024-07-24 23:22:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][580/625] eta 0:00:19 lr 0.000966 wd 0.0500 time 0.3994 (0.4232) data time 0.0008 (0.0083) model time 0.3986 (0.4159) loss 7.0504 (7.6379) grad_norm 1.5551 (inf) loss_scale 2048.0000 (3200.6609) mem 14939MB [2024-07-24 23:22:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][590/625] eta 0:00:14 lr 0.000966 wd 0.0500 time 0.4244 (0.4229) data time 0.0010 (0.0082) model time 0.4234 (0.4157) loss 6.8014 (7.6247) grad_norm 2.2251 (inf) loss_scale 2048.0000 (3181.1574) mem 14939MB [2024-07-24 23:22:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][600/625] eta 0:00:10 lr 0.000966 wd 0.0500 time 0.4054 (0.4225) data time 0.0008 (0.0081) model time 0.4046 (0.4154) loss 7.2308 (7.6166) grad_norm 2.3421 (inf) loss_scale 2048.0000 (3162.3028) mem 14939MB [2024-07-24 23:22:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][610/625] eta 0:00:06 lr 0.000966 wd 0.0500 time 0.4070 (0.4222) data time 0.0004 (0.0080) model time 0.4066 (0.4151) loss 9.2685 (7.6214) grad_norm 2.5710 (inf) loss_scale 2048.0000 (3144.0655) mem 14939MB [2024-07-24 23:22:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][620/625] eta 0:00:02 lr 0.000966 wd 0.0500 time 0.3978 (0.4219) data time 0.0004 (0.0079) model time 0.3974 (0.4150) loss 7.8179 (7.6249) grad_norm 1.6504 (inf) loss_scale 2048.0000 (3126.4155) mem 14939MB [2024-07-24 23:22:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 101 training takes 0:04:24 [2024-07-24 23:22:27 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-24 23:22:28 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 23:22:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 3.179 (3.179) Loss 0.6191 (0.6191) Acc@1 87.598 (87.598) Acc@5 98.047 (98.047) Mem 14939MB [2024-07-24 23:22:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.368) Loss 1.0391 (0.7800) Acc@1 76.953 (83.736) Acc@5 94.434 (96.871) Mem 14939MB [2024-07-24 23:22:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.233) Loss 1.1611 (0.9265) Acc@1 72.559 (79.980) Acc@5 92.334 (95.187) Mem 14939MB [2024-07-24 23:22:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.700 Acc@5 95.200 [2024-07-24 23:22:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.7% [2024-07-24 23:22:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.058 (1.058) Loss 0.6133 (0.6133) Acc@1 87.842 (87.842) Acc@5 98.193 (98.193) Mem 14939MB [2024-07-24 23:22:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.179) Loss 0.9927 (0.7569) Acc@1 78.760 (84.459) Acc@5 94.629 (97.119) Mem 14939MB [2024-07-24 23:22:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.134) Loss 1.1270 (0.8970) Acc@1 73.389 (80.801) Acc@5 93.750 (95.559) Mem 14939MB [2024-07-24 23:22:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.444 Acc@5 95.545 [2024-07-24 23:22:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.4% [2024-07-24 23:22:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.44% [2024-07-24 23:22:36 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 23:22:37 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 23:22:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][0/625] eta 0:42:52 lr 0.000966 wd 0.0500 time 4.1157 (4.1157) data time 3.7259 (3.7259) model time 0.0000 (0.0000) loss 8.7501 (8.7501) grad_norm 2.2832 (2.2832) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:22:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][10/625] eta 0:08:37 lr 0.000966 wd 0.0500 time 0.3986 (0.8413) data time 0.0009 (0.3395) model time 0.0000 (0.0000) loss 9.4806 (7.6050) grad_norm 1.5022 (1.8455) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:22:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][20/625] eta 0:06:27 lr 0.000966 wd 0.0500 time 0.3977 (0.6406) data time 0.0007 (0.1783) model time 0.0000 (0.0000) loss 6.6503 (7.4615) grad_norm 1.4073 (1.7641) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:22:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][30/625] eta 0:05:35 lr 0.000966 wd 0.0500 time 0.3955 (0.5635) data time 0.0010 (0.1211) model time 0.0000 (0.0000) loss 6.6123 (7.4140) grad_norm 1.8356 (1.9512) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:22:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][40/625] eta 0:05:06 lr 0.000965 wd 0.0500 time 0.3959 (0.5237) data time 0.0008 (0.0918) model time 0.0000 (0.0000) loss 9.0424 (7.5036) grad_norm 2.3318 (2.1166) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:23:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][50/625] eta 0:04:47 lr 0.000965 wd 0.0500 time 0.4002 (0.4992) data time 0.0009 (0.0740) 
model time 0.0000 (0.0000) loss 7.4878 (7.4871) grad_norm 3.7628 (2.2274) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:23:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][60/625] eta 0:04:41 lr 0.000965 wd 0.0500 time 0.3995 (0.4979) data time 0.0008 (0.0620) model time 0.3987 (0.4903) loss 7.4543 (7.4429) grad_norm 1.5706 (2.2163) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:23:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][70/625] eta 0:04:29 lr 0.000965 wd 0.0500 time 0.4106 (0.4847) data time 0.0006 (0.0534) model time 0.4100 (0.4466) loss 6.5956 (7.4711) grad_norm 1.3906 (2.1775) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:23:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][80/625] eta 0:04:18 lr 0.000965 wd 0.0500 time 0.4119 (0.4746) data time 0.0009 (0.0470) model time 0.4110 (0.4317) loss 7.7898 (7.4455) grad_norm 1.4889 (2.1375) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:23:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][90/625] eta 0:04:09 lr 0.000965 wd 0.0500 time 0.3988 (0.4666) data time 0.0010 (0.0419) model time 0.3979 (0.4239) loss 7.0618 (7.4831) grad_norm 2.1471 (2.1007) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:23:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][100/625] eta 0:04:01 lr 0.000965 wd 0.0500 time 0.3971 (0.4603) data time 0.0008 (0.0379) model time 0.3963 (0.4195) loss 7.4193 (7.4376) grad_norm 2.3605 (2.1272) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:23:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][110/625] eta 0:03:54 lr 0.000965 wd 0.0500 time 0.4186 (0.4550) data time 0.0008 (0.0345) model time 0.4178 (0.4163) loss 7.5358 (7.4739) grad_norm 2.0944 (2.1486) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:23:31 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][120/625] eta 0:03:47 lr 0.000965 wd 0.0500 time 0.4007 (0.4506) data time 0.0008 (0.0318) model time 0.3999 (0.4141) loss 7.9370 (7.4987) grad_norm 2.1176 (2.1599) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:23:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][130/625] eta 0:03:41 lr 0.000965 wd 0.0500 time 0.3999 (0.4473) data time 0.0007 (0.0294) model time 0.3992 (0.4131) loss 6.7154 (7.5293) grad_norm 1.5140 (2.1385) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:23:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][140/625] eta 0:03:35 lr 0.000965 wd 0.0500 time 0.4016 (0.4441) data time 0.0008 (0.0274) model time 0.4007 (0.4118) loss 8.4442 (7.5657) grad_norm 1.7685 (2.1387) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:23:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][150/625] eta 0:03:29 lr 0.000965 wd 0.0500 time 0.4017 (0.4412) data time 0.0011 (0.0257) model time 0.4006 (0.4106) loss 7.7398 (7.5710) grad_norm 2.0301 (2.1156) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:23:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][160/625] eta 0:03:24 lr 0.000964 wd 0.0500 time 0.4073 (0.4388) data time 0.0008 (0.0241) model time 0.4065 (0.4097) loss 7.6493 (7.5944) grad_norm 1.9251 (2.1077) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:23:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][170/625] eta 0:03:18 lr 0.000964 wd 0.0500 time 0.4084 (0.4365) data time 0.0009 (0.0228) model time 0.4075 (0.4089) loss 7.8503 (7.5979) grad_norm 1.7650 (2.1059) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:23:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][180/625] eta 0:03:13 lr 0.000964 wd 0.0500 time 0.4071 (0.4347) 
data time 0.0008 (0.0216) model time 0.4063 (0.4084) loss 8.3020 (7.5832) grad_norm 1.9464 (2.1212) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:24:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][190/625] eta 0:03:08 lr 0.000964 wd 0.0500 time 0.3999 (0.4329) data time 0.0008 (0.0205) model time 0.3990 (0.4077) loss 7.9767 (7.5948) grad_norm 2.9125 (2.1113) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:24:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][200/625] eta 0:03:03 lr 0.000964 wd 0.0500 time 0.4015 (0.4314) data time 0.0008 (0.0195) model time 0.4007 (0.4073) loss 6.6526 (7.6031) grad_norm 3.1777 (2.1228) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:24:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][210/625] eta 0:02:58 lr 0.000964 wd 0.0500 time 0.4032 (0.4300) data time 0.0006 (0.0186) model time 0.4025 (0.4069) loss 6.5763 (7.6009) grad_norm 1.5089 (2.1113) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:24:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][220/625] eta 0:02:54 lr 0.000964 wd 0.0500 time 0.4004 (0.4307) data time 0.0006 (0.0178) model time 0.3998 (0.4092) loss 7.5327 (7.6017) grad_norm 1.5900 (2.0943) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:24:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][230/625] eta 0:02:51 lr 0.000964 wd 0.0500 time 0.3788 (0.4334) data time 0.0008 (0.0171) model time 0.3780 (0.4137) loss 7.4167 (7.5868) grad_norm 2.5055 (2.0920) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:24:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][240/625] eta 0:02:46 lr 0.000964 wd 0.0500 time 0.3972 (0.4327) data time 0.0008 (0.0164) model time 0.3964 (0.4138) loss 7.8933 (7.5835) grad_norm 2.6527 (2.0873) loss_scale 2048.0000 (2048.0000) mem 14939MB 
[2024-07-24 23:24:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][250/625] eta 0:02:42 lr 0.000964 wd 0.0500 time 0.4020 (0.4327) data time 0.0008 (0.0158) model time 0.4012 (0.4147) loss 7.2381 (7.5850) grad_norm 1.7709 (2.0811) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:24:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][260/625] eta 0:02:37 lr 0.000964 wd 0.0500 time 0.4055 (0.4315) data time 0.0006 (0.0153) model time 0.4048 (0.4140) loss 6.6270 (7.5925) grad_norm 1.9286 (2.0821) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:24:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][270/625] eta 0:02:32 lr 0.000964 wd 0.0500 time 0.4011 (0.4304) data time 0.0006 (0.0147) model time 0.4005 (0.4134) loss 6.2303 (7.5871) grad_norm 4.8877 (2.1157) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:24:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][280/625] eta 0:02:28 lr 0.000963 wd 0.0500 time 0.3955 (0.4294) data time 0.0008 (0.0142) model time 0.3947 (0.4130) loss 5.8595 (7.5856) grad_norm 1.8629 (2.1238) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:24:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][290/625] eta 0:02:23 lr 0.000963 wd 0.0500 time 0.4010 (0.4285) data time 0.0008 (0.0138) model time 0.4001 (0.4125) loss 7.9441 (7.5908) grad_norm 1.4687 (2.1206) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:24:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][300/625] eta 0:02:18 lr 0.000963 wd 0.0500 time 0.3986 (0.4276) data time 0.0007 (0.0134) model time 0.3979 (0.4120) loss 7.2066 (7.5877) grad_norm 2.1010 (2.1216) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:24:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][310/625] eta 0:02:14 lr 0.000963 wd 0.0500 
time 0.4040 (0.4268) data time 0.0006 (0.0130) model time 0.4034 (0.4116) loss 7.0591 (7.5794) grad_norm 2.2678 (2.1275) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:24:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][320/625] eta 0:02:09 lr 0.000963 wd 0.0500 time 0.4019 (0.4262) data time 0.0009 (0.0126) model time 0.4009 (0.4114) loss 8.0485 (7.5784) grad_norm 1.5800 (2.1409) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:24:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][330/625] eta 0:02:05 lr 0.000963 wd 0.0500 time 0.4141 (0.4254) data time 0.0007 (0.0122) model time 0.4135 (0.4109) loss 6.5246 (7.5733) grad_norm 1.7412 (2.1338) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:25:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][340/625] eta 0:02:01 lr 0.000963 wd 0.0500 time 0.3958 (0.4246) data time 0.0010 (0.0119) model time 0.3948 (0.4105) loss 7.8555 (7.5794) grad_norm 2.0290 (2.1233) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:25:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][350/625] eta 0:01:56 lr 0.000963 wd 0.0500 time 0.3995 (0.4240) data time 0.0008 (0.0116) model time 0.3986 (0.4102) loss 6.3269 (7.5770) grad_norm 3.1028 (2.1261) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:25:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][360/625] eta 0:01:52 lr 0.000963 wd 0.0500 time 0.3953 (0.4233) data time 0.0008 (0.0113) model time 0.3945 (0.4098) loss 7.3075 (7.5654) grad_norm 2.4933 (2.1250) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:25:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][370/625] eta 0:01:50 lr 0.000963 wd 0.0500 time 0.3977 (0.4321) data time 0.0008 (0.0110) model time 0.3969 (0.4204) loss 7.0301 (7.5686) grad_norm 1.8189 (2.1156) loss_scale 2048.0000 
(2048.0000) mem 14939MB [2024-07-24 23:25:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][380/625] eta 0:01:45 lr 0.000963 wd 0.0500 time 0.4056 (0.4313) data time 0.0009 (0.0108) model time 0.4047 (0.4198) loss 8.4743 (7.5703) grad_norm 1.5937 (2.1163) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:25:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][390/625] eta 0:01:41 lr 0.000963 wd 0.0500 time 0.3952 (0.4304) data time 0.0007 (0.0105) model time 0.3945 (0.4191) loss 7.8465 (7.5767) grad_norm 2.6340 (2.1142) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:25:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][400/625] eta 0:01:36 lr 0.000962 wd 0.0500 time 0.3928 (0.4297) data time 0.0007 (0.0103) model time 0.3921 (0.4186) loss 7.2863 (7.5781) grad_norm 1.3426 (2.1174) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:25:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][410/625] eta 0:01:32 lr 0.000962 wd 0.0500 time 0.4055 (0.4290) data time 0.0010 (0.0100) model time 0.4046 (0.4181) loss 8.4406 (7.5793) grad_norm 3.0550 (2.1178) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:25:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][420/625] eta 0:01:27 lr 0.000962 wd 0.0500 time 0.3963 (0.4283) data time 0.0009 (0.0098) model time 0.3954 (0.4175) loss 7.0668 (7.5727) grad_norm 1.9578 (2.1183) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:25:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][430/625] eta 0:01:23 lr 0.000962 wd 0.0500 time 0.4025 (0.4277) data time 0.0008 (0.0096) model time 0.4017 (0.4171) loss 6.5965 (7.5775) grad_norm 2.8920 (2.1126) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:25:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][440/625] eta 0:01:19 
lr 0.000962 wd 0.0500 time 0.5917 (0.4283) data time 0.0006 (0.0094) model time 0.5910 (0.4180) loss 7.7094 (7.5813) grad_norm 2.0853 (2.1137) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:25:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][450/625] eta 0:01:15 lr 0.000962 wd 0.0500 time 0.5990 (0.4308) data time 0.0009 (0.0092) model time 0.5982 (0.4211) loss 7.7381 (7.5741) grad_norm 1.7262 (2.1179) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:25:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][460/625] eta 0:01:11 lr 0.000962 wd 0.0500 time 0.4024 (0.4304) data time 0.0010 (0.0091) model time 0.4015 (0.4208) loss 6.9391 (7.5774) grad_norm 1.4288 (2.1107) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:25:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][470/625] eta 0:01:06 lr 0.000962 wd 0.0500 time 0.4038 (0.4298) data time 0.0006 (0.0089) model time 0.4032 (0.4204) loss 7.8076 (7.5809) grad_norm 1.7969 (2.1063) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:26:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][480/625] eta 0:01:02 lr 0.000962 wd 0.0500 time 0.4005 (0.4292) data time 0.0009 (0.0087) model time 0.3996 (0.4199) loss 8.6305 (7.5938) grad_norm 2.9772 (2.1089) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:26:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][490/625] eta 0:00:57 lr 0.000962 wd 0.0500 time 0.4016 (0.4286) data time 0.0008 (0.0086) model time 0.4008 (0.4195) loss 8.3710 (7.5982) grad_norm 1.9897 (2.1065) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:26:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][500/625] eta 0:00:53 lr 0.000962 wd 0.0500 time 0.3980 (0.4281) data time 0.0007 (0.0084) model time 0.3973 (0.4191) loss 6.5224 (7.6042) grad_norm 1.9824 (2.1025) 
loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:26:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][510/625] eta 0:00:49 lr 0.000961 wd 0.0500 time 0.3974 (0.4276) data time 0.0010 (0.0083) model time 0.3964 (0.4186) loss 8.0544 (7.5975) grad_norm 1.9530 (2.0958) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:26:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][520/625] eta 0:00:44 lr 0.000961 wd 0.0500 time 0.3995 (0.4271) data time 0.0008 (0.0081) model time 0.3987 (0.4183) loss 7.1301 (7.6096) grad_norm 3.2299 (2.0998) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:26:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][530/625] eta 0:00:40 lr 0.000961 wd 0.0500 time 0.4039 (0.4266) data time 0.0009 (0.0080) model time 0.4031 (0.4179) loss 7.3252 (7.6121) grad_norm 1.6546 (2.0954) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:26:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][540/625] eta 0:00:36 lr 0.000961 wd 0.0500 time 0.3963 (0.4261) data time 0.0010 (0.0079) model time 0.3954 (0.4176) loss 9.7822 (7.6183) grad_norm 1.6576 (2.0957) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:26:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][550/625] eta 0:00:31 lr 0.000961 wd 0.0500 time 0.4004 (0.4257) data time 0.0006 (0.0077) model time 0.3998 (0.4172) loss 7.3640 (7.6205) grad_norm 2.1939 (2.1053) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:26:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][560/625] eta 0:00:27 lr 0.000961 wd 0.0500 time 0.4069 (0.4262) data time 0.0011 (0.0076) model time 0.4059 (0.4179) loss 7.5344 (7.6197) grad_norm 1.7139 (2.1106) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:26:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[102/300][570/625] eta 0:00:23 lr 0.000961 wd 0.0500 time 0.4043 (0.4258) data time 0.0012 (0.0075) model time 0.4031 (0.4176) loss 7.2884 (7.6196) grad_norm 3.1178 (2.1197) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:26:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][580/625] eta 0:00:19 lr 0.000961 wd 0.0500 time 0.3945 (0.4254) data time 0.0008 (0.0074) model time 0.3937 (0.4173) loss 8.8571 (7.6230) grad_norm 4.2871 (2.1212) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:26:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][590/625] eta 0:00:14 lr 0.000961 wd 0.0500 time 0.4048 (0.4249) data time 0.0007 (0.0073) model time 0.4041 (0.4169) loss 7.4861 (7.6227) grad_norm 2.5315 (2.1250) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:26:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][600/625] eta 0:00:10 lr 0.000961 wd 0.0500 time 0.3973 (0.4245) data time 0.0009 (0.0072) model time 0.3965 (0.4166) loss 6.7137 (7.6245) grad_norm 1.8721 (2.1309) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:26:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][610/625] eta 0:00:06 lr 0.000961 wd 0.0500 time 0.4004 (0.4242) data time 0.0006 (0.0071) model time 0.3998 (0.4164) loss 7.6741 (7.6274) grad_norm 1.6514 (2.1295) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:27:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][620/625] eta 0:00:02 lr 0.000961 wd 0.0500 time 0.3970 (0.4238) data time 0.0004 (0.0070) model time 0.3966 (0.4161) loss 6.5351 (7.6242) grad_norm 2.3675 (2.1272) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:27:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 102 training takes 0:04:24 [2024-07-24 23:27:02 vssd_mesa_retrain_small_e300] (utils.py 118): INFO 
./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 23:27:03 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 23:27:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.592 (0.592) Loss 0.6206 (0.6206) Acc@1 87.598 (87.598) Acc@5 98.096 (98.096) Mem 14939MB [2024-07-24 23:27:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.132) Loss 1.0176 (0.7694) Acc@1 76.953 (83.802) Acc@5 94.043 (96.950) Mem 14939MB [2024-07-24 23:27:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.110) Loss 1.1348 (0.9199) Acc@1 73.779 (80.046) Acc@5 93.164 (95.222) Mem 14939MB [2024-07-24 23:27:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.655 Acc@5 95.206 [2024-07-24 23:27:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.7% [2024-07-24 23:27:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 6.914 (6.914) Loss 0.6123 (0.6123) Acc@1 87.842 (87.842) Acc@5 98.193 (98.193) Mem 14939MB [2024-07-24 23:27:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.709) Loss 0.9917 (0.7557) Acc@1 79.102 (84.504) Acc@5 94.531 (97.132) Mem 14939MB [2024-07-24 23:27:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.412) Loss 1.1240 (0.8954) Acc@1 73.535 (80.850) Acc@5 93.896 (95.578) Mem 14939MB [2024-07-24 23:27:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.502 Acc@5 95.563 [2024-07-24 23:27:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.5% [2024-07-24 23:27:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO 
New max accuracy ema: 80.50% [2024-07-24 23:27:14 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 23:27:15 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 23:27:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][0/625] eta 1:15:34 lr 0.000961 wd 0.0500 time 7.2554 (7.2554) data time 0.3523 (0.3523) model time 0.0000 (0.0000) loss 7.9241 (7.9241) grad_norm 2.7991 (2.7991) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:27:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][10/625] eta 0:10:29 lr 0.000960 wd 0.0500 time 0.3995 (1.0232) data time 0.0006 (0.0328) model time 0.0000 (0.0000) loss 8.3610 (7.8354) grad_norm 5.3228 (2.5175) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:27:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][20/625] eta 0:07:19 lr 0.000960 wd 0.0500 time 0.4073 (0.7266) data time 0.0008 (0.0176) model time 0.0000 (0.0000) loss 8.3826 (7.7587) grad_norm 1.6185 (2.8682) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:27:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][30/625] eta 0:06:09 lr 0.000960 wd 0.0500 time 0.3963 (0.6214) data time 0.0009 (0.0123) model time 0.0000 (0.0000) loss 7.3386 (7.7766) grad_norm 3.0084 (2.6899) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:27:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][40/625] eta 0:05:43 lr 0.000960 wd 0.0500 time 0.3999 (0.5870) data time 0.0008 (0.0095) model time 0.0000 (0.0000) loss 8.3220 (7.8232) grad_norm 1.7496 (2.5441) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:27:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[103/300][50/625] eta 0:05:26 lr 0.000960 wd 0.0500 time 0.5559 (0.5678) data time 0.0008 (0.0078) model time 0.0000 (0.0000) loss 8.5099 (7.7580) grad_norm 2.3827 (2.4315) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:27:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][60/625] eta 0:05:05 lr 0.000960 wd 0.0500 time 0.3972 (0.5404) data time 0.0008 (0.0067) model time 0.3964 (0.3997) loss 7.2215 (7.6771) grad_norm 2.6373 (2.4855) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:27:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][70/625] eta 0:04:49 lr 0.000960 wd 0.0500 time 0.3928 (0.5209) data time 0.0010 (0.0059) model time 0.3919 (0.4002) loss 8.9788 (7.7067) grad_norm 3.1769 (2.4626) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:27:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][80/625] eta 0:04:35 lr 0.000960 wd 0.0500 time 0.4005 (0.5060) data time 0.0006 (0.0053) model time 0.3998 (0.3999) loss 8.9595 (7.6923) grad_norm 1.9193 (2.3877) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:28:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][90/625] eta 0:04:24 lr 0.000960 wd 0.0500 time 0.4005 (0.4944) data time 0.0008 (0.0048) model time 0.3997 (0.3999) loss 7.4413 (7.7209) grad_norm 1.5376 (2.3508) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:28:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][100/625] eta 0:04:14 lr 0.000960 wd 0.0500 time 0.4030 (0.4855) data time 0.0008 (0.0044) model time 0.4022 (0.4006) loss 7.7623 (7.7149) grad_norm 3.0479 (2.3208) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:28:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][110/625] eta 0:04:06 lr 0.000960 wd 0.0500 time 0.3967 (0.4780) data time 0.0008 (0.0041) model time 0.3959 (0.4007) loss 7.9418 (7.6786) 
grad_norm 3.3378 (2.3255) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:28:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][120/625] eta 0:03:58 lr 0.000959 wd 0.0500 time 0.4026 (0.4716) data time 0.0007 (0.0039) model time 0.4019 (0.4006) loss 7.8690 (7.6723) grad_norm 2.1656 (2.3068) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:28:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][130/625] eta 0:03:50 lr 0.000959 wd 0.0500 time 0.3947 (0.4662) data time 0.0008 (0.0036) model time 0.3939 (0.4004) loss 7.0935 (7.6677) grad_norm 2.6058 (2.2908) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:28:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][140/625] eta 0:03:43 lr 0.000959 wd 0.0500 time 0.3966 (0.4615) data time 0.0007 (0.0035) model time 0.3958 (0.4003) loss 6.9021 (7.6989) grad_norm 2.3080 (2.2946) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:28:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][150/625] eta 0:03:37 lr 0.000959 wd 0.0500 time 0.3968 (0.4576) data time 0.0008 (0.0033) model time 0.3960 (0.4005) loss 7.5264 (7.6564) grad_norm 1.5887 (2.2808) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:28:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][160/625] eta 0:03:31 lr 0.000959 wd 0.0500 time 0.4014 (0.4541) data time 0.0009 (0.0031) model time 0.4006 (0.4004) loss 6.9906 (7.6571) grad_norm 1.6053 (2.2883) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:28:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][170/625] eta 0:03:25 lr 0.000959 wd 0.0500 time 0.4015 (0.4509) data time 0.0009 (0.0030) model time 0.4006 (0.4003) loss 8.5093 (7.6591) grad_norm 1.6879 (2.2886) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:28:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 
367): INFO Train: [103/300][180/625] eta 0:03:19 lr 0.000959 wd 0.0500 time 0.3945 (0.4482) data time 0.0009 (0.0029) model time 0.3937 (0.4004) loss 7.8569 (7.6658) grad_norm 1.4059 (2.2533) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:28:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][190/625] eta 0:03:13 lr 0.000959 wd 0.0500 time 0.3941 (0.4457) data time 0.0007 (0.0028) model time 0.3935 (0.4003) loss 6.5111 (7.6656) grad_norm 1.5902 (2.2286) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:28:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][200/625] eta 0:03:11 lr 0.000959 wd 0.0500 time 0.4010 (0.4500) data time 0.0006 (0.0027) model time 0.4004 (0.4090) loss 8.6203 (7.6789) grad_norm 1.7377 (2.2196) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:28:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][210/625] eta 0:03:05 lr 0.000959 wd 0.0500 time 0.4241 (0.4477) data time 0.0008 (0.0026) model time 0.4234 (0.4085) loss 8.3578 (7.6781) grad_norm 2.1680 (2.2120) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:28:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][220/625] eta 0:03:00 lr 0.000959 wd 0.0500 time 0.4052 (0.4464) data time 0.0006 (0.0025) model time 0.4046 (0.4091) loss 7.4117 (7.6601) grad_norm 3.6119 (2.2178) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:28:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][230/625] eta 0:02:55 lr 0.000959 wd 0.0500 time 0.3976 (0.4444) data time 0.0006 (0.0025) model time 0.3970 (0.4085) loss 8.3514 (7.6622) grad_norm 2.3983 (2.2076) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:29:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][240/625] eta 0:02:50 lr 0.000958 wd 0.0500 time 0.3983 (0.4426) data time 0.0009 (0.0024) model time 0.3974 (0.4080) 
loss 7.6803 (7.6320) grad_norm 1.4859 (2.2070) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:29:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][250/625] eta 0:02:45 lr 0.000958 wd 0.0500 time 0.4023 (0.4410) data time 0.0008 (0.0023) model time 0.4014 (0.4077) loss 8.2545 (7.6413) grad_norm 1.7337 (2.2022) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:29:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][260/625] eta 0:02:41 lr 0.000958 wd 0.0500 time 0.3890 (0.4419) data time 0.0008 (0.0023) model time 0.3882 (0.4104) loss 6.7817 (7.6319) grad_norm 1.8040 (2.2104) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:29:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][270/625] eta 0:02:37 lr 0.000958 wd 0.0500 time 0.4068 (0.4437) data time 0.0006 (0.0022) model time 0.4062 (0.4140) loss 6.0762 (7.6228) grad_norm 3.0603 (2.2053) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:29:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][280/625] eta 0:02:32 lr 0.000958 wd 0.0500 time 0.3992 (0.4422) data time 0.0007 (0.0022) model time 0.3985 (0.4134) loss 8.4996 (7.6196) grad_norm 2.4533 (2.2069) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:29:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][290/625] eta 0:02:27 lr 0.000958 wd 0.0500 time 0.4011 (0.4408) data time 0.0006 (0.0022) model time 0.4004 (0.4128) loss 5.7739 (7.6099) grad_norm 1.3481 (2.1912) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:29:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][300/625] eta 0:02:22 lr 0.000958 wd 0.0500 time 0.4037 (0.4394) data time 0.0008 (0.0021) model time 0.4029 (0.4123) loss 7.9778 (7.6137) grad_norm 2.4992 (2.1978) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:29:32 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [103/300][310/625] eta 0:02:18 lr 0.000958 wd 0.0500 time 0.4009 (0.4384) data time 0.0007 (0.0021) model time 0.4003 (0.4121) loss 8.9764 (7.6275) grad_norm 2.9876 (2.2100) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:29:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][320/625] eta 0:02:13 lr 0.000958 wd 0.0500 time 0.4023 (0.4372) data time 0.0007 (0.0020) model time 0.4016 (0.4116) loss 6.7836 (7.6230) grad_norm 1.2900 (2.2011) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:29:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][330/625] eta 0:02:08 lr 0.000958 wd 0.0500 time 0.4058 (0.4360) data time 0.0008 (0.0020) model time 0.4050 (0.4111) loss 7.4281 (7.6139) grad_norm 1.3429 (2.1905) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:29:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][340/625] eta 0:02:03 lr 0.000958 wd 0.0500 time 0.4013 (0.4350) data time 0.0009 (0.0020) model time 0.4004 (0.4107) loss 8.4338 (7.6236) grad_norm 1.4371 (2.1738) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:29:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][350/625] eta 0:01:59 lr 0.000958 wd 0.0500 time 0.3941 (0.4340) data time 0.0009 (0.0019) model time 0.3932 (0.4103) loss 6.6571 (7.6214) grad_norm 1.4941 (2.1660) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:29:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][360/625] eta 0:01:54 lr 0.000957 wd 0.0500 time 0.3984 (0.4331) data time 0.0007 (0.0019) model time 0.3977 (0.4100) loss 8.6658 (7.6198) grad_norm 2.5177 (2.1659) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:29:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][370/625] eta 0:01:50 lr 0.000957 wd 0.0500 time 0.4039 (0.4322) data time 0.0006 (0.0019) model 
time 0.4032 (0.4097) loss 6.7901 (7.6047) grad_norm 1.6600 (2.1838) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:30:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][380/625] eta 0:01:45 lr 0.000957 wd 0.0500 time 0.4023 (0.4314) data time 0.0007 (0.0019) model time 0.4016 (0.4094) loss 8.4596 (7.6063) grad_norm 1.3826 (2.1798) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:30:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][390/625] eta 0:01:41 lr 0.000957 wd 0.0500 time 0.4118 (0.4307) data time 0.0009 (0.0018) model time 0.4109 (0.4092) loss 8.4769 (7.6089) grad_norm 3.3398 (2.1790) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:30:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][400/625] eta 0:01:36 lr 0.000957 wd 0.0500 time 0.4007 (0.4300) data time 0.0007 (0.0018) model time 0.4000 (0.4090) loss 6.9059 (7.5888) grad_norm 2.1034 (2.1816) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:30:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][410/625] eta 0:01:32 lr 0.000957 wd 0.0500 time 0.4011 (0.4293) data time 0.0007 (0.0018) model time 0.4004 (0.4087) loss 6.0830 (7.5845) grad_norm 1.4476 (2.1805) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:30:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][420/625] eta 0:01:27 lr 0.000957 wd 0.0500 time 0.4020 (0.4286) data time 0.0007 (0.0018) model time 0.4013 (0.4085) loss 7.9404 (7.5888) grad_norm 1.3668 (2.1768) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:30:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][430/625] eta 0:01:23 lr 0.000957 wd 0.0500 time 0.4037 (0.4280) data time 0.0007 (0.0018) model time 0.4030 (0.4083) loss 7.6842 (7.5794) grad_norm 2.4884 (2.1737) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:30:24 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][440/625] eta 0:01:19 lr 0.000957 wd 0.0500 time 0.4008 (0.4278) data time 0.0008 (0.0017) model time 0.4000 (0.4086) loss 8.6009 (7.5749) grad_norm 1.5091 (2.1713) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:30:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][450/625] eta 0:01:14 lr 0.000957 wd 0.0500 time 0.4179 (0.4273) data time 0.0009 (0.0017) model time 0.4170 (0.4084) loss 6.6530 (7.5693) grad_norm 2.0338 (2.1614) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:30:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][460/625] eta 0:01:10 lr 0.000957 wd 0.0500 time 0.3981 (0.4267) data time 0.0006 (0.0017) model time 0.3975 (0.4082) loss 7.0416 (7.5627) grad_norm 2.1392 (2.1613) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:30:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][470/625] eta 0:01:06 lr 0.000956 wd 0.0500 time 0.4012 (0.4281) data time 0.0011 (0.0036) model time 0.4002 (0.4081) loss 9.0135 (7.5760) grad_norm 2.3307 (2.1578) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:30:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][480/625] eta 0:01:02 lr 0.000956 wd 0.0500 time 0.3943 (0.4289) data time 0.0007 (0.0035) model time 0.3936 (0.4094) loss 7.5919 (7.5670) grad_norm 3.7820 (2.1680) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:30:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][490/625] eta 0:00:58 lr 0.000956 wd 0.0500 time 0.3969 (0.4301) data time 0.0007 (0.0035) model time 0.3962 (0.4112) loss 6.6938 (7.5642) grad_norm 3.2144 (2.1838) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:30:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][500/625] eta 0:00:53 lr 0.000956 wd 0.0500 time 0.4014 (0.4296) 
data time 0.0010 (0.0034) model time 0.4004 (0.4110) loss 7.8639 (7.5655) grad_norm 1.9720 (2.1971) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:30:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][510/625] eta 0:00:49 lr 0.000956 wd 0.0500 time 0.3975 (0.4290) data time 0.0008 (0.0034) model time 0.3967 (0.4108) loss 7.9028 (7.5605) grad_norm 2.1802 (2.2040) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:30:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][520/625] eta 0:00:44 lr 0.000956 wd 0.0500 time 0.3968 (0.4285) data time 0.0006 (0.0033) model time 0.3962 (0.4105) loss 8.1658 (7.5709) grad_norm 1.6014 (2.1983) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:31:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][530/625] eta 0:00:40 lr 0.000956 wd 0.0500 time 0.3965 (0.4296) data time 0.0009 (0.0033) model time 0.3957 (0.4121) loss 8.2461 (7.5718) grad_norm 1.8423 (2.1948) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:31:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][540/625] eta 0:00:36 lr 0.000956 wd 0.0500 time 0.4007 (0.4291) data time 0.0007 (0.0032) model time 0.3999 (0.4119) loss 8.6975 (7.5785) grad_norm 2.1617 (2.1908) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:31:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][550/625] eta 0:00:32 lr 0.000956 wd 0.0500 time 0.3974 (0.4285) data time 0.0006 (0.0032) model time 0.3967 (0.4116) loss 6.3730 (7.5814) grad_norm 1.7151 (2.1880) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:31:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][560/625] eta 0:00:27 lr 0.000956 wd 0.0500 time 0.4011 (0.4294) data time 0.0009 (0.0032) model time 0.4002 (0.4128) loss 8.6167 (7.5810) grad_norm 2.3483 (2.1828) loss_scale 2048.0000 (2048.0000) mem 14939MB 
[2024-07-24 23:31:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][570/625] eta 0:00:23 lr 0.000956 wd 0.0500 time 0.3987 (0.4289) data time 0.0008 (0.0031) model time 0.3978 (0.4126) loss 7.0655 (7.5843) grad_norm 2.2111 (2.1888) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:31:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][580/625] eta 0:00:19 lr 0.000956 wd 0.0500 time 0.4059 (0.4284) data time 0.0011 (0.0031) model time 0.4048 (0.4124) loss 8.2630 (7.5913) grad_norm 3.0303 (2.2059) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:31:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][590/625] eta 0:00:14 lr 0.000955 wd 0.0500 time 0.4029 (0.4279) data time 0.0006 (0.0030) model time 0.4023 (0.4121) loss 7.3854 (7.5922) grad_norm 2.6999 (2.2054) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:31:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][600/625] eta 0:00:10 lr 0.000955 wd 0.0500 time 0.4049 (0.4275) data time 0.0008 (0.0030) model time 0.4041 (0.4119) loss 7.7008 (7.5882) grad_norm 3.0825 (2.2124) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:31:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][610/625] eta 0:00:06 lr 0.000955 wd 0.0500 time 0.4014 (0.4271) data time 0.0006 (0.0030) model time 0.4008 (0.4117) loss 8.5246 (7.5856) grad_norm 1.7619 (2.2134) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:31:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][620/625] eta 0:00:02 lr 0.000955 wd 0.0500 time 0.3973 (0.4266) data time 0.0004 (0.0029) model time 0.3968 (0.4115) loss 7.8172 (7.5885) grad_norm 1.4971 (2.2087) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:31:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 103 training takes 0:04:26 [2024-07-24 23:31:42 
vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 23:31:43 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 23:31:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.001 (1.001) Loss 0.6094 (0.6094) Acc@1 87.354 (87.354) Acc@5 97.900 (97.900) Mem 14939MB [2024-07-24 23:31:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.174) Loss 1.0264 (0.7603) Acc@1 76.562 (83.771) Acc@5 94.775 (96.946) Mem 14939MB [2024-07-24 23:31:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.132) Loss 1.1357 (0.9065) Acc@1 73.730 (80.150) Acc@5 93.018 (95.317) Mem 14939MB [2024-07-24 23:31:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.768 Acc@5 95.288 [2024-07-24 23:31:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.8% [2024-07-24 23:31:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.234 (1.234) Loss 0.6104 (0.6104) Acc@1 87.939 (87.939) Acc@5 98.242 (98.242) Mem 14939MB [2024-07-24 23:31:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.195) Loss 0.9888 (0.7539) Acc@1 79.199 (84.526) Acc@5 94.678 (97.146) Mem 14939MB [2024-07-24 23:31:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.143) Loss 1.1221 (0.8934) Acc@1 73.633 (80.906) Acc@5 93.896 (95.594) Mem 14939MB [2024-07-24 23:31:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.566 Acc@5 95.577 [2024-07-24 23:31:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.6% [2024-07-24 23:31:49 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.57% [2024-07-24 23:31:49 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 23:31:50 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 23:31:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][0/625] eta 0:08:20 lr 0.000955 wd 0.0500 time 0.8009 (0.8009) data time 0.4177 (0.4177) model time 0.0000 (0.0000) loss 9.6969 (9.6969) grad_norm 1.8004 (1.8004) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:31:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][10/625] eta 0:04:28 lr 0.000955 wd 0.0500 time 0.4003 (0.4361) data time 0.0009 (0.0389) model time 0.0000 (0.0000) loss 7.9064 (7.4409) grad_norm 1.9012 (1.9040) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:31:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][20/625] eta 0:04:13 lr 0.000955 wd 0.0500 time 0.3967 (0.4184) data time 0.0007 (0.0208) model time 0.0000 (0.0000) loss 6.7936 (7.4116) grad_norm 2.1258 (2.0069) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:32:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][30/625] eta 0:04:05 lr 0.000955 wd 0.0500 time 0.4037 (0.4129) data time 0.0007 (0.0144) model time 0.0000 (0.0000) loss 7.8522 (7.4393) grad_norm 1.8330 (2.1454) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:32:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][40/625] eta 0:03:59 lr 0.000955 wd 0.0500 time 0.3995 (0.4097) data time 0.0009 (0.0111) model time 0.0000 (0.0000) loss 7.9678 (7.4255) grad_norm 2.7327 (2.1265) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:32:11 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][50/625] eta 0:03:54 lr 0.000955 wd 0.0500 time 0.3992 (0.4086) data time 0.0006 (0.0091) model time 0.0000 (0.0000) loss 9.0977 (7.5463) grad_norm 1.9134 (2.0801) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:32:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][60/625] eta 0:03:59 lr 0.000955 wd 0.0500 time 0.4164 (0.4245) data time 0.0008 (0.0078) model time 0.4155 (0.5048) loss 8.5348 (7.6354) grad_norm 2.1029 (2.1498) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:32:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][70/625] eta 0:03:53 lr 0.000955 wd 0.0500 time 0.4026 (0.4211) data time 0.0007 (0.0068) model time 0.4020 (0.4521) loss 7.1647 (7.6356) grad_norm 1.9799 (2.1247) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:32:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][80/625] eta 0:03:54 lr 0.000954 wd 0.0500 time 0.5982 (0.4296) data time 0.0009 (0.0061) model time 0.5972 (0.4643) loss 8.2147 (7.6518) grad_norm 1.4321 (2.1572) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:32:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][90/625] eta 0:03:52 lr 0.000954 wd 0.0500 time 0.3978 (0.4348) data time 0.0006 (0.0055) model time 0.3972 (0.4672) loss 7.1085 (7.6489) grad_norm 3.6799 (2.1788) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:32:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][100/625] eta 0:03:56 lr 0.000954 wd 0.0500 time 0.4158 (0.4497) data time 0.0008 (0.0051) model time 0.4150 (0.4908) loss 7.8635 (7.6437) grad_norm 2.3112 (2.2437) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:32:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][110/625] eta 0:03:49 lr 0.000954 wd 0.0500 time 0.3989 (0.4452) data 
time 0.0007 (0.0047) model time 0.3982 (0.4753) loss 8.0770 (7.5896) grad_norm 2.5791 (2.2667) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:32:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][120/625] eta 0:03:43 lr 0.000954 wd 0.0500 time 0.4011 (0.4416) data time 0.0008 (0.0044) model time 0.4003 (0.4648) loss 6.1853 (7.5790) grad_norm 2.4529 (2.2668) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:32:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][130/625] eta 0:03:37 lr 0.000954 wd 0.0500 time 0.4031 (0.4386) data time 0.0007 (0.0041) model time 0.4024 (0.4568) loss 6.9258 (7.5533) grad_norm 1.6869 (2.2495) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:32:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][140/625] eta 0:03:31 lr 0.000954 wd 0.0500 time 0.3998 (0.4359) data time 0.0009 (0.0039) model time 0.3989 (0.4503) loss 7.3186 (7.5627) grad_norm 1.9233 (2.2474) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:32:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][150/625] eta 0:03:25 lr 0.000954 wd 0.0500 time 0.4061 (0.4334) data time 0.0008 (0.0038) model time 0.4053 (0.4450) loss 7.0134 (7.5486) grad_norm 1.5120 (2.2488) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:32:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][160/625] eta 0:03:20 lr 0.000954 wd 0.0500 time 0.3987 (0.4313) data time 0.0008 (0.0036) model time 0.3979 (0.4408) loss 7.7190 (7.5617) grad_norm 1.6524 (2.2247) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:33:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][170/625] eta 0:03:15 lr 0.000954 wd 0.0500 time 0.4031 (0.4295) data time 0.0007 (0.0034) model time 0.4024 (0.4373) loss 7.8724 (7.5733) grad_norm 1.6549 (2.2033) loss_scale 2048.0000 (2048.0000) mem 14939MB 
[2024-07-24 23:33:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][180/625] eta 0:03:11 lr 0.000954 wd 0.0500 time 0.3980 (0.4308) data time 0.0007 (0.0033) model time 0.3974 (0.4384) loss 7.8158 (7.5862) grad_norm 1.6645 (2.1806) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:33:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][190/625] eta 0:03:06 lr 0.000954 wd 0.0500 time 0.4066 (0.4292) data time 0.0007 (0.0032) model time 0.4059 (0.4357) loss 6.7200 (7.5990) grad_norm 1.4406 (2.1710) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:33:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][200/625] eta 0:03:02 lr 0.000953 wd 0.0500 time 0.4040 (0.4290) data time 0.0007 (0.0031) model time 0.4034 (0.4349) loss 8.6842 (7.5923) grad_norm 2.2643 (2.1502) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:33:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][210/625] eta 0:02:57 lr 0.000953 wd 0.0500 time 0.4004 (0.4277) data time 0.0011 (0.0030) model time 0.3993 (0.4328) loss 7.3108 (7.5921) grad_norm 2.3873 (2.1505) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:33:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][220/625] eta 0:02:52 lr 0.000953 wd 0.0500 time 0.4006 (0.4265) data time 0.0008 (0.0029) model time 0.3998 (0.4308) loss 7.5609 (7.6099) grad_norm 4.0890 (2.2053) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:33:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][230/625] eta 0:02:47 lr 0.000953 wd 0.0500 time 0.3991 (0.4253) data time 0.0009 (0.0028) model time 0.3982 (0.4290) loss 8.3884 (7.6193) grad_norm 2.1218 (2.1938) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:33:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][240/625] eta 0:02:43 lr 0.000953 wd 0.0500 
time 0.4026 (0.4242) data time 0.0011 (0.0027) model time 0.4016 (0.4274) loss 8.7440 (7.6228) grad_norm 1.8495 (2.1910) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:33:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][250/625] eta 0:02:38 lr 0.000953 wd 0.0500 time 0.3991 (0.4232) data time 0.0006 (0.0027) model time 0.3985 (0.4259) loss 7.9518 (7.6110) grad_norm 2.1426 (2.1812) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:33:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][260/625] eta 0:02:34 lr 0.000953 wd 0.0500 time 0.4026 (0.4223) data time 0.0006 (0.0026) model time 0.4020 (0.4246) loss 6.6316 (7.6056) grad_norm 2.3710 (2.1709) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:33:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][270/625] eta 0:02:29 lr 0.000953 wd 0.0500 time 0.4065 (0.4215) data time 0.0009 (0.0025) model time 0.4056 (0.4235) loss 8.8306 (7.6228) grad_norm 1.6643 (2.1727) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:33:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][280/625] eta 0:02:25 lr 0.000953 wd 0.0500 time 0.3962 (0.4207) data time 0.0009 (0.0025) model time 0.3953 (0.4223) loss 6.4296 (7.6176) grad_norm 2.3484 (2.1775) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:33:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][290/625] eta 0:02:22 lr 0.000953 wd 0.0500 time 0.4229 (0.4247) data time 0.0009 (0.0024) model time 0.4219 (0.4271) loss 6.1063 (7.6126) grad_norm 4.2324 (2.1878) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:33:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][300/625] eta 0:02:18 lr 0.000953 wd 0.0500 time 0.5899 (0.4273) data time 0.0007 (0.0024) model time 0.5892 (0.4301) loss 7.2293 (7.6146) grad_norm 2.2509 (2.1823) loss_scale 2048.0000 
(2048.0000) mem 14939MB [2024-07-24 23:34:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][310/625] eta 0:02:15 lr 0.000952 wd 0.0500 time 0.3940 (0.4289) data time 0.0009 (0.0023) model time 0.3931 (0.4319) loss 9.0030 (7.6198) grad_norm 2.4164 (inf) loss_scale 1024.0000 (2015.0740) mem 14939MB [2024-07-24 23:34:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][320/625] eta 0:02:10 lr 0.000952 wd 0.0500 time 0.4082 (0.4280) data time 0.0006 (0.0023) model time 0.4076 (0.4307) loss 7.6809 (7.6170) grad_norm 1.5707 (inf) loss_scale 1024.0000 (1984.1994) mem 14939MB [2024-07-24 23:34:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][330/625] eta 0:02:06 lr 0.000952 wd 0.0500 time 0.3988 (0.4272) data time 0.0009 (0.0023) model time 0.3979 (0.4295) loss 8.1772 (7.6164) grad_norm 1.4083 (inf) loss_scale 1024.0000 (1955.1903) mem 14939MB [2024-07-24 23:34:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][340/625] eta 0:02:01 lr 0.000952 wd 0.0500 time 0.4005 (0.4264) data time 0.0009 (0.0022) model time 0.3995 (0.4285) loss 6.1997 (7.6149) grad_norm 2.7511 (inf) loss_scale 1024.0000 (1927.8827) mem 14939MB [2024-07-24 23:34:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][350/625] eta 0:01:57 lr 0.000952 wd 0.0500 time 0.3979 (0.4256) data time 0.0011 (0.0022) model time 0.3967 (0.4275) loss 7.2371 (7.6112) grad_norm 1.5197 (inf) loss_scale 1024.0000 (1902.1311) mem 14939MB [2024-07-24 23:34:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][360/625] eta 0:01:52 lr 0.000952 wd 0.0500 time 0.3975 (0.4249) data time 0.0008 (0.0021) model time 0.3967 (0.4266) loss 6.0890 (7.6230) grad_norm 2.5488 (inf) loss_scale 1024.0000 (1877.8061) mem 14939MB [2024-07-24 23:34:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][370/625] eta 0:01:48 lr 0.000952 wd 
0.0500 time 0.4024 (0.4242) data time 0.0007 (0.0021) model time 0.4017 (0.4257) loss 7.8806 (7.6237) grad_norm 2.4609 (inf) loss_scale 1024.0000 (1854.7925) mem 14939MB [2024-07-24 23:34:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][380/625] eta 0:01:43 lr 0.000952 wd 0.0500 time 0.3981 (0.4237) data time 0.0009 (0.0021) model time 0.3972 (0.4250) loss 8.1757 (7.6198) grad_norm 3.7811 (inf) loss_scale 1024.0000 (1832.9869) mem 14939MB [2024-07-24 23:34:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][390/625] eta 0:01:39 lr 0.000952 wd 0.0500 time 0.4005 (0.4244) data time 0.0006 (0.0034) model time 0.3998 (0.4242) loss 8.8200 (7.6191) grad_norm 1.3809 (inf) loss_scale 1024.0000 (1812.2967) mem 14939MB [2024-07-24 23:34:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][400/625] eta 0:01:35 lr 0.000952 wd 0.0500 time 0.3996 (0.4238) data time 0.0009 (0.0033) model time 0.3987 (0.4235) loss 7.7851 (7.6183) grad_norm 1.8179 (inf) loss_scale 1024.0000 (1792.6384) mem 14939MB [2024-07-24 23:34:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][410/625] eta 0:01:30 lr 0.000952 wd 0.0500 time 0.3983 (0.4232) data time 0.0009 (0.0033) model time 0.3975 (0.4228) loss 8.3045 (7.6238) grad_norm 3.0065 (inf) loss_scale 1024.0000 (1773.9367) mem 14939MB [2024-07-24 23:34:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][420/625] eta 0:01:26 lr 0.000952 wd 0.0500 time 0.3979 (0.4231) data time 0.0006 (0.0032) model time 0.3974 (0.4227) loss 8.8980 (7.6230) grad_norm 2.6962 (inf) loss_scale 1024.0000 (1756.1235) mem 14939MB [2024-07-24 23:34:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][430/625] eta 0:01:22 lr 0.000951 wd 0.0500 time 0.3951 (0.4226) data time 0.0007 (0.0032) model time 0.3943 (0.4220) loss 7.8560 (7.6194) grad_norm 2.5048 (inf) loss_scale 1024.0000 (1739.1369) mem 
14939MB [2024-07-24 23:34:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][440/625] eta 0:01:18 lr 0.000951 wd 0.0500 time 0.3991 (0.4221) data time 0.0009 (0.0031) model time 0.3982 (0.4215) loss 7.8167 (7.6270) grad_norm 3.8096 (inf) loss_scale 1024.0000 (1722.9206) mem 14939MB [2024-07-24 23:35:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][450/625] eta 0:01:13 lr 0.000951 wd 0.0500 time 0.3906 (0.4216) data time 0.0009 (0.0031) model time 0.3896 (0.4210) loss 8.5501 (7.6254) grad_norm 1.8278 (inf) loss_scale 1024.0000 (1707.4235) mem 14939MB [2024-07-24 23:35:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][460/625] eta 0:01:09 lr 0.000951 wd 0.0500 time 0.3996 (0.4212) data time 0.0009 (0.0030) model time 0.3987 (0.4205) loss 6.7621 (7.6176) grad_norm 2.0362 (inf) loss_scale 1024.0000 (1692.5987) mem 14939MB [2024-07-24 23:35:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][470/625] eta 0:01:05 lr 0.000951 wd 0.0500 time 0.4081 (0.4208) data time 0.0008 (0.0030) model time 0.4073 (0.4200) loss 6.0936 (7.6162) grad_norm 4.3301 (inf) loss_scale 1024.0000 (1678.4034) mem 14939MB [2024-07-24 23:35:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][480/625] eta 0:01:00 lr 0.000951 wd 0.0500 time 0.4024 (0.4203) data time 0.0007 (0.0030) model time 0.4017 (0.4195) loss 8.0875 (7.6157) grad_norm 2.3191 (inf) loss_scale 1024.0000 (1664.7983) mem 14939MB [2024-07-24 23:35:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][490/625] eta 0:00:56 lr 0.000951 wd 0.0500 time 0.4001 (0.4199) data time 0.0008 (0.0029) model time 0.3993 (0.4191) loss 8.9026 (7.6107) grad_norm 1.4617 (inf) loss_scale 1024.0000 (1651.7475) mem 14939MB [2024-07-24 23:35:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][500/625] eta 0:00:52 lr 0.000951 wd 0.0500 time 0.4048 
(0.4195) data time 0.0007 (0.0029) model time 0.4041 (0.4186) loss 7.2419 (7.6198) grad_norm 1.7569 (inf) loss_scale 1024.0000 (1639.2176) mem 14939MB [2024-07-24 23:35:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][510/625] eta 0:00:48 lr 0.000951 wd 0.0500 time 0.3983 (0.4192) data time 0.0008 (0.0028) model time 0.3974 (0.4182) loss 8.6146 (7.6264) grad_norm 2.3530 (inf) loss_scale 1024.0000 (1627.1781) mem 14939MB [2024-07-24 23:35:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][520/625] eta 0:00:44 lr 0.000951 wd 0.0500 time 0.3960 (0.4201) data time 0.0009 (0.0028) model time 0.3951 (0.4192) loss 7.8318 (7.6327) grad_norm 1.6898 (inf) loss_scale 1024.0000 (1615.6008) mem 14939MB [2024-07-24 23:35:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][530/625] eta 0:00:40 lr 0.000951 wd 0.0500 time 0.3949 (0.4218) data time 0.0008 (0.0028) model time 0.3941 (0.4211) loss 8.3335 (7.6297) grad_norm 2.8700 (inf) loss_scale 1024.0000 (1604.4595) mem 14939MB [2024-07-24 23:35:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][540/625] eta 0:00:35 lr 0.000950 wd 0.0500 time 0.3965 (0.4213) data time 0.0006 (0.0027) model time 0.3959 (0.4206) loss 7.7411 (7.6271) grad_norm 1.7108 (inf) loss_scale 1024.0000 (1593.7301) mem 14939MB [2024-07-24 23:35:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][550/625] eta 0:00:31 lr 0.000950 wd 0.0500 time 0.4086 (0.4210) data time 0.0008 (0.0027) model time 0.4078 (0.4202) loss 7.8570 (7.6314) grad_norm 2.2228 (inf) loss_scale 1024.0000 (1583.3902) mem 14939MB [2024-07-24 23:35:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][560/625] eta 0:00:27 lr 0.000950 wd 0.0500 time 0.3937 (0.4206) data time 0.0007 (0.0027) model time 0.3930 (0.4198) loss 6.4710 (7.6379) grad_norm 1.5150 (inf) loss_scale 1024.0000 (1573.4189) mem 14939MB [2024-07-24 
23:35:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][570/625] eta 0:00:23 lr 0.000950 wd 0.0500 time 0.3991 (0.4203) data time 0.0009 (0.0026) model time 0.3982 (0.4194) loss 8.2679 (7.6419) grad_norm 2.9001 (inf) loss_scale 1024.0000 (1563.7968) mem 14939MB [2024-07-24 23:35:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][580/625] eta 0:00:18 lr 0.000950 wd 0.0500 time 0.3982 (0.4199) data time 0.0007 (0.0026) model time 0.3975 (0.4190) loss 7.8113 (7.6470) grad_norm 2.0012 (inf) loss_scale 1024.0000 (1554.5060) mem 14939MB [2024-07-24 23:35:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][590/625] eta 0:00:14 lr 0.000950 wd 0.0500 time 0.3985 (0.4196) data time 0.0006 (0.0026) model time 0.3979 (0.4187) loss 6.5781 (7.6460) grad_norm 2.0573 (inf) loss_scale 1024.0000 (1545.5296) mem 14939MB [2024-07-24 23:36:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][600/625] eta 0:00:10 lr 0.000950 wd 0.0500 time 0.4043 (0.4193) data time 0.0009 (0.0025) model time 0.4035 (0.4183) loss 8.5389 (7.6422) grad_norm 2.0309 (inf) loss_scale 1024.0000 (1536.8519) mem 14939MB [2024-07-24 23:36:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][610/625] eta 0:00:06 lr 0.000950 wd 0.0500 time 0.3994 (0.4190) data time 0.0004 (0.0025) model time 0.3990 (0.4180) loss 6.4855 (7.6408) grad_norm 1.3350 (inf) loss_scale 1024.0000 (1528.4583) mem 14939MB [2024-07-24 23:36:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][620/625] eta 0:00:02 lr 0.000950 wd 0.0500 time 0.3958 (0.4187) data time 0.0006 (0.0025) model time 0.3952 (0.4177) loss 8.6706 (7.6401) grad_norm 2.1430 (inf) loss_scale 1024.0000 (1520.3349) mem 14939MB [2024-07-24 23:36:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 104 training takes 0:04:21 [2024-07-24 23:36:12 vssd_mesa_retrain_small_e300] 
(utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 23:36:13 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 23:36:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.572 (0.572) Loss 0.6357 (0.6357) Acc@1 87.256 (87.256) Acc@5 98.193 (98.193) Mem 14939MB [2024-07-24 23:36:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.130) Loss 1.0771 (0.7891) Acc@1 76.270 (83.780) Acc@5 93.799 (96.973) Mem 14939MB [2024-07-24 23:36:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.109) Loss 1.1621 (0.9337) Acc@1 72.510 (79.999) Acc@5 93.408 (95.345) Mem 14939MB [2024-07-24 23:36:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.663 Acc@5 95.308 [2024-07-24 23:36:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.7% [2024-07-24 23:36:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.991 (0.991) Loss 0.6079 (0.6079) Acc@1 87.891 (87.891) Acc@5 98.291 (98.291) Mem 14939MB [2024-07-24 23:36:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.176) Loss 0.9873 (0.7526) Acc@1 79.150 (84.517) Acc@5 94.629 (97.155) Mem 14939MB [2024-07-24 23:36:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.133) Loss 1.1191 (0.8911) Acc@1 73.535 (80.931) Acc@5 93.848 (95.608) Mem 14939MB [2024-07-24 23:36:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.588 Acc@5 95.595 [2024-07-24 23:36:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.6% [2024-07-24 23:36:19 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.59% [2024-07-24 23:36:19 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 23:36:20 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 23:36:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][0/625] eta 0:07:59 lr 0.000950 wd 0.0500 time 0.7675 (0.7675) data time 0.3882 (0.3882) model time 0.0000 (0.0000) loss 7.9085 (7.9085) grad_norm 2.9652 (2.9652) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:36:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][10/625] eta 0:04:27 lr 0.000950 wd 0.0500 time 0.3996 (0.4348) data time 0.0006 (0.0361) model time 0.0000 (0.0000) loss 7.1316 (7.8304) grad_norm 3.8563 (2.5346) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:36:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][20/625] eta 0:04:13 lr 0.000950 wd 0.0500 time 0.3969 (0.4183) data time 0.0009 (0.0193) model time 0.0000 (0.0000) loss 6.7983 (7.7271) grad_norm 1.8708 (2.3625) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:36:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][30/625] eta 0:04:05 lr 0.000949 wd 0.0500 time 0.4021 (0.4123) data time 0.0008 (0.0134) model time 0.0000 (0.0000) loss 8.5423 (7.6425) grad_norm 1.9900 (2.2795) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:36:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][40/625] eta 0:04:00 lr 0.000949 wd 0.0500 time 0.4452 (0.4105) data time 0.0010 (0.0104) model time 0.0000 (0.0000) loss 7.9138 (7.6079) grad_norm 1.8626 (2.2552) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:36:40 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [105/300][50/625] eta 0:03:55 lr 0.000949 wd 0.0500 time 0.3984 (0.4089) data time 0.0009 (0.0085) model time 0.0000 (0.0000) loss 7.8759 (7.6172) grad_norm 2.1006 (2.2180) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:36:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][60/625] eta 0:03:50 lr 0.000949 wd 0.0500 time 0.3990 (0.4074) data time 0.0009 (0.0073) model time 0.3982 (0.3990) loss 7.5821 (7.5885) grad_norm 2.5256 (2.3152) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:36:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][70/625] eta 0:03:45 lr 0.000949 wd 0.0500 time 0.3976 (0.4065) data time 0.0006 (0.0064) model time 0.3970 (0.3992) loss 7.5450 (7.6499) grad_norm 1.6093 (2.2951) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:36:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][80/625] eta 0:03:41 lr 0.000949 wd 0.0500 time 0.3978 (0.4056) data time 0.0007 (0.0058) model time 0.3971 (0.3990) loss 8.1991 (7.6871) grad_norm 2.2392 (2.2934) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:36:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][90/625] eta 0:03:36 lr 0.000949 wd 0.0500 time 0.3996 (0.4050) data time 0.0009 (0.0052) model time 0.3987 (0.3989) loss 8.2161 (7.7047) grad_norm 2.0877 (2.3368) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:37:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][100/625] eta 0:03:32 lr 0.000949 wd 0.0500 time 0.3964 (0.4045) data time 0.0009 (0.0048) model time 0.3955 (0.3990) loss 7.9551 (7.7032) grad_norm 2.6771 (2.3430) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:37:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][110/625] eta 0:03:29 lr 0.000949 wd 0.0500 time 0.5869 (0.4059) data time 0.0007 (0.0045) model time 
0.5862 (0.4024) loss 7.7983 (7.7041) grad_norm 1.6572 (2.3250) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:37:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][120/625] eta 0:03:27 lr 0.000949 wd 0.0500 time 0.5736 (0.4119) data time 0.0006 (0.0042) model time 0.5731 (0.4130) loss 6.9012 (7.6541) grad_norm 2.2555 (2.2955) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:37:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][130/625] eta 0:03:25 lr 0.000949 wd 0.0500 time 0.4015 (0.4153) data time 0.0009 (0.0039) model time 0.4006 (0.4184) loss 8.9734 (7.6875) grad_norm 1.8856 (2.2700) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:37:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][140/625] eta 0:03:20 lr 0.000949 wd 0.0500 time 0.3989 (0.4143) data time 0.0006 (0.0037) model time 0.3982 (0.4164) loss 7.2145 (7.6704) grad_norm 3.2168 (2.2848) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:37:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][150/625] eta 0:03:16 lr 0.000948 wd 0.0500 time 0.4050 (0.4137) data time 0.0006 (0.0035) model time 0.4043 (0.4152) loss 7.2580 (7.6593) grad_norm 1.5966 (2.2785) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:37:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][160/625] eta 0:03:12 lr 0.000948 wd 0.0500 time 0.3982 (0.4130) data time 0.0009 (0.0034) model time 0.3974 (0.4139) loss 8.1658 (7.6630) grad_norm 1.4366 (2.2549) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:37:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][170/625] eta 0:03:07 lr 0.000948 wd 0.0500 time 0.3961 (0.4123) data time 0.0007 (0.0032) model time 0.3954 (0.4128) loss 7.6700 (7.6529) grad_norm 1.6809 (2.2254) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:37:34 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][180/625] eta 0:03:03 lr 0.000948 wd 0.0500 time 0.3965 (0.4118) data time 0.0007 (0.0031) model time 0.3958 (0.4120) loss 8.0196 (7.6456) grad_norm 2.1859 (2.2243) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:37:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][190/625] eta 0:02:59 lr 0.000948 wd 0.0500 time 0.3952 (0.4121) data time 0.0009 (0.0030) model time 0.3943 (0.4123) loss 9.2084 (7.6493) grad_norm 2.3269 (2.2107) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:37:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][200/625] eta 0:02:54 lr 0.000948 wd 0.0500 time 0.4064 (0.4116) data time 0.0008 (0.0029) model time 0.4056 (0.4115) loss 8.4921 (7.6340) grad_norm 2.0572 (2.2078) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:37:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][210/625] eta 0:02:50 lr 0.000948 wd 0.0500 time 0.3972 (0.4110) data time 0.0007 (0.0028) model time 0.3964 (0.4107) loss 7.3181 (7.6122) grad_norm 2.5565 (2.2069) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:37:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][220/625] eta 0:02:46 lr 0.000948 wd 0.0500 time 0.4069 (0.4106) data time 0.0009 (0.0027) model time 0.4061 (0.4102) loss 8.7070 (7.6151) grad_norm 2.8162 (2.1958) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:37:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][230/625] eta 0:02:42 lr 0.000948 wd 0.0500 time 0.4013 (0.4102) data time 0.0007 (0.0026) model time 0.4006 (0.4096) loss 7.2677 (7.6261) grad_norm 2.4724 (2.2048) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:37:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][240/625] eta 0:02:37 lr 0.000948 wd 0.0500 time 0.4137 (0.4100) 
data time 0.0007 (0.0025) model time 0.4130 (0.4093) loss 7.9988 (7.6237) grad_norm 2.4977 (2.2462) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:38:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][250/625] eta 0:02:33 lr 0.000948 wd 0.0500 time 0.3987 (0.4096) data time 0.0008 (0.0025) model time 0.3979 (0.4089) loss 6.2423 (7.6175) grad_norm 2.1022 (2.2673) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:38:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][260/625] eta 0:02:29 lr 0.000947 wd 0.0500 time 0.3979 (0.4092) data time 0.0009 (0.0024) model time 0.3971 (0.4084) loss 8.7060 (7.6230) grad_norm 2.2353 (2.2600) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:38:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][270/625] eta 0:02:25 lr 0.000947 wd 0.0500 time 0.4017 (0.4089) data time 0.0009 (0.0024) model time 0.4008 (0.4080) loss 8.3580 (7.6214) grad_norm 1.4324 (2.2500) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:38:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][280/625] eta 0:02:21 lr 0.000947 wd 0.0500 time 0.3990 (0.4103) data time 0.0008 (0.0023) model time 0.3982 (0.4097) loss 7.5819 (7.6336) grad_norm 1.5337 (2.2331) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:38:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][290/625] eta 0:02:17 lr 0.000947 wd 0.0500 time 0.4037 (0.4100) data time 0.0008 (0.0023) model time 0.4030 (0.4093) loss 9.0273 (7.6435) grad_norm 1.4007 (2.2258) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:38:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][300/625] eta 0:02:13 lr 0.000947 wd 0.0500 time 0.4002 (0.4098) data time 0.0007 (0.0022) model time 0.3995 (0.4090) loss 6.6423 (7.6312) grad_norm 1.8471 (2.2151) loss_scale 1024.0000 (1024.0000) mem 14939MB 
[2024-07-24 23:38:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][310/625] eta 0:02:09 lr 0.000947 wd 0.0500 time 0.3987 (0.4097) data time 0.0010 (0.0022) model time 0.3977 (0.4089) loss 7.9515 (7.6176) grad_norm 1.7734 (2.2212) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:38:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][320/625] eta 0:02:04 lr 0.000947 wd 0.0500 time 0.4044 (0.4095) data time 0.0006 (0.0021) model time 0.4038 (0.4086) loss 6.3406 (7.6121) grad_norm 2.9034 (2.2219) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:38:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][330/625] eta 0:02:00 lr 0.000947 wd 0.0500 time 0.5899 (0.4098) data time 0.0010 (0.0021) model time 0.5890 (0.4090) loss 7.5782 (7.6108) grad_norm 1.9748 (2.2136) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:38:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][340/625] eta 0:01:57 lr 0.000947 wd 0.0500 time 0.5769 (0.4121) data time 0.0006 (0.0021) model time 0.5763 (0.4117) loss 7.0590 (7.6135) grad_norm 1.7122 (2.2019) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:38:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][350/625] eta 0:01:53 lr 0.000947 wd 0.0500 time 0.3993 (0.4138) data time 0.0009 (0.0020) model time 0.3985 (0.4136) loss 6.9181 (7.6255) grad_norm 3.3023 (2.2020) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:38:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][360/625] eta 0:01:49 lr 0.000947 wd 0.0500 time 0.3999 (0.4137) data time 0.0008 (0.0020) model time 0.3991 (0.4135) loss 8.1061 (7.6219) grad_norm 2.1482 (2.2117) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:38:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][370/625] eta 0:01:45 lr 0.000947 wd 0.0500 
time 0.3938 (0.4133) data time 0.0008 (0.0020) model time 0.3930 (0.4131) loss 7.2192 (7.6242) grad_norm 1.7537 (2.2150) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:38:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][380/625] eta 0:01:41 lr 0.000946 wd 0.0500 time 0.3983 (0.4130) data time 0.0008 (0.0020) model time 0.3974 (0.4127) loss 8.2212 (7.6189) grad_norm 1.9049 (2.2012) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:39:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][390/625] eta 0:01:36 lr 0.000946 wd 0.0500 time 0.3993 (0.4127) data time 0.0007 (0.0019) model time 0.3987 (0.4123) loss 6.5643 (7.6217) grad_norm 1.5097 (2.1996) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:39:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][400/625] eta 0:01:32 lr 0.000946 wd 0.0500 time 0.3964 (0.4124) data time 0.0006 (0.0019) model time 0.3957 (0.4120) loss 6.4039 (7.6323) grad_norm 1.6335 (2.1987) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:39:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][410/625] eta 0:01:28 lr 0.000946 wd 0.0500 time 0.3938 (0.4124) data time 0.0009 (0.0019) model time 0.3929 (0.4120) loss 7.1446 (7.6304) grad_norm 1.9111 (2.1911) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:39:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][420/625] eta 0:01:24 lr 0.000946 wd 0.0500 time 0.4035 (0.4122) data time 0.0009 (0.0019) model time 0.4026 (0.4117) loss 8.3002 (7.6232) grad_norm 2.2508 (2.1922) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:39:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][430/625] eta 0:01:20 lr 0.000946 wd 0.0500 time 0.3976 (0.4119) data time 0.0009 (0.0018) model time 0.3967 (0.4114) loss 7.4812 (7.6196) grad_norm 3.1564 (2.1929) loss_scale 1024.0000 
(1024.0000) mem 14939MB [2024-07-24 23:39:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][440/625] eta 0:01:16 lr 0.000946 wd 0.0500 time 0.3984 (0.4116) data time 0.0008 (0.0018) model time 0.3976 (0.4111) loss 8.0345 (7.6207) grad_norm 2.8382 (2.2137) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:39:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][450/625] eta 0:01:11 lr 0.000946 wd 0.0500 time 0.4037 (0.4114) data time 0.0009 (0.0018) model time 0.4028 (0.4108) loss 6.8444 (7.6117) grad_norm 2.4618 (2.2177) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:39:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][460/625] eta 0:01:07 lr 0.000946 wd 0.0500 time 0.4049 (0.4112) data time 0.0007 (0.0018) model time 0.4042 (0.4105) loss 6.2406 (7.6165) grad_norm 1.9334 (2.2116) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:39:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][470/625] eta 0:01:03 lr 0.000946 wd 0.0500 time 0.4069 (0.4110) data time 0.0006 (0.0018) model time 0.4062 (0.4103) loss 6.8910 (7.6187) grad_norm 1.4511 (2.2112) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:39:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][480/625] eta 0:00:59 lr 0.000946 wd 0.0500 time 0.4031 (0.4107) data time 0.0007 (0.0017) model time 0.4024 (0.4100) loss 6.7634 (7.6202) grad_norm 1.5883 (2.2075) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:39:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][490/625] eta 0:00:55 lr 0.000945 wd 0.0500 time 0.4000 (0.4108) data time 0.0006 (0.0018) model time 0.3994 (0.4100) loss 8.2537 (7.6272) grad_norm 1.6649 (2.2110) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:39:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][500/625] eta 0:00:51 
lr 0.000945 wd 0.0500 time 0.3980 (0.4106) data time 0.0008 (0.0018) model time 0.3972 (0.4098) loss 9.3773 (7.6219) grad_norm 4.1157 (2.2116) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:39:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][510/625] eta 0:00:47 lr 0.000945 wd 0.0500 time 0.4047 (0.4106) data time 0.0008 (0.0018) model time 0.4039 (0.4098) loss 7.9052 (7.6275) grad_norm 3.4503 (2.2167) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:39:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][520/625] eta 0:00:43 lr 0.000945 wd 0.0500 time 0.3942 (0.4103) data time 0.0009 (0.0017) model time 0.3933 (0.4095) loss 8.3307 (7.6208) grad_norm 1.4947 (2.2220) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:39:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][530/625] eta 0:00:38 lr 0.000945 wd 0.0500 time 0.4009 (0.4102) data time 0.0010 (0.0017) model time 0.4000 (0.4094) loss 8.6322 (7.6250) grad_norm 1.7576 (2.2168) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:40:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][540/625] eta 0:00:34 lr 0.000945 wd 0.0500 time 0.4035 (0.4100) data time 0.0008 (0.0017) model time 0.4027 (0.4092) loss 8.1841 (7.6252) grad_norm 1.5010 (2.2062) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:40:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][550/625] eta 0:00:30 lr 0.000945 wd 0.0500 time 0.3967 (0.4098) data time 0.0006 (0.0017) model time 0.3961 (0.4089) loss 6.4421 (7.6209) grad_norm 4.2329 (2.2027) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:40:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][560/625] eta 0:00:26 lr 0.000945 wd 0.0500 time 0.4081 (0.4113) data time 0.0008 (0.0017) model time 0.4073 (0.4105) loss 7.5453 (7.6184) grad_norm 1.6332 (2.2050) 
loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:40:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][570/625] eta 0:00:22 lr 0.000945 wd 0.0500 time 0.4018 (0.4127) data time 0.0009 (0.0017) model time 0.4009 (0.4121) loss 9.1448 (7.6169) grad_norm 1.8825 (2.2037) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:40:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][580/625] eta 0:00:18 lr 0.000945 wd 0.0500 time 0.3974 (0.4125) data time 0.0008 (0.0017) model time 0.3966 (0.4119) loss 8.2307 (7.6166) grad_norm 2.9090 (2.2047) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:40:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][590/625] eta 0:00:14 lr 0.000945 wd 0.0500 time 0.4038 (0.4123) data time 0.0006 (0.0016) model time 0.4032 (0.4116) loss 7.6259 (7.6207) grad_norm 1.8162 (2.2040) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:40:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][600/625] eta 0:00:10 lr 0.000944 wd 0.0500 time 0.4246 (0.4124) data time 0.0009 (0.0017) model time 0.4236 (0.4117) loss 6.4328 (7.6169) grad_norm 2.7098 (2.1998) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:40:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][610/625] eta 0:00:06 lr 0.000944 wd 0.0500 time 0.3944 (0.4122) data time 0.0006 (0.0017) model time 0.3938 (0.4114) loss 6.8039 (7.6149) grad_norm 1.3590 (2.2035) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:40:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][620/625] eta 0:00:02 lr 0.000944 wd 0.0500 time 0.3986 (0.4120) data time 0.0004 (0.0017) model time 0.3982 (0.4112) loss 8.7654 (7.6152) grad_norm 1.9888 (2.2006) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:40:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 105 
training takes 0:04:17 [2024-07-24 23:40:37 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 23:40:38 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 23:40:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.462 (0.462) Loss 0.6323 (0.6323) Acc@1 87.402 (87.402) Acc@5 98.242 (98.242) Mem 14939MB [2024-07-24 23:40:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.121) Loss 1.0215 (0.7894) Acc@1 77.344 (83.891) Acc@5 94.727 (96.999) Mem 14939MB [2024-07-24 23:40:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.1387 (0.9305) Acc@1 74.756 (80.297) Acc@5 93.164 (95.387) Mem 14939MB [2024-07-24 23:40:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.946 Acc@5 95.353 [2024-07-24 23:40:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.9% [2024-07-24 23:40:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.95% [2024-07-24 23:40:41 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-24 23:40:42 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-24 23:40:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.481 (0.481) Loss 0.6060 (0.6060) Acc@1 87.988 (87.988) Acc@5 98.291 (98.291) Mem 14939MB [2024-07-24 23:40:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.087 (0.122) Loss 0.9863 (0.7510) Acc@1 79.150 (84.579) Acc@5 94.678 (97.181) Mem 14939MB [2024-07-24 23:40:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 1.1152 (0.8891) Acc@1 73.584 (80.985) Acc@5 93.848 (95.640) Mem 14939MB [2024-07-24 23:40:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.642 Acc@5 95.623 [2024-07-24 23:40:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.6% [2024-07-24 23:40:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.64% [2024-07-24 23:40:44 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 23:40:45 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-24 23:40:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][0/625] eta 0:08:41 lr 0.000944 wd 0.0500 time 0.8341 (0.8341) data time 0.4370 (0.4370) model time 0.0000 (0.0000) loss 8.3295 (8.3295) grad_norm 1.6648 (1.6648) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:40:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][10/625] eta 0:04:28 lr 0.000944 wd 0.0500 time 0.3970 (0.4374) data time 0.0008 (0.0405) model time 0.0000 (0.0000) loss 7.2224 (7.5118) grad_norm 2.1764 (1.7170) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:40:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][20/625] eta 0:04:13 lr 0.000944 wd 0.0500 time 0.3992 (0.4190) data time 0.0006 (0.0216) model time 0.0000 (0.0000) loss 7.5538 (7.6975) grad_norm 1.8991 (1.8587) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:40:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][30/625] eta 0:04:05 lr 0.000944 wd 0.0500 time 0.3967 (0.4125) data time 0.0009 (0.0149) model time 0.0000 (0.0000) loss 8.5417 (7.7356) grad_norm 1.6122 (1.8110) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:41:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][40/625] eta 0:03:59 lr 0.000944 wd 0.0500 time 0.3989 (0.4095) data time 0.0006 (0.0115) model time 0.0000 (0.0000) loss 7.0422 (7.7050) grad_norm 1.4310 (1.8210) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:41:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][50/625] eta 0:03:54 lr 0.000944 wd 0.0500 time 0.4001 (0.4079) data time 0.0007 (0.0094) model time 0.0000 (0.0000) loss 7.3278 (7.7047) grad_norm 3.7499 (2.0051) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:41:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][60/625] eta 0:03:49 lr 0.000944 wd 0.0500 time 
0.3894 (0.4067) data time 0.0011 (0.0080) model time 0.3883 (0.4000) loss 5.8817 (7.5852) grad_norm 2.1234 (2.0156) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:41:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][70/625] eta 0:03:45 lr 0.000944 wd 0.0500 time 0.3980 (0.4058) data time 0.0011 (0.0071) model time 0.3969 (0.3996) loss 6.9943 (7.5617) grad_norm 1.7132 (2.0093) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:41:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][80/625] eta 0:03:40 lr 0.000944 wd 0.0500 time 0.3991 (0.4052) data time 0.0008 (0.0063) model time 0.3983 (0.3996) loss 8.1009 (7.5422) grad_norm 1.5960 (2.0007) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:41:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][90/625] eta 0:03:36 lr 0.000943 wd 0.0500 time 0.4008 (0.4045) data time 0.0006 (0.0057) model time 0.4002 (0.3993) loss 9.1675 (7.5217) grad_norm 2.0323 (2.0062) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:41:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][100/625] eta 0:03:32 lr 0.000943 wd 0.0500 time 0.4008 (0.4039) data time 0.0009 (0.0052) model time 0.4000 (0.3990) loss 8.5252 (7.5123) grad_norm 1.7279 (2.0204) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:41:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][110/625] eta 0:03:27 lr 0.000943 wd 0.0500 time 0.4013 (0.4037) data time 0.0007 (0.0048) model time 0.4006 (0.3992) loss 8.2637 (7.5213) grad_norm 2.0194 (2.0239) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:41:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][120/625] eta 0:03:24 lr 0.000943 wd 0.0500 time 0.3798 (0.4049) data time 0.0008 (0.0045) model time 0.3790 (0.4018) loss 7.9383 (7.5325) grad_norm 1.2961 (2.0245) loss_scale 1024.0000 (1024.0000) 
mem 14939MB [2024-07-24 23:41:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][130/625] eta 0:03:20 lr 0.000943 wd 0.0500 time 0.3982 (0.4045) data time 0.0007 (0.0042) model time 0.3975 (0.4014) loss 8.7833 (7.5433) grad_norm 1.8820 (2.0294) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:41:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][140/625] eta 0:03:16 lr 0.000943 wd 0.0500 time 0.3961 (0.4041) data time 0.0008 (0.0040) model time 0.3952 (0.4011) loss 6.1838 (7.5099) grad_norm 1.7413 (2.0286) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:41:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][150/625] eta 0:03:13 lr 0.000943 wd 0.0500 time 0.6063 (0.4076) data time 0.0007 (0.0038) model time 0.6056 (0.4066) loss 7.2284 (7.4907) grad_norm 2.4184 (2.0594) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:41:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][160/625] eta 0:03:12 lr 0.000943 wd 0.0500 time 0.6106 (0.4129) data time 0.0008 (0.0036) model time 0.6097 (0.4143) loss 7.2725 (7.4825) grad_norm 1.6286 (2.0872) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:41:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][170/625] eta 0:03:08 lr 0.000943 wd 0.0500 time 0.3961 (0.4141) data time 0.0008 (0.0034) model time 0.3953 (0.4159) loss 6.7793 (7.4869) grad_norm 1.8489 (2.0944) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:42:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][180/625] eta 0:03:03 lr 0.000943 wd 0.0500 time 0.4045 (0.4135) data time 0.0008 (0.0033) model time 0.4037 (0.4147) loss 7.4974 (7.5022) grad_norm 2.5403 (2.0968) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:42:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][190/625] eta 0:02:59 lr 0.000943 
wd 0.0500 time 0.4020 (0.4127) data time 0.0008 (0.0032) model time 0.4012 (0.4135) loss 8.3024 (7.5108) grad_norm 2.2082 (2.0980) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:42:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][200/625] eta 0:02:55 lr 0.000943 wd 0.0500 time 0.3976 (0.4121) data time 0.0006 (0.0031) model time 0.3969 (0.4127) loss 7.5406 (7.5184) grad_norm 2.2550 (2.0998) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:42:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][210/625] eta 0:02:50 lr 0.000942 wd 0.0500 time 0.4001 (0.4115) data time 0.0008 (0.0030) model time 0.3993 (0.4118) loss 8.9684 (7.5228) grad_norm 1.5561 (2.0981) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:42:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][220/625] eta 0:02:46 lr 0.000942 wd 0.0500 time 0.3955 (0.4110) data time 0.0008 (0.0029) model time 0.3947 (0.4110) loss 8.8310 (7.5330) grad_norm 3.0595 (2.1126) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:42:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][230/625] eta 0:02:42 lr 0.000942 wd 0.0500 time 0.3973 (0.4105) data time 0.0008 (0.0028) model time 0.3965 (0.4103) loss 6.1728 (7.5245) grad_norm 3.0122 (2.1087) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:42:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][240/625] eta 0:02:37 lr 0.000942 wd 0.0500 time 0.3984 (0.4100) data time 0.0006 (0.0027) model time 0.3977 (0.4097) loss 7.4053 (7.5304) grad_norm 1.9059 (2.1114) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:42:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][250/625] eta 0:02:33 lr 0.000942 wd 0.0500 time 0.3977 (0.4097) data time 0.0007 (0.0026) model time 0.3970 (0.4092) loss 7.3592 (7.5313) grad_norm 3.3151 (2.1207) loss_scale 
1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:42:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][260/625] eta 0:02:29 lr 0.000942 wd 0.0500 time 0.4045 (0.4094) data time 0.0008 (0.0026) model time 0.4037 (0.4088) loss 7.5094 (7.5477) grad_norm 4.9992 (2.1491) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:42:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][270/625] eta 0:02:25 lr 0.000942 wd 0.0500 time 0.3988 (0.4091) data time 0.0010 (0.0025) model time 0.3978 (0.4084) loss 7.3988 (7.5434) grad_norm 2.8596 (2.1742) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:42:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][280/625] eta 0:02:21 lr 0.000942 wd 0.0500 time 0.3996 (0.4088) data time 0.0010 (0.0024) model time 0.3986 (0.4081) loss 8.1157 (7.5441) grad_norm 1.4131 (2.1694) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:42:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][290/625] eta 0:02:16 lr 0.000942 wd 0.0500 time 0.3986 (0.4085) data time 0.0007 (0.0024) model time 0.3978 (0.4077) loss 7.8051 (7.5360) grad_norm 2.8077 (2.1841) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:42:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][300/625] eta 0:02:12 lr 0.000942 wd 0.0500 time 0.4012 (0.4082) data time 0.0007 (0.0023) model time 0.4005 (0.4073) loss 8.8074 (7.5453) grad_norm 1.9458 (2.1805) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:42:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][310/625] eta 0:02:08 lr 0.000942 wd 0.0500 time 0.4034 (0.4079) data time 0.0008 (0.0023) model time 0.4026 (0.4070) loss 6.3870 (7.5386) grad_norm 1.6352 (2.1737) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:42:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][320/625] 
eta 0:02:04 lr 0.000941 wd 0.0500 time 0.3978 (0.4077) data time 0.0007 (0.0023) model time 0.3971 (0.4068) loss 6.6209 (7.5340) grad_norm 1.5434 (2.1749) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:43:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][330/625] eta 0:02:00 lr 0.000941 wd 0.0500 time 0.4015 (0.4075) data time 0.0007 (0.0022) model time 0.4008 (0.4065) loss 7.0089 (7.5155) grad_norm 1.5756 (2.1859) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:43:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][340/625] eta 0:01:56 lr 0.000941 wd 0.0500 time 0.5985 (0.4078) data time 0.0006 (0.0022) model time 0.5979 (0.4069) loss 7.0222 (7.5240) grad_norm 1.9871 (2.1796) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:43:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][350/625] eta 0:01:52 lr 0.000941 wd 0.0500 time 0.3985 (0.4075) data time 0.0008 (0.0021) model time 0.3978 (0.4066) loss 6.6666 (7.5198) grad_norm 1.5452 (2.1922) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:43:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][360/625] eta 0:01:47 lr 0.000941 wd 0.0500 time 0.3994 (0.4073) data time 0.0007 (0.0021) model time 0.3987 (0.4063) loss 6.8780 (7.5202) grad_norm 1.7284 (2.1975) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:43:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][370/625] eta 0:01:44 lr 0.000941 wd 0.0500 time 0.5900 (0.4089) data time 0.0008 (0.0021) model time 0.5892 (0.4081) loss 8.3898 (7.5243) grad_norm 2.6024 (2.2018) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:43:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][380/625] eta 0:01:40 lr 0.000941 wd 0.0500 time 0.5909 (0.4114) data time 0.0007 (0.0020) model time 0.5902 (0.4111) loss 6.2644 (7.5232) grad_norm 1.9265 
(2.1944) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:43:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][390/625] eta 0:01:36 lr 0.000941 wd 0.0500 time 0.3974 (0.4125) data time 0.0008 (0.0020) model time 0.3966 (0.4123) loss 6.9373 (7.5171) grad_norm 3.1374 (2.1996) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:43:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][400/625] eta 0:01:32 lr 0.000941 wd 0.0500 time 0.4166 (0.4122) data time 0.0008 (0.0020) model time 0.4158 (0.4119) loss 7.2696 (7.5103) grad_norm 3.2628 (2.2101) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:43:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][410/625] eta 0:01:28 lr 0.000941 wd 0.0500 time 0.4003 (0.4119) data time 0.0007 (0.0020) model time 0.3996 (0.4115) loss 8.0409 (7.5104) grad_norm 3.3713 (2.2516) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:43:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][420/625] eta 0:01:25 lr 0.000941 wd 0.0500 time 0.3993 (0.4160) data time 0.0006 (0.0020) model time 0.3987 (0.4162) loss 7.0738 (7.5080) grad_norm 1.8625 (2.2461) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:43:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][430/625] eta 0:01:21 lr 0.000940 wd 0.0500 time 0.3955 (0.4156) data time 0.0006 (0.0019) model time 0.3949 (0.4157) loss 7.1253 (7.5166) grad_norm 2.1502 (2.2382) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:43:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][440/625] eta 0:01:16 lr 0.000940 wd 0.0500 time 0.3964 (0.4152) data time 0.0006 (0.0019) model time 0.3958 (0.4152) loss 7.7720 (7.5108) grad_norm 1.8175 (2.2315) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:43:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[106/300][450/625] eta 0:01:12 lr 0.000940 wd 0.0500 time 0.3983 (0.4148) data time 0.0006 (0.0019) model time 0.3977 (0.4147) loss 7.2824 (7.5088) grad_norm 2.6496 (2.2283) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:43:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][460/625] eta 0:01:08 lr 0.000940 wd 0.0500 time 0.4039 (0.4144) data time 0.0006 (0.0019) model time 0.4033 (0.4143) loss 7.5521 (7.5164) grad_norm 1.9040 (2.2385) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:44:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][470/625] eta 0:01:04 lr 0.000940 wd 0.0500 time 0.3986 (0.4141) data time 0.0007 (0.0018) model time 0.3980 (0.4139) loss 7.7213 (7.5213) grad_norm 1.4187 (2.2369) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:44:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][480/625] eta 0:01:00 lr 0.000940 wd 0.0500 time 0.3992 (0.4138) data time 0.0009 (0.0018) model time 0.3983 (0.4136) loss 7.2297 (7.5269) grad_norm 1.6075 (2.2338) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:44:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][490/625] eta 0:00:55 lr 0.000940 wd 0.0500 time 0.3967 (0.4135) data time 0.0008 (0.0018) model time 0.3959 (0.4132) loss 7.8710 (7.5313) grad_norm 2.2969 (2.2306) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:44:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][500/625] eta 0:00:51 lr 0.000940 wd 0.0500 time 0.3986 (0.4132) data time 0.0008 (0.0018) model time 0.3979 (0.4129) loss 7.4746 (7.5288) grad_norm 1.8055 (2.2251) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:44:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][510/625] eta 0:00:47 lr 0.000940 wd 0.0500 time 0.3940 (0.4129) data time 0.0009 (0.0018) model time 0.3931 (0.4126) loss 7.8306 
(7.5349) grad_norm 1.8693 (2.2269) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:44:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][520/625] eta 0:00:43 lr 0.000940 wd 0.0500 time 0.4022 (0.4127) data time 0.0007 (0.0018) model time 0.4015 (0.4123) loss 8.6739 (7.5287) grad_norm 2.1078 (2.2306) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:44:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][530/625] eta 0:00:39 lr 0.000940 wd 0.0500 time 0.3964 (0.4124) data time 0.0006 (0.0017) model time 0.3958 (0.4120) loss 7.2286 (7.5258) grad_norm 10.8023 (2.2463) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:44:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][540/625] eta 0:00:35 lr 0.000940 wd 0.0500 time 0.3952 (0.4122) data time 0.0006 (0.0017) model time 0.3946 (0.4117) loss 7.4550 (7.5308) grad_norm 1.9473 (2.2558) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:44:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][550/625] eta 0:00:30 lr 0.000939 wd 0.0500 time 0.3977 (0.4119) data time 0.0006 (0.0017) model time 0.3971 (0.4114) loss 7.2549 (7.5373) grad_norm 2.0507 (2.2570) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:44:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][560/625] eta 0:00:26 lr 0.000939 wd 0.0500 time 0.3972 (0.4120) data time 0.0008 (0.0017) model time 0.3964 (0.4115) loss 6.8044 (7.5338) grad_norm 2.3659 (2.2555) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:44:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][570/625] eta 0:00:22 lr 0.000939 wd 0.0500 time 0.3974 (0.4118) data time 0.0008 (0.0017) model time 0.3967 (0.4113) loss 7.8612 (7.5396) grad_norm 3.5790 (2.2548) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:44:44 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [106/300][580/625] eta 0:00:18 lr 0.000939 wd 0.0500 time 0.3993 (0.4116) data time 0.0009 (0.0017) model time 0.3984 (0.4110) loss 7.2659 (7.5364) grad_norm 1.7058 (2.2585) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:44:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][590/625] eta 0:00:14 lr 0.000939 wd 0.0500 time 0.4021 (0.4122) data time 0.0008 (0.0017) model time 0.4013 (0.4117) loss 7.4846 (7.5438) grad_norm 2.2084 (2.2535) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:44:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][600/625] eta 0:00:10 lr 0.000939 wd 0.0500 time 0.5809 (0.4132) data time 0.0007 (0.0016) model time 0.5802 (0.4128) loss 8.5412 (7.5451) grad_norm 3.3258 (2.2594) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:44:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][610/625] eta 0:00:06 lr 0.000939 wd 0.0500 time 0.3961 (0.4136) data time 0.0004 (0.0016) model time 0.3957 (0.4131) loss 8.5223 (7.5494) grad_norm 2.0622 (2.2682) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:45:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][620/625] eta 0:00:02 lr 0.000939 wd 0.0500 time 0.3951 (0.4133) data time 0.0005 (0.0016) model time 0.3946 (0.4128) loss 7.4028 (7.5476) grad_norm 1.9377 (2.2649) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:45:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 106 training takes 0:04:18 [2024-07-24 23:45:03 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 23:45:04 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 23:45:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.473 (0.473) Loss 0.6367 (0.6367) Acc@1 87.305 (87.305) Acc@5 98.096 (98.096) Mem 14939MB [2024-07-24 23:45:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.122) Loss 1.0312 (0.7752) Acc@1 77.295 (83.913) Acc@5 94.336 (96.902) Mem 14939MB [2024-07-24 23:45:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.105) Loss 1.1650 (0.9243) Acc@1 72.314 (80.094) Acc@5 92.969 (95.301) Mem 14939MB [2024-07-24 23:45:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.738 Acc@5 95.292 [2024-07-24 23:45:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.7% [2024-07-24 23:45:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.786 (0.786) Loss 0.6040 (0.6040) Acc@1 88.037 (88.037) Acc@5 98.291 (98.291) Mem 14939MB [2024-07-24 23:45:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.157) Loss 0.9849 (0.7492) Acc@1 79.199 (84.655) Acc@5 94.727 (97.190) Mem 14939MB [2024-07-24 23:45:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 1.1143 (0.8872) Acc@1 73.486 (81.031) Acc@5 93.896 (95.673) Mem 14939MB [2024-07-24 23:45:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.684 Acc@5 95.647 [2024-07-24 23:45:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.7% [2024-07-24 23:45:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.68% [2024-07-24 23:45:10 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-24 23:45:10 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-24 23:45:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][0/625] eta 0:08:53 lr 0.000939 wd 0.0500 time 0.8538 (0.8538) data time 0.4727 (0.4727) model time 0.0000 (0.0000) loss 8.9148 (8.9148) grad_norm 2.5512 (2.5512) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:45:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][10/625] eta 0:04:30 lr 0.000939 wd 0.0500 time 0.3973 (0.4400) data time 0.0008 (0.0438) model time 0.0000 (0.0000) loss 8.1016 (7.9239) grad_norm 2.1986 (2.6194) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:45:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][20/625] eta 0:04:14 lr 0.000939 wd 0.0500 time 0.3995 (0.4206) data time 0.0008 (0.0234) model time 0.0000 (0.0000) loss 7.4816 (7.8239) grad_norm 1.9126 (2.4897) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:45:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][30/625] eta 0:04:06 lr 0.000939 wd 0.0500 time 0.4044 (0.4139) data time 0.0008 (0.0162) model time 0.0000 (0.0000) loss 7.6424 (7.7506) grad_norm 1.9494 (2.2901) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:45:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][40/625] eta 0:03:59 lr 0.000938 wd 0.0500 time 0.4000 (0.4100) data time 0.0008 (0.0124) model time 0.0000 (0.0000) loss 5.9687 (7.7145) grad_norm 2.0149 (2.2609) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:45:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][50/625] eta 0:03:54 lr 0.000938 wd 0.0500 time 0.3979 (0.4076) data time 0.0009 (0.0102) model time 0.0000 (0.0000) loss 8.1821 (7.7197) grad_norm 1.4902 (2.1537) loss_scale 1024.0000 
(1024.0000) mem 14939MB [2024-07-24 23:45:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][60/625] eta 0:03:49 lr 0.000938 wd 0.0500 time 0.3976 (0.4060) data time 0.0008 (0.0087) model time 0.3968 (0.3969) loss 7.7900 (7.7523) grad_norm 1.6502 (2.1269) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:45:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][70/625] eta 0:03:44 lr 0.000938 wd 0.0500 time 0.3945 (0.4049) data time 0.0008 (0.0076) model time 0.3936 (0.3970) loss 7.3390 (7.7179) grad_norm 4.1692 (2.2463) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:45:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][80/625] eta 0:03:40 lr 0.000938 wd 0.0500 time 0.3981 (0.4042) data time 0.0007 (0.0068) model time 0.3974 (0.3974) loss 8.8803 (7.6790) grad_norm 1.5026 (2.2226) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:45:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][90/625] eta 0:03:35 lr 0.000938 wd 0.0500 time 0.4015 (0.4036) data time 0.0009 (0.0061) model time 0.4007 (0.3975) loss 6.8566 (7.6676) grad_norm 2.4805 (2.1970) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:45:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][100/625] eta 0:03:32 lr 0.000938 wd 0.0500 time 0.3979 (0.4047) data time 0.0009 (0.0056) model time 0.3970 (0.4007) loss 7.0125 (7.6814) grad_norm 1.7894 (2.1928) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:45:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][110/625] eta 0:03:28 lr 0.000938 wd 0.0500 time 0.4086 (0.4043) data time 0.0006 (0.0052) model time 0.4080 (0.4005) loss 6.9503 (7.6593) grad_norm 1.8326 (2.1717) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:45:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][120/625] eta 0:03:23 lr 
0.000938 wd 0.0500 time 0.3997 (0.4038) data time 0.0007 (0.0048) model time 0.3990 (0.4000) loss 5.7683 (7.5918) grad_norm 1.8497 (2.1350) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:46:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][130/625] eta 0:03:19 lr 0.000938 wd 0.0500 time 0.3943 (0.4033) data time 0.0009 (0.0046) model time 0.3934 (0.3996) loss 7.8231 (7.5753) grad_norm 3.2054 (2.1759) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:46:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][140/625] eta 0:03:15 lr 0.000938 wd 0.0500 time 0.3997 (0.4031) data time 0.0008 (0.0043) model time 0.3989 (0.3995) loss 6.5758 (7.5504) grad_norm 1.7316 (2.1957) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:46:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][150/625] eta 0:03:11 lr 0.000937 wd 0.0500 time 0.3984 (0.4031) data time 0.0008 (0.0041) model time 0.3976 (0.3998) loss 8.6952 (7.5952) grad_norm 1.5077 (2.1911) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:46:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][160/625] eta 0:03:07 lr 0.000937 wd 0.0500 time 0.3988 (0.4029) data time 0.0007 (0.0039) model time 0.3982 (0.3997) loss 9.2583 (7.5936) grad_norm 2.1869 (2.1633) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:46:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][170/625] eta 0:03:03 lr 0.000937 wd 0.0500 time 0.3992 (0.4028) data time 0.0008 (0.0037) model time 0.3984 (0.3997) loss 6.3065 (7.5774) grad_norm 2.0211 (2.1492) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:46:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][180/625] eta 0:02:59 lr 0.000937 wd 0.0500 time 0.4007 (0.4026) data time 0.0007 (0.0036) model time 0.4001 (0.3997) loss 6.8335 (7.5924) grad_norm 1.6472 (2.1603) 
loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:46:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][190/625] eta 0:02:57 lr 0.000937 wd 0.0500 time 0.4018 (0.4075) data time 0.0006 (0.0034) model time 0.4011 (0.4065) loss 6.9944 (7.5865) grad_norm 1.4594 (2.1397) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:46:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][200/625] eta 0:02:54 lr 0.000937 wd 0.0500 time 0.4002 (0.4101) data time 0.0009 (0.0033) model time 0.3993 (0.4099) loss 7.6568 (7.5701) grad_norm 2.6240 (2.1596) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:46:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][210/625] eta 0:02:50 lr 0.000937 wd 0.0500 time 0.3987 (0.4102) data time 0.0008 (0.0032) model time 0.3979 (0.4100) loss 7.7702 (7.5750) grad_norm 2.2590 (2.1810) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:46:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][220/625] eta 0:02:45 lr 0.000937 wd 0.0500 time 0.3987 (0.4098) data time 0.0006 (0.0031) model time 0.3981 (0.4095) loss 6.2774 (7.5751) grad_norm 1.8258 (2.1718) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:46:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][230/625] eta 0:02:41 lr 0.000937 wd 0.0500 time 0.4005 (0.4094) data time 0.0008 (0.0030) model time 0.3997 (0.4089) loss 8.4889 (7.5960) grad_norm 1.7686 (2.1898) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:46:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][240/625] eta 0:02:37 lr 0.000937 wd 0.0500 time 0.3986 (0.4091) data time 0.0008 (0.0029) model time 0.3978 (0.4086) loss 8.6173 (7.6108) grad_norm 2.0446 (2.2353) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:46:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[107/300][250/625] eta 0:02:33 lr 0.000937 wd 0.0500 time 0.4001 (0.4087) data time 0.0006 (0.0028) model time 0.3995 (0.4080) loss 7.3216 (7.6148) grad_norm 1.5800 (2.2450) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:46:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][260/625] eta 0:02:29 lr 0.000936 wd 0.0500 time 0.4017 (0.4084) data time 0.0009 (0.0028) model time 0.4008 (0.4076) loss 7.3349 (7.6205) grad_norm 2.1603 (2.2395) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:47:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][270/625] eta 0:02:24 lr 0.000936 wd 0.0500 time 0.3989 (0.4081) data time 0.0008 (0.0027) model time 0.3980 (0.4072) loss 7.8974 (7.6193) grad_norm 2.0767 (2.2388) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:47:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][280/625] eta 0:02:20 lr 0.000936 wd 0.0500 time 0.3996 (0.4078) data time 0.0007 (0.0026) model time 0.3989 (0.4069) loss 9.0466 (7.6188) grad_norm 1.4688 (2.2226) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:47:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][290/625] eta 0:02:16 lr 0.000936 wd 0.0500 time 0.3970 (0.4075) data time 0.0005 (0.0026) model time 0.3964 (0.4065) loss 8.8060 (7.6221) grad_norm 2.2500 (2.2097) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:47:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][300/625] eta 0:02:12 lr 0.000936 wd 0.0500 time 0.4029 (0.4073) data time 0.0009 (0.0025) model time 0.4020 (0.4062) loss 7.5222 (7.6237) grad_norm 2.0385 (2.2092) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:47:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][310/625] eta 0:02:08 lr 0.000936 wd 0.0500 time 0.4016 (0.4070) data time 0.0006 (0.0025) model time 0.4010 (0.4059) loss 5.9877 
(7.6169) grad_norm 1.7620 (2.2259) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:47:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][320/625] eta 0:02:04 lr 0.000936 wd 0.0500 time 0.3952 (0.4072) data time 0.0010 (0.0024) model time 0.3943 (0.4061) loss 5.8112 (7.6055) grad_norm 2.2537 (2.2170) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:47:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][330/625] eta 0:02:00 lr 0.000936 wd 0.0500 time 0.3998 (0.4070) data time 0.0006 (0.0024) model time 0.3992 (0.4059) loss 7.4698 (7.5936) grad_norm 1.5233 (2.2058) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:47:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][340/625] eta 0:01:55 lr 0.000936 wd 0.0500 time 0.4000 (0.4068) data time 0.0006 (0.0023) model time 0.3994 (0.4057) loss 6.7042 (7.5943) grad_norm 1.6969 (2.1953) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:47:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][350/625] eta 0:01:51 lr 0.000936 wd 0.0500 time 0.4014 (0.4066) data time 0.0007 (0.0023) model time 0.4006 (0.4054) loss 7.3257 (7.6042) grad_norm 1.8949 (2.1907) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:47:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][360/625] eta 0:01:47 lr 0.000936 wd 0.0500 time 0.4086 (0.4064) data time 0.0009 (0.0023) model time 0.4078 (0.4053) loss 8.7172 (7.6157) grad_norm 2.3396 (2.1930) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:47:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][370/625] eta 0:01:43 lr 0.000935 wd 0.0500 time 0.3996 (0.4063) data time 0.0008 (0.0022) model time 0.3988 (0.4051) loss 7.8763 (7.6111) grad_norm 2.0040 (2.1854) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:47:45 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [107/300][380/625] eta 0:01:39 lr 0.000935 wd 0.0500 time 0.3952 (0.4061) data time 0.0009 (0.0022) model time 0.3943 (0.4049) loss 8.3890 (7.6185) grad_norm 1.7943 (2.1921) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:47:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][390/625] eta 0:01:35 lr 0.000935 wd 0.0500 time 0.4001 (0.4059) data time 0.0008 (0.0022) model time 0.3993 (0.4046) loss 6.5736 (7.6296) grad_norm 1.8167 (2.2111) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:47:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][400/625] eta 0:01:31 lr 0.000935 wd 0.0500 time 0.5766 (0.4061) data time 0.0011 (0.0021) model time 0.5755 (0.4050) loss 7.9761 (7.6207) grad_norm 2.5893 (2.2174) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:47:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][410/625] eta 0:01:27 lr 0.000935 wd 0.0500 time 0.5970 (0.4080) data time 0.0007 (0.0021) model time 0.5963 (0.4071) loss 6.4775 (7.6150) grad_norm 5.1135 (2.2389) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:48:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][420/625] eta 0:01:23 lr 0.000935 wd 0.0500 time 0.4007 (0.4096) data time 0.0006 (0.0021) model time 0.4000 (0.4089) loss 7.9625 (7.6138) grad_norm 4.1124 (2.2469) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:48:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][430/625] eta 0:01:19 lr 0.000935 wd 0.0500 time 0.4031 (0.4098) data time 0.0007 (0.0021) model time 0.4024 (0.4091) loss 7.2500 (7.6100) grad_norm 2.8260 (2.2583) loss_scale 2048.0000 (1035.8794) mem 14939MB [2024-07-24 23:48:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][440/625] eta 0:01:15 lr 0.000935 wd 0.0500 time 0.3971 (0.4096) data time 0.0006 (0.0021) model 
time 0.3965 (0.4088) loss 7.6605 (7.6259) grad_norm 2.6335 (2.2643) loss_scale 2048.0000 (1058.8299) mem 14939MB [2024-07-24 23:48:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][450/625] eta 0:01:11 lr 0.000935 wd 0.0500 time 0.4009 (0.4094) data time 0.0009 (0.0020) model time 0.4000 (0.4087) loss 6.4029 (7.6129) grad_norm 1.5032 (2.2709) loss_scale 2048.0000 (1080.7627) mem 14939MB [2024-07-24 23:48:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][460/625] eta 0:01:07 lr 0.000935 wd 0.0500 time 0.3984 (0.4092) data time 0.0006 (0.0020) model time 0.3977 (0.4084) loss 7.1232 (7.6225) grad_norm 1.4795 (2.2719) loss_scale 2048.0000 (1101.7440) mem 14939MB [2024-07-24 23:48:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][470/625] eta 0:01:03 lr 0.000935 wd 0.0500 time 0.4048 (0.4091) data time 0.0006 (0.0020) model time 0.4041 (0.4082) loss 6.4515 (7.6159) grad_norm 1.9241 (2.2681) loss_scale 2048.0000 (1121.8344) mem 14939MB [2024-07-24 23:48:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][480/625] eta 0:00:59 lr 0.000935 wd 0.0500 time 0.4043 (0.4089) data time 0.0007 (0.0020) model time 0.4037 (0.4080) loss 7.8356 (7.6215) grad_norm 2.1854 (2.2636) loss_scale 2048.0000 (1141.0894) mem 14939MB [2024-07-24 23:48:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][490/625] eta 0:00:55 lr 0.000934 wd 0.0500 time 0.4075 (0.4087) data time 0.0007 (0.0020) model time 0.4068 (0.4078) loss 6.8112 (7.6282) grad_norm 1.6524 (2.2624) loss_scale 2048.0000 (1159.5601) mem 14939MB [2024-07-24 23:48:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][500/625] eta 0:00:51 lr 0.000934 wd 0.0500 time 0.3955 (0.4085) data time 0.0007 (0.0019) model time 0.3948 (0.4076) loss 7.4305 (7.6216) grad_norm 1.9961 (2.2577) loss_scale 2048.0000 (1177.2934) mem 14939MB [2024-07-24 23:48:39 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][510/625] eta 0:00:46 lr 0.000934 wd 0.0500 time 0.3976 (0.4083) data time 0.0007 (0.0019) model time 0.3970 (0.4074) loss 8.5694 (7.6209) grad_norm 2.1557 (2.2525) loss_scale 2048.0000 (1194.3327) mem 14939MB [2024-07-24 23:48:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][520/625] eta 0:00:42 lr 0.000934 wd 0.0500 time 0.3950 (0.4082) data time 0.0010 (0.0019) model time 0.3939 (0.4073) loss 7.2587 (7.6260) grad_norm 1.8299 (2.2475) loss_scale 2048.0000 (1210.7179) mem 14939MB [2024-07-24 23:48:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][530/625] eta 0:00:38 lr 0.000934 wd 0.0500 time 0.4061 (0.4081) data time 0.0006 (0.0019) model time 0.4055 (0.4071) loss 6.5763 (7.6185) grad_norm 2.4332 (2.2494) loss_scale 2048.0000 (1226.4859) mem 14939MB [2024-07-24 23:48:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][540/625] eta 0:00:34 lr 0.000934 wd 0.0500 time 0.3984 (0.4083) data time 0.0009 (0.0019) model time 0.3975 (0.4073) loss 8.2460 (7.6271) grad_norm 1.4676 (2.2560) loss_scale 2048.0000 (1241.6710) mem 14939MB [2024-07-24 23:48:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][550/625] eta 0:00:30 lr 0.000934 wd 0.0500 time 0.4001 (0.4081) data time 0.0007 (0.0018) model time 0.3995 (0.4072) loss 8.2919 (7.6361) grad_norm 2.2857 (2.2538) loss_scale 2048.0000 (1256.3049) mem 14939MB [2024-07-24 23:48:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][560/625] eta 0:00:26 lr 0.000934 wd 0.0500 time 0.4078 (0.4080) data time 0.0006 (0.0018) model time 0.4072 (0.4071) loss 7.8075 (7.6407) grad_norm 2.3947 (2.2500) loss_scale 2048.0000 (1270.4171) mem 14939MB [2024-07-24 23:49:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][570/625] eta 0:00:22 lr 0.000934 wd 0.0500 time 0.3984 (0.4079) 
data time 0.0006 (0.0018) model time 0.3977 (0.4069) loss 7.0567 (7.6365) grad_norm 1.9181 (2.2435) loss_scale 2048.0000 (1284.0350) mem 14939MB [2024-07-24 23:49:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][580/625] eta 0:00:18 lr 0.000934 wd 0.0500 time 0.4013 (0.4078) data time 0.0006 (0.0018) model time 0.4007 (0.4068) loss 7.1115 (7.6303) grad_norm 2.0394 (2.2393) loss_scale 2048.0000 (1297.1842) mem 14939MB [2024-07-24 23:49:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][590/625] eta 0:00:14 lr 0.000934 wd 0.0500 time 0.4082 (0.4077) data time 0.0007 (0.0018) model time 0.4075 (0.4067) loss 6.2493 (7.6295) grad_norm 2.0063 (2.2427) loss_scale 2048.0000 (1309.8883) mem 14939MB [2024-07-24 23:49:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][600/625] eta 0:00:10 lr 0.000933 wd 0.0500 time 0.4009 (0.4076) data time 0.0006 (0.0018) model time 0.4003 (0.4066) loss 6.5995 (7.6255) grad_norm 1.3492 (2.2369) loss_scale 2048.0000 (1322.1697) mem 14939MB [2024-07-24 23:49:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][610/625] eta 0:00:06 lr 0.000933 wd 0.0500 time 0.3988 (0.4076) data time 0.0006 (0.0018) model time 0.3982 (0.4066) loss 8.1057 (7.6211) grad_norm 2.2173 (2.2350) loss_scale 2048.0000 (1334.0491) mem 14939MB [2024-07-24 23:49:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][620/625] eta 0:00:02 lr 0.000933 wd 0.0500 time 0.3984 (0.4075) data time 0.0004 (0.0017) model time 0.3981 (0.4065) loss 8.5259 (7.6217) grad_norm 2.0607 (2.2320) loss_scale 2048.0000 (1345.5459) mem 14939MB [2024-07-24 23:49:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 107 training takes 0:04:14 [2024-07-24 23:49:25 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-24 23:49:26 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-24 23:49:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.435 (0.435) Loss 0.6294 (0.6294) Acc@1 87.402 (87.402) Acc@5 98.047 (98.047) Mem 14939MB [2024-07-24 23:49:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 1.0391 (0.7739) Acc@1 77.393 (84.038) Acc@5 94.189 (96.924) Mem 14939MB [2024-07-24 23:49:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.1279 (0.9168) Acc@1 73.291 (80.394) Acc@5 93.994 (95.336) Mem 14939MB [2024-07-24 23:49:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.964 Acc@5 95.284 [2024-07-24 23:49:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.0% [2024-07-24 23:49:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.96% [2024-07-24 23:49:29 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-24 23:49:30 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-24 23:49:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.454 (0.454) Loss 0.6035 (0.6035) Acc@1 88.135 (88.135) Acc@5 98.291 (98.291) Mem 14939MB [2024-07-24 23:49:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.9824 (0.7480) Acc@1 79.150 (84.708) Acc@5 94.775 (97.203) Mem 14939MB [2024-07-24 23:49:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.1123 (0.8856) Acc@1 73.535 (81.069) Acc@5 93.848 (95.687) Mem 14939MB [2024-07-24 23:49:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.722 Acc@5 95.657 [2024-07-24 23:49:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.7% [2024-07-24 23:49:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.72% [2024-07-24 23:49:32 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 23:49:33 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-24 23:49:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][0/625] eta 0:08:41 lr 0.000933 wd 0.0500 time 0.8351 (0.8351) data time 0.4535 (0.4535) model time 0.0000 (0.0000) loss 7.3451 (7.3451) grad_norm 1.3952 (1.3952) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:49:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][10/625] eta 0:05:42 lr 0.000933 wd 0.0500 time 0.5806 (0.5569) data time 0.0006 (0.0420) model time 0.0000 (0.0000) loss 6.9720 (7.5785) grad_norm 3.4650 (3.7877) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:49:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][20/625] eta 0:05:10 lr 0.000933 wd 0.0500 time 0.3992 (0.5137) data time 0.0009 (0.0227) model time 0.0000 (0.0000) loss 6.9435 (7.2674) grad_norm 1.9357 (3.3038) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:49:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][30/625] eta 0:04:44 lr 0.000933 wd 0.0500 time 0.4015 (0.4777) data time 0.0008 (0.0156) model time 0.0000 (0.0000) loss 7.5976 (7.3124) grad_norm 2.3362 (2.9591) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:49:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][40/625] eta 0:04:30 lr 0.000933 wd 0.0500 time 0.3997 (0.4624) data time 0.0007 (0.0121) model time 0.0000 (0.0000) loss 7.8849 (7.3714) grad_norm 1.5605 (2.6376) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:49:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][50/625] eta 0:04:18 lr 0.000933 wd 0.0500 time 0.4006 (0.4504) data time 0.0007 (0.0099) model time 0.0000 (0.0000) loss 7.8189 (7.4040) grad_norm 1.3647 (2.4638) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:50:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][60/625] eta 0:04:09 lr 0.000933 wd 0.0500 time 
0.4000 (0.4424) data time 0.0007 (0.0084) model time 0.3993 (0.4009) loss 7.6842 (7.4283) grad_norm 2.4977 (2.3864) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:50:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][70/625] eta 0:04:02 lr 0.000933 wd 0.0500 time 0.3981 (0.4367) data time 0.0007 (0.0074) model time 0.3974 (0.4008) loss 8.3556 (7.4318) grad_norm 2.5363 (2.3289) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:50:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][80/625] eta 0:03:55 lr 0.000933 wd 0.0500 time 0.3967 (0.4324) data time 0.0009 (0.0066) model time 0.3958 (0.4009) loss 8.4000 (7.4819) grad_norm 2.5109 (2.3031) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:50:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][90/625] eta 0:03:49 lr 0.000932 wd 0.0500 time 0.4170 (0.4289) data time 0.0006 (0.0059) model time 0.4164 (0.4007) loss 6.1564 (7.5286) grad_norm 2.8950 (2.2885) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:50:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][100/625] eta 0:03:43 lr 0.000932 wd 0.0500 time 0.3961 (0.4262) data time 0.0009 (0.0055) model time 0.3953 (0.4004) loss 8.5147 (7.5151) grad_norm 1.3951 (2.2654) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:50:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][110/625] eta 0:03:38 lr 0.000932 wd 0.0500 time 0.3972 (0.4238) data time 0.0006 (0.0051) model time 0.3965 (0.4001) loss 5.6379 (7.4404) grad_norm 3.5622 (2.2539) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:50:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][120/625] eta 0:03:33 lr 0.000932 wd 0.0500 time 0.4002 (0.4221) data time 0.0007 (0.0048) model time 0.3995 (0.4004) loss 7.0201 (7.4153) grad_norm 1.7298 (2.2118) loss_scale 2048.0000 (2048.0000) 
mem 14939MB [2024-07-24 23:50:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][130/625] eta 0:03:28 lr 0.000932 wd 0.0500 time 0.3983 (0.4203) data time 0.0008 (0.0045) model time 0.3975 (0.4001) loss 7.4875 (7.4138) grad_norm 1.5675 (2.1833) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:50:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][140/625] eta 0:03:23 lr 0.000932 wd 0.0500 time 0.3987 (0.4188) data time 0.0008 (0.0042) model time 0.3979 (0.3999) loss 8.4821 (7.4411) grad_norm 1.5443 (2.1663) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:50:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][150/625] eta 0:03:18 lr 0.000932 wd 0.0500 time 0.4014 (0.4176) data time 0.0009 (0.0040) model time 0.4006 (0.3999) loss 8.5875 (7.4863) grad_norm 2.5108 (2.1522) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:50:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][160/625] eta 0:03:13 lr 0.000932 wd 0.0500 time 0.3956 (0.4164) data time 0.0010 (0.0038) model time 0.3946 (0.3997) loss 6.2926 (7.4859) grad_norm 5.0323 (2.1873) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:50:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][170/625] eta 0:03:09 lr 0.000932 wd 0.0500 time 0.3985 (0.4156) data time 0.0008 (0.0036) model time 0.3977 (0.3998) loss 9.0060 (7.4992) grad_norm 2.2727 (2.2010) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:50:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][180/625] eta 0:03:04 lr 0.000932 wd 0.0500 time 0.4022 (0.4147) data time 0.0007 (0.0035) model time 0.4015 (0.3997) loss 6.7519 (7.5047) grad_norm 2.0067 (2.1865) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:50:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][190/625] eta 0:03:00 lr 0.000932 
wd 0.0500 time 0.3963 (0.4141) data time 0.0008 (0.0034) model time 0.3954 (0.3999) loss 8.2703 (7.5197) grad_norm 2.1278 (2.1680) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:50:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][200/625] eta 0:02:55 lr 0.000931 wd 0.0500 time 0.3969 (0.4133) data time 0.0007 (0.0032) model time 0.3962 (0.3998) loss 8.2376 (7.5261) grad_norm 2.1376 (2.1556) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:51:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][210/625] eta 0:02:51 lr 0.000931 wd 0.0500 time 0.3990 (0.4126) data time 0.0006 (0.0031) model time 0.3984 (0.3996) loss 8.5820 (7.5150) grad_norm 2.3874 (2.1371) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:51:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][220/625] eta 0:02:47 lr 0.000931 wd 0.0500 time 0.4097 (0.4141) data time 0.0009 (0.0030) model time 0.4088 (0.4023) loss 7.3233 (7.5158) grad_norm 3.2265 (2.1346) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:51:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][230/625] eta 0:02:45 lr 0.000931 wd 0.0500 time 0.5988 (0.4188) data time 0.0008 (0.0029) model time 0.5980 (0.4089) loss 8.9629 (7.5095) grad_norm 3.0785 (2.1354) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:51:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][240/625] eta 0:02:41 lr 0.000931 wd 0.0500 time 0.4111 (0.4206) data time 0.0006 (0.0028) model time 0.4105 (0.4117) loss 8.3646 (7.5098) grad_norm 2.2122 (2.1500) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:51:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][250/625] eta 0:02:37 lr 0.000931 wd 0.0500 time 0.3949 (0.4203) data time 0.0010 (0.0028) model time 0.3939 (0.4117) loss 6.8367 (7.4957) grad_norm 1.4419 (2.1347) loss_scale 
2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:51:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][260/625] eta 0:02:33 lr 0.000931 wd 0.0500 time 0.4042 (0.4205) data time 0.0007 (0.0027) model time 0.4034 (0.4122) loss 7.9443 (7.4969) grad_norm 2.1370 (2.1268) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:51:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][270/625] eta 0:02:29 lr 0.000931 wd 0.0500 time 0.4182 (0.4198) data time 0.0008 (0.0026) model time 0.4174 (0.4117) loss 6.0868 (7.4836) grad_norm 2.2100 (2.1181) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:51:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][280/625] eta 0:02:24 lr 0.000931 wd 0.0500 time 0.3975 (0.4191) data time 0.0007 (0.0026) model time 0.3968 (0.4111) loss 7.9605 (7.4703) grad_norm 2.0587 (2.1068) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:51:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][290/625] eta 0:02:20 lr 0.000931 wd 0.0500 time 0.4022 (0.4184) data time 0.0009 (0.0025) model time 0.4013 (0.4107) loss 8.6794 (7.4792) grad_norm 4.1156 (2.1210) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:51:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][300/625] eta 0:02:15 lr 0.000931 wd 0.0500 time 0.3966 (0.4179) data time 0.0007 (0.0025) model time 0.3959 (0.4103) loss 6.5908 (7.4665) grad_norm 2.1793 (2.1208) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:51:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][310/625] eta 0:02:11 lr 0.000930 wd 0.0500 time 0.4052 (0.4174) data time 0.0006 (0.0024) model time 0.4046 (0.4099) loss 7.6927 (7.4657) grad_norm 2.3789 (2.1319) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:51:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][320/625] 
eta 0:02:07 lr 0.000930 wd 0.0500 time 0.4001 (0.4169) data time 0.0009 (0.0024) model time 0.3992 (0.4097) loss 6.4621 (7.4779) grad_norm 2.2254 (2.1426) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:51:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][330/625] eta 0:02:02 lr 0.000930 wd 0.0500 time 0.4026 (0.4165) data time 0.0009 (0.0023) model time 0.4017 (0.4093) loss 7.5717 (7.4829) grad_norm 2.3579 (2.1489) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:51:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][340/625] eta 0:01:58 lr 0.000930 wd 0.0500 time 0.3989 (0.4161) data time 0.0009 (0.0023) model time 0.3980 (0.4091) loss 7.5690 (7.4921) grad_norm 2.2085 (2.1456) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:51:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][350/625] eta 0:01:54 lr 0.000930 wd 0.0500 time 0.3979 (0.4156) data time 0.0010 (0.0023) model time 0.3969 (0.4087) loss 7.5161 (7.4787) grad_norm 3.0056 (2.1461) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:52:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][360/625] eta 0:01:50 lr 0.000930 wd 0.0500 time 0.4023 (0.4152) data time 0.0006 (0.0022) model time 0.4017 (0.4084) loss 7.1580 (7.4702) grad_norm 2.0475 (2.1442) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:52:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][370/625] eta 0:01:45 lr 0.000930 wd 0.0500 time 0.3961 (0.4147) data time 0.0006 (0.0022) model time 0.3955 (0.4081) loss 8.3967 (7.4643) grad_norm 1.6154 (2.1405) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:52:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][380/625] eta 0:01:41 lr 0.000930 wd 0.0500 time 0.3989 (0.4143) data time 0.0009 (0.0022) model time 0.3979 (0.4078) loss 8.2131 (7.4719) grad_norm 1.8324 
(2.1342) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:52:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][390/625] eta 0:01:37 lr 0.000930 wd 0.0500 time 0.3956 (0.4139) data time 0.0008 (0.0021) model time 0.3948 (0.4075) loss 8.1350 (7.4851) grad_norm 1.7277 (2.1307) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:52:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][400/625] eta 0:01:33 lr 0.000930 wd 0.0500 time 0.3993 (0.4136) data time 0.0009 (0.0021) model time 0.3984 (0.4072) loss 8.6501 (7.4840) grad_norm 2.3337 (2.1320) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:52:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][410/625] eta 0:01:28 lr 0.000930 wd 0.0500 time 0.3985 (0.4132) data time 0.0008 (0.0021) model time 0.3977 (0.4070) loss 6.5444 (7.4831) grad_norm 2.5859 (2.1320) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:52:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][420/625] eta 0:01:24 lr 0.000929 wd 0.0500 time 0.4000 (0.4129) data time 0.0006 (0.0021) model time 0.3993 (0.4067) loss 8.1534 (7.4931) grad_norm 1.9584 (2.1259) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:52:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][430/625] eta 0:01:20 lr 0.000929 wd 0.0500 time 0.3959 (0.4126) data time 0.0007 (0.0020) model time 0.3952 (0.4065) loss 7.4665 (7.4966) grad_norm 1.5378 (2.1228) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:52:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][440/625] eta 0:01:16 lr 0.000929 wd 0.0500 time 0.6011 (0.4134) data time 0.0008 (0.0020) model time 0.6003 (0.4076) loss 8.2378 (7.4976) grad_norm 1.3730 (2.1150) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:52:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[108/300][450/625] eta 0:01:12 lr 0.000929 wd 0.0500 time 0.3990 (0.4146) data time 0.0006 (0.0020) model time 0.3984 (0.4091) loss 8.5463 (7.5005) grad_norm 1.3996 (2.1013) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:52:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][460/625] eta 0:01:08 lr 0.000929 wd 0.0500 time 0.3918 (0.4156) data time 0.0009 (0.0019) model time 0.3910 (0.4103) loss 6.6105 (7.5063) grad_norm 2.6639 (2.1088) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:52:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][470/625] eta 0:01:04 lr 0.000929 wd 0.0500 time 0.4020 (0.4156) data time 0.0006 (0.0019) model time 0.4013 (0.4104) loss 8.1254 (7.5109) grad_norm 3.3045 (2.1231) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:52:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][480/625] eta 0:01:00 lr 0.000929 wd 0.0500 time 0.3988 (0.4156) data time 0.0007 (0.0019) model time 0.3981 (0.4105) loss 8.6105 (7.5195) grad_norm 1.6353 (2.1307) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:52:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][490/625] eta 0:00:56 lr 0.000929 wd 0.0500 time 0.4000 (0.4154) data time 0.0008 (0.0019) model time 0.3993 (0.4103) loss 6.8765 (7.5281) grad_norm 1.8495 (2.1401) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:53:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][500/625] eta 0:00:51 lr 0.000929 wd 0.0500 time 0.4015 (0.4151) data time 0.0009 (0.0019) model time 0.4006 (0.4102) loss 7.3154 (7.5251) grad_norm 2.6499 (2.1389) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:53:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][510/625] eta 0:00:47 lr 0.000929 wd 0.0500 time 0.4020 (0.4148) data time 0.0006 (0.0019) model time 0.4014 (0.4099) loss 8.1843 
(7.5232) grad_norm 3.7813 (2.1432) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:53:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][520/625] eta 0:00:43 lr 0.000929 wd 0.0500 time 0.4024 (0.4145) data time 0.0007 (0.0018) model time 0.4017 (0.4097) loss 8.3060 (7.5198) grad_norm 2.0198 (2.1421) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:53:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][530/625] eta 0:00:39 lr 0.000929 wd 0.0500 time 0.3986 (0.4143) data time 0.0009 (0.0018) model time 0.3978 (0.4095) loss 7.6815 (7.5154) grad_norm 2.4504 (2.1412) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:53:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][540/625] eta 0:00:35 lr 0.000928 wd 0.0500 time 0.4039 (0.4140) data time 0.0007 (0.0018) model time 0.4032 (0.4093) loss 6.5464 (7.5066) grad_norm 1.6839 (2.1373) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:53:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][550/625] eta 0:00:31 lr 0.000928 wd 0.0500 time 0.3992 (0.4138) data time 0.0007 (0.0018) model time 0.3985 (0.4091) loss 7.2809 (7.5149) grad_norm 1.4520 (2.1418) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:53:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][560/625] eta 0:00:26 lr 0.000928 wd 0.0500 time 0.3991 (0.4135) data time 0.0009 (0.0018) model time 0.3982 (0.4089) loss 6.5585 (7.5153) grad_norm 2.9209 (2.1454) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:53:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][570/625] eta 0:00:22 lr 0.000928 wd 0.0500 time 0.4020 (0.4133) data time 0.0008 (0.0018) model time 0.4012 (0.4087) loss 6.8590 (7.5149) grad_norm 1.9663 (2.1384) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:53:33 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [108/300][580/625] eta 0:00:18 lr 0.000928 wd 0.0500 time 0.4095 (0.4131) data time 0.0009 (0.0017) model time 0.4086 (0.4085) loss 8.0907 (7.5214) grad_norm 1.7341 (2.1347) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:53:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][590/625] eta 0:00:14 lr 0.000928 wd 0.0500 time 0.3983 (0.4129) data time 0.0008 (0.0017) model time 0.3976 (0.4084) loss 7.9381 (7.5158) grad_norm 2.8789 (2.1383) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:53:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][600/625] eta 0:00:10 lr 0.000928 wd 0.0500 time 0.3982 (0.4127) data time 0.0009 (0.0017) model time 0.3973 (0.4082) loss 7.5124 (7.5190) grad_norm 1.9553 (2.1354) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:53:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][610/625] eta 0:00:06 lr 0.000928 wd 0.0500 time 0.3997 (0.4125) data time 0.0006 (0.0017) model time 0.3991 (0.4081) loss 5.9783 (7.5182) grad_norm 2.3082 (2.1385) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:53:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][620/625] eta 0:00:02 lr 0.000928 wd 0.0500 time 0.3982 (0.4123) data time 0.0006 (0.0017) model time 0.3976 (0.4079) loss 7.1565 (7.5268) grad_norm 1.9124 (2.1409) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:53:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 108 training takes 0:04:17 [2024-07-24 23:53:51 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 23:53:52 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 23:53:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.462 (0.462) Loss 0.6270 (0.6270) Acc@1 87.793 (87.793) Acc@5 98.291 (98.291) Mem 14939MB [2024-07-24 23:53:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 1.0205 (0.7754) Acc@1 77.686 (84.060) Acc@5 94.629 (97.030) Mem 14939MB [2024-07-24 23:53:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 1.1338 (0.9197) Acc@1 74.219 (80.341) Acc@5 93.896 (95.401) Mem 14939MB [2024-07-24 23:53:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.036 Acc@5 95.379 [2024-07-24 23:53:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.0% [2024-07-24 23:53:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.04% [2024-07-24 23:53:54 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-24 23:53:55 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-24 23:53:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.555 (0.555) Loss 0.6021 (0.6021) Acc@1 88.184 (88.184) Acc@5 98.291 (98.291) Mem 14939MB [2024-07-24 23:53:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.130) Loss 0.9814 (0.7470) Acc@1 79.199 (84.717) Acc@5 94.824 (97.208) Mem 14939MB [2024-07-24 23:53:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.109) Loss 1.1104 (0.8842) Acc@1 73.486 (81.080) Acc@5 93.848 (95.705) Mem 14939MB [2024-07-24 23:53:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.742 Acc@5 95.677 [2024-07-24 23:53:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.7% [2024-07-24 23:53:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.74% [2024-07-24 23:53:58 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 23:53:59 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-24 23:54:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][0/625] eta 0:14:34 lr 0.000928 wd 0.0500 time 1.3988 (1.3988) data time 0.4568 (0.4568) model time 0.0000 (0.0000) loss 6.9854 (6.9854) grad_norm 2.4251 (2.4251) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:54:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][10/625] eta 0:05:02 lr 0.000928 wd 0.0500 time 0.3946 (0.4912) data time 0.0008 (0.0424) model time 0.0000 (0.0000) loss 7.9255 (7.2412) grad_norm 2.1838 (2.1465) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:54:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][20/625] eta 0:04:31 lr 0.000927 wd 0.0500 time 0.4013 (0.4480) data time 0.0009 (0.0226) model time 0.0000 (0.0000) loss 9.7148 (7.2886) grad_norm 2.0465 (2.5710) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:54:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][30/625] eta 0:04:18 lr 0.000927 wd 0.0500 time 0.3971 (0.4341) data time 0.0007 (0.0156) model time 0.0000 (0.0000) loss 6.3886 (7.3193) grad_norm 2.4137 (2.4698) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:54:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][40/625] eta 0:04:21 lr 0.000927 wd 0.0500 time 0.5895 (0.4466) data time 0.0007 (0.0120) model time 0.0000 (0.0000) loss 7.6353 (7.2850) grad_norm 1.8323 (2.3653) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:54:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][50/625] eta 0:04:24 lr 0.000927 wd 0.0500 time 0.5983 (0.4592) data time 0.0008 (0.0098) model time 0.0000 (0.0000) loss 8.5409 (7.3371) grad_norm 1.6308 (2.2585) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:54:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][60/625] eta 0:04:15 lr 0.000927 wd 0.0500 time 
0.4006 (0.4518) data time 0.0008 (0.0084) model time 0.3998 (0.4130) loss 8.3964 (7.3434) grad_norm 1.9567 (2.2137) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:54:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][70/625] eta 0:04:06 lr 0.000927 wd 0.0500 time 0.3978 (0.4443) data time 0.0009 (0.0074) model time 0.3969 (0.4051) loss 7.2796 (7.4356) grad_norm 1.5612 (2.1582) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:54:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][80/625] eta 0:03:59 lr 0.000927 wd 0.0500 time 0.3979 (0.4387) data time 0.0007 (0.0066) model time 0.3972 (0.4028) loss 8.0483 (7.4858) grad_norm 2.4109 (2.1355) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:54:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][90/625] eta 0:03:52 lr 0.000927 wd 0.0500 time 0.3961 (0.4342) data time 0.0009 (0.0060) model time 0.3952 (0.4013) loss 7.3285 (7.5055) grad_norm 2.2278 (2.1409) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:54:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][100/625] eta 0:03:46 lr 0.000927 wd 0.0500 time 0.3985 (0.4307) data time 0.0006 (0.0055) model time 0.3979 (0.4005) loss 7.6700 (7.5070) grad_norm 2.1040 (2.1600) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:54:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][110/625] eta 0:03:40 lr 0.000927 wd 0.0500 time 0.4008 (0.4277) data time 0.0006 (0.0050) model time 0.4001 (0.3999) loss 8.3879 (7.5317) grad_norm 1.9834 (2.1545) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:54:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][120/625] eta 0:03:34 lr 0.000927 wd 0.0500 time 0.3966 (0.4253) data time 0.0007 (0.0047) model time 0.3959 (0.3997) loss 7.0296 (7.5294) grad_norm 3.1070 (2.1657) loss_scale 2048.0000 (2048.0000) 
mem 14939MB [2024-07-24 23:54:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][130/625] eta 0:03:29 lr 0.000926 wd 0.0500 time 0.3959 (0.4232) data time 0.0009 (0.0044) model time 0.3950 (0.3993) loss 7.6092 (7.5452) grad_norm 2.3398 (2.2390) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:54:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][140/625] eta 0:03:24 lr 0.000926 wd 0.0500 time 0.3952 (0.4216) data time 0.0007 (0.0042) model time 0.3944 (0.3992) loss 7.8324 (7.5460) grad_norm 1.6811 (2.2379) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:55:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][150/625] eta 0:03:19 lr 0.000926 wd 0.0500 time 0.4052 (0.4200) data time 0.0007 (0.0040) model time 0.4045 (0.3991) loss 6.7230 (7.5547) grad_norm 1.5893 (2.2090) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:55:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][160/625] eta 0:03:14 lr 0.000926 wd 0.0500 time 0.3947 (0.4187) data time 0.0008 (0.0038) model time 0.3939 (0.3989) loss 7.5555 (7.5735) grad_norm 2.3462 (2.2039) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:55:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][170/625] eta 0:03:09 lr 0.000926 wd 0.0500 time 0.3988 (0.4176) data time 0.0009 (0.0036) model time 0.3979 (0.3989) loss 7.3637 (7.6022) grad_norm 2.3790 (2.2259) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:55:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][180/625] eta 0:03:05 lr 0.000926 wd 0.0500 time 0.4011 (0.4166) data time 0.0006 (0.0035) model time 0.4004 (0.3990) loss 8.5687 (7.6110) grad_norm 1.7530 (2.2228) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:55:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][190/625] eta 0:03:00 lr 0.000926 
wd 0.0500 time 0.3976 (0.4157) data time 0.0007 (0.0033) model time 0.3969 (0.3989) loss 6.2531 (7.6030) grad_norm 1.9044 (2.2160) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:55:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][200/625] eta 0:02:56 lr 0.000926 wd 0.0500 time 0.3973 (0.4149) data time 0.0007 (0.0032) model time 0.3966 (0.3989) loss 7.5720 (7.6035) grad_norm 2.1048 (2.2227) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-24 23:55:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][210/625] eta 0:02:52 lr 0.000926 wd 0.0500 time 0.3985 (0.4147) data time 0.0008 (0.0031) model time 0.3977 (0.3996) loss 6.6134 (7.5905) grad_norm 2.3872 (inf) loss_scale 1024.0000 (2018.8815) mem 14939MB [2024-07-24 23:55:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][220/625] eta 0:02:47 lr 0.000926 wd 0.0500 time 0.3998 (0.4140) data time 0.0009 (0.0030) model time 0.3989 (0.3995) loss 8.3931 (7.6001) grad_norm 2.0410 (inf) loss_scale 1024.0000 (1973.8643) mem 14939MB [2024-07-24 23:55:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][230/625] eta 0:02:43 lr 0.000926 wd 0.0500 time 0.3943 (0.4133) data time 0.0007 (0.0029) model time 0.3936 (0.3994) loss 9.3882 (7.6243) grad_norm 1.7107 (inf) loss_scale 1024.0000 (1932.7446) mem 14939MB [2024-07-24 23:55:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][240/625] eta 0:02:38 lr 0.000925 wd 0.0500 time 0.3973 (0.4127) data time 0.0010 (0.0028) model time 0.3963 (0.3993) loss 7.6144 (7.6307) grad_norm 2.2029 (inf) loss_scale 1024.0000 (1895.0373) mem 14939MB [2024-07-24 23:55:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][250/625] eta 0:02:34 lr 0.000925 wd 0.0500 time 0.5758 (0.4129) data time 0.0009 (0.0027) model time 0.5749 (0.4001) loss 6.6876 (7.6222) grad_norm 1.7085 (inf) loss_scale 1024.0000 
(1860.3347) mem 14939MB [2024-07-24 23:55:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][260/625] eta 0:02:31 lr 0.000925 wd 0.0500 time 0.5874 (0.4146) data time 0.0008 (0.0027) model time 0.5866 (0.4028) loss 7.3991 (7.6129) grad_norm 1.6099 (inf) loss_scale 1024.0000 (1828.2912) mem 14939MB [2024-07-24 23:55:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][270/625] eta 0:02:28 lr 0.000925 wd 0.0500 time 0.5982 (0.4177) data time 0.0007 (0.0026) model time 0.5975 (0.4071) loss 8.2798 (7.6175) grad_norm 2.6779 (inf) loss_scale 1024.0000 (1798.6125) mem 14939MB [2024-07-24 23:55:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][280/625] eta 0:02:24 lr 0.000925 wd 0.0500 time 0.3972 (0.4174) data time 0.0009 (0.0025) model time 0.3963 (0.4072) loss 7.9177 (7.6344) grad_norm 3.3179 (inf) loss_scale 1024.0000 (1771.0463) mem 14939MB [2024-07-24 23:56:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][290/625] eta 0:02:19 lr 0.000925 wd 0.0500 time 0.3983 (0.4168) data time 0.0007 (0.0025) model time 0.3976 (0.4068) loss 6.5210 (7.6252) grad_norm 2.8949 (inf) loss_scale 1024.0000 (1745.3746) mem 14939MB [2024-07-24 23:56:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][300/625] eta 0:02:15 lr 0.000925 wd 0.0500 time 0.3957 (0.4162) data time 0.0009 (0.0024) model time 0.3949 (0.4064) loss 5.2674 (7.6044) grad_norm 2.2791 (inf) loss_scale 1024.0000 (1721.4086) mem 14939MB [2024-07-24 23:56:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][310/625] eta 0:02:10 lr 0.000925 wd 0.0500 time 0.3992 (0.4156) data time 0.0007 (0.0024) model time 0.3985 (0.4061) loss 7.5072 (7.6032) grad_norm 1.8042 (inf) loss_scale 1024.0000 (1698.9839) mem 14939MB [2024-07-24 23:56:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][320/625] eta 0:02:06 lr 0.000925 wd 
0.0500 time 0.3986 (0.4151) data time 0.0008 (0.0023) model time 0.3978 (0.4058) loss 7.5803 (7.5998) grad_norm 1.5544 (inf) loss_scale 1024.0000 (1677.9564) mem 14939MB [2024-07-24 23:56:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][330/625] eta 0:02:02 lr 0.000925 wd 0.0500 time 0.3980 (0.4146) data time 0.0009 (0.0023) model time 0.3971 (0.4055) loss 6.7409 (7.6017) grad_norm 1.8766 (inf) loss_scale 1024.0000 (1658.1994) mem 14939MB [2024-07-24 23:56:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][340/625] eta 0:01:58 lr 0.000925 wd 0.0500 time 0.3981 (0.4142) data time 0.0009 (0.0023) model time 0.3973 (0.4054) loss 6.5767 (7.5934) grad_norm 2.3556 (inf) loss_scale 1024.0000 (1639.6012) mem 14939MB [2024-07-24 23:56:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][350/625] eta 0:01:53 lr 0.000925 wd 0.0500 time 0.3977 (0.4138) data time 0.0006 (0.0022) model time 0.3971 (0.4051) loss 7.0178 (7.5920) grad_norm 1.7321 (inf) loss_scale 1024.0000 (1622.0627) mem 14939MB [2024-07-24 23:56:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][360/625] eta 0:01:49 lr 0.000924 wd 0.0500 time 0.3986 (0.4133) data time 0.0007 (0.0022) model time 0.3979 (0.4049) loss 7.0531 (7.5929) grad_norm 1.9412 (inf) loss_scale 1024.0000 (1605.4958) mem 14939MB [2024-07-24 23:56:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][370/625] eta 0:01:45 lr 0.000924 wd 0.0500 time 0.4008 (0.4130) data time 0.0006 (0.0022) model time 0.4002 (0.4048) loss 8.8335 (7.5895) grad_norm 3.1230 (inf) loss_scale 1024.0000 (1589.8221) mem 14939MB [2024-07-24 23:56:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][380/625] eta 0:01:41 lr 0.000924 wd 0.0500 time 0.4025 (0.4128) data time 0.0007 (0.0021) model time 0.4018 (0.4046) loss 8.2532 (7.5859) grad_norm 1.7303 (inf) loss_scale 1024.0000 (1574.9711) mem 
14939MB [2024-07-24 23:56:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][390/625] eta 0:01:36 lr 0.000924 wd 0.0500 time 0.4086 (0.4125) data time 0.0008 (0.0021) model time 0.4077 (0.4046) loss 6.1420 (7.5771) grad_norm 2.8418 (inf) loss_scale 1024.0000 (1560.8798) mem 14939MB [2024-07-24 23:56:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][400/625] eta 0:01:32 lr 0.000924 wd 0.0500 time 0.3969 (0.4122) data time 0.0013 (0.0021) model time 0.3956 (0.4044) loss 8.7775 (7.5848) grad_norm 1.6995 (inf) loss_scale 1024.0000 (1547.4913) mem 14939MB [2024-07-24 23:56:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][410/625] eta 0:01:28 lr 0.000924 wd 0.0500 time 0.4001 (0.4119) data time 0.0008 (0.0020) model time 0.3993 (0.4043) loss 7.0621 (7.5803) grad_norm 2.3153 (inf) loss_scale 1024.0000 (1534.7543) mem 14939MB [2024-07-24 23:56:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][420/625] eta 0:01:24 lr 0.000924 wd 0.0500 time 0.4006 (0.4117) data time 0.0008 (0.0020) model time 0.3998 (0.4042) loss 8.0334 (7.5806) grad_norm 2.3122 (inf) loss_scale 1024.0000 (1522.6223) mem 14939MB [2024-07-24 23:56:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][430/625] eta 0:01:20 lr 0.000924 wd 0.0500 time 0.3996 (0.4118) data time 0.0006 (0.0020) model time 0.3990 (0.4045) loss 7.3321 (7.5767) grad_norm 1.7983 (inf) loss_scale 1024.0000 (1511.0534) mem 14939MB [2024-07-24 23:57:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][440/625] eta 0:01:16 lr 0.000924 wd 0.0500 time 0.3993 (0.4115) data time 0.0008 (0.0020) model time 0.3985 (0.4044) loss 6.5789 (7.5753) grad_norm 1.5988 (inf) loss_scale 1024.0000 (1500.0091) mem 14939MB [2024-07-24 23:57:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][450/625] eta 0:01:11 lr 0.000924 wd 0.0500 time 0.4005 
(0.4113) data time 0.0006 (0.0020) model time 0.3999 (0.4042) loss 7.6545 (7.5773) grad_norm 1.9494 (inf) loss_scale 1024.0000 (1489.4545) mem 14939MB [2024-07-24 23:57:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][460/625] eta 0:01:07 lr 0.000924 wd 0.0500 time 0.4083 (0.4111) data time 0.0006 (0.0019) model time 0.4077 (0.4041) loss 7.4708 (7.5742) grad_norm 2.7092 (inf) loss_scale 1024.0000 (1479.3579) mem 14939MB [2024-07-24 23:57:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][470/625] eta 0:01:03 lr 0.000923 wd 0.0500 time 0.3972 (0.4109) data time 0.0010 (0.0019) model time 0.3963 (0.4040) loss 6.0797 (7.5726) grad_norm 2.6130 (inf) loss_scale 1024.0000 (1469.6900) mem 14939MB [2024-07-24 23:57:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][480/625] eta 0:00:59 lr 0.000923 wd 0.0500 time 0.5915 (0.4127) data time 0.0007 (0.0019) model time 0.5908 (0.4062) loss 8.3143 (7.5791) grad_norm 2.7616 (inf) loss_scale 1024.0000 (1460.4241) mem 14939MB [2024-07-24 23:57:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][490/625] eta 0:00:55 lr 0.000923 wd 0.0500 time 0.6073 (0.4147) data time 0.0010 (0.0019) model time 0.6063 (0.4086) loss 6.8484 (7.5799) grad_norm 2.0807 (inf) loss_scale 1024.0000 (1451.5356) mem 14939MB [2024-07-24 23:57:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][500/625] eta 0:00:51 lr 0.000923 wd 0.0500 time 0.3976 (0.4146) data time 0.0008 (0.0019) model time 0.3968 (0.4086) loss 5.8483 (7.5774) grad_norm 2.3074 (inf) loss_scale 1024.0000 (1443.0020) mem 14939MB [2024-07-24 23:57:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][510/625] eta 0:00:47 lr 0.000923 wd 0.0500 time 0.4001 (0.4143) data time 0.0009 (0.0018) model time 0.3992 (0.4084) loss 8.1265 (7.5899) grad_norm 1.6008 (inf) loss_scale 1024.0000 (1434.8023) mem 14939MB [2024-07-24 
23:57:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][520/625] eta 0:00:43 lr 0.000923 wd 0.0500 time 0.4009 (0.4141) data time 0.0009 (0.0018) model time 0.4000 (0.4082) loss 8.9749 (7.5932) grad_norm 2.3690 (inf) loss_scale 1024.0000 (1426.9175) mem 14939MB [2024-07-24 23:57:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][530/625] eta 0:00:39 lr 0.000923 wd 0.0500 time 0.4008 (0.4138) data time 0.0007 (0.0018) model time 0.4002 (0.4080) loss 7.3837 (7.5896) grad_norm 1.7245 (inf) loss_scale 1024.0000 (1419.3296) mem 14939MB [2024-07-24 23:57:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][540/625] eta 0:00:35 lr 0.000923 wd 0.0500 time 0.4007 (0.4135) data time 0.0009 (0.0018) model time 0.3998 (0.4078) loss 8.1410 (7.5973) grad_norm 1.5909 (inf) loss_scale 1024.0000 (1412.0222) mem 14939MB [2024-07-24 23:57:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][550/625] eta 0:00:31 lr 0.000923 wd 0.0500 time 0.4035 (0.4134) data time 0.0009 (0.0018) model time 0.4027 (0.4077) loss 7.4195 (7.6018) grad_norm 3.0466 (inf) loss_scale 1024.0000 (1404.9800) mem 14939MB [2024-07-24 23:57:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][560/625] eta 0:00:26 lr 0.000923 wd 0.0500 time 0.3983 (0.4132) data time 0.0007 (0.0018) model time 0.3976 (0.4076) loss 7.6635 (7.6054) grad_norm 1.7316 (inf) loss_scale 1024.0000 (1398.1889) mem 14939MB [2024-07-24 23:57:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][570/625] eta 0:00:22 lr 0.000923 wd 0.0500 time 0.3984 (0.4129) data time 0.0007 (0.0017) model time 0.3977 (0.4074) loss 8.0463 (7.6054) grad_norm 2.9335 (inf) loss_scale 1024.0000 (1391.6357) mem 14939MB [2024-07-24 23:57:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][580/625] eta 0:00:18 lr 0.000922 wd 0.0500 time 0.3968 (0.4127) data time 
0.0007 (0.0017) model time 0.3961 (0.4072) loss 7.3049 (7.6106) grad_norm 1.4261 (inf) loss_scale 1024.0000 (1385.3081) mem 14939MB [2024-07-24 23:58:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][590/625] eta 0:00:14 lr 0.000922 wd 0.0500 time 0.3993 (0.4124) data time 0.0008 (0.0017) model time 0.3986 (0.4071) loss 6.1508 (7.6048) grad_norm 4.4714 (inf) loss_scale 1024.0000 (1379.1946) mem 14939MB [2024-07-24 23:58:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][600/625] eta 0:00:10 lr 0.000922 wd 0.0500 time 0.4008 (0.4122) data time 0.0008 (0.0017) model time 0.4001 (0.4069) loss 8.9731 (7.6042) grad_norm 1.4529 (inf) loss_scale 1024.0000 (1373.2845) mem 14939MB [2024-07-24 23:58:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][610/625] eta 0:00:06 lr 0.000922 wd 0.0500 time 0.3992 (0.4121) data time 0.0006 (0.0017) model time 0.3986 (0.4068) loss 8.9490 (7.6074) grad_norm 2.4979 (inf) loss_scale 1024.0000 (1367.5679) mem 14939MB [2024-07-24 23:58:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][620/625] eta 0:00:02 lr 0.000922 wd 0.0500 time 0.3975 (0.4118) data time 0.0004 (0.0017) model time 0.3971 (0.4066) loss 7.0013 (7.6071) grad_norm 1.7133 (inf) loss_scale 1024.0000 (1362.0354) mem 14939MB [2024-07-24 23:58:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 109 training takes 0:04:17 [2024-07-24 23:58:16 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-24 23:58:17 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-24 23:58:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.666 (0.666) Loss 0.5933 (0.5933) Acc@1 88.135 (88.135) Acc@5 98.340 (98.340) Mem 14939MB [2024-07-24 23:58:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.139) Loss 1.0146 (0.7658) Acc@1 77.881 (83.953) Acc@5 94.775 (97.124) Mem 14939MB [2024-07-24 23:58:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.113) Loss 1.1504 (0.9128) Acc@1 73.682 (80.420) Acc@5 93.506 (95.475) Mem 14939MB [2024-07-24 23:58:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.120 Acc@5 95.443 [2024-07-24 23:58:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.1% [2024-07-24 23:58:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.12% [2024-07-24 23:58:20 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-24 23:58:21 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-24 23:58:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.484 (0.484) Loss 0.6011 (0.6011) Acc@1 88.232 (88.232) Acc@5 98.291 (98.291) Mem 14939MB [2024-07-24 23:58:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.122) Loss 0.9795 (0.7459) Acc@1 79.199 (84.770) Acc@5 94.873 (97.221) Mem 14939MB [2024-07-24 23:58:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 1.1084 (0.8829) Acc@1 73.633 (81.152) Acc@5 93.848 (95.712) Mem 14939MB [2024-07-24 23:58:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.810 Acc@5 95.685 [2024-07-24 23:58:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.8% [2024-07-24 23:58:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.81% [2024-07-24 23:58:23 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-24 23:58:24 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-24 23:58:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][0/625] eta 0:09:12 lr 0.000922 wd 0.0500 time 0.8848 (0.8848) data time 0.5023 (0.5023) model time 0.0000 (0.0000) loss 8.8883 (8.8883) grad_norm 2.5536 (2.5536) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:58:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][10/625] eta 0:04:33 lr 0.000922 wd 0.0500 time 0.4009 (0.4453) data time 0.0006 (0.0465) model time 0.0000 (0.0000) loss 8.6745 (7.6765) grad_norm 2.0152 (1.9819) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:58:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][20/625] eta 0:04:16 lr 0.000922 wd 0.0500 time 0.3989 (0.4234) data time 0.0006 (0.0247) model time 0.0000 (0.0000) loss 6.4119 (7.6189) grad_norm 2.2866 (2.0200) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:58:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][30/625] eta 0:04:07 lr 0.000922 wd 0.0500 time 0.4021 (0.4159) data time 0.0007 (0.0170) model time 0.0000 (0.0000) loss 7.5605 (7.5844) grad_norm 1.6622 (2.0305) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:58:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][40/625] eta 0:04:00 lr 0.000922 wd 0.0500 time 0.3994 (0.4117) data time 0.0007 (0.0131) model time 0.0000 (0.0000) loss 7.3606 (7.5453) grad_norm 1.5593 (2.1444) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:58:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][50/625] eta 0:03:55 lr 0.000922 wd 0.0500 time 0.3995 (0.4094) data time 0.0008 (0.0107) model time 0.0000 (0.0000) loss 7.2150 (7.5992) grad_norm 2.5845 (2.1768) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:58:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][60/625] eta 0:03:50 lr 0.000921 wd 0.0500 time 
0.3993 (0.4079) data time 0.0008 (0.0091) model time 0.3985 (0.3993) loss 8.0874 (7.6712) grad_norm 1.5157 (2.1630) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:58:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][70/625] eta 0:03:47 lr 0.000921 wd 0.0500 time 0.3999 (0.4094) data time 0.0007 (0.0079) model time 0.3992 (0.4085) loss 8.3950 (7.6948) grad_norm 2.1077 (2.1273) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:58:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][80/625] eta 0:03:49 lr 0.000921 wd 0.0500 time 0.5881 (0.4220) data time 0.0006 (0.0071) model time 0.5875 (0.4424) loss 7.0993 (7.6734) grad_norm 1.3918 (2.1541) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:59:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][90/625] eta 0:03:48 lr 0.000921 wd 0.0500 time 0.4001 (0.4274) data time 0.0007 (0.0064) model time 0.3994 (0.4495) loss 6.6579 (7.6792) grad_norm 2.4668 (2.2757) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:59:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][100/625] eta 0:03:43 lr 0.000921 wd 0.0500 time 0.3999 (0.4264) data time 0.0006 (0.0059) model time 0.3993 (0.4428) loss 9.5314 (7.6725) grad_norm 1.9595 (2.2489) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:59:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][110/625] eta 0:03:38 lr 0.000921 wd 0.0500 time 0.4003 (0.4240) data time 0.0006 (0.0054) model time 0.3997 (0.4356) loss 8.2931 (7.6612) grad_norm 1.9697 (2.2753) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:59:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][120/625] eta 0:03:33 lr 0.000921 wd 0.0500 time 0.4083 (0.4222) data time 0.0009 (0.0050) model time 0.4074 (0.4307) loss 6.1135 (7.6564) grad_norm 1.8666 (2.3092) loss_scale 1024.0000 (1024.0000) 
mem 14939MB [2024-07-24 23:59:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][130/625] eta 0:03:28 lr 0.000921 wd 0.0500 time 0.4002 (0.4206) data time 0.0007 (0.0047) model time 0.3995 (0.4268) loss 7.6289 (7.6759) grad_norm 2.7258 (2.3238) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:59:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][140/625] eta 0:03:23 lr 0.000921 wd 0.0500 time 0.3990 (0.4197) data time 0.0007 (0.0045) model time 0.3983 (0.4246) loss 6.2687 (7.6515) grad_norm 1.9849 (2.3246) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:59:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][150/625] eta 0:03:18 lr 0.000921 wd 0.0500 time 0.4036 (0.4184) data time 0.0006 (0.0042) model time 0.4030 (0.4221) loss 5.6931 (7.6463) grad_norm 3.2681 (2.3224) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:59:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][160/625] eta 0:03:14 lr 0.000921 wd 0.0500 time 0.4003 (0.4186) data time 0.0008 (0.0041) model time 0.3995 (0.4219) loss 6.8035 (7.6102) grad_norm 2.0477 (2.3423) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:59:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][170/625] eta 0:03:09 lr 0.000920 wd 0.0500 time 0.3998 (0.4176) data time 0.0007 (0.0039) model time 0.3991 (0.4200) loss 8.4172 (7.6075) grad_norm 2.0001 (2.3229) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:59:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][180/625] eta 0:03:05 lr 0.000920 wd 0.0500 time 0.4011 (0.4166) data time 0.0008 (0.0037) model time 0.4003 (0.4184) loss 7.3676 (7.5814) grad_norm 1.7707 (2.3326) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:59:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][190/625] eta 0:03:00 lr 0.000920 
wd 0.0500 time 0.3979 (0.4157) data time 0.0006 (0.0036) model time 0.3972 (0.4169) loss 6.1895 (7.5799) grad_norm 1.5444 (2.3132) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:59:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][200/625] eta 0:02:56 lr 0.000920 wd 0.0500 time 0.3988 (0.4150) data time 0.0007 (0.0035) model time 0.3981 (0.4158) loss 7.0545 (7.5752) grad_norm 2.5864 (2.2966) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:59:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][210/625] eta 0:02:51 lr 0.000920 wd 0.0500 time 0.3993 (0.4142) data time 0.0007 (0.0034) model time 0.3986 (0.4147) loss 5.6388 (7.5709) grad_norm 2.4716 (2.2934) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-24 23:59:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][220/625] eta 0:02:47 lr 0.000920 wd 0.0500 time 0.3982 (0.4136) data time 0.0009 (0.0033) model time 0.3974 (0.4138) loss 7.9229 (7.5663) grad_norm 2.1862 (2.2783) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:00:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][230/625] eta 0:02:43 lr 0.000920 wd 0.0500 time 0.4070 (0.4130) data time 0.0007 (0.0032) model time 0.4063 (0.4130) loss 8.4779 (7.5522) grad_norm 1.3869 (2.2698) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:00:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][240/625] eta 0:02:38 lr 0.000920 wd 0.0500 time 0.3966 (0.4124) data time 0.0006 (0.0031) model time 0.3960 (0.4122) loss 8.2520 (7.5398) grad_norm 1.7469 (2.2716) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:00:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][250/625] eta 0:02:34 lr 0.000920 wd 0.0500 time 0.3987 (0.4119) data time 0.0008 (0.0030) model time 0.3979 (0.4115) loss 8.1070 (7.5552) grad_norm 1.6902 (2.2527) loss_scale 
1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:00:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][260/625] eta 0:02:30 lr 0.000920 wd 0.0500 time 0.3981 (0.4114) data time 0.0008 (0.0029) model time 0.3972 (0.4109) loss 8.0312 (7.5679) grad_norm 1.7327 (2.2450) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:00:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][270/625] eta 0:02:25 lr 0.000920 wd 0.0500 time 0.3965 (0.4110) data time 0.0006 (0.0028) model time 0.3959 (0.4103) loss 8.6077 (7.5724) grad_norm 2.1121 (2.2388) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:00:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][280/625] eta 0:02:21 lr 0.000919 wd 0.0500 time 0.3993 (0.4108) data time 0.0007 (0.0031) model time 0.3987 (0.4098) loss 6.4196 (7.5597) grad_norm 1.9343 (2.2286) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:00:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][290/625] eta 0:02:17 lr 0.000919 wd 0.0500 time 0.6697 (0.4118) data time 0.0007 (0.0030) model time 0.6691 (0.4109) loss 8.5573 (7.5629) grad_norm 2.4864 (2.2243) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:00:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][300/625] eta 0:02:14 lr 0.000919 wd 0.0500 time 0.5983 (0.4151) data time 0.0008 (0.0029) model time 0.5975 (0.4149) loss 8.9531 (7.5688) grad_norm 2.6037 (2.2185) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:00:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][310/625] eta 0:02:11 lr 0.000919 wd 0.0500 time 0.3998 (0.4170) data time 0.0009 (0.0028) model time 0.3989 (0.4172) loss 7.5215 (7.5754) grad_norm 1.5677 (2.1979) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:00:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][320/625] 
eta 0:02:07 lr 0.000919 wd 0.0500 time 0.4010 (0.4169) data time 0.0006 (0.0028) model time 0.4004 (0.4170) loss 8.3033 (7.5767) grad_norm 1.6467 (2.1917) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:00:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][330/625] eta 0:02:02 lr 0.000919 wd 0.0500 time 0.3962 (0.4164) data time 0.0009 (0.0027) model time 0.3953 (0.4164) loss 7.5083 (7.5672) grad_norm 1.5162 (2.1911) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:00:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][340/625] eta 0:01:58 lr 0.000919 wd 0.0500 time 0.3977 (0.4159) data time 0.0007 (0.0027) model time 0.3970 (0.4157) loss 8.1047 (7.5715) grad_norm 1.7967 (2.2141) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:00:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][350/625] eta 0:01:54 lr 0.000919 wd 0.0500 time 0.4023 (0.4155) data time 0.0008 (0.0026) model time 0.4015 (0.4152) loss 7.7871 (7.5787) grad_norm 2.7297 (2.2232) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:00:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][360/625] eta 0:01:49 lr 0.000919 wd 0.0500 time 0.3994 (0.4150) data time 0.0006 (0.0026) model time 0.3988 (0.4147) loss 7.1134 (7.5709) grad_norm 2.1270 (2.2181) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:00:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][370/625] eta 0:01:45 lr 0.000919 wd 0.0500 time 0.4042 (0.4146) data time 0.0007 (0.0025) model time 0.4035 (0.4142) loss 8.2871 (7.5747) grad_norm 2.8261 (2.2166) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:01:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][380/625] eta 0:01:41 lr 0.000919 wd 0.0500 time 0.4034 (0.4147) data time 0.0008 (0.0025) model time 0.4026 (0.4143) loss 6.2138 (7.5656) grad_norm 2.0809 
(2.2170) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:01:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][390/625] eta 0:01:37 lr 0.000918 wd 0.0500 time 0.3972 (0.4143) data time 0.0009 (0.0025) model time 0.3964 (0.4138) loss 7.8035 (7.5611) grad_norm 2.8377 (2.2262) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:01:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][400/625] eta 0:01:33 lr 0.000918 wd 0.0500 time 0.3980 (0.4139) data time 0.0008 (0.0024) model time 0.3972 (0.4133) loss 7.2259 (7.5625) grad_norm 2.0674 (inf) loss_scale 512.0000 (1018.8928) mem 14939MB [2024-07-25 00:01:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][410/625] eta 0:01:28 lr 0.000918 wd 0.0500 time 0.3970 (0.4136) data time 0.0007 (0.0024) model time 0.3963 (0.4130) loss 6.1096 (7.5618) grad_norm 2.5786 (inf) loss_scale 512.0000 (1006.5596) mem 14939MB [2024-07-25 00:01:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][420/625] eta 0:01:24 lr 0.000918 wd 0.0500 time 0.4068 (0.4133) data time 0.0006 (0.0023) model time 0.4062 (0.4127) loss 8.0054 (7.5613) grad_norm 2.1559 (inf) loss_scale 512.0000 (994.8124) mem 14939MB [2024-07-25 00:01:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][430/625] eta 0:01:21 lr 0.000918 wd 0.0500 time 0.3956 (0.4173) data time 0.0007 (0.0023) model time 0.3949 (0.4171) loss 6.1856 (7.5499) grad_norm 1.5520 (inf) loss_scale 512.0000 (983.6102) mem 14939MB [2024-07-25 00:01:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][440/625] eta 0:01:17 lr 0.000918 wd 0.0500 time 0.4054 (0.4169) data time 0.0007 (0.0023) model time 0.4048 (0.4166) loss 7.4666 (7.5506) grad_norm 2.7750 (inf) loss_scale 512.0000 (972.9161) mem 14939MB [2024-07-25 00:01:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][450/625] eta 
0:01:12 lr 0.000918 wd 0.0500 time 0.3927 (0.4166) data time 0.0008 (0.0023) model time 0.3918 (0.4163) loss 8.0342 (7.5544) grad_norm 1.7755 (inf) loss_scale 512.0000 (962.6962) mem 14939MB [2024-07-25 00:01:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][460/625] eta 0:01:08 lr 0.000918 wd 0.0500 time 0.3966 (0.4162) data time 0.0007 (0.0022) model time 0.3959 (0.4159) loss 7.7879 (7.5590) grad_norm 2.1743 (inf) loss_scale 512.0000 (952.9197) mem 14939MB [2024-07-25 00:01:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][470/625] eta 0:01:04 lr 0.000918 wd 0.0500 time 0.4058 (0.4160) data time 0.0007 (0.0022) model time 0.4051 (0.4156) loss 7.9809 (7.5601) grad_norm 2.3111 (inf) loss_scale 512.0000 (943.5584) mem 14939MB [2024-07-25 00:01:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][480/625] eta 0:01:00 lr 0.000918 wd 0.0500 time 0.3941 (0.4156) data time 0.0007 (0.0022) model time 0.3934 (0.4152) loss 7.3570 (7.5504) grad_norm 1.7195 (inf) loss_scale 512.0000 (934.5863) mem 14939MB [2024-07-25 00:01:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][490/625] eta 0:00:56 lr 0.000918 wd 0.0500 time 0.3972 (0.4154) data time 0.0007 (0.0022) model time 0.3964 (0.4149) loss 6.9831 (7.5478) grad_norm 1.9926 (inf) loss_scale 512.0000 (925.9796) mem 14939MB [2024-07-25 00:01:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][500/625] eta 0:00:51 lr 0.000917 wd 0.0500 time 0.4409 (0.4152) data time 0.0009 (0.0021) model time 0.4400 (0.4146) loss 8.3687 (7.5466) grad_norm 3.2085 (inf) loss_scale 512.0000 (917.7166) mem 14939MB [2024-07-25 00:01:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][510/625] eta 0:00:47 lr 0.000917 wd 0.0500 time 0.3932 (0.4152) data time 0.0007 (0.0021) model time 0.3925 (0.4146) loss 8.9355 (7.5450) grad_norm 2.5178 (inf) loss_scale 512.0000 
(909.7769) mem 14939MB [2024-07-25 00:02:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][520/625] eta 0:00:43 lr 0.000917 wd 0.0500 time 0.5883 (0.4176) data time 0.0007 (0.0021) model time 0.5876 (0.4173) loss 7.4108 (7.5371) grad_norm 3.0613 (inf) loss_scale 512.0000 (902.1420) mem 14939MB [2024-07-25 00:02:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][530/625] eta 0:00:39 lr 0.000917 wd 0.0500 time 0.3964 (0.4186) data time 0.0009 (0.0021) model time 0.3955 (0.4184) loss 8.7839 (7.5319) grad_norm 1.8503 (inf) loss_scale 512.0000 (894.7947) mem 14939MB [2024-07-25 00:02:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][540/625] eta 0:00:35 lr 0.000917 wd 0.0500 time 0.4088 (0.4186) data time 0.0008 (0.0021) model time 0.4080 (0.4184) loss 7.8524 (7.5384) grad_norm 2.4543 (inf) loss_scale 512.0000 (887.7190) mem 14939MB [2024-07-25 00:02:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][550/625] eta 0:00:31 lr 0.000917 wd 0.0500 time 0.4010 (0.4183) data time 0.0008 (0.0020) model time 0.4002 (0.4180) loss 8.3818 (7.5409) grad_norm 1.6637 (inf) loss_scale 512.0000 (880.9002) mem 14939MB [2024-07-25 00:02:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][560/625] eta 0:00:27 lr 0.000917 wd 0.0500 time 0.3980 (0.4180) data time 0.0007 (0.0020) model time 0.3973 (0.4177) loss 8.8340 (7.5399) grad_norm 1.8909 (inf) loss_scale 512.0000 (874.3244) mem 14939MB [2024-07-25 00:02:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][570/625] eta 0:00:22 lr 0.000917 wd 0.0500 time 0.4089 (0.4177) data time 0.0007 (0.0020) model time 0.4082 (0.4174) loss 7.1996 (7.5413) grad_norm 2.0622 (inf) loss_scale 512.0000 (867.9790) mem 14939MB [2024-07-25 00:02:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][580/625] eta 0:00:18 lr 0.000917 wd 0.0500 time 
0.4000 (0.4175) data time 0.0007 (0.0020) model time 0.3993 (0.4171) loss 7.7871 (7.5391) grad_norm 1.7767 (inf) loss_scale 512.0000 (861.8520) mem 14939MB [2024-07-25 00:02:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][590/625] eta 0:00:14 lr 0.000917 wd 0.0500 time 0.4020 (0.4172) data time 0.0007 (0.0020) model time 0.4013 (0.4168) loss 7.0715 (7.5349) grad_norm 1.6292 (inf) loss_scale 512.0000 (855.9323) mem 14939MB [2024-07-25 00:02:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][600/625] eta 0:00:10 lr 0.000917 wd 0.0500 time 0.3945 (0.4174) data time 0.0008 (0.0019) model time 0.3936 (0.4170) loss 7.3153 (7.5376) grad_norm 3.0206 (inf) loss_scale 512.0000 (850.2097) mem 14939MB [2024-07-25 00:02:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][610/625] eta 0:00:06 lr 0.000917 wd 0.0500 time 0.3960 (0.4171) data time 0.0005 (0.0019) model time 0.3954 (0.4167) loss 7.4838 (7.5457) grad_norm 4.5671 (inf) loss_scale 512.0000 (844.6743) mem 14939MB [2024-07-25 00:02:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][620/625] eta 0:00:02 lr 0.000916 wd 0.0500 time 0.4129 (0.4169) data time 0.0006 (0.0019) model time 0.4124 (0.4164) loss 6.6177 (7.5449) grad_norm 1.4685 (inf) loss_scale 512.0000 (839.3172) mem 14939MB [2024-07-25 00:02:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 110 training takes 0:04:20 [2024-07-25 00:02:45 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 00:02:46 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 00:02:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.470 (0.470) Loss 0.6089 (0.6089) Acc@1 88.184 (88.184) Acc@5 98.096 (98.096) Mem 14939MB
[2024-07-25 00:02:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.122) Loss 0.9888 (0.7621) Acc@1 77.539 (84.144) Acc@5 94.971 (97.066) Mem 14939MB
[2024-07-25 00:02:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 1.1777 (0.9097) Acc@1 72.412 (80.385) Acc@5 92.334 (95.336) Mem 14939MB
[2024-07-25 00:02:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.010 Acc@5 95.327
[2024-07-25 00:02:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.0%
[2024-07-25 00:02:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.786 (0.786) Loss 0.5996 (0.5996) Acc@1 88.330 (88.330) Acc@5 98.340 (98.340) Mem 14939MB
[2024-07-25 00:02:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.154) Loss 0.9775 (0.7445) Acc@1 79.395 (84.814) Acc@5 94.824 (97.230) Mem 14939MB
[2024-07-25 00:02:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.121) Loss 1.1064 (0.8813) Acc@1 73.779 (81.190) Acc@5 93.848 (95.724) Mem 14939MB
[2024-07-25 00:02:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.850 Acc@5 95.693
[2024-07-25 00:02:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.9%
[2024-07-25 00:02:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.85%
[2024-07-25 00:02:51 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 00:02:52 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 00:02:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][0/625] eta 0:07:58 lr 0.000916 wd 0.0500 time 0.7648 (0.7648) data time 0.3600 (0.3600) model time 0.0000 (0.0000) loss 9.0839 (9.0839) grad_norm 1.4563 (1.4563) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:02:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][10/625] eta 0:04:27 lr 0.000916 wd 0.0500 time 0.3979 (0.4344) data time 0.0008 (0.0336) model time 0.0000 (0.0000) loss 7.6600 (7.6367) grad_norm 1.7341 (2.0407) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:03:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][20/625] eta 0:04:14 lr 0.000916 wd 0.0500 time 0.3956 (0.4211) data time 0.0007 (0.0188) model time 0.0000 (0.0000) loss 7.5755 (7.6937) grad_norm 1.8415 (1.9827) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:03:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][30/625] eta 0:04:07 lr 0.000916 wd 0.0500 time 0.4097 (0.4162) data time 0.0007 (0.0130) model time 0.0000 (0.0000) loss 7.2779 (7.6834) grad_norm 2.1010 (1.9871) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:03:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][40/625] eta 0:04:01 lr 0.000916 wd 0.0500 time 0.4002 (0.4126) data time 0.0010 (0.0102) model time 0.0000 (0.0000) loss 7.8463 (7.6298) grad_norm 2.1903 (1.9298) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:03:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][50/625] eta 0:03:56 lr 0.000916 wd 0.0500 time 0.3993 (0.4107) data time 0.0007 (0.0084) model time 0.0000 (0.0000) loss 8.9619 (7.6254) grad_norm 2.5235 (2.0098) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 00:03:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][60/625] eta 0:03:51 lr 0.000916 wd 0.0500 time 0.4069 (0.4094) data time 0.0007 (0.0072) model time 0.4062 (0.4021) loss 6.8042 (7.6022) grad_norm 1.9715 (2.0394) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:03:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][70/625] eta 0:03:47 lr 0.000916 wd 0.0500 time 0.3933 (0.4092) data time 0.0010 (0.0063) model time 0.3923 (0.4041) loss 7.6498 (7.5809) grad_norm 2.3252 (2.1348) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:03:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][80/625] eta 0:03:42 lr 0.000916 wd 0.0500 time 0.3988 (0.4088) data time 0.0007 (0.0057) model time 0.3981 (0.4044) loss 8.0893 (7.6158) grad_norm 1.4229 (2.1231) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:03:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][90/625] eta 0:03:38 lr 0.000916 wd 0.0500 time 0.4043 (0.4083) data time 0.0006 (0.0052) model time 0.4037 (0.4043) loss 8.1666 (7.6271) grad_norm 3.5538 (2.1612) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:03:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][100/625] eta 0:03:34 lr 0.000915 wd 0.0500 time 0.3951 (0.4079) data time 0.0008 (0.0048) model time 0.3942 (0.4040) loss 8.1527 (7.6346) grad_norm 2.0119 (2.1563) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:03:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][110/625] eta 0:03:32 lr 0.000915 wd 0.0500 time 0.4109 (0.4124) data time 0.0008 (0.0044) model time 0.4100 (0.4127) loss 9.1045 (7.6735) grad_norm 2.5008 (2.1702) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:03:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][120/625] eta 0:03:33 lr 0.000915 wd 0.0500 time 
0.5931 (0.4224) data time 0.0007 (0.0042) model time 0.5924 (0.4299) loss 7.5871 (7.6596) grad_norm 1.6306 (2.1768) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:03:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][130/625] eta 0:03:29 lr 0.000915 wd 0.0500 time 0.3934 (0.4236) data time 0.0006 (0.0039) model time 0.3928 (0.4308) loss 7.2795 (7.6856) grad_norm 2.4653 (2.2356) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:03:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][140/625] eta 0:03:25 lr 0.000915 wd 0.0500 time 0.3988 (0.4235) data time 0.0010 (0.0038) model time 0.3978 (0.4296) loss 8.1585 (7.6631) grad_norm 1.7477 (2.2236) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:03:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][150/625] eta 0:03:20 lr 0.000915 wd 0.0500 time 0.4096 (0.4226) data time 0.0008 (0.0036) model time 0.4088 (0.4275) loss 8.9395 (7.6745) grad_norm 2.4969 (2.2581) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:04:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][160/625] eta 0:03:15 lr 0.000915 wd 0.0500 time 0.3937 (0.4213) data time 0.0008 (0.0035) model time 0.3928 (0.4250) loss 7.8655 (7.6482) grad_norm 2.4970 (2.2702) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:04:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][170/625] eta 0:03:11 lr 0.000915 wd 0.0500 time 0.3984 (0.4202) data time 0.0006 (0.0034) model time 0.3978 (0.4230) loss 7.2232 (7.6439) grad_norm 2.0046 (2.2909) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:04:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][180/625] eta 0:03:06 lr 0.000915 wd 0.0500 time 0.4141 (0.4195) data time 0.0006 (0.0032) model time 0.4134 (0.4217) loss 8.1934 (7.6301) grad_norm 2.4477 (2.2684) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 00:04:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][190/625] eta 0:03:02 lr 0.000915 wd 0.0500 time 0.3987 (0.4186) data time 0.0008 (0.0031) model time 0.3978 (0.4202) loss 7.2009 (7.6341) grad_norm 2.9070 (2.2507) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:04:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][200/625] eta 0:02:57 lr 0.000915 wd 0.0500 time 0.4005 (0.4179) data time 0.0008 (0.0031) model time 0.3997 (0.4191) loss 6.2371 (7.6259) grad_norm 1.5612 (2.2452) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:04:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][210/625] eta 0:02:53 lr 0.000914 wd 0.0500 time 0.4118 (0.4174) data time 0.0006 (0.0030) model time 0.4112 (0.4182) loss 8.1209 (7.6467) grad_norm 2.5003 (2.2545) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:04:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][220/625] eta 0:02:48 lr 0.000914 wd 0.0500 time 0.3964 (0.4167) data time 0.0008 (0.0029) model time 0.3956 (0.4172) loss 6.7823 (7.6388) grad_norm 2.1419 (2.2713) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:04:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][230/625] eta 0:02:44 lr 0.000914 wd 0.0500 time 0.3997 (0.4161) data time 0.0007 (0.0028) model time 0.3990 (0.4164) loss 7.0178 (7.6070) grad_norm 2.2998 (2.2620) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:04:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][240/625] eta 0:02:40 lr 0.000914 wd 0.0500 time 0.4191 (0.4157) data time 0.0007 (0.0028) model time 0.4185 (0.4158) loss 6.9465 (7.6045) grad_norm 2.4319 (2.2699) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:04:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][250/625] eta 0:02:35 lr 0.000914 wd 0.0500 time 
0.3944 (0.4151) data time 0.0008 (0.0027) model time 0.3936 (0.4150) loss 6.5681 (7.6061) grad_norm 1.3520 (2.2570) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:04:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][260/625] eta 0:02:31 lr 0.000914 wd 0.0500 time 0.3981 (0.4146) data time 0.0008 (0.0026) model time 0.3973 (0.4143) loss 8.1588 (7.5985) grad_norm 2.2506 (2.2403) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:04:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][270/625] eta 0:02:27 lr 0.000914 wd 0.0500 time 0.4263 (0.4142) data time 0.0007 (0.0026) model time 0.4257 (0.4137) loss 8.7538 (7.6023) grad_norm 1.8550 (2.2365) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:04:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][280/625] eta 0:02:22 lr 0.000914 wd 0.0500 time 0.3946 (0.4136) data time 0.0008 (0.0025) model time 0.3937 (0.4131) loss 6.1322 (7.6012) grad_norm 2.7326 (2.2377) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:04:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][290/625] eta 0:02:18 lr 0.000914 wd 0.0500 time 0.3986 (0.4133) data time 0.0006 (0.0025) model time 0.3980 (0.4125) loss 6.5375 (7.5958) grad_norm 2.2610 (2.2387) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:04:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][300/625] eta 0:02:14 lr 0.000914 wd 0.0500 time 0.4077 (0.4130) data time 0.0008 (0.0025) model time 0.4068 (0.4121) loss 7.3986 (7.5935) grad_norm 1.8162 (2.2331) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:05:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][310/625] eta 0:02:09 lr 0.000914 wd 0.0500 time 0.3943 (0.4126) data time 0.0006 (0.0024) model time 0.3937 (0.4117) loss 8.5648 (7.5930) grad_norm 1.7063 (2.2352) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 00:05:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][320/625] eta 0:02:05 lr 0.000913 wd 0.0500 time 0.3984 (0.4122) data time 0.0006 (0.0024) model time 0.3978 (0.4113) loss 8.1220 (7.5913) grad_norm 3.2601 (2.2437) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:05:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][330/625] eta 0:02:01 lr 0.000913 wd 0.0500 time 0.4071 (0.4131) data time 0.0007 (0.0024) model time 0.4064 (0.4122) loss 7.8024 (7.6010) grad_norm 1.4034 (2.2506) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:05:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][340/625] eta 0:01:58 lr 0.000913 wd 0.0500 time 0.6074 (0.4161) data time 0.0009 (0.0023) model time 0.6066 (0.4157) loss 6.5956 (7.5999) grad_norm 1.8391 (2.2382) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:05:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][350/625] eta 0:01:54 lr 0.000913 wd 0.0500 time 0.5346 (0.4174) data time 0.0009 (0.0023) model time 0.5337 (0.4173) loss 7.4441 (7.6030) grad_norm 3.7711 (2.2383) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:05:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][360/625] eta 0:01:50 lr 0.000913 wd 0.0500 time 0.4066 (0.4170) data time 0.0009 (0.0022) model time 0.4057 (0.4168) loss 7.7179 (7.6061) grad_norm 2.7616 (2.2436) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:05:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][370/625] eta 0:01:46 lr 0.000913 wd 0.0500 time 0.3950 (0.4166) data time 0.0007 (0.0022) model time 0.3943 (0.4163) loss 9.3281 (7.6027) grad_norm 2.5657 (2.2433) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:05:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][380/625] eta 0:01:41 lr 0.000913 wd 0.0500 time 
0.4008 (0.4163) data time 0.0008 (0.0022) model time 0.4000 (0.4159) loss 8.0636 (7.6095) grad_norm 2.5937 (2.2407) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:05:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][390/625] eta 0:01:37 lr 0.000913 wd 0.0500 time 0.4113 (0.4160) data time 0.0007 (0.0021) model time 0.4106 (0.4156) loss 7.4744 (7.6028) grad_norm 3.1977 (2.2623) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:05:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][400/625] eta 0:01:33 lr 0.000913 wd 0.0500 time 0.3955 (0.4156) data time 0.0007 (0.0021) model time 0.3949 (0.4152) loss 7.9860 (7.6061) grad_norm 2.9054 (2.2775) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:05:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][410/625] eta 0:01:29 lr 0.000913 wd 0.0500 time 0.4000 (0.4153) data time 0.0009 (0.0021) model time 0.3991 (0.4148) loss 7.6956 (7.6140) grad_norm 1.6014 (2.2726) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:05:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][420/625] eta 0:01:25 lr 0.000913 wd 0.0500 time 0.4047 (0.4150) data time 0.0006 (0.0021) model time 0.4041 (0.4144) loss 7.3023 (7.6075) grad_norm 2.3111 (2.2643) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:05:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][430/625] eta 0:01:20 lr 0.000912 wd 0.0500 time 0.3995 (0.4146) data time 0.0007 (0.0020) model time 0.3988 (0.4140) loss 8.3532 (7.5939) grad_norm 1.7703 (2.2506) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:05:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][440/625] eta 0:01:16 lr 0.000912 wd 0.0500 time 0.3975 (0.4143) data time 0.0007 (0.0020) model time 0.3968 (0.4136) loss 7.8941 (7.5993) grad_norm 2.0240 (2.2468) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 00:05:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][450/625] eta 0:01:12 lr 0.000912 wd 0.0500 time 0.4173 (0.4141) data time 0.0007 (0.0020) model time 0.4166 (0.4134) loss 8.5767 (7.5978) grad_norm 1.8866 (2.2557) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:06:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][460/625] eta 0:01:08 lr 0.000912 wd 0.0500 time 0.3929 (0.4139) data time 0.0007 (0.0020) model time 0.3921 (0.4131) loss 6.9875 (7.6021) grad_norm 2.7266 (2.2641) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:06:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][470/625] eta 0:01:04 lr 0.000912 wd 0.0500 time 0.4059 (0.4137) data time 0.0007 (0.0019) model time 0.4053 (0.4129) loss 6.8184 (7.5937) grad_norm 2.2369 (2.2558) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:06:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][480/625] eta 0:00:59 lr 0.000912 wd 0.0500 time 0.4073 (0.4134) data time 0.0007 (0.0019) model time 0.4066 (0.4126) loss 6.3176 (7.5892) grad_norm 2.0352 (2.2492) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:06:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][490/625] eta 0:00:55 lr 0.000912 wd 0.0500 time 0.3972 (0.4132) data time 0.0006 (0.0019) model time 0.3966 (0.4124) loss 8.6540 (7.5904) grad_norm 2.2616 (2.2479) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:06:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][500/625] eta 0:00:51 lr 0.000912 wd 0.0500 time 0.4018 (0.4130) data time 0.0006 (0.0019) model time 0.4012 (0.4121) loss 7.7572 (7.5859) grad_norm 1.4417 (2.2457) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:06:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][510/625] eta 0:00:47 lr 0.000912 wd 0.0500 time 
0.4180 (0.4128) data time 0.0007 (0.0019) model time 0.4173 (0.4119) loss 6.6266 (7.5910) grad_norm 1.8674 (2.2398) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:06:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][520/625] eta 0:00:43 lr 0.000912 wd 0.0500 time 0.3939 (0.4126) data time 0.0008 (0.0018) model time 0.3931 (0.4117) loss 7.6989 (7.5885) grad_norm 1.7887 (2.2413) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:06:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][530/625] eta 0:00:39 lr 0.000912 wd 0.0500 time 0.4008 (0.4124) data time 0.0008 (0.0018) model time 0.4000 (0.4115) loss 5.9036 (7.5868) grad_norm 1.5126 (2.2314) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:06:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][540/625] eta 0:00:35 lr 0.000911 wd 0.0500 time 0.4070 (0.4123) data time 0.0007 (0.0018) model time 0.4063 (0.4113) loss 6.7103 (7.5882) grad_norm 2.0814 (2.2281) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:06:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][550/625] eta 0:00:30 lr 0.000911 wd 0.0500 time 0.5746 (0.4131) data time 0.0006 (0.0018) model time 0.5740 (0.4122) loss 8.8357 (7.5908) grad_norm 2.4121 (2.2267) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:06:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][560/625] eta 0:00:26 lr 0.000911 wd 0.0500 time 0.3930 (0.4145) data time 0.0009 (0.0018) model time 0.3921 (0.4138) loss 8.4990 (7.5888) grad_norm 2.3287 (2.2245) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:06:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][570/625] eta 0:00:22 lr 0.000911 wd 0.0500 time 0.3975 (0.4151) data time 0.0006 (0.0018) model time 0.3969 (0.4145) loss 6.2894 (7.5865) grad_norm 1.6959 (2.2215) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 00:06:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][580/625] eta 0:00:18 lr 0.000911 wd 0.0500 time 0.4087 (0.4152) data time 0.0009 (0.0017) model time 0.4078 (0.4145) loss 7.4130 (7.5871) grad_norm 3.2519 (2.2250) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:06:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][590/625] eta 0:00:14 lr 0.000911 wd 0.0500 time 0.3925 (0.4150) data time 0.0008 (0.0017) model time 0.3917 (0.4143) loss 7.5476 (7.5911) grad_norm 2.1777 (2.2194) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:07:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][600/625] eta 0:00:10 lr 0.000911 wd 0.0500 time 0.4026 (0.4148) data time 0.0007 (0.0017) model time 0.4019 (0.4140) loss 7.9737 (7.5905) grad_norm 2.6693 (2.2171) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:07:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][610/625] eta 0:00:06 lr 0.000911 wd 0.0500 time 0.4110 (0.4146) data time 0.0006 (0.0017) model time 0.4104 (0.4138) loss 7.2290 (7.5948) grad_norm 2.0558 (2.2147) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:07:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][620/625] eta 0:00:02 lr 0.000911 wd 0.0500 time 0.3931 (0.4143) data time 0.0006 (0.0017) model time 0.3925 (0.4135) loss 5.8595 (7.5895) grad_norm 1.4813 (2.2113) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:07:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 111 training takes 0:04:18 [2024-07-25 00:07:11 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 00:07:12 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 00:07:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.469 (0.469) Loss 0.6416 (0.6416) Acc@1 87.646 (87.646) Acc@5 98.242 (98.242) Mem 14939MB [2024-07-25 00:07:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.122) Loss 1.0391 (0.7908) Acc@1 77.734 (83.900) Acc@5 94.580 (96.982) Mem 14939MB [2024-07-25 00:07:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.087 (0.105) Loss 1.1748 (0.9415) Acc@1 72.852 (80.215) Acc@5 93.555 (95.310) Mem 14939MB [2024-07-25 00:07:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.880 Acc@5 95.300 [2024-07-25 00:07:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.9% [2024-07-25 00:07:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.790 (0.790) Loss 0.5977 (0.5977) Acc@1 88.379 (88.379) Acc@5 98.340 (98.340) Mem 14939MB [2024-07-25 00:07:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.154) Loss 0.9751 (0.7432) Acc@1 79.443 (84.814) Acc@5 94.824 (97.243) Mem 14939MB [2024-07-25 00:07:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.121) Loss 1.1055 (0.8800) Acc@1 73.730 (81.206) Acc@5 93.848 (95.736) Mem 14939MB [2024-07-25 00:07:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.872 Acc@5 95.707 [2024-07-25 00:07:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.9% [2024-07-25 00:07:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.87% [2024-07-25 00:07:17 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 00:07:18 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 00:07:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][0/625] eta 0:08:30 lr 0.000911 wd 0.0500 time 0.8171 (0.8171) data time 0.4215 (0.4215) model time 0.0000 (0.0000) loss 7.6046 (7.6046) grad_norm 2.3397 (2.3397) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:07:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][10/625] eta 0:04:30 lr 0.000911 wd 0.0500 time 0.3934 (0.4394) data time 0.0009 (0.0392) model time 0.0000 (0.0000) loss 7.9528 (7.8026) grad_norm 2.6115 (2.9465) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:07:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][20/625] eta 0:04:14 lr 0.000910 wd 0.0500 time 0.4007 (0.4214) data time 0.0006 (0.0210) model time 0.0000 (0.0000) loss 7.2362 (7.5760) grad_norm 2.1048 (2.5116) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:07:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][30/625] eta 0:04:07 lr 0.000910 wd 0.0500 time 0.4057 (0.4156) data time 0.0009 (0.0145) model time 0.0000 (0.0000) loss 8.7966 (7.6020) grad_norm 2.3587 (2.3584) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:07:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][40/625] eta 0:04:01 lr 0.000910 wd 0.0500 time 0.4000 (0.4121) data time 0.0007 (0.0112) model time 0.0000 (0.0000) loss 8.6515 (7.6478) grad_norm 1.5347 (2.3342) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:07:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][50/625] eta 0:03:55 lr 0.000910 wd 0.0500 time 0.3938 (0.4098) data time 0.0009 (0.0092) model time 0.0000 (0.0000) loss 7.9478 (7.6779) grad_norm 1.7223 (2.2583) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 00:07:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][60/625] eta 0:03:51 lr 0.000910 wd 0.0500 time 0.4045 (0.4089) data time 0.0007 (0.0078) model time 0.4038 (0.4033) loss 5.9134 (7.6184) grad_norm 1.9067 (2.2163) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:07:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][70/625] eta 0:03:46 lr 0.000910 wd 0.0500 time 0.3964 (0.4079) data time 0.0007 (0.0069) model time 0.3958 (0.4023) loss 6.4085 (7.6347) grad_norm 2.4608 (2.2562) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:07:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][80/625] eta 0:03:41 lr 0.000910 wd 0.0500 time 0.3999 (0.4072) data time 0.0007 (0.0061) model time 0.3992 (0.4020) loss 6.5870 (7.6171) grad_norm 2.3214 (2.2770) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:07:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][90/625] eta 0:03:37 lr 0.000910 wd 0.0500 time 0.4110 (0.4066) data time 0.0006 (0.0056) model time 0.4103 (0.4016) loss 7.2359 (7.6606) grad_norm 2.2806 (2.2885) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:07:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][100/625] eta 0:03:33 lr 0.000910 wd 0.0500 time 0.3961 (0.4061) data time 0.0008 (0.0051) model time 0.3952 (0.4013) loss 8.1279 (7.6536) grad_norm 2.4298 (2.2781) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:08:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][110/625] eta 0:03:29 lr 0.000910 wd 0.0500 time 0.3972 (0.4069) data time 0.0008 (0.0047) model time 0.3963 (0.4035) loss 6.8078 (7.6452) grad_norm 2.1357 (2.2577) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:08:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][120/625] eta 0:03:25 lr 0.000910 wd 0.0500 time 
0.4057 (0.4065) data time 0.0008 (0.0044) model time 0.4048 (0.4031) loss 8.2996 (7.6301) grad_norm 1.7768 (2.2744) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:08:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][130/625] eta 0:03:21 lr 0.000909 wd 0.0500 time 0.3960 (0.4062) data time 0.0007 (0.0042) model time 0.3953 (0.4029) loss 9.4849 (7.6044) grad_norm 2.0876 (2.2683) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:08:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][140/625] eta 0:03:17 lr 0.000909 wd 0.0500 time 0.5846 (0.4071) data time 0.0008 (0.0039) model time 0.5839 (0.4046) loss 6.9336 (7.5408) grad_norm 1.3372 (2.2536) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:08:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][150/625] eta 0:03:16 lr 0.000909 wd 0.0500 time 0.6246 (0.4138) data time 0.0007 (0.0037) model time 0.6239 (0.4149) loss 8.8502 (7.5310) grad_norm 2.6726 (2.2722) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:08:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][160/625] eta 0:03:14 lr 0.000909 wd 0.0500 time 0.6092 (0.4179) data time 0.0010 (0.0036) model time 0.6082 (0.4207) loss 7.0827 (7.5037) grad_norm 2.3897 (2.2818) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:08:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][170/625] eta 0:03:10 lr 0.000909 wd 0.0500 time 0.3941 (0.4186) data time 0.0006 (0.0034) model time 0.3935 (0.4214) loss 8.5071 (7.5052) grad_norm 2.3809 (2.2861) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:08:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][180/625] eta 0:03:05 lr 0.000909 wd 0.0500 time 0.3954 (0.4177) data time 0.0009 (0.0033) model time 0.3945 (0.4199) loss 8.6693 (7.4996) grad_norm 2.3971 (2.2982) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 00:08:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][190/625] eta 0:03:01 lr 0.000909 wd 0.0500 time 0.4035 (0.4169) data time 0.0008 (0.0031) model time 0.4026 (0.4186) loss 6.4964 (7.4875) grad_norm 3.0066 (2.2866) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:08:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][200/625] eta 0:02:56 lr 0.000909 wd 0.0500 time 0.3947 (0.4162) data time 0.0009 (0.0030) model time 0.3938 (0.4174) loss 7.6539 (7.4705) grad_norm 2.1799 (2.2567) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:08:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][210/625] eta 0:02:52 lr 0.000909 wd 0.0500 time 0.4017 (0.4155) data time 0.0007 (0.0029) model time 0.4010 (0.4163) loss 8.2556 (7.4846) grad_norm 1.8025 (2.2307) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:08:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][220/625] eta 0:02:48 lr 0.000909 wd 0.0500 time 0.4090 (0.4150) data time 0.0006 (0.0028) model time 0.4083 (0.4156) loss 6.8334 (7.4734) grad_norm 1.7776 (2.2170) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:08:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][230/625] eta 0:02:43 lr 0.000909 wd 0.0500 time 0.3952 (0.4146) data time 0.0006 (0.0028) model time 0.3945 (0.4151) loss 6.6037 (7.4743) grad_norm 2.0761 (2.2280) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:08:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][240/625] eta 0:02:39 lr 0.000908 wd 0.0500 time 0.4000 (0.4142) data time 0.0009 (0.0027) model time 0.3991 (0.4144) loss 8.8569 (7.4733) grad_norm 2.1744 (2.2396) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:09:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][250/625] eta 0:02:35 lr 0.000908 wd 0.0500 time 
0.4074 (0.4138) data time 0.0007 (0.0026) model time 0.4067 (0.4138) loss 6.9754 (7.4673) grad_norm 2.6227 (2.2345) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:09:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][260/625] eta 0:02:30 lr 0.000908 wd 0.0500 time 0.3977 (0.4137) data time 0.0008 (0.0026) model time 0.3969 (0.4136) loss 6.8903 (7.4664) grad_norm 1.6870 (2.2167) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:09:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][270/625] eta 0:02:26 lr 0.000908 wd 0.0500 time 0.4023 (0.4133) data time 0.0007 (0.0025) model time 0.4016 (0.4131) loss 7.8644 (7.4558) grad_norm 1.4260 (2.2043) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:09:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][280/625] eta 0:02:22 lr 0.000908 wd 0.0500 time 0.4052 (0.4129) data time 0.0008 (0.0024) model time 0.4043 (0.4127) loss 7.0678 (7.4610) grad_norm 3.4749 (2.2070) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:09:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][290/625] eta 0:02:18 lr 0.000908 wd 0.0500 time 0.4020 (0.4126) data time 0.0007 (0.0024) model time 0.4013 (0.4122) loss 8.5355 (7.4743) grad_norm 2.0871 (2.2201) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:09:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][300/625] eta 0:02:13 lr 0.000908 wd 0.0500 time 0.4005 (0.4122) data time 0.0008 (0.0023) model time 0.3997 (0.4117) loss 7.5673 (7.4777) grad_norm 1.9932 (2.2305) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:09:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][310/625] eta 0:02:09 lr 0.000908 wd 0.0500 time 0.4120 (0.4119) data time 0.0009 (0.0023) model time 0.4111 (0.4113) loss 6.7742 (7.4632) grad_norm 2.6776 (2.2278) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 00:09:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][320/625] eta 0:02:05 lr 0.000908 wd 0.0500 time 0.3923 (0.4115) data time 0.0009 (0.0023) model time 0.3914 (0.4109) loss 7.1221 (7.4634) grad_norm 1.9044 (2.2160) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:09:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][330/625] eta 0:02:01 lr 0.000908 wd 0.0500 time 0.3965 (0.4118) data time 0.0006 (0.0022) model time 0.3958 (0.4112) loss 5.4951 (7.4712) grad_norm 1.6418 (2.2149) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:09:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][340/625] eta 0:01:57 lr 0.000908 wd 0.0500 time 0.4040 (0.4116) data time 0.0006 (0.0022) model time 0.4033 (0.4109) loss 6.9776 (7.4650) grad_norm 2.2999 (2.2181) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:09:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][350/625] eta 0:01:53 lr 0.000907 wd 0.0500 time 0.3978 (0.4114) data time 0.0008 (0.0021) model time 0.3970 (0.4107) loss 6.8129 (7.4751) grad_norm 1.8889 (2.2161) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:09:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][360/625] eta 0:01:48 lr 0.000907 wd 0.0500 time 0.3974 (0.4111) data time 0.0006 (0.0021) model time 0.3967 (0.4104) loss 6.8791 (7.4800) grad_norm 1.7551 (2.2067) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:09:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][370/625] eta 0:01:45 lr 0.000907 wd 0.0500 time 0.6215 (0.4132) data time 0.0006 (0.0021) model time 0.6209 (0.4128) loss 7.1157 (7.4869) grad_norm 2.4195 (2.2081) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:09:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][380/625] eta 0:01:41 lr 0.000907 wd 0.0500 time 
0.6634 (0.4160) data time 0.0006 (0.0021) model time 0.6628 (0.4160) loss 8.3821 (7.4900) grad_norm 1.7911 (2.1988) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:10:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][390/625] eta 0:01:37 lr 0.000907 wd 0.0500 time 0.3989 (0.4165) data time 0.0006 (0.0021) model time 0.3983 (0.4165) loss 7.3991 (7.4803) grad_norm 1.9237 (2.1929) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:10:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][400/625] eta 0:01:33 lr 0.000907 wd 0.0500 time 0.4055 (0.4161) data time 0.0008 (0.0020) model time 0.4047 (0.4160) loss 8.4145 (7.4904) grad_norm 1.9733 (2.1878) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:10:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][410/625] eta 0:01:29 lr 0.000907 wd 0.0500 time 0.3951 (0.4157) data time 0.0007 (0.0020) model time 0.3944 (0.4155) loss 8.6201 (7.4928) grad_norm 1.9095 (2.1837) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:10:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][420/625] eta 0:01:25 lr 0.000907 wd 0.0500 time 0.3984 (0.4153) data time 0.0008 (0.0020) model time 0.3976 (0.4151) loss 6.1891 (7.4972) grad_norm 1.6734 (2.1726) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:10:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][430/625] eta 0:01:20 lr 0.000907 wd 0.0500 time 0.4051 (0.4150) data time 0.0009 (0.0020) model time 0.4042 (0.4147) loss 8.4636 (7.4943) grad_norm 1.8334 (2.1708) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:10:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][440/625] eta 0:01:16 lr 0.000907 wd 0.0500 time 0.3943 (0.4147) data time 0.0008 (0.0019) model time 0.3935 (0.4143) loss 8.5680 (7.5003) grad_norm 1.8358 (2.1749) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 00:10:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][450/625] eta 0:01:12 lr 0.000907 wd 0.0500 time 0.4041 (0.4144) data time 0.0008 (0.0019) model time 0.4033 (0.4140) loss 7.0431 (7.4911) grad_norm 1.5945 (2.1719) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:10:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][460/625] eta 0:01:08 lr 0.000906 wd 0.0500 time 0.4129 (0.4141) data time 0.0006 (0.0019) model time 0.4123 (0.4137) loss 6.5912 (7.4909) grad_norm 2.6448 (2.1684) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:10:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][470/625] eta 0:01:04 lr 0.000906 wd 0.0500 time 0.4030 (0.4139) data time 0.0007 (0.0019) model time 0.4023 (0.4134) loss 9.5462 (7.4916) grad_norm 1.5604 (2.1666) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:10:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][480/625] eta 0:00:59 lr 0.000906 wd 0.0500 time 0.3990 (0.4136) data time 0.0006 (0.0019) model time 0.3983 (0.4131) loss 6.3479 (7.4909) grad_norm 3.3339 (2.1774) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:10:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][490/625] eta 0:00:55 lr 0.000906 wd 0.0500 time 0.4058 (0.4134) data time 0.0006 (0.0018) model time 0.4051 (0.4128) loss 7.2880 (7.4892) grad_norm 1.6975 (2.1792) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:10:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][500/625] eta 0:00:51 lr 0.000906 wd 0.0500 time 0.3976 (0.4131) data time 0.0009 (0.0018) model time 0.3968 (0.4125) loss 7.6403 (7.4871) grad_norm 1.6782 (2.1760) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:10:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][510/625] eta 0:00:47 lr 0.000906 wd 0.0500 time 
0.4004 (0.4129) data time 0.0007 (0.0018) model time 0.3998 (0.4122) loss 7.5165 (7.4813) grad_norm 2.4948 (2.1799) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:10:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][520/625] eta 0:00:43 lr 0.000906 wd 0.0500 time 0.4034 (0.4127) data time 0.0009 (0.0018) model time 0.4024 (0.4120) loss 8.2656 (7.4862) grad_norm 1.7608 (2.1798) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:10:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][530/625] eta 0:00:39 lr 0.000906 wd 0.0500 time 0.3927 (0.4124) data time 0.0006 (0.0018) model time 0.3921 (0.4117) loss 7.1108 (7.4912) grad_norm 1.9872 (2.1783) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:11:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][540/625] eta 0:00:35 lr 0.000906 wd 0.0500 time 0.5936 (0.4126) data time 0.0008 (0.0018) model time 0.5928 (0.4119) loss 7.7217 (7.4900) grad_norm 1.8743 (2.1858) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:11:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][550/625] eta 0:00:30 lr 0.000906 wd 0.0500 time 0.4099 (0.4123) data time 0.0008 (0.0017) model time 0.4091 (0.4116) loss 7.2060 (7.4917) grad_norm 2.1576 (2.1870) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:11:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][560/625] eta 0:00:26 lr 0.000906 wd 0.0500 time 0.3961 (0.4121) data time 0.0009 (0.0017) model time 0.3952 (0.4113) loss 6.5443 (7.4898) grad_norm 2.7352 (2.1901) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:11:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][570/625] eta 0:00:22 lr 0.000905 wd 0.0500 time 0.3974 (0.4119) data time 0.0006 (0.0017) model time 0.3968 (0.4111) loss 7.8402 (7.4907) grad_norm 1.6999 (2.1901) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 00:11:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][580/625] eta 0:00:18 lr 0.000905 wd 0.0500 time 0.4051 (0.4117) data time 0.0009 (0.0017) model time 0.4042 (0.4109) loss 5.7749 (7.4856) grad_norm 2.4685 (2.1841) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:11:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][590/625] eta 0:00:14 lr 0.000905 wd 0.0500 time 0.6125 (0.4133) data time 0.0006 (0.0017) model time 0.6118 (0.4126) loss 6.8503 (7.4852) grad_norm 2.0640 (2.1836) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:11:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][600/625] eta 0:00:10 lr 0.000905 wd 0.0500 time 0.5720 (0.4149) data time 0.0007 (0.0017) model time 0.5713 (0.4144) loss 7.7646 (7.4793) grad_norm 1.5760 (2.1842) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:11:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][610/625] eta 0:00:06 lr 0.000905 wd 0.0500 time 0.3956 (0.4153) data time 0.0006 (0.0017) model time 0.3950 (0.4148) loss 7.0434 (7.4764) grad_norm 3.4864 (2.1821) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:11:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][620/625] eta 0:00:02 lr 0.000905 wd 0.0500 time 0.3937 (0.4151) data time 0.0006 (0.0017) model time 0.3930 (0.4145) loss 8.6460 (7.4838) grad_norm 1.8316 (2.1843) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:11:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 112 training takes 0:04:19 [2024-07-25 00:11:38 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 00:11:39 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 00:11:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.568 (0.568) Loss 0.6211 (0.6211) Acc@1 87.451 (87.451) Acc@5 98.291 (98.291) Mem 14939MB [2024-07-25 00:11:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.130) Loss 0.9932 (0.7588) Acc@1 77.832 (84.255) Acc@5 95.020 (97.150) Mem 14939MB [2024-07-25 00:11:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.109) Loss 1.1553 (0.9066) Acc@1 74.316 (80.527) Acc@5 92.725 (95.457) Mem 14939MB [2024-07-25 00:11:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.206 Acc@5 95.405 [2024-07-25 00:11:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.2% [2024-07-25 00:11:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.21% [2024-07-25 00:11:41 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 00:11:42 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 00:11:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.186 (1.186) Loss 0.5962 (0.5962) Acc@1 88.525 (88.525) Acc@5 98.291 (98.291) Mem 14939MB [2024-07-25 00:11:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.187) Loss 0.9722 (0.7420) Acc@1 79.395 (84.850) Acc@5 94.775 (97.239) Mem 14939MB [2024-07-25 00:11:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.139) Loss 1.1045 (0.8786) Acc@1 73.828 (81.227) Acc@5 93.896 (95.740) Mem 14939MB [2024-07-25 00:11:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.888 Acc@5 95.715 [2024-07-25 00:11:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.9% [2024-07-25 00:11:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.89% [2024-07-25 00:11:45 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 00:11:46 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 00:11:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][0/625] eta 0:08:08 lr 0.000905 wd 0.0500 time 0.7817 (0.7817) data time 0.3901 (0.3901) model time 0.0000 (0.0000) loss 6.1879 (6.1879) grad_norm 1.9851 (1.9851) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:11:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][10/625] eta 0:04:27 lr 0.000905 wd 0.0500 time 0.3920 (0.4344) data time 0.0009 (0.0363) model time 0.0000 (0.0000) loss 6.7485 (7.3306) grad_norm 1.6956 (2.3878) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:11:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][20/625] eta 0:04:14 lr 0.000905 wd 0.0500 time 0.4020 (0.4204) data time 0.0007 (0.0195) model time 0.0000 (0.0000) loss 9.3957 (7.4128) grad_norm 5.8336 (3.0098) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:11:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][30/625] eta 0:04:07 lr 0.000905 wd 0.0500 time 0.4105 (0.4154) data time 0.0006 (0.0135) model time 0.0000 (0.0000) loss 7.5555 (7.5275) grad_norm 5.3460 (3.0193) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:12:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][40/625] eta 0:04:02 lr 0.000905 wd 0.0500 time 0.3987 (0.4149) data time 0.0009 (0.0105) model time 0.0000 (0.0000) loss 8.5735 (7.5785) grad_norm 2.2395 (2.9674) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:12:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][50/625] eta 0:03:57 lr 0.000904 wd 0.0500 time 0.4009 (0.4124) data time 0.0009 (0.0086) model time 0.0000 (0.0000) loss 8.0208 (7.5159) grad_norm 1.7027 (2.8274) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:12:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][60/625] eta 0:03:52 lr 0.000904 wd 0.0500 time 0.4000 (0.4110) 
data time 0.0009 (0.0074) model time 0.3992 (0.4031) loss 8.1755 (7.5218) grad_norm 1.6461 (2.6742) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:12:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][70/625] eta 0:03:47 lr 0.000904 wd 0.0500 time 0.3934 (0.4098) data time 0.0007 (0.0065) model time 0.3928 (0.4023) loss 7.5372 (7.5512) grad_norm 1.4798 (2.5489) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:12:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][80/625] eta 0:03:42 lr 0.000904 wd 0.0500 time 0.4000 (0.4088) data time 0.0007 (0.0058) model time 0.3993 (0.4018) loss 7.0863 (7.5455) grad_norm 1.7503 (2.5357) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:12:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][90/625] eta 0:03:38 lr 0.000904 wd 0.0500 time 0.4103 (0.4082) data time 0.0009 (0.0053) model time 0.4094 (0.4020) loss 6.5189 (7.5737) grad_norm 1.8501 (2.4947) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:12:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][100/625] eta 0:03:34 lr 0.000904 wd 0.0500 time 0.3943 (0.4077) data time 0.0008 (0.0049) model time 0.3935 (0.4020) loss 6.8624 (7.6002) grad_norm 3.0475 (2.6013) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:12:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][110/625] eta 0:03:29 lr 0.000904 wd 0.0500 time 0.3964 (0.4073) data time 0.0007 (0.0047) model time 0.3956 (0.4018) loss 6.0416 (7.5859) grad_norm 1.6479 (2.5509) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:12:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][120/625] eta 0:03:25 lr 0.000904 wd 0.0500 time 0.4121 (0.4071) data time 0.0008 (0.0044) model time 0.4112 (0.4020) loss 7.0236 (7.5522) grad_norm 2.8747 (2.5282) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
00:12:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][130/625] eta 0:03:21 lr 0.000904 wd 0.0500 time 0.3951 (0.4067) data time 0.0009 (0.0041) model time 0.3942 (0.4019) loss 8.3061 (7.5406) grad_norm 3.1485 (2.5540) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:12:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][140/625] eta 0:03:17 lr 0.000904 wd 0.0500 time 0.3994 (0.4064) data time 0.0007 (0.0039) model time 0.3987 (0.4018) loss 6.1197 (7.4902) grad_norm 1.8796 (2.5584) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:12:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][150/625] eta 0:03:13 lr 0.000904 wd 0.0500 time 0.4151 (0.4063) data time 0.0007 (0.0037) model time 0.4144 (0.4021) loss 6.3992 (7.4866) grad_norm 3.0921 (2.5737) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:12:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][160/625] eta 0:03:08 lr 0.000903 wd 0.0500 time 0.3935 (0.4060) data time 0.0010 (0.0035) model time 0.3926 (0.4019) loss 6.6784 (7.4741) grad_norm 1.7568 (2.5437) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:12:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][170/625] eta 0:03:04 lr 0.000903 wd 0.0500 time 0.4002 (0.4057) data time 0.0007 (0.0034) model time 0.3996 (0.4018) loss 7.1393 (7.5051) grad_norm 2.3897 (2.5152) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:13:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][180/625] eta 0:03:00 lr 0.000903 wd 0.0500 time 0.4129 (0.4064) data time 0.0009 (0.0032) model time 0.4120 (0.4029) loss 6.9564 (7.4988) grad_norm 1.3177 (2.4744) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:13:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][190/625] eta 0:02:59 lr 0.000903 wd 0.0500 time 0.3931 (0.4124) data 
time 0.0009 (0.0031) model time 0.3922 (0.4112) loss 7.8584 (7.5099) grad_norm 3.0453 (2.4429) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:13:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][200/625] eta 0:02:57 lr 0.000903 wd 0.0500 time 0.4080 (0.4165) data time 0.0009 (0.0030) model time 0.4071 (0.4168) loss 8.3418 (7.5091) grad_norm 1.6920 (2.4220) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:13:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][210/625] eta 0:02:52 lr 0.000903 wd 0.0500 time 0.3936 (0.4166) data time 0.0008 (0.0030) model time 0.3927 (0.4167) loss 7.9094 (7.5133) grad_norm 1.9142 (2.4159) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:13:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][220/625] eta 0:02:48 lr 0.000903 wd 0.0500 time 0.3936 (0.4160) data time 0.0007 (0.0029) model time 0.3930 (0.4159) loss 9.1463 (7.5148) grad_norm 2.0489 (2.3925) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:13:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][230/625] eta 0:02:44 lr 0.000903 wd 0.0500 time 0.4058 (0.4153) data time 0.0009 (0.0028) model time 0.4049 (0.4150) loss 6.4014 (7.5103) grad_norm 2.7557 (2.3833) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:13:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][240/625] eta 0:02:39 lr 0.000903 wd 0.0500 time 0.3942 (0.4147) data time 0.0007 (0.0027) model time 0.3935 (0.4142) loss 6.6103 (7.5117) grad_norm 2.5794 (2.4054) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:13:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][250/625] eta 0:02:35 lr 0.000903 wd 0.0500 time 0.3969 (0.4141) data time 0.0008 (0.0026) model time 0.3961 (0.4134) loss 8.6467 (7.5159) grad_norm 1.4713 (2.3860) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
00:13:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][260/625] eta 0:02:31 lr 0.000903 wd 0.0500 time 0.4110 (0.4144) data time 0.0007 (0.0026) model time 0.4103 (0.4138) loss 7.8931 (7.5160) grad_norm 1.9693 (2.3720) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:13:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][270/625] eta 0:02:26 lr 0.000902 wd 0.0500 time 0.3970 (0.4139) data time 0.0009 (0.0025) model time 0.3961 (0.4132) loss 7.9467 (7.5173) grad_norm 1.6658 (2.3670) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:13:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][280/625] eta 0:02:22 lr 0.000902 wd 0.0500 time 0.3990 (0.4138) data time 0.0008 (0.0025) model time 0.3982 (0.4130) loss 8.0282 (7.5284) grad_norm 1.7007 (2.3626) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:13:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][290/625] eta 0:02:18 lr 0.000902 wd 0.0500 time 0.4159 (0.4134) data time 0.0008 (0.0024) model time 0.4151 (0.4126) loss 7.6164 (7.5367) grad_norm 2.4009 (2.3687) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:13:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][300/625] eta 0:02:14 lr 0.000902 wd 0.0500 time 0.3973 (0.4130) data time 0.0009 (0.0024) model time 0.3964 (0.4120) loss 6.6717 (7.5333) grad_norm 2.2831 (2.3620) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:13:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][310/625] eta 0:02:09 lr 0.000902 wd 0.0500 time 0.4040 (0.4126) data time 0.0007 (0.0023) model time 0.4033 (0.4116) loss 7.8407 (7.5287) grad_norm 2.1365 (2.3544) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:13:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][320/625] eta 0:02:05 lr 0.000902 wd 0.0500 time 0.4055 (0.4123) data 
time 0.0007 (0.0023) model time 0.4049 (0.4112) loss 6.6831 (7.5198) grad_norm 2.2719 (2.3445) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:14:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][330/625] eta 0:02:01 lr 0.000902 wd 0.0500 time 0.3995 (0.4120) data time 0.0009 (0.0022) model time 0.3986 (0.4109) loss 8.1458 (7.5307) grad_norm 3.7593 (2.3379) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:14:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][340/625] eta 0:01:57 lr 0.000902 wd 0.0500 time 0.3980 (0.4117) data time 0.0008 (0.0022) model time 0.3972 (0.4105) loss 8.9673 (7.5387) grad_norm 1.8994 (2.3410) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:14:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][350/625] eta 0:01:53 lr 0.000902 wd 0.0500 time 0.4037 (0.4114) data time 0.0006 (0.0022) model time 0.4031 (0.4102) loss 7.5768 (7.5464) grad_norm 2.0743 (2.3419) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:14:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][360/625] eta 0:01:48 lr 0.000902 wd 0.0500 time 0.3957 (0.4112) data time 0.0008 (0.0021) model time 0.3949 (0.4100) loss 6.3340 (7.5527) grad_norm 1.5182 (2.3468) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:14:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][370/625] eta 0:01:44 lr 0.000902 wd 0.0500 time 0.3996 (0.4109) data time 0.0009 (0.0021) model time 0.3987 (0.4097) loss 7.5577 (7.5494) grad_norm 2.1790 (2.3430) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:14:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][380/625] eta 0:01:40 lr 0.000901 wd 0.0500 time 0.4116 (0.4107) data time 0.0008 (0.0021) model time 0.4108 (0.4094) loss 9.3824 (7.5387) grad_norm 1.8463 (2.3427) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
00:14:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][390/625] eta 0:01:36 lr 0.000901 wd 0.0500 time 0.3938 (0.4104) data time 0.0009 (0.0020) model time 0.3929 (0.4091) loss 8.1454 (7.5330) grad_norm 1.3897 (2.3477) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:14:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][400/625] eta 0:01:32 lr 0.000901 wd 0.0500 time 0.3965 (0.4104) data time 0.0009 (0.0020) model time 0.3956 (0.4091) loss 7.6955 (7.5230) grad_norm 2.6637 (2.3559) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:14:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][410/625] eta 0:01:28 lr 0.000901 wd 0.0500 time 0.6201 (0.4135) data time 0.0010 (0.0020) model time 0.6192 (0.4127) loss 7.7968 (7.5298) grad_norm 2.9660 (2.3563) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:14:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][420/625] eta 0:01:25 lr 0.000901 wd 0.0500 time 0.5744 (0.4149) data time 0.0007 (0.0020) model time 0.5737 (0.4142) loss 7.9414 (7.5321) grad_norm 2.4131 (2.3500) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:14:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][430/625] eta 0:01:20 lr 0.000901 wd 0.0500 time 0.3969 (0.4151) data time 0.0006 (0.0019) model time 0.3963 (0.4144) loss 8.4726 (7.5408) grad_norm 2.4758 (2.3459) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:14:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][440/625] eta 0:01:16 lr 0.000901 wd 0.0500 time 0.3975 (0.4147) data time 0.0008 (0.0019) model time 0.3967 (0.4140) loss 6.5781 (7.5373) grad_norm 1.5559 (2.3408) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:14:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][450/625] eta 0:01:12 lr 0.000901 wd 0.0500 time 0.4122 (0.4144) data 
time 0.0006 (0.0019) model time 0.4116 (0.4137) loss 8.6263 (7.5369) grad_norm 1.5789 (2.3300) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:14:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][460/625] eta 0:01:08 lr 0.000901 wd 0.0500 time 0.3966 (0.4141) data time 0.0006 (0.0019) model time 0.3960 (0.4133) loss 6.3008 (7.5318) grad_norm 2.0758 (2.3232) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:15:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][470/625] eta 0:01:04 lr 0.000901 wd 0.0500 time 0.3951 (0.4138) data time 0.0011 (0.0019) model time 0.3939 (0.4130) loss 6.0317 (7.5284) grad_norm 1.5503 (2.3143) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:15:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][480/625] eta 0:01:00 lr 0.000900 wd 0.0500 time 0.3944 (0.4140) data time 0.0009 (0.0018) model time 0.3935 (0.4132) loss 7.5958 (7.5218) grad_norm 2.0251 (2.2986) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:15:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][490/625] eta 0:00:55 lr 0.000900 wd 0.0500 time 0.4007 (0.4138) data time 0.0006 (0.0018) model time 0.4001 (0.4129) loss 6.9728 (7.5107) grad_norm 1.8624 (2.2935) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:15:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][500/625] eta 0:00:51 lr 0.000900 wd 0.0500 time 0.4001 (0.4135) data time 0.0007 (0.0018) model time 0.3995 (0.4126) loss 8.6228 (7.5132) grad_norm 2.2481 (2.2962) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:15:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][510/625] eta 0:00:47 lr 0.000900 wd 0.0500 time 0.3946 (0.4133) data time 0.0007 (0.0018) model time 0.3939 (0.4123) loss 7.6746 (7.5086) grad_norm 2.2792 (2.3007) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
00:15:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][520/625] eta 0:00:43 lr 0.000900 wd 0.0500 time 0.3972 (0.4131) data time 0.0009 (0.0018) model time 0.3963 (0.4121) loss 7.1377 (7.4986) grad_norm 1.9514 (2.2972) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:15:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][530/625] eta 0:00:39 lr 0.000900 wd 0.0500 time 0.4129 (0.4129) data time 0.0010 (0.0018) model time 0.4119 (0.4119) loss 7.8732 (7.5010) grad_norm 2.2117 (2.2914) loss_scale 1024.0000 (520.6780) mem 14939MB [2024-07-25 00:15:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][540/625] eta 0:00:35 lr 0.000900 wd 0.0500 time 0.4033 (0.4127) data time 0.0008 (0.0017) model time 0.4024 (0.4117) loss 8.5696 (7.5114) grad_norm 2.1117 (2.2911) loss_scale 1024.0000 (529.9815) mem 14939MB [2024-07-25 00:15:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][550/625] eta 0:00:30 lr 0.000900 wd 0.0500 time 0.3995 (0.4125) data time 0.0010 (0.0017) model time 0.3984 (0.4115) loss 6.6563 (7.5067) grad_norm 2.0654 (2.2884) loss_scale 1024.0000 (538.9474) mem 14939MB [2024-07-25 00:15:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][560/625] eta 0:00:26 lr 0.000900 wd 0.0500 time 0.4018 (0.4123) data time 0.0009 (0.0017) model time 0.4009 (0.4112) loss 7.9948 (7.5133) grad_norm 5.1156 (2.2911) loss_scale 1024.0000 (547.5936) mem 14939MB [2024-07-25 00:15:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][570/625] eta 0:00:22 lr 0.000900 wd 0.0500 time 0.3972 (0.4121) data time 0.0009 (0.0017) model time 0.3963 (0.4110) loss 8.3366 (7.5041) grad_norm 1.5680 (2.2912) loss_scale 1024.0000 (555.9370) mem 14939MB [2024-07-25 00:15:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][580/625] eta 0:00:18 lr 0.000900 wd 0.0500 time 0.3984 (0.4119) 
data time 0.0008 (0.0017) model time 0.3976 (0.4108) loss 7.7769 (7.5057) grad_norm 2.6996 (2.2885) loss_scale 1024.0000 (563.9931) mem 14939MB [2024-07-25 00:15:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][590/625] eta 0:00:14 lr 0.000899 wd 0.0500 time 0.4053 (0.4117) data time 0.0007 (0.0017) model time 0.4046 (0.4106) loss 5.8893 (7.5003) grad_norm 2.4656 (2.2919) loss_scale 1024.0000 (571.7766) mem 14939MB [2024-07-25 00:15:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][600/625] eta 0:00:10 lr 0.000899 wd 0.0500 time 0.3936 (0.4115) data time 0.0009 (0.0017) model time 0.3927 (0.4104) loss 8.3681 (7.5009) grad_norm 2.5528 (2.2928) loss_scale 1024.0000 (579.3012) mem 14939MB [2024-07-25 00:15:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][610/625] eta 0:00:06 lr 0.000899 wd 0.0500 time 0.4005 (0.4114) data time 0.0004 (0.0017) model time 0.4001 (0.4103) loss 7.1365 (7.4993) grad_norm 3.7235 (2.3022) loss_scale 1024.0000 (586.5794) mem 14939MB [2024-07-25 00:16:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][620/625] eta 0:00:02 lr 0.000899 wd 0.0500 time 0.5560 (0.4115) data time 0.0004 (0.0016) model time 0.5556 (0.4104) loss 6.1316 (7.4907) grad_norm 1.9728 (2.3081) loss_scale 1024.0000 (593.6232) mem 14939MB [2024-07-25 00:16:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 113 training takes 0:04:17 [2024-07-25 00:16:04 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 00:16:05 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 00:16:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.473 (0.473) Loss 0.6260 (0.6260) Acc@1 86.621 (86.621) Acc@5 97.949 (97.949) Mem 14939MB [2024-07-25 00:16:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.121) Loss 1.0303 (0.7678) Acc@1 77.051 (84.007) Acc@5 94.336 (97.057) Mem 14939MB [2024-07-25 00:16:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 1.1357 (0.9108) Acc@1 73.584 (80.332) Acc@5 93.408 (95.438) Mem 14939MB [2024-07-25 00:16:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.082 Acc@5 95.429 [2024-07-25 00:16:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.1% [2024-07-25 00:16:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.792 (0.792) Loss 0.5957 (0.5957) Acc@1 88.623 (88.623) Acc@5 98.291 (98.291) Mem 14939MB [2024-07-25 00:16:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.154) Loss 0.9707 (0.7404) Acc@1 79.492 (84.877) Acc@5 94.873 (97.252) Mem 14939MB [2024-07-25 00:16:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 1.1025 (0.8770) Acc@1 73.975 (81.255) Acc@5 93.994 (95.775) Mem 14939MB [2024-07-25 00:16:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.928 Acc@5 95.743 [2024-07-25 00:16:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.9% [2024-07-25 00:16:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.93% [2024-07-25 00:16:10 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 00:16:11 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 00:16:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][0/625] eta 0:07:47 lr 0.000899 wd 0.0500 time 0.7485 (0.7485) data time 0.3556 (0.3556) model time 0.0000 (0.0000) loss 7.4048 (7.4048) grad_norm 1.7692 (1.7692) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:16:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][10/625] eta 0:05:39 lr 0.000899 wd 0.0500 time 0.3944 (0.5526) data time 0.0006 (0.0331) model time 0.0000 (0.0000) loss 6.7887 (7.4798) grad_norm 2.9899 (2.1461) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:16:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][20/625] eta 0:05:10 lr 0.000899 wd 0.0500 time 0.4043 (0.5129) data time 0.0007 (0.0178) model time 0.0000 (0.0000) loss 6.5850 (7.5914) grad_norm 1.7939 (2.1934) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:16:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][30/625] eta 0:04:43 lr 0.000899 wd 0.0500 time 0.3987 (0.4766) data time 0.0007 (0.0124) model time 0.0000 (0.0000) loss 7.6595 (7.5120) grad_norm 3.0403 (2.1799) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:16:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][40/625] eta 0:04:27 lr 0.000899 wd 0.0500 time 0.3997 (0.4580) data time 0.0008 (0.0096) model time 0.0000 (0.0000) loss 7.6315 (7.5534) grad_norm 3.2268 (2.4329) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:16:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][50/625] eta 0:04:17 lr 0.000899 wd 0.0500 time 0.4143 (0.4471) data time 0.0007 (0.0078) model time 0.0000 (0.0000) loss 7.9990 (7.5710) grad_norm 2.2509 (2.3882) loss_scale 1024.0000 
(1024.0000) mem 14939MB [2024-07-25 00:16:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][60/625] eta 0:04:08 lr 0.000899 wd 0.0500 time 0.3984 (0.4396) data time 0.0007 (0.0067) model time 0.3977 (0.4003) loss 7.2460 (7.6422) grad_norm 2.0128 (2.3087) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:16:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][70/625] eta 0:04:01 lr 0.000898 wd 0.0500 time 0.3967 (0.4343) data time 0.0008 (0.0059) model time 0.3959 (0.4004) loss 7.5461 (7.6007) grad_norm 2.3961 (2.2788) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:16:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][80/625] eta 0:03:54 lr 0.000898 wd 0.0500 time 0.4024 (0.4304) data time 0.0008 (0.0053) model time 0.4016 (0.4010) loss 8.0347 (7.6404) grad_norm 1.6610 (2.2380) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:16:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][90/625] eta 0:03:48 lr 0.000898 wd 0.0500 time 0.3966 (0.4273) data time 0.0009 (0.0048) model time 0.3957 (0.4011) loss 8.3858 (7.6266) grad_norm 2.6867 (2.2102) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:16:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][100/625] eta 0:03:42 lr 0.000898 wd 0.0500 time 0.3985 (0.4246) data time 0.0007 (0.0045) model time 0.3978 (0.4007) loss 7.2360 (7.6287) grad_norm 2.2744 (2.1876) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:16:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][110/625] eta 0:03:37 lr 0.000898 wd 0.0500 time 0.4035 (0.4226) data time 0.0006 (0.0042) model time 0.4028 (0.4006) loss 8.3353 (7.6359) grad_norm 1.7091 (2.1795) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:17:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][120/625] eta 0:03:32 lr 
0.000898 wd 0.0500 time 0.4017 (0.4208) data time 0.0006 (0.0039) model time 0.4011 (0.4007) loss 6.2877 (7.6518) grad_norm 2.0948 (2.1947) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:17:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][130/625] eta 0:03:27 lr 0.000898 wd 0.0500 time 0.3998 (0.4194) data time 0.0006 (0.0037) model time 0.3992 (0.4008) loss 6.6424 (7.6284) grad_norm 2.2420 (2.1792) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:17:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][140/625] eta 0:03:22 lr 0.000898 wd 0.0500 time 0.4082 (0.4182) data time 0.0007 (0.0035) model time 0.4076 (0.4007) loss 6.2178 (7.5947) grad_norm 2.5762 (2.1489) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:17:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][150/625] eta 0:03:18 lr 0.000898 wd 0.0500 time 0.3961 (0.4171) data time 0.0006 (0.0034) model time 0.3955 (0.4007) loss 7.0042 (7.5633) grad_norm 1.5312 (2.1725) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:17:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][160/625] eta 0:03:13 lr 0.000898 wd 0.0500 time 0.3992 (0.4162) data time 0.0009 (0.0032) model time 0.3983 (0.4008) loss 6.7296 (7.5658) grad_norm 1.6848 (2.2016) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:17:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][170/625] eta 0:03:09 lr 0.000898 wd 0.0500 time 0.4199 (0.4155) data time 0.0007 (0.0031) model time 0.4192 (0.4010) loss 7.7289 (7.5725) grad_norm 1.8843 (2.2074) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:17:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][180/625] eta 0:03:04 lr 0.000897 wd 0.0500 time 0.4009 (0.4148) data time 0.0009 (0.0030) model time 0.4000 (0.4011) loss 6.2865 (7.5372) grad_norm 2.5237 (2.2216) 
loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:17:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][190/625] eta 0:03:00 lr 0.000897 wd 0.0500 time 0.3998 (0.4141) data time 0.0007 (0.0029) model time 0.3991 (0.4011) loss 7.7113 (7.5221) grad_norm 2.2079 (2.2180) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:17:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][200/625] eta 0:02:55 lr 0.000897 wd 0.0500 time 0.4034 (0.4137) data time 0.0007 (0.0028) model time 0.4027 (0.4013) loss 8.2515 (7.5288) grad_norm 2.2468 (2.2035) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:17:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][210/625] eta 0:02:51 lr 0.000897 wd 0.0500 time 0.3996 (0.4131) data time 0.0008 (0.0027) model time 0.3988 (0.4012) loss 8.2209 (7.5269) grad_norm 1.3388 (2.1944) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:17:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][220/625] eta 0:02:48 lr 0.000897 wd 0.0500 time 0.5027 (0.4166) data time 0.0006 (0.0026) model time 0.5021 (0.4064) loss 6.8369 (7.5456) grad_norm 1.8232 (2.1870) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:17:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][230/625] eta 0:02:46 lr 0.000897 wd 0.0500 time 0.5807 (0.4209) data time 0.0008 (0.0025) model time 0.5799 (0.4124) loss 7.6978 (7.5349) grad_norm 1.5421 (2.1752) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:17:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][240/625] eta 0:02:42 lr 0.000897 wd 0.0500 time 0.4042 (0.4230) data time 0.0007 (0.0025) model time 0.4035 (0.4155) loss 7.5695 (7.5269) grad_norm 1.8553 (2.1722) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:17:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[114/300][250/625] eta 0:02:38 lr 0.000897 wd 0.0500 time 0.3966 (0.4222) data time 0.0006 (0.0024) model time 0.3960 (0.4148) loss 8.6328 (7.5358) grad_norm 2.7143 (2.1718) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:18:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][260/625] eta 0:02:33 lr 0.000897 wd 0.0500 time 0.4227 (0.4214) data time 0.0007 (0.0024) model time 0.4220 (0.4142) loss 8.8623 (7.5308) grad_norm 5.4750 (2.1777) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:18:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][270/625] eta 0:02:29 lr 0.000897 wd 0.0500 time 0.3973 (0.4207) data time 0.0007 (0.0023) model time 0.3966 (0.4135) loss 8.3684 (7.5348) grad_norm 2.0731 (2.1824) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:18:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][280/625] eta 0:02:24 lr 0.000897 wd 0.0500 time 0.4022 (0.4200) data time 0.0009 (0.0023) model time 0.4013 (0.4130) loss 8.3632 (7.5355) grad_norm 2.0287 (2.1947) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:18:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][290/625] eta 0:02:20 lr 0.000896 wd 0.0500 time 0.4045 (0.4194) data time 0.0009 (0.0022) model time 0.4036 (0.4125) loss 7.6343 (7.5536) grad_norm 1.7948 (2.1884) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:18:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][300/625] eta 0:02:16 lr 0.000896 wd 0.0500 time 0.4010 (0.4189) data time 0.0007 (0.0022) model time 0.4002 (0.4121) loss 6.4548 (7.5388) grad_norm 1.9699 (2.1837) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:18:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][310/625] eta 0:02:11 lr 0.000896 wd 0.0500 time 0.4000 (0.4183) data time 0.0006 (0.0021) model time 0.3994 (0.4116) loss 9.0829 
(7.5407) grad_norm 1.9276 (2.1837) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:18:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][320/625] eta 0:02:07 lr 0.000896 wd 0.0500 time 0.4049 (0.4179) data time 0.0009 (0.0021) model time 0.4040 (0.4114) loss 6.4240 (7.5383) grad_norm 2.6066 (2.1861) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:18:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][330/625] eta 0:02:03 lr 0.000896 wd 0.0500 time 0.3982 (0.4174) data time 0.0009 (0.0021) model time 0.3973 (0.4110) loss 5.7661 (7.5326) grad_norm 3.3667 (2.1951) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:18:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][340/625] eta 0:01:58 lr 0.000896 wd 0.0500 time 0.3967 (0.4172) data time 0.0007 (0.0020) model time 0.3960 (0.4109) loss 7.3487 (7.5222) grad_norm 1.8510 (2.1904) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:18:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][350/625] eta 0:01:54 lr 0.000896 wd 0.0500 time 0.4051 (0.4168) data time 0.0006 (0.0020) model time 0.4044 (0.4106) loss 7.1819 (7.5252) grad_norm 1.8386 (2.1913) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:18:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][360/625] eta 0:01:50 lr 0.000896 wd 0.0500 time 0.4013 (0.4164) data time 0.0008 (0.0020) model time 0.4005 (0.4103) loss 7.6282 (7.5276) grad_norm 1.9317 (2.1937) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:18:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][370/625] eta 0:01:46 lr 0.000896 wd 0.0500 time 0.3798 (0.4160) data time 0.0007 (0.0020) model time 0.3791 (0.4100) loss 6.7114 (7.5235) grad_norm 2.1899 (2.1922) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:18:49 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [114/300][380/625] eta 0:01:41 lr 0.000896 wd 0.0500 time 0.4086 (0.4157) data time 0.0009 (0.0019) model time 0.4077 (0.4098) loss 7.4916 (7.5275) grad_norm 2.9178 (2.1916) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:18:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][390/625] eta 0:01:37 lr 0.000896 wd 0.0500 time 0.3949 (0.4153) data time 0.0008 (0.0019) model time 0.3940 (0.4095) loss 7.7282 (7.5230) grad_norm 2.5718 (2.1859) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:18:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][400/625] eta 0:01:33 lr 0.000895 wd 0.0500 time 0.4031 (0.4150) data time 0.0006 (0.0019) model time 0.4025 (0.4093) loss 6.0498 (7.5185) grad_norm 3.3723 (2.1962) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:19:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][410/625] eta 0:01:29 lr 0.000895 wd 0.0500 time 0.4025 (0.4146) data time 0.0007 (0.0019) model time 0.4018 (0.4090) loss 7.0630 (7.5094) grad_norm 1.5091 (2.1961) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:19:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][420/625] eta 0:01:24 lr 0.000895 wd 0.0500 time 0.3991 (0.4143) data time 0.0006 (0.0018) model time 0.3984 (0.4088) loss 7.6280 (7.5051) grad_norm 1.7439 (2.1979) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:19:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][430/625] eta 0:01:20 lr 0.000895 wd 0.0500 time 0.3966 (0.4140) data time 0.0009 (0.0018) model time 0.3957 (0.4086) loss 7.4372 (7.5077) grad_norm 2.0675 (2.2055) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:19:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][440/625] eta 0:01:16 lr 0.000895 wd 0.0500 time 0.6066 (0.4152) data time 0.0008 (0.0018) model 
time 0.6058 (0.4100) loss 8.2040 (7.5088) grad_norm 3.0211 (2.2158) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:19:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][450/625] eta 0:01:12 lr 0.000895 wd 0.0500 time 0.4098 (0.4169) data time 0.0006 (0.0018) model time 0.4092 (0.4120) loss 6.7202 (7.5085) grad_norm 2.6925 (2.2255) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:19:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][460/625] eta 0:01:08 lr 0.000895 wd 0.0500 time 0.4033 (0.4181) data time 0.0008 (0.0018) model time 0.4025 (0.4135) loss 6.7789 (7.4984) grad_norm 2.1506 (2.2274) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:19:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][470/625] eta 0:01:04 lr 0.000895 wd 0.0500 time 0.4076 (0.4179) data time 0.0007 (0.0018) model time 0.4069 (0.4133) loss 7.0327 (7.5071) grad_norm 1.6473 (2.2276) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:19:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][480/625] eta 0:01:00 lr 0.000895 wd 0.0500 time 0.3943 (0.4175) data time 0.0007 (0.0017) model time 0.3937 (0.4130) loss 8.4507 (7.5185) grad_norm 3.1960 (2.2307) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:19:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][490/625] eta 0:00:56 lr 0.000895 wd 0.0500 time 0.4028 (0.4173) data time 0.0005 (0.0017) model time 0.4022 (0.4128) loss 6.6265 (7.5156) grad_norm 1.7306 (2.2245) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:19:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][500/625] eta 0:00:52 lr 0.000894 wd 0.0500 time 0.4141 (0.4170) data time 0.0007 (0.0017) model time 0.4135 (0.4126) loss 7.5186 (7.5226) grad_norm 2.1474 (2.2192) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:19:44 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][510/625] eta 0:00:47 lr 0.000894 wd 0.0500 time 0.3957 (0.4167) data time 0.0007 (0.0017) model time 0.3949 (0.4123) loss 7.8228 (7.5182) grad_norm 3.2946 (2.2176) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:19:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][520/625] eta 0:00:43 lr 0.000894 wd 0.0500 time 0.4027 (0.4165) data time 0.0007 (0.0017) model time 0.4020 (0.4121) loss 6.6700 (7.5134) grad_norm 1.9679 (2.2133) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:19:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][530/625] eta 0:00:39 lr 0.000894 wd 0.0500 time 0.4099 (0.4162) data time 0.0007 (0.0017) model time 0.4092 (0.4119) loss 7.2128 (7.5206) grad_norm 1.7573 (2.2230) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:19:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][540/625] eta 0:00:35 lr 0.000894 wd 0.0500 time 0.3971 (0.4159) data time 0.0009 (0.0017) model time 0.3962 (0.4117) loss 9.1520 (7.5251) grad_norm 4.9063 (2.2217) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:20:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][550/625] eta 0:00:31 lr 0.000894 wd 0.0500 time 0.4001 (0.4157) data time 0.0009 (0.0017) model time 0.3992 (0.4115) loss 8.2482 (7.5333) grad_norm 2.8681 (2.2220) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:20:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][560/625] eta 0:00:27 lr 0.000894 wd 0.0500 time 0.4143 (0.4155) data time 0.0009 (0.0016) model time 0.4134 (0.4113) loss 7.0973 (7.5316) grad_norm 1.9031 (2.2264) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:20:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][570/625] eta 0:00:22 lr 0.000894 wd 0.0500 time 0.3955 (0.4153) 
data time 0.0006 (0.0016) model time 0.3949 (0.4112) loss 7.1988 (7.5267) grad_norm 2.8532 (2.2259) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:20:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][580/625] eta 0:00:18 lr 0.000894 wd 0.0500 time 0.4105 (0.4151) data time 0.0007 (0.0016) model time 0.4098 (0.4110) loss 7.5808 (7.5295) grad_norm 2.2667 (2.2306) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:20:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][590/625] eta 0:00:14 lr 0.000894 wd 0.0500 time 0.4291 (0.4149) data time 0.0007 (0.0016) model time 0.4284 (0.4109) loss 5.8305 (7.5233) grad_norm 2.5314 (2.2317) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:20:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][600/625] eta 0:00:10 lr 0.000894 wd 0.0500 time 0.3937 (0.4147) data time 0.0007 (0.0016) model time 0.3930 (0.4107) loss 7.3625 (7.5272) grad_norm 1.7499 (2.2298) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:20:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][610/625] eta 0:00:06 lr 0.000893 wd 0.0500 time 0.4016 (0.4145) data time 0.0006 (0.0016) model time 0.4010 (0.4105) loss 8.3789 (7.5238) grad_norm 1.6622 (2.2270) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:20:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][620/625] eta 0:00:02 lr 0.000893 wd 0.0500 time 0.4091 (0.4143) data time 0.0006 (0.0016) model time 0.4085 (0.4103) loss 7.8396 (7.5281) grad_norm 1.6015 (2.2236) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:20:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 114 training takes 0:04:18 [2024-07-25 00:20:30 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 00:20:31 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 00:20:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.464 (0.464) Loss 0.6440 (0.6440) Acc@1 87.451 (87.451) Acc@5 98.047 (98.047) Mem 14939MB [2024-07-25 00:20:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.088 (0.120) Loss 0.9795 (0.7781) Acc@1 78.223 (84.038) Acc@5 94.775 (97.013) Mem 14939MB [2024-07-25 00:20:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.1396 (0.9259) Acc@1 73.389 (80.273) Acc@5 93.359 (95.350) Mem 14939MB [2024-07-25 00:20:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.950 Acc@5 95.300 [2024-07-25 00:20:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.0% [2024-07-25 00:20:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.761 (0.761) Loss 0.5947 (0.5947) Acc@1 88.672 (88.672) Acc@5 98.340 (98.340) Mem 14939MB [2024-07-25 00:20:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.155) Loss 0.9688 (0.7397) Acc@1 79.443 (84.912) Acc@5 94.873 (97.283) Mem 14939MB [2024-07-25 00:20:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 1.1016 (0.8757) Acc@1 73.926 (81.266) Acc@5 94.092 (95.805) Mem 14939MB [2024-07-25 00:20:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.950 Acc@5 95.775 [2024-07-25 00:20:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.0% [2024-07-25 00:20:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.95% [2024-07-25 00:20:36 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 00:20:37 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 00:20:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][0/625] eta 0:07:50 lr 0.000893 wd 0.0500 time 0.7532 (0.7532) data time 0.3779 (0.3779) model time 0.0000 (0.0000) loss 8.0925 (8.0925) grad_norm 1.6602 (1.6602) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:20:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][10/625] eta 0:04:27 lr 0.000893 wd 0.0500 time 0.4102 (0.4347) data time 0.0006 (0.0352) model time 0.0000 (0.0000) loss 6.0312 (7.6021) grad_norm 1.6990 (2.1307) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:20:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][20/625] eta 0:04:13 lr 0.000893 wd 0.0500 time 0.3970 (0.4189) data time 0.0006 (0.0189) model time 0.0000 (0.0000) loss 7.3620 (7.3437) grad_norm 1.8266 (2.0698) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:20:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][30/625] eta 0:04:08 lr 0.000893 wd 0.0500 time 0.3997 (0.4169) data time 0.0006 (0.0131) model time 0.0000 (0.0000) loss 9.0408 (7.3291) grad_norm 4.8458 (2.4414) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:20:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][40/625] eta 0:04:16 lr 0.000893 wd 0.0500 time 0.5703 (0.4388) data time 0.0007 (0.0101) model time 0.0000 (0.0000) loss 7.3838 (7.3877) grad_norm 1.4447 (2.3814) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:21:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][50/625] eta 0:04:19 lr 0.000893 wd 0.0500 time 0.5934 (0.4521) data time 0.0008 (0.0083) 
model time 0.0000 (0.0000) loss 7.8497 (7.3760) grad_norm 2.8804 (2.3402) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:21:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][60/625] eta 0:04:12 lr 0.000893 wd 0.0500 time 0.3975 (0.4464) data time 0.0006 (0.0071) model time 0.3969 (0.4168) loss 6.3649 (7.3408) grad_norm 1.8127 (2.2774) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:21:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][70/625] eta 0:04:04 lr 0.000893 wd 0.0500 time 0.4112 (0.4407) data time 0.0010 (0.0062) model time 0.4102 (0.4106) loss 8.0507 (7.3736) grad_norm 3.0951 (2.2910) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:21:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][80/625] eta 0:03:57 lr 0.000893 wd 0.0500 time 0.3971 (0.4359) data time 0.0011 (0.0056) model time 0.3961 (0.4073) loss 8.1449 (7.4019) grad_norm 1.9907 (2.2635) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:21:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][90/625] eta 0:03:51 lr 0.000892 wd 0.0500 time 0.3977 (0.4321) data time 0.0006 (0.0051) model time 0.3970 (0.4056) loss 7.3192 (7.4535) grad_norm 1.9509 (2.2557) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:21:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][100/625] eta 0:03:45 lr 0.000892 wd 0.0500 time 0.4080 (0.4290) data time 0.0006 (0.0047) model time 0.4074 (0.4045) loss 7.8299 (7.4655) grad_norm 2.6178 (2.2790) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:21:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][110/625] eta 0:03:39 lr 0.000892 wd 0.0500 time 0.3960 (0.4266) data time 0.0009 (0.0043) model time 0.3951 (0.4040) loss 8.1539 (7.4754) grad_norm 2.3754 (2.3217) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:21:29 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][120/625] eta 0:03:34 lr 0.000892 wd 0.0500 time 0.3943 (0.4244) data time 0.0007 (0.0040) model time 0.3936 (0.4034) loss 6.0270 (7.4632) grad_norm 1.9572 (2.3327) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:21:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][130/625] eta 0:03:29 lr 0.000892 wd 0.0500 time 0.4081 (0.4227) data time 0.0008 (0.0038) model time 0.4072 (0.4031) loss 7.3880 (7.4789) grad_norm 1.7885 (2.3184) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:21:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][140/625] eta 0:03:24 lr 0.000892 wd 0.0500 time 0.4007 (0.4212) data time 0.0007 (0.0036) model time 0.4000 (0.4028) loss 7.1231 (7.4818) grad_norm 2.5779 (2.3190) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:21:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][150/625] eta 0:03:19 lr 0.000892 wd 0.0500 time 0.3982 (0.4199) data time 0.0007 (0.0034) model time 0.3975 (0.4025) loss 6.8723 (7.4728) grad_norm 1.6433 (2.3062) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:21:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][160/625] eta 0:03:14 lr 0.000892 wd 0.0500 time 0.4110 (0.4189) data time 0.0006 (0.0033) model time 0.4104 (0.4025) loss 6.5109 (7.4576) grad_norm 1.8397 (2.2933) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:21:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][170/625] eta 0:03:10 lr 0.000892 wd 0.0500 time 0.4000 (0.4177) data time 0.0009 (0.0031) model time 0.3991 (0.4022) loss 6.1911 (7.4640) grad_norm 2.5093 (2.2856) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:21:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][180/625] eta 0:03:05 lr 0.000892 wd 0.0500 time 0.3998 (0.4168) 
data time 0.0008 (0.0030) model time 0.3990 (0.4020) loss 7.5610 (7.4763) grad_norm 1.9929 (2.2919) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:21:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][190/625] eta 0:03:01 lr 0.000892 wd 0.0500 time 0.4105 (0.4161) data time 0.0009 (0.0029) model time 0.4096 (0.4020) loss 7.5807 (7.4677) grad_norm 3.7851 (2.2907) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:22:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][200/625] eta 0:02:56 lr 0.000891 wd 0.0500 time 0.6079 (0.4164) data time 0.0009 (0.0028) model time 0.6070 (0.4033) loss 8.7076 (7.4675) grad_norm 1.8788 (2.2786) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:22:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][210/625] eta 0:02:52 lr 0.000891 wd 0.0500 time 0.4093 (0.4157) data time 0.0008 (0.0027) model time 0.4085 (0.4031) loss 8.2883 (7.4822) grad_norm 2.1590 (2.2879) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:22:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][220/625] eta 0:02:48 lr 0.000891 wd 0.0500 time 0.3924 (0.4150) data time 0.0009 (0.0027) model time 0.3915 (0.4029) loss 8.1574 (7.4668) grad_norm 1.8967 (2.2864) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:22:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][230/625] eta 0:02:43 lr 0.000891 wd 0.0500 time 0.3941 (0.4144) data time 0.0010 (0.0026) model time 0.3931 (0.4028) loss 8.6973 (7.4818) grad_norm 1.8952 (2.2943) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:22:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][240/625] eta 0:02:39 lr 0.000891 wd 0.0500 time 0.4121 (0.4139) data time 0.0008 (0.0025) model time 0.4113 (0.4027) loss 7.3549 (7.4775) grad_norm 1.8310 (2.2828) loss_scale 1024.0000 (1024.0000) mem 14939MB 
[2024-07-25 00:22:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][250/625] eta 0:02:35 lr 0.000891 wd 0.0500 time 0.3954 (0.4138) data time 0.0009 (0.0025) model time 0.3945 (0.4031) loss 8.3146 (7.5053) grad_norm 2.7107 (2.3003) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:22:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][260/625] eta 0:02:31 lr 0.000891 wd 0.0500 time 0.3981 (0.4162) data time 0.0008 (0.0024) model time 0.3973 (0.4065) loss 6.7120 (7.5097) grad_norm 2.9540 (2.3110) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:22:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][270/625] eta 0:02:28 lr 0.000891 wd 0.0500 time 0.3981 (0.4189) data time 0.0006 (0.0023) model time 0.3975 (0.4103) loss 8.3981 (7.5003) grad_norm 1.8397 (2.3148) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:22:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][280/625] eta 0:02:24 lr 0.000891 wd 0.0500 time 0.4094 (0.4196) data time 0.0008 (0.0023) model time 0.4086 (0.4114) loss 7.1011 (7.5016) grad_norm 1.5402 (2.3264) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:22:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][290/625] eta 0:02:20 lr 0.000891 wd 0.0500 time 0.4161 (0.4191) data time 0.0006 (0.0022) model time 0.4155 (0.4112) loss 9.2643 (7.5016) grad_norm 3.4591 (2.3416) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:22:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][300/625] eta 0:02:16 lr 0.000891 wd 0.0500 time 0.3943 (0.4185) data time 0.0006 (0.0022) model time 0.3937 (0.4107) loss 7.8510 (7.5022) grad_norm 2.8985 (2.3504) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:22:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][310/625] eta 0:02:11 lr 0.000890 wd 0.0500 
time 0.4026 (0.4181) data time 0.0006 (0.0022) model time 0.4020 (0.4105) loss 6.4862 (7.5016) grad_norm 2.6764 (2.3776) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:22:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][320/625] eta 0:02:07 lr 0.000890 wd 0.0500 time 0.4171 (0.4177) data time 0.0008 (0.0021) model time 0.4163 (0.4103) loss 8.2078 (7.4966) grad_norm 1.5887 (2.3774) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:22:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][330/625] eta 0:02:03 lr 0.000890 wd 0.0500 time 0.3931 (0.4173) data time 0.0007 (0.0021) model time 0.3923 (0.4100) loss 7.2535 (7.4852) grad_norm 4.7385 (2.3805) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:23:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][340/625] eta 0:01:58 lr 0.000890 wd 0.0500 time 0.4056 (0.4169) data time 0.0009 (0.0021) model time 0.4047 (0.4098) loss 7.8422 (7.4959) grad_norm 1.9885 (2.3918) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:23:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][350/625] eta 0:01:54 lr 0.000890 wd 0.0500 time 0.4124 (0.4165) data time 0.0008 (0.0020) model time 0.4116 (0.4095) loss 8.1285 (7.5055) grad_norm 2.2884 (2.3904) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:23:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][360/625] eta 0:01:50 lr 0.000890 wd 0.0500 time 0.3955 (0.4161) data time 0.0007 (0.0020) model time 0.3948 (0.4092) loss 6.8656 (7.5009) grad_norm 2.7773 (2.3884) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:23:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][370/625] eta 0:01:46 lr 0.000890 wd 0.0500 time 0.4066 (0.4158) data time 0.0006 (0.0020) model time 0.4060 (0.4091) loss 7.3287 (7.5031) grad_norm 2.4470 (2.3842) loss_scale 1024.0000 
(1024.0000) mem 14939MB [2024-07-25 00:23:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][380/625] eta 0:01:41 lr 0.000890 wd 0.0500 time 0.4103 (0.4155) data time 0.0007 (0.0019) model time 0.4097 (0.4089) loss 7.4753 (7.5051) grad_norm 1.9004 (2.3843) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:23:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][390/625] eta 0:01:37 lr 0.000890 wd 0.0500 time 0.3928 (0.4153) data time 0.0008 (0.0019) model time 0.3920 (0.4088) loss 7.8132 (7.5067) grad_norm 1.8379 (2.3859) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:23:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][400/625] eta 0:01:33 lr 0.000890 wd 0.0500 time 0.4086 (0.4150) data time 0.0006 (0.0019) model time 0.4081 (0.4087) loss 6.3917 (7.5056) grad_norm 2.1325 (2.3851) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:23:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][410/625] eta 0:01:29 lr 0.000889 wd 0.0500 time 0.4051 (0.4150) data time 0.0010 (0.0019) model time 0.4041 (0.4088) loss 8.2959 (7.5164) grad_norm 1.6029 (2.3758) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:23:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][420/625] eta 0:01:25 lr 0.000889 wd 0.0500 time 0.3794 (0.4150) data time 0.0007 (0.0019) model time 0.3787 (0.4090) loss 8.4657 (7.5207) grad_norm 2.5245 (2.3721) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:23:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][430/625] eta 0:01:20 lr 0.000889 wd 0.0500 time 0.3993 (0.4148) data time 0.0010 (0.0018) model time 0.3983 (0.4088) loss 7.5913 (7.5262) grad_norm 1.9578 (2.3641) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:23:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][440/625] eta 0:01:16 
lr 0.000889 wd 0.0500 time 0.4187 (0.4146) data time 0.0007 (0.0018) model time 0.4179 (0.4087) loss 8.0197 (7.5317) grad_norm 1.7790 (2.3556) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:23:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][450/625] eta 0:01:12 lr 0.000889 wd 0.0500 time 0.3938 (0.4144) data time 0.0006 (0.0018) model time 0.3932 (0.4086) loss 8.0497 (7.5339) grad_norm 2.1574 (2.3498) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:23:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][460/625] eta 0:01:08 lr 0.000889 wd 0.0500 time 0.4012 (0.4141) data time 0.0008 (0.0018) model time 0.4004 (0.4084) loss 6.9098 (7.5314) grad_norm 1.6626 (2.3383) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:23:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][470/625] eta 0:01:04 lr 0.000889 wd 0.0500 time 0.4212 (0.4139) data time 0.0009 (0.0018) model time 0.4203 (0.4083) loss 8.7259 (7.5353) grad_norm 2.6896 (2.3347) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:23:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][480/625] eta 0:01:00 lr 0.000889 wd 0.0500 time 0.6221 (0.4157) data time 0.0006 (0.0018) model time 0.6215 (0.4104) loss 8.2925 (7.5358) grad_norm 2.6724 (2.3302) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:24:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][490/625] eta 0:00:56 lr 0.000889 wd 0.0500 time 0.5692 (0.4181) data time 0.0009 (0.0018) model time 0.5683 (0.4131) loss 7.0587 (7.5338) grad_norm 1.9835 (2.3290) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:24:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][500/625] eta 0:00:52 lr 0.000889 wd 0.0500 time 0.3979 (0.4180) data time 0.0008 (0.0017) model time 0.3971 (0.4132) loss 8.1488 (7.5301) grad_norm 3.4271 (2.3278) 
loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:24:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][510/625] eta 0:00:48 lr 0.000889 wd 0.0500 time 0.4035 (0.4177) data time 0.0007 (0.0017) model time 0.4028 (0.4129) loss 5.6078 (7.5322) grad_norm 1.7570 (2.3436) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:24:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][520/625] eta 0:00:43 lr 0.000888 wd 0.0500 time 0.3951 (0.4174) data time 0.0006 (0.0017) model time 0.3945 (0.4127) loss 7.2574 (7.5342) grad_norm 1.7474 (2.3357) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:24:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][530/625] eta 0:00:39 lr 0.000888 wd 0.0500 time 0.3993 (0.4171) data time 0.0007 (0.0017) model time 0.3986 (0.4124) loss 7.5644 (7.5367) grad_norm 5.7633 (2.3451) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:24:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][540/625] eta 0:00:35 lr 0.000888 wd 0.0500 time 0.4039 (0.4168) data time 0.0009 (0.0017) model time 0.4030 (0.4121) loss 7.7929 (7.5382) grad_norm 1.5093 (2.3452) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:24:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][550/625] eta 0:00:31 lr 0.000888 wd 0.0500 time 0.3951 (0.4165) data time 0.0006 (0.0017) model time 0.3944 (0.4119) loss 6.5329 (7.5397) grad_norm 2.8545 (2.3438) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:24:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][560/625] eta 0:00:27 lr 0.000888 wd 0.0500 time 0.4005 (0.4162) data time 0.0008 (0.0017) model time 0.3997 (0.4116) loss 6.3808 (7.5425) grad_norm 2.0829 (2.3418) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:24:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[115/300][570/625] eta 0:00:22 lr 0.000888 wd 0.0500 time 0.4077 (0.4159) data time 0.0007 (0.0017) model time 0.4070 (0.4114) loss 7.9111 (7.5524) grad_norm 2.7823 (2.3408) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:24:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][580/625] eta 0:00:18 lr 0.000888 wd 0.0500 time 0.3985 (0.4156) data time 0.0007 (0.0016) model time 0.3978 (0.4111) loss 6.3901 (7.5508) grad_norm 4.7049 (2.3525) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:24:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][590/625] eta 0:00:14 lr 0.000888 wd 0.0500 time 0.3970 (0.4154) data time 0.0008 (0.0016) model time 0.3961 (0.4109) loss 7.1752 (7.5563) grad_norm 1.6992 (2.3549) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:24:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][600/625] eta 0:00:10 lr 0.000888 wd 0.0500 time 0.4046 (0.4152) data time 0.0006 (0.0016) model time 0.4040 (0.4108) loss 7.2565 (7.5579) grad_norm 2.1004 (2.3480) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:24:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][610/625] eta 0:00:06 lr 0.000888 wd 0.0500 time 0.3924 (0.4150) data time 0.0004 (0.0016) model time 0.3920 (0.4106) loss 8.2847 (7.5606) grad_norm 2.7925 (2.3546) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:24:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][620/625] eta 0:00:02 lr 0.000888 wd 0.0500 time 0.3985 (0.4147) data time 0.0004 (0.0016) model time 0.3982 (0.4104) loss 8.0727 (7.5627) grad_norm 3.1056 (2.3609) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:24:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 115 training takes 0:04:19 [2024-07-25 00:24:57 vssd_mesa_retrain_small_e300] (utils.py 118): INFO 
./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 00:24:58 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 00:24:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.439 (0.439) Loss 0.5850 (0.5850) Acc@1 87.939 (87.939) Acc@5 98.145 (98.145) Mem 14939MB [2024-07-25 00:24:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.118) Loss 1.0049 (0.7421) Acc@1 77.441 (84.149) Acc@5 94.434 (96.990) Mem 14939MB [2024-07-25 00:25:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.1396 (0.8983) Acc@1 73.242 (80.369) Acc@5 93.311 (95.340) Mem 14939MB [2024-07-25 00:25:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.062 Acc@5 95.308 [2024-07-25 00:25:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.1% [2024-07-25 00:25:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.807 (0.807) Loss 0.5933 (0.5933) Acc@1 88.672 (88.672) Acc@5 98.340 (98.340) Mem 14939MB [2024-07-25 00:25:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.155) Loss 0.9678 (0.7387) Acc@1 79.541 (84.934) Acc@5 94.873 (97.275) Mem 14939MB [2024-07-25 00:25:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 1.0986 (0.8745) Acc@1 73.828 (81.287) Acc@5 93.994 (95.803) Mem 14939MB [2024-07-25 00:25:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.972 Acc@5 95.771 [2024-07-25 00:25:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.0% [2024-07-25 00:25:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO 
New max accuracy ema: 80.97% [2024-07-25 00:25:03 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 00:25:04 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 00:25:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][0/625] eta 0:08:13 lr 0.000887 wd 0.0500 time 0.7902 (0.7902) data time 0.3946 (0.3946) model time 0.0000 (0.0000) loss 6.4240 (6.4240) grad_norm 2.4665 (2.4665) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:25:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][10/625] eta 0:04:28 lr 0.000887 wd 0.0500 time 0.3943 (0.4360) data time 0.0009 (0.0367) model time 0.0000 (0.0000) loss 6.5091 (7.4883) grad_norm 4.1776 (2.6016) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:25:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][20/625] eta 0:04:13 lr 0.000887 wd 0.0500 time 0.3988 (0.4189) data time 0.0007 (0.0197) model time 0.0000 (0.0000) loss 7.0256 (7.5240) grad_norm 3.7687 (2.7497) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:25:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][30/625] eta 0:04:05 lr 0.000887 wd 0.0500 time 0.4097 (0.4134) data time 0.0007 (0.0137) model time 0.0000 (0.0000) loss 8.3857 (7.4866) grad_norm 2.3582 (2.6022) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:25:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][40/625] eta 0:04:00 lr 0.000887 wd 0.0500 time 0.3941 (0.4108) data time 0.0009 (0.0107) model time 0.0000 (0.0000) loss 6.1200 (7.4811) grad_norm 2.1655 (2.4357) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:25:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[116/300][50/625] eta 0:03:55 lr 0.000887 wd 0.0500 time 0.4026 (0.4091) data time 0.0006 (0.0087) model time 0.0000 (0.0000) loss 8.1003 (7.5016) grad_norm 2.6870 (2.4255) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:25:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][60/625] eta 0:03:50 lr 0.000887 wd 0.0500 time 0.4086 (0.4083) data time 0.0007 (0.0075) model time 0.4079 (0.4029) loss 6.2366 (7.4766) grad_norm 2.8594 (2.4374) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:25:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][70/625] eta 0:03:48 lr 0.000887 wd 0.0500 time 0.5920 (0.4122) data time 0.0009 (0.0065) model time 0.5911 (0.4191) loss 6.8909 (7.4600) grad_norm 1.8476 (2.4221) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:25:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][80/625] eta 0:03:50 lr 0.000887 wd 0.0500 time 0.6049 (0.4234) data time 0.0008 (0.0059) model time 0.6040 (0.4468) loss 8.4424 (7.4475) grad_norm 1.6621 (2.3777) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:25:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][90/625] eta 0:03:50 lr 0.000887 wd 0.0500 time 0.4111 (0.4310) data time 0.0009 (0.0053) model time 0.4102 (0.4580) loss 5.7904 (7.4879) grad_norm 1.8657 (2.3964) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:25:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][100/625] eta 0:03:44 lr 0.000887 wd 0.0500 time 0.3983 (0.4281) data time 0.0007 (0.0049) model time 0.3976 (0.4464) loss 8.3794 (7.5265) grad_norm 3.1206 (2.4270) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:25:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][110/625] eta 0:03:39 lr 0.000886 wd 0.0500 time 0.4031 (0.4258) data time 0.0009 (0.0045) model time 0.4022 (0.4389) loss 9.0720 (7.5204) 
grad_norm 1.8127 (2.3721) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:25:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][120/625] eta 0:03:34 lr 0.000886 wd 0.0500 time 0.4094 (0.4239) data time 0.0006 (0.0042) model time 0.4088 (0.4336) loss 8.0687 (7.5331) grad_norm 1.8207 (2.3319) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:25:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][130/625] eta 0:03:28 lr 0.000886 wd 0.0500 time 0.3948 (0.4221) data time 0.0009 (0.0040) model time 0.3940 (0.4294) loss 7.1109 (7.5195) grad_norm 1.9078 (2.3058) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:26:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][140/625] eta 0:03:23 lr 0.000886 wd 0.0500 time 0.4008 (0.4206) data time 0.0007 (0.0038) model time 0.4002 (0.4262) loss 8.2554 (7.4957) grad_norm 3.1141 (2.3094) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:26:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][150/625] eta 0:03:19 lr 0.000886 wd 0.0500 time 0.4092 (0.4194) data time 0.0008 (0.0036) model time 0.4084 (0.4237) loss 7.2024 (7.4759) grad_norm 1.9141 (2.2810) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:26:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][160/625] eta 0:03:14 lr 0.000886 wd 0.0500 time 0.4005 (0.4183) data time 0.0007 (0.0034) model time 0.3999 (0.4215) loss 6.2190 (7.4704) grad_norm 2.6523 (2.2502) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:26:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][170/625] eta 0:03:09 lr 0.000886 wd 0.0500 time 0.3959 (0.4174) data time 0.0007 (0.0033) model time 0.3952 (0.4199) loss 7.0930 (7.4656) grad_norm 2.6089 (2.2472) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:26:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 
367): INFO Train: [116/300][180/625] eta 0:03:05 lr 0.000886 wd 0.0500 time 0.4154 (0.4167) data time 0.0007 (0.0032) model time 0.4147 (0.4187) loss 7.9539 (7.4687) grad_norm 1.6339 (2.2240) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:26:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][190/625] eta 0:03:01 lr 0.000886 wd 0.0500 time 0.3960 (0.4167) data time 0.0007 (0.0030) model time 0.3954 (0.4185) loss 6.0805 (7.4751) grad_norm 2.9346 (2.2171) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:26:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][200/625] eta 0:02:56 lr 0.000886 wd 0.0500 time 0.4024 (0.4159) data time 0.0008 (0.0029) model time 0.4017 (0.4173) loss 7.0351 (7.4673) grad_norm 2.1738 (2.2294) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:26:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][210/625] eta 0:02:52 lr 0.000886 wd 0.0500 time 0.4075 (0.4153) data time 0.0007 (0.0028) model time 0.4068 (0.4163) loss 8.3299 (7.4968) grad_norm 1.7942 (2.2250) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:26:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][220/625] eta 0:02:47 lr 0.000885 wd 0.0500 time 0.3936 (0.4147) data time 0.0008 (0.0028) model time 0.3928 (0.4154) loss 8.4554 (7.4937) grad_norm 1.2518 (2.2077) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:26:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][230/625] eta 0:02:43 lr 0.000885 wd 0.0500 time 0.4010 (0.4142) data time 0.0008 (0.0027) model time 0.4002 (0.4146) loss 7.0626 (7.4969) grad_norm 2.2243 (2.2170) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:26:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][240/625] eta 0:02:39 lr 0.000885 wd 0.0500 time 0.4056 (0.4137) data time 0.0006 (0.0026) model time 0.4050 (0.4140) 
loss 6.2973 (7.4927) grad_norm 1.9713 (2.2192) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:26:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][250/625] eta 0:02:34 lr 0.000885 wd 0.0500 time 0.3944 (0.4133) data time 0.0009 (0.0026) model time 0.3935 (0.4133) loss 6.6804 (7.4630) grad_norm 2.6553 (2.2400) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:26:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][260/625] eta 0:02:30 lr 0.000885 wd 0.0500 time 0.4003 (0.4128) data time 0.0008 (0.0025) model time 0.3995 (0.4127) loss 8.3440 (7.4722) grad_norm 2.7513 (2.2608) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:26:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][270/625] eta 0:02:26 lr 0.000885 wd 0.0500 time 0.4043 (0.4125) data time 0.0008 (0.0024) model time 0.4035 (0.4122) loss 7.6886 (7.4816) grad_norm 2.6210 (2.2907) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:27:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][280/625] eta 0:02:22 lr 0.000885 wd 0.0500 time 0.3980 (0.4121) data time 0.0007 (0.0024) model time 0.3973 (0.4117) loss 6.7644 (7.4808) grad_norm 1.9897 (2.2935) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:27:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][290/625] eta 0:02:18 lr 0.000885 wd 0.0500 time 0.6168 (0.4131) data time 0.0009 (0.0023) model time 0.6159 (0.4130) loss 8.2482 (7.4815) grad_norm 1.8821 (inf) loss_scale 512.0000 (1009.9244) mem 14939MB [2024-07-25 00:27:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][300/625] eta 0:02:15 lr 0.000885 wd 0.0500 time 0.6103 (0.4167) data time 0.0008 (0.0023) model time 0.6095 (0.4173) loss 8.3592 (7.4825) grad_norm 2.3148 (inf) loss_scale 512.0000 (993.3821) mem 14939MB [2024-07-25 00:27:14 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [116/300][310/625] eta 0:02:12 lr 0.000885 wd 0.0500 time 0.3945 (0.4192) data time 0.0009 (0.0023) model time 0.3935 (0.4202) loss 8.6638 (7.4822) grad_norm 1.7299 (inf) loss_scale 512.0000 (977.9035) mem 14939MB [2024-07-25 00:27:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][320/625] eta 0:02:07 lr 0.000884 wd 0.0500 time 0.4027 (0.4187) data time 0.0007 (0.0022) model time 0.4020 (0.4195) loss 5.5803 (7.4779) grad_norm 2.7614 (inf) loss_scale 512.0000 (963.3894) mem 14939MB [2024-07-25 00:27:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][330/625] eta 0:02:03 lr 0.000884 wd 0.0500 time 0.4146 (0.4183) data time 0.0009 (0.0022) model time 0.4137 (0.4190) loss 6.7689 (7.4758) grad_norm 3.8595 (inf) loss_scale 512.0000 (949.7523) mem 14939MB [2024-07-25 00:27:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][340/625] eta 0:01:59 lr 0.000884 wd 0.0500 time 0.3961 (0.4179) data time 0.0006 (0.0022) model time 0.3954 (0.4184) loss 8.7057 (7.4685) grad_norm 1.8862 (inf) loss_scale 512.0000 (936.9150) mem 14939MB [2024-07-25 00:27:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][350/625] eta 0:01:54 lr 0.000884 wd 0.0500 time 0.4238 (0.4176) data time 0.0008 (0.0021) model time 0.4230 (0.4181) loss 6.8748 (7.4733) grad_norm 2.6441 (inf) loss_scale 512.0000 (924.8091) mem 14939MB [2024-07-25 00:27:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][360/625] eta 0:01:50 lr 0.000884 wd 0.0500 time 0.4088 (0.4172) data time 0.0007 (0.0021) model time 0.4082 (0.4176) loss 8.4001 (7.4735) grad_norm 1.5951 (inf) loss_scale 512.0000 (913.3740) mem 14939MB [2024-07-25 00:27:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][370/625] eta 0:01:46 lr 0.000884 wd 0.0500 time 0.4033 (0.4171) data time 0.0008 (0.0021) model time 0.4025 (0.4173) loss 
6.7466 (7.4718) grad_norm 1.9182 (inf) loss_scale 512.0000 (902.5553) mem 14939MB [2024-07-25 00:27:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][380/625] eta 0:01:42 lr 0.000884 wd 0.0500 time 0.4001 (0.4167) data time 0.0008 (0.0020) model time 0.3993 (0.4168) loss 6.9477 (7.4681) grad_norm 2.1109 (inf) loss_scale 512.0000 (892.3045) mem 14939MB [2024-07-25 00:27:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][390/625] eta 0:01:37 lr 0.000884 wd 0.0500 time 0.4041 (0.4163) data time 0.0006 (0.0020) model time 0.4035 (0.4164) loss 6.9605 (7.4691) grad_norm 2.0390 (inf) loss_scale 512.0000 (882.5780) mem 14939MB [2024-07-25 00:27:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][400/625] eta 0:01:33 lr 0.000884 wd 0.0500 time 0.3964 (0.4160) data time 0.0007 (0.0020) model time 0.3956 (0.4159) loss 8.1515 (7.4726) grad_norm 1.9603 (inf) loss_scale 512.0000 (873.3367) mem 14939MB [2024-07-25 00:27:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][410/625] eta 0:01:29 lr 0.000884 wd 0.0500 time 0.4027 (0.4160) data time 0.0008 (0.0020) model time 0.4019 (0.4159) loss 5.7127 (7.4760) grad_norm 2.8862 (inf) loss_scale 512.0000 (864.5450) mem 14939MB [2024-07-25 00:27:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][420/625] eta 0:01:25 lr 0.000884 wd 0.0500 time 0.4067 (0.4156) data time 0.0008 (0.0019) model time 0.4059 (0.4155) loss 6.7465 (7.4685) grad_norm 4.3491 (inf) loss_scale 512.0000 (856.1710) mem 14939MB [2024-07-25 00:28:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][430/625] eta 0:01:20 lr 0.000883 wd 0.0500 time 0.3980 (0.4153) data time 0.0006 (0.0019) model time 0.3974 (0.4151) loss 7.8887 (7.4595) grad_norm 1.7562 (inf) loss_scale 512.0000 (848.1856) mem 14939MB [2024-07-25 00:28:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[116/300][440/625] eta 0:01:16 lr 0.000883 wd 0.0500 time 0.4030 (0.4149) data time 0.0006 (0.0019) model time 0.4024 (0.4147) loss 6.4633 (7.4529) grad_norm 2.1792 (inf) loss_scale 512.0000 (840.5624) mem 14939MB [2024-07-25 00:28:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][450/625] eta 0:01:12 lr 0.000883 wd 0.0500 time 0.3999 (0.4147) data time 0.0009 (0.0019) model time 0.3990 (0.4144) loss 6.7964 (7.4558) grad_norm 2.6956 (inf) loss_scale 512.0000 (833.2772) mem 14939MB [2024-07-25 00:28:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][460/625] eta 0:01:08 lr 0.000883 wd 0.0500 time 0.3938 (0.4144) data time 0.0008 (0.0018) model time 0.3929 (0.4141) loss 6.9543 (7.4580) grad_norm 2.0253 (inf) loss_scale 512.0000 (826.3080) mem 14939MB [2024-07-25 00:28:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][470/625] eta 0:01:04 lr 0.000883 wd 0.0500 time 0.4005 (0.4141) data time 0.0006 (0.0018) model time 0.3999 (0.4138) loss 6.8134 (7.4551) grad_norm 2.2564 (inf) loss_scale 512.0000 (819.6348) mem 14939MB [2024-07-25 00:28:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][480/625] eta 0:01:00 lr 0.000883 wd 0.0500 time 0.4069 (0.4139) data time 0.0008 (0.0018) model time 0.4061 (0.4135) loss 9.3339 (7.4663) grad_norm 1.5377 (inf) loss_scale 512.0000 (813.2391) mem 14939MB [2024-07-25 00:28:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][490/625] eta 0:00:55 lr 0.000883 wd 0.0500 time 0.3949 (0.4136) data time 0.0009 (0.0018) model time 0.3940 (0.4132) loss 7.3531 (7.4633) grad_norm 1.6922 (inf) loss_scale 512.0000 (807.1039) mem 14939MB [2024-07-25 00:28:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][500/625] eta 0:00:51 lr 0.000883 wd 0.0500 time 0.3988 (0.4134) data time 0.0009 (0.0018) model time 0.3979 (0.4128) loss 8.0164 (7.4676) grad_norm 2.4673 (inf) 
loss_scale 512.0000 (801.2136) mem 14939MB [2024-07-25 00:28:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][510/625] eta 0:00:47 lr 0.000883 wd 0.0500 time 0.4091 (0.4133) data time 0.0006 (0.0018) model time 0.4084 (0.4128) loss 6.3358 (7.4721) grad_norm 2.0381 (inf) loss_scale 512.0000 (795.5538) mem 14939MB [2024-07-25 00:28:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][520/625] eta 0:00:43 lr 0.000883 wd 0.0500 time 0.5747 (0.4155) data time 0.0008 (0.0017) model time 0.5739 (0.4152) loss 7.8107 (7.4784) grad_norm 2.7027 (inf) loss_scale 512.0000 (790.1113) mem 14939MB [2024-07-25 00:28:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][530/625] eta 0:00:39 lr 0.000882 wd 0.0500 time 0.4022 (0.4170) data time 0.0008 (0.0017) model time 0.4014 (0.4169) loss 6.7480 (7.4692) grad_norm 2.5246 (inf) loss_scale 512.0000 (784.8738) mem 14939MB [2024-07-25 00:28:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][540/625] eta 0:00:35 lr 0.000882 wd 0.0500 time 0.3981 (0.4168) data time 0.0006 (0.0017) model time 0.3975 (0.4166) loss 6.8260 (7.4624) grad_norm 2.0607 (inf) loss_scale 512.0000 (779.8299) mem 14939MB [2024-07-25 00:28:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][550/625] eta 0:00:31 lr 0.000882 wd 0.0500 time 0.4100 (0.4165) data time 0.0009 (0.0017) model time 0.4092 (0.4163) loss 8.1681 (7.4598) grad_norm 2.4709 (inf) loss_scale 512.0000 (774.9691) mem 14939MB [2024-07-25 00:28:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][560/625] eta 0:00:27 lr 0.000882 wd 0.0500 time 0.3975 (0.4162) data time 0.0006 (0.0017) model time 0.3969 (0.4160) loss 6.9806 (7.4484) grad_norm 1.6601 (inf) loss_scale 512.0000 (770.2816) mem 14939MB [2024-07-25 00:29:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][570/625] eta 0:00:22 lr 0.000882 
wd 0.0500 time 0.4091 (0.4160) data time 0.0007 (0.0017) model time 0.4084 (0.4157) loss 7.6903 (7.4542) grad_norm 2.9362 (inf) loss_scale 512.0000 (765.7583) mem 14939MB [2024-07-25 00:29:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][580/625] eta 0:00:18 lr 0.000882 wd 0.0500 time 0.4085 (0.4158) data time 0.0006 (0.0017) model time 0.4079 (0.4154) loss 6.1292 (7.4555) grad_norm 1.7578 (inf) loss_scale 512.0000 (761.3907) mem 14939MB [2024-07-25 00:29:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][590/625] eta 0:00:14 lr 0.000882 wd 0.0500 time 0.4006 (0.4155) data time 0.0008 (0.0016) model time 0.3998 (0.4152) loss 8.2233 (7.4650) grad_norm 2.2198 (inf) loss_scale 512.0000 (757.1709) mem 14939MB [2024-07-25 00:29:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][600/625] eta 0:00:10 lr 0.000882 wd 0.0500 time 0.4019 (0.4153) data time 0.0008 (0.0016) model time 0.4011 (0.4149) loss 7.0737 (7.4729) grad_norm 2.7941 (inf) loss_scale 512.0000 (753.0915) mem 14939MB [2024-07-25 00:29:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][610/625] eta 0:00:06 lr 0.000882 wd 0.0500 time 0.4081 (0.4151) data time 0.0006 (0.0016) model time 0.4075 (0.4147) loss 7.7926 (7.4749) grad_norm 4.9399 (inf) loss_scale 512.0000 (749.1457) mem 14939MB [2024-07-25 00:29:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][620/625] eta 0:00:02 lr 0.000882 wd 0.0500 time 0.3958 (0.4149) data time 0.0006 (0.0016) model time 0.3952 (0.4144) loss 8.1361 (7.4742) grad_norm 1.9309 (inf) loss_scale 512.0000 (745.3269) mem 14939MB [2024-07-25 00:29:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 116 training takes 0:04:19 [2024-07-25 00:29:23 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 00:29:25 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 00:29:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.460 (0.460) Loss 0.6211 (0.6211) Acc@1 87.061 (87.061) Acc@5 97.852 (97.852) Mem 14939MB [2024-07-25 00:29:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 1.0225 (0.7675) Acc@1 76.953 (83.776) Acc@5 94.922 (97.026) Mem 14939MB [2024-07-25 00:29:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 1.1084 (0.9002) Acc@1 74.658 (80.359) Acc@5 93.896 (95.487) Mem 14939MB [2024-07-25 00:29:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.082 Acc@5 95.491 [2024-07-25 00:29:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.1% [2024-07-25 00:29:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.836 (0.836) Loss 0.5913 (0.5913) Acc@1 88.623 (88.623) Acc@5 98.389 (98.389) Mem 14939MB [2024-07-25 00:29:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.158) Loss 0.9653 (0.7373) Acc@1 79.346 (84.939) Acc@5 94.873 (97.297) Mem 14939MB [2024-07-25 00:29:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.124) Loss 1.0967 (0.8730) Acc@1 73.877 (81.299) Acc@5 94.092 (95.824) Mem 14939MB [2024-07-25 00:29:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.000 Acc@5 95.787 [2024-07-25 00:29:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.0% [2024-07-25 00:29:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.00% [2024-07-25 00:29:30 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 00:29:31 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 00:29:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][0/625] eta 0:07:49 lr 0.000882 wd 0.0500 time 0.7516 (0.7516) data time 0.3626 (0.3626) model time 0.0000 (0.0000) loss 5.9386 (5.9386) grad_norm 1.9137 (1.9137) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:29:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][10/625] eta 0:04:30 lr 0.000881 wd 0.0500 time 0.3940 (0.4401) data time 0.0009 (0.0338) model time 0.0000 (0.0000) loss 7.3019 (7.4852) grad_norm 1.6314 (1.8674) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:29:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][20/625] eta 0:04:15 lr 0.000881 wd 0.0500 time 0.3959 (0.4226) data time 0.0007 (0.0182) model time 0.0000 (0.0000) loss 6.9170 (7.3380) grad_norm 1.7514 (2.3923) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:29:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][30/625] eta 0:04:07 lr 0.000881 wd 0.0500 time 0.4076 (0.4162) data time 0.0008 (0.0127) model time 0.0000 (0.0000) loss 8.0064 (7.3327) grad_norm 1.9341 (2.3072) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:29:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][40/625] eta 0:04:01 lr 0.000881 wd 0.0500 time 0.3947 (0.4126) data time 0.0007 (0.0099) model time 0.0000 (0.0000) loss 6.6118 (7.2958) grad_norm 3.2720 (2.2876) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:29:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][50/625] eta 0:03:55 lr 0.000881 wd 0.0500 time 0.3969 (0.4103) data time 0.0011 (0.0081) model time 
0.0000 (0.0000) loss 7.4171 (7.3554) grad_norm 2.0371 (2.5128) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:29:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][60/625] eta 0:03:51 lr 0.000881 wd 0.0500 time 0.4082 (0.4092) data time 0.0008 (0.0069) model time 0.4073 (0.4028) loss 9.0259 (7.4855) grad_norm 1.5743 (2.4833) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:30:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][70/625] eta 0:03:46 lr 0.000881 wd 0.0500 time 0.3999 (0.4080) data time 0.0006 (0.0060) model time 0.3993 (0.4015) loss 8.3847 (7.4830) grad_norm 3.1584 (2.4903) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:30:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][80/625] eta 0:03:41 lr 0.000881 wd 0.0500 time 0.3990 (0.4071) data time 0.0008 (0.0054) model time 0.3982 (0.4009) loss 6.6758 (7.4721) grad_norm 2.7148 (2.4798) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:30:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][90/625] eta 0:03:37 lr 0.000881 wd 0.0500 time 0.4048 (0.4066) data time 0.0007 (0.0049) model time 0.4042 (0.4011) loss 6.6868 (7.4723) grad_norm 2.1453 (2.4743) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:30:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][100/625] eta 0:03:33 lr 0.000881 wd 0.0500 time 0.3928 (0.4062) data time 0.0006 (0.0046) model time 0.3922 (0.4011) loss 6.6275 (7.4717) grad_norm 2.3433 (2.4475) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:30:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][110/625] eta 0:03:31 lr 0.000881 wd 0.0500 time 0.4090 (0.4116) data time 0.0006 (0.0043) model time 0.4084 (0.4117) loss 6.1297 (7.4718) grad_norm 3.3552 (2.5287) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:30:22 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [117/300][120/625] eta 0:03:32 lr 0.000880 wd 0.0500 time 0.5885 (0.4202) data time 0.0008 (0.0040) model time 0.5876 (0.4265) loss 6.3665 (7.4578) grad_norm 2.0198 (2.5449) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:30:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][130/625] eta 0:03:28 lr 0.000880 wd 0.0500 time 0.4024 (0.4212) data time 0.0008 (0.0038) model time 0.4016 (0.4271) loss 7.3803 (7.4701) grad_norm 1.5859 (2.5182) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:30:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][140/625] eta 0:03:23 lr 0.000880 wd 0.0500 time 0.3936 (0.4197) data time 0.0007 (0.0036) model time 0.3929 (0.4241) loss 6.4481 (7.4820) grad_norm 1.8079 (2.5187) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:30:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][150/625] eta 0:03:18 lr 0.000880 wd 0.0500 time 0.3980 (0.4185) data time 0.0006 (0.0034) model time 0.3974 (0.4216) loss 8.4819 (7.4949) grad_norm 2.8983 (2.5281) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:30:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][160/625] eta 0:03:14 lr 0.000880 wd 0.0500 time 0.4197 (0.4183) data time 0.0007 (0.0032) model time 0.4190 (0.4211) loss 8.3935 (7.5057) grad_norm 1.5198 (2.4957) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:30:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][170/625] eta 0:03:09 lr 0.000880 wd 0.0500 time 0.3936 (0.4173) data time 0.0008 (0.0031) model time 0.3928 (0.4193) loss 6.4012 (7.5024) grad_norm 2.6239 (2.4641) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:30:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][180/625] eta 0:03:05 lr 0.000880 wd 0.0500 time 0.3987 (0.4165) data time 0.0008 (0.0030) model time 0.3979 
(0.4179) loss 8.3052 (7.5209) grad_norm 2.6418 (2.4233) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:30:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][190/625] eta 0:03:00 lr 0.000880 wd 0.0500 time 0.4046 (0.4157) data time 0.0007 (0.0029) model time 0.4039 (0.4167) loss 8.5771 (7.5536) grad_norm 2.6843 (2.3967) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:30:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][200/625] eta 0:02:56 lr 0.000880 wd 0.0500 time 0.3968 (0.4149) data time 0.0007 (0.0028) model time 0.3961 (0.4156) loss 8.1498 (7.5430) grad_norm 2.1324 (2.3947) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:30:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][210/625] eta 0:02:51 lr 0.000880 wd 0.0500 time 0.3969 (0.4142) data time 0.0009 (0.0027) model time 0.3961 (0.4145) loss 8.2789 (7.5456) grad_norm 2.1283 (2.3876) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:31:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][220/625] eta 0:02:47 lr 0.000880 wd 0.0500 time 0.4133 (0.4136) data time 0.0006 (0.0026) model time 0.4127 (0.4137) loss 5.7653 (7.5094) grad_norm 2.0873 (2.3611) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:31:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][230/625] eta 0:02:43 lr 0.000879 wd 0.0500 time 0.3941 (0.4131) data time 0.0007 (0.0025) model time 0.3934 (0.4129) loss 6.4506 (7.4999) grad_norm 1.6430 (2.3561) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:31:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][240/625] eta 0:02:38 lr 0.000879 wd 0.0500 time 0.3948 (0.4126) data time 0.0007 (0.0025) model time 0.3941 (0.4122) loss 5.9145 (7.4994) grad_norm 3.6198 (2.3491) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:31:14 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [117/300][250/625] eta 0:02:34 lr 0.000879 wd 0.0500 time 0.4091 (0.4122) data time 0.0007 (0.0024) model time 0.4085 (0.4117) loss 7.2340 (7.5039) grad_norm 1.7974 (2.3555) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:31:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][260/625] eta 0:02:30 lr 0.000879 wd 0.0500 time 0.3971 (0.4118) data time 0.0006 (0.0024) model time 0.3964 (0.4112) loss 7.5001 (7.5200) grad_norm 1.5885 (2.3587) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:31:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][270/625] eta 0:02:26 lr 0.000879 wd 0.0500 time 0.4010 (0.4114) data time 0.0006 (0.0023) model time 0.4004 (0.4107) loss 8.5704 (7.5259) grad_norm 4.5407 (2.3625) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:31:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][280/625] eta 0:02:21 lr 0.000879 wd 0.0500 time 0.4074 (0.4112) data time 0.0006 (0.0023) model time 0.4068 (0.4104) loss 7.9810 (7.5211) grad_norm 1.5394 (2.3540) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:31:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][290/625] eta 0:02:17 lr 0.000879 wd 0.0500 time 0.3979 (0.4109) data time 0.0009 (0.0022) model time 0.3970 (0.4100) loss 5.8814 (7.5056) grad_norm 1.7880 (2.3386) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:31:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][300/625] eta 0:02:13 lr 0.000879 wd 0.0500 time 0.3976 (0.4105) data time 0.0008 (0.0022) model time 0.3968 (0.4096) loss 8.1312 (7.5245) grad_norm 2.5597 (2.3451) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:31:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][310/625] eta 0:02:09 lr 0.000879 wd 0.0500 time 0.4111 (0.4103) data time 0.0008 (0.0021) model time 0.4103 
(0.4094) loss 8.0688 (7.5228) grad_norm 3.1578 (2.3616) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:31:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][320/625] eta 0:02:05 lr 0.000879 wd 0.0500 time 0.3973 (0.4100) data time 0.0009 (0.0021) model time 0.3964 (0.4090) loss 7.7899 (7.5232) grad_norm 1.9519 (2.3595) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:31:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][330/625] eta 0:02:01 lr 0.000878 wd 0.0500 time 0.3994 (0.4114) data time 0.0008 (0.0021) model time 0.3987 (0.4106) loss 8.1566 (7.5265) grad_norm 1.8437 (2.3432) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:31:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][340/625] eta 0:01:58 lr 0.000878 wd 0.0500 time 0.5795 (0.4147) data time 0.0008 (0.0020) model time 0.5786 (0.4145) loss 6.1288 (7.5092) grad_norm 2.4517 (2.3386) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:31:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][350/625] eta 0:01:54 lr 0.000878 wd 0.0500 time 0.3968 (0.4153) data time 0.0006 (0.0020) model time 0.3962 (0.4153) loss 6.7617 (7.5034) grad_norm 1.9267 (2.3578) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:32:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][360/625] eta 0:01:49 lr 0.000878 wd 0.0500 time 0.4074 (0.4149) data time 0.0008 (0.0020) model time 0.4066 (0.4147) loss 8.0443 (7.5022) grad_norm 2.5746 (2.3650) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:32:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][370/625] eta 0:01:45 lr 0.000878 wd 0.0500 time 0.3962 (0.4145) data time 0.0008 (0.0019) model time 0.3953 (0.4143) loss 7.3958 (7.4929) grad_norm 1.6983 (2.3567) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:32:09 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [117/300][380/625] eta 0:01:41 lr 0.000878 wd 0.0500 time 0.4003 (0.4147) data time 0.0008 (0.0019) model time 0.3995 (0.4144) loss 7.0191 (7.4894) grad_norm 1.4972 (2.3445) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:32:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][390/625] eta 0:01:37 lr 0.000878 wd 0.0500 time 0.4099 (0.4144) data time 0.0008 (0.0019) model time 0.4091 (0.4141) loss 5.6075 (7.4811) grad_norm 1.7689 (2.3328) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:32:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][400/625] eta 0:01:33 lr 0.000878 wd 0.0500 time 0.4022 (0.4141) data time 0.0008 (0.0019) model time 0.4014 (0.4137) loss 7.3648 (7.4808) grad_norm 1.9660 (2.3254) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:32:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][410/625] eta 0:01:28 lr 0.000878 wd 0.0500 time 0.4003 (0.4138) data time 0.0008 (0.0018) model time 0.3994 (0.4134) loss 7.8055 (7.4758) grad_norm 1.7150 (2.3286) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:32:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][420/625] eta 0:01:24 lr 0.000878 wd 0.0500 time 0.4077 (0.4136) data time 0.0006 (0.0018) model time 0.4071 (0.4130) loss 6.8920 (7.4698) grad_norm 2.1560 (2.3190) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:32:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][430/625] eta 0:01:20 lr 0.000878 wd 0.0500 time 0.3983 (0.4133) data time 0.0008 (0.0018) model time 0.3975 (0.4127) loss 6.8241 (7.4777) grad_norm 2.1848 (2.3161) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:32:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][440/625] eta 0:01:16 lr 0.000877 wd 0.0500 time 0.4020 (0.4131) data time 0.0008 (0.0018) model time 0.4012 
(0.4125) loss 7.5302 (7.4791) grad_norm 3.6356 (2.3308) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:32:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][450/625] eta 0:01:12 lr 0.000877 wd 0.0500 time 0.4167 (0.4129) data time 0.0006 (0.0018) model time 0.4161 (0.4122) loss 8.1487 (7.4832) grad_norm 3.3901 (2.3493) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:32:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][460/625] eta 0:01:08 lr 0.000877 wd 0.0500 time 0.3969 (0.4126) data time 0.0006 (0.0017) model time 0.3963 (0.4120) loss 5.8109 (7.4750) grad_norm 1.4874 (2.3394) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:32:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][470/625] eta 0:01:03 lr 0.000877 wd 0.0500 time 0.4023 (0.4125) data time 0.0008 (0.0017) model time 0.4015 (0.4118) loss 7.1575 (7.4648) grad_norm 2.1924 (2.3346) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:32:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][480/625] eta 0:00:59 lr 0.000877 wd 0.0500 time 0.4101 (0.4123) data time 0.0006 (0.0017) model time 0.4095 (0.4115) loss 7.0844 (7.4586) grad_norm 2.0400 (2.3258) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:32:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][490/625] eta 0:00:55 lr 0.000877 wd 0.0500 time 0.3950 (0.4121) data time 0.0006 (0.0017) model time 0.3944 (0.4113) loss 8.0993 (7.4673) grad_norm 1.5147 (2.3144) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:32:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][500/625] eta 0:00:51 lr 0.000877 wd 0.0500 time 0.4001 (0.4119) data time 0.0006 (0.0017) model time 0.3995 (0.4111) loss 7.6437 (7.4671) grad_norm 2.6786 (2.3119) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:33:01 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [117/300][510/625] eta 0:00:47 lr 0.000877 wd 0.0500 time 0.4111 (0.4117) data time 0.0008 (0.0017) model time 0.4103 (0.4109) loss 5.8404 (7.4711) grad_norm 1.8018 (2.3094) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:33:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][520/625] eta 0:00:43 lr 0.000877 wd 0.0500 time 0.3986 (0.4115) data time 0.0006 (0.0016) model time 0.3980 (0.4107) loss 7.5451 (7.4701) grad_norm 1.6705 (2.3083) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:33:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][530/625] eta 0:00:39 lr 0.000877 wd 0.0500 time 0.3974 (0.4115) data time 0.0008 (0.0016) model time 0.3966 (0.4106) loss 7.9102 (7.4725) grad_norm 1.7293 (2.2980) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:33:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][540/625] eta 0:00:34 lr 0.000876 wd 0.0500 time 0.4077 (0.4113) data time 0.0006 (0.0016) model time 0.4070 (0.4104) loss 8.5688 (7.4783) grad_norm 1.5848 (2.2876) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:33:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][550/625] eta 0:00:30 lr 0.000876 wd 0.0500 time 0.6084 (0.4123) data time 0.0009 (0.0016) model time 0.6075 (0.4116) loss 6.2482 (7.4767) grad_norm 1.7272 (2.2796) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:33:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][560/625] eta 0:00:26 lr 0.000876 wd 0.0500 time 0.5972 (0.4145) data time 0.0008 (0.0016) model time 0.5964 (0.4139) loss 8.6871 (7.4852) grad_norm 1.6683 (2.2720) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:33:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][570/625] eta 0:00:22 lr 0.000876 wd 0.0500 time 0.4060 (0.4151) data time 0.0008 (0.0016) model time 0.4052 
(0.4146) loss 8.2333 (7.4890) grad_norm 2.2776 (2.2740) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:33:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][580/625] eta 0:00:18 lr 0.000876 wd 0.0500 time 0.3979 (0.4149) data time 0.0006 (0.0016) model time 0.3973 (0.4143) loss 7.8888 (7.4928) grad_norm 2.5457 (2.2696) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:33:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][590/625] eta 0:00:14 lr 0.000876 wd 0.0500 time 0.3963 (0.4146) data time 0.0009 (0.0016) model time 0.3954 (0.4140) loss 6.1592 (7.4839) grad_norm 2.8991 (2.2762) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:33:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][600/625] eta 0:00:10 lr 0.000876 wd 0.0500 time 0.4058 (0.4147) data time 0.0007 (0.0016) model time 0.4051 (0.4141) loss 7.2053 (7.4877) grad_norm 2.2764 (2.2797) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:33:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][610/625] eta 0:00:06 lr 0.000876 wd 0.0500 time 0.3923 (0.4144) data time 0.0004 (0.0016) model time 0.3920 (0.4138) loss 7.0169 (7.4876) grad_norm 2.0548 (2.2845) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:33:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][620/625] eta 0:00:02 lr 0.000876 wd 0.0500 time 0.3967 (0.4142) data time 0.0004 (0.0016) model time 0.3963 (0.4136) loss 8.0428 (7.4939) grad_norm 1.8116 (2.2779) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:33:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 117 training takes 0:04:18 [2024-07-25 00:33:50 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 00:33:51 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 00:33:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.457 (0.457) Loss 0.6387 (0.6387) Acc@1 87.207 (87.207) Acc@5 98.291 (98.291) Mem 14939MB [2024-07-25 00:33:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 1.0273 (0.7820) Acc@1 77.539 (83.984) Acc@5 94.678 (97.048) Mem 14939MB [2024-07-25 00:33:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.1289 (0.9251) Acc@1 73.975 (80.313) Acc@5 93.799 (95.464) Mem 14939MB [2024-07-25 00:33:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.946 Acc@5 95.435 [2024-07-25 00:33:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.9% [2024-07-25 00:33:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.832 (0.832) Loss 0.5908 (0.5908) Acc@1 88.623 (88.623) Acc@5 98.438 (98.438) Mem 14939MB [2024-07-25 00:33:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.156) Loss 0.9629 (0.7365) Acc@1 79.248 (84.921) Acc@5 94.922 (97.314) Mem 14939MB [2024-07-25 00:33:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 1.0967 (0.8719) Acc@1 73.828 (81.313) Acc@5 94.092 (95.861) Mem 14939MB [2024-07-25 00:33:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.008 Acc@5 95.825 [2024-07-25 00:33:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.0% [2024-07-25 00:33:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.01% [2024-07-25 00:33:56 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 00:33:57 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 00:33:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][0/625] eta 0:09:33 lr 0.000876 wd 0.0500 time 0.9182 (0.9182) data time 0.5213 (0.5213) model time 0.0000 (0.0000) loss 9.0490 (9.0490) grad_norm 2.8934 (2.8934) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:34:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][10/625] eta 0:04:35 lr 0.000876 wd 0.0500 time 0.4009 (0.4477) data time 0.0008 (0.0482) model time 0.0000 (0.0000) loss 8.5666 (7.4961) grad_norm 2.2805 (2.5338) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:34:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][20/625] eta 0:04:17 lr 0.000875 wd 0.0500 time 0.3963 (0.4256) data time 0.0007 (0.0257) model time 0.0000 (0.0000) loss 6.7800 (7.4716) grad_norm 1.8041 (2.4466) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:34:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][30/625] eta 0:04:08 lr 0.000875 wd 0.0500 time 0.4177 (0.4185) data time 0.0008 (0.0177) model time 0.0000 (0.0000) loss 8.2860 (7.6481) grad_norm 4.2673 (2.4721) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:34:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][40/625] eta 0:04:02 lr 0.000875 wd 0.0500 time 0.3970 (0.4140) data time 0.0006 (0.0136) model time 0.0000 (0.0000) loss 7.8460 (7.6430) grad_norm 10.3704 (2.8367) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:34:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][50/625] eta 0:03:56 lr 0.000875 wd 0.0500 time 0.3934 (0.4115) data time 0.0008 (0.0111) model time 
0.0000 (0.0000) loss 6.6824 (7.5701) grad_norm 2.0569 (2.7372) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:34:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][60/625] eta 0:03:51 lr 0.000875 wd 0.0500 time 0.4099 (0.4098) data time 0.0006 (0.0094) model time 0.4093 (0.4002) loss 7.4241 (7.5455) grad_norm 2.2471 (2.5935) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:34:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][70/625] eta 0:03:46 lr 0.000875 wd 0.0500 time 0.4069 (0.4084) data time 0.0009 (0.0082) model time 0.4059 (0.3997) loss 7.5733 (7.5251) grad_norm 2.0702 (2.5366) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:34:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][80/625] eta 0:03:42 lr 0.000875 wd 0.0500 time 0.3989 (0.4074) data time 0.0009 (0.0073) model time 0.3980 (0.3995) loss 6.9380 (7.5185) grad_norm 2.0344 (2.5093) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:34:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][90/625] eta 0:03:37 lr 0.000875 wd 0.0500 time 0.4089 (0.4068) data time 0.0006 (0.0066) model time 0.4083 (0.3998) loss 6.7826 (7.5217) grad_norm 1.7775 (2.4584) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:34:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][100/625] eta 0:03:33 lr 0.000875 wd 0.0500 time 0.3959 (0.4061) data time 0.0006 (0.0060) model time 0.3953 (0.3996) loss 6.8563 (7.5392) grad_norm 1.8663 (2.4457) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:34:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][110/625] eta 0:03:28 lr 0.000875 wd 0.0500 time 0.3998 (0.4055) data time 0.0007 (0.0056) model time 0.3991 (0.3994) loss 6.0974 (7.4852) grad_norm 3.4515 (2.4566) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:34:46 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [118/300][120/625] eta 0:03:24 lr 0.000875 wd 0.0500 time 0.4138 (0.4055) data time 0.0008 (0.0052) model time 0.4130 (0.4002) loss 8.7862 (7.4819) grad_norm 2.4273 (2.4302) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:34:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][130/625] eta 0:03:21 lr 0.000874 wd 0.0500 time 0.3928 (0.4065) data time 0.0007 (0.0049) model time 0.3921 (0.4025) loss 9.3264 (7.4549) grad_norm 2.7200 (2.4355) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:34:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][140/625] eta 0:03:17 lr 0.000874 wd 0.0500 time 0.5824 (0.4075) data time 0.0007 (0.0046) model time 0.5818 (0.4043) loss 8.6056 (7.4479) grad_norm 1.5871 (2.4052) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:34:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][150/625] eta 0:03:15 lr 0.000874 wd 0.0500 time 0.6108 (0.4118) data time 0.0008 (0.0043) model time 0.6100 (0.4111) loss 7.5520 (7.4540) grad_norm 2.3936 (2.3885) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:35:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][160/625] eta 0:03:13 lr 0.000874 wd 0.0500 time 0.4007 (0.4170) data time 0.0006 (0.0041) model time 0.4001 (0.4187) loss 7.3459 (7.4697) grad_norm 2.0269 (2.4064) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:35:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][170/625] eta 0:03:09 lr 0.000874 wd 0.0500 time 0.4208 (0.4174) data time 0.0008 (0.0039) model time 0.4200 (0.4191) loss 7.3929 (7.4635) grad_norm 1.9182 (2.4103) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:35:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][180/625] eta 0:03:05 lr 0.000874 wd 0.0500 time 0.3941 (0.4165) data time 0.0006 (0.0038) model time 0.3935 
(0.4176) loss 7.1143 (7.4598) grad_norm 1.7580 (2.4338) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:35:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][190/625] eta 0:03:00 lr 0.000874 wd 0.0500 time 0.3995 (0.4158) data time 0.0006 (0.0036) model time 0.3989 (0.4165) loss 6.3662 (7.4718) grad_norm 2.0006 (2.4652) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:35:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][200/625] eta 0:02:56 lr 0.000874 wd 0.0500 time 0.4329 (0.4155) data time 0.0007 (0.0035) model time 0.4322 (0.4159) loss 6.6897 (7.4890) grad_norm 2.2716 (2.4582) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:35:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][210/625] eta 0:02:52 lr 0.000874 wd 0.0500 time 0.3946 (0.4149) data time 0.0008 (0.0034) model time 0.3938 (0.4151) loss 8.1631 (7.4818) grad_norm 1.9027 (2.4499) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:35:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][220/625] eta 0:02:47 lr 0.000874 wd 0.0500 time 0.3988 (0.4144) data time 0.0009 (0.0032) model time 0.3979 (0.4144) loss 8.9293 (7.4981) grad_norm 1.7181 (2.4378) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:35:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][230/625] eta 0:02:43 lr 0.000873 wd 0.0500 time 0.4220 (0.4141) data time 0.0008 (0.0031) model time 0.4212 (0.4139) loss 7.9652 (7.4976) grad_norm 2.8391 (2.4488) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:35:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][240/625] eta 0:02:39 lr 0.000873 wd 0.0500 time 0.3977 (0.4136) data time 0.0008 (0.0031) model time 0.3969 (0.4133) loss 8.0197 (7.4854) grad_norm 3.6678 (2.4418) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:35:41 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [118/300][250/625] eta 0:02:34 lr 0.000873 wd 0.0500 time 0.3997 (0.4132) data time 0.0006 (0.0030) model time 0.3991 (0.4127) loss 5.4025 (7.4698) grad_norm 1.5977 (2.4364) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:35:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][260/625] eta 0:02:30 lr 0.000873 wd 0.0500 time 0.4159 (0.4132) data time 0.0008 (0.0029) model time 0.4151 (0.4127) loss 7.0908 (7.4816) grad_norm 2.0844 (2.4333) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:35:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][270/625] eta 0:02:26 lr 0.000873 wd 0.0500 time 0.3945 (0.4127) data time 0.0008 (0.0028) model time 0.3937 (0.4121) loss 7.0904 (7.4676) grad_norm 2.0749 (2.4525) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:35:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][280/625] eta 0:02:22 lr 0.000873 wd 0.0500 time 0.3987 (0.4124) data time 0.0007 (0.0028) model time 0.3980 (0.4117) loss 8.6522 (7.4708) grad_norm 1.9624 (2.4402) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:35:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][290/625] eta 0:02:18 lr 0.000873 wd 0.0500 time 0.4168 (0.4122) data time 0.0006 (0.0027) model time 0.4161 (0.4114) loss 9.1129 (7.4694) grad_norm 2.7016 (2.4369) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:36:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][300/625] eta 0:02:13 lr 0.000873 wd 0.0500 time 0.3949 (0.4118) data time 0.0007 (0.0026) model time 0.3943 (0.4110) loss 8.0942 (7.4710) grad_norm 1.7234 (2.4244) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:36:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][310/625] eta 0:02:09 lr 0.000873 wd 0.0500 time 0.3954 (0.4116) data time 0.0006 (0.0026) model time 0.3947 
(0.4106) loss 6.8727 (7.4611) grad_norm 2.2626 (2.4284) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:36:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][320/625] eta 0:02:05 lr 0.000873 wd 0.0500 time 0.4102 (0.4112) data time 0.0009 (0.0025) model time 0.4094 (0.4103) loss 7.6799 (7.4750) grad_norm 2.1217 (2.4247) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:36:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][330/625] eta 0:02:01 lr 0.000873 wd 0.0500 time 0.3965 (0.4109) data time 0.0008 (0.0025) model time 0.3957 (0.4099) loss 8.5989 (7.4704) grad_norm 2.0462 (2.4115) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:36:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][340/625] eta 0:01:57 lr 0.000872 wd 0.0500 time 0.3961 (0.4107) data time 0.0006 (0.0024) model time 0.3955 (0.4096) loss 8.0064 (7.4784) grad_norm 2.5753 (2.4016) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:36:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][350/625] eta 0:01:53 lr 0.000872 wd 0.0500 time 0.6029 (0.4110) data time 0.0006 (0.0024) model time 0.6024 (0.4101) loss 8.3119 (7.4852) grad_norm 1.9106 (2.3980) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:36:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][360/625] eta 0:01:48 lr 0.000872 wd 0.0500 time 0.3951 (0.4108) data time 0.0007 (0.0023) model time 0.3945 (0.4098) loss 8.8006 (7.4921) grad_norm 1.4416 (2.3833) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:36:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][370/625] eta 0:01:45 lr 0.000872 wd 0.0500 time 0.6038 (0.4135) data time 0.0008 (0.0023) model time 0.6030 (0.4130) loss 8.0589 (7.4859) grad_norm 2.0521 (2.3867) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:36:36 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [118/300][380/625] eta 0:01:42 lr 0.000872 wd 0.0500 time 0.5258 (0.4166) data time 0.0008 (0.0023) model time 0.5250 (0.4165) loss 6.9062 (7.4915) grad_norm 2.0243 (2.3783) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:36:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][390/625] eta 0:01:37 lr 0.000872 wd 0.0500 time 0.4096 (0.4166) data time 0.0007 (0.0022) model time 0.4089 (0.4165) loss 9.0353 (7.4976) grad_norm 1.8748 (2.3757) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:36:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][400/625] eta 0:01:33 lr 0.000872 wd 0.0500 time 0.3951 (0.4163) data time 0.0006 (0.0022) model time 0.3944 (0.4161) loss 6.4627 (7.5006) grad_norm 1.7032 (2.3838) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:36:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][410/625] eta 0:01:29 lr 0.000872 wd 0.0500 time 0.4026 (0.4160) data time 0.0008 (0.0022) model time 0.4018 (0.4157) loss 8.5602 (7.4998) grad_norm 3.8732 (2.3899) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:36:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][420/625] eta 0:01:25 lr 0.000872 wd 0.0500 time 0.4113 (0.4157) data time 0.0006 (0.0021) model time 0.4107 (0.4154) loss 7.4924 (7.4956) grad_norm 3.5437 (2.3889) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:36:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][430/625] eta 0:01:21 lr 0.000872 wd 0.0500 time 0.3936 (0.4154) data time 0.0006 (0.0021) model time 0.3929 (0.4150) loss 6.5041 (7.4892) grad_norm 2.6825 (2.3873) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:37:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][440/625] eta 0:01:16 lr 0.000871 wd 0.0500 time 0.4062 (0.4151) data time 0.0008 (0.0021) model time 0.4054 
(0.4147) loss 8.2019 (7.4980) grad_norm 1.6121 (2.3773) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:37:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][450/625] eta 0:01:12 lr 0.000871 wd 0.0500 time 0.4071 (0.4149) data time 0.0008 (0.0021) model time 0.4063 (0.4144) loss 5.9026 (7.4865) grad_norm 3.1712 (2.3848) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:37:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][460/625] eta 0:01:08 lr 0.000871 wd 0.0500 time 0.3958 (0.4146) data time 0.0006 (0.0020) model time 0.3952 (0.4141) loss 7.3642 (7.4862) grad_norm 2.0485 (2.3898) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:37:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][470/625] eta 0:01:04 lr 0.000871 wd 0.0500 time 0.4068 (0.4144) data time 0.0006 (0.0020) model time 0.4061 (0.4139) loss 7.7933 (7.4791) grad_norm 1.4774 (2.3786) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:37:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][480/625] eta 0:01:00 lr 0.000871 wd 0.0500 time 0.4213 (0.4142) data time 0.0006 (0.0020) model time 0.4207 (0.4136) loss 6.7590 (7.4794) grad_norm 2.5302 (2.3728) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:37:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][490/625] eta 0:00:55 lr 0.000871 wd 0.0500 time 0.3965 (0.4140) data time 0.0006 (0.0020) model time 0.3959 (0.4134) loss 7.4927 (7.4831) grad_norm 2.0739 (2.3776) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:37:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][500/625] eta 0:00:51 lr 0.000871 wd 0.0500 time 0.4044 (0.4138) data time 0.0009 (0.0019) model time 0.4035 (0.4131) loss 7.3705 (7.4850) grad_norm 1.8387 (2.3760) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:37:28 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [118/300][510/625] eta 0:00:47 lr 0.000871 wd 0.0500 time 0.4142 (0.4136) data time 0.0008 (0.0019) model time 0.4134 (0.4129) loss 7.2588 (7.4907) grad_norm 1.3925 (2.3683) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:37:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][520/625] eta 0:00:43 lr 0.000871 wd 0.0500 time 0.3930 (0.4134) data time 0.0007 (0.0019) model time 0.3924 (0.4127) loss 8.3102 (7.4909) grad_norm 2.8961 (2.3687) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:37:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][530/625] eta 0:00:39 lr 0.000871 wd 0.0500 time 0.4029 (0.4132) data time 0.0007 (0.0019) model time 0.4023 (0.4125) loss 7.0427 (7.4968) grad_norm 2.1016 (2.3595) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:37:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][540/625] eta 0:00:35 lr 0.000871 wd 0.0500 time 0.4145 (0.4131) data time 0.0010 (0.0019) model time 0.4135 (0.4123) loss 7.9784 (7.4984) grad_norm 1.6607 (2.3541) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:37:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][550/625] eta 0:00:30 lr 0.000870 wd 0.0500 time 0.3949 (0.4129) data time 0.0007 (0.0019) model time 0.3942 (0.4121) loss 8.1044 (7.5040) grad_norm 2.0503 (2.3469) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:37:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][560/625] eta 0:00:26 lr 0.000870 wd 0.0500 time 0.4056 (0.4128) data time 0.0007 (0.0018) model time 0.4050 (0.4120) loss 6.3130 (7.5032) grad_norm 1.6123 (2.3449) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:37:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][570/625] eta 0:00:22 lr 0.000870 wd 0.0500 time 0.4094 (0.4129) data time 0.0006 (0.0018) model time 0.4088 
(0.4121) loss 6.6629 (7.4968) grad_norm 2.5250 (2.3402) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:37:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][580/625] eta 0:00:18 lr 0.000870 wd 0.0500 time 0.3941 (0.4127) data time 0.0006 (0.0018) model time 0.3934 (0.4119) loss 6.6877 (7.4879) grad_norm 4.0883 (2.3539) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:38:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][590/625] eta 0:00:14 lr 0.000870 wd 0.0500 time 0.5707 (0.4144) data time 0.0009 (0.0018) model time 0.5698 (0.4137) loss 7.0953 (7.4937) grad_norm 1.9693 (2.3497) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:38:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][600/625] eta 0:00:10 lr 0.000870 wd 0.0500 time 0.6110 (0.4158) data time 0.0007 (0.0018) model time 0.6104 (0.4153) loss 7.2923 (7.4895) grad_norm 3.1752 (2.3448) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:38:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][610/625] eta 0:00:06 lr 0.000870 wd 0.0500 time 0.4053 (0.4158) data time 0.0006 (0.0018) model time 0.4048 (0.4153) loss 7.0418 (7.4898) grad_norm 2.6252 (2.3434) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:38:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][620/625] eta 0:00:02 lr 0.000870 wd 0.0500 time 0.3929 (0.4155) data time 0.0006 (0.0018) model time 0.3923 (0.4150) loss 8.2682 (7.4898) grad_norm 2.8745 (2.3555) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:38:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 118 training takes 0:04:19 [2024-07-25 00:38:17 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 00:38:18 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 00:38:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.473 (0.473) Loss 0.6064 (0.6064) Acc@1 87.988 (87.988) Acc@5 98.486 (98.486) Mem 14939MB [2024-07-25 00:38:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.122) Loss 0.9971 (0.7690) Acc@1 78.125 (84.166) Acc@5 94.580 (97.053) Mem 14939MB [2024-07-25 00:38:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.105) Loss 1.1191 (0.9051) Acc@1 74.414 (80.729) Acc@5 93.896 (95.547) Mem 14939MB [2024-07-25 00:38:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.384 Acc@5 95.447 [2024-07-25 00:38:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.4% [2024-07-25 00:38:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.38% [2024-07-25 00:38:20 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 00:38:21 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 00:38:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.450 (0.450) Loss 0.5903 (0.5903) Acc@1 88.672 (88.672) Acc@5 98.438 (98.438) Mem 14939MB [2024-07-25 00:38:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.9614 (0.7355) Acc@1 79.346 (84.934) Acc@5 95.020 (97.332) Mem 14939MB [2024-07-25 00:38:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.0947 (0.8708) Acc@1 73.828 (81.336) Acc@5 94.043 (95.861) Mem 14939MB [2024-07-25 00:38:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.038 Acc@5 95.829 [2024-07-25 00:38:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.0% [2024-07-25 00:38:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.04% [2024-07-25 00:38:24 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 00:38:25 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 00:38:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][0/625] eta 0:08:46 lr 0.000870 wd 0.0500 time 0.8429 (0.8429) data time 0.4443 (0.4443) model time 0.0000 (0.0000) loss 7.2749 (7.2749) grad_norm 1.9980 (1.9980) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:38:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][10/625] eta 0:04:30 lr 0.000870 wd 0.0500 time 0.3945 (0.4399) data time 0.0006 (0.0411) model time 0.0000 (0.0000) loss 6.0369 (7.3358) grad_norm 1.7746 (2.3401) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:38:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][20/625] eta 0:04:14 lr 0.000870 wd 0.0500 time 0.3979 (0.4207) data time 0.0006 (0.0220) model time 0.0000 (0.0000) loss 7.2015 (7.1280) grad_norm 2.3438 (2.2917) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:38:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][30/625] eta 0:04:06 lr 0.000869 wd 0.0500 time 0.4154 (0.4148) data time 0.0006 (0.0152) model time 0.0000 (0.0000) loss 6.5122 (7.1989) grad_norm 2.3021 (2.2080) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:38:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][40/625] eta 0:04:00 lr 0.000869 wd 0.0500 time 0.3945 (0.4110) data time 0.0006 (0.0118) model time 0.0000 (0.0000) loss 6.4199 (7.2233) grad_norm 2.6136 (2.1878) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:38:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][50/625] eta 0:03:55 lr 0.000869 wd 0.0500 time 0.3974 (0.4090) data time 0.0007 (0.0096) model time 0.0000 (0.0000) loss 7.9785 (7.3037) grad_norm 1.9716 (2.1612) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:38:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][60/625] eta 0:03:51 lr 0.000869 wd 0.0500 time 0.3855 (0.4099) 
data time 0.0008 (0.0082) model time 0.3847 (0.4139) loss 6.1539 (7.3027) grad_norm 1.4196 (2.1990) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:38:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][70/625] eta 0:03:46 lr 0.000869 wd 0.0500 time 0.3984 (0.4085) data time 0.0008 (0.0072) model time 0.3977 (0.4061) loss 7.2195 (7.3338) grad_norm 2.6904 (2.2111) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:38:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][80/625] eta 0:03:42 lr 0.000869 wd 0.0500 time 0.3957 (0.4075) data time 0.0006 (0.0065) model time 0.3951 (0.4037) loss 9.1848 (7.4019) grad_norm 3.0862 (2.2726) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:39:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][90/625] eta 0:03:38 lr 0.000869 wd 0.0500 time 0.4081 (0.4075) data time 0.0010 (0.0059) model time 0.4071 (0.4044) loss 8.4232 (7.3540) grad_norm 2.3959 (2.2769) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:39:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][100/625] eta 0:03:33 lr 0.000869 wd 0.0500 time 0.3934 (0.4067) data time 0.0006 (0.0054) model time 0.3928 (0.4032) loss 7.7686 (7.3488) grad_norm 2.5370 (2.2624) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:39:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][110/625] eta 0:03:29 lr 0.000869 wd 0.0500 time 0.4056 (0.4062) data time 0.0008 (0.0051) model time 0.4049 (0.4026) loss 7.3136 (7.3445) grad_norm 1.8228 (2.2811) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:39:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][120/625] eta 0:03:24 lr 0.000869 wd 0.0500 time 0.4081 (0.4057) data time 0.0008 (0.0047) model time 0.4073 (0.4022) loss 8.2562 (7.3877) grad_norm 1.4031 (2.3288) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
00:39:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][130/625] eta 0:03:20 lr 0.000868 wd 0.0500 time 0.3948 (0.4053) data time 0.0007 (0.0045) model time 0.3941 (0.4019) loss 6.1391 (7.3984) grad_norm 1.7424 (2.3001) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:39:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][140/625] eta 0:03:16 lr 0.000868 wd 0.0500 time 0.3967 (0.4050) data time 0.0009 (0.0042) model time 0.3958 (0.4016) loss 7.1168 (7.3612) grad_norm 2.4236 (2.2754) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:39:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][150/625] eta 0:03:12 lr 0.000868 wd 0.0500 time 0.4066 (0.4048) data time 0.0008 (0.0040) model time 0.4058 (0.4015) loss 7.6972 (7.3680) grad_norm 1.9150 (2.3000) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:39:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][160/625] eta 0:03:08 lr 0.000868 wd 0.0500 time 0.3961 (0.4045) data time 0.0008 (0.0038) model time 0.3953 (0.4013) loss 7.1290 (7.3610) grad_norm 1.8777 (2.2913) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:39:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][170/625] eta 0:03:03 lr 0.000868 wd 0.0500 time 0.4008 (0.4044) data time 0.0008 (0.0037) model time 0.4001 (0.4013) loss 6.2361 (7.3838) grad_norm 1.4792 (2.2717) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:39:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][180/625] eta 0:03:00 lr 0.000868 wd 0.0500 time 0.5939 (0.4058) data time 0.0008 (0.0035) model time 0.5932 (0.4035) loss 8.0785 (7.3718) grad_norm 2.4833 (2.2806) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:39:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][190/625] eta 0:03:00 lr 0.000868 wd 0.0500 time 0.5538 (0.4146) data 
time 0.0005 (0.0034) model time 0.5533 (0.4155) loss 6.1169 (7.3555) grad_norm 2.7494 (2.2945) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:39:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][200/625] eta 0:02:57 lr 0.000868 wd 0.0500 time 0.3968 (0.4184) data time 0.0008 (0.0032) model time 0.3960 (0.4206) loss 8.2101 (7.3621) grad_norm 1.9563 (2.3017) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:39:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][210/625] eta 0:02:53 lr 0.000868 wd 0.0500 time 0.4028 (0.4177) data time 0.0008 (0.0031) model time 0.4020 (0.4194) loss 7.4646 (7.3700) grad_norm 1.6113 (2.2854) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:39:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][220/625] eta 0:02:48 lr 0.000868 wd 0.0500 time 0.4063 (0.4170) data time 0.0008 (0.0030) model time 0.4056 (0.4183) loss 6.7885 (7.3617) grad_norm 2.4623 (2.2773) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:40:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][230/625] eta 0:02:44 lr 0.000868 wd 0.0500 time 0.3965 (0.4163) data time 0.0006 (0.0029) model time 0.3960 (0.4173) loss 6.6636 (7.3707) grad_norm 2.9248 (2.2811) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:40:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][240/625] eta 0:02:40 lr 0.000867 wd 0.0500 time 0.3998 (0.4158) data time 0.0006 (0.0028) model time 0.3991 (0.4166) loss 6.6505 (7.3593) grad_norm 1.8750 (2.2838) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:40:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][250/625] eta 0:02:35 lr 0.000867 wd 0.0500 time 0.4048 (0.4152) data time 0.0008 (0.0028) model time 0.4040 (0.4157) loss 8.6780 (7.3755) grad_norm 3.3402 (2.2880) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
00:40:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][260/625] eta 0:02:31 lr 0.000867 wd 0.0500 time 0.3959 (0.4146) data time 0.0009 (0.0027) model time 0.3950 (0.4150) loss 8.0418 (7.3922) grad_norm 2.2040 (2.2986) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:40:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][270/625] eta 0:02:26 lr 0.000867 wd 0.0500 time 0.4014 (0.4141) data time 0.0006 (0.0026) model time 0.4008 (0.4142) loss 7.8445 (7.3923) grad_norm 2.4165 (2.2999) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:40:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][280/625] eta 0:02:22 lr 0.000867 wd 0.0500 time 0.6043 (0.4143) data time 0.0008 (0.0026) model time 0.6035 (0.4145) loss 6.9676 (7.3832) grad_norm 1.8940 (2.3057) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:40:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][290/625] eta 0:02:18 lr 0.000867 wd 0.0500 time 0.3967 (0.4139) data time 0.0006 (0.0025) model time 0.3961 (0.4139) loss 6.6331 (7.3801) grad_norm 2.1094 (2.3179) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:40:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][300/625] eta 0:02:14 lr 0.000867 wd 0.0500 time 0.3998 (0.4135) data time 0.0008 (0.0025) model time 0.3990 (0.4134) loss 8.4614 (7.3877) grad_norm 3.2983 (2.3593) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:40:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][310/625] eta 0:02:10 lr 0.000867 wd 0.0500 time 0.4011 (0.4131) data time 0.0008 (0.0024) model time 0.4003 (0.4129) loss 8.2365 (7.3947) grad_norm 3.9355 (2.3812) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:40:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][320/625] eta 0:02:05 lr 0.000867 wd 0.0500 time 0.3996 (0.4128) data 
time 0.0006 (0.0024) model time 0.3990 (0.4125) loss 7.1144 (7.3956) grad_norm 2.0077 (2.3794) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:40:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][330/625] eta 0:02:01 lr 0.000867 wd 0.0500 time 0.3995 (0.4124) data time 0.0007 (0.0023) model time 0.3988 (0.4120) loss 6.9477 (7.3957) grad_norm 2.0407 (2.3725) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:40:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][340/625] eta 0:01:57 lr 0.000866 wd 0.0500 time 0.4013 (0.4121) data time 0.0008 (0.0023) model time 0.4005 (0.4117) loss 6.8527 (7.4001) grad_norm 4.7328 (2.3710) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:40:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][350/625] eta 0:01:53 lr 0.000866 wd 0.0500 time 0.4044 (0.4118) data time 0.0006 (0.0022) model time 0.4038 (0.4113) loss 6.4941 (7.3974) grad_norm 3.6512 (2.3789) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:40:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][360/625] eta 0:01:49 lr 0.000866 wd 0.0500 time 0.4076 (0.4116) data time 0.0007 (0.0022) model time 0.4069 (0.4110) loss 6.3438 (7.3996) grad_norm 1.8676 (2.3830) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:40:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][370/625] eta 0:01:44 lr 0.000866 wd 0.0500 time 0.4049 (0.4113) data time 0.0007 (0.0022) model time 0.4042 (0.4107) loss 8.0883 (7.4117) grad_norm 2.1769 (2.3852) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:41:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][380/625] eta 0:01:40 lr 0.000866 wd 0.0500 time 0.3988 (0.4110) data time 0.0008 (0.0021) model time 0.3980 (0.4104) loss 6.8864 (7.4138) grad_norm 1.6207 (2.3802) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
00:41:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][390/625] eta 0:01:36 lr 0.000866 wd 0.0500 time 0.3996 (0.4108) data time 0.0008 (0.0021) model time 0.3988 (0.4101) loss 6.6489 (7.4113) grad_norm 1.8073 (2.3685) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:41:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][400/625] eta 0:01:32 lr 0.000866 wd 0.0500 time 0.5399 (0.4112) data time 0.0008 (0.0021) model time 0.5391 (0.4105) loss 6.8831 (7.4176) grad_norm 1.5287 (2.3564) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 00:41:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][410/625] eta 0:01:29 lr 0.000866 wd 0.0500 time 0.5950 (0.4147) data time 0.0007 (0.0021) model time 0.5943 (0.4145) loss 6.1158 (7.4261) grad_norm 1.7285 (2.3458) loss_scale 1024.0000 (515.7372) mem 14939MB [2024-07-25 00:41:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][420/625] eta 0:01:25 lr 0.000866 wd 0.0500 time 0.4038 (0.4158) data time 0.0006 (0.0020) model time 0.4032 (0.4157) loss 7.8576 (7.4438) grad_norm 1.6260 (2.3374) loss_scale 1024.0000 (527.8100) mem 14939MB [2024-07-25 00:41:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][430/625] eta 0:01:21 lr 0.000866 wd 0.0500 time 0.4088 (0.4154) data time 0.0006 (0.0020) model time 0.4081 (0.4153) loss 8.3275 (7.4399) grad_norm 2.3310 (2.3322) loss_scale 1024.0000 (539.3225) mem 14939MB [2024-07-25 00:41:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][440/625] eta 0:01:16 lr 0.000866 wd 0.0500 time 0.3941 (0.4151) data time 0.0009 (0.0020) model time 0.3932 (0.4149) loss 8.1774 (7.4413) grad_norm 1.5752 (2.3280) loss_scale 1024.0000 (550.3129) mem 14939MB [2024-07-25 00:41:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][450/625] eta 0:01:12 lr 0.000865 wd 0.0500 time 0.3979 (0.4147) 
data time 0.0009 (0.0020) model time 0.3970 (0.4145) loss 9.1863 (7.4578) grad_norm 1.8819 (2.3263) loss_scale 1024.0000 (560.8160) mem 14939MB [2024-07-25 00:41:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][460/625] eta 0:01:08 lr 0.000865 wd 0.0500 time 0.4112 (0.4144) data time 0.0008 (0.0019) model time 0.4103 (0.4141) loss 7.8879 (7.4561) grad_norm 1.4566 (2.3175) loss_scale 1024.0000 (570.8633) mem 14939MB [2024-07-25 00:41:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][470/625] eta 0:01:04 lr 0.000865 wd 0.0500 time 0.3950 (0.4141) data time 0.0009 (0.0019) model time 0.3941 (0.4138) loss 7.2849 (7.4535) grad_norm 2.5528 (2.3243) loss_scale 1024.0000 (580.4841) mem 14939MB [2024-07-25 00:41:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][480/625] eta 0:01:00 lr 0.000865 wd 0.0500 time 0.3960 (0.4139) data time 0.0009 (0.0019) model time 0.3951 (0.4135) loss 8.0823 (7.4534) grad_norm 1.8756 (2.3232) loss_scale 1024.0000 (589.7048) mem 14939MB [2024-07-25 00:41:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][490/625] eta 0:00:55 lr 0.000865 wd 0.0500 time 0.4070 (0.4136) data time 0.0008 (0.0019) model time 0.4061 (0.4132) loss 5.8830 (7.4516) grad_norm 1.3552 (2.3120) loss_scale 1024.0000 (598.5499) mem 14939MB [2024-07-25 00:41:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][500/625] eta 0:00:51 lr 0.000865 wd 0.0500 time 0.3962 (0.4137) data time 0.0006 (0.0019) model time 0.3955 (0.4133) loss 7.7877 (7.4518) grad_norm 2.6121 (2.3210) loss_scale 1024.0000 (607.0419) mem 14939MB [2024-07-25 00:41:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][510/625] eta 0:00:47 lr 0.000865 wd 0.0500 time 0.4000 (0.4135) data time 0.0009 (0.0018) model time 0.3991 (0.4130) loss 6.7142 (7.4512) grad_norm 1.9842 (2.3286) loss_scale 1024.0000 (615.2016) mem 14939MB 
[2024-07-25 00:42:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][520/625] eta 0:00:43 lr 0.000865 wd 0.0500 time 0.4061 (0.4133) data time 0.0006 (0.0018) model time 0.4055 (0.4128) loss 6.9295 (7.4505) grad_norm 1.3299 (2.3209) loss_scale 1024.0000 (623.0480) mem 14939MB [2024-07-25 00:42:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][530/625] eta 0:00:39 lr 0.000865 wd 0.0500 time 0.3953 (0.4130) data time 0.0007 (0.0018) model time 0.3946 (0.4125) loss 7.9726 (7.4574) grad_norm 3.1591 (2.3166) loss_scale 1024.0000 (630.5989) mem 14939MB [2024-07-25 00:42:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][540/625] eta 0:00:35 lr 0.000865 wd 0.0500 time 0.3976 (0.4128) data time 0.0008 (0.0018) model time 0.3968 (0.4122) loss 6.9807 (7.4616) grad_norm 1.9788 (2.3121) loss_scale 1024.0000 (637.8706) mem 14939MB [2024-07-25 00:42:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][550/625] eta 0:00:30 lr 0.000864 wd 0.0500 time 0.4118 (0.4126) data time 0.0007 (0.0018) model time 0.4111 (0.4120) loss 6.9875 (7.4633) grad_norm 3.8207 (2.3179) loss_scale 1024.0000 (644.8784) mem 14939MB [2024-07-25 00:42:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][560/625] eta 0:00:26 lr 0.000864 wd 0.0500 time 0.3948 (0.4124) data time 0.0006 (0.0018) model time 0.3942 (0.4118) loss 7.5589 (7.4620) grad_norm 4.1245 (2.3208) loss_scale 1024.0000 (651.6364) mem 14939MB [2024-07-25 00:42:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][570/625] eta 0:00:22 lr 0.000864 wd 0.0500 time 0.3971 (0.4122) data time 0.0008 (0.0018) model time 0.3962 (0.4115) loss 7.2739 (7.4593) grad_norm 1.7742 (2.3270) loss_scale 1024.0000 (658.1576) mem 14939MB [2024-07-25 00:42:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][580/625] eta 0:00:18 lr 0.000864 wd 0.0500 time 
0.4040 (0.4120) data time 0.0006 (0.0017) model time 0.4034 (0.4113) loss 6.6021 (7.4600) grad_norm 1.7674 (2.3334) loss_scale 1024.0000 (664.4544) mem 14939MB [2024-07-25 00:42:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][590/625] eta 0:00:14 lr 0.000864 wd 0.0500 time 0.3927 (0.4118) data time 0.0007 (0.0017) model time 0.3920 (0.4111) loss 6.9187 (7.4610) grad_norm 2.2097 (2.3340) loss_scale 1024.0000 (670.5381) mem 14939MB [2024-07-25 00:42:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][600/625] eta 0:00:10 lr 0.000864 wd 0.0500 time 0.4007 (0.4116) data time 0.0008 (0.0017) model time 0.3999 (0.4109) loss 7.4751 (7.4651) grad_norm 2.4379 (2.3356) loss_scale 1024.0000 (676.4193) mem 14939MB [2024-07-25 00:42:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][610/625] eta 0:00:06 lr 0.000864 wd 0.0500 time 0.4077 (0.4115) data time 0.0004 (0.0017) model time 0.4073 (0.4107) loss 8.0358 (7.4640) grad_norm 3.5439 (2.3311) loss_scale 1024.0000 (682.1080) mem 14939MB [2024-07-25 00:42:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][620/625] eta 0:00:02 lr 0.000864 wd 0.0500 time 0.5506 (0.4115) data time 0.0006 (0.0017) model time 0.5500 (0.4108) loss 7.9446 (7.4705) grad_norm 1.8232 (2.3346) loss_scale 1024.0000 (687.6135) mem 14939MB [2024-07-25 00:42:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 119 training takes 0:04:17 [2024-07-25 00:42:42 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 00:42:43 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 00:42:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.458 (0.458) Loss 0.6128 (0.6128) Acc@1 87.939 (87.939) Acc@5 98.389 (98.389) Mem 14939MB
[2024-07-25 00:42:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.9902 (0.7587) Acc@1 78.613 (84.091) Acc@5 95.020 (97.106) Mem 14939MB
[2024-07-25 00:42:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0898 (0.8965) Acc@1 74.414 (80.655) Acc@5 93.799 (95.531) Mem 14939MB
[2024-07-25 00:42:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.294 Acc@5 95.489
[2024-07-25 00:42:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.3%
[2024-07-25 00:42:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.862 (0.862) Loss 0.5889 (0.5889) Acc@1 88.721 (88.721) Acc@5 98.438 (98.438) Mem 14939MB
[2024-07-25 00:42:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.159) Loss 0.9600 (0.7345) Acc@1 79.590 (84.943) Acc@5 95.068 (97.332) Mem 14939MB
[2024-07-25 00:42:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.124) Loss 1.0947 (0.8694) Acc@1 74.023 (81.373) Acc@5 93.994 (95.871) Mem 14939MB
[2024-07-25 00:42:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.054 Acc@5 95.843
[2024-07-25 00:42:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.1%
[2024-07-25 00:42:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.05%
[2024-07-25 00:42:49 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 00:42:49 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 00:42:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][0/625] eta 0:07:54 lr 0.000864 wd 0.0500 time 0.7599 (0.7599) data time 0.3768 (0.3768) model time 0.0000 (0.0000) loss 7.2583 (7.2583) grad_norm 2.4078 (2.4078) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:42:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][10/625] eta 0:05:26 lr 0.000864 wd 0.0500 time 0.4047 (0.5303) data time 0.0006 (0.0353) model time 0.0000 (0.0000) loss 6.6030 (6.8994) grad_norm 1.5351 (1.8540) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:42:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][20/625] eta 0:04:47 lr 0.000864 wd 0.0500 time 0.3934 (0.4747) data time 0.0008 (0.0189) model time 0.0000 (0.0000) loss 8.2407 (7.3974) grad_norm 1.4190 (1.8348) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:43:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][30/625] eta 0:04:28 lr 0.000863 wd 0.0500 time 0.3995 (0.4511) data time 0.0006 (0.0131) model time 0.0000 (0.0000) loss 6.8457 (7.4649) grad_norm 3.5123 (1.9307) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:43:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][40/625] eta 0:04:18 lr 0.000863 wd 0.0500 time 0.4180 (0.4422) data time 0.0008 (0.0102) model time 0.0000 (0.0000) loss 8.2508 (7.5116) grad_norm 1.7380 (1.9563) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:43:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][50/625] eta 0:04:09 lr 0.000863 wd 0.0500 time 0.3958 (0.4344) data time 0.0006 (0.0083) model time 0.0000 (0.0000) loss 7.2470 (7.5508) grad_norm 1.9294 (1.9282) loss_scale 1024.0000 
(1024.0000) mem 14939MB [2024-07-25 00:43:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][60/625] eta 0:04:02 lr 0.000863 wd 0.0500 time 0.4004 (0.4293) data time 0.0008 (0.0071) model time 0.3996 (0.4020) loss 6.4390 (7.4832) grad_norm 3.4322 (2.0346) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:43:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][70/625] eta 0:03:56 lr 0.000863 wd 0.0500 time 0.4051 (0.4253) data time 0.0008 (0.0062) model time 0.4044 (0.4012) loss 7.4218 (7.4640) grad_norm 3.8631 (2.1077) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:43:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][80/625] eta 0:03:50 lr 0.000863 wd 0.0500 time 0.3965 (0.4227) data time 0.0009 (0.0056) model time 0.3957 (0.4017) loss 6.9271 (7.4611) grad_norm 2.0986 (2.1297) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:43:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][90/625] eta 0:03:44 lr 0.000863 wd 0.0500 time 0.3993 (0.4205) data time 0.0006 (0.0051) model time 0.3987 (0.4019) loss 6.8141 (7.4503) grad_norm 1.8794 (2.0886) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:43:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][100/625] eta 0:03:39 lr 0.000863 wd 0.0500 time 0.4069 (0.4189) data time 0.0006 (0.0047) model time 0.4063 (0.4020) loss 9.0774 (7.4863) grad_norm 3.1527 (2.0935) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:43:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][110/625] eta 0:03:35 lr 0.000863 wd 0.0500 time 0.4073 (0.4176) data time 0.0008 (0.0043) model time 0.4065 (0.4023) loss 9.4668 (7.4959) grad_norm 1.4230 (2.1229) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:43:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][120/625] eta 0:03:30 lr 
0.000863 wd 0.0500 time 0.4039 (0.4164) data time 0.0008 (0.0041) model time 0.4031 (0.4024) loss 9.0353 (7.4971) grad_norm 1.8168 (2.1454) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:43:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][130/625] eta 0:03:25 lr 0.000862 wd 0.0500 time 0.4106 (0.4154) data time 0.0008 (0.0038) model time 0.4099 (0.4023) loss 7.7451 (7.4935) grad_norm 3.0626 (2.1440) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:43:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][140/625] eta 0:03:20 lr 0.000862 wd 0.0500 time 0.3964 (0.4144) data time 0.0006 (0.0036) model time 0.3958 (0.4021) loss 6.2407 (7.4749) grad_norm 2.7246 (2.1691) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:43:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][150/625] eta 0:03:16 lr 0.000862 wd 0.0500 time 0.3988 (0.4135) data time 0.0007 (0.0034) model time 0.3980 (0.4019) loss 5.8287 (7.4536) grad_norm 2.7725 (2.2461) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:43:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][160/625] eta 0:03:11 lr 0.000862 wd 0.0500 time 0.4079 (0.4128) data time 0.0008 (0.0033) model time 0.4071 (0.4018) loss 6.5285 (7.4677) grad_norm 1.5145 (2.2447) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:44:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][170/625] eta 0:03:07 lr 0.000862 wd 0.0500 time 0.3933 (0.4120) data time 0.0006 (0.0032) model time 0.3927 (0.4015) loss 7.5560 (7.4698) grad_norm 2.8056 (2.2405) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:44:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][180/625] eta 0:03:03 lr 0.000862 wd 0.0500 time 0.4078 (0.4116) data time 0.0008 (0.0031) model time 0.4070 (0.4016) loss 5.9840 (7.4533) grad_norm 1.7290 (2.2758) 
loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:44:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][190/625] eta 0:02:58 lr 0.000862 wd 0.0500 time 0.4081 (0.4112) data time 0.0008 (0.0030) model time 0.4073 (0.4017) loss 7.4948 (7.4349) grad_norm 2.3960 (2.2580) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:44:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][200/625] eta 0:02:54 lr 0.000862 wd 0.0500 time 0.3935 (0.4108) data time 0.0009 (0.0029) model time 0.3927 (0.4017) loss 7.4984 (7.4392) grad_norm 1.6437 (2.2512) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:44:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][210/625] eta 0:02:50 lr 0.000862 wd 0.0500 time 0.4014 (0.4105) data time 0.0008 (0.0028) model time 0.4006 (0.4018) loss 7.1360 (7.4478) grad_norm 3.0561 (2.2474) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:44:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][220/625] eta 0:02:47 lr 0.000862 wd 0.0500 time 0.5923 (0.4144) data time 0.0007 (0.0027) model time 0.5916 (0.4074) loss 8.4140 (7.4510) grad_norm 2.9175 (2.2559) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:44:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][230/625] eta 0:02:45 lr 0.000862 wd 0.0500 time 0.3958 (0.4185) data time 0.0006 (0.0026) model time 0.3952 (0.4130) loss 7.0759 (7.4283) grad_norm 4.8892 (2.2673) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:44:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][240/625] eta 0:02:41 lr 0.000861 wd 0.0500 time 0.3988 (0.4196) data time 0.0007 (0.0025) model time 0.3981 (0.4147) loss 7.9050 (7.4371) grad_norm 3.2507 (2.2721) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:44:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[120/300][250/625] eta 0:02:37 lr 0.000861 wd 0.0500 time 0.4079 (0.4190) data time 0.0008 (0.0025) model time 0.4072 (0.4141) loss 7.8815 (7.4318) grad_norm 2.2231 (2.2708) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:44:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][260/625] eta 0:02:32 lr 0.000861 wd 0.0500 time 0.3968 (0.4189) data time 0.0009 (0.0024) model time 0.3959 (0.4141) loss 7.6617 (7.4238) grad_norm 1.9882 (2.2802) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:44:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][270/625] eta 0:02:28 lr 0.000861 wd 0.0500 time 0.3988 (0.4183) data time 0.0009 (0.0024) model time 0.3979 (0.4135) loss 7.4145 (7.4275) grad_norm 2.7692 (2.2961) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:44:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][280/625] eta 0:02:24 lr 0.000861 wd 0.0500 time 0.4133 (0.4177) data time 0.0009 (0.0023) model time 0.4124 (0.4130) loss 7.5386 (7.4249) grad_norm 2.1236 (2.2880) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:44:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][290/625] eta 0:02:19 lr 0.000861 wd 0.0500 time 0.3982 (0.4171) data time 0.0006 (0.0023) model time 0.3975 (0.4125) loss 8.8839 (7.4142) grad_norm 1.9145 (2.3030) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:44:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][300/625] eta 0:02:15 lr 0.000861 wd 0.0500 time 0.3966 (0.4166) data time 0.0006 (0.0022) model time 0.3960 (0.4120) loss 7.4348 (7.4235) grad_norm 1.8383 (2.2939) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:44:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][310/625] eta 0:02:11 lr 0.000861 wd 0.0500 time 0.4123 (0.4162) data time 0.0006 (0.0022) model time 0.4117 (0.4116) loss 6.4295 
(7.4089) grad_norm 3.5315 (2.2955) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:45:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][320/625] eta 0:02:06 lr 0.000861 wd 0.0500 time 0.3952 (0.4157) data time 0.0009 (0.0022) model time 0.3943 (0.4112) loss 6.8689 (7.4005) grad_norm 1.9567 (2.2912) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:45:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][330/625] eta 0:02:02 lr 0.000861 wd 0.0500 time 0.3960 (0.4153) data time 0.0007 (0.0021) model time 0.3952 (0.4109) loss 6.2797 (7.3896) grad_norm 1.9518 (2.3086) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:45:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][340/625] eta 0:01:58 lr 0.000860 wd 0.0500 time 0.4274 (0.4151) data time 0.0008 (0.0021) model time 0.4266 (0.4107) loss 6.7822 (7.3870) grad_norm 3.0508 (2.3145) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:45:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][350/625] eta 0:01:54 lr 0.000860 wd 0.0500 time 0.4021 (0.4147) data time 0.0008 (0.0021) model time 0.4013 (0.4104) loss 6.5041 (7.3944) grad_norm 2.1413 (2.3087) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:45:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][360/625] eta 0:01:49 lr 0.000860 wd 0.0500 time 0.4008 (0.4144) data time 0.0009 (0.0020) model time 0.3999 (0.4101) loss 7.9700 (7.4144) grad_norm 2.1912 (2.2967) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:45:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][370/625] eta 0:01:45 lr 0.000860 wd 0.0500 time 0.4059 (0.4140) data time 0.0008 (0.0020) model time 0.4051 (0.4098) loss 8.1397 (7.4115) grad_norm 4.6614 (2.3042) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:45:27 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [120/300][380/625] eta 0:01:41 lr 0.000860 wd 0.0500 time 0.3950 (0.4137) data time 0.0006 (0.0020) model time 0.3944 (0.4096) loss 7.4340 (7.4134) grad_norm 3.4552 (2.3268) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:45:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][390/625] eta 0:01:37 lr 0.000860 wd 0.0500 time 0.3977 (0.4134) data time 0.0006 (0.0019) model time 0.3971 (0.4093) loss 7.4609 (7.4102) grad_norm 2.1093 (2.3260) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:45:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][400/625] eta 0:01:32 lr 0.000860 wd 0.0500 time 0.4019 (0.4131) data time 0.0008 (0.0019) model time 0.4010 (0.4090) loss 8.1957 (7.4184) grad_norm 1.8033 (2.3198) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:45:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][410/625] eta 0:01:28 lr 0.000860 wd 0.0500 time 0.3958 (0.4128) data time 0.0007 (0.0019) model time 0.3950 (0.4088) loss 7.8332 (7.4294) grad_norm 2.2384 (2.3131) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:45:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][420/625] eta 0:01:24 lr 0.000860 wd 0.0500 time 0.3984 (0.4126) data time 0.0006 (0.0019) model time 0.3977 (0.4085) loss 7.6597 (7.4375) grad_norm 2.0251 (2.3082) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:45:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][430/625] eta 0:01:20 lr 0.000860 wd 0.0500 time 0.4075 (0.4123) data time 0.0008 (0.0019) model time 0.4067 (0.4083) loss 6.9668 (7.4280) grad_norm 3.4970 (2.3069) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:45:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][440/625] eta 0:01:16 lr 0.000859 wd 0.0500 time 0.5993 (0.4142) data time 0.0007 (0.0019) model 
time 0.5986 (0.4106) loss 6.6065 (7.4318) grad_norm 2.8101 (2.3155) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:45:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][450/625] eta 0:01:12 lr 0.000859 wd 0.0500 time 0.6106 (0.4169) data time 0.0008 (0.0018) model time 0.6098 (0.4136) loss 7.7240 (7.4271) grad_norm 1.4535 (2.3170) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:46:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][460/625] eta 0:01:08 lr 0.000859 wd 0.0500 time 0.3946 (0.4176) data time 0.0008 (0.0018) model time 0.3938 (0.4145) loss 8.4990 (7.4361) grad_norm 1.9418 (2.3166) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:46:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][470/625] eta 0:01:04 lr 0.000859 wd 0.0500 time 0.4023 (0.4173) data time 0.0006 (0.0018) model time 0.4017 (0.4142) loss 6.6344 (7.4277) grad_norm 1.8229 (2.3087) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:46:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][480/625] eta 0:01:00 lr 0.000859 wd 0.0500 time 0.4128 (0.4174) data time 0.0006 (0.0018) model time 0.4122 (0.4144) loss 8.1532 (7.4296) grad_norm 1.4779 (2.2985) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:46:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][490/625] eta 0:00:56 lr 0.000859 wd 0.0500 time 0.3960 (0.4171) data time 0.0008 (0.0018) model time 0.3952 (0.4141) loss 6.6340 (7.4351) grad_norm 1.5674 (2.2918) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:46:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][500/625] eta 0:00:52 lr 0.000859 wd 0.0500 time 0.4057 (0.4168) data time 0.0006 (0.0017) model time 0.4051 (0.4138) loss 6.3972 (7.4313) grad_norm 2.0497 (2.3004) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:46:22 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][510/625] eta 0:00:47 lr 0.000859 wd 0.0500 time 0.4086 (0.4165) data time 0.0007 (0.0017) model time 0.4078 (0.4136) loss 8.1219 (7.4284) grad_norm 1.4463 (2.3002) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:46:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][520/625] eta 0:00:43 lr 0.000859 wd 0.0500 time 0.3950 (0.4162) data time 0.0006 (0.0017) model time 0.3944 (0.4133) loss 7.0689 (7.4144) grad_norm 1.3498 (2.3083) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:46:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][530/625] eta 0:00:39 lr 0.000859 wd 0.0500 time 0.4027 (0.4160) data time 0.0006 (0.0017) model time 0.4021 (0.4130) loss 7.3605 (7.4135) grad_norm 2.1639 (2.3050) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:46:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][540/625] eta 0:00:35 lr 0.000859 wd 0.0500 time 0.4099 (0.4157) data time 0.0008 (0.0017) model time 0.4091 (0.4128) loss 8.4827 (7.4214) grad_norm 2.1606 (2.2990) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:46:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][550/625] eta 0:00:31 lr 0.000858 wd 0.0500 time 0.3977 (0.4155) data time 0.0008 (0.0017) model time 0.3969 (0.4126) loss 7.3120 (7.4243) grad_norm 2.6931 (2.2993) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:46:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][560/625] eta 0:00:26 lr 0.000858 wd 0.0500 time 0.4082 (0.4153) data time 0.0008 (0.0017) model time 0.4074 (0.4124) loss 8.0233 (7.4286) grad_norm 1.9239 (2.2931) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:46:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][570/625] eta 0:00:22 lr 0.000858 wd 0.0500 time 0.4080 (0.4151) 
data time 0.0008 (0.0016) model time 0.4072 (0.4122) loss 7.8422 (7.4363) grad_norm 3.3489 (2.3013) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:46:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][580/625] eta 0:00:18 lr 0.000858 wd 0.0500 time 0.3988 (0.4149) data time 0.0006 (0.0016) model time 0.3982 (0.4120) loss 8.1944 (7.4388) grad_norm 1.8417 (2.3207) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:46:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][590/625] eta 0:00:14 lr 0.000858 wd 0.0500 time 0.3997 (0.4147) data time 0.0008 (0.0016) model time 0.3990 (0.4119) loss 6.0480 (7.4334) grad_norm 2.6783 (2.3286) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:46:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][600/625] eta 0:00:10 lr 0.000858 wd 0.0500 time 0.4017 (0.4145) data time 0.0008 (0.0016) model time 0.4009 (0.4117) loss 7.7135 (7.4380) grad_norm 1.9922 (2.3268) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:47:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][610/625] eta 0:00:06 lr 0.000858 wd 0.0500 time 0.3922 (0.4143) data time 0.0004 (0.0016) model time 0.3918 (0.4114) loss 8.0502 (7.4431) grad_norm 1.6713 (2.3201) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:47:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][620/625] eta 0:00:02 lr 0.000858 wd 0.0500 time 0.4018 (0.4141) data time 0.0004 (0.0016) model time 0.4014 (0.4112) loss 8.5706 (7.4335) grad_norm 1.2098 (2.3145) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:47:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 120 training takes 0:04:18 [2024-07-25 00:47:08 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 00:47:09 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!!
[2024-07-25 00:47:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.474 (0.474) Loss 0.6123 (0.6123) Acc@1 88.281 (88.281) Acc@5 98.291 (98.291) Mem 14939MB
[2024-07-25 00:47:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.9971 (0.7508) Acc@1 77.490 (84.366) Acc@5 94.580 (97.146) Mem 14939MB
[2024-07-25 00:47:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.105) Loss 1.1270 (0.8888) Acc@1 73.389 (80.780) Acc@5 93.018 (95.594) Mem 14939MB
[2024-07-25 00:47:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.436 Acc@5 95.517
[2024-07-25 00:47:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.4%
[2024-07-25 00:47:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.44%
[2024-07-25 00:47:12 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving......
[2024-07-25 00:47:13 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!!
[2024-07-25 00:47:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.445 (0.445) Loss 0.5879 (0.5879) Acc@1 88.721 (88.721) Acc@5 98.389 (98.389) Mem 14939MB [2024-07-25 00:47:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.118) Loss 0.9580 (0.7335) Acc@1 79.492 (84.943) Acc@5 95.068 (97.332) Mem 14939MB [2024-07-25 00:47:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.0938 (0.8681) Acc@1 74.170 (81.399) Acc@5 93.994 (95.878) Mem 14939MB [2024-07-25 00:47:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.074 Acc@5 95.843 [2024-07-25 00:47:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.1% [2024-07-25 00:47:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.07% [2024-07-25 00:47:15 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 00:47:16 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 00:47:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][0/625] eta 0:08:24 lr 0.000858 wd 0.0500 time 0.8075 (0.8075) data time 0.4322 (0.4322) model time 0.0000 (0.0000) loss 6.7432 (6.7432) grad_norm 2.1174 (2.1174) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:47:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][10/625] eta 0:04:31 lr 0.000858 wd 0.0500 time 0.4184 (0.4411) data time 0.0006 (0.0401) model time 0.0000 (0.0000) loss 7.2158 (7.4175) grad_norm 2.3810 (2.0530) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:47:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][20/625] eta 0:04:15 lr 0.000858 wd 0.0500 time 0.3976 (0.4226) data time 0.0008 (0.0216) model time 0.0000 (0.0000) loss 6.9689 (7.3218) grad_norm 1.9996 (2.0626) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:47:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][30/625] eta 0:04:14 lr 0.000857 wd 0.0500 time 0.5917 (0.4280) data time 0.0006 (0.0149) model time 0.0000 (0.0000) loss 8.6232 (7.4159) grad_norm 3.1567 (2.2059) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:47:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][40/625] eta 0:04:25 lr 0.000857 wd 0.0500 time 0.6206 (0.4543) data time 0.0006 (0.0115) model time 0.0000 (0.0000) loss 7.0222 (7.3667) grad_norm 1.3940 (2.3211) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:47:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][50/625] eta 0:04:23 lr 0.000857 wd 0.0500 time 0.5515 (0.4577) data time 0.0008 (0.0095) model time 0.0000 (0.0000) loss 8.1367 (7.3263) grad_norm 1.4157 (2.2432) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:47:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][60/625] eta 0:04:15 lr 0.000857 wd 0.0500 time 
0.4061 (0.4514) data time 0.0007 (0.0081) model time 0.4054 (0.4183) loss 8.4429 (7.3445) grad_norm 1.9894 (2.1780) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:47:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][70/625] eta 0:04:06 lr 0.000857 wd 0.0500 time 0.3970 (0.4442) data time 0.0006 (0.0071) model time 0.3964 (0.4086) loss 5.9128 (7.3201) grad_norm 1.6668 (2.1924) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:47:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][80/625] eta 0:03:59 lr 0.000857 wd 0.0500 time 0.3987 (0.4388) data time 0.0006 (0.0063) model time 0.3981 (0.4056) loss 7.6951 (7.2797) grad_norm 1.5761 (2.1724) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:47:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][90/625] eta 0:03:52 lr 0.000857 wd 0.0500 time 0.4147 (0.4351) data time 0.0008 (0.0057) model time 0.4139 (0.4052) loss 6.9176 (7.2573) grad_norm 1.8834 (2.1253) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:48:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][100/625] eta 0:03:46 lr 0.000857 wd 0.0500 time 0.3939 (0.4319) data time 0.0010 (0.0053) model time 0.3929 (0.4047) loss 6.6117 (7.2678) grad_norm 2.3551 (2.1471) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:48:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][110/625] eta 0:03:41 lr 0.000857 wd 0.0500 time 0.3988 (0.4292) data time 0.0007 (0.0049) model time 0.3981 (0.4040) loss 7.9385 (7.3003) grad_norm 3.0283 (2.1509) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:48:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][120/625] eta 0:03:35 lr 0.000857 wd 0.0500 time 0.4041 (0.4269) data time 0.0007 (0.0045) model time 0.4034 (0.4035) loss 8.4037 (7.2808) grad_norm 2.2269 (2.1821) loss_scale 1024.0000 (1024.0000) 
mem 14939MB [2024-07-25 00:48:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][130/625] eta 0:03:30 lr 0.000856 wd 0.0500 time 0.3961 (0.4249) data time 0.0006 (0.0043) model time 0.3955 (0.4030) loss 8.0455 (7.2785) grad_norm 1.8530 (2.1758) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:48:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][140/625] eta 0:03:25 lr 0.000856 wd 0.0500 time 0.3969 (0.4231) data time 0.0009 (0.0040) model time 0.3960 (0.4025) loss 7.2168 (7.2527) grad_norm 1.5482 (2.1761) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:48:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][150/625] eta 0:03:20 lr 0.000856 wd 0.0500 time 0.4138 (0.4217) data time 0.0009 (0.0038) model time 0.4129 (0.4024) loss 7.9167 (7.2880) grad_norm 3.2610 (2.2067) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:48:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][160/625] eta 0:03:15 lr 0.000856 wd 0.0500 time 0.3995 (0.4205) data time 0.0006 (0.0037) model time 0.3989 (0.4022) loss 9.0935 (7.3006) grad_norm 2.1979 (2.2308) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:48:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][170/625] eta 0:03:10 lr 0.000856 wd 0.0500 time 0.3997 (0.4194) data time 0.0008 (0.0035) model time 0.3989 (0.4021) loss 8.1407 (7.3004) grad_norm 2.3314 (2.2605) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:48:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][180/625] eta 0:03:06 lr 0.000856 wd 0.0500 time 0.4120 (0.4184) data time 0.0008 (0.0034) model time 0.4112 (0.4021) loss 6.6783 (7.2835) grad_norm 2.8663 (2.2792) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:48:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][190/625] eta 0:03:01 lr 0.000856 
wd 0.0500 time 0.3948 (0.4175) data time 0.0007 (0.0032) model time 0.3941 (0.4018) loss 8.9217 (7.2882) grad_norm 1.9375 (2.2686) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:48:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][200/625] eta 0:02:57 lr 0.000856 wd 0.0500 time 0.4051 (0.4166) data time 0.0009 (0.0031) model time 0.4043 (0.4017) loss 8.4033 (7.2615) grad_norm 2.9925 (2.2600) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:48:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][210/625] eta 0:02:52 lr 0.000856 wd 0.0500 time 0.4140 (0.4167) data time 0.0007 (0.0030) model time 0.4134 (0.4027) loss 6.3855 (7.2731) grad_norm 3.2882 (2.2777) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:48:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][220/625] eta 0:02:48 lr 0.000856 wd 0.0500 time 0.3934 (0.4159) data time 0.0007 (0.0029) model time 0.3927 (0.4024) loss 7.1489 (7.2821) grad_norm 2.5978 (2.3084) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:48:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][230/625] eta 0:02:44 lr 0.000855 wd 0.0500 time 0.3989 (0.4153) data time 0.0009 (0.0028) model time 0.3981 (0.4023) loss 8.0865 (7.2949) grad_norm 2.8207 (2.3222) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:48:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][240/625] eta 0:02:39 lr 0.000855 wd 0.0500 time 0.4066 (0.4147) data time 0.0009 (0.0028) model time 0.4057 (0.4022) loss 7.4119 (7.3002) grad_norm 1.8885 (2.3358) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:49:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][250/625] eta 0:02:35 lr 0.000855 wd 0.0500 time 0.5949 (0.4154) data time 0.0006 (0.0027) model time 0.5943 (0.4036) loss 7.9239 (7.3046) grad_norm 13.0467 (2.3797) loss_scale 
1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:49:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][260/625] eta 0:02:32 lr 0.000855 wd 0.0500 time 0.6189 (0.4190) data time 0.0009 (0.0026) model time 0.6180 (0.4087) loss 7.5969 (7.3192) grad_norm 2.0702 (2.3760) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:49:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][270/625] eta 0:02:29 lr 0.000855 wd 0.0500 time 0.4251 (0.4213) data time 0.0006 (0.0026) model time 0.4245 (0.4119) loss 9.1524 (7.3338) grad_norm 2.4453 (2.3653) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:49:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][280/625] eta 0:02:25 lr 0.000855 wd 0.0500 time 0.3956 (0.4211) data time 0.0008 (0.0025) model time 0.3947 (0.4120) loss 8.4849 (7.3455) grad_norm 2.0299 (2.3716) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:49:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][290/625] eta 0:02:20 lr 0.000855 wd 0.0500 time 0.3968 (0.4204) data time 0.0006 (0.0025) model time 0.3962 (0.4115) loss 6.5809 (7.3389) grad_norm 2.8700 (2.3658) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:49:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][300/625] eta 0:02:16 lr 0.000855 wd 0.0500 time 0.4172 (0.4198) data time 0.0009 (0.0024) model time 0.4164 (0.4111) loss 7.1267 (7.3492) grad_norm 4.9119 (2.3810) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:49:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][310/625] eta 0:02:12 lr 0.000855 wd 0.0500 time 0.3955 (0.4192) data time 0.0008 (0.0024) model time 0.3947 (0.4106) loss 7.0414 (7.3561) grad_norm 2.0908 (2.3898) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:49:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][320/625] 
eta 0:02:07 lr 0.000855 wd 0.0500 time 0.3973 (0.4186) data time 0.0009 (0.0023) model time 0.3964 (0.4102) loss 6.7013 (7.3547) grad_norm 1.7804 (2.3968) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:49:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][330/625] eta 0:02:03 lr 0.000855 wd 0.0500 time 0.4053 (0.4181) data time 0.0009 (0.0023) model time 0.4044 (0.4099) loss 7.1732 (7.3532) grad_norm 1.9299 (2.3903) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:49:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][340/625] eta 0:01:58 lr 0.000854 wd 0.0500 time 0.3975 (0.4175) data time 0.0009 (0.0022) model time 0.3966 (0.4094) loss 8.5863 (7.3623) grad_norm 3.0702 (2.3883) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:49:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][350/625] eta 0:01:54 lr 0.000854 wd 0.0500 time 0.3935 (0.4171) data time 0.0008 (0.0022) model time 0.3927 (0.4092) loss 6.8744 (7.3704) grad_norm 1.6718 (2.3879) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:49:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][360/625] eta 0:01:50 lr 0.000854 wd 0.0500 time 0.4105 (0.4167) data time 0.0008 (0.0022) model time 0.4097 (0.4090) loss 6.5614 (7.3690) grad_norm 2.1137 (2.3811) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:49:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][370/625] eta 0:01:46 lr 0.000854 wd 0.0500 time 0.3957 (0.4163) data time 0.0008 (0.0022) model time 0.3948 (0.4087) loss 6.8681 (7.3785) grad_norm 1.3514 (2.3750) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:49:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][380/625] eta 0:01:41 lr 0.000854 wd 0.0500 time 0.4001 (0.4159) data time 0.0006 (0.0021) model time 0.3995 (0.4085) loss 6.7916 (7.3666) grad_norm 1.8186 
(2.3654) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:49:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][390/625] eta 0:01:37 lr 0.000854 wd 0.0500 time 0.4108 (0.4156) data time 0.0006 (0.0021) model time 0.4101 (0.4083) loss 7.3393 (7.3708) grad_norm 2.0895 (2.3581) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:50:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][400/625] eta 0:01:33 lr 0.000854 wd 0.0500 time 0.3946 (0.4152) data time 0.0008 (0.0021) model time 0.3938 (0.4080) loss 6.5705 (7.3803) grad_norm 2.9281 (2.3621) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:50:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][410/625] eta 0:01:29 lr 0.000854 wd 0.0500 time 0.4043 (0.4149) data time 0.0008 (0.0020) model time 0.4035 (0.4078) loss 8.2750 (7.3931) grad_norm 1.8722 (2.3555) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:50:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][420/625] eta 0:01:25 lr 0.000854 wd 0.0500 time 0.6176 (0.4151) data time 0.0009 (0.0020) model time 0.6167 (0.4082) loss 6.7402 (7.3843) grad_norm 1.7683 (2.3519) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:50:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][430/625] eta 0:01:20 lr 0.000854 wd 0.0500 time 0.3958 (0.4146) data time 0.0006 (0.0020) model time 0.3952 (0.4079) loss 8.3515 (7.3854) grad_norm 1.9504 (2.3530) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:50:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][440/625] eta 0:01:16 lr 0.000853 wd 0.0500 time 0.3968 (0.4143) data time 0.0008 (0.0020) model time 0.3960 (0.4077) loss 7.9021 (7.3791) grad_norm 1.6552 (2.3548) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:50:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[121/300][450/625] eta 0:01:12 lr 0.000853 wd 0.0500 time 0.4155 (0.4141) data time 0.0006 (0.0019) model time 0.4149 (0.4076) loss 8.6293 (7.3860) grad_norm 1.8513 (2.3461) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:50:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][460/625] eta 0:01:08 lr 0.000853 wd 0.0500 time 0.3944 (0.4138) data time 0.0007 (0.0019) model time 0.3938 (0.4074) loss 6.4728 (7.3920) grad_norm 2.4492 (2.3471) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:50:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][470/625] eta 0:01:04 lr 0.000853 wd 0.0500 time 0.3957 (0.4139) data time 0.0009 (0.0019) model time 0.3948 (0.4076) loss 8.1157 (7.3837) grad_norm 2.1635 (2.3678) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:50:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][480/625] eta 0:01:00 lr 0.000853 wd 0.0500 time 0.5528 (0.4166) data time 0.0009 (0.0019) model time 0.5520 (0.4107) loss 7.4006 (7.3959) grad_norm 1.8579 (2.3754) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:50:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][490/625] eta 0:00:56 lr 0.000853 wd 0.0500 time 0.3959 (0.4176) data time 0.0009 (0.0019) model time 0.3950 (0.4119) loss 7.1167 (7.3956) grad_norm 1.9049 (2.3738) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:50:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][500/625] eta 0:00:52 lr 0.000853 wd 0.0500 time 0.4120 (0.4178) data time 0.0009 (0.0019) model time 0.4112 (0.4123) loss 6.8901 (7.3930) grad_norm 3.0219 (2.3787) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:50:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][510/625] eta 0:00:48 lr 0.000853 wd 0.0500 time 0.4149 (0.4175) data time 0.0009 (0.0019) model time 0.4140 (0.4120) loss 8.8366 
(7.4042) grad_norm 2.5699 (2.3873) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:50:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][520/625] eta 0:00:43 lr 0.000853 wd 0.0500 time 0.3960 (0.4172) data time 0.0007 (0.0019) model time 0.3954 (0.4118) loss 6.5552 (7.4021) grad_norm 2.5565 (2.3826) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:50:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][530/625] eta 0:00:39 lr 0.000853 wd 0.0500 time 0.3977 (0.4169) data time 0.0008 (0.0018) model time 0.3968 (0.4115) loss 7.6678 (7.4023) grad_norm 1.7215 (2.3796) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:51:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][540/625] eta 0:00:35 lr 0.000852 wd 0.0500 time 0.4154 (0.4166) data time 0.0007 (0.0018) model time 0.4147 (0.4113) loss 8.1133 (7.4029) grad_norm 2.2013 (2.3738) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:51:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][550/625] eta 0:00:31 lr 0.000852 wd 0.0500 time 0.3948 (0.4164) data time 0.0008 (0.0018) model time 0.3940 (0.4111) loss 9.1426 (7.3996) grad_norm 2.3956 (2.3693) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:51:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][560/625] eta 0:00:27 lr 0.000852 wd 0.0500 time 0.4009 (0.4161) data time 0.0008 (0.0018) model time 0.4001 (0.4109) loss 7.4890 (7.4053) grad_norm 2.5785 (2.3747) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:51:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][570/625] eta 0:00:22 lr 0.000852 wd 0.0500 time 0.4102 (0.4159) data time 0.0009 (0.0018) model time 0.4093 (0.4108) loss 7.4338 (7.4115) grad_norm 2.5039 (2.3840) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:51:18 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [121/300][580/625] eta 0:00:18 lr 0.000852 wd 0.0500 time 0.3960 (0.4156) data time 0.0007 (0.0018) model time 0.3953 (0.4106) loss 6.5858 (7.4104) grad_norm 1.9357 (2.3848) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:51:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][590/625] eta 0:00:14 lr 0.000852 wd 0.0500 time 0.3995 (0.4154) data time 0.0007 (0.0017) model time 0.3988 (0.4104) loss 8.0991 (7.4184) grad_norm 3.1068 (2.3891) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:51:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][600/625] eta 0:00:10 lr 0.000852 wd 0.0500 time 0.4309 (0.4152) data time 0.0007 (0.0017) model time 0.4302 (0.4103) loss 5.8669 (7.4143) grad_norm 2.5208 (2.3868) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:51:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][610/625] eta 0:00:06 lr 0.000852 wd 0.0500 time 0.3934 (0.4150) data time 0.0006 (0.0017) model time 0.3927 (0.4100) loss 7.1871 (7.4068) grad_norm 2.1144 (2.3872) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:51:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][620/625] eta 0:00:02 lr 0.000852 wd 0.0500 time 0.4004 (0.4148) data time 0.0005 (0.0017) model time 0.3998 (0.4099) loss 7.5782 (7.4016) grad_norm 1.6621 (2.3823) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:51:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 121 training takes 0:04:19 [2024-07-25 00:51:35 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 00:51:37 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 00:51:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.446 (0.446) Loss 0.6479 (0.6479) Acc@1 87.891 (87.891) Acc@5 98.193 (98.193) Mem 14939MB [2024-07-25 00:51:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.9922 (0.7723) Acc@1 79.102 (84.482) Acc@5 94.922 (97.110) Mem 14939MB [2024-07-25 00:51:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.103) Loss 1.1357 (0.9140) Acc@1 74.219 (80.838) Acc@5 93.506 (95.585) Mem 14939MB [2024-07-25 00:51:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.446 Acc@5 95.549 [2024-07-25 00:51:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.4% [2024-07-25 00:51:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.45% [2024-07-25 00:51:39 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 00:51:40 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 00:51:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.446 (0.446) Loss 0.5864 (0.5864) Acc@1 88.623 (88.623) Acc@5 98.389 (98.389) Mem 14939MB [2024-07-25 00:51:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.9551 (0.7320) Acc@1 79.541 (84.965) Acc@5 95.068 (97.337) Mem 14939MB [2024-07-25 00:51:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.0908 (0.8664) Acc@1 74.268 (81.441) Acc@5 93.896 (95.898) Mem 14939MB [2024-07-25 00:51:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.108 Acc@5 95.865 [2024-07-25 00:51:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.1% [2024-07-25 00:51:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.11% [2024-07-25 00:51:43 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 00:51:44 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 00:51:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][0/625] eta 0:08:00 lr 0.000852 wd 0.0500 time 0.7692 (0.7692) data time 0.3785 (0.3785) model time 0.0000 (0.0000) loss 6.5330 (6.5330) grad_norm 3.1078 (3.1078) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:51:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][10/625] eta 0:04:27 lr 0.000852 wd 0.0500 time 0.3959 (0.4344) data time 0.0008 (0.0353) model time 0.0000 (0.0000) loss 6.3310 (7.5845) grad_norm 2.6004 (2.7168) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:51:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][20/625] eta 0:04:13 lr 0.000851 wd 0.0500 time 0.3945 (0.4188) data time 0.0010 (0.0190) model time 0.0000 (0.0000) loss 6.9995 (7.5039) grad_norm 2.4690 (2.5654) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:51:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][30/625] eta 0:04:05 lr 0.000851 wd 0.0500 time 0.3986 (0.4128) data time 0.0009 (0.0132) model time 0.0000 (0.0000) loss 6.6805 (7.6074) grad_norm 2.6518 (2.5705) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:52:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][40/625] eta 0:04:00 lr 0.000851 wd 0.0500 time 0.4241 (0.4103) data time 0.0007 (0.0103) model time 0.0000 (0.0000) loss 8.4985 (7.5234) grad_norm 2.7698 (2.4716) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:52:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][50/625] eta 0:03:55 lr 0.000851 wd 0.0500 time 0.3925 (0.4090) data time 0.0008 (0.0084) model time 0.0000 (0.0000) loss 7.0156 (7.4339) grad_norm 2.5088 (2.4054) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:52:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][60/625] eta 0:03:50 lr 0.000851 wd 0.0500 time 
0.4029 (0.4078) data time 0.0009 (0.0072) model time 0.4020 (0.4009) loss 7.6320 (7.4518) grad_norm 2.6407 (2.3447) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:52:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][70/625] eta 0:03:50 lr 0.000851 wd 0.0500 time 0.5828 (0.4148) data time 0.0007 (0.0063) model time 0.5822 (0.4288) loss 6.4341 (7.4280) grad_norm 1.7656 (2.3213) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:52:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][80/625] eta 0:03:50 lr 0.000851 wd 0.0500 time 0.6184 (0.4238) data time 0.0008 (0.0056) model time 0.6175 (0.4481) loss 6.4368 (7.4059) grad_norm 2.0776 (2.5377) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:52:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][90/625] eta 0:03:51 lr 0.000851 wd 0.0500 time 0.5878 (0.4319) data time 0.0006 (0.0051) model time 0.5871 (0.4602) loss 7.0518 (7.4232) grad_norm 1.7140 (2.5215) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:52:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][100/625] eta 0:03:45 lr 0.000851 wd 0.0500 time 0.4049 (0.4289) data time 0.0006 (0.0047) model time 0.4043 (0.4483) loss 5.9798 (7.3958) grad_norm 2.4452 (2.4990) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:52:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][110/625] eta 0:03:39 lr 0.000851 wd 0.0500 time 0.3940 (0.4267) data time 0.0007 (0.0044) model time 0.3934 (0.4409) loss 6.4606 (7.4093) grad_norm 1.7906 (2.4445) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:52:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][120/625] eta 0:03:34 lr 0.000850 wd 0.0500 time 0.4032 (0.4246) data time 0.0007 (0.0041) model time 0.4025 (0.4351) loss 8.0459 (7.4130) grad_norm 1.6785 (2.4262) loss_scale 1024.0000 (1024.0000) 
mem 14939MB [2024-07-25 00:52:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][130/625] eta 0:03:29 lr 0.000850 wd 0.0500 time 0.4062 (0.4230) data time 0.0008 (0.0039) model time 0.4053 (0.4310) loss 8.0823 (7.4292) grad_norm 1.7628 (2.4135) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:52:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][140/625] eta 0:03:24 lr 0.000850 wd 0.0500 time 0.3941 (0.4214) data time 0.0007 (0.0036) model time 0.3934 (0.4276) loss 7.2915 (7.4321) grad_norm 1.7971 (2.4106) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:52:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][150/625] eta 0:03:19 lr 0.000850 wd 0.0500 time 0.3997 (0.4202) data time 0.0006 (0.0035) model time 0.3992 (0.4251) loss 6.6592 (7.4033) grad_norm 1.6267 (2.3848) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:52:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][160/625] eta 0:03:15 lr 0.000850 wd 0.0500 time 0.3989 (0.4199) data time 0.0008 (0.0033) model time 0.3980 (0.4241) loss 8.6077 (7.4087) grad_norm 1.6466 (2.3490) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:52:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][170/625] eta 0:03:10 lr 0.000850 wd 0.0500 time 0.3943 (0.4188) data time 0.0007 (0.0032) model time 0.3936 (0.4221) loss 7.9846 (7.4074) grad_norm 2.1535 (2.3312) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:52:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][180/625] eta 0:03:06 lr 0.000850 wd 0.0500 time 0.4046 (0.4180) data time 0.0006 (0.0030) model time 0.4040 (0.4206) loss 8.7495 (7.4291) grad_norm 1.8097 (2.3257) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:53:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][190/625] eta 0:03:01 lr 0.000850 
wd 0.0500 time 0.4071 (0.4173) data time 0.0007 (0.0029) model time 0.4065 (0.4194) loss 7.9024 (7.4313) grad_norm 2.4064 (2.3271) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:53:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][200/625] eta 0:02:57 lr 0.000850 wd 0.0500 time 0.4008 (0.4165) data time 0.0006 (0.0029) model time 0.4002 (0.4181) loss 7.5597 (7.4336) grad_norm 2.2020 (2.3263) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:53:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][210/625] eta 0:02:52 lr 0.000850 wd 0.0500 time 0.4180 (0.4161) data time 0.0009 (0.0028) model time 0.4171 (0.4174) loss 7.4060 (7.4434) grad_norm 2.3118 (2.3153) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:53:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][220/625] eta 0:02:48 lr 0.000850 wd 0.0500 time 0.4016 (0.4156) data time 0.0007 (0.0027) model time 0.4009 (0.4166) loss 8.3734 (7.4719) grad_norm 1.8958 (2.2998) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:53:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][230/625] eta 0:02:43 lr 0.000849 wd 0.0500 time 0.3961 (0.4150) data time 0.0009 (0.0026) model time 0.3952 (0.4156) loss 8.1090 (7.4949) grad_norm 1.8965 (2.2867) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:53:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][240/625] eta 0:02:39 lr 0.000849 wd 0.0500 time 0.4017 (0.4150) data time 0.0007 (0.0026) model time 0.4010 (0.4155) loss 8.1198 (7.5162) grad_norm 4.0039 (2.3072) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:53:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][250/625] eta 0:02:35 lr 0.000849 wd 0.0500 time 0.4040 (0.4145) data time 0.0009 (0.0026) model time 0.4031 (0.4149) loss 8.1378 (7.4978) grad_norm 1.7658 (2.3195) loss_scale 
1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:53:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][260/625] eta 0:02:31 lr 0.000849 wd 0.0500 time 0.3975 (0.4140) data time 0.0007 (0.0025) model time 0.3968 (0.4141) loss 7.7799 (7.4906) grad_norm 3.7548 (2.3630) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:53:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][270/625] eta 0:02:26 lr 0.000849 wd 0.0500 time 0.4062 (0.4136) data time 0.0008 (0.0024) model time 0.4054 (0.4136) loss 8.0171 (7.4801) grad_norm 2.1877 (2.3692) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:53:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][280/625] eta 0:02:22 lr 0.000849 wd 0.0500 time 0.6036 (0.4139) data time 0.0007 (0.0024) model time 0.6029 (0.4139) loss 8.5088 (7.4800) grad_norm 3.2428 (2.3660) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:53:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][290/625] eta 0:02:18 lr 0.000849 wd 0.0500 time 0.3965 (0.4147) data time 0.0007 (0.0023) model time 0.3959 (0.4148) loss 7.5836 (7.4895) grad_norm 2.5244 (2.3658) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:53:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][300/625] eta 0:02:16 lr 0.000849 wd 0.0500 time 0.6021 (0.4186) data time 0.0009 (0.0023) model time 0.6013 (0.4195) loss 7.6390 (7.4929) grad_norm 2.3252 (2.3572) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:53:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][310/625] eta 0:02:12 lr 0.000849 wd 0.0500 time 0.5501 (0.4195) data time 0.0007 (0.0023) model time 0.5494 (0.4205) loss 7.3051 (7.4917) grad_norm 2.7355 (2.3567) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:53:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][320/625] 
eta 0:02:07 lr 0.000849 wd 0.0500 time 0.3968 (0.4189) data time 0.0007 (0.0022) model time 0.3961 (0.4197) loss 7.9163 (7.5065) grad_norm 3.4627 (2.3617) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:54:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][330/625] eta 0:02:03 lr 0.000848 wd 0.0500 time 0.4021 (0.4183) data time 0.0007 (0.0022) model time 0.4014 (0.4190) loss 7.9311 (7.4945) grad_norm 3.3516 (2.3727) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:54:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][340/625] eta 0:01:59 lr 0.000848 wd 0.0500 time 0.3956 (0.4178) data time 0.0009 (0.0021) model time 0.3947 (0.4183) loss 7.7731 (7.4936) grad_norm 2.0110 (2.3982) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:54:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][350/625] eta 0:01:54 lr 0.000848 wd 0.0500 time 0.3947 (0.4173) data time 0.0008 (0.0021) model time 0.3939 (0.4177) loss 6.6165 (7.4933) grad_norm 1.8736 (2.3935) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:54:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][360/625] eta 0:01:50 lr 0.000848 wd 0.0500 time 0.4087 (0.4169) data time 0.0006 (0.0021) model time 0.4081 (0.4171) loss 8.5791 (7.5009) grad_norm 1.6444 (2.3945) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:54:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][370/625] eta 0:01:46 lr 0.000848 wd 0.0500 time 0.3940 (0.4164) data time 0.0008 (0.0020) model time 0.3933 (0.4165) loss 5.7898 (7.4937) grad_norm 1.6368 (2.3842) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:54:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][380/625] eta 0:01:41 lr 0.000848 wd 0.0500 time 0.3964 (0.4162) data time 0.0008 (0.0020) model time 0.3956 (0.4163) loss 7.2171 (7.4906) grad_norm 1.6725 
(2.3734) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:54:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][390/625] eta 0:01:37 lr 0.000848 wd 0.0500 time 0.4071 (0.4158) data time 0.0008 (0.0020) model time 0.4063 (0.4158) loss 7.6305 (7.4903) grad_norm 1.8918 (2.3636) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:54:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][400/625] eta 0:01:33 lr 0.000848 wd 0.0500 time 0.3966 (0.4155) data time 0.0006 (0.0020) model time 0.3960 (0.4155) loss 5.9882 (7.4928) grad_norm 2.0569 (2.3500) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:54:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][410/625] eta 0:01:29 lr 0.000848 wd 0.0500 time 0.3977 (0.4152) data time 0.0009 (0.0019) model time 0.3968 (0.4150) loss 6.7629 (7.4962) grad_norm 3.9967 (2.3503) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:54:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][420/625] eta 0:01:25 lr 0.000848 wd 0.0500 time 0.4079 (0.4148) data time 0.0006 (0.0019) model time 0.4073 (0.4146) loss 7.7669 (7.5048) grad_norm 1.6856 (2.3528) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:54:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][430/625] eta 0:01:20 lr 0.000847 wd 0.0500 time 0.3944 (0.4145) data time 0.0007 (0.0019) model time 0.3937 (0.4142) loss 6.6237 (7.5025) grad_norm 2.2801 (2.3584) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:54:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][440/625] eta 0:01:16 lr 0.000847 wd 0.0500 time 0.3984 (0.4142) data time 0.0006 (0.0019) model time 0.3978 (0.4139) loss 8.8269 (7.5038) grad_norm 1.5779 (2.3524) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:54:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[122/300][450/625] eta 0:01:12 lr 0.000847 wd 0.0500 time 0.4089 (0.4139) data time 0.0007 (0.0019) model time 0.4083 (0.4135) loss 7.7860 (7.4976) grad_norm 1.5284 (2.3425) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:54:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][460/625] eta 0:01:08 lr 0.000847 wd 0.0500 time 0.3965 (0.4137) data time 0.0010 (0.0018) model time 0.3955 (0.4132) loss 7.4648 (7.4951) grad_norm 2.5366 (2.3421) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:54:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][470/625] eta 0:01:04 lr 0.000847 wd 0.0500 time 0.4045 (0.4134) data time 0.0006 (0.0018) model time 0.4039 (0.4129) loss 7.0138 (7.5042) grad_norm 2.2676 (2.3625) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:55:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][480/625] eta 0:00:59 lr 0.000847 wd 0.0500 time 0.4266 (0.4132) data time 0.0006 (0.0018) model time 0.4260 (0.4127) loss 7.1076 (7.5023) grad_norm 3.2820 (2.3626) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:55:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][490/625] eta 0:00:55 lr 0.000847 wd 0.0500 time 0.3938 (0.4130) data time 0.0009 (0.0018) model time 0.3929 (0.4125) loss 6.5388 (7.5071) grad_norm 3.6092 (2.3665) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:55:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][500/625] eta 0:00:51 lr 0.000847 wd 0.0500 time 0.3998 (0.4128) data time 0.0007 (0.0018) model time 0.3991 (0.4122) loss 8.2017 (7.5036) grad_norm 1.7841 (2.3725) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:55:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][510/625] eta 0:00:47 lr 0.000847 wd 0.0500 time 0.6002 (0.4138) data time 0.0009 (0.0018) model time 0.5993 (0.4134) loss 8.4062 
(7.5038) grad_norm 2.0845 (2.3666) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:55:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][520/625] eta 0:00:43 lr 0.000847 wd 0.0500 time 0.3950 (0.4157) data time 0.0006 (0.0017) model time 0.3943 (0.4154) loss 6.8922 (7.5007) grad_norm 1.9542 (2.3631) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:55:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][530/625] eta 0:00:39 lr 0.000846 wd 0.0500 time 0.3969 (0.4169) data time 0.0007 (0.0017) model time 0.3962 (0.4168) loss 8.0336 (7.4983) grad_norm 1.4531 (2.3534) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 00:55:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][540/625] eta 0:00:35 lr 0.000846 wd 0.0500 time 0.4064 (0.4167) data time 0.0006 (0.0017) model time 0.4058 (0.4164) loss 6.6893 (7.4967) grad_norm 2.3120 (2.3456) loss_scale 2048.0000 (1039.1423) mem 14939MB [2024-07-25 00:55:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][550/625] eta 0:00:31 lr 0.000846 wd 0.0500 time 0.3947 (0.4164) data time 0.0008 (0.0017) model time 0.3939 (0.4161) loss 6.6804 (7.4864) grad_norm 2.0768 (2.3450) loss_scale 2048.0000 (1057.4519) mem 14939MB [2024-07-25 00:55:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][560/625] eta 0:00:27 lr 0.000846 wd 0.0500 time 0.4008 (0.4162) data time 0.0006 (0.0017) model time 0.4001 (0.4159) loss 7.4533 (7.4879) grad_norm 1.8553 (2.3516) loss_scale 2048.0000 (1075.1087) mem 14939MB [2024-07-25 00:55:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][570/625] eta 0:00:22 lr 0.000846 wd 0.0500 time 0.4032 (0.4159) data time 0.0009 (0.0017) model time 0.4023 (0.4156) loss 7.7414 (7.4839) grad_norm 2.3167 (2.3575) loss_scale 2048.0000 (1092.1471) mem 14939MB [2024-07-25 00:55:45 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [122/300][580/625] eta 0:00:18 lr 0.000846 wd 0.0500 time 0.3937 (0.4156) data time 0.0009 (0.0017) model time 0.3928 (0.4153) loss 7.7796 (7.4854) grad_norm 2.9393 (2.3540) loss_scale 2048.0000 (1108.5990) mem 14939MB [2024-07-25 00:55:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][590/625] eta 0:00:14 lr 0.000846 wd 0.0500 time 0.4003 (0.4154) data time 0.0009 (0.0016) model time 0.3994 (0.4150) loss 7.9915 (7.4865) grad_norm 2.8985 (2.3674) loss_scale 2048.0000 (1124.4941) mem 14939MB [2024-07-25 00:55:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][600/625] eta 0:00:10 lr 0.000846 wd 0.0500 time 0.3890 (0.4155) data time 0.0007 (0.0016) model time 0.3883 (0.4150) loss 6.0532 (7.4880) grad_norm 2.2789 (2.3713) loss_scale 2048.0000 (1139.8602) mem 14939MB [2024-07-25 00:55:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][610/625] eta 0:00:06 lr 0.000846 wd 0.0500 time 0.3927 (0.4152) data time 0.0006 (0.0016) model time 0.3921 (0.4148) loss 7.9895 (7.4867) grad_norm 1.7880 (2.3663) loss_scale 2048.0000 (1154.7234) mem 14939MB [2024-07-25 00:56:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][620/625] eta 0:00:02 lr 0.000846 wd 0.0500 time 0.3970 (0.4150) data time 0.0006 (0.0016) model time 0.3964 (0.4145) loss 8.2036 (7.4916) grad_norm 1.8258 (2.3659) loss_scale 2048.0000 (1169.1079) mem 14939MB [2024-07-25 00:56:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 122 training takes 0:04:19 [2024-07-25 00:56:03 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 00:56:04 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 00:56:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.451 (0.451) Loss 0.5942 (0.5942) Acc@1 89.014 (89.014) Acc@5 98.438 (98.438) Mem 14939MB [2024-07-25 00:56:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 1.0078 (0.7591) Acc@1 77.246 (84.282) Acc@5 94.971 (97.217) Mem 14939MB [2024-07-25 00:56:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0889 (0.8925) Acc@1 74.561 (80.829) Acc@5 93.604 (95.678) Mem 14939MB [2024-07-25 00:56:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.408 Acc@5 95.635 [2024-07-25 00:56:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.4% [2024-07-25 00:56:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.726 (0.726) Loss 0.5854 (0.5854) Acc@1 88.672 (88.672) Acc@5 98.389 (98.389) Mem 14939MB [2024-07-25 00:56:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.152) Loss 0.9541 (0.7309) Acc@1 79.443 (84.965) Acc@5 95.020 (97.337) Mem 14939MB [2024-07-25 00:56:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.120) Loss 1.0908 (0.8651) Acc@1 74.414 (81.466) Acc@5 93.994 (95.910) Mem 14939MB [2024-07-25 00:56:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.142 Acc@5 95.877 [2024-07-25 00:56:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.1% [2024-07-25 00:56:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.14% [2024-07-25 00:56:09 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 00:56:10 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 00:56:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][0/625] eta 0:07:59 lr 0.000846 wd 0.0500 time 0.7674 (0.7674) data time 0.3776 (0.3776) model time 0.0000 (0.0000) loss 6.9237 (6.9237) grad_norm 2.6973 (2.6973) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:56:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][10/625] eta 0:04:27 lr 0.000845 wd 0.0500 time 0.3951 (0.4350) data time 0.0009 (0.0352) model time 0.0000 (0.0000) loss 6.8531 (7.0241) grad_norm 1.6394 (2.0107) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:56:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][20/625] eta 0:04:13 lr 0.000845 wd 0.0500 time 0.4055 (0.4192) data time 0.0008 (0.0189) model time 0.0000 (0.0000) loss 6.9126 (7.0500) grad_norm 1.8958 (2.0923) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:56:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][30/625] eta 0:04:06 lr 0.000845 wd 0.0500 time 0.4038 (0.4136) data time 0.0008 (0.0131) model time 0.0000 (0.0000) loss 8.4632 (7.0799) grad_norm 2.4464 (2.1834) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:56:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][40/625] eta 0:04:00 lr 0.000845 wd 0.0500 time 0.3969 (0.4109) data time 0.0007 (0.0103) model time 0.0000 (0.0000) loss 8.4757 (7.1324) grad_norm 2.1626 (2.2454) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:56:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][50/625] eta 0:03:55 lr 0.000845 wd 0.0500 time 0.3986 (0.4092) data time 0.0008 (0.0084) model time 0.0000 (0.0000) loss 8.1391 (7.2416) grad_norm 4.4773 (2.5534) loss_scale 2048.0000 
(2048.0000) mem 14939MB [2024-07-25 00:56:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][60/625] eta 0:03:50 lr 0.000845 wd 0.0500 time 0.4073 (0.4079) data time 0.0008 (0.0072) model time 0.4065 (0.4006) loss 7.6894 (7.2656) grad_norm 3.6250 (2.5637) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:56:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][70/625] eta 0:03:45 lr 0.000845 wd 0.0500 time 0.3951 (0.4067) data time 0.0008 (0.0063) model time 0.3942 (0.3997) loss 8.0042 (7.2822) grad_norm 2.3346 (2.4994) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:56:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][80/625] eta 0:03:41 lr 0.000845 wd 0.0500 time 0.3993 (0.4060) data time 0.0006 (0.0057) model time 0.3987 (0.3997) loss 6.7955 (7.2804) grad_norm 2.1841 (2.4327) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:56:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][90/625] eta 0:03:37 lr 0.000845 wd 0.0500 time 0.4034 (0.4057) data time 0.0006 (0.0051) model time 0.4028 (0.4004) loss 8.4021 (7.3331) grad_norm 2.8404 (2.4267) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:56:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][100/625] eta 0:03:33 lr 0.000845 wd 0.0500 time 0.4009 (0.4074) data time 0.0007 (0.0047) model time 0.4003 (0.4046) loss 8.1051 (7.3686) grad_norm 4.1765 (2.4934) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:56:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][110/625] eta 0:03:32 lr 0.000844 wd 0.0500 time 0.5638 (0.4126) data time 0.0006 (0.0044) model time 0.5632 (0.4146) loss 8.5194 (7.3361) grad_norm 1.8123 (2.4864) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:57:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][120/625] eta 0:03:33 lr 
0.000844 wd 0.0500 time 0.5626 (0.4221) data time 0.0007 (0.0041) model time 0.5619 (0.4307) loss 7.3859 (7.3456) grad_norm 2.5265 (2.4570) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:57:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][130/625] eta 0:03:30 lr 0.000844 wd 0.0500 time 0.5458 (0.4254) data time 0.0008 (0.0039) model time 0.5449 (0.4348) loss 8.0288 (7.3430) grad_norm 1.6988 (2.4363) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:57:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][140/625] eta 0:03:25 lr 0.000844 wd 0.0500 time 0.3952 (0.4234) data time 0.0007 (0.0037) model time 0.3945 (0.4306) loss 7.5218 (7.3902) grad_norm 1.5398 (2.4020) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:57:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][150/625] eta 0:03:20 lr 0.000844 wd 0.0500 time 0.3970 (0.4220) data time 0.0007 (0.0035) model time 0.3963 (0.4276) loss 6.0944 (7.3955) grad_norm 1.9554 (2.4063) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:57:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][160/625] eta 0:03:15 lr 0.000844 wd 0.0500 time 0.4065 (0.4208) data time 0.0008 (0.0033) model time 0.4057 (0.4252) loss 6.6512 (7.3722) grad_norm 1.9759 (2.4337) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:57:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][170/625] eta 0:03:10 lr 0.000844 wd 0.0500 time 0.3972 (0.4197) data time 0.0010 (0.0032) model time 0.3962 (0.4232) loss 7.1695 (7.3814) grad_norm 3.5126 (2.4495) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:57:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][180/625] eta 0:03:06 lr 0.000844 wd 0.0500 time 0.3981 (0.4187) data time 0.0008 (0.0031) model time 0.3972 (0.4214) loss 7.8361 (7.3955) grad_norm 2.0889 (2.4749) 
loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:57:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][190/625] eta 0:03:01 lr 0.000844 wd 0.0500 time 0.4087 (0.4178) data time 0.0008 (0.0030) model time 0.4079 (0.4200) loss 6.9840 (7.3978) grad_norm 3.3486 (2.4808) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:57:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][200/625] eta 0:02:57 lr 0.000844 wd 0.0500 time 0.4035 (0.4169) data time 0.0008 (0.0029) model time 0.4027 (0.4186) loss 8.8812 (7.4192) grad_norm 1.7705 (2.4561) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:57:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][210/625] eta 0:02:52 lr 0.000844 wd 0.0500 time 0.3989 (0.4162) data time 0.0006 (0.0028) model time 0.3982 (0.4174) loss 6.6587 (7.3911) grad_norm 3.2978 (2.4563) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:57:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][220/625] eta 0:02:48 lr 0.000843 wd 0.0500 time 0.4098 (0.4155) data time 0.0009 (0.0027) model time 0.4090 (0.4164) loss 7.6006 (7.3839) grad_norm 2.5212 (2.4643) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:57:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][230/625] eta 0:02:43 lr 0.000843 wd 0.0500 time 0.3963 (0.4148) data time 0.0007 (0.0026) model time 0.3956 (0.4155) loss 7.6778 (7.3829) grad_norm 2.7752 (2.4654) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:57:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][240/625] eta 0:02:39 lr 0.000843 wd 0.0500 time 0.3989 (0.4142) data time 0.0006 (0.0025) model time 0.3983 (0.4146) loss 7.2695 (7.3798) grad_norm 1.9871 (2.4445) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:57:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[123/300][250/625] eta 0:02:35 lr 0.000843 wd 0.0500 time 0.4112 (0.4137) data time 0.0008 (0.0025) model time 0.4104 (0.4139) loss 6.7679 (7.3998) grad_norm 2.9158 (2.4282) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:57:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][260/625] eta 0:02:30 lr 0.000843 wd 0.0500 time 0.3949 (0.4132) data time 0.0008 (0.0024) model time 0.3941 (0.4132) loss 7.5096 (7.4039) grad_norm 2.0597 (2.4159) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:58:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][270/625] eta 0:02:26 lr 0.000843 wd 0.0500 time 0.4018 (0.4128) data time 0.0006 (0.0024) model time 0.4012 (0.4126) loss 8.5265 (7.4085) grad_norm 2.1817 (2.3956) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:58:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][280/625] eta 0:02:22 lr 0.000843 wd 0.0500 time 0.4040 (0.4124) data time 0.0009 (0.0023) model time 0.4031 (0.4121) loss 8.4497 (7.4182) grad_norm 1.7560 (2.3754) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:58:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][290/625] eta 0:02:18 lr 0.000843 wd 0.0500 time 0.4001 (0.4120) data time 0.0009 (0.0023) model time 0.3992 (0.4116) loss 7.8413 (7.4147) grad_norm 1.6090 (2.3572) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:58:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][300/625] eta 0:02:13 lr 0.000843 wd 0.0500 time 0.3978 (0.4116) data time 0.0008 (0.0022) model time 0.3969 (0.4112) loss 6.9922 (7.4063) grad_norm 3.6743 (2.3484) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:58:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][310/625] eta 0:02:09 lr 0.000843 wd 0.0500 time 0.4054 (0.4113) data time 0.0010 (0.0022) model time 0.4045 (0.4108) loss 7.0826 
(7.3991) grad_norm 2.5891 (2.3432) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:58:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][320/625] eta 0:02:05 lr 0.000842 wd 0.0500 time 0.3987 (0.4114) data time 0.0009 (0.0021) model time 0.3978 (0.4108) loss 8.3913 (7.4093) grad_norm 2.1214 (2.3488) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:58:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][330/625] eta 0:02:01 lr 0.000842 wd 0.0500 time 0.6047 (0.4130) data time 0.0009 (0.0021) model time 0.6038 (0.4127) loss 6.9299 (7.4098) grad_norm 1.5593 (2.3590) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:58:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][340/625] eta 0:01:58 lr 0.000842 wd 0.0500 time 0.5883 (0.4160) data time 0.0007 (0.0021) model time 0.5876 (0.4162) loss 7.8743 (7.4113) grad_norm 1.6983 (2.3550) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:58:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][350/625] eta 0:01:54 lr 0.000842 wd 0.0500 time 0.4041 (0.4165) data time 0.0008 (0.0020) model time 0.4032 (0.4168) loss 8.2075 (7.4214) grad_norm 4.6124 (2.3699) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:58:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][360/625] eta 0:01:50 lr 0.000842 wd 0.0500 time 0.3967 (0.4164) data time 0.0009 (0.0020) model time 0.3957 (0.4166) loss 8.0746 (7.4204) grad_norm 2.5447 (2.3806) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:58:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][370/625] eta 0:01:46 lr 0.000842 wd 0.0500 time 0.4016 (0.4160) data time 0.0008 (0.0020) model time 0.4008 (0.4162) loss 9.1831 (7.4177) grad_norm 2.0844 (2.3708) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:58:48 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [123/300][380/625] eta 0:01:41 lr 0.000842 wd 0.0500 time 0.4054 (0.4157) data time 0.0006 (0.0020) model time 0.4048 (0.4158) loss 6.4475 (7.4080) grad_norm 1.8125 (2.3605) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:58:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][390/625] eta 0:01:37 lr 0.000842 wd 0.0500 time 0.4039 (0.4154) data time 0.0010 (0.0019) model time 0.4029 (0.4153) loss 6.8589 (7.4047) grad_norm 3.8010 (2.3893) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:58:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][400/625] eta 0:01:33 lr 0.000842 wd 0.0500 time 0.4002 (0.4151) data time 0.0007 (0.0019) model time 0.3995 (0.4150) loss 7.0909 (7.4049) grad_norm 3.8164 (2.4083) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:59:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][410/625] eta 0:01:29 lr 0.000842 wd 0.0500 time 0.4067 (0.4148) data time 0.0009 (0.0019) model time 0.4058 (0.4146) loss 7.5254 (7.4031) grad_norm 3.3840 (2.4237) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:59:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][420/625] eta 0:01:24 lr 0.000841 wd 0.0500 time 0.3944 (0.4145) data time 0.0007 (0.0019) model time 0.3938 (0.4143) loss 6.3979 (7.4083) grad_norm 2.9491 (2.4273) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:59:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][430/625] eta 0:01:20 lr 0.000841 wd 0.0500 time 0.4097 (0.4143) data time 0.0007 (0.0018) model time 0.4091 (0.4140) loss 7.0446 (7.4050) grad_norm 1.6877 (2.4137) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:59:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][440/625] eta 0:01:16 lr 0.000841 wd 0.0500 time 0.4108 (0.4140) data time 0.0008 (0.0018) model 
time 0.4100 (0.4137) loss 8.0350 (7.4124) grad_norm 2.8629 (2.4030) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:59:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][450/625] eta 0:01:12 lr 0.000841 wd 0.0500 time 0.3956 (0.4138) data time 0.0007 (0.0018) model time 0.3950 (0.4134) loss 6.3245 (7.4078) grad_norm 2.1178 (2.4066) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:59:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][460/625] eta 0:01:08 lr 0.000841 wd 0.0500 time 0.4034 (0.4135) data time 0.0009 (0.0018) model time 0.4025 (0.4131) loss 7.4906 (7.4109) grad_norm 3.5358 (2.4127) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:59:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][470/625] eta 0:01:04 lr 0.000841 wd 0.0500 time 0.4093 (0.4133) data time 0.0007 (0.0018) model time 0.4086 (0.4129) loss 8.6695 (7.4013) grad_norm 2.4445 (2.4155) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:59:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][480/625] eta 0:00:59 lr 0.000841 wd 0.0500 time 0.3961 (0.4132) data time 0.0009 (0.0017) model time 0.3952 (0.4127) loss 7.6089 (7.4000) grad_norm 2.4836 (2.4242) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:59:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][490/625] eta 0:00:55 lr 0.000841 wd 0.0500 time 0.4036 (0.4130) data time 0.0008 (0.0017) model time 0.4028 (0.4125) loss 8.5956 (7.4043) grad_norm 3.2022 (2.4318) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:59:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][500/625] eta 0:00:51 lr 0.000841 wd 0.0500 time 0.4113 (0.4128) data time 0.0008 (0.0017) model time 0.4105 (0.4123) loss 6.3110 (7.4012) grad_norm 2.3584 (2.4298) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:59:41 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][510/625] eta 0:00:47 lr 0.000841 wd 0.0500 time 0.3974 (0.4126) data time 0.0006 (0.0017) model time 0.3968 (0.4120) loss 7.6956 (7.4012) grad_norm 1.5911 (2.4285) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:59:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][520/625] eta 0:00:43 lr 0.000840 wd 0.0500 time 0.4037 (0.4124) data time 0.0008 (0.0017) model time 0.4029 (0.4118) loss 6.0786 (7.3960) grad_norm 2.4389 (2.4285) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:59:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][530/625] eta 0:00:39 lr 0.000840 wd 0.0500 time 0.4187 (0.4123) data time 0.0008 (0.0017) model time 0.4179 (0.4117) loss 7.7721 (7.4043) grad_norm 2.1460 (2.4226) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:59:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][540/625] eta 0:00:35 lr 0.000840 wd 0.0500 time 0.3967 (0.4123) data time 0.0008 (0.0017) model time 0.3959 (0.4117) loss 7.5948 (7.4008) grad_norm 2.1355 (2.4159) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 00:59:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][550/625] eta 0:00:30 lr 0.000840 wd 0.0500 time 0.6198 (0.4132) data time 0.0008 (0.0016) model time 0.6190 (0.4126) loss 8.1220 (7.4024) grad_norm 2.2689 (2.4092) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:00:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][560/625] eta 0:00:26 lr 0.000840 wd 0.0500 time 0.6138 (0.4150) data time 0.0007 (0.0016) model time 0.6131 (0.4146) loss 6.7074 (7.4104) grad_norm 2.3017 (2.4021) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:00:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][570/625] eta 0:00:22 lr 0.000840 wd 0.0500 time 0.3990 (0.4153) 
data time 0.0010 (0.0016) model time 0.3980 (0.4150) loss 7.6789 (7.4122) grad_norm 2.0052 (2.3960) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:00:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][580/625] eta 0:00:18 lr 0.000840 wd 0.0500 time 0.3975 (0.4154) data time 0.0007 (0.0016) model time 0.3968 (0.4150) loss 7.3620 (7.4090) grad_norm 1.9487 (2.3921) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:00:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][590/625] eta 0:00:14 lr 0.000840 wd 0.0500 time 0.3985 (0.4151) data time 0.0008 (0.0016) model time 0.3977 (0.4147) loss 7.5432 (7.4076) grad_norm 1.7833 (2.3941) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:00:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][600/625] eta 0:00:10 lr 0.000840 wd 0.0500 time 0.4100 (0.4149) data time 0.0009 (0.0016) model time 0.4091 (0.4145) loss 8.0905 (7.4149) grad_norm 1.8884 (2.3899) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:00:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][610/625] eta 0:00:06 lr 0.000840 wd 0.0500 time 0.3957 (0.4146) data time 0.0006 (0.0016) model time 0.3951 (0.4142) loss 7.3709 (7.4094) grad_norm 2.7458 (2.3826) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:00:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][620/625] eta 0:00:02 lr 0.000840 wd 0.0500 time 0.3978 (0.4144) data time 0.0004 (0.0016) model time 0.3974 (0.4139) loss 6.1936 (7.4043) grad_norm 3.3757 (2.3775) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:00:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 123 training takes 0:04:18 [2024-07-25 01:00:29 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 01:00:30 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 01:00:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.468 (0.468) Loss 0.5967 (0.5967) Acc@1 88.281 (88.281) Acc@5 98.438 (98.438) Mem 14939MB [2024-07-25 01:00:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.9922 (0.7519) Acc@1 77.637 (84.295) Acc@5 95.361 (97.110) Mem 14939MB [2024-07-25 01:00:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 1.1143 (0.8893) Acc@1 73.730 (80.813) Acc@5 93.457 (95.622) Mem 14939MB [2024-07-25 01:00:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.454 Acc@5 95.595 [2024-07-25 01:00:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.5% [2024-07-25 01:00:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.45% [2024-07-25 01:00:32 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 01:00:33 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 01:00:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.451 (0.451) Loss 0.5845 (0.5845) Acc@1 88.770 (88.770) Acc@5 98.389 (98.389) Mem 14939MB
[2024-07-25 01:00:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.9531 (0.7298) Acc@1 79.443 (85.023) Acc@5 95.068 (97.332) Mem 14939MB
[2024-07-25 01:00:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.0889 (0.8638) Acc@1 74.561 (81.508) Acc@5 94.092 (95.912) Mem 14939MB
[2024-07-25 01:00:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.190 Acc@5 95.879
[2024-07-25 01:00:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.2%
[2024-07-25 01:00:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.19%
[2024-07-25 01:00:36 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 01:00:37 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!!
[2024-07-25 01:00:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][0/625] eta 0:07:29 lr 0.000839 wd 0.0500 time 0.7195 (0.7195) data time 0.3225 (0.3225) model time 0.0000 (0.0000) loss 7.4717 (7.4717) grad_norm 2.5707 (2.5707) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:00:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][10/625] eta 0:04:25 lr 0.000839 wd 0.0500 time 0.3953 (0.4315) data time 0.0009 (0.0303) model time 0.0000 (0.0000) loss 7.9627 (7.4534) grad_norm 1.5706 (2.3310) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:00:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][20/625] eta 0:04:12 lr 0.000839 wd 0.0500 time 0.4002 (0.4177) data time 0.0006 (0.0163) model time 0.0000 (0.0000) loss 8.1205 (7.6709) grad_norm 2.4112 (2.4891) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:00:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][30/625] eta 0:04:05 lr 0.000839 wd 0.0500 time 0.4071 (0.4129) data time 0.0006 (0.0113) model time 0.0000 (0.0000) loss 8.4508 (7.6465) grad_norm 1.8794 (2.5771) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:00:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][40/625] eta 0:04:00 lr 0.000839 wd 0.0500 time 0.3938 (0.4104) data time 0.0009 (0.0088) model time 0.0000 (0.0000) loss 6.0983 (7.5220) grad_norm 1.7321 (2.5605) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:00:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][50/625] eta 0:03:55 lr 0.000839 wd 0.0500 time 0.3978 (0.4088) data time 0.0007 (0.0073) model time 0.0000 (0.0000) loss 7.7934 (7.4683) grad_norm 1.5834 (2.4465) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:01:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][60/625] eta 0:03:50 lr 0.000839 wd 0.0500 time 
0.4064 (0.4077) data time 0.0007 (0.0062) model time 0.4057 (0.4011) loss 7.9154 (7.4577) grad_norm 1.9768 (2.3739) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:01:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][70/625] eta 0:03:45 lr 0.000839 wd 0.0500 time 0.3989 (0.4068) data time 0.0006 (0.0055) model time 0.3983 (0.4008) loss 7.5563 (7.4198) grad_norm 4.3523 (2.3874) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:01:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][80/625] eta 0:03:42 lr 0.000839 wd 0.0500 time 0.4067 (0.4088) data time 0.0006 (0.0050) model time 0.4060 (0.4078) loss 6.0345 (7.4121) grad_norm 3.0892 (2.5136) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:01:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][90/625] eta 0:03:38 lr 0.000839 wd 0.0500 time 0.3978 (0.4081) data time 0.0007 (0.0045) model time 0.3971 (0.4063) loss 6.1695 (7.3982) grad_norm 4.3050 (2.5831) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:01:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][100/625] eta 0:03:34 lr 0.000838 wd 0.0500 time 0.4001 (0.4076) data time 0.0007 (0.0042) model time 0.3995 (0.4054) loss 6.8555 (7.3896) grad_norm 4.1636 (2.5784) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:01:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][110/625] eta 0:03:29 lr 0.000838 wd 0.0500 time 0.4079 (0.4072) data time 0.0006 (0.0039) model time 0.4073 (0.4048) loss 5.7684 (7.3369) grad_norm 1.8118 (2.5902) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:01:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][120/625] eta 0:03:25 lr 0.000838 wd 0.0500 time 0.3972 (0.4069) data time 0.0007 (0.0037) model time 0.3966 (0.4044) loss 6.8675 (7.3506) grad_norm 3.3291 (2.5762) loss_scale 2048.0000 (2048.0000) 
mem 14939MB [2024-07-25 01:01:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][130/625] eta 0:03:21 lr 0.000838 wd 0.0500 time 0.3986 (0.4065) data time 0.0007 (0.0035) model time 0.3979 (0.4040) loss 7.5738 (7.3092) grad_norm 1.6977 (2.5524) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:01:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][140/625] eta 0:03:18 lr 0.000838 wd 0.0500 time 0.5960 (0.4090) data time 0.0009 (0.0033) model time 0.5951 (0.4081) loss 7.8544 (7.3290) grad_norm 2.0583 (2.5194) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:01:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][150/625] eta 0:03:17 lr 0.000838 wd 0.0500 time 0.6142 (0.4156) data time 0.0006 (0.0031) model time 0.6136 (0.4180) loss 8.0335 (7.3252) grad_norm 1.7113 (2.5034) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:01:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][160/625] eta 0:03:15 lr 0.000838 wd 0.0500 time 0.4021 (0.4195) data time 0.0007 (0.0030) model time 0.4014 (0.4234) loss 7.9220 (7.3345) grad_norm 2.0413 (2.4739) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:01:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][170/625] eta 0:03:11 lr 0.000838 wd 0.0500 time 0.3960 (0.4202) data time 0.0007 (0.0029) model time 0.3953 (0.4240) loss 5.7309 (7.3481) grad_norm 1.4796 (2.4440) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:01:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][180/625] eta 0:03:06 lr 0.000838 wd 0.0500 time 0.3963 (0.4191) data time 0.0009 (0.0028) model time 0.3953 (0.4221) loss 7.2517 (7.3539) grad_norm 1.8005 (2.4446) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:01:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][190/625] eta 0:03:01 lr 0.000838 
wd 0.0500 time 0.4021 (0.4182) data time 0.0006 (0.0027) model time 0.4015 (0.4206) loss 6.9821 (7.3548) grad_norm 2.7722 (2.4225) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:02:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][200/625] eta 0:02:57 lr 0.000837 wd 0.0500 time 0.4017 (0.4173) data time 0.0007 (0.0026) model time 0.4010 (0.4192) loss 5.9361 (7.3459) grad_norm 2.3453 (2.4193) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:02:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][210/625] eta 0:02:52 lr 0.000837 wd 0.0500 time 0.3985 (0.4166) data time 0.0009 (0.0025) model time 0.3976 (0.4180) loss 8.3514 (7.3501) grad_norm 2.4650 (2.4400) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:02:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][220/625] eta 0:02:48 lr 0.000837 wd 0.0500 time 0.4039 (0.4159) data time 0.0008 (0.0024) model time 0.4031 (0.4170) loss 8.6624 (7.3551) grad_norm 2.0222 (2.4288) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:02:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][230/625] eta 0:02:44 lr 0.000837 wd 0.0500 time 0.4054 (0.4153) data time 0.0005 (0.0024) model time 0.4049 (0.4161) loss 8.0284 (7.3585) grad_norm 1.9330 (2.4137) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:02:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][240/625] eta 0:02:39 lr 0.000837 wd 0.0500 time 0.3973 (0.4148) data time 0.0009 (0.0023) model time 0.3964 (0.4154) loss 7.6890 (7.3632) grad_norm 1.5380 (2.3894) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:02:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][250/625] eta 0:02:35 lr 0.000837 wd 0.0500 time 0.4111 (0.4143) data time 0.0007 (0.0023) model time 0.4104 (0.4147) loss 8.5598 (7.3753) grad_norm 2.5583 (2.3763) loss_scale 
2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:02:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][260/625] eta 0:02:30 lr 0.000837 wd 0.0500 time 0.4032 (0.4137) data time 0.0006 (0.0022) model time 0.4026 (0.4139) loss 6.7051 (7.3740) grad_norm 2.4011 (2.3774) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:02:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][270/625] eta 0:02:26 lr 0.000837 wd 0.0500 time 0.3947 (0.4132) data time 0.0010 (0.0022) model time 0.3937 (0.4132) loss 7.3214 (7.3953) grad_norm 2.6482 (2.3912) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:02:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][280/625] eta 0:02:22 lr 0.000837 wd 0.0500 time 0.3995 (0.4127) data time 0.0009 (0.0021) model time 0.3986 (0.4126) loss 7.9882 (7.3974) grad_norm 1.9814 (2.3837) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:02:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][290/625] eta 0:02:18 lr 0.000837 wd 0.0500 time 0.4136 (0.4124) data time 0.0007 (0.0021) model time 0.4130 (0.4122) loss 6.5753 (7.4032) grad_norm 1.8004 (2.3624) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:02:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][300/625] eta 0:02:14 lr 0.000837 wd 0.0500 time 0.3964 (0.4127) data time 0.0009 (0.0021) model time 0.3955 (0.4125) loss 7.3568 (7.3980) grad_norm 4.6110 (2.3741) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:02:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][310/625] eta 0:02:09 lr 0.000836 wd 0.0500 time 0.4014 (0.4123) data time 0.0008 (0.0020) model time 0.4006 (0.4120) loss 6.2342 (7.4120) grad_norm 1.9042 (2.3911) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:02:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][320/625] 
eta 0:02:05 lr 0.000836 wd 0.0500 time 0.4074 (0.4120) data time 0.0008 (0.0020) model time 0.4066 (0.4116) loss 7.8072 (7.4233) grad_norm 2.6825 (2.3946) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:02:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][330/625] eta 0:02:01 lr 0.000836 wd 0.0500 time 0.3945 (0.4117) data time 0.0009 (0.0020) model time 0.3936 (0.4112) loss 7.7552 (7.4330) grad_norm 1.5904 (2.3889) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:02:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][340/625] eta 0:01:57 lr 0.000836 wd 0.0500 time 0.4093 (0.4114) data time 0.0009 (0.0019) model time 0.4084 (0.4109) loss 6.8771 (7.4247) grad_norm 1.7805 (2.3794) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:03:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][350/625] eta 0:01:53 lr 0.000836 wd 0.0500 time 0.4011 (0.4111) data time 0.0010 (0.0019) model time 0.4000 (0.4105) loss 6.6588 (7.4268) grad_norm 1.6520 (2.3761) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:03:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][360/625] eta 0:01:48 lr 0.000836 wd 0.0500 time 0.3966 (0.4113) data time 0.0006 (0.0019) model time 0.3960 (0.4107) loss 6.9773 (7.4201) grad_norm 3.3439 (2.3798) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:03:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][370/625] eta 0:01:45 lr 0.000836 wd 0.0500 time 0.5897 (0.4136) data time 0.0008 (0.0019) model time 0.5889 (0.4133) loss 7.1141 (7.4250) grad_norm 2.6205 (2.4003) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:03:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][380/625] eta 0:01:41 lr 0.000836 wd 0.0500 time 0.5185 (0.4151) data time 0.0007 (0.0018) model time 0.5178 (0.4151) loss 7.6093 (7.4234) grad_norm 2.6615 
(2.4097) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:03:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][390/625] eta 0:01:37 lr 0.000836 wd 0.0500 time 0.4017 (0.4152) data time 0.0010 (0.0018) model time 0.4006 (0.4152) loss 8.1310 (7.4250) grad_norm 3.2260 (2.4177) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:03:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][400/625] eta 0:01:33 lr 0.000836 wd 0.0500 time 0.4097 (0.4148) data time 0.0007 (0.0018) model time 0.4090 (0.4147) loss 6.7437 (7.4241) grad_norm 2.0860 (2.4208) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:03:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][410/625] eta 0:01:29 lr 0.000835 wd 0.0500 time 0.3931 (0.4145) data time 0.0008 (0.0018) model time 0.3924 (0.4143) loss 6.1255 (7.4269) grad_norm 2.5413 (2.4252) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:03:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][420/625] eta 0:01:24 lr 0.000835 wd 0.0500 time 0.3971 (0.4141) data time 0.0008 (0.0017) model time 0.3963 (0.4139) loss 8.4526 (7.4294) grad_norm 1.9431 (2.4196) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:03:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][430/625] eta 0:01:20 lr 0.000835 wd 0.0500 time 0.4091 (0.4139) data time 0.0009 (0.0017) model time 0.4082 (0.4136) loss 7.1186 (7.4366) grad_norm 2.1754 (2.4203) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:03:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][440/625] eta 0:01:16 lr 0.000835 wd 0.0500 time 0.3956 (0.4135) data time 0.0009 (0.0017) model time 0.3947 (0.4132) loss 6.8845 (7.4262) grad_norm 1.9246 (2.4169) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:03:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[124/300][450/625] eta 0:01:12 lr 0.000835 wd 0.0500 time 0.3984 (0.4133) data time 0.0010 (0.0017) model time 0.3973 (0.4129) loss 7.3676 (7.4263) grad_norm 1.6880 (2.4074) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:03:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][460/625] eta 0:01:08 lr 0.000835 wd 0.0500 time 0.4100 (0.4130) data time 0.0007 (0.0017) model time 0.4093 (0.4126) loss 8.4424 (7.4264) grad_norm 2.5949 (2.4044) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:03:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][470/625] eta 0:01:03 lr 0.000835 wd 0.0500 time 0.3957 (0.4128) data time 0.0008 (0.0017) model time 0.3949 (0.4123) loss 7.5712 (7.4219) grad_norm 2.9768 (2.4029) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:03:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][480/625] eta 0:00:59 lr 0.000835 wd 0.0500 time 0.3996 (0.4125) data time 0.0006 (0.0017) model time 0.3990 (0.4120) loss 7.5199 (7.4200) grad_norm 1.3632 (2.3967) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:03:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][490/625] eta 0:00:55 lr 0.000835 wd 0.0500 time 0.4080 (0.4124) data time 0.0007 (0.0016) model time 0.4073 (0.4118) loss 6.0572 (7.4196) grad_norm 2.0081 (2.3945) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:04:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][500/625] eta 0:00:51 lr 0.000835 wd 0.0500 time 0.3960 (0.4122) data time 0.0006 (0.0016) model time 0.3954 (0.4116) loss 6.2427 (7.4120) grad_norm 6.4342 (2.3952) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:04:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][510/625] eta 0:00:47 lr 0.000834 wd 0.0500 time 0.3991 (0.4120) data time 0.0008 (0.0016) model time 0.3982 (0.4114) loss 7.4034 
(7.4074) grad_norm 5.4755 (2.4077) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:04:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][520/625] eta 0:00:43 lr 0.000834 wd 0.0500 time 0.4075 (0.4122) data time 0.0009 (0.0016) model time 0.4066 (0.4116) loss 8.1525 (7.4081) grad_norm 2.1292 (inf) loss_scale 1024.0000 (2036.2073) mem 14939MB [2024-07-25 01:04:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][530/625] eta 0:00:39 lr 0.000834 wd 0.0500 time 0.3946 (0.4120) data time 0.0008 (0.0016) model time 0.3938 (0.4114) loss 8.4044 (7.4043) grad_norm 1.7746 (inf) loss_scale 1024.0000 (2017.1450) mem 14939MB [2024-07-25 01:04:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][540/625] eta 0:00:35 lr 0.000834 wd 0.0500 time 0.4031 (0.4119) data time 0.0008 (0.0016) model time 0.4023 (0.4112) loss 8.7352 (7.4100) grad_norm 2.0706 (inf) loss_scale 1024.0000 (1998.7874) mem 14939MB [2024-07-25 01:04:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][550/625] eta 0:00:30 lr 0.000834 wd 0.0500 time 0.4197 (0.4117) data time 0.0009 (0.0016) model time 0.4188 (0.4110) loss 6.2138 (7.4108) grad_norm 1.6170 (inf) loss_scale 1024.0000 (1981.0962) mem 14939MB [2024-07-25 01:04:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][560/625] eta 0:00:26 lr 0.000834 wd 0.0500 time 0.3963 (0.4115) data time 0.0007 (0.0016) model time 0.3956 (0.4108) loss 5.9391 (7.4057) grad_norm 1.9236 (inf) loss_scale 1024.0000 (1964.0357) mem 14939MB [2024-07-25 01:04:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][570/625] eta 0:00:22 lr 0.000834 wd 0.0500 time 0.3973 (0.4114) data time 0.0009 (0.0016) model time 0.3964 (0.4106) loss 6.7510 (7.4066) grad_norm 2.3622 (inf) loss_scale 1024.0000 (1947.5727) mem 14939MB [2024-07-25 01:04:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO 
Train: [124/300][580/625] eta 0:00:18 lr 0.000834 wd 0.0500 time 0.4077 (0.4114) data time 0.0006 (0.0015) model time 0.4071 (0.4107) loss 6.3682 (7.4168) grad_norm 3.1384 (inf) loss_scale 1024.0000 (1931.6764) mem 14939MB [2024-07-25 01:04:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][590/625] eta 0:00:14 lr 0.000834 wd 0.0500 time 0.4021 (0.4130) data time 0.0007 (0.0015) model time 0.4014 (0.4125) loss 7.9080 (7.4183) grad_norm 4.3312 (inf) loss_scale 1024.0000 (1916.3181) mem 14939MB [2024-07-25 01:04:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][600/625] eta 0:00:10 lr 0.000834 wd 0.0500 time 0.5725 (0.4143) data time 0.0007 (0.0015) model time 0.5719 (0.4138) loss 6.8998 (7.4216) grad_norm 3.7387 (inf) loss_scale 1024.0000 (1901.4709) mem 14939MB [2024-07-25 01:04:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][610/625] eta 0:00:06 lr 0.000833 wd 0.0500 time 0.4149 (0.4144) data time 0.0006 (0.0015) model time 0.4143 (0.4139) loss 8.0509 (7.4193) grad_norm 1.6326 (inf) loss_scale 1024.0000 (1887.1097) mem 14939MB [2024-07-25 01:04:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][620/625] eta 0:00:02 lr 0.000833 wd 0.0500 time 0.3926 (0.4141) data time 0.0006 (0.0015) model time 0.3920 (0.4136) loss 8.0365 (7.4170) grad_norm 1.6517 (inf) loss_scale 1024.0000 (1873.2110) mem 14939MB [2024-07-25 01:04:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 124 training takes 0:04:18 [2024-07-25 01:04:56 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 01:04:57 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 01:04:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.457 (0.457) Loss 0.6343 (0.6343) Acc@1 87.598 (87.598) Acc@5 98.193 (98.193) Mem 14939MB
[2024-07-25 01:04:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 1.0264 (0.7797) Acc@1 77.441 (84.029) Acc@5 94.775 (97.026) Mem 14939MB
[2024-07-25 01:04:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 1.1289 (0.9168) Acc@1 74.023 (80.452) Acc@5 93.213 (95.466) Mem 14939MB
[2024-07-25 01:04:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.126 Acc@5 95.429
[2024-07-25 01:04:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.1%
[2024-07-25 01:05:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.746 (0.746) Loss 0.5840 (0.5840) Acc@1 88.818 (88.818) Acc@5 98.340 (98.340) Mem 14939MB
[2024-07-25 01:05:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.152) Loss 0.9507 (0.7288) Acc@1 79.346 (85.041) Acc@5 95.117 (97.332) Mem 14939MB
[2024-07-25 01:05:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.120) Loss 1.0869 (0.8626) Acc@1 74.658 (81.527) Acc@5 94.043 (95.936) Mem 14939MB
[2024-07-25 01:05:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.202 Acc@5 95.899
[2024-07-25 01:05:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.2%
[2024-07-25 01:05:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.20%
[2024-07-25 01:05:02 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 01:05:03 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 01:05:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][0/625] eta 0:07:57 lr 0.000833 wd 0.0500 time 0.7636 (0.7636) data time 0.3795 (0.3795) model time 0.0000 (0.0000) loss 8.5739 (8.5739) grad_norm 4.9261 (4.9261) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:05:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][10/625] eta 0:04:26 lr 0.000833 wd 0.0500 time 0.4000 (0.4338) data time 0.0009 (0.0356) model time 0.0000 (0.0000) loss 8.3478 (7.7265) grad_norm 1.7715 (2.8749) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:05:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][20/625] eta 0:04:12 lr 0.000833 wd 0.0500 time 0.3947 (0.4174) data time 0.0008 (0.0191) model time 0.0000 (0.0000) loss 7.1281 (7.6902) grad_norm 2.3831 (2.6018) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:05:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][30/625] eta 0:04:05 lr 0.000833 wd 0.0500 time 0.4070 (0.4121) data time 0.0007 (0.0132) model time 0.0000 (0.0000) loss 7.8392 (7.6862) grad_norm 2.8267 (2.5861) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:05:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][40/625] eta 0:03:59 lr 0.000833 wd 0.0500 time 0.3958 (0.4096) data time 0.0006 (0.0102) model time 0.0000 (0.0000) loss 6.4338 (7.5797) grad_norm 2.7255 (2.4657) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:05:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][50/625] eta 0:03:56 lr 0.000833 wd 0.0500 time 0.3981 (0.4107) data time 0.0007 (0.0084) model time 0.0000 (0.0000) loss 6.8269 (7.6171) grad_norm 2.6226 (2.4245) loss_scale 1024.0000 
(1024.0000) mem 14939MB [2024-07-25 01:05:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][60/625] eta 0:03:51 lr 0.000833 wd 0.0500 time 0.4089 (0.4092) data time 0.0008 (0.0072) model time 0.4081 (0.4002) loss 6.2326 (7.5624) grad_norm 1.9258 (2.4166) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:05:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][70/625] eta 0:03:46 lr 0.000833 wd 0.0500 time 0.3942 (0.4078) data time 0.0009 (0.0063) model time 0.3933 (0.3993) loss 7.4458 (7.5507) grad_norm 1.9558 (2.3816) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:05:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][80/625] eta 0:03:41 lr 0.000833 wd 0.0500 time 0.3986 (0.4070) data time 0.0009 (0.0057) model time 0.3977 (0.3996) loss 7.8683 (7.5219) grad_norm 2.2362 (2.3596) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:05:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][90/625] eta 0:03:37 lr 0.000832 wd 0.0500 time 0.4077 (0.4067) data time 0.0009 (0.0052) model time 0.4068 (0.4005) loss 8.5351 (7.5687) grad_norm 3.8819 (2.4136) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:05:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][100/625] eta 0:03:33 lr 0.000832 wd 0.0500 time 0.3971 (0.4062) data time 0.0009 (0.0047) model time 0.3962 (0.4006) loss 5.8848 (7.5356) grad_norm 2.0985 (2.3931) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:05:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][110/625] eta 0:03:29 lr 0.000832 wd 0.0500 time 0.3983 (0.4059) data time 0.0006 (0.0044) model time 0.3977 (0.4008) loss 6.2980 (7.5045) grad_norm 2.4619 (2.3620) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:05:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][120/625] eta 0:03:24 lr 
0.000832 wd 0.0500 time 0.4114 (0.4057) data time 0.0006 (0.0041) model time 0.4108 (0.4010) loss 7.3264 (7.4553) grad_norm 1.7162 (2.3410) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:05:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][130/625] eta 0:03:20 lr 0.000832 wd 0.0500 time 0.4027 (0.4053) data time 0.0008 (0.0039) model time 0.4019 (0.4009) loss 7.9869 (7.4707) grad_norm 3.9944 (2.3796) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:06:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][140/625] eta 0:03:16 lr 0.000832 wd 0.0500 time 0.3995 (0.4052) data time 0.0009 (0.0037) model time 0.3986 (0.4010) loss 8.4259 (7.4872) grad_norm 3.0276 (2.4090) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:06:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][150/625] eta 0:03:12 lr 0.000832 wd 0.0500 time 0.4125 (0.4050) data time 0.0010 (0.0035) model time 0.4115 (0.4011) loss 8.3049 (7.4685) grad_norm 2.4198 (2.4139) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:06:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][160/625] eta 0:03:08 lr 0.000832 wd 0.0500 time 0.3975 (0.4048) data time 0.0007 (0.0033) model time 0.3967 (0.4011) loss 7.4186 (7.4695) grad_norm 2.4026 (2.4270) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:06:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][170/625] eta 0:03:04 lr 0.000832 wd 0.0500 time 0.4007 (0.4046) data time 0.0009 (0.0032) model time 0.3999 (0.4011) loss 7.2223 (7.4622) grad_norm 1.8861 (2.4531) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:06:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][180/625] eta 0:03:01 lr 0.000832 wd 0.0500 time 0.6138 (0.4074) data time 0.0008 (0.0031) model time 0.6130 (0.4050) loss 7.0280 (7.4855) grad_norm 2.1950 (2.4401) 
loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:06:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][190/625] eta 0:02:59 lr 0.000831 wd 0.0500 time 0.5798 (0.4131) data time 0.0007 (0.0030) model time 0.5791 (0.4130) loss 6.1240 (7.4809) grad_norm 2.8100 (2.4409) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:06:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][200/625] eta 0:02:56 lr 0.000831 wd 0.0500 time 0.3961 (0.4161) data time 0.0006 (0.0029) model time 0.3954 (0.4169) loss 8.7336 (7.4848) grad_norm 2.2921 (2.4250) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:06:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][210/625] eta 0:02:52 lr 0.000831 wd 0.0500 time 0.4047 (0.4153) data time 0.0009 (0.0028) model time 0.4038 (0.4158) loss 8.0511 (7.4738) grad_norm 2.5651 (2.4110) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:06:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][220/625] eta 0:02:47 lr 0.000831 wd 0.0500 time 0.4161 (0.4148) data time 0.0008 (0.0027) model time 0.4153 (0.4150) loss 8.1434 (7.4818) grad_norm 1.6748 (2.4040) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:06:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][230/625] eta 0:02:43 lr 0.000831 wd 0.0500 time 0.4021 (0.4142) data time 0.0006 (0.0026) model time 0.4014 (0.4142) loss 7.9684 (7.4947) grad_norm 2.2010 (2.3919) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:06:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][240/625] eta 0:02:39 lr 0.000831 wd 0.0500 time 0.3952 (0.4136) data time 0.0009 (0.0025) model time 0.3944 (0.4134) loss 8.0561 (7.4984) grad_norm 2.0350 (2.3790) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:06:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[125/300][250/625] eta 0:02:34 lr 0.000831 wd 0.0500 time 0.4127 (0.4132) data time 0.0007 (0.0025) model time 0.4119 (0.4128) loss 8.0185 (7.5093) grad_norm 1.4837 (2.3704) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:06:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][260/625] eta 0:02:30 lr 0.000831 wd 0.0500 time 0.4004 (0.4126) data time 0.0009 (0.0024) model time 0.3995 (0.4121) loss 7.3449 (7.4866) grad_norm 2.0118 (2.3681) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:06:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][270/625] eta 0:02:26 lr 0.000831 wd 0.0500 time 0.3995 (0.4128) data time 0.0009 (0.0024) model time 0.3986 (0.4124) loss 7.6113 (7.4942) grad_norm 3.2369 (2.3707) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:06:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][280/625] eta 0:02:22 lr 0.000831 wd 0.0500 time 0.4021 (0.4124) data time 0.0006 (0.0023) model time 0.4015 (0.4118) loss 7.7367 (7.4928) grad_norm 2.0648 (2.3729) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:07:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][290/625] eta 0:02:18 lr 0.000830 wd 0.0500 time 0.3969 (0.4121) data time 0.0008 (0.0023) model time 0.3961 (0.4114) loss 8.1985 (7.4914) grad_norm 1.7477 (2.3780) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:07:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][300/625] eta 0:02:13 lr 0.000830 wd 0.0500 time 0.4025 (0.4117) data time 0.0009 (0.0022) model time 0.4016 (0.4109) loss 7.9057 (7.4888) grad_norm 3.1860 (2.3757) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:07:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][310/625] eta 0:02:09 lr 0.000830 wd 0.0500 time 0.4163 (0.4116) data time 0.0009 (0.0022) model time 0.4154 (0.4108) loss 7.6811 
(7.4788) grad_norm 2.6080 (inf) loss_scale 512.0000 (1017.4148) mem 14939MB [2024-07-25 01:07:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][320/625] eta 0:02:05 lr 0.000830 wd 0.0500 time 0.3951 (0.4115) data time 0.0007 (0.0022) model time 0.3945 (0.4106) loss 6.9867 (7.4687) grad_norm 2.9032 (inf) loss_scale 512.0000 (1001.6698) mem 14939MB [2024-07-25 01:07:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][330/625] eta 0:02:01 lr 0.000830 wd 0.0500 time 0.4092 (0.4112) data time 0.0007 (0.0022) model time 0.4085 (0.4102) loss 7.9059 (7.4685) grad_norm 2.3556 (inf) loss_scale 512.0000 (986.8761) mem 14939MB [2024-07-25 01:07:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][340/625] eta 0:01:57 lr 0.000830 wd 0.0500 time 0.4052 (0.4109) data time 0.0006 (0.0022) model time 0.4046 (0.4098) loss 7.4787 (7.4575) grad_norm 2.6088 (inf) loss_scale 512.0000 (972.9501) mem 14939MB [2024-07-25 01:07:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][350/625] eta 0:01:52 lr 0.000830 wd 0.0500 time 0.3998 (0.4107) data time 0.0007 (0.0021) model time 0.3992 (0.4096) loss 7.7094 (7.4599) grad_norm 2.6653 (inf) loss_scale 512.0000 (959.8177) mem 14939MB [2024-07-25 01:07:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][360/625] eta 0:01:48 lr 0.000830 wd 0.0500 time 0.3960 (0.4104) data time 0.0009 (0.0021) model time 0.3951 (0.4093) loss 7.7682 (7.4617) grad_norm 1.4977 (inf) loss_scale 512.0000 (947.4127) mem 14939MB [2024-07-25 01:07:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][370/625] eta 0:01:44 lr 0.000830 wd 0.0500 time 0.4074 (0.4102) data time 0.0008 (0.0021) model time 0.4066 (0.4091) loss 7.0421 (7.4487) grad_norm 1.4459 (inf) loss_scale 512.0000 (935.6765) mem 14939MB [2024-07-25 01:07:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[125/300][380/625] eta 0:01:40 lr 0.000830 wd 0.0500 time 0.3939 (0.4100) data time 0.0007 (0.0021) model time 0.3932 (0.4088) loss 7.8782 (7.4519) grad_norm 2.2843 (inf) loss_scale 512.0000 (924.5564) mem 14939MB [2024-07-25 01:07:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][390/625] eta 0:01:36 lr 0.000829 wd 0.0500 time 0.3934 (0.4098) data time 0.0009 (0.0020) model time 0.3926 (0.4085) loss 8.8620 (7.4673) grad_norm 3.1296 (inf) loss_scale 512.0000 (914.0051) mem 14939MB [2024-07-25 01:07:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][400/625] eta 0:01:32 lr 0.000829 wd 0.0500 time 0.3936 (0.4109) data time 0.0007 (0.0020) model time 0.3929 (0.4099) loss 7.8217 (7.4653) grad_norm 1.8501 (inf) loss_scale 512.0000 (903.9800) mem 14939MB [2024-07-25 01:07:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][410/625] eta 0:01:28 lr 0.000829 wd 0.0500 time 0.5656 (0.4138) data time 0.0006 (0.0020) model time 0.5650 (0.4132) loss 6.0749 (7.4673) grad_norm 3.7781 (inf) loss_scale 512.0000 (894.4428) mem 14939MB [2024-07-25 01:07:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][420/625] eta 0:01:25 lr 0.000829 wd 0.0500 time 0.3990 (0.4152) data time 0.0008 (0.0020) model time 0.3982 (0.4147) loss 8.1559 (7.4618) grad_norm 3.2035 (inf) loss_scale 512.0000 (885.3587) mem 14939MB [2024-07-25 01:08:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][430/625] eta 0:01:20 lr 0.000829 wd 0.0500 time 0.3958 (0.4148) data time 0.0007 (0.0019) model time 0.3951 (0.4143) loss 7.9989 (7.4615) grad_norm 1.8128 (inf) loss_scale 512.0000 (876.6961) mem 14939MB [2024-07-25 01:08:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][440/625] eta 0:01:16 lr 0.000829 wd 0.0500 time 0.4119 (0.4145) data time 0.0010 (0.0019) model time 0.4109 (0.4140) loss 8.3863 (7.4587) grad_norm 3.2158 (inf) 
loss_scale 512.0000 (868.4263) mem 14939MB [2024-07-25 01:08:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][450/625] eta 0:01:12 lr 0.000829 wd 0.0500 time 0.3948 (0.4142) data time 0.0009 (0.0019) model time 0.3940 (0.4136) loss 6.7958 (7.4531) grad_norm 2.2879 (inf) loss_scale 512.0000 (860.5233) mem 14939MB [2024-07-25 01:08:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][460/625] eta 0:01:08 lr 0.000829 wd 0.0500 time 0.3944 (0.4139) data time 0.0009 (0.0019) model time 0.3935 (0.4132) loss 6.1523 (7.4488) grad_norm 2.5887 (inf) loss_scale 512.0000 (852.9631) mem 14939MB [2024-07-25 01:08:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][470/625] eta 0:01:04 lr 0.000829 wd 0.0500 time 0.4041 (0.4136) data time 0.0009 (0.0019) model time 0.4033 (0.4129) loss 6.6897 (7.4371) grad_norm 2.3287 (inf) loss_scale 512.0000 (845.7240) mem 14939MB [2024-07-25 01:08:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][480/625] eta 0:00:59 lr 0.000829 wd 0.0500 time 0.3964 (0.4133) data time 0.0006 (0.0018) model time 0.3958 (0.4126) loss 6.9034 (7.4385) grad_norm 2.8877 (inf) loss_scale 512.0000 (838.7859) mem 14939MB [2024-07-25 01:08:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][490/625] eta 0:00:55 lr 0.000828 wd 0.0500 time 0.3980 (0.4134) data time 0.0006 (0.0018) model time 0.3973 (0.4127) loss 8.2008 (7.4359) grad_norm 2.3327 (inf) loss_scale 512.0000 (832.1303) mem 14939MB [2024-07-25 01:08:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][500/625] eta 0:00:51 lr 0.000828 wd 0.0500 time 0.4120 (0.4132) data time 0.0009 (0.0018) model time 0.4111 (0.4125) loss 7.1967 (7.4425) grad_norm 2.0376 (inf) loss_scale 512.0000 (825.7405) mem 14939MB [2024-07-25 01:08:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][510/625] eta 0:00:47 lr 0.000828 
wd 0.0500 time 0.3989 (0.4131) data time 0.0008 (0.0018) model time 0.3981 (0.4123) loss 6.9023 (7.4371) grad_norm 7.0286 (inf) loss_scale 512.0000 (819.6008) mem 14939MB [2024-07-25 01:08:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][520/625] eta 0:00:43 lr 0.000828 wd 0.0500 time 0.3971 (0.4129) data time 0.0009 (0.0018) model time 0.3962 (0.4120) loss 8.5620 (7.4367) grad_norm 2.3709 (inf) loss_scale 512.0000 (813.6967) mem 14939MB [2024-07-25 01:08:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][530/625] eta 0:00:39 lr 0.000828 wd 0.0500 time 0.4142 (0.4127) data time 0.0009 (0.0018) model time 0.4134 (0.4119) loss 6.5809 (7.4318) grad_norm 2.3005 (inf) loss_scale 512.0000 (808.0151) mem 14939MB [2024-07-25 01:08:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][540/625] eta 0:00:35 lr 0.000828 wd 0.0500 time 0.3958 (0.4125) data time 0.0008 (0.0017) model time 0.3949 (0.4116) loss 8.1951 (7.4275) grad_norm 2.5108 (inf) loss_scale 512.0000 (802.5434) mem 14939MB [2024-07-25 01:08:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][550/625] eta 0:00:30 lr 0.000828 wd 0.0500 time 0.4020 (0.4123) data time 0.0009 (0.0017) model time 0.4011 (0.4114) loss 7.2595 (7.4258) grad_norm 3.3951 (inf) loss_scale 512.0000 (797.2704) mem 14939MB [2024-07-25 01:08:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][560/625] eta 0:00:26 lr 0.000828 wd 0.0500 time 0.4067 (0.4122) data time 0.0008 (0.0017) model time 0.4059 (0.4112) loss 8.3549 (7.4283) grad_norm 1.4856 (inf) loss_scale 512.0000 (792.1854) mem 14939MB [2024-07-25 01:08:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][570/625] eta 0:00:22 lr 0.000828 wd 0.0500 time 0.4193 (0.4120) data time 0.0008 (0.0017) model time 0.4185 (0.4111) loss 7.8370 (7.4290) grad_norm 1.5559 (inf) loss_scale 512.0000 (787.2785) mem 14939MB 
[2024-07-25 01:09:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][580/625] eta 0:00:18 lr 0.000828 wd 0.0500 time 0.3999 (0.4119) data time 0.0007 (0.0017) model time 0.3992 (0.4110) loss 7.7496 (7.4310) grad_norm 2.1478 (inf) loss_scale 512.0000 (782.5404) mem 14939MB [2024-07-25 01:09:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][590/625] eta 0:00:14 lr 0.000827 wd 0.0500 time 0.4090 (0.4118) data time 0.0008 (0.0017) model time 0.4082 (0.4109) loss 7.5852 (7.4293) grad_norm 2.1053 (inf) loss_scale 512.0000 (777.9628) mem 14939MB [2024-07-25 01:09:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][600/625] eta 0:00:10 lr 0.000827 wd 0.0500 time 0.3943 (0.4117) data time 0.0007 (0.0017) model time 0.3935 (0.4107) loss 6.9610 (7.4290) grad_norm 3.5794 (inf) loss_scale 512.0000 (773.5374) mem 14939MB [2024-07-25 01:09:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][610/625] eta 0:00:06 lr 0.000827 wd 0.0500 time 0.3988 (0.4115) data time 0.0004 (0.0017) model time 0.3984 (0.4105) loss 6.9583 (7.4285) grad_norm 2.4906 (inf) loss_scale 512.0000 (769.2570) mem 14939MB [2024-07-25 01:09:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][620/625] eta 0:00:02 lr 0.000827 wd 0.0500 time 0.5604 (0.4126) data time 0.0006 (0.0017) model time 0.5598 (0.4117) loss 7.9220 (7.4383) grad_norm 2.2520 (inf) loss_scale 512.0000 (765.1143) mem 14939MB [2024-07-25 01:09:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 125 training takes 0:04:17 [2024-07-25 01:09:21 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 01:09:22 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 01:09:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.468 (0.468) Loss 0.6162 (0.6162) Acc@1 88.135 (88.135) Acc@5 98.145 (98.145) Mem 14939MB [2024-07-25 01:09:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.121) Loss 0.9937 (0.7500) Acc@1 78.369 (84.575) Acc@5 94.971 (97.159) Mem 14939MB [2024-07-25 01:09:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.1533 (0.8942) Acc@1 73.145 (80.924) Acc@5 93.164 (95.585) Mem 14939MB [2024-07-25 01:09:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.580 Acc@5 95.569 [2024-07-25 01:09:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.6% [2024-07-25 01:09:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.58% [2024-07-25 01:09:24 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 01:09:25 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 01:09:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.452 (0.452) Loss 0.5825 (0.5825) Acc@1 88.867 (88.867) Acc@5 98.340 (98.340) Mem 14939MB [2024-07-25 01:09:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.9497 (0.7279) Acc@1 79.590 (85.094) Acc@5 95.166 (97.350) Mem 14939MB [2024-07-25 01:09:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.103) Loss 1.0859 (0.8613) Acc@1 74.756 (81.562) Acc@5 94.092 (95.957) Mem 14939MB [2024-07-25 01:09:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.240 Acc@5 95.925 [2024-07-25 01:09:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.2% [2024-07-25 01:09:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.24% [2024-07-25 01:09:28 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 01:09:29 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 01:09:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][0/625] eta 0:07:44 lr 0.000827 wd 0.0500 time 0.7433 (0.7433) data time 0.3657 (0.3657) model time 0.0000 (0.0000) loss 7.5898 (7.5898) grad_norm 2.7520 (2.7520) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:09:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][10/625] eta 0:05:41 lr 0.000827 wd 0.0500 time 0.5861 (0.5556) data time 0.0008 (0.0341) model time 0.0000 (0.0000) loss 6.8687 (7.6299) grad_norm 2.7977 (2.8767) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:09:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][20/625] eta 0:04:58 lr 0.000827 wd 0.0500 time 0.3980 (0.4926) data time 0.0008 (0.0183) model time 0.0000 (0.0000) loss 7.3328 (7.2911) grad_norm 2.1422 (2.5404) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:09:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][30/625] eta 0:04:36 lr 0.000827 wd 0.0500 time 0.4057 (0.4642) data time 0.0008 (0.0130) model time 0.0000 (0.0000) loss 8.1800 (7.5330) grad_norm 1.8506 (2.4075) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:09:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][40/625] eta 0:04:22 lr 0.000827 wd 0.0500 time 0.4085 (0.4494) data time 0.0006 (0.0100) model time 0.0000 (0.0000) loss 7.5745 (7.5593) grad_norm 1.6583 (2.4165) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:09:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][50/625] eta 0:04:13 lr 0.000827 wd 0.0500 time 0.4007 (0.4401) data time 0.0010 (0.0082) model time 0.0000 (0.0000) loss 9.1379 (7.5347) grad_norm 4.1805 (2.8925) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:09:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][60/625] eta 0:04:05 lr 0.000827 wd 0.0500 time 0.3987 (0.4338) 
data time 0.0006 (0.0070) model time 0.3980 (0.4004) loss 7.3899 (7.4925) grad_norm 2.4404 (2.8635) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:09:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][70/625] eta 0:03:58 lr 0.000826 wd 0.0500 time 0.4205 (0.4298) data time 0.0009 (0.0062) model time 0.4196 (0.4025) loss 7.2198 (7.4574) grad_norm 1.7178 (2.8939) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:10:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][80/625] eta 0:03:52 lr 0.000826 wd 0.0500 time 0.3960 (0.4265) data time 0.0006 (0.0055) model time 0.3953 (0.4023) loss 6.9492 (7.4935) grad_norm 2.1728 (2.7957) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:10:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][90/625] eta 0:03:46 lr 0.000826 wd 0.0500 time 0.3948 (0.4237) data time 0.0008 (0.0050) model time 0.3940 (0.4019) loss 8.7023 (7.5005) grad_norm 3.1376 (2.7747) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:10:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][100/625] eta 0:03:41 lr 0.000826 wd 0.0500 time 0.4070 (0.4218) data time 0.0006 (0.0046) model time 0.4064 (0.4021) loss 8.3252 (7.4747) grad_norm 2.2672 (2.7842) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:10:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][110/625] eta 0:03:36 lr 0.000826 wd 0.0500 time 0.3940 (0.4203) data time 0.0007 (0.0043) model time 0.3932 (0.4025) loss 7.7459 (7.4700) grad_norm 2.9933 (2.7548) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:10:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][120/625] eta 0:03:31 lr 0.000826 wd 0.0500 time 0.3975 (0.4188) data time 0.0008 (0.0040) model time 0.3967 (0.4022) loss 8.3014 (7.4780) grad_norm 2.0860 (2.7262) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
01:10:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][130/625] eta 0:03:26 lr 0.000826 wd 0.0500 time 0.4085 (0.4177) data time 0.0009 (0.0038) model time 0.4076 (0.4024) loss 8.6154 (7.4755) grad_norm 1.6126 (2.6756) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:10:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][140/625] eta 0:03:22 lr 0.000826 wd 0.0500 time 0.3967 (0.4166) data time 0.0007 (0.0036) model time 0.3960 (0.4022) loss 7.2311 (7.4898) grad_norm 1.7263 (2.6319) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:10:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][150/625] eta 0:03:17 lr 0.000826 wd 0.0500 time 0.3985 (0.4157) data time 0.0007 (0.0034) model time 0.3978 (0.4022) loss 6.6132 (7.4616) grad_norm 1.5649 (2.6017) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:10:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][160/625] eta 0:03:12 lr 0.000826 wd 0.0500 time 0.4036 (0.4148) data time 0.0009 (0.0033) model time 0.4026 (0.4021) loss 7.3798 (7.4701) grad_norm 2.7663 (2.5817) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:10:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][170/625] eta 0:03:08 lr 0.000825 wd 0.0500 time 0.3962 (0.4142) data time 0.0008 (0.0031) model time 0.3955 (0.4022) loss 7.9302 (7.4462) grad_norm 2.2881 (2.5518) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:10:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][180/625] eta 0:03:04 lr 0.000825 wd 0.0500 time 0.4020 (0.4135) data time 0.0008 (0.0030) model time 0.4012 (0.4022) loss 6.5416 (7.4629) grad_norm 1.4938 (2.5497) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:10:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][190/625] eta 0:02:59 lr 0.000825 wd 0.0500 time 0.4104 (0.4130) data 
time 0.0009 (0.0029) model time 0.4096 (0.4021) loss 7.4995 (7.4670) grad_norm 3.0784 (2.5418) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:10:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][200/625] eta 0:02:55 lr 0.000825 wd 0.0500 time 0.3996 (0.4125) data time 0.0007 (0.0028) model time 0.3989 (0.4021) loss 8.1366 (7.4902) grad_norm 2.2216 (2.5323) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:10:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][210/625] eta 0:02:51 lr 0.000825 wd 0.0500 time 0.5574 (0.4133) data time 0.0010 (0.0027) model time 0.5564 (0.4038) loss 7.8469 (7.5052) grad_norm 2.5537 (2.5147) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:11:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][220/625] eta 0:02:48 lr 0.000825 wd 0.0500 time 0.3935 (0.4169) data time 0.0008 (0.0026) model time 0.3927 (0.4090) loss 7.5277 (7.5116) grad_norm 2.6939 (2.5162) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:11:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][230/625] eta 0:02:45 lr 0.000825 wd 0.0500 time 0.4142 (0.4193) data time 0.0007 (0.0026) model time 0.4135 (0.4125) loss 7.5806 (7.5111) grad_norm 1.7061 (2.5121) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:11:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][240/625] eta 0:02:41 lr 0.000825 wd 0.0500 time 0.3982 (0.4199) data time 0.0007 (0.0025) model time 0.3976 (0.4134) loss 7.6012 (7.5124) grad_norm 1.8068 (2.5074) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:11:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][250/625] eta 0:02:37 lr 0.000825 wd 0.0500 time 0.4145 (0.4193) data time 0.0009 (0.0025) model time 0.4136 (0.4130) loss 7.0213 (7.4824) grad_norm 2.7066 (2.4996) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
01:11:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][260/625] eta 0:02:32 lr 0.000825 wd 0.0500 time 0.3954 (0.4187) data time 0.0010 (0.0024) model time 0.3945 (0.4124) loss 7.4083 (7.4794) grad_norm 3.4213 (2.5390) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:11:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][270/625] eta 0:02:28 lr 0.000824 wd 0.0500 time 0.3987 (0.4180) data time 0.0010 (0.0024) model time 0.3977 (0.4119) loss 7.8406 (7.4688) grad_norm 2.8593 (2.5466) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:11:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][280/625] eta 0:02:24 lr 0.000824 wd 0.0500 time 0.4075 (0.4175) data time 0.0008 (0.0023) model time 0.4067 (0.4115) loss 8.1509 (7.4721) grad_norm 1.6226 (2.5398) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:11:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][290/625] eta 0:02:19 lr 0.000824 wd 0.0500 time 0.3950 (0.4170) data time 0.0007 (0.0023) model time 0.3943 (0.4111) loss 8.0964 (7.4548) grad_norm 2.1445 (2.5542) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:11:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][300/625] eta 0:02:15 lr 0.000824 wd 0.0500 time 0.3963 (0.4165) data time 0.0006 (0.0022) model time 0.3957 (0.4106) loss 8.0143 (7.4546) grad_norm 2.7957 (2.5573) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:11:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][310/625] eta 0:02:11 lr 0.000824 wd 0.0500 time 0.4139 (0.4161) data time 0.0008 (0.0022) model time 0.4131 (0.4103) loss 7.2506 (7.4635) grad_norm 1.6657 (2.5638) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:11:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][320/625] eta 0:02:06 lr 0.000824 wd 0.0500 time 0.3932 (0.4156) data 
time 0.0007 (0.0022) model time 0.3925 (0.4100) loss 7.3102 (7.4677) grad_norm 2.0323 (2.5516) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:11:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][330/625] eta 0:02:02 lr 0.000824 wd 0.0500 time 0.3958 (0.4152) data time 0.0007 (0.0021) model time 0.3951 (0.4096) loss 7.3077 (7.4663) grad_norm 2.5099 (2.5408) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:11:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][340/625] eta 0:01:58 lr 0.000824 wd 0.0500 time 0.4422 (0.4149) data time 0.0008 (0.0021) model time 0.4414 (0.4095) loss 8.1354 (7.4606) grad_norm 2.9678 (2.5320) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:11:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][350/625] eta 0:01:54 lr 0.000824 wd 0.0500 time 0.3966 (0.4148) data time 0.0008 (0.0021) model time 0.3957 (0.4094) loss 6.7812 (7.4592) grad_norm 3.3552 (2.5440) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:11:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][360/625] eta 0:01:49 lr 0.000824 wd 0.0500 time 0.3983 (0.4144) data time 0.0007 (0.0020) model time 0.3977 (0.4091) loss 6.6775 (7.4496) grad_norm 2.8664 (2.5501) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:12:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][370/625] eta 0:01:45 lr 0.000823 wd 0.0500 time 0.4145 (0.4141) data time 0.0007 (0.0020) model time 0.4138 (0.4090) loss 6.1822 (7.4489) grad_norm 1.9796 (2.5327) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:12:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][380/625] eta 0:01:41 lr 0.000823 wd 0.0500 time 0.4011 (0.4138) data time 0.0006 (0.0020) model time 0.4005 (0.4088) loss 7.0701 (7.4493) grad_norm 1.5036 (2.5179) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
01:12:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][390/625] eta 0:01:37 lr 0.000823 wd 0.0500 time 0.4020 (0.4135) data time 0.0006 (0.0019) model time 0.4014 (0.4085) loss 7.2727 (7.4439) grad_norm 2.3498 (2.5055) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:12:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][400/625] eta 0:01:32 lr 0.000823 wd 0.0500 time 0.4138 (0.4133) data time 0.0008 (0.0019) model time 0.4129 (0.4084) loss 8.4840 (7.4422) grad_norm 2.7524 (2.5232) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:12:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][410/625] eta 0:01:28 lr 0.000823 wd 0.0500 time 0.3953 (0.4130) data time 0.0009 (0.0019) model time 0.3945 (0.4082) loss 7.1966 (7.4502) grad_norm 1.7190 (2.5232) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:12:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][420/625] eta 0:01:24 lr 0.000823 wd 0.0500 time 0.3958 (0.4127) data time 0.0006 (0.0019) model time 0.3952 (0.4080) loss 6.0993 (7.4577) grad_norm 2.0812 (2.5187) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:12:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][430/625] eta 0:01:20 lr 0.000823 wd 0.0500 time 0.3960 (0.4131) data time 0.0007 (0.0019) model time 0.3953 (0.4085) loss 8.0939 (7.4638) grad_norm 2.9853 (2.5072) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:12:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][440/625] eta 0:01:16 lr 0.000823 wd 0.0500 time 0.6035 (0.4156) data time 0.0008 (0.0018) model time 0.6027 (0.4114) loss 6.1290 (7.4658) grad_norm 2.4873 (2.4982) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:12:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][450/625] eta 0:01:12 lr 0.000823 wd 0.0500 time 0.6147 (0.4170) data 
time 0.0009 (0.0018) model time 0.6138 (0.4131) loss 8.3689 (7.4604) grad_norm 2.1271 (2.4916) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:12:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][460/625] eta 0:01:08 lr 0.000823 wd 0.0500 time 0.3979 (0.4174) data time 0.0006 (0.0018) model time 0.3973 (0.4136) loss 8.7319 (7.4684) grad_norm 2.6371 (2.4983) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:12:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][470/625] eta 0:01:04 lr 0.000822 wd 0.0500 time 0.4395 (0.4175) data time 0.0009 (0.0018) model time 0.4386 (0.4137) loss 7.7087 (7.4694) grad_norm 1.5175 (2.4950) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:12:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][480/625] eta 0:01:00 lr 0.000822 wd 0.0500 time 0.3938 (0.4172) data time 0.0007 (0.0018) model time 0.3931 (0.4134) loss 7.6306 (7.4644) grad_norm 2.8907 (2.5132) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:12:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][490/625] eta 0:00:56 lr 0.000822 wd 0.0500 time 0.4010 (0.4169) data time 0.0006 (0.0018) model time 0.4004 (0.4132) loss 6.3749 (7.4681) grad_norm 1.6909 (2.5034) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:12:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][500/625] eta 0:00:52 lr 0.000822 wd 0.0500 time 0.4082 (0.4166) data time 0.0008 (0.0018) model time 0.4073 (0.4129) loss 6.1550 (7.4676) grad_norm 2.0965 (2.4974) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:13:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][510/625] eta 0:00:47 lr 0.000822 wd 0.0500 time 0.3922 (0.4164) data time 0.0008 (0.0018) model time 0.3915 (0.4127) loss 8.6509 (7.4718) grad_norm 1.9621 (2.4930) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
01:13:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][520/625] eta 0:00:43 lr 0.000822 wd 0.0500 time 0.4016 (0.4161) data time 0.0008 (0.0017) model time 0.4008 (0.4124) loss 7.8583 (7.4673) grad_norm 1.6952 (2.4858) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:13:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][530/625] eta 0:00:39 lr 0.000822 wd 0.0500 time 0.4094 (0.4159) data time 0.0009 (0.0017) model time 0.4084 (0.4122) loss 6.4266 (7.4537) grad_norm 1.3135 (2.4709) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:13:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][540/625] eta 0:00:35 lr 0.000822 wd 0.0500 time 0.3928 (0.4156) data time 0.0007 (0.0017) model time 0.3921 (0.4121) loss 7.7242 (7.4546) grad_norm 1.8780 (2.4630) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:13:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][550/625] eta 0:00:31 lr 0.000822 wd 0.0500 time 0.4029 (0.4154) data time 0.0009 (0.0017) model time 0.4020 (0.4119) loss 7.4395 (7.4550) grad_norm 1.6058 (2.4562) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:13:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][560/625] eta 0:00:26 lr 0.000822 wd 0.0500 time 0.4064 (0.4152) data time 0.0008 (0.0017) model time 0.4055 (0.4116) loss 8.1992 (7.4547) grad_norm 1.6718 (2.4458) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:13:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][570/625] eta 0:00:22 lr 0.000821 wd 0.0500 time 0.3935 (0.4149) data time 0.0007 (0.0017) model time 0.3929 (0.4114) loss 5.9471 (7.4503) grad_norm 2.1207 (2.4409) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:13:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][580/625] eta 0:00:18 lr 0.000821 wd 0.0500 time 0.3992 (0.4147) data 
time 0.0007 (0.0017) model time 0.3986 (0.4112) loss 6.4728 (7.4422) grad_norm 1.5544 (2.4364) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:13:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][590/625] eta 0:00:14 lr 0.000821 wd 0.0500 time 0.4091 (0.4145) data time 0.0009 (0.0017) model time 0.4082 (0.4111) loss 8.0978 (7.4396) grad_norm 2.3627 (2.4374) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:13:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][600/625] eta 0:00:10 lr 0.000821 wd 0.0500 time 0.3943 (0.4143) data time 0.0007 (0.0016) model time 0.3937 (0.4108) loss 8.1147 (7.4404) grad_norm 2.3179 (2.4370) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:13:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][610/625] eta 0:00:06 lr 0.000821 wd 0.0500 time 0.4027 (0.4141) data time 0.0004 (0.0016) model time 0.4022 (0.4107) loss 6.5261 (7.4399) grad_norm 2.5477 (2.4345) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:13:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][620/625] eta 0:00:02 lr 0.000821 wd 0.0500 time 0.4102 (0.4139) data time 0.0004 (0.0016) model time 0.4098 (0.4105) loss 7.7610 (7.4372) grad_norm 2.2850 (2.4321) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:13:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 126 training takes 0:04:18 [2024-07-25 01:13:48 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 01:13:49 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 01:13:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.437 (0.437) Loss 0.6089 (0.6089) Acc@1 88.232 (88.232) Acc@5 98.340 (98.340) Mem 14939MB [2024-07-25 01:13:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.118) Loss 1.0107 (0.7624) Acc@1 77.832 (84.539) Acc@5 94.873 (97.093) Mem 14939MB [2024-07-25 01:13:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.1162 (0.9018) Acc@1 73.926 (80.829) Acc@5 94.092 (95.480) Mem 14939MB [2024-07-25 01:13:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.486 Acc@5 95.453 [2024-07-25 01:13:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.5% [2024-07-25 01:13:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.797 (0.797) Loss 0.5820 (0.5820) Acc@1 88.867 (88.867) Acc@5 98.438 (98.438) Mem 14939MB [2024-07-25 01:13:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.156) Loss 0.9473 (0.7269) Acc@1 79.736 (85.112) Acc@5 95.215 (97.346) Mem 14939MB [2024-07-25 01:13:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 1.0830 (0.8600) Acc@1 74.756 (81.576) Acc@5 94.043 (95.971) Mem 14939MB [2024-07-25 01:13:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.262 Acc@5 95.945 [2024-07-25 01:13:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.3% [2024-07-25 01:13:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.26% [2024-07-25 01:13:54 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 01:13:55 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 01:13:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][0/625] eta 0:08:44 lr 0.000821 wd 0.0500 time 0.8398 (0.8398) data time 0.4495 (0.4495) model time 0.0000 (0.0000) loss 7.6362 (7.6362) grad_norm 1.7910 (1.7910) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:14:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][10/625] eta 0:04:30 lr 0.000821 wd 0.0500 time 0.4013 (0.4405) data time 0.0008 (0.0420) model time 0.0000 (0.0000) loss 7.3828 (7.6639) grad_norm 2.2357 (2.0050) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:14:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][20/625] eta 0:04:14 lr 0.000821 wd 0.0500 time 0.3976 (0.4211) data time 0.0009 (0.0226) model time 0.0000 (0.0000) loss 7.1648 (7.3610) grad_norm 2.7990 (1.9357) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:14:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][30/625] eta 0:04:20 lr 0.000821 wd 0.0500 time 0.5864 (0.4375) data time 0.0006 (0.0156) model time 0.0000 (0.0000) loss 7.2182 (7.2801) grad_norm 2.5525 (2.1167) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:14:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][40/625] eta 0:04:19 lr 0.000821 wd 0.0500 time 0.3931 (0.4435) data time 0.0008 (0.0120) model time 0.0000 (0.0000) loss 8.4294 (7.4054) grad_norm 2.5148 (2.3785) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:14:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][50/625] eta 0:04:18 lr 0.000820 wd 0.0500 time 0.5905 (0.4503) data time 0.0008 (0.0099) model time 0.0000 (0.0000) loss 7.4562 (7.4406) grad_norm 2.2586 (2.4502) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 01:14:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][60/625] eta 0:04:09 lr 0.000820 wd 0.0500 time 0.4054 (0.4423) data time 0.0006 (0.0084) model time 0.4048 (0.4001) loss 7.5691 (7.4867) grad_norm 2.1184 (2.4488) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:14:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][70/625] eta 0:04:02 lr 0.000820 wd 0.0500 time 0.3937 (0.4364) data time 0.0006 (0.0074) model time 0.3931 (0.4000) loss 7.4000 (7.4818) grad_norm 1.5372 (2.4098) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:14:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][80/625] eta 0:03:55 lr 0.000820 wd 0.0500 time 0.3992 (0.4322) data time 0.0006 (0.0066) model time 0.3986 (0.4003) loss 7.4929 (7.4375) grad_norm 3.0686 (2.3942) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:14:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][90/625] eta 0:03:49 lr 0.000820 wd 0.0500 time 0.4078 (0.4289) data time 0.0010 (0.0060) model time 0.4068 (0.4006) loss 7.9150 (7.4342) grad_norm 2.0861 (2.4591) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:14:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][100/625] eta 0:03:43 lr 0.000820 wd 0.0500 time 0.3947 (0.4259) data time 0.0007 (0.0055) model time 0.3940 (0.4001) loss 7.1888 (7.4044) grad_norm 2.6518 (2.4676) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:14:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][110/625] eta 0:03:38 lr 0.000820 wd 0.0500 time 0.4013 (0.4238) data time 0.0008 (0.0051) model time 0.4005 (0.4002) loss 7.7443 (7.4128) grad_norm 2.7902 (2.4398) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:14:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][120/625] eta 0:03:33 lr 0.000820 wd 0.0500 time 
0.4122 (0.4220) data time 0.0009 (0.0047) model time 0.4113 (0.4003) loss 8.1204 (7.4347) grad_norm 2.8919 (2.5439) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:14:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][130/625] eta 0:03:28 lr 0.000820 wd 0.0500 time 0.4045 (0.4204) data time 0.0007 (0.0044) model time 0.4038 (0.4004) loss 8.3157 (7.4532) grad_norm 2.2895 (2.5276) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:14:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][140/625] eta 0:03:23 lr 0.000820 wd 0.0500 time 0.4003 (0.4190) data time 0.0007 (0.0042) model time 0.3996 (0.4003) loss 6.6354 (7.4353) grad_norm 2.1544 (2.4948) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:14:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][150/625] eta 0:03:18 lr 0.000819 wd 0.0500 time 0.4193 (0.4181) data time 0.0008 (0.0040) model time 0.4185 (0.4008) loss 8.7414 (7.4423) grad_norm 1.8464 (2.5434) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:15:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][160/625] eta 0:03:13 lr 0.000819 wd 0.0500 time 0.3991 (0.4172) data time 0.0011 (0.0038) model time 0.3981 (0.4008) loss 7.3464 (7.4233) grad_norm 2.5892 (2.5555) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:15:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][170/625] eta 0:03:09 lr 0.000819 wd 0.0500 time 0.4022 (0.4163) data time 0.0009 (0.0036) model time 0.4013 (0.4009) loss 7.0175 (7.4363) grad_norm 2.3688 (2.5512) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:15:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][180/625] eta 0:03:04 lr 0.000819 wd 0.0500 time 0.4199 (0.4157) data time 0.0008 (0.0035) model time 0.4191 (0.4011) loss 7.6595 (7.4226) grad_norm 1.4236 (2.5434) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 01:15:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][190/625] eta 0:03:00 lr 0.000819 wd 0.0500 time 0.3979 (0.4149) data time 0.0009 (0.0034) model time 0.3970 (0.4010) loss 7.2139 (7.4353) grad_norm 3.2083 (2.5493) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:15:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][200/625] eta 0:02:56 lr 0.000819 wd 0.0500 time 0.4028 (0.4150) data time 0.0009 (0.0032) model time 0.4019 (0.4020) loss 6.7640 (7.4414) grad_norm 3.2993 (2.5375) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:15:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][210/625] eta 0:02:52 lr 0.000819 wd 0.0500 time 0.4122 (0.4146) data time 0.0006 (0.0031) model time 0.4116 (0.4022) loss 7.2093 (7.4339) grad_norm 1.3664 (2.5227) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:15:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][220/625] eta 0:02:47 lr 0.000819 wd 0.0500 time 0.3933 (0.4140) data time 0.0006 (0.0030) model time 0.3927 (0.4022) loss 6.8074 (7.4290) grad_norm 1.3051 (2.5083) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:15:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][230/625] eta 0:02:43 lr 0.000819 wd 0.0500 time 0.4012 (0.4135) data time 0.0007 (0.0029) model time 0.4005 (0.4021) loss 7.6378 (7.4132) grad_norm 1.7012 (2.4989) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:15:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][240/625] eta 0:02:39 lr 0.000819 wd 0.0500 time 0.4127 (0.4131) data time 0.0009 (0.0029) model time 0.4118 (0.4022) loss 7.5917 (7.4075) grad_norm 2.2708 (2.4745) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:15:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][250/625] eta 0:02:35 lr 0.000818 wd 0.0500 time 
0.6248 (0.4156) data time 0.0008 (0.0028) model time 0.6239 (0.4057) loss 7.8850 (7.4004) grad_norm 2.0273 (2.4586) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:15:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][260/625] eta 0:02:32 lr 0.000818 wd 0.0500 time 0.4065 (0.4180) data time 0.0009 (0.0027) model time 0.4056 (0.4092) loss 7.4713 (7.3939) grad_norm 1.4223 (2.4463) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:15:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][270/625] eta 0:02:29 lr 0.000818 wd 0.0500 time 0.5873 (0.4213) data time 0.0007 (0.0026) model time 0.5866 (0.4135) loss 8.3805 (7.4131) grad_norm 1.3588 (2.4417) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:15:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][280/625] eta 0:02:25 lr 0.000818 wd 0.0500 time 0.3963 (0.4205) data time 0.0009 (0.0026) model time 0.3954 (0.4129) loss 8.6433 (7.4286) grad_norm 2.4563 (2.4287) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:15:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][290/625] eta 0:02:20 lr 0.000818 wd 0.0500 time 0.3985 (0.4198) data time 0.0007 (0.0025) model time 0.3978 (0.4124) loss 6.0220 (7.4266) grad_norm 2.5516 (2.4201) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:16:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][300/625] eta 0:02:16 lr 0.000818 wd 0.0500 time 0.4091 (0.4193) data time 0.0008 (0.0025) model time 0.4084 (0.4120) loss 6.9188 (7.4033) grad_norm 2.1142 (2.4088) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:16:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][310/625] eta 0:02:11 lr 0.000818 wd 0.0500 time 0.3946 (0.4187) data time 0.0006 (0.0024) model time 0.3940 (0.4115) loss 8.2062 (7.3963) grad_norm 2.7933 (2.4794) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 01:16:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][320/625] eta 0:02:07 lr 0.000818 wd 0.0500 time 0.3978 (0.4181) data time 0.0006 (0.0024) model time 0.3972 (0.4111) loss 8.6245 (7.4014) grad_norm 3.0452 (2.5040) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:16:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][330/625] eta 0:02:03 lr 0.000818 wd 0.0500 time 0.4113 (0.4176) data time 0.0008 (0.0023) model time 0.4105 (0.4107) loss 7.8628 (7.4005) grad_norm 3.1493 (2.5132) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:16:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][340/625] eta 0:01:58 lr 0.000818 wd 0.0500 time 0.3962 (0.4171) data time 0.0006 (0.0023) model time 0.3955 (0.4103) loss 6.2202 (7.3944) grad_norm 4.2332 (2.5245) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:16:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][350/625] eta 0:01:54 lr 0.000817 wd 0.0500 time 0.3998 (0.4167) data time 0.0007 (0.0023) model time 0.3992 (0.4100) loss 6.0073 (7.3900) grad_norm 1.8020 (2.5162) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:16:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][360/625] eta 0:01:50 lr 0.000817 wd 0.0500 time 0.4332 (0.4164) data time 0.0009 (0.0022) model time 0.4324 (0.4099) loss 6.2526 (7.3771) grad_norm 2.7566 (2.5210) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:16:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][370/625] eta 0:01:46 lr 0.000817 wd 0.0500 time 0.3952 (0.4160) data time 0.0008 (0.0023) model time 0.3944 (0.4095) loss 6.7692 (7.3651) grad_norm 1.7811 (2.5088) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:16:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][380/625] eta 0:01:41 lr 0.000817 wd 0.0500 time 
0.4018 (0.4156) data time 0.0008 (0.0022) model time 0.4011 (0.4092) loss 7.8583 (7.3644) grad_norm 3.5296 (2.4976) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:16:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][390/625] eta 0:01:37 lr 0.000817 wd 0.0500 time 0.4071 (0.4153) data time 0.0007 (0.0022) model time 0.4064 (0.4090) loss 5.6709 (7.3682) grad_norm 1.9869 (2.4979) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:16:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][400/625] eta 0:01:33 lr 0.000817 wd 0.0500 time 0.3964 (0.4150) data time 0.0006 (0.0022) model time 0.3958 (0.4088) loss 7.1000 (7.3668) grad_norm 2.0878 (2.4959) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:16:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][410/625] eta 0:01:29 lr 0.000817 wd 0.0500 time 0.3993 (0.4147) data time 0.0007 (0.0021) model time 0.3986 (0.4086) loss 8.6523 (7.3727) grad_norm 2.5973 (2.4970) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:16:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][420/625] eta 0:01:25 lr 0.000817 wd 0.0500 time 0.4084 (0.4148) data time 0.0007 (0.0021) model time 0.4076 (0.4088) loss 8.4495 (7.3814) grad_norm 1.8985 (2.4931) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:16:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][430/625] eta 0:01:20 lr 0.000817 wd 0.0500 time 0.3936 (0.4144) data time 0.0007 (0.0021) model time 0.3929 (0.4086) loss 7.6614 (7.3815) grad_norm 1.9119 (2.4796) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:16:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][440/625] eta 0:01:16 lr 0.000817 wd 0.0500 time 0.3994 (0.4141) data time 0.0006 (0.0021) model time 0.3988 (0.4083) loss 7.3100 (7.3785) grad_norm 1.6445 (2.4712) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 01:17:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][450/625] eta 0:01:12 lr 0.000816 wd 0.0500 time 0.4076 (0.4139) data time 0.0007 (0.0021) model time 0.4069 (0.4082) loss 7.6119 (7.3893) grad_norm 1.6285 (2.4649) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:17:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][460/625] eta 0:01:08 lr 0.000816 wd 0.0500 time 0.3956 (0.4136) data time 0.0007 (0.0020) model time 0.3949 (0.4079) loss 7.3306 (7.3839) grad_norm 1.9893 (2.4622) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:17:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][470/625] eta 0:01:04 lr 0.000816 wd 0.0500 time 0.4208 (0.4144) data time 0.0010 (0.0020) model time 0.4198 (0.4090) loss 8.5189 (7.3898) grad_norm 2.7605 (2.4655) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:17:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][480/625] eta 0:01:00 lr 0.000816 wd 0.0500 time 0.6156 (0.4167) data time 0.0007 (0.0020) model time 0.6149 (0.4116) loss 6.9453 (7.3868) grad_norm 3.3019 (2.4821) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:17:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][490/625] eta 0:00:56 lr 0.000816 wd 0.0500 time 0.5948 (0.4182) data time 0.0008 (0.0020) model time 0.5939 (0.4134) loss 6.5248 (7.3913) grad_norm 2.1460 (2.4760) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:17:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][500/625] eta 0:00:52 lr 0.000816 wd 0.0500 time 0.4131 (0.4178) data time 0.0006 (0.0020) model time 0.4125 (0.4131) loss 6.0435 (7.3941) grad_norm 3.6226 (2.4693) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:17:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][510/625] eta 0:00:48 lr 0.000816 wd 0.0500 time 
0.3951 (0.4175) data time 0.0009 (0.0019) model time 0.3942 (0.4128) loss 5.7656 (7.3870) grad_norm 3.3216 (2.4705) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:17:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][520/625] eta 0:00:43 lr 0.000816 wd 0.0500 time 0.4036 (0.4172) data time 0.0006 (0.0019) model time 0.4030 (0.4126) loss 6.7352 (7.3838) grad_norm 2.9868 (2.4787) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:17:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][530/625] eta 0:00:39 lr 0.000816 wd 0.0500 time 0.4099 (0.4169) data time 0.0009 (0.0019) model time 0.4091 (0.4123) loss 7.7896 (7.3788) grad_norm 2.1455 (2.4804) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:17:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][540/625] eta 0:00:35 lr 0.000816 wd 0.0500 time 0.3942 (0.4166) data time 0.0006 (0.0019) model time 0.3936 (0.4120) loss 7.6128 (7.3841) grad_norm 2.4839 (2.4797) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:17:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][550/625] eta 0:00:31 lr 0.000815 wd 0.0500 time 0.4011 (0.4163) data time 0.0007 (0.0019) model time 0.4004 (0.4118) loss 5.8126 (7.3887) grad_norm 1.8800 (2.4761) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:17:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][560/625] eta 0:00:27 lr 0.000815 wd 0.0500 time 0.4105 (0.4161) data time 0.0009 (0.0018) model time 0.4097 (0.4116) loss 8.6433 (7.3929) grad_norm 2.7727 (2.4851) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:17:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][570/625] eta 0:00:22 lr 0.000815 wd 0.0500 time 0.3927 (0.4158) data time 0.0009 (0.0018) model time 0.3919 (0.4114) loss 7.9181 (7.3970) grad_norm 1.7087 (2.4844) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 01:17:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][580/625] eta 0:00:18 lr 0.000815 wd 0.0500 time 0.4082 (0.4156) data time 0.0008 (0.0018) model time 0.4074 (0.4112) loss 8.3844 (7.3932) grad_norm 4.9642 (2.4855) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:18:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][590/625] eta 0:00:14 lr 0.000815 wd 0.0500 time 0.4057 (0.4154) data time 0.0006 (0.0018) model time 0.4051 (0.4111) loss 8.5747 (7.3981) grad_norm 2.7154 (2.4821) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:18:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][600/625] eta 0:00:10 lr 0.000815 wd 0.0500 time 0.3935 (0.4152) data time 0.0007 (0.0018) model time 0.3928 (0.4109) loss 5.6840 (7.3923) grad_norm 2.0446 (2.4744) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:18:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][610/625] eta 0:00:06 lr 0.000815 wd 0.0500 time 0.4011 (0.4149) data time 0.0005 (0.0018) model time 0.4007 (0.4107) loss 7.2224 (7.3865) grad_norm 2.2334 (2.4731) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:18:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][620/625] eta 0:00:02 lr 0.000815 wd 0.0500 time 0.4049 (0.4147) data time 0.0004 (0.0018) model time 0.4045 (0.4105) loss 6.2651 (7.3837) grad_norm 1.7372 (2.4716) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:18:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 127 training takes 0:04:19 [2024-07-25 01:18:14 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 01:18:16 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 01:18:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.679 (0.679) Loss 0.5908 (0.5908) Acc@1 88.330 (88.330) Acc@5 98.535 (98.535) Mem 14939MB [2024-07-25 01:18:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.141) Loss 0.9834 (0.7609) Acc@1 77.783 (84.193) Acc@5 95.215 (97.177) Mem 14939MB [2024-07-25 01:18:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.115) Loss 1.1221 (0.9013) Acc@1 74.219 (80.690) Acc@5 93.750 (95.585) Mem 14939MB [2024-07-25 01:18:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.324 Acc@5 95.547 [2024-07-25 01:18:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.3% [2024-07-25 01:18:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.752 (0.752) Loss 0.5811 (0.5811) Acc@1 88.818 (88.818) Acc@5 98.438 (98.438) Mem 14939MB [2024-07-25 01:18:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.155) Loss 0.9463 (0.7262) Acc@1 79.736 (85.183) Acc@5 95.215 (97.354) Mem 14939MB [2024-07-25 01:18:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 1.0820 (0.8587) Acc@1 74.756 (81.634) Acc@5 94.043 (95.977) Mem 14939MB [2024-07-25 01:18:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.310 Acc@5 95.955 [2024-07-25 01:18:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.3% [2024-07-25 01:18:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.31% [2024-07-25 01:18:21 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 01:18:22 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 01:18:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][0/625] eta 0:08:20 lr 0.000815 wd 0.0500 time 0.8012 (0.8012) data time 0.4160 (0.4160) model time 0.0000 (0.0000) loss 6.3807 (6.3807) grad_norm 2.2325 (2.2325) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:18:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][10/625] eta 0:04:28 lr 0.000815 wd 0.0500 time 0.3972 (0.4367) data time 0.0009 (0.0387) model time 0.0000 (0.0000) loss 8.2574 (7.3545) grad_norm 2.2706 (2.2837) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:18:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][20/625] eta 0:04:14 lr 0.000815 wd 0.0500 time 0.4012 (0.4205) data time 0.0009 (0.0207) model time 0.0000 (0.0000) loss 6.8051 (7.3992) grad_norm 2.7435 (2.5519) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:18:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][30/625] eta 0:04:06 lr 0.000814 wd 0.0500 time 0.4084 (0.4142) data time 0.0006 (0.0143) model time 0.0000 (0.0000) loss 7.1316 (7.3937) grad_norm 2.4315 (2.5354) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:18:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][40/625] eta 0:04:00 lr 0.000814 wd 0.0500 time 0.3941 (0.4110) data time 0.0009 (0.0111) model time 0.0000 (0.0000) loss 6.1736 (7.3430) grad_norm 1.7653 (2.5799) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:18:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][50/625] eta 0:03:55 lr 0.000814 wd 0.0500 time 0.3999 (0.4093) data time 0.0009 (0.0092) model time 0.0000 (0.0000) loss 8.1960 (7.3977) grad_norm 2.4974 (2.4851) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 01:18:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][60/625] eta 0:03:50 lr 0.000814 wd 0.0500 time 0.4119 (0.4081) data time 0.0007 (0.0078) model time 0.4112 (0.4015) loss 8.2302 (7.4219) grad_norm 1.7615 (2.6631) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:18:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][70/625] eta 0:03:53 lr 0.000814 wd 0.0500 time 0.5854 (0.4202) data time 0.0008 (0.0068) model time 0.5846 (0.4471) loss 8.4924 (7.4552) grad_norm 2.0960 (2.6706) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:18:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][80/625] eta 0:03:54 lr 0.000814 wd 0.0500 time 0.4120 (0.4295) data time 0.0009 (0.0061) model time 0.4111 (0.4630) loss 8.3240 (7.4730) grad_norm 2.0879 (2.6443) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:19:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][90/625] eta 0:03:52 lr 0.000814 wd 0.0500 time 0.4058 (0.4341) data time 0.0009 (0.0055) model time 0.4050 (0.4649) loss 7.1258 (7.4055) grad_norm 2.8657 (2.6028) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:19:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][100/625] eta 0:03:46 lr 0.000814 wd 0.0500 time 0.4115 (0.4311) data time 0.0009 (0.0051) model time 0.4106 (0.4523) loss 8.2715 (7.3549) grad_norm 2.4965 (2.5842) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:19:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][110/625] eta 0:03:40 lr 0.000814 wd 0.0500 time 0.3959 (0.4284) data time 0.0008 (0.0047) model time 0.3952 (0.4438) loss 7.5997 (7.3445) grad_norm 2.7873 (2.6087) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:19:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][120/625] eta 0:03:35 lr 0.000814 wd 0.0500 time 
0.3960 (0.4262) data time 0.0006 (0.0044) model time 0.3954 (0.4377) loss 7.8610 (7.3244) grad_norm 1.5363 (2.5819) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:19:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][130/625] eta 0:03:30 lr 0.000813 wd 0.0500 time 0.4065 (0.4244) data time 0.0007 (0.0041) model time 0.4058 (0.4332) loss 6.3762 (7.3124) grad_norm 1.9328 (2.5568) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:19:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][140/625] eta 0:03:25 lr 0.000813 wd 0.0500 time 0.3945 (0.4229) data time 0.0009 (0.0039) model time 0.3936 (0.4297) loss 6.7494 (7.3287) grad_norm 2.1664 (2.5575) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:19:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][150/625] eta 0:03:20 lr 0.000813 wd 0.0500 time 0.3985 (0.4214) data time 0.0007 (0.0037) model time 0.3978 (0.4266) loss 8.2396 (7.3508) grad_norm 1.8362 (2.5221) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:19:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][160/625] eta 0:03:15 lr 0.000813 wd 0.0500 time 0.4138 (0.4203) data time 0.0008 (0.0036) model time 0.4130 (0.4244) loss 7.0612 (7.3235) grad_norm 2.9505 (2.5140) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:19:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][170/625] eta 0:03:10 lr 0.000813 wd 0.0500 time 0.4011 (0.4192) data time 0.0008 (0.0034) model time 0.4002 (0.4225) loss 5.2366 (7.3223) grad_norm 4.5558 (2.6661) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:19:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][180/625] eta 0:03:06 lr 0.000813 wd 0.0500 time 0.3995 (0.4182) data time 0.0007 (0.0033) model time 0.3988 (0.4207) loss 6.8204 (7.3218) grad_norm 2.7309 (2.6491) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 01:19:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][190/625] eta 0:03:01 lr 0.000813 wd 0.0500 time 0.3973 (0.4184) data time 0.0008 (0.0032) model time 0.3965 (0.4207) loss 6.0864 (7.3111) grad_norm 2.6427 (2.6776) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:19:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][200/625] eta 0:02:57 lr 0.000813 wd 0.0500 time 0.4003 (0.4176) data time 0.0008 (0.0031) model time 0.3995 (0.4194) loss 7.5070 (7.3161) grad_norm 2.6248 (2.6766) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:19:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][210/625] eta 0:02:52 lr 0.000813 wd 0.0500 time 0.4080 (0.4168) data time 0.0009 (0.0030) model time 0.4070 (0.4182) loss 8.3588 (7.2982) grad_norm 2.2705 (2.6675) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:19:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][220/625] eta 0:02:48 lr 0.000813 wd 0.0500 time 0.3933 (0.4161) data time 0.0009 (0.0029) model time 0.3924 (0.4171) loss 7.6902 (7.3090) grad_norm 2.8283 (2.6615) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:19:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][230/625] eta 0:02:44 lr 0.000812 wd 0.0500 time 0.4035 (0.4155) data time 0.0006 (0.0028) model time 0.4029 (0.4162) loss 7.3586 (7.3070) grad_norm 2.2754 (2.6457) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:20:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][240/625] eta 0:02:39 lr 0.000812 wd 0.0500 time 0.4051 (0.4151) data time 0.0009 (0.0027) model time 0.4042 (0.4156) loss 6.3322 (7.3064) grad_norm 2.1562 (2.6415) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:20:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][250/625] eta 0:02:35 lr 0.000812 wd 0.0500 time 
0.3943 (0.4144) data time 0.0009 (0.0027) model time 0.3934 (0.4148) loss 8.5457 (7.3016) grad_norm 2.0355 (2.6148) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:20:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][260/625] eta 0:02:31 lr 0.000812 wd 0.0500 time 0.3973 (0.4139) data time 0.0009 (0.0026) model time 0.3963 (0.4140) loss 8.6569 (7.2981) grad_norm 1.4434 (2.6127) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:20:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][270/625] eta 0:02:26 lr 0.000812 wd 0.0500 time 0.4127 (0.4134) data time 0.0008 (0.0025) model time 0.4119 (0.4134) loss 7.8524 (7.2914) grad_norm 2.4426 (2.5995) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:20:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][280/625] eta 0:02:22 lr 0.000812 wd 0.0500 time 0.5990 (0.4140) data time 0.0007 (0.0025) model time 0.5983 (0.4140) loss 6.9292 (7.2802) grad_norm 3.3971 (2.6042) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:20:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][290/625] eta 0:02:19 lr 0.000812 wd 0.0500 time 0.3933 (0.4160) data time 0.0007 (0.0025) model time 0.3926 (0.4163) loss 7.9472 (7.2831) grad_norm 3.9764 (2.6241) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:20:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][300/625] eta 0:02:16 lr 0.000812 wd 0.0500 time 0.4012 (0.4190) data time 0.0007 (0.0024) model time 0.4005 (0.4200) loss 6.7155 (7.2902) grad_norm 3.1641 (2.6301) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:20:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][310/625] eta 0:02:12 lr 0.000812 wd 0.0500 time 0.4105 (0.4206) data time 0.0008 (0.0024) model time 0.4098 (0.4218) loss 6.9346 (7.2968) grad_norm 2.2323 (2.6266) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 01:20:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][320/625] eta 0:02:08 lr 0.000812 wd 0.0500 time 0.3953 (0.4201) data time 0.0009 (0.0023) model time 0.3944 (0.4211) loss 7.0000 (7.3039) grad_norm 2.5185 (2.6143) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:20:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][330/625] eta 0:02:03 lr 0.000811 wd 0.0500 time 0.3991 (0.4195) data time 0.0009 (0.0023) model time 0.3982 (0.4204) loss 8.5140 (7.2942) grad_norm 1.7715 (2.6083) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:20:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][340/625] eta 0:01:59 lr 0.000811 wd 0.0500 time 0.4122 (0.4192) data time 0.0007 (0.0022) model time 0.4116 (0.4199) loss 7.9804 (7.2924) grad_norm 3.0558 (2.6079) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:20:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][350/625] eta 0:01:55 lr 0.000811 wd 0.0500 time 0.3952 (0.4188) data time 0.0008 (0.0022) model time 0.3944 (0.4194) loss 6.8868 (7.2891) grad_norm 2.1562 (2.5989) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:20:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][360/625] eta 0:01:50 lr 0.000811 wd 0.0500 time 0.4007 (0.4184) data time 0.0007 (0.0022) model time 0.4000 (0.4189) loss 7.9523 (7.2949) grad_norm 1.9064 (2.5927) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:20:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][370/625] eta 0:01:46 lr 0.000811 wd 0.0500 time 0.4091 (0.4180) data time 0.0009 (0.0021) model time 0.4082 (0.4184) loss 7.4658 (7.2914) grad_norm 1.8457 (2.5747) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:21:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][380/625] eta 0:01:42 lr 0.000811 wd 0.0500 time 
0.3977 (0.4175) data time 0.0008 (0.0021) model time 0.3969 (0.4178) loss 7.9690 (7.3051) grad_norm 3.1839 (2.5741) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:21:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][390/625] eta 0:01:38 lr 0.000811 wd 0.0500 time 0.4012 (0.4171) data time 0.0009 (0.0021) model time 0.4004 (0.4173) loss 6.3044 (7.3050) grad_norm 1.5671 (2.5680) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:21:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][400/625] eta 0:01:33 lr 0.000811 wd 0.0500 time 0.4157 (0.4169) data time 0.0009 (0.0021) model time 0.4149 (0.4170) loss 8.2006 (7.3171) grad_norm 2.2550 (2.5591) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:21:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][410/625] eta 0:01:29 lr 0.000811 wd 0.0500 time 0.3977 (0.4169) data time 0.0007 (0.0020) model time 0.3970 (0.4169) loss 7.0939 (7.3148) grad_norm 1.6494 (2.5588) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:21:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][420/625] eta 0:01:25 lr 0.000811 wd 0.0500 time 0.4118 (0.4166) data time 0.0008 (0.0020) model time 0.4109 (0.4165) loss 7.3221 (7.3164) grad_norm 2.3726 (2.5494) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:21:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][430/625] eta 0:01:21 lr 0.000810 wd 0.0500 time 0.4091 (0.4162) data time 0.0006 (0.0020) model time 0.4084 (0.4162) loss 8.2229 (7.3188) grad_norm 2.3536 (2.5377) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 01:21:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][440/625] eta 0:01:16 lr 0.000810 wd 0.0500 time 0.3964 (0.4159) data time 0.0007 (0.0020) model time 0.3957 (0.4158) loss 6.5566 (7.3097) grad_norm 2.5827 (2.5310) loss_scale 1024.0000 (522.4490) mem 
14939MB [2024-07-25 01:21:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][450/625] eta 0:01:12 lr 0.000810 wd 0.0500 time 0.3991 (0.4156) data time 0.0006 (0.0019) model time 0.3985 (0.4154) loss 5.8311 (7.3110) grad_norm 2.1487 (2.5220) loss_scale 1024.0000 (533.5698) mem 14939MB [2024-07-25 01:21:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][460/625] eta 0:01:08 lr 0.000810 wd 0.0500 time 0.4091 (0.4153) data time 0.0007 (0.0019) model time 0.4084 (0.4151) loss 8.2342 (7.3147) grad_norm 3.5735 (2.5183) loss_scale 1024.0000 (544.2082) mem 14939MB [2024-07-25 01:21:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][470/625] eta 0:01:04 lr 0.000810 wd 0.0500 time 0.3939 (0.4150) data time 0.0008 (0.0019) model time 0.3931 (0.4147) loss 7.0356 (7.3220) grad_norm 4.1007 (2.5350) loss_scale 1024.0000 (554.3949) mem 14939MB [2024-07-25 01:21:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][480/625] eta 0:01:00 lr 0.000810 wd 0.0500 time 0.3980 (0.4147) data time 0.0009 (0.0019) model time 0.3971 (0.4144) loss 7.4959 (7.3249) grad_norm 2.0537 (2.5305) loss_scale 1024.0000 (564.1580) mem 14939MB [2024-07-25 01:21:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][490/625] eta 0:00:55 lr 0.000810 wd 0.0500 time 0.4086 (0.4146) data time 0.0009 (0.0019) model time 0.4077 (0.4142) loss 8.4025 (7.3283) grad_norm 2.0285 (2.5235) loss_scale 1024.0000 (573.5234) mem 14939MB [2024-07-25 01:21:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][500/625] eta 0:00:51 lr 0.000810 wd 0.0500 time 0.3973 (0.4143) data time 0.0009 (0.0018) model time 0.3964 (0.4139) loss 7.3511 (7.3291) grad_norm 1.9064 (2.5139) loss_scale 1024.0000 (582.5150) mem 14939MB [2024-07-25 01:21:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][510/625] eta 0:00:47 lr 0.000810 wd 0.0500 
time 0.5877 (0.4162) data time 0.0009 (0.0018) model time 0.5868 (0.4159) loss 7.2790 (7.3290) grad_norm 2.3680 (2.5152) loss_scale 1024.0000 (591.1546) mem 14939MB [2024-07-25 01:22:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][520/625] eta 0:00:43 lr 0.000810 wd 0.0500 time 0.4081 (0.4174) data time 0.0006 (0.0018) model time 0.4075 (0.4172) loss 6.5657 (7.3281) grad_norm 1.5469 (2.5189) loss_scale 1024.0000 (599.4626) mem 14939MB [2024-07-25 01:22:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][530/625] eta 0:00:39 lr 0.000809 wd 0.0500 time 0.3996 (0.4183) data time 0.0008 (0.0018) model time 0.3988 (0.4182) loss 8.0139 (7.3315) grad_norm 2.2415 (2.5145) loss_scale 1024.0000 (607.4576) mem 14939MB [2024-07-25 01:22:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][540/625] eta 0:00:35 lr 0.000809 wd 0.0500 time 0.3966 (0.4180) data time 0.0008 (0.0018) model time 0.3958 (0.4179) loss 8.6431 (7.3315) grad_norm 1.5488 (2.5042) loss_scale 1024.0000 (615.1571) mem 14939MB [2024-07-25 01:22:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][550/625] eta 0:00:31 lr 0.000809 wd 0.0500 time 0.4070 (0.4178) data time 0.0007 (0.0018) model time 0.4063 (0.4176) loss 7.0522 (7.3293) grad_norm 2.1298 (2.4998) loss_scale 1024.0000 (622.5771) mem 14939MB [2024-07-25 01:22:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][560/625] eta 0:00:27 lr 0.000809 wd 0.0500 time 0.3927 (0.4175) data time 0.0010 (0.0018) model time 0.3917 (0.4173) loss 7.4555 (7.3308) grad_norm 2.3254 (2.4938) loss_scale 1024.0000 (629.7326) mem 14939MB [2024-07-25 01:22:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][570/625] eta 0:00:22 lr 0.000809 wd 0.0500 time 0.4040 (0.4173) data time 0.0009 (0.0017) model time 0.4031 (0.4170) loss 6.6604 (7.3347) grad_norm 4.1239 (2.4919) loss_scale 1024.0000 
(636.6375) mem 14939MB [2024-07-25 01:22:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][580/625] eta 0:00:18 lr 0.000809 wd 0.0500 time 0.4084 (0.4170) data time 0.0009 (0.0017) model time 0.4075 (0.4168) loss 7.4233 (7.3443) grad_norm 2.5706 (2.4880) loss_scale 1024.0000 (643.3046) mem 14939MB [2024-07-25 01:22:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][590/625] eta 0:00:14 lr 0.000809 wd 0.0500 time 0.3935 (0.4168) data time 0.0007 (0.0017) model time 0.3928 (0.4165) loss 7.8676 (7.3505) grad_norm 1.6629 (2.4895) loss_scale 1024.0000 (649.7462) mem 14939MB [2024-07-25 01:22:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][600/625] eta 0:00:10 lr 0.000809 wd 0.0500 time 0.3988 (0.4166) data time 0.0007 (0.0017) model time 0.3981 (0.4162) loss 6.7722 (7.3484) grad_norm 2.1843 (2.4847) loss_scale 1024.0000 (655.9734) mem 14939MB [2024-07-25 01:22:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][610/625] eta 0:00:06 lr 0.000809 wd 0.0500 time 0.4080 (0.4164) data time 0.0006 (0.0017) model time 0.4074 (0.4160) loss 6.6208 (7.3476) grad_norm 2.8619 (2.4914) loss_scale 1024.0000 (661.9967) mem 14939MB [2024-07-25 01:22:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][620/625] eta 0:00:02 lr 0.000809 wd 0.0500 time 0.3928 (0.4161) data time 0.0006 (0.0017) model time 0.3922 (0.4157) loss 6.4183 (7.3439) grad_norm 1.5175 (2.5019) loss_scale 1024.0000 (667.8261) mem 14939MB [2024-07-25 01:22:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 128 training takes 0:04:20 [2024-07-25 01:22:42 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 01:22:43 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 01:22:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.455 (0.455) Loss 0.6167 (0.6167) Acc@1 88.428 (88.428) Acc@5 98.193 (98.193) Mem 14939MB [2024-07-25 01:22:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.087 (0.120) Loss 1.0283 (0.7829) Acc@1 77.344 (84.233) Acc@5 94.678 (97.039) Mem 14939MB [2024-07-25 01:22:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.1523 (0.9190) Acc@1 73.877 (80.662) Acc@5 93.359 (95.499) Mem 14939MB [2024-07-25 01:22:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.332 Acc@5 95.529 [2024-07-25 01:22:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.3% [2024-07-25 01:22:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.787 (0.787) Loss 0.5801 (0.5801) Acc@1 88.770 (88.770) Acc@5 98.438 (98.438) Mem 14939MB [2024-07-25 01:22:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.155) Loss 0.9453 (0.7257) Acc@1 79.541 (85.174) Acc@5 95.215 (97.359) Mem 14939MB [2024-07-25 01:22:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 1.0791 (0.8576) Acc@1 75.000 (81.655) Acc@5 94.092 (95.994) Mem 14939MB [2024-07-25 01:22:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.322 Acc@5 95.973 [2024-07-25 01:22:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.3% [2024-07-25 01:22:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.32% [2024-07-25 01:22:49 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 01:22:50 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 01:22:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][0/625] eta 0:07:55 lr 0.000808 wd 0.0500 time 0.7605 (0.7605) data time 0.3698 (0.3698) model time 0.0000 (0.0000) loss 8.1061 (8.1061) grad_norm 2.1443 (2.1443) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:22:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][10/625] eta 0:04:26 lr 0.000808 wd 0.0500 time 0.3943 (0.4335) data time 0.0009 (0.0346) model time 0.0000 (0.0000) loss 5.7527 (7.4995) grad_norm 1.6227 (2.1417) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:22:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][20/625] eta 0:04:12 lr 0.000808 wd 0.0500 time 0.3941 (0.4176) data time 0.0006 (0.0188) model time 0.0000 (0.0000) loss 8.2395 (7.5771) grad_norm 1.8417 (2.6210) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:23:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][30/625] eta 0:04:05 lr 0.000808 wd 0.0500 time 0.4060 (0.4124) data time 0.0008 (0.0130) model time 0.0000 (0.0000) loss 7.1818 (7.4340) grad_norm 2.2226 (2.5848) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:23:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][40/625] eta 0:03:59 lr 0.000808 wd 0.0500 time 0.3949 (0.4100) data time 0.0009 (0.0101) model time 0.0000 (0.0000) loss 7.6828 (7.5151) grad_norm 2.7883 (2.6545) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:23:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][50/625] eta 0:03:54 lr 0.000808 wd 0.0500 time 0.3963 (0.4083) data time 0.0009 (0.0083) 
model time 0.0000 (0.0000) loss 5.9050 (7.5007) grad_norm 3.9438 (2.7416) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:23:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][60/625] eta 0:03:50 lr 0.000808 wd 0.0500 time 0.4044 (0.4073) data time 0.0007 (0.0071) model time 0.4037 (0.4012) loss 6.4510 (7.3632) grad_norm 1.8853 (2.6489) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:23:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][70/625] eta 0:03:45 lr 0.000808 wd 0.0500 time 0.3956 (0.4066) data time 0.0007 (0.0062) model time 0.3949 (0.4013) loss 6.3364 (7.3709) grad_norm 2.0264 (2.5720) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:23:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][80/625] eta 0:03:41 lr 0.000808 wd 0.0500 time 0.3991 (0.4060) data time 0.0007 (0.0056) model time 0.3984 (0.4011) loss 6.5312 (7.3376) grad_norm 3.6744 (2.5661) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:23:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][90/625] eta 0:03:36 lr 0.000808 wd 0.0500 time 0.4010 (0.4055) data time 0.0008 (0.0051) model time 0.4002 (0.4011) loss 8.5017 (7.3453) grad_norm 2.3542 (2.6157) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:23:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][100/625] eta 0:03:34 lr 0.000807 wd 0.0500 time 0.4008 (0.4083) data time 0.0006 (0.0047) model time 0.4001 (0.4075) loss 6.2011 (7.3262) grad_norm 1.7343 (2.6201) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:23:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][110/625] eta 0:03:34 lr 0.000807 wd 0.0500 time 0.3977 (0.4171) data time 0.0009 (0.0043) model time 0.3968 (0.4237) loss 8.2829 (7.3547) grad_norm 2.5469 (2.6112) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:23:41 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][120/625] eta 0:03:33 lr 0.000807 wd 0.0500 time 0.5932 (0.4235) data time 0.0009 (0.0040) model time 0.5924 (0.4337) loss 7.4514 (7.3683) grad_norm 2.1739 (2.6507) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:23:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][130/625] eta 0:03:30 lr 0.000807 wd 0.0500 time 0.3938 (0.4247) data time 0.0006 (0.0038) model time 0.3932 (0.4343) loss 6.7335 (7.4196) grad_norm 3.2731 (2.6984) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:23:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][140/625] eta 0:03:25 lr 0.000807 wd 0.0500 time 0.4046 (0.4234) data time 0.0007 (0.0036) model time 0.4039 (0.4310) loss 6.1325 (7.3857) grad_norm 2.4763 (2.6714) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:23:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][150/625] eta 0:03:20 lr 0.000807 wd 0.0500 time 0.4021 (0.4222) data time 0.0006 (0.0034) model time 0.4014 (0.4283) loss 7.8092 (7.3958) grad_norm 1.5932 (2.6346) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:23:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][160/625] eta 0:03:16 lr 0.000807 wd 0.0500 time 0.3939 (0.4218) data time 0.0009 (0.0033) model time 0.3930 (0.4272) loss 7.0764 (7.4071) grad_norm 2.2825 (2.6044) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:24:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][170/625] eta 0:03:11 lr 0.000807 wd 0.0500 time 0.4176 (0.4211) data time 0.0008 (0.0032) model time 0.4167 (0.4256) loss 7.5447 (7.4139) grad_norm 2.1366 (2.5900) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:24:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][180/625] eta 0:03:07 lr 0.000807 wd 0.0500 time 0.4075 (0.4204) 
data time 0.0007 (0.0030) model time 0.4069 (0.4242) loss 6.8826 (7.4210) grad_norm 3.6340 (2.6021) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:24:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][190/625] eta 0:03:02 lr 0.000807 wd 0.0500 time 0.3975 (0.4195) data time 0.0008 (0.0029) model time 0.3967 (0.4226) loss 6.6907 (7.4230) grad_norm 2.0566 (2.5951) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:24:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][200/625] eta 0:02:57 lr 0.000806 wd 0.0500 time 0.4038 (0.4187) data time 0.0008 (0.0028) model time 0.4029 (0.4213) loss 7.0837 (7.4084) grad_norm 2.7626 (2.5700) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:24:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][210/625] eta 0:02:53 lr 0.000806 wd 0.0500 time 0.4169 (0.4180) data time 0.0008 (0.0028) model time 0.4160 (0.4201) loss 9.2024 (7.3907) grad_norm 3.2287 (2.5875) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:24:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][220/625] eta 0:02:49 lr 0.000806 wd 0.0500 time 0.3937 (0.4174) data time 0.0006 (0.0027) model time 0.3931 (0.4191) loss 8.0749 (7.3786) grad_norm 3.0195 (2.5993) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:24:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][230/625] eta 0:02:44 lr 0.000806 wd 0.0500 time 0.4027 (0.4167) data time 0.0006 (0.0026) model time 0.4021 (0.4182) loss 7.6513 (7.3642) grad_norm 3.3878 (2.5807) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:24:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][240/625] eta 0:02:40 lr 0.000806 wd 0.0500 time 0.4095 (0.4163) data time 0.0008 (0.0025) model time 0.4087 (0.4174) loss 6.6950 (7.3885) grad_norm 2.5183 (2.5589) loss_scale 1024.0000 (1024.0000) mem 14939MB 
[2024-07-25 01:24:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][250/625] eta 0:02:35 lr 0.000806 wd 0.0500 time 0.3958 (0.4157) data time 0.0006 (0.0025) model time 0.3953 (0.4166) loss 7.0288 (7.4121) grad_norm 1.8403 (2.5336) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:24:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][260/625] eta 0:02:31 lr 0.000806 wd 0.0500 time 0.4044 (0.4153) data time 0.0005 (0.0024) model time 0.4038 (0.4161) loss 8.2303 (7.4149) grad_norm 1.6632 (2.5199) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:24:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][270/625] eta 0:02:27 lr 0.000806 wd 0.0500 time 0.4086 (0.4149) data time 0.0007 (0.0023) model time 0.4079 (0.4155) loss 8.0817 (7.4136) grad_norm 2.1430 (2.5121) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:24:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][280/625] eta 0:02:23 lr 0.000806 wd 0.0500 time 0.3966 (0.4146) data time 0.0009 (0.0023) model time 0.3957 (0.4150) loss 7.3540 (7.4114) grad_norm 1.9085 (2.5103) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:24:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][290/625] eta 0:02:18 lr 0.000806 wd 0.0500 time 0.4032 (0.4142) data time 0.0006 (0.0022) model time 0.4026 (0.4145) loss 7.5636 (7.4187) grad_norm 2.2526 (2.5183) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:24:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][300/625] eta 0:02:14 lr 0.000805 wd 0.0500 time 0.4114 (0.4139) data time 0.0008 (0.0022) model time 0.4105 (0.4141) loss 7.7873 (7.4102) grad_norm 1.7589 (2.4923) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:24:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][310/625] eta 0:02:10 lr 0.000805 wd 0.0500 
time 0.3951 (0.4135) data time 0.0006 (0.0022) model time 0.3944 (0.4136) loss 7.4266 (7.4146) grad_norm 2.7886 (2.4861) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:25:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][320/625] eta 0:02:06 lr 0.000805 wd 0.0500 time 0.5937 (0.4143) data time 0.0007 (0.0021) model time 0.5929 (0.4144) loss 9.1896 (7.4118) grad_norm 2.4334 (2.4848) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:25:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][330/625] eta 0:02:02 lr 0.000805 wd 0.0500 time 0.4067 (0.4169) data time 0.0006 (0.0021) model time 0.4061 (0.4176) loss 6.6054 (7.4089) grad_norm 1.5627 (2.4799) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:25:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][340/625] eta 0:01:59 lr 0.000805 wd 0.0500 time 0.4099 (0.4185) data time 0.0008 (0.0021) model time 0.4091 (0.4193) loss 6.6403 (7.3992) grad_norm 1.9852 (2.4671) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:25:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][350/625] eta 0:01:55 lr 0.000805 wd 0.0500 time 0.4058 (0.4186) data time 0.0008 (0.0020) model time 0.4050 (0.4194) loss 7.1814 (7.3970) grad_norm 1.9090 (2.4636) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:25:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][360/625] eta 0:01:50 lr 0.000805 wd 0.0500 time 0.3949 (0.4182) data time 0.0010 (0.0020) model time 0.3939 (0.4189) loss 8.8770 (7.4081) grad_norm 3.5019 (2.4684) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:25:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][370/625] eta 0:01:46 lr 0.000805 wd 0.0500 time 0.4015 (0.4178) data time 0.0006 (0.0020) model time 0.4009 (0.4184) loss 7.1721 (7.4073) grad_norm 1.7181 (2.4641) loss_scale 1024.0000 
(1024.0000) mem 14939MB [2024-07-25 01:25:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][380/625] eta 0:01:42 lr 0.000805 wd 0.0500 time 0.4117 (0.4179) data time 0.0008 (0.0020) model time 0.4109 (0.4184) loss 8.0115 (7.4134) grad_norm 2.1459 (2.4635) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:25:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][390/625] eta 0:01:38 lr 0.000805 wd 0.0500 time 0.3934 (0.4175) data time 0.0009 (0.0019) model time 0.3925 (0.4179) loss 6.3871 (7.4157) grad_norm 2.4907 (2.4593) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:25:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][400/625] eta 0:01:33 lr 0.000804 wd 0.0500 time 0.3961 (0.4171) data time 0.0008 (0.0019) model time 0.3954 (0.4175) loss 7.4806 (7.4078) grad_norm 2.1331 (2.4493) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:25:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][410/625] eta 0:01:29 lr 0.000804 wd 0.0500 time 0.4070 (0.4168) data time 0.0008 (0.0019) model time 0.4062 (0.4170) loss 8.1744 (7.4199) grad_norm 1.6530 (2.4378) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:25:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][420/625] eta 0:01:25 lr 0.000804 wd 0.0500 time 0.3955 (0.4165) data time 0.0006 (0.0019) model time 0.3949 (0.4166) loss 7.0215 (7.4066) grad_norm 3.8640 (2.4490) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:25:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][430/625] eta 0:01:21 lr 0.000804 wd 0.0500 time 0.4029 (0.4162) data time 0.0010 (0.0019) model time 0.4019 (0.4162) loss 7.1852 (7.4040) grad_norm 1.5134 (2.4770) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:25:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][440/625] eta 0:01:16 
lr 0.000804 wd 0.0500 time 0.4041 (0.4161) data time 0.0009 (0.0018) model time 0.4032 (0.4161) loss 8.2579 (7.4150) grad_norm 2.1988 (2.4778) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:25:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][450/625] eta 0:01:12 lr 0.000804 wd 0.0500 time 0.3949 (0.4160) data time 0.0007 (0.0018) model time 0.3942 (0.4160) loss 6.5455 (7.4152) grad_norm 2.4095 (2.4752) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:26:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][460/625] eta 0:01:08 lr 0.000804 wd 0.0500 time 0.4059 (0.4159) data time 0.0006 (0.0018) model time 0.4053 (0.4158) loss 7.4170 (7.4154) grad_norm 2.2792 (2.4757) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:26:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][470/625] eta 0:01:04 lr 0.000804 wd 0.0500 time 0.4148 (0.4157) data time 0.0010 (0.0018) model time 0.4138 (0.4155) loss 6.1104 (7.4154) grad_norm 3.0786 (2.4752) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:26:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][480/625] eta 0:01:00 lr 0.000804 wd 0.0500 time 0.3956 (0.4154) data time 0.0007 (0.0018) model time 0.3949 (0.4153) loss 7.7065 (7.4101) grad_norm 2.1222 (2.4707) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:26:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][490/625] eta 0:00:56 lr 0.000804 wd 0.0500 time 0.4120 (0.4152) data time 0.0006 (0.0018) model time 0.4114 (0.4150) loss 8.0721 (7.4129) grad_norm 2.8127 (2.4668) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:26:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][500/625] eta 0:00:51 lr 0.000803 wd 0.0500 time 0.4266 (0.4150) data time 0.0008 (0.0017) model time 0.4258 (0.4148) loss 7.4364 (7.4074) grad_norm 2.1263 (2.4601) 
loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:26:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][510/625] eta 0:00:47 lr 0.000803 wd 0.0500 time 0.3929 (0.4149) data time 0.0007 (0.0017) model time 0.3922 (0.4147) loss 6.9518 (7.4098) grad_norm 1.5976 (2.4489) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:26:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][520/625] eta 0:00:43 lr 0.000803 wd 0.0500 time 0.4025 (0.4149) data time 0.0010 (0.0017) model time 0.4015 (0.4146) loss 7.1040 (7.4080) grad_norm 1.7370 (2.4425) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:26:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][530/625] eta 0:00:39 lr 0.000803 wd 0.0500 time 0.4079 (0.4146) data time 0.0008 (0.0017) model time 0.4072 (0.4143) loss 7.6815 (7.4152) grad_norm 2.4287 (2.4357) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:26:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][540/625] eta 0:00:35 lr 0.000803 wd 0.0500 time 0.5601 (0.4149) data time 0.0008 (0.0017) model time 0.5593 (0.4146) loss 7.4816 (7.4230) grad_norm 2.6255 (2.4388) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:26:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][550/625] eta 0:00:31 lr 0.000803 wd 0.0500 time 0.6106 (0.4169) data time 0.0009 (0.0017) model time 0.6097 (0.4167) loss 6.3299 (7.4249) grad_norm 2.0127 (2.4603) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:26:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][560/625] eta 0:00:27 lr 0.000803 wd 0.0500 time 0.6128 (0.4182) data time 0.0007 (0.0017) model time 0.6121 (0.4182) loss 6.3092 (7.4204) grad_norm 2.3309 (2.4649) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:26:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[129/300][570/625] eta 0:00:23 lr 0.000803 wd 0.0500 time 0.3956 (0.4182) data time 0.0009 (0.0017) model time 0.3947 (0.4182) loss 7.7072 (7.4232) grad_norm 3.1406 (2.4738) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:26:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][580/625] eta 0:00:18 lr 0.000803 wd 0.0500 time 0.4017 (0.4179) data time 0.0008 (0.0017) model time 0.4008 (0.4178) loss 6.6127 (7.4251) grad_norm 3.6830 (2.4910) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:26:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][590/625] eta 0:00:14 lr 0.000803 wd 0.0500 time 0.4070 (0.4176) data time 0.0007 (0.0016) model time 0.4063 (0.4175) loss 8.8949 (7.4241) grad_norm 2.2117 (2.4877) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:27:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][600/625] eta 0:00:10 lr 0.000802 wd 0.0500 time 0.4020 (0.4177) data time 0.0007 (0.0016) model time 0.4013 (0.4176) loss 7.4583 (7.4160) grad_norm 1.8149 (2.4848) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:27:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][610/625] eta 0:00:06 lr 0.000802 wd 0.0500 time 0.4140 (0.4175) data time 0.0006 (0.0016) model time 0.4133 (0.4173) loss 8.2670 (7.4193) grad_norm 1.7097 (2.4757) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:27:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][620/625] eta 0:00:02 lr 0.000802 wd 0.0500 time 0.3939 (0.4172) data time 0.0005 (0.0016) model time 0.3935 (0.4170) loss 8.6956 (7.4168) grad_norm 1.8217 (2.4710) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:27:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 129 training takes 0:04:20 [2024-07-25 01:27:10 vssd_mesa_retrain_small_e300] (utils.py 118): INFO 
./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 01:27:11 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 01:27:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.456 (0.456) Loss 0.6011 (0.6011) Acc@1 89.062 (89.062) Acc@5 98.291 (98.291) Mem 14939MB [2024-07-25 01:27:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 1.0283 (0.7669) Acc@1 77.979 (84.615) Acc@5 94.678 (97.088) Mem 14939MB [2024-07-25 01:27:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.1104 (0.9036) Acc@1 73.779 (80.990) Acc@5 93.848 (95.678) Mem 14939MB [2024-07-25 01:27:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.628 Acc@5 95.673 [2024-07-25 01:27:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.6% [2024-07-25 01:27:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.63% [2024-07-25 01:27:14 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 01:27:15 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 01:27:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.447 (0.447) Loss 0.5786 (0.5786) Acc@1 88.770 (88.770) Acc@5 98.486 (98.486) Mem 14939MB [2024-07-25 01:27:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.9429 (0.7247) Acc@1 79.688 (85.183) Acc@5 95.215 (97.363) Mem 14939MB [2024-07-25 01:27:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.0781 (0.8565) Acc@1 75.000 (81.664) Acc@5 94.092 (96.015) Mem 14939MB [2024-07-25 01:27:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.332 Acc@5 95.989 [2024-07-25 01:27:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.3% [2024-07-25 01:27:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.33% [2024-07-25 01:27:17 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 01:27:18 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 01:27:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][0/625] eta 0:07:46 lr 0.000802 wd 0.0500 time 0.7466 (0.7466) data time 0.3641 (0.3641) model time 0.0000 (0.0000) loss 7.9335 (7.9335) grad_norm 2.0545 (2.0545) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:27:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][10/625] eta 0:04:26 lr 0.000802 wd 0.0500 time 0.4061 (0.4326) data time 0.0008 (0.0339) model time 0.0000 (0.0000) loss 8.0360 (7.6178) grad_norm 1.8102 (2.0679) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:27:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][20/625] eta 0:04:12 lr 0.000802 wd 0.0500 time 0.3977 (0.4172) data time 0.0007 (0.0182) model time 0.0000 (0.0000) loss 7.5294 (7.4565) grad_norm 1.6690 (2.6029) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:27:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][30/625] eta 0:04:05 lr 0.000802 wd 0.0500 time 0.4061 (0.4124) data time 0.0008 (0.0126) model time 0.0000 (0.0000) loss 6.6576 (7.3445) grad_norm 2.4046 (2.4715) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:27:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][40/625] eta 0:04:00 lr 0.000802 wd 0.0500 time 0.3985 (0.4106) data time 0.0007 (0.0098) model time 0.0000 (0.0000) loss 8.0136 (7.3621) grad_norm 1.5092 (2.3751) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:27:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][50/625] eta 0:03:55 lr 0.000802 wd 0.0500 time 0.4085 (0.4094) data time 0.0007 (0.0081) model time 0.0000 (0.0000) loss 8.3228 (7.4867) grad_norm 2.0053 (2.3463) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:27:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][60/625] eta 0:03:50 lr 0.000802 wd 0.0500 time 
0.4103 (0.4087) data time 0.0006 (0.0069) model time 0.4097 (0.4042) loss 7.4190 (7.4608) grad_norm 2.5440 (2.3928) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:27:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][70/625] eta 0:03:46 lr 0.000801 wd 0.0500 time 0.3961 (0.4076) data time 0.0009 (0.0061) model time 0.3952 (0.4021) loss 8.4774 (7.4786) grad_norm 2.2845 (2.4002) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:27:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][80/625] eta 0:03:41 lr 0.000801 wd 0.0500 time 0.4004 (0.4069) data time 0.0007 (0.0054) model time 0.3997 (0.4017) loss 8.3464 (7.4935) grad_norm 1.7459 (2.4426) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:27:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][90/625] eta 0:03:37 lr 0.000801 wd 0.0500 time 0.4380 (0.4070) data time 0.0008 (0.0050) model time 0.4372 (0.4028) loss 8.7277 (7.4490) grad_norm 2.4686 (2.3973) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:27:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][100/625] eta 0:03:34 lr 0.000801 wd 0.0500 time 0.3988 (0.4078) data time 0.0008 (0.0047) model time 0.3980 (0.4049) loss 7.1304 (7.4377) grad_norm 3.6383 (2.4407) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:28:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][110/625] eta 0:03:29 lr 0.000801 wd 0.0500 time 0.3980 (0.4072) data time 0.0007 (0.0043) model time 0.3973 (0.4042) loss 7.4349 (7.4453) grad_norm 3.6911 (2.4431) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:28:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][120/625] eta 0:03:25 lr 0.000801 wd 0.0500 time 0.4042 (0.4068) data time 0.0009 (0.0041) model time 0.4032 (0.4037) loss 7.5301 (7.4576) grad_norm 1.8281 (2.4394) loss_scale 1024.0000 (1024.0000) 
mem 14939MB [2024-07-25 01:28:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][130/625] eta 0:03:21 lr 0.000801 wd 0.0500 time 0.3942 (0.4063) data time 0.0009 (0.0038) model time 0.3933 (0.4032) loss 6.3025 (7.4348) grad_norm 3.4515 (2.4869) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:28:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][140/625] eta 0:03:19 lr 0.000801 wd 0.0500 time 0.5921 (0.4121) data time 0.0007 (0.0036) model time 0.5914 (0.4125) loss 7.9885 (7.4436) grad_norm 3.1969 (2.4771) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:28:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][150/625] eta 0:03:18 lr 0.000801 wd 0.0500 time 0.6038 (0.4174) data time 0.0007 (0.0034) model time 0.6032 (0.4204) loss 7.8595 (7.4492) grad_norm 3.4939 (2.4605) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:28:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][160/625] eta 0:03:15 lr 0.000801 wd 0.0500 time 0.6117 (0.4208) data time 0.0007 (0.0033) model time 0.6110 (0.4251) loss 7.0908 (7.4638) grad_norm 3.9102 (2.4744) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:28:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][170/625] eta 0:03:11 lr 0.000800 wd 0.0500 time 0.3947 (0.4206) data time 0.0007 (0.0031) model time 0.3940 (0.4244) loss 7.0805 (7.4381) grad_norm 3.5540 (2.4964) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:28:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][180/625] eta 0:03:06 lr 0.000800 wd 0.0500 time 0.4013 (0.4196) data time 0.0006 (0.0030) model time 0.4007 (0.4226) loss 6.3323 (7.4159) grad_norm 2.7606 (2.5088) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:28:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][190/625] eta 0:03:02 lr 0.000800 
wd 0.0500 time 0.4034 (0.4188) data time 0.0007 (0.0029) model time 0.4027 (0.4211) loss 7.3631 (7.4000) grad_norm 3.4760 (2.4936) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:28:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][200/625] eta 0:02:57 lr 0.000800 wd 0.0500 time 0.3971 (0.4179) data time 0.0007 (0.0028) model time 0.3964 (0.4198) loss 6.5028 (7.3992) grad_norm 1.9782 (2.4965) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:28:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][210/625] eta 0:02:53 lr 0.000800 wd 0.0500 time 0.4002 (0.4173) data time 0.0008 (0.0027) model time 0.3994 (0.4188) loss 8.2326 (7.3938) grad_norm 1.5571 (2.4714) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:28:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][220/625] eta 0:02:48 lr 0.000800 wd 0.0500 time 0.4078 (0.4169) data time 0.0008 (0.0027) model time 0.4069 (0.4181) loss 7.8582 (7.3994) grad_norm 1.8958 (2.4583) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:28:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][230/625] eta 0:02:44 lr 0.000800 wd 0.0500 time 0.3980 (0.4163) data time 0.0009 (0.0026) model time 0.3971 (0.4172) loss 7.7364 (7.4073) grad_norm 2.1630 (2.4396) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:28:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][240/625] eta 0:02:40 lr 0.000800 wd 0.0500 time 0.4010 (0.4158) data time 0.0007 (0.0025) model time 0.4002 (0.4165) loss 6.0013 (7.3936) grad_norm 1.6516 (2.4312) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:29:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][250/625] eta 0:02:35 lr 0.000800 wd 0.0500 time 0.4060 (0.4152) data time 0.0006 (0.0025) model time 0.4053 (0.4156) loss 6.3424 (7.3967) grad_norm 2.8429 (2.4404) loss_scale 
1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:29:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][260/625] eta 0:02:31 lr 0.000800 wd 0.0500 time 0.3938 (0.4147) data time 0.0006 (0.0024) model time 0.3933 (0.4150) loss 7.2991 (7.3929) grad_norm 2.0125 (2.4359) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:29:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][270/625] eta 0:02:27 lr 0.000799 wd 0.0500 time 0.3930 (0.4142) data time 0.0009 (0.0023) model time 0.3921 (0.4143) loss 7.9258 (7.3871) grad_norm 3.0728 (2.4364) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:29:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][280/625] eta 0:02:22 lr 0.000799 wd 0.0500 time 0.4090 (0.4138) data time 0.0009 (0.0023) model time 0.4082 (0.4138) loss 7.2534 (7.3914) grad_norm 2.7030 (2.4498) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:29:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][290/625] eta 0:02:18 lr 0.000799 wd 0.0500 time 0.3961 (0.4134) data time 0.0007 (0.0022) model time 0.3955 (0.4133) loss 6.1128 (7.3950) grad_norm 1.7633 (2.4554) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:29:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][300/625] eta 0:02:14 lr 0.000799 wd 0.0500 time 0.3994 (0.4130) data time 0.0008 (0.0022) model time 0.3986 (0.4127) loss 6.2347 (7.3717) grad_norm 2.3050 (2.4456) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:29:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][310/625] eta 0:02:09 lr 0.000799 wd 0.0500 time 0.4085 (0.4127) data time 0.0008 (0.0022) model time 0.4076 (0.4123) loss 8.3719 (7.3830) grad_norm 1.6734 (2.4336) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:29:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][320/625] 
eta 0:02:05 lr 0.000799 wd 0.0500 time 0.3943 (0.4129) data time 0.0007 (0.0021) model time 0.3936 (0.4125) loss 6.5106 (7.3904) grad_norm 3.1883 (2.4167) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:29:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][330/625] eta 0:02:01 lr 0.000799 wd 0.0500 time 0.4013 (0.4126) data time 0.0009 (0.0021) model time 0.4004 (0.4121) loss 8.1478 (7.3826) grad_norm 2.5118 (2.4153) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:29:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][340/625] eta 0:01:57 lr 0.000799 wd 0.0500 time 0.4056 (0.4123) data time 0.0007 (0.0021) model time 0.4049 (0.4118) loss 6.7809 (7.3831) grad_norm 2.2577 (2.4114) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:29:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][350/625] eta 0:01:53 lr 0.000799 wd 0.0500 time 0.4016 (0.4121) data time 0.0008 (0.0021) model time 0.4008 (0.4115) loss 6.8837 (7.3932) grad_norm 2.4794 (2.4028) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:29:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][360/625] eta 0:01:49 lr 0.000799 wd 0.0500 time 0.5644 (0.4136) data time 0.0008 (0.0020) model time 0.5636 (0.4133) loss 7.5186 (7.3927) grad_norm 1.5442 (2.3921) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:29:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][370/625] eta 0:01:46 lr 0.000798 wd 0.0500 time 0.6011 (0.4166) data time 0.0006 (0.0020) model time 0.6005 (0.4167) loss 7.5869 (7.3838) grad_norm 3.7789 (2.3876) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:29:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][380/625] eta 0:01:42 lr 0.000798 wd 0.0500 time 0.4005 (0.4175) data time 0.0006 (0.0020) model time 0.3999 (0.4177) loss 7.3964 (7.3815) grad_norm 2.0246 
(2.3789) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:30:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][390/625] eta 0:01:38 lr 0.000798 wd 0.0500 time 0.4136 (0.4174) data time 0.0009 (0.0019) model time 0.4127 (0.4176) loss 7.7970 (7.3831) grad_norm 2.3897 (2.3780) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:30:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][400/625] eta 0:01:33 lr 0.000798 wd 0.0500 time 0.4099 (0.4170) data time 0.0009 (0.0019) model time 0.4090 (0.4171) loss 6.5703 (7.3908) grad_norm 3.8479 (2.3953) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:30:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][410/625] eta 0:01:29 lr 0.000798 wd 0.0500 time 0.3944 (0.4166) data time 0.0009 (0.0019) model time 0.3935 (0.4166) loss 8.2567 (7.3909) grad_norm 2.0149 (2.3914) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:30:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][420/625] eta 0:01:25 lr 0.000798 wd 0.0500 time 0.3965 (0.4162) data time 0.0008 (0.0019) model time 0.3957 (0.4162) loss 7.7406 (7.3875) grad_norm 2.0677 (2.3825) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:30:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][430/625] eta 0:01:21 lr 0.000798 wd 0.0500 time 0.4088 (0.4159) data time 0.0008 (0.0019) model time 0.4080 (0.4157) loss 8.4545 (7.3848) grad_norm 7.3653 (2.3929) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:30:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][440/625] eta 0:01:16 lr 0.000798 wd 0.0500 time 0.3946 (0.4155) data time 0.0006 (0.0018) model time 0.3940 (0.4153) loss 7.6915 (7.3917) grad_norm 1.9922 (2.3972) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:30:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[130/300][450/625] eta 0:01:12 lr 0.000798 wd 0.0500 time 0.3968 (0.4152) data time 0.0006 (0.0018) model time 0.3962 (0.4149) loss 6.3099 (7.3851) grad_norm 2.0842 (2.3993) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:30:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][460/625] eta 0:01:08 lr 0.000798 wd 0.0500 time 0.4082 (0.4149) data time 0.0006 (0.0018) model time 0.4075 (0.4146) loss 6.7241 (7.3797) grad_norm 1.4797 (2.3919) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:30:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][470/625] eta 0:01:04 lr 0.000797 wd 0.0500 time 0.3933 (0.4146) data time 0.0008 (0.0018) model time 0.3925 (0.4142) loss 6.9932 (7.3687) grad_norm 1.8206 (2.3840) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:30:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][480/625] eta 0:01:00 lr 0.000797 wd 0.0500 time 0.4006 (0.4143) data time 0.0007 (0.0018) model time 0.4000 (0.4139) loss 8.4205 (7.3701) grad_norm 1.8434 (2.3817) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:30:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][490/625] eta 0:00:55 lr 0.000797 wd 0.0500 time 0.4072 (0.4141) data time 0.0009 (0.0017) model time 0.4063 (0.4136) loss 8.1296 (7.3705) grad_norm 1.6903 (2.3769) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:30:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][500/625] eta 0:00:51 lr 0.000797 wd 0.0500 time 0.3956 (0.4138) data time 0.0007 (0.0017) model time 0.3949 (0.4133) loss 6.7401 (7.3794) grad_norm 2.5357 (2.3729) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:30:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][510/625] eta 0:00:47 lr 0.000797 wd 0.0500 time 0.3949 (0.4136) data time 0.0007 (0.0017) model time 0.3942 (0.4130) loss 6.7302 
(7.3883) grad_norm 2.1522 (2.3755) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:30:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][520/625] eta 0:00:43 lr 0.000797 wd 0.0500 time 0.4090 (0.4134) data time 0.0009 (0.0017) model time 0.4081 (0.4128) loss 6.4835 (7.3857) grad_norm 3.1055 (2.3735) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:30:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][530/625] eta 0:00:39 lr 0.000797 wd 0.0500 time 0.4013 (0.4132) data time 0.0009 (0.0017) model time 0.4004 (0.4126) loss 6.6014 (7.3740) grad_norm 1.8958 (2.3694) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:31:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][540/625] eta 0:00:35 lr 0.000797 wd 0.0500 time 0.3904 (0.4134) data time 0.0007 (0.0017) model time 0.3897 (0.4128) loss 7.7072 (7.3665) grad_norm 2.7407 (2.3711) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:31:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][550/625] eta 0:00:30 lr 0.000797 wd 0.0500 time 0.3955 (0.4132) data time 0.0006 (0.0017) model time 0.3949 (0.4126) loss 5.8807 (7.3622) grad_norm 2.0847 (2.3686) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:31:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][560/625] eta 0:00:26 lr 0.000797 wd 0.0500 time 0.3979 (0.4130) data time 0.0006 (0.0016) model time 0.3973 (0.4124) loss 7.2713 (7.3652) grad_norm 1.4960 (2.3675) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:31:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][570/625] eta 0:00:22 lr 0.000796 wd 0.0500 time 0.4045 (0.4128) data time 0.0007 (0.0016) model time 0.4038 (0.4121) loss 7.8990 (7.3661) grad_norm 4.4724 (2.3842) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:31:19 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [130/300][580/625] eta 0:00:18 lr 0.000796 wd 0.0500 time 0.5758 (0.4138) data time 0.0007 (0.0016) model time 0.5751 (0.4132) loss 7.3982 (7.3622) grad_norm 2.0281 (2.3837) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:31:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][590/625] eta 0:00:14 lr 0.000796 wd 0.0500 time 0.4145 (0.4152) data time 0.0007 (0.0016) model time 0.4138 (0.4147) loss 6.2421 (7.3650) grad_norm 1.5822 (2.3798) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:31:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][600/625] eta 0:00:10 lr 0.000796 wd 0.0500 time 0.5702 (0.4166) data time 0.0010 (0.0016) model time 0.5691 (0.4163) loss 6.6222 (7.3617) grad_norm 2.1052 (2.3802) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:31:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][610/625] eta 0:00:06 lr 0.000796 wd 0.0500 time 0.3944 (0.4167) data time 0.0004 (0.0016) model time 0.3940 (0.4164) loss 8.3328 (7.3579) grad_norm 4.4417 (2.3843) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:31:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][620/625] eta 0:00:02 lr 0.000796 wd 0.0500 time 0.3930 (0.4164) data time 0.0006 (0.0016) model time 0.3924 (0.4160) loss 7.1932 (7.3614) grad_norm 1.6800 (2.3893) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:31:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 130 training takes 0:04:20 [2024-07-25 01:31:38 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 01:31:39 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 01:31:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.5986 (0.5986) Acc@1 87.695 (87.695) Acc@5 98.291 (98.291) Mem 14939MB [2024-07-25 01:31:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.9966 (0.7525) Acc@1 77.539 (84.388) Acc@5 94.922 (97.141) Mem 14939MB [2024-07-25 01:31:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0938 (0.8871) Acc@1 74.707 (80.843) Acc@5 93.652 (95.650) Mem 14939MB [2024-07-25 01:31:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.492 Acc@5 95.575 [2024-07-25 01:31:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.5% [2024-07-25 01:31:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.837 (0.837) Loss 0.5781 (0.5781) Acc@1 88.770 (88.770) Acc@5 98.486 (98.486) Mem 14939MB [2024-07-25 01:31:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.156) Loss 0.9409 (0.7238) Acc@1 79.834 (85.249) Acc@5 95.264 (97.394) Mem 14939MB [2024-07-25 01:31:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.123) Loss 1.0771 (0.8554) Acc@1 74.902 (81.750) Acc@5 93.994 (96.019) Mem 14939MB [2024-07-25 01:31:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.406 Acc@5 95.987 [2024-07-25 01:31:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.4% [2024-07-25 01:31:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.41% [2024-07-25 01:31:45 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 01:31:46 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 01:31:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][0/625] eta 0:08:17 lr 0.000796 wd 0.0500 time 0.7965 (0.7965) data time 0.4031 (0.4031) model time 0.0000 (0.0000) loss 8.0910 (8.0910) grad_norm 3.3059 (3.3059) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:31:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][10/625] eta 0:04:28 lr 0.000796 wd 0.0500 time 0.3995 (0.4369) data time 0.0008 (0.0375) model time 0.0000 (0.0000) loss 7.7436 (6.8221) grad_norm 2.2883 (2.9410) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:31:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][20/625] eta 0:04:14 lr 0.000796 wd 0.0500 time 0.3979 (0.4201) data time 0.0008 (0.0201) model time 0.0000 (0.0000) loss 8.2815 (7.1376) grad_norm 2.3312 (2.7693) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:31:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][30/625] eta 0:04:06 lr 0.000796 wd 0.0500 time 0.4069 (0.4142) data time 0.0006 (0.0139) model time 0.0000 (0.0000) loss 6.4014 (7.1505) grad_norm 2.2198 (2.7090) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:32:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][40/625] eta 0:04:00 lr 0.000795 wd 0.0500 time 0.4082 (0.4110) data time 0.0008 (0.0108) model time 0.0000 (0.0000) loss 7.6932 (7.3131) grad_norm 2.2466 (2.6363) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:32:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][50/625] eta 0:03:55 lr 0.000795 wd 0.0500 time 0.3963 (0.4088) data time 0.0009 (0.0088) model time 0.0000 (0.0000) loss 6.5754 (7.2557) grad_norm 3.9907 (2.7652) loss_scale 1024.0000 
(1024.0000) mem 14939MB [2024-07-25 01:32:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][60/625] eta 0:03:50 lr 0.000795 wd 0.0500 time 0.4076 (0.4074) data time 0.0008 (0.0076) model time 0.4068 (0.3990) loss 8.3033 (7.3048) grad_norm 2.8810 (2.8211) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:32:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][70/625] eta 0:03:46 lr 0.000795 wd 0.0500 time 0.5425 (0.4084) data time 0.0007 (0.0066) model time 0.5418 (0.4064) loss 5.7142 (7.2362) grad_norm 1.4609 (2.7155) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:32:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][80/625] eta 0:03:41 lr 0.000795 wd 0.0500 time 0.3964 (0.4071) data time 0.0008 (0.0059) model time 0.3956 (0.4032) loss 7.9764 (7.2600) grad_norm 2.0606 (2.6358) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:32:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][90/625] eta 0:03:37 lr 0.000795 wd 0.0500 time 0.4113 (0.4066) data time 0.0009 (0.0054) model time 0.4104 (0.4029) loss 5.3660 (7.2204) grad_norm 2.9004 (2.5670) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:32:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][100/625] eta 0:03:33 lr 0.000795 wd 0.0500 time 0.3959 (0.4061) data time 0.0008 (0.0049) model time 0.3951 (0.4023) loss 6.4984 (7.2118) grad_norm 3.1528 (2.5330) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:32:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][110/625] eta 0:03:28 lr 0.000795 wd 0.0500 time 0.4004 (0.4057) data time 0.0008 (0.0046) model time 0.3996 (0.4021) loss 6.1530 (7.2380) grad_norm 2.7714 (2.4991) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:32:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][120/625] eta 0:03:24 lr 
0.000795 wd 0.0500 time 0.4157 (0.4053) data time 0.0006 (0.0043) model time 0.4150 (0.4018) loss 6.6944 (7.2385) grad_norm 2.9452 (2.4644) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:32:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][130/625] eta 0:03:20 lr 0.000795 wd 0.0500 time 0.3939 (0.4049) data time 0.0009 (0.0040) model time 0.3931 (0.4015) loss 8.2202 (7.2365) grad_norm 2.6633 (2.4300) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:32:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][140/625] eta 0:03:16 lr 0.000794 wd 0.0500 time 0.3953 (0.4047) data time 0.0009 (0.0038) model time 0.3944 (0.4014) loss 8.9684 (7.2832) grad_norm 2.1128 (2.4228) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:32:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][150/625] eta 0:03:12 lr 0.000794 wd 0.0500 time 0.4071 (0.4044) data time 0.0011 (0.0036) model time 0.4060 (0.4012) loss 5.9560 (7.2853) grad_norm 1.7988 (2.4506) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:32:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][160/625] eta 0:03:07 lr 0.000794 wd 0.0500 time 0.3956 (0.4042) data time 0.0007 (0.0034) model time 0.3949 (0.4011) loss 7.3068 (7.3077) grad_norm 1.9064 (2.4742) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:32:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][170/625] eta 0:03:04 lr 0.000794 wd 0.0500 time 0.3977 (0.4050) data time 0.0008 (0.0033) model time 0.3968 (0.4025) loss 7.5511 (7.3261) grad_norm 1.9788 (2.5047) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:33:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][180/625] eta 0:03:03 lr 0.000794 wd 0.0500 time 0.5870 (0.4123) data time 0.0009 (0.0032) model time 0.5860 (0.4127) loss 8.3460 (7.3406) grad_norm 2.0598 (2.5016) 
loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:33:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][190/625] eta 0:03:01 lr 0.000794 wd 0.0500 time 0.3939 (0.4178) data time 0.0009 (0.0030) model time 0.3930 (0.4202) loss 8.2006 (7.3571) grad_norm 2.5276 (2.5130) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:33:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][200/625] eta 0:02:58 lr 0.000794 wd 0.0500 time 0.3972 (0.4210) data time 0.0009 (0.0029) model time 0.3963 (0.4241) loss 7.9719 (7.3724) grad_norm 4.0891 (2.5416) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:33:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][210/625] eta 0:02:54 lr 0.000794 wd 0.0500 time 0.3997 (0.4201) data time 0.0007 (0.0028) model time 0.3991 (0.4228) loss 7.4557 (7.3724) grad_norm 1.8973 (2.5429) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:33:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][220/625] eta 0:02:49 lr 0.000794 wd 0.0500 time 0.4021 (0.4193) data time 0.0007 (0.0028) model time 0.4014 (0.4216) loss 7.1092 (7.3576) grad_norm 2.5735 (2.5319) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:33:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][230/625] eta 0:02:45 lr 0.000794 wd 0.0500 time 0.3921 (0.4185) data time 0.0008 (0.0027) model time 0.3912 (0.4204) loss 9.1731 (7.3762) grad_norm 1.7249 (2.5129) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:33:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][240/625] eta 0:02:40 lr 0.000793 wd 0.0500 time 0.4059 (0.4179) data time 0.0008 (0.0026) model time 0.4051 (0.4194) loss 5.8801 (7.3542) grad_norm 2.9011 (2.5066) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:33:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[131/300][250/625] eta 0:02:36 lr 0.000793 wd 0.0500 time 0.4136 (0.4174) data time 0.0006 (0.0025) model time 0.4129 (0.4186) loss 6.1663 (7.3452) grad_norm 1.9981 (2.5190) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:33:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][260/625] eta 0:02:32 lr 0.000793 wd 0.0500 time 0.3987 (0.4168) data time 0.0008 (0.0025) model time 0.3979 (0.4178) loss 7.0747 (7.3452) grad_norm 1.9670 (2.5360) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:33:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][270/625] eta 0:02:27 lr 0.000793 wd 0.0500 time 0.4048 (0.4162) data time 0.0007 (0.0024) model time 0.4041 (0.4170) loss 6.3995 (7.3462) grad_norm 1.5324 (2.5772) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:33:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][280/625] eta 0:02:23 lr 0.000793 wd 0.0500 time 0.4187 (0.4158) data time 0.0009 (0.0024) model time 0.4179 (0.4164) loss 6.3788 (7.3387) grad_norm 1.8948 (2.5748) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:33:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][290/625] eta 0:02:19 lr 0.000793 wd 0.0500 time 0.3973 (0.4153) data time 0.0007 (0.0023) model time 0.3966 (0.4158) loss 7.9974 (7.3346) grad_norm 1.5394 (2.5530) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:33:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][300/625] eta 0:02:15 lr 0.000793 wd 0.0500 time 0.4011 (0.4156) data time 0.0007 (0.0023) model time 0.4004 (0.4160) loss 6.5582 (7.3403) grad_norm 2.2490 (2.5370) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:33:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][310/625] eta 0:02:10 lr 0.000793 wd 0.0500 time 0.4136 (0.4153) data time 0.0008 (0.0022) model time 0.4128 (0.4156) loss 7.6317 
(7.3442) grad_norm 1.9379 (2.5367) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:33:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][320/625] eta 0:02:06 lr 0.000793 wd 0.0500 time 0.3975 (0.4149) data time 0.0006 (0.0022) model time 0.3969 (0.4151) loss 7.7900 (7.3363) grad_norm 1.6924 (2.5191) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:34:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][330/625] eta 0:02:02 lr 0.000793 wd 0.0500 time 0.4001 (0.4146) data time 0.0008 (0.0022) model time 0.3992 (0.4147) loss 8.2992 (7.3315) grad_norm 1.5170 (2.4984) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:34:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][340/625] eta 0:01:58 lr 0.000792 wd 0.0500 time 0.4163 (0.4143) data time 0.0007 (0.0021) model time 0.4156 (0.4143) loss 6.8462 (7.3362) grad_norm 2.2898 (2.4872) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:34:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][350/625] eta 0:01:53 lr 0.000792 wd 0.0500 time 0.3970 (0.4140) data time 0.0008 (0.0021) model time 0.3961 (0.4139) loss 7.3484 (7.3330) grad_norm 1.9181 (2.4822) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:34:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][360/625] eta 0:01:49 lr 0.000792 wd 0.0500 time 0.3970 (0.4136) data time 0.0007 (0.0021) model time 0.3963 (0.4135) loss 6.9861 (7.3265) grad_norm 1.9752 (2.4750) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:34:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][370/625] eta 0:01:45 lr 0.000792 wd 0.0500 time 0.4181 (0.4133) data time 0.0006 (0.0020) model time 0.4175 (0.4131) loss 7.6954 (7.3345) grad_norm 2.1202 (2.4592) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:34:23 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [131/300][380/625] eta 0:01:41 lr 0.000792 wd 0.0500 time 0.3963 (0.4130) data time 0.0009 (0.0020) model time 0.3954 (0.4127) loss 6.9803 (7.3336) grad_norm 2.9501 (2.4590) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:34:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][390/625] eta 0:01:37 lr 0.000792 wd 0.0500 time 0.3995 (0.4133) data time 0.0006 (0.0020) model time 0.3989 (0.4130) loss 7.5911 (7.3223) grad_norm 1.8050 (2.4651) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:34:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][400/625] eta 0:01:33 lr 0.000792 wd 0.0500 time 0.3944 (0.4161) data time 0.0009 (0.0019) model time 0.3936 (0.4162) loss 7.1671 (7.3135) grad_norm 2.6198 (2.4743) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:34:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][410/625] eta 0:01:29 lr 0.000792 wd 0.0500 time 0.3940 (0.4185) data time 0.0007 (0.0019) model time 0.3933 (0.4189) loss 7.9178 (7.3212) grad_norm 5.1263 (2.5248) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:34:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][420/625] eta 0:01:26 lr 0.000792 wd 0.0500 time 0.4163 (0.4196) data time 0.0009 (0.0019) model time 0.4154 (0.4201) loss 7.3612 (7.3249) grad_norm 1.5632 (2.5165) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:34:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][430/625] eta 0:01:21 lr 0.000792 wd 0.0500 time 0.4005 (0.4192) data time 0.0008 (0.0019) model time 0.3997 (0.4196) loss 7.5107 (7.3275) grad_norm 1.7103 (2.5101) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:34:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][440/625] eta 0:01:17 lr 0.000791 wd 0.0500 time 0.3995 (0.4188) data time 0.0012 (0.0019) model 
time 0.3983 (0.4191) loss 7.1900 (7.3214) grad_norm 1.6024 (2.5040) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:34:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][450/625] eta 0:01:13 lr 0.000791 wd 0.0500 time 0.4126 (0.4185) data time 0.0007 (0.0019) model time 0.4119 (0.4188) loss 7.6857 (7.3200) grad_norm 3.1287 (2.4970) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:34:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][460/625] eta 0:01:09 lr 0.000791 wd 0.0500 time 0.3954 (0.4182) data time 0.0006 (0.0018) model time 0.3947 (0.4184) loss 6.9923 (7.3188) grad_norm 2.4690 (2.4913) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:35:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][470/625] eta 0:01:04 lr 0.000791 wd 0.0500 time 0.4035 (0.4179) data time 0.0009 (0.0018) model time 0.4026 (0.4180) loss 6.2874 (7.3059) grad_norm 1.7451 (2.4818) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:35:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][480/625] eta 0:01:00 lr 0.000791 wd 0.0500 time 0.4183 (0.4177) data time 0.0007 (0.0018) model time 0.4176 (0.4177) loss 8.4868 (7.3158) grad_norm 2.0751 (2.4830) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:35:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][490/625] eta 0:00:56 lr 0.000791 wd 0.0500 time 0.3952 (0.4175) data time 0.0007 (0.0018) model time 0.3945 (0.4175) loss 6.4203 (7.3127) grad_norm 2.5176 (2.4764) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:35:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][500/625] eta 0:00:52 lr 0.000791 wd 0.0500 time 0.4018 (0.4172) data time 0.0008 (0.0018) model time 0.4011 (0.4172) loss 7.0723 (7.3138) grad_norm 2.3161 (2.4778) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:35:19 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][510/625] eta 0:00:47 lr 0.000791 wd 0.0500 time 0.4080 (0.4169) data time 0.0007 (0.0018) model time 0.4073 (0.4168) loss 7.3102 (7.3156) grad_norm 3.1148 (2.4813) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:35:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][520/625] eta 0:00:43 lr 0.000791 wd 0.0500 time 0.3938 (0.4168) data time 0.0008 (0.0018) model time 0.3930 (0.4167) loss 7.6205 (7.3132) grad_norm 1.6246 (2.4789) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:35:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][530/625] eta 0:00:39 lr 0.000791 wd 0.0500 time 0.4119 (0.4166) data time 0.0035 (0.0017) model time 0.4084 (0.4164) loss 8.8877 (7.3094) grad_norm 1.8107 (2.4680) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:35:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][540/625] eta 0:00:35 lr 0.000790 wd 0.0500 time 0.4073 (0.4163) data time 0.0006 (0.0017) model time 0.4067 (0.4161) loss 6.2673 (7.3106) grad_norm 2.2620 (2.4689) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:35:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][550/625] eta 0:00:31 lr 0.000790 wd 0.0500 time 0.3950 (0.4160) data time 0.0010 (0.0017) model time 0.3940 (0.4158) loss 7.0989 (7.3144) grad_norm 2.2744 (2.4767) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:35:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][560/625] eta 0:00:27 lr 0.000790 wd 0.0500 time 0.4011 (0.4158) data time 0.0008 (0.0017) model time 0.4003 (0.4155) loss 6.3871 (7.3153) grad_norm 2.7157 (2.4769) loss_scale 2048.0000 (1031.3012) mem 14939MB [2024-07-25 01:35:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][570/625] eta 0:00:22 lr 0.000790 wd 0.0500 time 0.4070 (0.4156) 
data time 0.0008 (0.0017) model time 0.4062 (0.4153) loss 5.9117 (7.3172) grad_norm 2.3082 (2.4819) loss_scale 2048.0000 (1049.1068) mem 14939MB [2024-07-25 01:35:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][580/625] eta 0:00:18 lr 0.000790 wd 0.0500 time 0.3942 (0.4153) data time 0.0010 (0.0017) model time 0.3932 (0.4150) loss 7.7727 (7.3164) grad_norm 1.6483 (2.4725) loss_scale 2048.0000 (1066.2995) mem 14939MB [2024-07-25 01:35:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][590/625] eta 0:00:14 lr 0.000790 wd 0.0500 time 0.4043 (0.4151) data time 0.0009 (0.0017) model time 0.4033 (0.4148) loss 6.4923 (7.3213) grad_norm 2.7544 (2.4666) loss_scale 2048.0000 (1082.9103) mem 14939MB [2024-07-25 01:35:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][600/625] eta 0:00:10 lr 0.000790 wd 0.0500 time 0.4110 (0.4150) data time 0.0008 (0.0017) model time 0.4102 (0.4145) loss 7.2176 (7.3281) grad_norm 2.9981 (2.4652) loss_scale 2048.0000 (1098.9684) mem 14939MB [2024-07-25 01:35:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][610/625] eta 0:00:06 lr 0.000790 wd 0.0500 time 0.3986 (0.4150) data time 0.0004 (0.0016) model time 0.3982 (0.4146) loss 6.2781 (7.3262) grad_norm 1.8658 (2.4583) loss_scale 2048.0000 (1114.5008) mem 14939MB [2024-07-25 01:36:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][620/625] eta 0:00:02 lr 0.000790 wd 0.0500 time 0.5681 (0.4163) data time 0.0004 (0.0016) model time 0.5677 (0.4160) loss 7.7928 (7.3254) grad_norm 1.9211 (2.4577) loss_scale 2048.0000 (1129.5330) mem 14939MB [2024-07-25 01:36:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 131 training takes 0:04:20 [2024-07-25 01:36:06 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 01:36:07 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 01:36:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.452 (0.452) Loss 0.6128 (0.6128) Acc@1 88.379 (88.379) Acc@5 98.193 (98.193) Mem 14939MB [2024-07-25 01:36:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 1.0166 (0.7678) Acc@1 77.881 (84.477) Acc@5 94.727 (97.212) Mem 14939MB [2024-07-25 01:36:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.1338 (0.9056) Acc@1 73.926 (81.055) Acc@5 93.457 (95.659) Mem 14939MB [2024-07-25 01:36:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.716 Acc@5 95.661 [2024-07-25 01:36:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.7% [2024-07-25 01:36:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.72% [2024-07-25 01:36:10 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 01:36:10 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 01:36:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.472 (0.472) Loss 0.5771 (0.5771) Acc@1 88.770 (88.770) Acc@5 98.438 (98.438) Mem 14939MB [2024-07-25 01:36:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.122) Loss 0.9390 (0.7224) Acc@1 79.639 (85.258) Acc@5 95.264 (97.381) Mem 14939MB [2024-07-25 01:36:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0742 (0.8537) Acc@1 75.195 (81.810) Acc@5 94.043 (96.033) Mem 14939MB [2024-07-25 01:36:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.452 Acc@5 95.997 [2024-07-25 01:36:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.5% [2024-07-25 01:36:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.45% [2024-07-25 01:36:13 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 01:36:14 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 01:36:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][0/625] eta 0:09:13 lr 0.000790 wd 0.0500 time 0.8859 (0.8859) data time 0.4955 (0.4955) model time 0.0000 (0.0000) loss 7.0427 (7.0427) grad_norm 2.1898 (2.1898) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:36:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][10/625] eta 0:05:31 lr 0.000789 wd 0.0500 time 0.5675 (0.5398) data time 0.0009 (0.0459) model time 0.0000 (0.0000) loss 7.7523 (7.2309) grad_norm 3.4175 (2.7191) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:36:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][20/625] eta 0:04:53 lr 0.000789 wd 0.0500 time 0.4316 (0.4844) data time 0.0006 (0.0245) model time 0.0000 (0.0000) loss 7.3677 (7.3841) grad_norm 2.6756 (2.4121) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:36:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][30/625] eta 0:04:33 lr 0.000789 wd 0.0500 time 0.3937 (0.4596) data time 0.0008 (0.0173) model time 0.0000 (0.0000) loss 8.0547 (7.3057) grad_norm 1.5975 (2.5388) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:36:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][40/625] eta 0:04:21 lr 0.000789 wd 0.0500 time 0.3962 (0.4464) data time 0.0008 (0.0135) model time 0.0000 (0.0000) loss 6.8966 (7.2668) grad_norm 2.0846 (2.5599) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:36:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][50/625] eta 0:04:11 lr 0.000789 wd 0.0500 time 0.4123 (0.4375) data time 0.0007 (0.0110) model time 0.0000 (0.0000) loss 6.5966 (7.2697) grad_norm 2.2332 (2.4578) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:36:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][60/625] eta 0:04:03 lr 0.000789 wd 0.0500 time 
0.3925 (0.4313) data time 0.0009 (0.0094) model time 0.3916 (0.3985) loss 8.0403 (7.2224) grad_norm 1.9751 (2.3558) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:36:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][70/625] eta 0:03:56 lr 0.000789 wd 0.0500 time 0.3942 (0.4268) data time 0.0009 (0.0082) model time 0.3933 (0.3983) loss 7.7100 (7.2861) grad_norm 1.9275 (2.2939) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:36:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][80/625] eta 0:03:50 lr 0.000789 wd 0.0500 time 0.4051 (0.4236) data time 0.0010 (0.0074) model time 0.4041 (0.3988) loss 7.9410 (7.2670) grad_norm 1.7780 (2.2568) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:36:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][90/625] eta 0:03:45 lr 0.000789 wd 0.0500 time 0.3948 (0.4210) data time 0.0009 (0.0067) model time 0.3939 (0.3988) loss 7.8988 (7.2703) grad_norm 2.1111 (2.2147) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:36:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][100/625] eta 0:03:39 lr 0.000789 wd 0.0500 time 0.3970 (0.4190) data time 0.0006 (0.0061) model time 0.3963 (0.3990) loss 7.0862 (7.2505) grad_norm 3.1243 (2.2686) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:37:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][110/625] eta 0:03:35 lr 0.000788 wd 0.0500 time 0.4050 (0.4175) data time 0.0006 (0.0056) model time 0.4043 (0.3995) loss 8.3273 (7.2526) grad_norm 1.7338 (2.2378) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:37:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][120/625] eta 0:03:30 lr 0.000788 wd 0.0500 time 0.4025 (0.4163) data time 0.0009 (0.0052) model time 0.4016 (0.3998) loss 8.0896 (7.2342) grad_norm 2.4461 (2.2357) loss_scale 2048.0000 (2048.0000) 
mem 14939MB [2024-07-25 01:37:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][130/625] eta 0:03:25 lr 0.000788 wd 0.0500 time 0.3967 (0.4150) data time 0.0009 (0.0049) model time 0.3959 (0.3997) loss 7.0517 (7.2537) grad_norm 4.1632 (2.2577) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:37:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][140/625] eta 0:03:20 lr 0.000788 wd 0.0500 time 0.4127 (0.4143) data time 0.0007 (0.0046) model time 0.4120 (0.4001) loss 6.4917 (7.2484) grad_norm 2.1688 (2.2907) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:37:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][150/625] eta 0:03:16 lr 0.000788 wd 0.0500 time 0.3952 (0.4133) data time 0.0008 (0.0044) model time 0.3945 (0.4000) loss 7.9872 (7.2416) grad_norm 1.6229 (2.3062) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:37:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][160/625] eta 0:03:11 lr 0.000788 wd 0.0500 time 0.3958 (0.4126) data time 0.0008 (0.0042) model time 0.3950 (0.4000) loss 6.9702 (7.2366) grad_norm 2.1141 (2.3571) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:37:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][170/625] eta 0:03:07 lr 0.000788 wd 0.0500 time 0.4100 (0.4119) data time 0.0010 (0.0040) model time 0.4090 (0.4001) loss 7.4543 (7.2517) grad_norm 1.7979 (2.3598) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:37:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][180/625] eta 0:03:03 lr 0.000788 wd 0.0500 time 0.3933 (0.4113) data time 0.0011 (0.0038) model time 0.3922 (0.4000) loss 7.7543 (7.2555) grad_norm 2.3597 (2.3599) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:37:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][190/625] eta 0:02:58 lr 0.000788 
wd 0.0500 time 0.3962 (0.4107) data time 0.0008 (0.0037) model time 0.3954 (0.4000) loss 7.5040 (7.2494) grad_norm 2.3103 (2.3719) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:37:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][200/625] eta 0:02:54 lr 0.000788 wd 0.0500 time 0.4064 (0.4112) data time 0.0006 (0.0035) model time 0.4058 (0.4012) loss 6.9837 (7.2454) grad_norm 1.7194 (2.3795) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:37:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][210/625] eta 0:02:51 lr 0.000787 wd 0.0500 time 0.6199 (0.4144) data time 0.0006 (0.0034) model time 0.6192 (0.4060) loss 7.3915 (7.2495) grad_norm 4.2610 (2.3836) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:37:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][220/625] eta 0:02:49 lr 0.000787 wd 0.0500 time 0.4049 (0.4179) data time 0.0008 (0.0033) model time 0.4041 (0.4111) loss 7.7717 (7.2665) grad_norm 1.6655 (2.3715) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:37:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][230/625] eta 0:02:46 lr 0.000787 wd 0.0500 time 0.5817 (0.4214) data time 0.0008 (0.0032) model time 0.5809 (0.4159) loss 8.0566 (7.2692) grad_norm 2.1178 (2.3840) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:37:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][240/625] eta 0:02:42 lr 0.000787 wd 0.0500 time 0.4165 (0.4215) data time 0.0007 (0.0031) model time 0.4158 (0.4162) loss 7.6019 (7.2672) grad_norm 3.1290 (2.3730) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:37:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][250/625] eta 0:02:37 lr 0.000787 wd 0.0500 time 0.3945 (0.4207) data time 0.0007 (0.0030) model time 0.3938 (0.4155) loss 7.5021 (7.2751) grad_norm 2.9095 (2.3754) loss_scale 
2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:38:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][260/625] eta 0:02:33 lr 0.000787 wd 0.0500 time 0.4093 (0.4201) data time 0.0006 (0.0029) model time 0.4086 (0.4149) loss 7.7188 (7.2690) grad_norm 2.2296 (2.3604) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:38:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][270/625] eta 0:02:28 lr 0.000787 wd 0.0500 time 0.4106 (0.4195) data time 0.0008 (0.0029) model time 0.4098 (0.4144) loss 6.3593 (7.2763) grad_norm 1.7574 (2.3505) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:38:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][280/625] eta 0:02:24 lr 0.000787 wd 0.0500 time 0.3941 (0.4190) data time 0.0007 (0.0028) model time 0.3933 (0.4139) loss 8.4792 (7.2849) grad_norm 2.5802 (2.3570) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:38:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][290/625] eta 0:02:20 lr 0.000787 wd 0.0500 time 0.4045 (0.4184) data time 0.0008 (0.0027) model time 0.4036 (0.4134) loss 7.8326 (7.2919) grad_norm 2.5379 (2.3742) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:38:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][300/625] eta 0:02:15 lr 0.000787 wd 0.0500 time 0.4133 (0.4179) data time 0.0006 (0.0027) model time 0.4127 (0.4129) loss 7.3672 (7.3004) grad_norm 4.3360 (2.3821) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:38:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][310/625] eta 0:02:11 lr 0.000786 wd 0.0500 time 0.3927 (0.4173) data time 0.0007 (0.0026) model time 0.3920 (0.4124) loss 7.9450 (7.3087) grad_norm 1.4394 (2.3807) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:38:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][320/625] 
eta 0:02:07 lr 0.000786 wd 0.0500 time 0.4024 (0.4169) data time 0.0008 (0.0026) model time 0.4016 (0.4120) loss 8.4040 (7.3167) grad_norm 1.8408 (2.3825) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:38:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][330/625] eta 0:02:02 lr 0.000786 wd 0.0500 time 0.4036 (0.4166) data time 0.0007 (0.0025) model time 0.4029 (0.4118) loss 8.0875 (7.3310) grad_norm 2.0000 (2.3829) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:38:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][340/625] eta 0:01:58 lr 0.000786 wd 0.0500 time 0.3940 (0.4162) data time 0.0007 (0.0025) model time 0.3933 (0.4115) loss 7.7019 (7.3317) grad_norm 1.8042 (2.3786) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:38:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][350/625] eta 0:01:54 lr 0.000786 wd 0.0500 time 0.4047 (0.4158) data time 0.0006 (0.0024) model time 0.4041 (0.4111) loss 5.9938 (7.3284) grad_norm 1.8491 (2.3638) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:38:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][360/625] eta 0:01:50 lr 0.000786 wd 0.0500 time 0.4127 (0.4155) data time 0.0009 (0.0024) model time 0.4119 (0.4109) loss 6.0166 (7.3330) grad_norm 3.3762 (2.3738) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:38:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][370/625] eta 0:01:45 lr 0.000786 wd 0.0500 time 0.3945 (0.4151) data time 0.0006 (0.0023) model time 0.3938 (0.4106) loss 7.1495 (7.3448) grad_norm 3.1190 (2.4044) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:38:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][380/625] eta 0:01:41 lr 0.000786 wd 0.0500 time 0.4024 (0.4148) data time 0.0009 (0.0023) model time 0.4015 (0.4104) loss 6.8610 (7.3320) grad_norm 1.5610 
(2.4066) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:38:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][390/625] eta 0:01:37 lr 0.000786 wd 0.0500 time 0.4248 (0.4146) data time 0.0006 (0.0023) model time 0.4242 (0.4102) loss 7.2322 (7.3367) grad_norm 2.2582 (2.4079) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:39:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][400/625] eta 0:01:33 lr 0.000785 wd 0.0500 time 0.3935 (0.4144) data time 0.0009 (0.0022) model time 0.3926 (0.4100) loss 8.5821 (7.3397) grad_norm 2.1736 (2.4012) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:39:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][410/625] eta 0:01:29 lr 0.000785 wd 0.0500 time 0.4122 (0.4141) data time 0.0006 (0.0022) model time 0.4116 (0.4098) loss 8.0846 (7.3415) grad_norm 3.4597 (2.4103) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:39:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][420/625] eta 0:01:24 lr 0.000785 wd 0.0500 time 0.4151 (0.4143) data time 0.0006 (0.0022) model time 0.4145 (0.4101) loss 6.6642 (7.3465) grad_norm 2.5684 (2.4179) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:39:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][430/625] eta 0:01:20 lr 0.000785 wd 0.0500 time 0.3944 (0.4147) data time 0.0009 (0.0022) model time 0.3935 (0.4106) loss 7.3067 (7.3459) grad_norm 1.9911 (2.4210) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:39:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][440/625] eta 0:01:17 lr 0.000785 wd 0.0500 time 0.5836 (0.4173) data time 0.0007 (0.0021) model time 0.5829 (0.4137) loss 8.0087 (7.3503) grad_norm 2.0175 (2.4285) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:39:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[132/300][450/625] eta 0:01:13 lr 0.000785 wd 0.0500 time 0.5917 (0.4192) data time 0.0007 (0.0021) model time 0.5910 (0.4159) loss 6.9875 (7.3448) grad_norm 2.2346 (2.4356) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:39:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][460/625] eta 0:01:09 lr 0.000785 wd 0.0500 time 0.4060 (0.4195) data time 0.0007 (0.0021) model time 0.4053 (0.4163) loss 7.0240 (7.3487) grad_norm 1.9009 (2.4342) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:39:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][470/625] eta 0:01:04 lr 0.000785 wd 0.0500 time 0.3988 (0.4191) data time 0.0008 (0.0021) model time 0.3980 (0.4159) loss 7.6866 (7.3519) grad_norm 3.8729 (2.4333) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:39:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][480/625] eta 0:01:00 lr 0.000785 wd 0.0500 time 0.4014 (0.4188) data time 0.0008 (0.0020) model time 0.4006 (0.4156) loss 7.7701 (7.3482) grad_norm 2.6163 (2.4333) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:39:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][490/625] eta 0:00:56 lr 0.000785 wd 0.0500 time 0.4070 (0.4184) data time 0.0007 (0.0020) model time 0.4063 (0.4153) loss 8.0211 (7.3437) grad_norm 2.2188 (2.4343) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:39:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][500/625] eta 0:00:52 lr 0.000784 wd 0.0500 time 0.3999 (0.4181) data time 0.0007 (0.0020) model time 0.3992 (0.4149) loss 7.5836 (7.3432) grad_norm 1.9289 (2.4267) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:39:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][510/625] eta 0:00:48 lr 0.000784 wd 0.0500 time 0.4005 (0.4178) data time 0.0009 (0.0020) model time 0.3995 (0.4147) loss 7.5306 
(7.3438) grad_norm 1.7043 (2.4244) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:39:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][520/625] eta 0:00:43 lr 0.000784 wd 0.0500 time 0.4074 (0.4176) data time 0.0009 (0.0020) model time 0.4065 (0.4144) loss 7.5236 (7.3421) grad_norm 3.0116 (2.4269) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:39:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][530/625] eta 0:00:39 lr 0.000784 wd 0.0500 time 0.4045 (0.4173) data time 0.0007 (0.0019) model time 0.4038 (0.4142) loss 6.7315 (7.3403) grad_norm 2.4364 (2.4361) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:40:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][540/625] eta 0:00:35 lr 0.000784 wd 0.0500 time 0.4027 (0.4170) data time 0.0007 (0.0019) model time 0.4020 (0.4139) loss 7.5648 (7.3434) grad_norm 1.9608 (2.4290) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:40:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][550/625] eta 0:00:31 lr 0.000784 wd 0.0500 time 0.4102 (0.4168) data time 0.0007 (0.0019) model time 0.4095 (0.4137) loss 6.6556 (7.3387) grad_norm 2.3115 (2.4248) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:40:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][560/625] eta 0:00:27 lr 0.000784 wd 0.0500 time 0.4059 (0.4166) data time 0.0006 (0.0019) model time 0.4053 (0.4135) loss 7.2742 (7.3395) grad_norm 1.6916 (2.4178) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:40:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][570/625] eta 0:00:22 lr 0.000784 wd 0.0500 time 0.3990 (0.4163) data time 0.0008 (0.0019) model time 0.3983 (0.4133) loss 7.2349 (7.3413) grad_norm 1.7668 (2.4117) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:40:16 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [132/300][580/625] eta 0:00:18 lr 0.000784 wd 0.0500 time 0.4048 (0.4161) data time 0.0008 (0.0019) model time 0.4040 (0.4131) loss 6.4152 (7.3415) grad_norm 2.9429 (2.4090) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:40:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][590/625] eta 0:00:14 lr 0.000784 wd 0.0500 time 0.3942 (0.4159) data time 0.0007 (0.0018) model time 0.3935 (0.4128) loss 6.6920 (7.3376) grad_norm 2.3704 (2.4189) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:40:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][600/625] eta 0:00:10 lr 0.000783 wd 0.0500 time 0.3974 (0.4156) data time 0.0007 (0.0018) model time 0.3967 (0.4126) loss 5.9093 (7.3279) grad_norm 2.3152 (2.4256) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:40:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][610/625] eta 0:00:06 lr 0.000783 wd 0.0500 time 0.4072 (0.4154) data time 0.0006 (0.0018) model time 0.4066 (0.4124) loss 6.0281 (7.3233) grad_norm 3.3407 (2.4342) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:40:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][620/625] eta 0:00:02 lr 0.000783 wd 0.0500 time 0.3957 (0.4152) data time 0.0004 (0.0018) model time 0.3953 (0.4122) loss 6.3600 (7.3226) grad_norm 5.1716 (2.4483) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:40:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 132 training takes 0:04:19 [2024-07-25 01:40:33 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 01:40:34 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 01:40:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.6182 (0.6182) Acc@1 88.428 (88.428) Acc@5 98.047 (98.047) Mem 14939MB
[2024-07-25 01:40:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.121) Loss 0.9990 (0.7649) Acc@1 77.100 (84.477) Acc@5 94.873 (97.075) Mem 14939MB
[2024-07-25 01:40:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.1221 (0.9029) Acc@1 74.072 (80.864) Acc@5 93.213 (95.466) Mem 14939MB
[2024-07-25 01:40:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.486 Acc@5 95.445
[2024-07-25 01:40:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.5%
[2024-07-25 01:40:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.749 (0.749) Loss 0.5762 (0.5762) Acc@1 88.721 (88.721) Acc@5 98.438 (98.438) Mem 14939MB
[2024-07-25 01:40:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.153) Loss 0.9375 (0.7215) Acc@1 79.639 (85.272) Acc@5 95.312 (97.385) Mem 14939MB
[2024-07-25 01:40:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.121) Loss 1.0713 (0.8521) Acc@1 75.195 (81.831) Acc@5 94.043 (96.038) Mem 14939MB
[2024-07-25 01:40:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.472 Acc@5 96.007
[2024-07-25 01:40:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.5%
[2024-07-25 01:40:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.47%
[2024-07-25 01:40:40 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 01:40:41 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 01:40:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][0/625] eta 0:07:59 lr 0.000783 wd 0.0500 time 0.7664 (0.7664) data time 0.3912 (0.3912) model time 0.0000 (0.0000) loss 6.6253 (6.6253) grad_norm 3.3213 (3.3213) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:40:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][10/625] eta 0:04:28 lr 0.000783 wd 0.0500 time 0.4191 (0.4374) data time 0.0009 (0.0365) model time 0.0000 (0.0000) loss 8.6883 (7.1365) grad_norm 2.9681 (2.7899) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:40:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][20/625] eta 0:04:20 lr 0.000783 wd 0.0500 time 0.4017 (0.4306) data time 0.0008 (0.0197) model time 0.0000 (0.0000) loss 8.5710 (7.2236) grad_norm 1.8107 (2.5962) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:40:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][30/625] eta 0:04:31 lr 0.000783 wd 0.0500 time 0.5919 (0.4570) data time 0.0007 (0.0136) model time 0.0000 (0.0000) loss 6.0251 (7.0918) grad_norm 2.6871 (2.4182) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:41:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][40/625] eta 0:04:33 lr 0.000783 wd 0.0500 time 0.4065 (0.4678) data time 0.0008 (0.0106) model time 0.0000 (0.0000) loss 6.8804 (7.0869) grad_norm 1.8116 (2.4130) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:41:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][50/625] eta 0:04:31 lr 0.000783 wd 0.0500 time 0.5617 (0.4722) data time 0.0008 (0.0087) model time 0.0000 (0.0000) loss 6.4539 (7.1877) grad_norm 1.8955 (2.4125) loss_scale 2048.0000 
(2048.0000) mem 14939MB [2024-07-25 01:41:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][60/625] eta 0:04:20 lr 0.000783 wd 0.0500 time 0.4112 (0.4610) data time 0.0008 (0.0075) model time 0.4105 (0.4027) loss 8.1262 (7.2436) grad_norm 4.5563 (2.4532) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:41:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][70/625] eta 0:04:11 lr 0.000782 wd 0.0500 time 0.3923 (0.4526) data time 0.0008 (0.0066) model time 0.3916 (0.4016) loss 8.3906 (7.2760) grad_norm 3.6514 (2.5180) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:41:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][80/625] eta 0:04:03 lr 0.000782 wd 0.0500 time 0.4036 (0.4465) data time 0.0009 (0.0059) model time 0.4027 (0.4017) loss 7.3314 (7.3019) grad_norm 2.7265 (2.5302) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:41:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][90/625] eta 0:03:56 lr 0.000782 wd 0.0500 time 0.4125 (0.4418) data time 0.0007 (0.0054) model time 0.4118 (0.4019) loss 6.0951 (7.2966) grad_norm 2.4201 (2.4914) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:41:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][100/625] eta 0:03:49 lr 0.000782 wd 0.0500 time 0.3948 (0.4379) data time 0.0009 (0.0049) model time 0.3939 (0.4018) loss 6.8394 (7.2896) grad_norm 2.1843 (2.4454) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:41:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][110/625] eta 0:03:43 lr 0.000782 wd 0.0500 time 0.4104 (0.4349) data time 0.0007 (0.0046) model time 0.4098 (0.4021) loss 6.4564 (7.3071) grad_norm 1.6527 (2.4107) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:41:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][120/625] eta 0:03:38 lr 
0.000782 wd 0.0500 time 0.4083 (0.4322) data time 0.0009 (0.0043) model time 0.4074 (0.4020) loss 8.1392 (7.3160) grad_norm 1.8172 (2.3927) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:41:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][130/625] eta 0:03:32 lr 0.000782 wd 0.0500 time 0.3944 (0.4299) data time 0.0007 (0.0040) model time 0.3937 (0.4020) loss 7.7399 (7.3248) grad_norm 2.5995 (2.3771) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:41:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][140/625] eta 0:03:27 lr 0.000782 wd 0.0500 time 0.4003 (0.4279) data time 0.0009 (0.0038) model time 0.3994 (0.4018) loss 7.5654 (7.3052) grad_norm 1.8307 (2.3545) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:41:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][150/625] eta 0:03:22 lr 0.000782 wd 0.0500 time 0.4041 (0.4262) data time 0.0008 (0.0036) model time 0.4033 (0.4017) loss 5.9367 (7.3007) grad_norm 1.6852 (2.3362) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:41:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][160/625] eta 0:03:17 lr 0.000782 wd 0.0500 time 0.4054 (0.4249) data time 0.0008 (0.0035) model time 0.4046 (0.4020) loss 7.8630 (7.3098) grad_norm 1.7498 (2.3290) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:41:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][170/625] eta 0:03:12 lr 0.000781 wd 0.0500 time 0.4090 (0.4236) data time 0.0009 (0.0033) model time 0.4081 (0.4019) loss 7.0045 (7.3170) grad_norm 1.5035 (2.3259) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:41:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][180/625] eta 0:03:08 lr 0.000781 wd 0.0500 time 0.4033 (0.4225) data time 0.0006 (0.0032) model time 0.4027 (0.4019) loss 6.7602 (7.3161) grad_norm 2.5187 (2.3189) 
loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:42:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][190/625] eta 0:03:03 lr 0.000781 wd 0.0500 time 0.3937 (0.4215) data time 0.0020 (0.0031) model time 0.3917 (0.4020) loss 7.0890 (7.3061) grad_norm 3.6668 (2.3746) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:42:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][200/625] eta 0:02:58 lr 0.000781 wd 0.0500 time 0.3985 (0.4206) data time 0.0008 (0.0030) model time 0.3976 (0.4019) loss 7.7289 (7.3077) grad_norm 2.7921 (2.3714) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:42:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][210/625] eta 0:02:54 lr 0.000781 wd 0.0500 time 0.4235 (0.4208) data time 0.0006 (0.0029) model time 0.4228 (0.4034) loss 6.7191 (7.2988) grad_norm 1.9408 (2.3775) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:42:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][220/625] eta 0:02:50 lr 0.000781 wd 0.0500 time 0.4042 (0.4200) data time 0.0006 (0.0028) model time 0.4036 (0.4033) loss 6.9207 (7.2968) grad_norm 1.6368 (2.3966) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:42:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][230/625] eta 0:02:45 lr 0.000781 wd 0.0500 time 0.3979 (0.4192) data time 0.0009 (0.0027) model time 0.3970 (0.4032) loss 6.3856 (7.2929) grad_norm 1.3919 (2.3856) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:42:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][240/625] eta 0:02:41 lr 0.000781 wd 0.0500 time 0.6156 (0.4200) data time 0.0009 (0.0027) model time 0.6146 (0.4050) loss 7.9868 (7.3019) grad_norm 3.0234 (2.4257) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:42:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[133/300][250/625] eta 0:02:37 lr 0.000781 wd 0.0500 time 0.5686 (0.4212) data time 0.0009 (0.0026) model time 0.5677 (0.4072) loss 7.0394 (7.2995) grad_norm 2.9097 (2.4399) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:42:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][260/625] eta 0:02:34 lr 0.000781 wd 0.0500 time 0.4057 (0.4244) data time 0.0008 (0.0025) model time 0.4049 (0.4117) loss 8.7578 (7.3000) grad_norm 2.3762 (2.4323) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:42:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][270/625] eta 0:02:31 lr 0.000780 wd 0.0500 time 0.4101 (0.4260) data time 0.0009 (0.0025) model time 0.4092 (0.4143) loss 7.8905 (7.2923) grad_norm 2.3552 (2.4140) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:42:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][280/625] eta 0:02:26 lr 0.000780 wd 0.0500 time 0.4001 (0.4252) data time 0.0008 (0.0024) model time 0.3993 (0.4137) loss 8.4690 (7.2944) grad_norm 1.6679 (2.4336) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:42:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][290/625] eta 0:02:22 lr 0.000780 wd 0.0500 time 0.4030 (0.4244) data time 0.0007 (0.0024) model time 0.4023 (0.4132) loss 7.0094 (7.3104) grad_norm 1.9297 (2.4306) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:42:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][300/625] eta 0:02:17 lr 0.000780 wd 0.0500 time 0.4195 (0.4239) data time 0.0008 (0.0023) model time 0.4187 (0.4130) loss 7.8445 (7.3092) grad_norm 3.2757 (2.4426) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:42:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][310/625] eta 0:02:13 lr 0.000780 wd 0.0500 time 0.4000 (0.4234) data time 0.0007 (0.0023) model time 0.3993 (0.4128) loss 9.3299 
(7.3061) grad_norm 2.2347 (2.4629) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:42:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][320/625] eta 0:02:08 lr 0.000780 wd 0.0500 time 0.3964 (0.4227) data time 0.0006 (0.0023) model time 0.3957 (0.4123) loss 8.2784 (7.3177) grad_norm 1.5068 (2.4538) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:43:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][330/625] eta 0:02:04 lr 0.000780 wd 0.0500 time 0.4088 (0.4221) data time 0.0008 (0.0022) model time 0.4080 (0.4120) loss 7.3694 (7.3153) grad_norm 2.0006 (2.4397) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:43:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][340/625] eta 0:02:00 lr 0.000780 wd 0.0500 time 0.3929 (0.4215) data time 0.0009 (0.0022) model time 0.3920 (0.4116) loss 8.5969 (7.3288) grad_norm 2.4137 (2.4303) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:43:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][350/625] eta 0:01:55 lr 0.000780 wd 0.0500 time 0.3969 (0.4210) data time 0.0006 (0.0022) model time 0.3963 (0.4112) loss 8.1597 (7.3168) grad_norm 1.6473 (2.4233) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:43:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][360/625] eta 0:01:51 lr 0.000780 wd 0.0500 time 0.4064 (0.4204) data time 0.0009 (0.0021) model time 0.4055 (0.4109) loss 7.0500 (7.3240) grad_norm 1.7396 (2.4180) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:43:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][370/625] eta 0:01:47 lr 0.000779 wd 0.0500 time 0.3982 (0.4199) data time 0.0008 (0.0021) model time 0.3974 (0.4105) loss 7.4525 (7.3313) grad_norm 1.7206 (2.4090) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:43:21 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [133/300][380/625] eta 0:01:42 lr 0.000779 wd 0.0500 time 0.3959 (0.4194) data time 0.0009 (0.0021) model time 0.3950 (0.4102) loss 6.8780 (7.3378) grad_norm 7.4106 (2.4354) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:43:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][390/625] eta 0:01:38 lr 0.000779 wd 0.0500 time 0.4107 (0.4189) data time 0.0007 (0.0020) model time 0.4099 (0.4099) loss 8.5853 (7.3378) grad_norm 1.9284 (2.4357) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:43:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][400/625] eta 0:01:34 lr 0.000779 wd 0.0500 time 0.3936 (0.4185) data time 0.0009 (0.0020) model time 0.3927 (0.4096) loss 8.1225 (7.3350) grad_norm 2.2921 (2.4391) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:43:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][410/625] eta 0:01:29 lr 0.000779 wd 0.0500 time 0.3956 (0.4181) data time 0.0009 (0.0020) model time 0.3947 (0.4094) loss 7.2600 (7.3402) grad_norm 2.6151 (2.4547) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:43:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][420/625] eta 0:01:25 lr 0.000779 wd 0.0500 time 0.4093 (0.4177) data time 0.0009 (0.0020) model time 0.4084 (0.4091) loss 7.9427 (7.3446) grad_norm 1.8077 (2.4495) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:43:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][430/625] eta 0:01:21 lr 0.000779 wd 0.0500 time 0.3919 (0.4177) data time 0.0008 (0.0019) model time 0.3911 (0.4093) loss 7.8563 (7.3570) grad_norm 1.6588 (2.4554) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:43:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][440/625] eta 0:01:17 lr 0.000779 wd 0.0500 time 0.3996 (0.4174) data time 0.0006 (0.0019) model 
time 0.3990 (0.4092) loss 6.0716 (7.3518) grad_norm 2.9993 (2.4651) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:43:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][450/625] eta 0:01:12 lr 0.000779 wd 0.0500 time 0.4169 (0.4170) data time 0.0007 (0.0019) model time 0.4162 (0.4090) loss 7.8092 (7.3437) grad_norm 1.9509 (2.4567) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:43:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][460/625] eta 0:01:08 lr 0.000779 wd 0.0500 time 0.3926 (0.4171) data time 0.0007 (0.0019) model time 0.3919 (0.4092) loss 6.9183 (7.3453) grad_norm 2.2053 (2.4551) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:43:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][470/625] eta 0:01:04 lr 0.000778 wd 0.0500 time 0.4133 (0.4179) data time 0.0008 (0.0019) model time 0.4125 (0.4103) loss 7.1221 (7.3331) grad_norm 5.1918 (2.4577) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:44:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][480/625] eta 0:01:00 lr 0.000778 wd 0.0500 time 0.4109 (0.4195) data time 0.0006 (0.0018) model time 0.4103 (0.4123) loss 8.5090 (7.3495) grad_norm 2.2012 (2.4630) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:44:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][490/625] eta 0:00:56 lr 0.000778 wd 0.0500 time 0.5823 (0.4211) data time 0.0009 (0.0018) model time 0.5813 (0.4142) loss 7.4561 (7.3426) grad_norm 2.6300 (2.4677) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:44:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][500/625] eta 0:00:52 lr 0.000778 wd 0.0500 time 0.3960 (0.4207) data time 0.0008 (0.0018) model time 0.3951 (0.4138) loss 8.1443 (7.3444) grad_norm 1.6388 (2.4687) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:44:16 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][510/625] eta 0:00:48 lr 0.000778 wd 0.0500 time 0.4026 (0.4204) data time 0.0008 (0.0018) model time 0.4018 (0.4136) loss 7.6770 (7.3423) grad_norm 1.4429 (2.4659) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:44:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][520/625] eta 0:00:44 lr 0.000778 wd 0.0500 time 0.4316 (0.4201) data time 0.0008 (0.0018) model time 0.4308 (0.4135) loss 9.0038 (7.3431) grad_norm 2.8334 (2.4603) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:44:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][530/625] eta 0:00:39 lr 0.000778 wd 0.0500 time 0.3945 (0.4200) data time 0.0009 (0.0018) model time 0.3936 (0.4134) loss 6.3864 (7.3368) grad_norm 3.1736 (2.4595) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:44:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][540/625] eta 0:00:35 lr 0.000778 wd 0.0500 time 0.4000 (0.4197) data time 0.0007 (0.0018) model time 0.3993 (0.4131) loss 7.1792 (7.3362) grad_norm 1.9960 (2.4579) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:44:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][550/625] eta 0:00:31 lr 0.000778 wd 0.0500 time 0.4097 (0.4194) data time 0.0007 (0.0018) model time 0.4090 (0.4129) loss 5.9659 (7.3325) grad_norm 3.4337 (2.4762) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:44:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][560/625] eta 0:00:27 lr 0.000777 wd 0.0500 time 0.3979 (0.4190) data time 0.0006 (0.0018) model time 0.3972 (0.4126) loss 8.8255 (7.3366) grad_norm 1.5743 (2.4696) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:44:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][570/625] eta 0:00:23 lr 0.000777 wd 0.0500 time 0.4021 (0.4187) 
data time 0.0007 (0.0018) model time 0.4014 (0.4124) loss 8.0540 (7.3381) grad_norm 2.2620 (2.4779) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:44:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][580/625] eta 0:00:18 lr 0.000777 wd 0.0500 time 0.4103 (0.4184) data time 0.0008 (0.0017) model time 0.4095 (0.4122) loss 7.4685 (7.3421) grad_norm 1.8597 (2.4746) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:44:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][590/625] eta 0:00:14 lr 0.000777 wd 0.0500 time 0.3949 (0.4181) data time 0.0008 (0.0017) model time 0.3941 (0.4119) loss 6.4869 (7.3405) grad_norm 1.9780 (2.4683) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:44:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][600/625] eta 0:00:10 lr 0.000777 wd 0.0500 time 0.3982 (0.4178) data time 0.0009 (0.0017) model time 0.3973 (0.4117) loss 7.9733 (7.3408) grad_norm 1.9174 (2.4653) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:44:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][610/625] eta 0:00:06 lr 0.000777 wd 0.0500 time 0.4075 (0.4176) data time 0.0006 (0.0017) model time 0.4069 (0.4115) loss 7.6022 (7.3365) grad_norm 3.0170 (2.4713) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:45:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][620/625] eta 0:00:02 lr 0.000777 wd 0.0500 time 0.3967 (0.4173) data time 0.0006 (0.0017) model time 0.3961 (0.4113) loss 8.3832 (7.3334) grad_norm 1.8664 (2.4732) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:45:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 133 training takes 0:04:20 [2024-07-25 01:45:02 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 01:45:03 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!!
[2024-07-25 01:45:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.447 (0.447) Loss 0.6040 (0.6040) Acc@1 88.525 (88.525) Acc@5 98.535 (98.535) Mem 14939MB
[2024-07-25 01:45:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.9834 (0.7523) Acc@1 78.906 (84.819) Acc@5 95.117 (97.212) Mem 14939MB
[2024-07-25 01:45:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 1.0898 (0.8876) Acc@1 74.902 (81.362) Acc@5 93.848 (95.712) Mem 14939MB
[2024-07-25 01:45:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.990 Acc@5 95.741
[2024-07-25 01:45:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.0%
[2024-07-25 01:45:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.99%
[2024-07-25 01:45:05 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving......
[2024-07-25 01:45:06 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!!
[2024-07-25 01:45:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.455 (0.455) Loss 0.5752 (0.5752) Acc@1 88.770 (88.770) Acc@5 98.438 (98.438) Mem 14939MB
[2024-07-25 01:45:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.121) Loss 0.9365 (0.7206) Acc@1 79.688 (85.298) Acc@5 95.459 (97.417) Mem 14939MB
[2024-07-25 01:45:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0693 (0.8510) Acc@1 75.391 (81.859) Acc@5 94.043 (96.061) Mem 14939MB
[2024-07-25 01:45:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.498 Acc@5 96.031
[2024-07-25 01:45:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.5%
[2024-07-25 01:45:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.50%
[2024-07-25 01:45:09 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 01:45:10 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!!
[2024-07-25 01:45:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][0/625] eta 0:08:10 lr 0.000777 wd 0.0500 time 0.7848 (0.7848) data time 0.4094 (0.4094) model time 0.0000 (0.0000) loss 7.7819 (7.7819) grad_norm 1.5513 (1.5513) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:45:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][10/625] eta 0:04:32 lr 0.000777 wd 0.0500 time 0.4285 (0.4431) data time 0.0009 (0.0381) model time 0.0000 (0.0000) loss 8.4164 (7.3524) grad_norm 3.7388 (2.1229) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:45:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][20/625] eta 0:04:16 lr 0.000777 wd 0.0500 time 0.3940 (0.4238) data time 0.0006 (0.0205) model time 0.0000 (0.0000) loss 8.6177 (7.4135) grad_norm 2.8479 (2.3022) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:45:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][30/625] eta 0:04:08 lr 0.000777 wd 0.0500 time 0.3975 (0.4176) data time 0.0009 (0.0142) model time 0.0000 (0.0000) loss 7.1439 (7.4057) grad_norm 1.7941 (2.3950) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:45:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][40/625] eta 0:04:02 lr 0.000776 wd 0.0500 time 0.4167 (0.4146) data time 0.0008 (0.0110) model time 0.0000 (0.0000) loss 7.7808 (7.4043) grad_norm 1.6742 (2.4538) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 01:45:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][50/625] eta 0:03:56 lr 0.000776 wd 0.0500 time 0.3937 (0.4117) data time 0.0009 (0.0090) model time 0.0000 (0.0000) loss 8.2815 (7.3069) grad_norm 2.7442 (inf) loss_scale 1024.0000 (1867.2941) mem 14939MB [2024-07-25 01:45:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][60/625] eta 0:03:56 lr 0.000776 wd 0.0500 time 0.5418 
(0.4189) data time 0.0008 (0.0077) model time 0.5410 (0.4544) loss 6.3957 (7.2917) grad_norm 3.5061 (inf) loss_scale 1024.0000 (1729.0492) mem 14939MB [2024-07-25 01:45:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][70/625] eta 0:03:59 lr 0.000776 wd 0.0500 time 0.6493 (0.4308) data time 0.0009 (0.0068) model time 0.6485 (0.4784) loss 6.0978 (7.2754) grad_norm 2.6452 (inf) loss_scale 1024.0000 (1629.7465) mem 14939MB [2024-07-25 01:45:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][80/625] eta 0:03:55 lr 0.000776 wd 0.0500 time 0.5325 (0.4327) data time 0.0008 (0.0061) model time 0.5317 (0.4674) loss 7.3589 (7.2302) grad_norm 2.4262 (inf) loss_scale 1024.0000 (1554.9630) mem 14939MB [2024-07-25 01:45:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][90/625] eta 0:03:54 lr 0.000776 wd 0.0500 time 0.4104 (0.4379) data time 0.0008 (0.0055) model time 0.4096 (0.4702) loss 6.6602 (7.2059) grad_norm 1.7527 (inf) loss_scale 1024.0000 (1496.6154) mem 14939MB [2024-07-25 01:45:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][100/625] eta 0:03:47 lr 0.000776 wd 0.0500 time 0.3956 (0.4343) data time 0.0007 (0.0051) model time 0.3949 (0.4563) loss 5.8193 (7.2561) grad_norm 2.2345 (inf) loss_scale 1024.0000 (1449.8218) mem 14939MB [2024-07-25 01:45:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][110/625] eta 0:03:42 lr 0.000776 wd 0.0500 time 0.4031 (0.4314) data time 0.0010 (0.0047) model time 0.4021 (0.4472) loss 8.0019 (7.2767) grad_norm 5.0229 (inf) loss_scale 1024.0000 (1411.4595) mem 14939MB [2024-07-25 01:46:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][120/625] eta 0:03:36 lr 0.000776 wd 0.0500 time 0.4069 (0.4292) data time 0.0008 (0.0044) model time 0.4060 (0.4409) loss 8.2689 (7.2801) grad_norm 2.4822 (inf) loss_scale 1024.0000 (1379.4380) mem 14939MB [2024-07-25 
01:46:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][130/625] eta 0:03:31 lr 0.000776 wd 0.0500 time 0.3921 (0.4273) data time 0.0006 (0.0041) model time 0.3915 (0.4363) loss 7.8536 (7.3016) grad_norm 1.7550 (inf) loss_scale 1024.0000 (1352.3053) mem 14939MB [2024-07-25 01:46:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][140/625] eta 0:03:26 lr 0.000775 wd 0.0500 time 0.4011 (0.4257) data time 0.0009 (0.0040) model time 0.4002 (0.4325) loss 9.0604 (7.3178) grad_norm 3.7059 (inf) loss_scale 1024.0000 (1329.0213) mem 14939MB [2024-07-25 01:46:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][150/625] eta 0:03:21 lr 0.000775 wd 0.0500 time 0.4096 (0.4243) data time 0.0007 (0.0038) model time 0.4088 (0.4296) loss 8.1997 (7.3357) grad_norm 2.7116 (inf) loss_scale 1024.0000 (1308.8212) mem 14939MB [2024-07-25 01:46:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][160/625] eta 0:03:16 lr 0.000775 wd 0.0500 time 0.3927 (0.4236) data time 0.0007 (0.0036) model time 0.3920 (0.4281) loss 7.1751 (7.3329) grad_norm 3.2377 (inf) loss_scale 1024.0000 (1291.1304) mem 14939MB [2024-07-25 01:46:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][170/625] eta 0:03:12 lr 0.000775 wd 0.0500 time 0.4067 (0.4229) data time 0.0008 (0.0035) model time 0.4059 (0.4265) loss 6.8015 (7.3208) grad_norm 3.1996 (inf) loss_scale 1024.0000 (1275.5088) mem 14939MB [2024-07-25 01:46:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][180/625] eta 0:03:07 lr 0.000775 wd 0.0500 time 0.4204 (0.4224) data time 0.0007 (0.0034) model time 0.4197 (0.4254) loss 7.3538 (7.3333) grad_norm 1.6313 (inf) loss_scale 1024.0000 (1261.6133) mem 14939MB [2024-07-25 01:46:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][190/625] eta 0:03:03 lr 0.000775 wd 0.0500 time 0.3942 (0.4220) data time 
0.0007 (0.0033) model time 0.3936 (0.4245) loss 7.3510 (7.3268) grad_norm 3.1961 (inf) loss_scale 1024.0000 (1249.1728) mem 14939MB [2024-07-25 01:46:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][200/625] eta 0:02:58 lr 0.000775 wd 0.0500 time 0.4155 (0.4210) data time 0.0007 (0.0032) model time 0.4148 (0.4230) loss 6.4702 (7.3178) grad_norm 1.7229 (inf) loss_scale 1024.0000 (1237.9701) mem 14939MB [2024-07-25 01:46:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][210/625] eta 0:02:54 lr 0.000775 wd 0.0500 time 0.4007 (0.4202) data time 0.0008 (0.0031) model time 0.3999 (0.4218) loss 7.2247 (7.3214) grad_norm 2.7977 (inf) loss_scale 1024.0000 (1227.8294) mem 14939MB [2024-07-25 01:46:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][220/625] eta 0:02:49 lr 0.000775 wd 0.0500 time 0.3932 (0.4193) data time 0.0007 (0.0030) model time 0.3925 (0.4205) loss 5.4917 (7.3096) grad_norm 3.0193 (inf) loss_scale 1024.0000 (1218.6063) mem 14939MB [2024-07-25 01:46:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][230/625] eta 0:02:45 lr 0.000774 wd 0.0500 time 0.4007 (0.4186) data time 0.0009 (0.0029) model time 0.3998 (0.4194) loss 7.5445 (7.3095) grad_norm 2.7746 (inf) loss_scale 1024.0000 (1210.1818) mem 14939MB [2024-07-25 01:46:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][240/625] eta 0:02:40 lr 0.000774 wd 0.0500 time 0.4090 (0.4180) data time 0.0008 (0.0028) model time 0.4082 (0.4185) loss 8.3399 (7.3082) grad_norm 4.5951 (inf) loss_scale 1024.0000 (1202.4564) mem 14939MB [2024-07-25 01:46:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][250/625] eta 0:02:36 lr 0.000774 wd 0.0500 time 0.3939 (0.4173) data time 0.0009 (0.0027) model time 0.3931 (0.4176) loss 6.9242 (7.3096) grad_norm 2.7817 (inf) loss_scale 1024.0000 (1195.3466) mem 14939MB [2024-07-25 01:46:59 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][260/625] eta 0:02:32 lr 0.000774 wd 0.0500 time 0.4061 (0.4167) data time 0.0008 (0.0027) model time 0.4053 (0.4168) loss 7.3485 (7.3241) grad_norm 2.8462 (inf) loss_scale 1024.0000 (1188.7816) mem 14939MB [2024-07-25 01:47:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][270/625] eta 0:02:27 lr 0.000774 wd 0.0500 time 0.5669 (0.4168) data time 0.0007 (0.0026) model time 0.5662 (0.4168) loss 5.7348 (7.3297) grad_norm 3.1332 (inf) loss_scale 1024.0000 (1182.7011) mem 14939MB [2024-07-25 01:47:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][280/625] eta 0:02:24 lr 0.000774 wd 0.0500 time 0.5711 (0.4175) data time 0.0009 (0.0026) model time 0.5702 (0.4176) loss 8.2616 (7.3370) grad_norm 2.1736 (inf) loss_scale 1024.0000 (1177.0534) mem 14939MB [2024-07-25 01:47:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][290/625] eta 0:02:20 lr 0.000774 wd 0.0500 time 0.5702 (0.4205) data time 0.0008 (0.0025) model time 0.5694 (0.4212) loss 7.9052 (7.3419) grad_norm 1.8260 (inf) loss_scale 1024.0000 (1171.7938) mem 14939MB [2024-07-25 01:47:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][300/625] eta 0:02:17 lr 0.000774 wd 0.0500 time 0.5798 (0.4224) data time 0.0009 (0.0025) model time 0.5789 (0.4234) loss 6.8152 (7.3455) grad_norm 2.1345 (inf) loss_scale 1024.0000 (1166.8837) mem 14939MB [2024-07-25 01:47:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][310/625] eta 0:02:13 lr 0.000774 wd 0.0500 time 0.4181 (0.4235) data time 0.0008 (0.0024) model time 0.4174 (0.4247) loss 6.5711 (7.3397) grad_norm 4.0894 (inf) loss_scale 1024.0000 (1162.2894) mem 14939MB [2024-07-25 01:47:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][320/625] eta 0:02:08 lr 0.000774 wd 0.0500 time 0.3939 (0.4229) data time 0.0007 
(0.0024) model time 0.3932 (0.4239) loss 8.0358 (7.3365) grad_norm 4.8605 (inf) loss_scale 1024.0000 (1157.9813) mem 14939MB [2024-07-25 01:47:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][330/625] eta 0:02:04 lr 0.000773 wd 0.0500 time 0.3949 (0.4223) data time 0.0009 (0.0023) model time 0.3940 (0.4231) loss 7.6820 (7.3353) grad_norm 2.0134 (inf) loss_scale 1024.0000 (1153.9335) mem 14939MB [2024-07-25 01:47:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][340/625] eta 0:02:00 lr 0.000773 wd 0.0500 time 0.4161 (0.4217) data time 0.0009 (0.0023) model time 0.4152 (0.4223) loss 8.2234 (7.3324) grad_norm 2.2314 (inf) loss_scale 1024.0000 (1150.1232) mem 14939MB [2024-07-25 01:47:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][350/625] eta 0:01:55 lr 0.000773 wd 0.0500 time 0.3949 (0.4210) data time 0.0009 (0.0023) model time 0.3940 (0.4215) loss 5.1867 (7.3208) grad_norm 2.8656 (inf) loss_scale 1024.0000 (1146.5299) mem 14939MB [2024-07-25 01:47:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][360/625] eta 0:01:51 lr 0.000773 wd 0.0500 time 0.3924 (0.4204) data time 0.0011 (0.0022) model time 0.3913 (0.4207) loss 6.8898 (7.3180) grad_norm 2.5076 (inf) loss_scale 1024.0000 (1143.1357) mem 14939MB [2024-07-25 01:47:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][370/625] eta 0:01:47 lr 0.000773 wd 0.0500 time 0.4110 (0.4200) data time 0.0008 (0.0022) model time 0.4102 (0.4202) loss 7.3869 (7.3206) grad_norm 2.1498 (inf) loss_scale 1024.0000 (1139.9245) mem 14939MB [2024-07-25 01:47:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][380/625] eta 0:01:42 lr 0.000773 wd 0.0500 time 0.3947 (0.4198) data time 0.0008 (0.0022) model time 0.3939 (0.4200) loss 7.1482 (7.3163) grad_norm 2.2319 (inf) loss_scale 1024.0000 (1136.8819) mem 14939MB [2024-07-25 01:47:54 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][390/625] eta 0:01:38 lr 0.000773 wd 0.0500 time 0.3979 (0.4194) data time 0.0009 (0.0021) model time 0.3970 (0.4194) loss 7.7626 (7.3083) grad_norm 2.2320 (inf) loss_scale 1024.0000 (1133.9949) mem 14939MB [2024-07-25 01:47:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][400/625] eta 0:01:34 lr 0.000773 wd 0.0500 time 0.4088 (0.4190) data time 0.0008 (0.0021) model time 0.4079 (0.4189) loss 8.1208 (7.3144) grad_norm 5.1635 (inf) loss_scale 1024.0000 (1131.2519) mem 14939MB [2024-07-25 01:48:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][410/625] eta 0:01:29 lr 0.000773 wd 0.0500 time 0.3936 (0.4186) data time 0.0008 (0.0021) model time 0.3928 (0.4184) loss 8.1857 (7.3162) grad_norm 2.9218 (inf) loss_scale 1024.0000 (1128.6423) mem 14939MB [2024-07-25 01:48:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][420/625] eta 0:01:25 lr 0.000773 wd 0.0500 time 0.3993 (0.4182) data time 0.0010 (0.0021) model time 0.3983 (0.4180) loss 8.4291 (7.3160) grad_norm 1.9291 (inf) loss_scale 1024.0000 (1126.1568) mem 14939MB [2024-07-25 01:48:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][430/625] eta 0:01:21 lr 0.000772 wd 0.0500 time 0.4096 (0.4178) data time 0.0007 (0.0020) model time 0.4089 (0.4175) loss 8.5669 (7.3272) grad_norm 1.8276 (inf) loss_scale 1024.0000 (1123.7865) mem 14939MB [2024-07-25 01:48:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][440/625] eta 0:01:17 lr 0.000772 wd 0.0500 time 0.3995 (0.4174) data time 0.0007 (0.0020) model time 0.3988 (0.4171) loss 6.2919 (7.3243) grad_norm 3.1638 (inf) loss_scale 1024.0000 (1121.5238) mem 14939MB [2024-07-25 01:48:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][450/625] eta 0:01:12 lr 0.000772 wd 0.0500 time 0.3989 (0.4171) data time 0.0006 
(0.0020) model time 0.3983 (0.4167) loss 8.3235 (7.3302) grad_norm 3.1863 (inf) loss_scale 1024.0000 (1119.3614) mem 14939MB [2024-07-25 01:48:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][460/625] eta 0:01:08 lr 0.000772 wd 0.0500 time 0.4105 (0.4168) data time 0.0008 (0.0020) model time 0.4097 (0.4163) loss 6.2697 (7.3290) grad_norm 3.0521 (inf) loss_scale 1024.0000 (1117.2928) mem 14939MB [2024-07-25 01:48:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][470/625] eta 0:01:04 lr 0.000772 wd 0.0500 time 0.3973 (0.4165) data time 0.0007 (0.0019) model time 0.3966 (0.4160) loss 6.3812 (7.3245) grad_norm 1.8898 (inf) loss_scale 1024.0000 (1115.3121) mem 14939MB [2024-07-25 01:48:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][480/625] eta 0:01:00 lr 0.000772 wd 0.0500 time 0.3985 (0.4162) data time 0.0006 (0.0019) model time 0.3979 (0.4156) loss 7.6499 (7.3251) grad_norm 1.8017 (inf) loss_scale 1024.0000 (1113.4137) mem 14939MB [2024-07-25 01:48:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][490/625] eta 0:00:56 lr 0.000772 wd 0.0500 time 0.4058 (0.4159) data time 0.0008 (0.0019) model time 0.4049 (0.4153) loss 7.9332 (7.3216) grad_norm 4.4100 (inf) loss_scale 1024.0000 (1111.5927) mem 14939MB [2024-07-25 01:48:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][500/625] eta 0:00:52 lr 0.000772 wd 0.0500 time 0.3985 (0.4163) data time 0.0006 (0.0019) model time 0.3980 (0.4157) loss 6.4486 (7.3167) grad_norm 2.8522 (inf) loss_scale 1024.0000 (1109.8443) mem 14939MB [2024-07-25 01:48:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][510/625] eta 0:00:48 lr 0.000772 wd 0.0500 time 0.5968 (0.4180) data time 0.0009 (0.0019) model time 0.5960 (0.4176) loss 7.2879 (7.3201) grad_norm 2.0712 (inf) loss_scale 1024.0000 (1108.1644) mem 14939MB [2024-07-25 01:48:48 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][520/625] eta 0:00:43 lr 0.000772 wd 0.0500 time 0.4116 (0.4190) data time 0.0007 (0.0018) model time 0.4109 (0.4187) loss 7.2140 (7.3237) grad_norm 2.0423 (inf) loss_scale 1024.0000 (1106.5489) mem 14939MB [2024-07-25 01:48:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][530/625] eta 0:00:39 lr 0.000771 wd 0.0500 time 0.3986 (0.4192) data time 0.0008 (0.0018) model time 0.3979 (0.4190) loss 6.1292 (7.3204) grad_norm 2.3608 (inf) loss_scale 1024.0000 (1104.9944) mem 14939MB [2024-07-25 01:48:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][540/625] eta 0:00:35 lr 0.000771 wd 0.0500 time 0.4026 (0.4189) data time 0.0009 (0.0018) model time 0.4017 (0.4186) loss 6.6004 (7.3127) grad_norm 2.1311 (inf) loss_scale 1024.0000 (1103.4972) mem 14939MB [2024-07-25 01:49:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][550/625] eta 0:00:31 lr 0.000771 wd 0.0500 time 0.4038 (0.4186) data time 0.0008 (0.0018) model time 0.4029 (0.4183) loss 5.8694 (7.3150) grad_norm 1.6841 (inf) loss_scale 1024.0000 (1102.0544) mem 14939MB [2024-07-25 01:49:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][560/625] eta 0:00:27 lr 0.000771 wd 0.0500 time 0.3951 (0.4184) data time 0.0007 (0.0018) model time 0.3945 (0.4180) loss 7.9016 (7.3231) grad_norm 3.1318 (inf) loss_scale 1024.0000 (1100.6631) mem 14939MB [2024-07-25 01:49:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][570/625] eta 0:00:23 lr 0.000771 wd 0.0500 time 0.3965 (0.4184) data time 0.0006 (0.0019) model time 0.3959 (0.4179) loss 7.6821 (7.3256) grad_norm 2.4232 (inf) loss_scale 1024.0000 (1099.3205) mem 14939MB [2024-07-25 01:49:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][580/625] eta 0:00:18 lr 0.000771 wd 0.0500 time 0.4296 (0.4182) data time 0.0007 
(0.0019) model time 0.4289 (0.4177) loss 7.2544 (7.3247) grad_norm 4.8810 (inf) loss_scale 1024.0000 (1098.0241) mem 14939MB [2024-07-25 01:49:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][590/625] eta 0:00:14 lr 0.000771 wd 0.0500 time 0.3969 (0.4180) data time 0.0008 (0.0019) model time 0.3960 (0.4174) loss 7.5037 (7.3290) grad_norm 2.2217 (inf) loss_scale 1024.0000 (1096.7716) mem 14939MB [2024-07-25 01:49:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][600/625] eta 0:00:10 lr 0.000771 wd 0.0500 time 0.3993 (0.4181) data time 0.0009 (0.0019) model time 0.3984 (0.4175) loss 5.9046 (7.3303) grad_norm 1.8786 (inf) loss_scale 1024.0000 (1095.5607) mem 14939MB [2024-07-25 01:49:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][610/625] eta 0:00:06 lr 0.000771 wd 0.0500 time 0.4014 (0.4178) data time 0.0004 (0.0019) model time 0.4010 (0.4172) loss 8.4823 (7.3312) grad_norm 3.1845 (inf) loss_scale 1024.0000 (1094.3895) mem 14939MB [2024-07-25 01:49:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][620/625] eta 0:00:02 lr 0.000770 wd 0.0500 time 0.3981 (0.4175) data time 0.0006 (0.0018) model time 0.3976 (0.4168) loss 7.4835 (7.3314) grad_norm 2.2043 (inf) loss_scale 1024.0000 (1093.2560) mem 14939MB [2024-07-25 01:49:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 134 training takes 0:04:20 [2024-07-25 01:49:31 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 01:49:32 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 01:49:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.458 (0.458) Loss 0.5947 (0.5947) Acc@1 87.695 (87.695) Acc@5 98.291 (98.291) Mem 14939MB
[2024-07-25 01:49:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.9614 (0.7384) Acc@1 78.857 (84.703) Acc@5 95.312 (97.221) Mem 14939MB
[2024-07-25 01:49:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 1.1035 (0.8775) Acc@1 74.170 (81.206) Acc@5 93.848 (95.766) Mem 14939MB
[2024-07-25 01:49:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.756 Acc@5 95.717
[2024-07-25 01:49:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.8%
[2024-07-25 01:49:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.788 (0.788) Loss 0.5742 (0.5742) Acc@1 88.770 (88.770) Acc@5 98.438 (98.438) Mem 14939MB
[2024-07-25 01:49:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.159) Loss 0.9341 (0.7194) Acc@1 79.834 (85.343) Acc@5 95.459 (97.430) Mem 14939MB
[2024-07-25 01:49:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.124) Loss 1.0664 (0.8497) Acc@1 75.684 (81.910) Acc@5 94.189 (96.073) Mem 14939MB
[2024-07-25 01:49:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.552 Acc@5 96.049
[2024-07-25 01:49:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.6%
[2024-07-25 01:49:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.55%
[2024-07-25 01:49:37 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 01:49:38 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 01:49:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][0/625] eta 0:08:12 lr 0.000770 wd 0.0500 time 0.7882 (0.7882) data time 0.3856 (0.3856) model time 0.0000 (0.0000) loss 7.9093 (7.9093) grad_norm 2.1537 (2.1537) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:49:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][10/625] eta 0:04:28 lr 0.000770 wd 0.0500 time 0.3983 (0.4370) data time 0.0006 (0.0359) model time 0.0000 (0.0000) loss 8.0774 (7.4833) grad_norm 4.3902 (2.7133) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:49:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][20/625] eta 0:04:14 lr 0.000770 wd 0.0500 time 0.3999 (0.4202) data time 0.0009 (0.0195) model time 0.0000 (0.0000) loss 7.8980 (7.4640) grad_norm 1.7046 (2.7268) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:49:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][30/625] eta 0:04:06 lr 0.000770 wd 0.0500 time 0.4082 (0.4142) data time 0.0009 (0.0135) model time 0.0000 (0.0000) loss 6.5610 (7.4267) grad_norm 1.8489 (2.5982) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:49:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][40/625] eta 0:04:00 lr 0.000770 wd 0.0500 time 0.3963 (0.4114) data time 0.0008 (0.0105) model time 0.0000 (0.0000) loss 5.9549 (7.3799) grad_norm 2.1489 (2.5632) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:49:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][50/625] eta 0:03:56 lr 0.000770 wd 0.0500 time 0.3977 (0.4106) data time 0.0006 (0.0087) model time 0.0000 (0.0000) loss 6.6399 (7.3714) grad_norm 1.8426 (2.4478) loss_scale 1024.0000 
(1024.0000) mem 14939MB [2024-07-25 01:50:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][60/625] eta 0:03:51 lr 0.000770 wd 0.0500 time 0.4197 (0.4105) data time 0.0008 (0.0075) model time 0.4188 (0.4091) loss 7.8518 (7.2712) grad_norm 3.1801 (2.3787) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:50:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][70/625] eta 0:03:47 lr 0.000770 wd 0.0500 time 0.3945 (0.4098) data time 0.0009 (0.0066) model time 0.3936 (0.4065) loss 7.6658 (7.2889) grad_norm 2.4667 (2.3522) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:50:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][80/625] eta 0:03:42 lr 0.000770 wd 0.0500 time 0.3994 (0.4089) data time 0.0006 (0.0060) model time 0.3987 (0.4048) loss 6.7104 (7.2936) grad_norm 2.6000 (2.4328) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:50:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][90/625] eta 0:03:39 lr 0.000770 wd 0.0500 time 0.4075 (0.4102) data time 0.0006 (0.0054) model time 0.4068 (0.4086) loss 6.5754 (7.3048) grad_norm 3.2228 (2.5302) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:50:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][100/625] eta 0:03:38 lr 0.000769 wd 0.0500 time 0.5961 (0.4161) data time 0.0007 (0.0050) model time 0.5954 (0.4205) loss 7.1489 (7.3091) grad_norm 2.0392 (2.5833) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:50:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][110/625] eta 0:03:37 lr 0.000769 wd 0.0500 time 0.3899 (0.4225) data time 0.0009 (0.0046) model time 0.3890 (0.4314) loss 8.8632 (7.3490) grad_norm 2.1781 (2.5331) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:50:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][120/625] eta 0:03:36 lr 
0.000769 wd 0.0500 time 0.5766 (0.4277) data time 0.0006 (0.0043) model time 0.5760 (0.4391) loss 8.3273 (7.3150) grad_norm 2.1371 (2.4925) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:50:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][130/625] eta 0:03:32 lr 0.000769 wd 0.0500 time 0.4082 (0.4284) data time 0.0009 (0.0041) model time 0.4073 (0.4387) loss 7.9539 (7.3093) grad_norm 1.7320 (2.4861) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:50:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][140/625] eta 0:03:26 lr 0.000769 wd 0.0500 time 0.3936 (0.4265) data time 0.0009 (0.0038) model time 0.3927 (0.4344) loss 7.5903 (7.3308) grad_norm 2.5400 (2.5142) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:50:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][150/625] eta 0:03:21 lr 0.000769 wd 0.0500 time 0.4013 (0.4250) data time 0.0006 (0.0036) model time 0.4007 (0.4313) loss 8.2210 (7.3324) grad_norm 1.8409 (2.5278) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:50:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][160/625] eta 0:03:17 lr 0.000769 wd 0.0500 time 0.4066 (0.4237) data time 0.0007 (0.0035) model time 0.4059 (0.4287) loss 7.2577 (7.3372) grad_norm 2.0056 (2.5265) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:50:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][170/625] eta 0:03:12 lr 0.000769 wd 0.0500 time 0.3944 (0.4224) data time 0.0007 (0.0033) model time 0.3937 (0.4264) loss 6.2757 (7.3355) grad_norm 3.2828 (2.5304) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:50:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][180/625] eta 0:03:07 lr 0.000769 wd 0.0500 time 0.4020 (0.4213) data time 0.0007 (0.0032) model time 0.4013 (0.4245) loss 7.5351 (7.3264) grad_norm 2.3779 (2.5131) 
loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:50:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][190/625] eta 0:03:02 lr 0.000768 wd 0.0500 time 0.4111 (0.4205) data time 0.0008 (0.0031) model time 0.4103 (0.4231) loss 7.8367 (7.3373) grad_norm 1.7529 (2.5114) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:51:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][200/625] eta 0:02:58 lr 0.000768 wd 0.0500 time 0.3988 (0.4197) data time 0.0007 (0.0030) model time 0.3981 (0.4217) loss 8.1120 (7.3270) grad_norm 1.5694 (2.4937) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:51:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][210/625] eta 0:02:53 lr 0.000768 wd 0.0500 time 0.4023 (0.4190) data time 0.0008 (0.0029) model time 0.4014 (0.4205) loss 7.4813 (7.3233) grad_norm 2.2347 (2.4759) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:51:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][220/625] eta 0:02:49 lr 0.000768 wd 0.0500 time 0.4122 (0.4183) data time 0.0006 (0.0028) model time 0.4115 (0.4196) loss 6.3341 (7.3198) grad_norm 2.9218 (2.4679) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:51:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][230/625] eta 0:02:45 lr 0.000768 wd 0.0500 time 0.3932 (0.4178) data time 0.0008 (0.0028) model time 0.3924 (0.4188) loss 6.9338 (7.3190) grad_norm 1.7172 (2.4600) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:51:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][240/625] eta 0:02:40 lr 0.000768 wd 0.0500 time 0.4037 (0.4172) data time 0.0007 (0.0027) model time 0.4031 (0.4179) loss 8.6494 (7.3116) grad_norm 1.8901 (2.4462) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:51:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[135/300][250/625] eta 0:02:36 lr 0.000768 wd 0.0500 time 0.4121 (0.4167) data time 0.0006 (0.0026) model time 0.4114 (0.4172) loss 7.7874 (7.3091) grad_norm 1.6037 (2.4475) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:51:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][260/625] eta 0:02:31 lr 0.000768 wd 0.0500 time 0.3937 (0.4162) data time 0.0007 (0.0025) model time 0.3930 (0.4165) loss 8.2056 (7.3203) grad_norm 3.4843 (2.4484) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:51:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][270/625] eta 0:02:27 lr 0.000768 wd 0.0500 time 0.4032 (0.4157) data time 0.0006 (0.0025) model time 0.4026 (0.4159) loss 6.7123 (7.3134) grad_norm 2.8666 (2.4523) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:51:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][280/625] eta 0:02:23 lr 0.000768 wd 0.0500 time 0.4046 (0.4153) data time 0.0007 (0.0024) model time 0.4039 (0.4153) loss 6.2935 (7.3076) grad_norm 2.7238 (2.4466) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:51:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][290/625] eta 0:02:19 lr 0.000767 wd 0.0500 time 0.3950 (0.4150) data time 0.0009 (0.0024) model time 0.3941 (0.4149) loss 7.3394 (7.3064) grad_norm 1.6523 (2.4403) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:51:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][300/625] eta 0:02:14 lr 0.000767 wd 0.0500 time 0.4014 (0.4145) data time 0.0009 (0.0023) model time 0.4005 (0.4143) loss 7.7400 (7.2990) grad_norm 2.3410 (2.4304) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:51:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][310/625] eta 0:02:10 lr 0.000767 wd 0.0500 time 0.4187 (0.4146) data time 0.0006 (0.0023) model time 0.4180 (0.4144) loss 6.8205 
(7.2943) grad_norm 1.5443 (2.4276) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:51:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][320/625] eta 0:02:06 lr 0.000767 wd 0.0500 time 0.4011 (0.4154) data time 0.0009 (0.0023) model time 0.4002 (0.4153) loss 7.3174 (7.3070) grad_norm 2.2644 (2.4331) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:51:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][330/625] eta 0:02:03 lr 0.000767 wd 0.0500 time 0.4043 (0.4179) data time 0.0006 (0.0022) model time 0.4037 (0.4182) loss 7.3995 (7.3184) grad_norm 1.9182 (2.4348) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:52:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][340/625] eta 0:01:59 lr 0.000767 wd 0.0500 time 0.5948 (0.4197) data time 0.0006 (0.0022) model time 0.5942 (0.4203) loss 7.0239 (7.3131) grad_norm 3.5670 (2.4600) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:52:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][350/625] eta 0:01:55 lr 0.000767 wd 0.0500 time 0.4127 (0.4199) data time 0.0006 (0.0021) model time 0.4121 (0.4205) loss 8.2875 (7.3183) grad_norm 2.0994 (2.4522) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:52:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][360/625] eta 0:01:51 lr 0.000767 wd 0.0500 time 0.3941 (0.4194) data time 0.0007 (0.0021) model time 0.3934 (0.4198) loss 6.7421 (7.3072) grad_norm 2.7940 (2.4493) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:52:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][370/625] eta 0:01:46 lr 0.000767 wd 0.0500 time 0.4026 (0.4189) data time 0.0009 (0.0021) model time 0.4017 (0.4192) loss 8.1226 (7.3019) grad_norm 2.4377 (2.4537) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:52:17 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [135/300][380/625] eta 0:01:42 lr 0.000767 wd 0.0500 time 0.4122 (0.4185) data time 0.0006 (0.0020) model time 0.4116 (0.4188) loss 8.2607 (7.3028) grad_norm 2.5526 (2.4427) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:52:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][390/625] eta 0:01:38 lr 0.000766 wd 0.0500 time 0.3938 (0.4182) data time 0.0009 (0.0020) model time 0.3930 (0.4183) loss 7.7431 (7.3011) grad_norm 1.9278 (2.4286) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:52:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][400/625] eta 0:01:33 lr 0.000766 wd 0.0500 time 0.3986 (0.4178) data time 0.0009 (0.0020) model time 0.3978 (0.4178) loss 7.0709 (7.2923) grad_norm 1.7229 (2.4200) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:52:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][410/625] eta 0:01:29 lr 0.000766 wd 0.0500 time 0.4072 (0.4174) data time 0.0009 (0.0020) model time 0.4063 (0.4174) loss 7.9873 (7.2872) grad_norm 6.7418 (2.4217) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:52:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][420/625] eta 0:01:25 lr 0.000766 wd 0.0500 time 0.3947 (0.4171) data time 0.0008 (0.0019) model time 0.3938 (0.4169) loss 7.8466 (7.2899) grad_norm 1.9169 (2.4267) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:52:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][430/625] eta 0:01:21 lr 0.000766 wd 0.0500 time 0.3994 (0.4167) data time 0.0008 (0.0019) model time 0.3986 (0.4165) loss 7.3795 (7.2972) grad_norm 1.6413 (2.4292) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:52:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][440/625] eta 0:01:17 lr 0.000766 wd 0.0500 time 0.4053 (0.4164) data time 0.0009 (0.0019) model 
time 0.4044 (0.4162) loss 8.2899 (7.3057) grad_norm 2.2697 (2.4249) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:52:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][450/625] eta 0:01:12 lr 0.000766 wd 0.0500 time 0.3945 (0.4161) data time 0.0008 (0.0019) model time 0.3937 (0.4158) loss 6.8368 (7.3013) grad_norm 2.2146 (2.4429) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:52:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][460/625] eta 0:01:08 lr 0.000766 wd 0.0500 time 0.4001 (0.4158) data time 0.0007 (0.0019) model time 0.3994 (0.4155) loss 7.7919 (7.3131) grad_norm 1.8825 (2.4451) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:52:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][470/625] eta 0:01:04 lr 0.000766 wd 0.0500 time 0.4490 (0.4158) data time 0.0007 (0.0019) model time 0.4483 (0.4154) loss 8.3416 (7.3110) grad_norm 4.1286 (2.4536) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:52:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][480/625] eta 0:01:00 lr 0.000766 wd 0.0500 time 0.3940 (0.4156) data time 0.0009 (0.0018) model time 0.3931 (0.4152) loss 8.4098 (7.3100) grad_norm 4.1267 (2.4569) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:53:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][490/625] eta 0:00:56 lr 0.000765 wd 0.0500 time 0.4050 (0.4156) data time 0.0008 (0.0018) model time 0.4042 (0.4151) loss 7.8098 (7.3280) grad_norm 1.5758 (2.4636) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:53:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][500/625] eta 0:00:51 lr 0.000765 wd 0.0500 time 0.4129 (0.4154) data time 0.0010 (0.0018) model time 0.4118 (0.4149) loss 6.3849 (7.3337) grad_norm 2.0990 (2.4565) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:53:10 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][510/625] eta 0:00:47 lr 0.000765 wd 0.0500 time 0.3958 (0.4152) data time 0.0009 (0.0018) model time 0.3949 (0.4146) loss 5.8394 (7.3255) grad_norm 4.1740 (2.4606) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:53:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][520/625] eta 0:00:43 lr 0.000765 wd 0.0500 time 0.4150 (0.4150) data time 0.0008 (0.0018) model time 0.4142 (0.4144) loss 7.3689 (7.3150) grad_norm 2.7235 (2.4765) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:53:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][530/625] eta 0:00:39 lr 0.000765 wd 0.0500 time 0.4136 (0.4150) data time 0.0009 (0.0018) model time 0.4127 (0.4144) loss 8.6211 (7.3162) grad_norm 5.6082 (2.4808) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:53:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][540/625] eta 0:00:35 lr 0.000765 wd 0.0500 time 0.3941 (0.4158) data time 0.0009 (0.0018) model time 0.3932 (0.4152) loss 7.9650 (7.3188) grad_norm 1.7269 (2.4839) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:53:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][550/625] eta 0:00:31 lr 0.000765 wd 0.0500 time 0.5995 (0.4171) data time 0.0007 (0.0018) model time 0.5988 (0.4167) loss 6.7799 (7.3224) grad_norm 1.8116 (2.4838) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:53:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][560/625] eta 0:00:27 lr 0.000765 wd 0.0500 time 0.5740 (0.4184) data time 0.0010 (0.0017) model time 0.5730 (0.4181) loss 7.1581 (7.3275) grad_norm 1.6640 (2.4896) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:53:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][570/625] eta 0:00:23 lr 0.000765 wd 0.0500 time 0.4040 (0.4188) 
data time 0.0009 (0.0017) model time 0.4031 (0.4185) loss 7.6096 (7.3281) grad_norm 2.3723 (2.4892) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:53:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][580/625] eta 0:00:18 lr 0.000764 wd 0.0500 time 0.3968 (0.4186) data time 0.0009 (0.0017) model time 0.3959 (0.4183) loss 6.0358 (7.3273) grad_norm 2.6210 (2.4918) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:53:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][590/625] eta 0:00:14 lr 0.000764 wd 0.0500 time 0.4086 (0.4183) data time 0.0006 (0.0017) model time 0.4079 (0.4180) loss 8.6048 (7.3337) grad_norm 2.1294 (2.4934) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:53:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][600/625] eta 0:00:10 lr 0.000764 wd 0.0500 time 0.4065 (0.4182) data time 0.0009 (0.0017) model time 0.4056 (0.4178) loss 7.5463 (7.3392) grad_norm 3.0231 (2.4852) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:53:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][610/625] eta 0:00:06 lr 0.000764 wd 0.0500 time 0.3928 (0.4180) data time 0.0005 (0.0017) model time 0.3923 (0.4176) loss 6.9183 (7.3441) grad_norm 1.8487 (2.4760) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:53:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][620/625] eta 0:00:02 lr 0.000764 wd 0.0500 time 0.4191 (0.4178) data time 0.0006 (0.0017) model time 0.4185 (0.4174) loss 7.1642 (7.3457) grad_norm 1.9619 (2.4747) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:53:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 135 training takes 0:04:21 [2024-07-25 01:53:59 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 01:54:00 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 01:54:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.454 (0.454) Loss 0.5884 (0.5884) Acc@1 88.867 (88.867) Acc@5 98.389 (98.389) Mem 14939MB [2024-07-25 01:54:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.9824 (0.7485) Acc@1 78.418 (84.979) Acc@5 95.264 (97.230) Mem 14939MB [2024-07-25 01:54:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.1230 (0.8841) Acc@1 74.072 (81.378) Acc@5 93.750 (95.791) Mem 14939MB [2024-07-25 01:54:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.986 Acc@5 95.745 [2024-07-25 01:54:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.0% [2024-07-25 01:54:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.800 (0.800) Loss 0.5732 (0.5732) Acc@1 88.867 (88.867) Acc@5 98.438 (98.438) Mem 14939MB [2024-07-25 01:54:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.158) Loss 0.9321 (0.7185) Acc@1 79.785 (85.329) Acc@5 95.459 (97.439) Mem 14939MB [2024-07-25 01:54:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 1.0654 (0.8489) Acc@1 75.586 (81.896) Acc@5 94.141 (96.068) Mem 14939MB [2024-07-25 01:54:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.538 Acc@5 96.045 [2024-07-25 01:54:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.5% [2024-07-25 01:54:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][0/625] eta 0:12:39 lr 0.000764 wd 0.0500 time 1.2159 (1.2159) data time 0.7306 
(0.7306) model time 0.0000 (0.0000) loss 7.5197 (7.5197) grad_norm 2.0272 (2.0272) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:54:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][10/625] eta 0:04:52 lr 0.000764 wd 0.0500 time 0.3938 (0.4755) data time 0.0011 (0.0674) model time 0.0000 (0.0000) loss 7.3576 (7.4942) grad_norm 4.3942 (3.2153) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:54:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][20/625] eta 0:04:27 lr 0.000764 wd 0.0500 time 0.4001 (0.4415) data time 0.0006 (0.0357) model time 0.0000 (0.0000) loss 7.6170 (7.4860) grad_norm 2.0177 (2.6881) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:54:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][30/625] eta 0:04:15 lr 0.000764 wd 0.0500 time 0.4063 (0.4289) data time 0.0008 (0.0246) model time 0.0000 (0.0000) loss 7.0548 (7.6259) grad_norm 2.1383 (2.7381) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:54:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][40/625] eta 0:04:07 lr 0.000764 wd 0.0500 time 0.3947 (0.4226) data time 0.0010 (0.0188) model time 0.0000 (0.0000) loss 8.6041 (7.7305) grad_norm 3.0375 (2.5896) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:54:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][50/625] eta 0:04:01 lr 0.000764 wd 0.0500 time 0.4002 (0.4192) data time 0.0009 (0.0159) model time 0.0000 (0.0000) loss 8.3053 (7.6206) grad_norm 1.6837 (2.5662) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:54:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][60/625] eta 0:03:55 lr 0.000763 wd 0.0500 time 0.4113 (0.4164) data time 0.0006 (0.0135) model time 0.4107 (0.4016) loss 7.6276 (7.5974) grad_norm 4.0968 (2.5678) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:54:35 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][70/625] eta 0:03:49 lr 0.000763 wd 0.0500 time 0.3968 (0.4143) data time 0.0007 (0.0117) model time 0.3961 (0.4008) loss 6.9429 (7.5918) grad_norm 7.3236 (2.8169) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:54:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][80/625] eta 0:03:44 lr 0.000763 wd 0.0500 time 0.4020 (0.4125) data time 0.0008 (0.0104) model time 0.4012 (0.4002) loss 7.2933 (7.6011) grad_norm 3.7540 (2.9298) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:54:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][90/625] eta 0:03:40 lr 0.000763 wd 0.0500 time 0.4063 (0.4113) data time 0.0007 (0.0093) model time 0.4056 (0.4004) loss 6.7963 (7.5766) grad_norm 1.8360 (2.8967) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:54:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][100/625] eta 0:03:35 lr 0.000763 wd 0.0500 time 0.3966 (0.4103) data time 0.0008 (0.0085) model time 0.3958 (0.4003) loss 7.2002 (7.5477) grad_norm 3.0928 (2.8169) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:54:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][110/625] eta 0:03:30 lr 0.000763 wd 0.0500 time 0.3989 (0.4094) data time 0.0006 (0.0078) model time 0.3983 (0.4001) loss 7.8231 (7.5107) grad_norm 1.7968 (2.7549) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:54:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][120/625] eta 0:03:26 lr 0.000763 wd 0.0500 time 0.4072 (0.4088) data time 0.0007 (0.0073) model time 0.4065 (0.4003) loss 8.1628 (7.5163) grad_norm 1.8385 (2.6884) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:54:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][130/625] eta 0:03:23 lr 0.000763 wd 0.0500 time 0.6068 (0.4113) data 
time 0.0008 (0.0068) model time 0.6059 (0.4054) loss 7.6931 (7.4914) grad_norm 2.5708 (2.7348) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:55:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][140/625] eta 0:03:21 lr 0.000763 wd 0.0500 time 0.5935 (0.4158) data time 0.0008 (0.0064) model time 0.5927 (0.4130) loss 8.7349 (7.5203) grad_norm 2.5081 (2.7246) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:55:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][150/625] eta 0:03:20 lr 0.000762 wd 0.0500 time 0.6102 (0.4214) data time 0.0007 (0.0060) model time 0.6096 (0.4216) loss 5.7915 (7.4839) grad_norm 2.7744 (2.7066) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:55:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][160/625] eta 0:03:17 lr 0.000762 wd 0.0500 time 0.5672 (0.4246) data time 0.0006 (0.0057) model time 0.5666 (0.4262) loss 6.5046 (7.4698) grad_norm 2.9352 (2.7048) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:55:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][170/625] eta 0:03:13 lr 0.000762 wd 0.0500 time 0.4037 (0.4244) data time 0.0009 (0.0054) model time 0.4027 (0.4257) loss 6.2385 (7.4484) grad_norm 1.7780 (2.6792) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:55:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][180/625] eta 0:03:08 lr 0.000762 wd 0.0500 time 0.3970 (0.4231) data time 0.0009 (0.0052) model time 0.3961 (0.4237) loss 7.5560 (7.4332) grad_norm 1.6361 (2.6411) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:55:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][190/625] eta 0:03:03 lr 0.000762 wd 0.0500 time 0.4130 (0.4220) data time 0.0009 (0.0050) model time 0.4121 (0.4221) loss 6.9152 (7.4358) grad_norm 2.9852 (2.6218) loss_scale 1024.0000 (1024.0000) mem 14939MB 
[2024-07-25 01:55:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][200/625] eta 0:02:58 lr 0.000762 wd 0.0500 time 0.3959 (0.4210) data time 0.0007 (0.0048) model time 0.3952 (0.4206) loss 5.8658 (7.4038) grad_norm 1.7071 (2.6021) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:55:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][210/625] eta 0:02:54 lr 0.000762 wd 0.0500 time 0.4019 (0.4201) data time 0.0008 (0.0046) model time 0.4010 (0.4195) loss 7.9521 (7.4032) grad_norm 3.2691 (2.5835) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:55:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][220/625] eta 0:02:49 lr 0.000762 wd 0.0500 time 0.4063 (0.4193) data time 0.0009 (0.0044) model time 0.4054 (0.4184) loss 7.6556 (7.3977) grad_norm 2.6022 (2.5594) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:55:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][230/625] eta 0:02:45 lr 0.000762 wd 0.0500 time 0.3980 (0.4185) data time 0.0008 (0.0043) model time 0.3972 (0.4174) loss 7.2967 (7.3748) grad_norm 2.4880 (2.5585) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:55:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][240/625] eta 0:02:40 lr 0.000762 wd 0.0500 time 0.4014 (0.4179) data time 0.0008 (0.0041) model time 0.4006 (0.4166) loss 8.0674 (7.3733) grad_norm 2.5121 (2.5740) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:55:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][250/625] eta 0:02:36 lr 0.000761 wd 0.0500 time 0.4161 (0.4173) data time 0.0006 (0.0040) model time 0.4155 (0.4159) loss 7.7186 (7.3832) grad_norm 2.0513 (2.5736) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:55:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][260/625] eta 0:02:32 lr 0.000761 wd 0.0500 
time 0.3956 (0.4168) data time 0.0006 (0.0039) model time 0.3950 (0.4152) loss 6.1787 (7.4037) grad_norm 2.3904 (2.5584) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:55:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][270/625] eta 0:02:27 lr 0.000761 wd 0.0500 time 0.3978 (0.4162) data time 0.0007 (0.0038) model time 0.3972 (0.4146) loss 9.3482 (7.4233) grad_norm 2.1071 (2.5614) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:56:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][280/625] eta 0:02:23 lr 0.000761 wd 0.0500 time 0.4173 (0.4157) data time 0.0007 (0.0037) model time 0.4166 (0.4140) loss 6.3724 (7.4251) grad_norm 2.5875 (2.5851) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:56:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][290/625] eta 0:02:19 lr 0.000761 wd 0.0500 time 0.3946 (0.4152) data time 0.0007 (0.0036) model time 0.3939 (0.4134) loss 7.9106 (7.4230) grad_norm 2.4145 (2.5839) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:56:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][300/625] eta 0:02:14 lr 0.000761 wd 0.0500 time 0.4051 (0.4147) data time 0.0006 (0.0035) model time 0.4045 (0.4128) loss 8.3585 (7.4196) grad_norm 1.7368 (2.5677) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:56:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][310/625] eta 0:02:10 lr 0.000761 wd 0.0500 time 0.4065 (0.4143) data time 0.0009 (0.0034) model time 0.4057 (0.4123) loss 7.4054 (7.4174) grad_norm 3.2578 (2.5599) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:56:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][320/625] eta 0:02:06 lr 0.000761 wd 0.0500 time 0.3963 (0.4138) data time 0.0007 (0.0033) model time 0.3956 (0.4118) loss 7.7743 (7.4159) grad_norm 1.9156 (2.5503) loss_scale 1024.0000 
(1024.0000) mem 14939MB [2024-07-25 01:56:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][330/625] eta 0:02:01 lr 0.000761 wd 0.0500 time 0.3994 (0.4134) data time 0.0008 (0.0033) model time 0.3986 (0.4114) loss 8.0032 (7.4045) grad_norm 2.1170 (2.5364) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:56:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][340/625] eta 0:01:57 lr 0.000761 wd 0.0500 time 0.4048 (0.4131) data time 0.0006 (0.0032) model time 0.4042 (0.4111) loss 7.3127 (7.4230) grad_norm 2.8918 (2.5433) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:56:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][350/625] eta 0:01:53 lr 0.000760 wd 0.0500 time 0.5944 (0.4139) data time 0.0007 (0.0031) model time 0.5936 (0.4120) loss 8.3259 (7.4315) grad_norm 3.5531 (2.5547) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:56:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][360/625] eta 0:01:50 lr 0.000760 wd 0.0500 time 0.5096 (0.4153) data time 0.0006 (0.0031) model time 0.5090 (0.4137) loss 7.4223 (7.4263) grad_norm 1.9324 (2.5619) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:56:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][370/625] eta 0:01:46 lr 0.000760 wd 0.0500 time 0.5865 (0.4182) data time 0.0008 (0.0030) model time 0.5857 (0.4171) loss 7.0412 (7.4245) grad_norm 1.9643 (2.5653) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:56:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][380/625] eta 0:01:42 lr 0.000760 wd 0.0500 time 0.5524 (0.4198) data time 0.0010 (0.0030) model time 0.5515 (0.4189) loss 6.0536 (7.4195) grad_norm 1.9161 (2.5641) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:56:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][390/625] eta 0:01:38 
lr 0.000760 wd 0.0500 time 0.3938 (0.4194) data time 0.0011 (0.0029) model time 0.3927 (0.4184) loss 7.7456 (7.4134) grad_norm 2.9756 (2.5604) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:56:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][400/625] eta 0:01:34 lr 0.000760 wd 0.0500 time 0.4004 (0.4189) data time 0.0007 (0.0029) model time 0.3997 (0.4179) loss 6.8953 (7.4048) grad_norm 2.3946 (2.5580) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:56:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][410/625] eta 0:01:29 lr 0.000760 wd 0.0500 time 0.4089 (0.4186) data time 0.0008 (0.0028) model time 0.4081 (0.4175) loss 7.5778 (7.3979) grad_norm 7.1739 (2.5732) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:57:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][420/625] eta 0:01:25 lr 0.000760 wd 0.0500 time 0.3930 (0.4182) data time 0.0007 (0.0028) model time 0.3923 (0.4171) loss 6.5297 (7.3882) grad_norm 1.7751 (2.5726) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:57:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][430/625] eta 0:01:21 lr 0.000760 wd 0.0500 time 0.4000 (0.4178) data time 0.0006 (0.0027) model time 0.3994 (0.4167) loss 7.3534 (7.3881) grad_norm 1.6798 (2.5749) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:57:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][440/625] eta 0:01:17 lr 0.000759 wd 0.0500 time 0.4163 (0.4175) data time 0.0009 (0.0027) model time 0.4154 (0.4163) loss 6.9946 (7.3939) grad_norm 2.3693 (2.5667) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:57:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][450/625] eta 0:01:13 lr 0.000759 wd 0.0500 time 0.3986 (0.4172) data time 0.0007 (0.0027) model time 0.3979 (0.4160) loss 6.2442 (7.3934) grad_norm 2.8088 (2.5903) 
loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:57:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][460/625] eta 0:01:08 lr 0.000759 wd 0.0500 time 0.3985 (0.4169) data time 0.0008 (0.0026) model time 0.3977 (0.4156) loss 6.2741 (7.3940) grad_norm 6.5150 (2.6032) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:57:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][470/625] eta 0:01:04 lr 0.000759 wd 0.0500 time 0.4039 (0.4166) data time 0.0007 (0.0026) model time 0.4032 (0.4153) loss 7.5379 (7.4025) grad_norm 2.8220 (2.6077) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:57:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][480/625] eta 0:01:00 lr 0.000759 wd 0.0500 time 0.3939 (0.4163) data time 0.0008 (0.0026) model time 0.3930 (0.4150) loss 7.3736 (7.3975) grad_norm 2.1594 (2.6125) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:57:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][490/625] eta 0:00:56 lr 0.000759 wd 0.0500 time 0.3995 (0.4160) data time 0.0008 (0.0025) model time 0.3986 (0.4147) loss 8.4387 (7.4013) grad_norm 3.0244 (2.6044) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:57:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][500/625] eta 0:00:51 lr 0.000759 wd 0.0500 time 0.4091 (0.4158) data time 0.0007 (0.0025) model time 0.4084 (0.4144) loss 6.1142 (7.3918) grad_norm 2.1042 (2.6034) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:57:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][510/625] eta 0:00:47 lr 0.000759 wd 0.0500 time 0.3954 (0.4155) data time 0.0006 (0.0025) model time 0.3947 (0.4141) loss 7.6135 (7.3852) grad_norm 3.6139 (2.6026) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:57:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[136/300][520/625] eta 0:00:43 lr 0.000759 wd 0.0500 time 0.4049 (0.4154) data time 0.0006 (0.0024) model time 0.4043 (0.4140) loss 7.8872 (7.3854) grad_norm 2.0806 (2.6156) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:57:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][530/625] eta 0:00:39 lr 0.000759 wd 0.0500 time 0.4155 (0.4153) data time 0.0008 (0.0024) model time 0.4147 (0.4139) loss 7.9394 (7.3748) grad_norm 2.0966 (2.6173) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:57:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][540/625] eta 0:00:35 lr 0.000758 wd 0.0500 time 0.3971 (0.4151) data time 0.0008 (0.0024) model time 0.3963 (0.4137) loss 7.3376 (7.3815) grad_norm 1.8017 (2.6127) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:57:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][550/625] eta 0:00:31 lr 0.000758 wd 0.0500 time 0.4117 (0.4149) data time 0.0006 (0.0024) model time 0.4111 (0.4135) loss 7.1034 (7.3840) grad_norm 3.3492 (2.6156) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:57:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][560/625] eta 0:00:26 lr 0.000758 wd 0.0500 time 0.4068 (0.4147) data time 0.0008 (0.0023) model time 0.4060 (0.4133) loss 7.6692 (7.3746) grad_norm 2.7960 (2.6176) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:58:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][570/625] eta 0:00:22 lr 0.000758 wd 0.0500 time 0.3984 (0.4148) data time 0.0006 (0.0023) model time 0.3978 (0.4134) loss 8.1023 (7.3776) grad_norm 2.6902 (2.6249) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:58:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][580/625] eta 0:00:18 lr 0.000758 wd 0.0500 time 0.6543 (0.4164) data time 0.0007 (0.0023) model time 0.6536 (0.4151) loss 6.8766 
(7.3760) grad_norm 2.2316 (2.6204) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:58:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][590/625] eta 0:00:14 lr 0.000758 wd 0.0500 time 0.4173 (0.4176) data time 0.0009 (0.0023) model time 0.4164 (0.4164) loss 7.7497 (7.3757) grad_norm 1.4812 (2.6117) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:58:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][600/625] eta 0:00:10 lr 0.000758 wd 0.0500 time 0.6340 (0.4191) data time 0.0010 (0.0022) model time 0.6330 (0.4181) loss 8.2102 (7.3810) grad_norm 2.0431 (2.6137) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:58:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][610/625] eta 0:00:06 lr 0.000758 wd 0.0500 time 0.3914 (0.4189) data time 0.0006 (0.0022) model time 0.3908 (0.4179) loss 6.0257 (7.3756) grad_norm 1.6013 (2.6035) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:58:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][620/625] eta 0:00:02 lr 0.000758 wd 0.0500 time 0.4129 (0.4188) data time 0.0006 (0.0022) model time 0.4123 (0.4178) loss 7.7575 (7.3742) grad_norm 2.6721 (2.5968) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:58:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 136 training takes 0:04:21 [2024-07-25 01:58:27 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 01:58:28 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 01:58:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.453 (0.453) Loss 0.5859 (0.5859) Acc@1 88.623 (88.623) Acc@5 98.291 (98.291) Mem 14939MB [2024-07-25 01:58:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.9722 (0.7460) Acc@1 77.393 (84.446) Acc@5 95.752 (97.297) Mem 14939MB [2024-07-25 01:58:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0879 (0.8852) Acc@1 74.023 (80.880) Acc@5 94.287 (95.785) Mem 14939MB [2024-07-25 01:58:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.566 Acc@5 95.737 [2024-07-25 01:58:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.6% [2024-07-25 01:58:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.789 (0.789) Loss 0.5723 (0.5723) Acc@1 89.014 (89.014) Acc@5 98.486 (98.486) Mem 14939MB [2024-07-25 01:58:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.153) Loss 0.9302 (0.7178) Acc@1 79.688 (85.334) Acc@5 95.508 (97.461) Mem 14939MB [2024-07-25 01:58:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.121) Loss 1.0645 (0.8478) Acc@1 75.488 (81.901) Acc@5 94.238 (96.103) Mem 14939MB [2024-07-25 01:58:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.540 Acc@5 96.077 [2024-07-25 01:58:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.5% [2024-07-25 01:58:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][0/625] eta 0:13:26 lr 0.000758 wd 0.0500 time 1.2899 (1.2899) data time 0.7148 (0.7148) model time 0.0000 (0.0000) loss 7.3766 (7.3766) grad_norm 2.7239 (2.7239) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:58:39 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][10/625] eta 0:04:56 lr 0.000757 wd 0.0500 time 0.3955 (0.4819) data time 0.0006 (0.0657) model time 0.0000 (0.0000) loss 5.9501 (7.6337) grad_norm 2.2204 (2.1677) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:58:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][20/625] eta 0:04:28 lr 0.000757 wd 0.0500 time 0.4000 (0.4435) data time 0.0009 (0.0348) model time 0.0000 (0.0000) loss 7.9067 (7.7116) grad_norm 1.7283 (2.1305) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:58:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][30/625] eta 0:04:15 lr 0.000757 wd 0.0500 time 0.4124 (0.4301) data time 0.0006 (0.0239) model time 0.0000 (0.0000) loss 7.2202 (7.5826) grad_norm 2.4214 (2.2200) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:58:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][40/625] eta 0:04:07 lr 0.000757 wd 0.0500 time 0.3939 (0.4228) data time 0.0007 (0.0183) model time 0.0000 (0.0000) loss 7.9446 (7.4698) grad_norm 2.2487 (2.2571) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:58:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][50/625] eta 0:04:00 lr 0.000757 wd 0.0500 time 0.3986 (0.4189) data time 0.0008 (0.0150) model time 0.0000 (0.0000) loss 5.7317 (7.3580) grad_norm 2.2845 (2.3030) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:58:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][60/625] eta 0:03:55 lr 0.000757 wd 0.0500 time 0.4072 (0.4159) data time 0.0006 (0.0127) model time 0.4065 (0.3999) loss 6.5991 (7.4147) grad_norm 2.5974 (2.3443) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:59:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][70/625] eta 0:03:49 lr 0.000757 wd 0.0500 time 0.3954 (0.4135) data time 
0.0007 (0.0110) model time 0.3947 (0.3988) loss 6.3194 (7.3963) grad_norm 1.8312 (2.3836) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:59:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][80/625] eta 0:03:44 lr 0.000757 wd 0.0500 time 0.3995 (0.4120) data time 0.0007 (0.0098) model time 0.3988 (0.3992) loss 7.2563 (7.3887) grad_norm 2.9287 (2.3809) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:59:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][90/625] eta 0:03:39 lr 0.000757 wd 0.0500 time 0.4058 (0.4110) data time 0.0006 (0.0088) model time 0.4052 (0.4000) loss 6.3411 (7.3796) grad_norm 2.7503 (2.4253) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:59:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][100/625] eta 0:03:35 lr 0.000757 wd 0.0500 time 0.3930 (0.4103) data time 0.0006 (0.0081) model time 0.3924 (0.4003) loss 6.9373 (7.3969) grad_norm 2.9895 (2.4093) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:59:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][110/625] eta 0:03:30 lr 0.000756 wd 0.0500 time 0.4000 (0.4095) data time 0.0007 (0.0075) model time 0.3993 (0.4004) loss 7.8763 (7.3990) grad_norm 1.5316 (2.3885) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:59:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][120/625] eta 0:03:26 lr 0.000756 wd 0.0500 time 0.4340 (0.4094) data time 0.0008 (0.0070) model time 0.4332 (0.4013) loss 8.0794 (7.3857) grad_norm 2.7536 (2.3598) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:59:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][130/625] eta 0:03:22 lr 0.000756 wd 0.0500 time 0.4017 (0.4090) data time 0.0008 (0.0065) model time 0.4009 (0.4016) loss 6.2877 (7.3450) grad_norm 1.6224 (2.3257) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 
01:59:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][140/625] eta 0:03:18 lr 0.000756 wd 0.0500 time 0.3803 (0.4093) data time 0.0007 (0.0062) model time 0.3796 (0.4028) loss 6.1564 (7.3043) grad_norm 2.2321 (2.3202) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:59:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][150/625] eta 0:03:14 lr 0.000756 wd 0.0500 time 0.4055 (0.4088) data time 0.0009 (0.0058) model time 0.4046 (0.4026) loss 9.2881 (7.3368) grad_norm 2.2759 (2.3075) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:59:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][160/625] eta 0:03:09 lr 0.000756 wd 0.0500 time 0.3947 (0.4084) data time 0.0009 (0.0055) model time 0.3938 (0.4024) loss 8.0080 (7.3496) grad_norm 2.1719 (2.2976) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 01:59:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][170/625] eta 0:03:06 lr 0.000756 wd 0.0500 time 0.3971 (0.4103) data time 0.0006 (0.0052) model time 0.3964 (0.4055) loss 6.8464 (7.3438) grad_norm 1.6120 (2.3086) loss_scale 2048.0000 (1047.9532) mem 14939MB [2024-07-25 01:59:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][180/625] eta 0:03:05 lr 0.000756 wd 0.0500 time 0.4033 (0.4157) data time 0.0006 (0.0050) model time 0.4027 (0.4134) loss 7.3250 (7.3345) grad_norm 2.1769 (2.3225) loss_scale 2048.0000 (1103.2044) mem 14939MB [2024-07-25 01:59:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][190/625] eta 0:03:03 lr 0.000756 wd 0.0500 time 0.3941 (0.4212) data time 0.0006 (0.0048) model time 0.3935 (0.4209) loss 6.2589 (7.3197) grad_norm 1.8087 (2.3335) loss_scale 2048.0000 (1152.6702) mem 14939MB [2024-07-25 01:59:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][200/625] eta 0:03:00 lr 0.000756 wd 0.0500 time 0.4303 
(0.4239) data time 0.0006 (0.0047) model time 0.4296 (0.4243) loss 6.6687 (7.3200) grad_norm 1.6715 (2.3126) loss_scale 2048.0000 (1197.2139) mem 14939MB [2024-07-25 02:00:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][210/625] eta 0:02:55 lr 0.000755 wd 0.0500 time 0.3948 (0.4231) data time 0.0008 (0.0046) model time 0.3940 (0.4232) loss 7.0311 (7.3249) grad_norm 3.1892 (2.3419) loss_scale 2048.0000 (1237.5355) mem 14939MB [2024-07-25 02:00:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][220/625] eta 0:02:50 lr 0.000755 wd 0.0500 time 0.3989 (0.4222) data time 0.0008 (0.0044) model time 0.3981 (0.4220) loss 6.9964 (7.3365) grad_norm 3.9071 (2.3965) loss_scale 2048.0000 (1274.2081) mem 14939MB [2024-07-25 02:00:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][230/625] eta 0:02:46 lr 0.000755 wd 0.0500 time 0.4142 (0.4214) data time 0.0007 (0.0043) model time 0.4135 (0.4209) loss 9.6823 (7.3561) grad_norm 2.5889 (2.4082) loss_scale 2048.0000 (1307.7056) mem 14939MB [2024-07-25 02:00:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][240/625] eta 0:02:41 lr 0.000755 wd 0.0500 time 0.3990 (0.4205) data time 0.0006 (0.0041) model time 0.3984 (0.4197) loss 7.4648 (7.3498) grad_norm 2.3729 (2.4101) loss_scale 2048.0000 (1338.4232) mem 14939MB [2024-07-25 02:00:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][250/625] eta 0:02:37 lr 0.000755 wd 0.0500 time 0.4007 (0.4198) data time 0.0007 (0.0040) model time 0.4000 (0.4189) loss 6.5383 (7.3463) grad_norm 1.5827 (2.4029) loss_scale 2048.0000 (1366.6932) mem 14939MB [2024-07-25 02:00:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][260/625] eta 0:02:32 lr 0.000755 wd 0.0500 time 0.4071 (0.4191) data time 0.0009 (0.0039) model time 0.4063 (0.4180) loss 7.0008 (7.3525) grad_norm 2.6677 (2.4024) loss_scale 2048.0000 (1392.7969) mem 
14939MB [2024-07-25 02:00:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][270/625] eta 0:02:28 lr 0.000755 wd 0.0500 time 0.3978 (0.4185) data time 0.0009 (0.0038) model time 0.3970 (0.4172) loss 6.9503 (7.3505) grad_norm 2.8481 (2.4119) loss_scale 2048.0000 (1416.9742) mem 14939MB [2024-07-25 02:00:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][280/625] eta 0:02:24 lr 0.000755 wd 0.0500 time 0.3964 (0.4178) data time 0.0008 (0.0037) model time 0.3956 (0.4164) loss 8.7678 (7.3542) grad_norm 1.5365 (2.4206) loss_scale 2048.0000 (1439.4306) mem 14939MB [2024-07-25 02:00:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][290/625] eta 0:02:19 lr 0.000755 wd 0.0500 time 0.4255 (0.4173) data time 0.0008 (0.0036) model time 0.4246 (0.4158) loss 7.6231 (7.3692) grad_norm 1.7227 (2.4126) loss_scale 2048.0000 (1460.3436) mem 14939MB [2024-07-25 02:00:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][300/625] eta 0:02:15 lr 0.000754 wd 0.0500 time 0.3955 (0.4167) data time 0.0006 (0.0035) model time 0.3949 (0.4151) loss 6.7663 (7.3602) grad_norm 2.0716 (2.4064) loss_scale 2048.0000 (1479.8671) mem 14939MB [2024-07-25 02:00:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][310/625] eta 0:02:11 lr 0.000754 wd 0.0500 time 0.4007 (0.4162) data time 0.0008 (0.0034) model time 0.3999 (0.4146) loss 7.7681 (7.3653) grad_norm 1.6707 (2.3834) loss_scale 2048.0000 (1498.1350) mem 14939MB [2024-07-25 02:00:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][320/625] eta 0:02:06 lr 0.000754 wd 0.0500 time 0.4548 (0.4159) data time 0.0006 (0.0033) model time 0.4542 (0.4142) loss 9.0389 (7.3471) grad_norm 3.0298 (2.3724) loss_scale 2048.0000 (1515.2648) mem 14939MB [2024-07-25 02:00:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][330/625] eta 0:02:02 lr 0.000754 wd 
0.0500 time 0.3938 (0.4154) data time 0.0009 (0.0033) model time 0.3929 (0.4137) loss 7.1988 (7.3468) grad_norm 2.2436 (inf) loss_scale 1024.0000 (1512.7976) mem 14939MB [2024-07-25 02:00:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][340/625] eta 0:01:58 lr 0.000754 wd 0.0500 time 0.4008 (0.4151) data time 0.0008 (0.0032) model time 0.4000 (0.4133) loss 8.3981 (7.3491) grad_norm 3.4140 (inf) loss_scale 1024.0000 (1498.4633) mem 14939MB [2024-07-25 02:00:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][350/625] eta 0:01:54 lr 0.000754 wd 0.0500 time 0.4066 (0.4148) data time 0.0008 (0.0031) model time 0.4058 (0.4130) loss 7.3068 (7.3537) grad_norm 1.7398 (inf) loss_scale 1024.0000 (1484.9459) mem 14939MB [2024-07-25 02:01:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][360/625] eta 0:01:50 lr 0.000754 wd 0.0500 time 0.5997 (0.4153) data time 0.0007 (0.0031) model time 0.5990 (0.4136) loss 6.1310 (7.3585) grad_norm 4.0496 (inf) loss_scale 1024.0000 (1472.1773) mem 14939MB [2024-07-25 02:01:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][370/625] eta 0:01:45 lr 0.000754 wd 0.0500 time 0.4181 (0.4150) data time 0.0006 (0.0030) model time 0.4175 (0.4132) loss 7.8593 (7.3649) grad_norm 2.1042 (inf) loss_scale 1024.0000 (1460.0970) mem 14939MB [2024-07-25 02:01:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][380/625] eta 0:01:41 lr 0.000754 wd 0.0500 time 0.3929 (0.4147) data time 0.0011 (0.0030) model time 0.3918 (0.4129) loss 7.8870 (7.3676) grad_norm 2.1332 (inf) loss_scale 1024.0000 (1448.6509) mem 14939MB [2024-07-25 02:01:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][390/625] eta 0:01:37 lr 0.000754 wd 0.0500 time 0.4071 (0.4149) data time 0.0008 (0.0029) model time 0.4063 (0.4132) loss 7.1141 (7.3598) grad_norm 2.1440 (inf) loss_scale 1024.0000 (1437.7903) mem 
14939MB [2024-07-25 02:01:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][400/625] eta 0:01:33 lr 0.000753 wd 0.0500 time 0.3884 (0.4173) data time 0.0007 (0.0029) model time 0.3876 (0.4160) loss 6.7263 (7.3642) grad_norm 1.7575 (inf) loss_scale 1024.0000 (1427.4713) mem 14939MB [2024-07-25 02:01:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][410/625] eta 0:01:30 lr 0.000753 wd 0.0500 time 0.5570 (0.4196) data time 0.0009 (0.0028) model time 0.5562 (0.4186) loss 7.3695 (7.3647) grad_norm 2.8465 (inf) loss_scale 1024.0000 (1417.6545) mem 14939MB [2024-07-25 02:01:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][420/625] eta 0:01:26 lr 0.000753 wd 0.0500 time 0.3949 (0.4208) data time 0.0006 (0.0028) model time 0.3943 (0.4200) loss 6.2778 (7.3591) grad_norm 1.6115 (inf) loss_scale 1024.0000 (1408.3040) mem 14939MB [2024-07-25 02:01:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][430/625] eta 0:01:21 lr 0.000753 wd 0.0500 time 0.4040 (0.4204) data time 0.0006 (0.0027) model time 0.4034 (0.4195) loss 6.3413 (7.3634) grad_norm 5.1217 (inf) loss_scale 1024.0000 (1399.3875) mem 14939MB [2024-07-25 02:01:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][440/625] eta 0:01:17 lr 0.000753 wd 0.0500 time 0.4192 (0.4200) data time 0.0009 (0.0027) model time 0.4183 (0.4191) loss 8.3821 (7.3554) grad_norm 2.0671 (inf) loss_scale 1024.0000 (1390.8753) mem 14939MB [2024-07-25 02:01:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][450/625] eta 0:01:13 lr 0.000753 wd 0.0500 time 0.3990 (0.4197) data time 0.0008 (0.0026) model time 0.3982 (0.4187) loss 7.5505 (7.3613) grad_norm 1.6411 (inf) loss_scale 1024.0000 (1382.7406) mem 14939MB [2024-07-25 02:01:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][460/625] eta 0:01:09 lr 0.000753 wd 0.0500 time 0.4085 
(0.4193) data time 0.0009 (0.0026) model time 0.4077 (0.4183) loss 5.9085 (7.3610) grad_norm 3.5938 (inf) loss_scale 1024.0000 (1374.9588) mem 14939MB [2024-07-25 02:01:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][470/625] eta 0:01:04 lr 0.000753 wd 0.0500 time 0.4161 (0.4190) data time 0.0007 (0.0026) model time 0.4154 (0.4179) loss 7.1002 (7.3569) grad_norm 2.9477 (inf) loss_scale 1024.0000 (1367.5074) mem 14939MB [2024-07-25 02:01:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][480/625] eta 0:01:00 lr 0.000753 wd 0.0500 time 0.3930 (0.4186) data time 0.0008 (0.0026) model time 0.3923 (0.4175) loss 7.3841 (7.3480) grad_norm 2.0869 (inf) loss_scale 1024.0000 (1360.3659) mem 14939MB [2024-07-25 02:01:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][490/625] eta 0:00:56 lr 0.000753 wd 0.0500 time 0.3950 (0.4183) data time 0.0006 (0.0026) model time 0.3944 (0.4171) loss 7.3591 (7.3484) grad_norm 2.0911 (inf) loss_scale 1024.0000 (1353.5153) mem 14939MB [2024-07-25 02:02:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][500/625] eta 0:00:52 lr 0.000752 wd 0.0500 time 0.4043 (0.4180) data time 0.0007 (0.0025) model time 0.4036 (0.4168) loss 7.9577 (7.3516) grad_norm 1.8783 (inf) loss_scale 1024.0000 (1346.9381) mem 14939MB [2024-07-25 02:02:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][510/625] eta 0:00:48 lr 0.000752 wd 0.0500 time 0.3944 (0.4178) data time 0.0007 (0.0025) model time 0.3938 (0.4166) loss 6.7207 (7.3521) grad_norm 2.1538 (inf) loss_scale 1024.0000 (1340.6184) mem 14939MB [2024-07-25 02:02:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][520/625] eta 0:00:43 lr 0.000752 wd 0.0500 time 0.3996 (0.4176) data time 0.0007 (0.0025) model time 0.3990 (0.4164) loss 7.9053 (7.3491) grad_norm 2.9860 (inf) loss_scale 1024.0000 (1334.5413) mem 14939MB [2024-07-25 
02:02:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][530/625] eta 0:00:39 lr 0.000752 wd 0.0500 time 0.4178 (0.4174) data time 0.0006 (0.0024) model time 0.4171 (0.4161) loss 6.4410 (7.3533) grad_norm 4.3657 (inf) loss_scale 1024.0000 (1328.6930) mem 14939MB [2024-07-25 02:02:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][540/625] eta 0:00:35 lr 0.000752 wd 0.0500 time 0.3969 (0.4171) data time 0.0008 (0.0024) model time 0.3962 (0.4158) loss 5.7280 (7.3543) grad_norm 2.3794 (inf) loss_scale 1024.0000 (1323.0610) mem 14939MB [2024-07-25 02:02:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][550/625] eta 0:00:31 lr 0.000752 wd 0.0500 time 0.4041 (0.4169) data time 0.0008 (0.0024) model time 0.4033 (0.4155) loss 7.3223 (7.3557) grad_norm 1.9293 (inf) loss_scale 1024.0000 (1317.6334) mem 14939MB [2024-07-25 02:02:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][560/625] eta 0:00:27 lr 0.000752 wd 0.0500 time 0.4104 (0.4167) data time 0.0007 (0.0024) model time 0.4098 (0.4153) loss 7.4524 (7.3499) grad_norm 3.6448 (inf) loss_scale 1024.0000 (1312.3993) mem 14939MB [2024-07-25 02:02:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][570/625] eta 0:00:22 lr 0.000752 wd 0.0500 time 0.3956 (0.4164) data time 0.0006 (0.0024) model time 0.3950 (0.4150) loss 6.7219 (7.3424) grad_norm 1.9954 (inf) loss_scale 1024.0000 (1307.3485) mem 14939MB [2024-07-25 02:02:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][580/625] eta 0:00:18 lr 0.000752 wd 0.0500 time 0.3966 (0.4164) data time 0.0006 (0.0023) model time 0.3960 (0.4150) loss 7.4387 (7.3429) grad_norm 1.9198 (inf) loss_scale 1024.0000 (1302.4716) mem 14939MB [2024-07-25 02:02:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][590/625] eta 0:00:14 lr 0.000752 wd 0.0500 time 0.4026 (0.4162) data time 
0.0009 (0.0023) model time 0.4017 (0.4148) loss 7.2703 (7.3409) grad_norm 2.4328 (inf) loss_scale 1024.0000 (1297.7597) mem 14939MB [2024-07-25 02:02:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][600/625] eta 0:00:10 lr 0.000751 wd 0.0500 time 0.3930 (0.4159) data time 0.0007 (0.0023) model time 0.3923 (0.4145) loss 6.9478 (7.3373) grad_norm 2.1617 (inf) loss_scale 1024.0000 (1293.2047) mem 14939MB [2024-07-25 02:02:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][610/625] eta 0:00:06 lr 0.000751 wd 0.0500 time 0.3966 (0.4163) data time 0.0006 (0.0023) model time 0.3960 (0.4150) loss 6.4768 (7.3325) grad_norm 1.9588 (inf) loss_scale 1024.0000 (1288.7987) mem 14939MB [2024-07-25 02:02:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][620/625] eta 0:00:02 lr 0.000751 wd 0.0500 time 0.5566 (0.4175) data time 0.0004 (0.0023) model time 0.5562 (0.4163) loss 7.1621 (7.3320) grad_norm 1.9736 (inf) loss_scale 1024.0000 (1284.5346) mem 14939MB [2024-07-25 02:02:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 137 training takes 0:04:20 [2024-07-25 02:02:55 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 02:02:55 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 02:02:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.446 (0.446) Loss 0.6147 (0.6147) Acc@1 87.939 (87.939) Acc@5 98.340 (98.340) Mem 14939MB [2024-07-25 02:02:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.9775 (0.7406) Acc@1 78.516 (84.721) Acc@5 95.020 (97.323) Mem 14939MB [2024-07-25 02:02:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.0771 (0.8760) Acc@1 75.049 (81.248) Acc@5 93.750 (95.852) Mem 14939MB [2024-07-25 02:02:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.906 Acc@5 95.799 [2024-07-25 02:02:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.9% [2024-07-25 02:02:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.769 (0.769) Loss 0.5718 (0.5718) Acc@1 88.965 (88.965) Acc@5 98.486 (98.486) Mem 14939MB [2024-07-25 02:03:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.153) Loss 0.9297 (0.7172) Acc@1 79.736 (85.387) Acc@5 95.654 (97.474) Mem 14939MB [2024-07-25 02:03:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.121) Loss 1.0645 (0.8469) Acc@1 75.439 (81.936) Acc@5 94.238 (96.105) Mem 14939MB [2024-07-25 02:03:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.568 Acc@5 96.071 [2024-07-25 02:03:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.6% [2024-07-25 02:03:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.57% [2024-07-25 02:03:01 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 02:03:02 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 02:03:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][0/625] eta 0:08:07 lr 0.000751 wd 0.0500 time 0.7795 (0.7795) data time 0.3795 (0.3795) model time 0.0000 (0.0000) loss 8.3102 (8.3102) grad_norm 1.8756 (1.8756) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:03:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][10/625] eta 0:05:18 lr 0.000751 wd 0.0500 time 0.6023 (0.5171) data time 0.0008 (0.0353) model time 0.0000 (0.0000) loss 7.0842 (7.2573) grad_norm 2.5906 (2.3311) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:03:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][20/625] eta 0:04:39 lr 0.000751 wd 0.0500 time 0.3939 (0.4613) data time 0.0007 (0.0190) model time 0.0000 (0.0000) loss 6.8815 (7.2713) grad_norm 2.2696 (2.3180) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:03:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][30/625] eta 0:04:22 lr 0.000751 wd 0.0500 time 0.3991 (0.4416) data time 0.0007 (0.0132) model time 0.0000 (0.0000) loss 6.8022 (7.3149) grad_norm 2.7341 (2.2547) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:03:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][40/625] eta 0:04:12 lr 0.000751 wd 0.0500 time 0.4095 (0.4317) data time 0.0009 (0.0102) model time 0.0000 (0.0000) loss 6.1460 (7.3121) grad_norm 2.4741 (2.3358) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:03:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][50/625] eta 0:04:04 lr 0.000751 wd 0.0500 time 0.3993 (0.4257) data time 0.0009 (0.0084) model time 0.0000 (0.0000) loss 6.7842 (7.3247) grad_norm 3.0119 (2.3821) loss_scale 1024.0000 
(1024.0000) mem 14939MB [2024-07-25 02:03:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][60/625] eta 0:03:58 lr 0.000751 wd 0.0500 time 0.3929 (0.4218) data time 0.0009 (0.0072) model time 0.3920 (0.4003) loss 7.4010 (7.3010) grad_norm 1.5973 (2.4073) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:03:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][70/625] eta 0:03:52 lr 0.000750 wd 0.0500 time 0.4065 (0.4189) data time 0.0007 (0.0063) model time 0.4058 (0.4005) loss 6.2049 (7.3194) grad_norm 2.8512 (2.4242) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:03:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][80/625] eta 0:03:47 lr 0.000750 wd 0.0500 time 0.3924 (0.4169) data time 0.0007 (0.0057) model time 0.3917 (0.4008) loss 7.6758 (7.3583) grad_norm 2.2699 (2.4566) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:03:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][90/625] eta 0:03:42 lr 0.000750 wd 0.0500 time 0.4012 (0.4150) data time 0.0007 (0.0051) model time 0.4005 (0.4004) loss 7.0271 (7.3121) grad_norm 1.8514 (2.4851) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:03:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][100/625] eta 0:03:37 lr 0.000750 wd 0.0500 time 0.4079 (0.4138) data time 0.0007 (0.0047) model time 0.4072 (0.4006) loss 7.4371 (7.3346) grad_norm 1.6961 (2.4726) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:03:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][110/625] eta 0:03:33 lr 0.000750 wd 0.0500 time 0.4001 (0.4144) data time 0.0008 (0.0044) model time 0.3993 (0.4038) loss 8.5202 (7.3823) grad_norm 2.0581 (2.4505) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:03:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][120/625] eta 0:03:28 lr 
0.000750 wd 0.0500 time 0.4000 (0.4134) data time 0.0007 (0.0041) model time 0.3993 (0.4035) loss 7.1794 (7.3615) grad_norm 1.5876 (2.4231) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:03:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][130/625] eta 0:03:24 lr 0.000750 wd 0.0500 time 0.4078 (0.4127) data time 0.0007 (0.0039) model time 0.4070 (0.4034) loss 7.0837 (7.3668) grad_norm 2.3833 (2.5732) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:04:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][140/625] eta 0:03:19 lr 0.000750 wd 0.0500 time 0.3971 (0.4120) data time 0.0007 (0.0037) model time 0.3964 (0.4032) loss 5.9060 (7.3521) grad_norm 2.1389 (2.6260) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:04:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][150/625] eta 0:03:15 lr 0.000750 wd 0.0500 time 0.3967 (0.4113) data time 0.0009 (0.0035) model time 0.3958 (0.4030) loss 7.3875 (7.3786) grad_norm 3.6589 (2.6213) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:04:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][160/625] eta 0:03:11 lr 0.000749 wd 0.0500 time 0.4093 (0.4109) data time 0.0007 (0.0033) model time 0.4086 (0.4030) loss 6.5837 (7.3632) grad_norm 3.0994 (2.6501) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:04:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][170/625] eta 0:03:06 lr 0.000749 wd 0.0500 time 0.3969 (0.4103) data time 0.0006 (0.0032) model time 0.3963 (0.4028) loss 7.8910 (7.3713) grad_norm 1.8256 (2.6364) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:04:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][180/625] eta 0:03:02 lr 0.000749 wd 0.0500 time 0.4006 (0.4099) data time 0.0014 (0.0031) model time 0.3992 (0.4027) loss 7.8097 (7.3694) grad_norm 2.0954 (2.6503) 
loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:04:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][190/625] eta 0:02:58 lr 0.000749 wd 0.0500 time 0.4023 (0.4096) data time 0.0007 (0.0030) model time 0.4016 (0.4027) loss 6.4185 (7.3712) grad_norm 1.8278 (2.6351) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:04:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][200/625] eta 0:02:54 lr 0.000749 wd 0.0500 time 0.3921 (0.4099) data time 0.0009 (0.0028) model time 0.3912 (0.4035) loss 7.2448 (7.3692) grad_norm 1.9352 (2.6163) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:04:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][210/625] eta 0:02:51 lr 0.000749 wd 0.0500 time 0.5941 (0.4123) data time 0.0007 (0.0028) model time 0.5934 (0.4071) loss 7.7059 (7.3609) grad_norm 2.1815 (2.5888) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:04:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][220/625] eta 0:02:48 lr 0.000749 wd 0.0500 time 0.5605 (0.4151) data time 0.0006 (0.0027) model time 0.5598 (0.4109) loss 6.8764 (7.3602) grad_norm 2.7450 (2.5971) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:04:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][230/625] eta 0:02:45 lr 0.000749 wd 0.0500 time 0.4539 (0.4199) data time 0.0007 (0.0026) model time 0.4532 (0.4173) loss 8.7096 (7.3700) grad_norm 2.6737 (2.6127) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:04:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][240/625] eta 0:02:41 lr 0.000749 wd 0.0500 time 0.3935 (0.4201) data time 0.0009 (0.0026) model time 0.3926 (0.4176) loss 8.7092 (7.3717) grad_norm 3.4131 (2.6461) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:04:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[138/300][250/625] eta 0:02:37 lr 0.000749 wd 0.0500 time 0.3980 (0.4194) data time 0.0009 (0.0025) model time 0.3971 (0.4168) loss 7.9433 (7.3631) grad_norm 2.4924 (2.6293) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:04:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][260/625] eta 0:02:32 lr 0.000748 wd 0.0500 time 0.4095 (0.4189) data time 0.0009 (0.0025) model time 0.4087 (0.4162) loss 7.6837 (7.3660) grad_norm 5.5730 (2.6372) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:04:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][270/625] eta 0:02:28 lr 0.000748 wd 0.0500 time 0.3936 (0.4183) data time 0.0007 (0.0024) model time 0.3929 (0.4155) loss 6.7477 (7.3654) grad_norm 2.5477 (2.6560) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:04:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][280/625] eta 0:02:24 lr 0.000748 wd 0.0500 time 0.3988 (0.4178) data time 0.0007 (0.0024) model time 0.3981 (0.4150) loss 7.0729 (7.3721) grad_norm 2.1168 (2.6611) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:05:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][290/625] eta 0:02:19 lr 0.000748 wd 0.0500 time 0.4066 (0.4173) data time 0.0008 (0.0023) model time 0.4058 (0.4144) loss 8.2099 (7.3761) grad_norm 2.5015 (2.6614) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:05:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][300/625] eta 0:02:15 lr 0.000748 wd 0.0500 time 0.3947 (0.4168) data time 0.0007 (0.0023) model time 0.3940 (0.4139) loss 8.0214 (7.3789) grad_norm 1.9308 (2.6724) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:05:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][310/625] eta 0:02:11 lr 0.000748 wd 0.0500 time 0.3959 (0.4163) data time 0.0008 (0.0022) model time 0.3951 (0.4135) loss 8.3962 
(7.3799) grad_norm 2.3918 (2.6792) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:05:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][320/625] eta 0:02:06 lr 0.000748 wd 0.0500 time 0.4281 (0.4160) data time 0.0007 (0.0022) model time 0.4275 (0.4131) loss 6.9212 (7.3786) grad_norm 3.0460 (2.6860) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:05:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][330/625] eta 0:02:02 lr 0.000748 wd 0.0500 time 0.4040 (0.4162) data time 0.0006 (0.0022) model time 0.4034 (0.4134) loss 7.2063 (7.3653) grad_norm 1.8607 (2.6786) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:05:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][340/625] eta 0:01:58 lr 0.000748 wd 0.0500 time 0.4084 (0.4158) data time 0.0006 (0.0021) model time 0.4078 (0.4130) loss 7.4747 (7.3681) grad_norm 2.0900 (2.6796) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:05:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][350/625] eta 0:01:54 lr 0.000748 wd 0.0500 time 0.4254 (0.4156) data time 0.0007 (0.0021) model time 0.4248 (0.4129) loss 6.4170 (7.3682) grad_norm 1.9044 (2.6883) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:05:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][360/625] eta 0:01:50 lr 0.000747 wd 0.0500 time 0.3966 (0.4154) data time 0.0007 (0.0021) model time 0.3960 (0.4127) loss 6.0972 (7.3673) grad_norm 3.7012 (2.6908) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:05:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][370/625] eta 0:01:45 lr 0.000747 wd 0.0500 time 0.4043 (0.4152) data time 0.0007 (0.0020) model time 0.4036 (0.4125) loss 5.7644 (7.3693) grad_norm 2.1006 (2.6881) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:05:40 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [138/300][380/625] eta 0:01:41 lr 0.000747 wd 0.0500 time 0.4136 (0.4149) data time 0.0008 (0.0020) model time 0.4128 (0.4122) loss 6.5654 (7.3716) grad_norm 3.2110 (2.6839) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:05:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][390/625] eta 0:01:37 lr 0.000747 wd 0.0500 time 0.3966 (0.4146) data time 0.0008 (0.0020) model time 0.3958 (0.4119) loss 6.6733 (7.3615) grad_norm 1.7252 (2.6811) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:05:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][400/625] eta 0:01:33 lr 0.000747 wd 0.0500 time 0.3982 (0.4143) data time 0.0006 (0.0020) model time 0.3976 (0.4116) loss 7.1895 (7.3713) grad_norm 2.5831 (2.6884) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:05:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][410/625] eta 0:01:29 lr 0.000747 wd 0.0500 time 0.4072 (0.4140) data time 0.0008 (0.0019) model time 0.4064 (0.4114) loss 8.6540 (7.3624) grad_norm 3.8595 (2.7151) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:05:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][420/625] eta 0:01:25 lr 0.000747 wd 0.0500 time 0.3944 (0.4148) data time 0.0006 (0.0019) model time 0.3938 (0.4122) loss 7.2919 (7.3628) grad_norm 2.8784 (2.7158) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:06:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][430/625] eta 0:01:21 lr 0.000747 wd 0.0500 time 0.6081 (0.4166) data time 0.0006 (0.0019) model time 0.6075 (0.4143) loss 5.8912 (7.3645) grad_norm 2.0438 (2.6992) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:06:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][440/625] eta 0:01:17 lr 0.000747 wd 0.0500 time 0.5804 (0.4184) data time 0.0009 (0.0019) model 
time 0.5795 (0.4164) loss 8.2753 (7.3741) grad_norm 2.2982 (2.7105) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:06:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][450/625] eta 0:01:13 lr 0.000746 wd 0.0500 time 0.5818 (0.4193) data time 0.0007 (0.0019) model time 0.5811 (0.4174) loss 8.7682 (7.3684) grad_norm 2.6863 (2.7207) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:06:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][460/625] eta 0:01:09 lr 0.000746 wd 0.0500 time 0.4094 (0.4197) data time 0.0007 (0.0019) model time 0.4087 (0.4179) loss 7.7261 (7.3720) grad_norm 3.1972 (2.7203) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:06:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][470/625] eta 0:01:04 lr 0.000746 wd 0.0500 time 0.4090 (0.4193) data time 0.0006 (0.0019) model time 0.4083 (0.4175) loss 6.4159 (7.3634) grad_norm 1.7076 (2.7182) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:06:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][480/625] eta 0:01:00 lr 0.000746 wd 0.0500 time 0.3941 (0.4190) data time 0.0008 (0.0018) model time 0.3933 (0.4172) loss 6.8602 (7.3613) grad_norm 2.0056 (2.7100) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:06:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][490/625] eta 0:00:56 lr 0.000746 wd 0.0500 time 0.4145 (0.4187) data time 0.0007 (0.0018) model time 0.4138 (0.4168) loss 8.3257 (7.3634) grad_norm 1.8773 (2.7079) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:06:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][500/625] eta 0:00:52 lr 0.000746 wd 0.0500 time 0.4214 (0.4187) data time 0.0008 (0.0018) model time 0.4206 (0.4168) loss 6.9683 (7.3523) grad_norm 2.1582 (2.7007) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:06:36 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][510/625] eta 0:00:48 lr 0.000746 wd 0.0500 time 0.3939 (0.4187) data time 0.0008 (0.0018) model time 0.3931 (0.4168) loss 7.7028 (7.3478) grad_norm 2.0652 (2.6956) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:06:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][520/625] eta 0:00:43 lr 0.000746 wd 0.0500 time 0.4006 (0.4185) data time 0.0007 (0.0018) model time 0.3999 (0.4166) loss 8.9334 (7.3519) grad_norm 2.1071 (2.6961) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:06:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][530/625] eta 0:00:39 lr 0.000746 wd 0.0500 time 0.4185 (0.4183) data time 0.0007 (0.0018) model time 0.4179 (0.4164) loss 8.9559 (7.3568) grad_norm 2.0181 (2.7086) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:06:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][540/625] eta 0:00:35 lr 0.000746 wd 0.0500 time 0.3945 (0.4180) data time 0.0008 (0.0018) model time 0.3936 (0.4161) loss 7.0169 (7.3561) grad_norm 3.2173 (2.7047) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:06:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][550/625] eta 0:00:31 lr 0.000745 wd 0.0500 time 0.4015 (0.4180) data time 0.0009 (0.0018) model time 0.4006 (0.4161) loss 7.9563 (7.3552) grad_norm 2.4087 (2.6962) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:06:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][560/625] eta 0:00:27 lr 0.000745 wd 0.0500 time 0.4023 (0.4177) data time 0.0009 (0.0018) model time 0.4014 (0.4158) loss 7.0922 (7.3478) grad_norm 3.3502 (2.6916) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:07:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][570/625] eta 0:00:22 lr 0.000745 wd 0.0500 time 0.3946 (0.4175) 
data time 0.0008 (0.0018) model time 0.3938 (0.4156) loss 7.0782 (7.3466) grad_norm 2.2397 (2.6854) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:07:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][580/625] eta 0:00:18 lr 0.000745 wd 0.0500 time 0.4111 (0.4175) data time 0.0006 (0.0018) model time 0.4105 (0.4156) loss 7.0240 (7.3402) grad_norm 2.9544 (2.6853) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:07:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][590/625] eta 0:00:14 lr 0.000745 wd 0.0500 time 0.4234 (0.4176) data time 0.0006 (0.0018) model time 0.4228 (0.4156) loss 7.8614 (7.3393) grad_norm 2.1939 (2.6805) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:07:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][600/625] eta 0:00:10 lr 0.000745 wd 0.0500 time 0.3935 (0.4174) data time 0.0007 (0.0018) model time 0.3927 (0.4154) loss 8.3940 (7.3460) grad_norm 2.2784 (2.6733) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:07:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][610/625] eta 0:00:06 lr 0.000745 wd 0.0500 time 0.4032 (0.4172) data time 0.0004 (0.0018) model time 0.4028 (0.4152) loss 6.3256 (7.3439) grad_norm 4.5079 (2.6714) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:07:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][620/625] eta 0:00:02 lr 0.000745 wd 0.0500 time 0.4144 (0.4170) data time 0.0006 (0.0018) model time 0.4138 (0.4151) loss 8.5302 (7.3376) grad_norm 2.0761 (2.6721) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:07:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 138 training takes 0:04:20 [2024-07-25 02:07:22 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 02:07:23 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 02:07:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.624 (0.624) Loss 0.5981 (0.5981) Acc@1 88.037 (88.037) Acc@5 98.340 (98.340) Mem 14939MB [2024-07-25 02:07:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.135) Loss 0.9639 (0.7400) Acc@1 78.613 (84.593) Acc@5 95.459 (97.230) Mem 14939MB [2024-07-25 02:07:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.112) Loss 1.0713 (0.8701) Acc@1 73.535 (81.071) Acc@5 93.799 (95.791) Mem 14939MB [2024-07-25 02:07:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.694 Acc@5 95.745 [2024-07-25 02:07:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.7% [2024-07-25 02:07:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.868 (0.868) Loss 0.5703 (0.5703) Acc@1 89.014 (89.014) Acc@5 98.535 (98.535) Mem 14939MB [2024-07-25 02:07:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.159) Loss 0.9277 (0.7160) Acc@1 79.736 (85.400) Acc@5 95.703 (97.474) Mem 14939MB [2024-07-25 02:07:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.124) Loss 1.0635 (0.8457) Acc@1 75.342 (81.959) Acc@5 94.336 (96.124) Mem 14939MB [2024-07-25 02:07:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.598 Acc@5 96.097 [2024-07-25 02:07:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.6% [2024-07-25 02:07:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.60% [2024-07-25 02:07:29 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 02:07:30 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 02:07:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][0/625] eta 0:08:30 lr 0.000745 wd 0.0500 time 0.8167 (0.8167) data time 0.4235 (0.4235) model time 0.0000 (0.0000) loss 8.0400 (8.0400) grad_norm 2.2537 (2.2537) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:07:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][10/625] eta 0:04:30 lr 0.000745 wd 0.0500 time 0.3975 (0.4397) data time 0.0006 (0.0393) model time 0.0000 (0.0000) loss 6.9031 (7.2453) grad_norm 1.9370 (2.7729) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:07:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][20/625] eta 0:04:31 lr 0.000744 wd 0.0500 time 0.4067 (0.4494) data time 0.0008 (0.0210) model time 0.0000 (0.0000) loss 7.3129 (7.1949) grad_norm 1.5724 (2.8438) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:07:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][30/625] eta 0:04:35 lr 0.000744 wd 0.0500 time 0.6176 (0.4630) data time 0.0006 (0.0145) model time 0.0000 (0.0000) loss 7.2518 (7.1147) grad_norm 4.3164 (2.9849) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:07:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][40/625] eta 0:04:34 lr 0.000744 wd 0.0500 time 0.5580 (0.4694) data time 0.0008 (0.0113) model time 0.0000 (0.0000) loss 8.3243 (7.1387) grad_norm 2.3832 (2.9765) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:07:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][50/625] eta 0:04:28 lr 0.000744 wd 0.0500 time 0.6352 (0.4666) data time 0.0009 (0.0093) 
model time 0.0000 (0.0000) loss 6.9571 (7.2345) grad_norm 3.0866 (2.9397) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:07:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][60/625] eta 0:04:19 lr 0.000744 wd 0.0500 time 0.3937 (0.4589) data time 0.0009 (0.0079) model time 0.3929 (0.4183) loss 8.1539 (7.2991) grad_norm 1.6174 (2.8834) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:08:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][70/625] eta 0:04:10 lr 0.000744 wd 0.0500 time 0.4039 (0.4514) data time 0.0007 (0.0070) model time 0.4032 (0.4113) loss 7.1871 (7.3214) grad_norm 2.2368 (2.9362) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:08:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][80/625] eta 0:04:04 lr 0.000744 wd 0.0500 time 0.3945 (0.4478) data time 0.0007 (0.0062) model time 0.3938 (0.4147) loss 6.9602 (7.2705) grad_norm 2.8522 (2.9499) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:08:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][90/625] eta 0:03:56 lr 0.000744 wd 0.0500 time 0.4017 (0.4427) data time 0.0009 (0.0056) model time 0.4008 (0.4111) loss 6.7016 (7.2370) grad_norm 2.3059 (2.9110) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:08:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][100/625] eta 0:03:50 lr 0.000744 wd 0.0500 time 0.4068 (0.4393) data time 0.0008 (0.0052) model time 0.4060 (0.4104) loss 5.9308 (7.2338) grad_norm 2.7435 (2.9208) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:08:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][110/625] eta 0:03:44 lr 0.000744 wd 0.0500 time 0.3975 (0.4359) data time 0.0009 (0.0048) model time 0.3966 (0.4087) loss 6.1675 (7.2664) grad_norm 1.9265 (2.9165) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:08:22 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][120/625] eta 0:03:38 lr 0.000743 wd 0.0500 time 0.4043 (0.4333) data time 0.0009 (0.0045) model time 0.4034 (0.4079) loss 7.4771 (7.2654) grad_norm 1.4172 (2.8615) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:08:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][130/625] eta 0:03:33 lr 0.000743 wd 0.0500 time 0.4127 (0.4310) data time 0.0009 (0.0042) model time 0.4118 (0.4073) loss 7.2837 (7.2889) grad_norm 2.6447 (2.8655) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:08:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][140/625] eta 0:03:28 lr 0.000743 wd 0.0500 time 0.3935 (0.4290) data time 0.0009 (0.0040) model time 0.3926 (0.4066) loss 7.1316 (7.2574) grad_norm 1.9642 (2.8089) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:08:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][150/625] eta 0:03:22 lr 0.000743 wd 0.0500 time 0.3999 (0.4272) data time 0.0008 (0.0038) model time 0.3991 (0.4060) loss 7.7946 (7.2950) grad_norm 2.5821 (2.7702) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:08:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][160/625] eta 0:03:17 lr 0.000743 wd 0.0500 time 0.4254 (0.4258) data time 0.0008 (0.0036) model time 0.4246 (0.4058) loss 8.0011 (7.3059) grad_norm 1.6783 (2.7265) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:08:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][170/625] eta 0:03:13 lr 0.000743 wd 0.0500 time 0.3926 (0.4249) data time 0.0007 (0.0036) model time 0.3920 (0.4061) loss 7.5191 (7.3081) grad_norm 1.7498 (2.6942) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:08:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][180/625] eta 0:03:08 lr 0.000743 wd 0.0500 time 0.4174 (0.4244) 
data time 0.0007 (0.0035) model time 0.4167 (0.4066) loss 7.7899 (7.3044) grad_norm 2.2298 (2.6772) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:08:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][190/625] eta 0:03:04 lr 0.000743 wd 0.0500 time 0.4046 (0.4234) data time 0.0009 (0.0034) model time 0.4037 (0.4064) loss 7.4095 (7.3359) grad_norm 1.9720 (2.6561) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:08:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][200/625] eta 0:02:59 lr 0.000743 wd 0.0500 time 0.3977 (0.4224) data time 0.0006 (0.0033) model time 0.3971 (0.4061) loss 6.6579 (7.3249) grad_norm 2.3029 (2.6324) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:08:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][210/625] eta 0:02:54 lr 0.000742 wd 0.0500 time 0.4031 (0.4214) data time 0.0007 (0.0032) model time 0.4025 (0.4058) loss 7.5730 (7.3252) grad_norm 2.0007 (2.6305) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:09:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][220/625] eta 0:02:50 lr 0.000742 wd 0.0500 time 0.4184 (0.4208) data time 0.0009 (0.0031) model time 0.4175 (0.4058) loss 6.7362 (7.3140) grad_norm 2.1940 (2.6261) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:09:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][230/625] eta 0:02:45 lr 0.000742 wd 0.0500 time 0.3961 (0.4200) data time 0.0009 (0.0030) model time 0.3952 (0.4056) loss 7.5328 (7.3117) grad_norm 2.9384 (2.6515) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:09:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][240/625] eta 0:02:41 lr 0.000742 wd 0.0500 time 0.3961 (0.4207) data time 0.0007 (0.0029) model time 0.3954 (0.4072) loss 6.4477 (7.2982) grad_norm 2.3714 (2.6574) loss_scale 1024.0000 (1024.0000) mem 14939MB 
[2024-07-25 02:09:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][250/625] eta 0:02:39 lr 0.000742 wd 0.0500 time 0.5512 (0.4243) data time 0.0008 (0.0028) model time 0.5504 (0.4123) loss 7.4106 (7.2993) grad_norm 1.8621 (2.6377) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:09:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][260/625] eta 0:02:35 lr 0.000742 wd 0.0500 time 0.3954 (0.4263) data time 0.0006 (0.0028) model time 0.3948 (0.4154) loss 6.2611 (7.2904) grad_norm 2.0180 (2.6123) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:09:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][270/625] eta 0:02:31 lr 0.000742 wd 0.0500 time 0.6076 (0.4282) data time 0.0006 (0.0027) model time 0.6069 (0.4181) loss 6.8336 (7.2917) grad_norm 2.2875 (2.6001) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:09:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][280/625] eta 0:02:27 lr 0.000742 wd 0.0500 time 0.3992 (0.4272) data time 0.0007 (0.0026) model time 0.3986 (0.4172) loss 8.3371 (7.2978) grad_norm 3.0736 (2.5942) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:09:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][290/625] eta 0:02:22 lr 0.000742 wd 0.0500 time 0.3968 (0.4263) data time 0.0007 (0.0026) model time 0.3961 (0.4165) loss 7.0580 (7.2826) grad_norm 3.2666 (2.5904) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:09:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][300/625] eta 0:02:18 lr 0.000742 wd 0.0500 time 0.3937 (0.4264) data time 0.0006 (0.0025) model time 0.3931 (0.4170) loss 5.9804 (7.2777) grad_norm 2.6786 (2.5775) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:09:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][310/625] eta 0:02:14 lr 0.000741 wd 0.0500 
time 0.3970 (0.4257) data time 0.0009 (0.0025) model time 0.3961 (0.4165) loss 6.3345 (7.2686) grad_norm 3.5388 (2.5819) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:09:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][320/625] eta 0:02:09 lr 0.000741 wd 0.0500 time 0.4318 (0.4251) data time 0.0006 (0.0025) model time 0.4312 (0.4161) loss 7.2602 (7.2570) grad_norm 2.2153 (2.5731) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:09:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][330/625] eta 0:02:05 lr 0.000741 wd 0.0500 time 0.3937 (0.4244) data time 0.0007 (0.0025) model time 0.3930 (0.4155) loss 8.2328 (7.2677) grad_norm 2.9559 (2.5812) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:09:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][340/625] eta 0:02:00 lr 0.000741 wd 0.0500 time 0.3990 (0.4239) data time 0.0006 (0.0024) model time 0.3984 (0.4152) loss 8.3408 (7.2685) grad_norm 1.5508 (2.5701) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:09:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][350/625] eta 0:01:56 lr 0.000741 wd 0.0500 time 0.4099 (0.4233) data time 0.0009 (0.0024) model time 0.4090 (0.4147) loss 7.2767 (7.2709) grad_norm 1.6032 (2.5628) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:10:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][360/625] eta 0:01:52 lr 0.000741 wd 0.0500 time 0.4087 (0.4228) data time 0.0010 (0.0023) model time 0.4077 (0.4144) loss 8.1791 (7.2781) grad_norm 2.8410 (2.5754) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:10:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][370/625] eta 0:01:47 lr 0.000741 wd 0.0500 time 0.3978 (0.4222) data time 0.0008 (0.0023) model time 0.3969 (0.4140) loss 6.7420 (7.2719) grad_norm 2.9311 (2.5729) loss_scale 1024.0000 
(1024.0000) mem 14939MB [2024-07-25 02:10:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][380/625] eta 0:01:43 lr 0.000741 wd 0.0500 time 0.4038 (0.4217) data time 0.0006 (0.0022) model time 0.4032 (0.4136) loss 8.8242 (7.2787) grad_norm 2.0277 (2.5685) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:10:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][390/625] eta 0:01:38 lr 0.000741 wd 0.0500 time 0.3950 (0.4212) data time 0.0008 (0.0022) model time 0.3941 (0.4132) loss 7.5583 (7.2834) grad_norm 3.0897 (2.5739) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:10:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][400/625] eta 0:01:34 lr 0.000741 wd 0.0500 time 0.3967 (0.4207) data time 0.0009 (0.0022) model time 0.3958 (0.4129) loss 7.1214 (7.2879) grad_norm 2.3801 (2.5696) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:10:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][410/625] eta 0:01:30 lr 0.000740 wd 0.0500 time 0.4073 (0.4203) data time 0.0007 (0.0022) model time 0.4066 (0.4126) loss 6.3637 (7.2812) grad_norm 6.6358 (2.5712) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:10:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][420/625] eta 0:01:26 lr 0.000740 wd 0.0500 time 0.3935 (0.4198) data time 0.0008 (0.0021) model time 0.3927 (0.4122) loss 7.8884 (7.2878) grad_norm 2.9435 (2.5714) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:10:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][430/625] eta 0:01:21 lr 0.000740 wd 0.0500 time 0.4008 (0.4194) data time 0.0007 (0.0021) model time 0.4002 (0.4119) loss 6.4124 (7.2852) grad_norm 5.1001 (2.5772) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:10:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][440/625] eta 0:01:17 
lr 0.000740 wd 0.0500 time 0.4067 (0.4189) data time 0.0006 (0.0021) model time 0.4061 (0.4116) loss 7.3993 (7.2855) grad_norm 5.6085 (2.5958) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:10:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][450/625] eta 0:01:13 lr 0.000740 wd 0.0500 time 0.3995 (0.4186) data time 0.0006 (0.0020) model time 0.3988 (0.4113) loss 7.2838 (7.2883) grad_norm 4.3151 (2.6123) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:10:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][460/625] eta 0:01:09 lr 0.000740 wd 0.0500 time 0.6071 (0.4193) data time 0.0006 (0.0020) model time 0.6065 (0.4123) loss 6.8653 (7.2899) grad_norm 1.6124 (2.6006) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:10:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][470/625] eta 0:01:05 lr 0.000740 wd 0.0500 time 0.5679 (0.4208) data time 0.0007 (0.0020) model time 0.5672 (0.4141) loss 5.2805 (7.2839) grad_norm 3.2205 (2.5905) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:10:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][480/625] eta 0:01:01 lr 0.000740 wd 0.0500 time 0.4075 (0.4220) data time 0.0007 (0.0020) model time 0.4069 (0.4156) loss 5.8022 (7.2742) grad_norm 2.9182 (2.5823) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:10:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][490/625] eta 0:00:57 lr 0.000740 wd 0.0500 time 0.6033 (0.4235) data time 0.0008 (0.0020) model time 0.6025 (0.4174) loss 7.1283 (7.2746) grad_norm 3.4681 (2.5814) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:11:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][500/625] eta 0:00:52 lr 0.000739 wd 0.0500 time 0.3938 (0.4230) data time 0.0009 (0.0019) model time 0.3929 (0.4170) loss 8.4958 (7.2776) grad_norm 3.4603 (2.5911) 
loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:11:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][510/625] eta 0:00:48 lr 0.000739 wd 0.0500 time 0.4065 (0.4226) data time 0.0009 (0.0019) model time 0.4057 (0.4167) loss 6.8704 (7.2730) grad_norm 2.3130 (2.5786) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:11:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][520/625] eta 0:00:44 lr 0.000739 wd 0.0500 time 0.4063 (0.4225) data time 0.0007 (0.0019) model time 0.4056 (0.4167) loss 5.8828 (7.2721) grad_norm 2.9275 (2.5738) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:11:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][530/625] eta 0:00:40 lr 0.000739 wd 0.0500 time 0.3920 (0.4221) data time 0.0009 (0.0019) model time 0.3912 (0.4163) loss 8.2474 (7.2752) grad_norm 2.6726 (2.5706) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:11:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][540/625] eta 0:00:35 lr 0.000739 wd 0.0500 time 0.4013 (0.4218) data time 0.0008 (0.0019) model time 0.4005 (0.4160) loss 7.6581 (7.2772) grad_norm 1.9949 (2.5767) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:11:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][550/625] eta 0:00:31 lr 0.000739 wd 0.0500 time 0.4127 (0.4214) data time 0.0008 (0.0018) model time 0.4118 (0.4157) loss 8.4954 (7.2776) grad_norm 2.8135 (2.5711) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:11:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][560/625] eta 0:00:27 lr 0.000739 wd 0.0500 time 0.3949 (0.4211) data time 0.0008 (0.0018) model time 0.3941 (0.4154) loss 5.9801 (7.2735) grad_norm 2.1394 (2.5660) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:11:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[139/300][570/625] eta 0:00:23 lr 0.000739 wd 0.0500 time 0.4072 (0.4208) data time 0.0008 (0.0018) model time 0.4064 (0.4152) loss 8.0931 (7.2818) grad_norm 1.8283 (2.5607) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:11:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][580/625] eta 0:00:18 lr 0.000739 wd 0.0500 time 0.4059 (0.4205) data time 0.0007 (0.0018) model time 0.4052 (0.4149) loss 7.7202 (7.2905) grad_norm 3.3444 (2.5657) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:11:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][590/625] eta 0:00:14 lr 0.000739 wd 0.0500 time 0.3934 (0.4201) data time 0.0006 (0.0018) model time 0.3927 (0.4147) loss 7.4992 (7.2857) grad_norm 2.0973 (2.5610) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:11:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][600/625] eta 0:00:10 lr 0.000738 wd 0.0500 time 0.4038 (0.4198) data time 0.0006 (0.0018) model time 0.4032 (0.4144) loss 7.1897 (7.2870) grad_norm 2.1061 (2.5566) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:11:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][610/625] eta 0:00:06 lr 0.000738 wd 0.0500 time 0.4087 (0.4196) data time 0.0006 (0.0018) model time 0.4081 (0.4142) loss 7.3661 (7.2900) grad_norm 2.4980 (2.5523) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:11:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][620/625] eta 0:00:02 lr 0.000738 wd 0.0500 time 0.3928 (0.4193) data time 0.0004 (0.0017) model time 0.3924 (0.4140) loss 6.7553 (7.2898) grad_norm 1.5961 (2.5538) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:11:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 139 training takes 0:04:21 [2024-07-25 02:11:52 vssd_mesa_retrain_small_e300] (utils.py 118): INFO 
./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 02:11:53 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 02:11:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.472 (0.472) Loss 0.5776 (0.5776) Acc@1 88.477 (88.477) Acc@5 98.389 (98.389) Mem 14939MB [2024-07-25 02:11:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.121) Loss 0.9434 (0.7293) Acc@1 78.369 (84.810) Acc@5 95.361 (97.319) Mem 14939MB [2024-07-25 02:11:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 1.1016 (0.8657) Acc@1 74.023 (81.217) Acc@5 94.092 (95.917) Mem 14939MB [2024-07-25 02:11:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.900 Acc@5 95.873 [2024-07-25 02:11:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.9% [2024-07-25 02:11:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.773 (0.773) Loss 0.5688 (0.5688) Acc@1 88.916 (88.916) Acc@5 98.584 (98.584) Mem 14939MB [2024-07-25 02:11:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.156) Loss 0.9253 (0.7146) Acc@1 79.785 (85.423) Acc@5 95.654 (97.479) Mem 14939MB [2024-07-25 02:11:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.123) Loss 1.0605 (0.8441) Acc@1 75.293 (81.971) Acc@5 94.287 (96.124) Mem 14939MB [2024-07-25 02:11:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.594 Acc@5 96.105 [2024-07-25 02:11:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.6% [2024-07-25 02:11:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO 
Train: [140/300][0/625] eta 0:13:38 lr 0.000738 wd 0.0500 time 1.3103 (1.3103) data time 0.5038 (0.5038) model time 0.0000 (0.0000) loss 6.0257 (6.0257) grad_norm 2.3705 (2.3705) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:12:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][10/625] eta 0:04:58 lr 0.000738 wd 0.0500 time 0.3968 (0.4860) data time 0.0009 (0.0466) model time 0.0000 (0.0000) loss 7.6329 (7.3652) grad_norm 2.8357 (2.1437) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:12:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][20/625] eta 0:04:29 lr 0.000738 wd 0.0500 time 0.3984 (0.4450) data time 0.0007 (0.0248) model time 0.0000 (0.0000) loss 7.6739 (7.3708) grad_norm 6.1230 (2.3826) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:12:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][30/625] eta 0:04:16 lr 0.000738 wd 0.0500 time 0.4003 (0.4307) data time 0.0009 (0.0171) model time 0.0000 (0.0000) loss 7.5761 (7.5094) grad_norm 3.0602 (2.4471) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:12:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][40/625] eta 0:04:07 lr 0.000738 wd 0.0500 time 0.3948 (0.4234) data time 0.0010 (0.0132) model time 0.0000 (0.0000) loss 6.7509 (7.5764) grad_norm 5.5956 (2.5939) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:12:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][50/625] eta 0:04:02 lr 0.000738 wd 0.0500 time 0.4013 (0.4216) data time 0.0008 (0.0108) model time 0.0000 (0.0000) loss 7.6270 (7.5781) grad_norm 2.0330 (2.6589) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:12:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][60/625] eta 0:04:00 lr 0.000738 wd 0.0500 time 0.5468 (0.4262) data time 0.0009 (0.0092) model time 0.5459 (0.4483) loss 8.2353 
(7.5692) grad_norm 2.5784 (2.6263) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:12:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][70/625] eta 0:04:03 lr 0.000737 wd 0.0500 time 0.6259 (0.4391) data time 0.0007 (0.0080) model time 0.6252 (0.4827) loss 7.9600 (7.5017) grad_norm 1.8320 (2.6368) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:12:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][80/625] eta 0:03:59 lr 0.000737 wd 0.0500 time 0.5968 (0.4398) data time 0.0006 (0.0071) model time 0.5962 (0.4697) loss 6.6024 (7.4034) grad_norm 3.7659 (2.5960) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:12:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][90/625] eta 0:03:56 lr 0.000737 wd 0.0500 time 0.4009 (0.4422) data time 0.0008 (0.0064) model time 0.4001 (0.4675) loss 7.4834 (7.4365) grad_norm 1.9964 (2.6260) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:12:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][100/625] eta 0:03:50 lr 0.000737 wd 0.0500 time 0.4004 (0.4383) data time 0.0007 (0.0059) model time 0.3997 (0.4543) loss 7.9752 (7.4008) grad_norm 2.5220 (2.5966) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:12:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][110/625] eta 0:03:43 lr 0.000737 wd 0.0500 time 0.4112 (0.4348) data time 0.0009 (0.0055) model time 0.4103 (0.4451) loss 6.7743 (7.3960) grad_norm 2.3705 (2.5952) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:12:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][120/625] eta 0:03:38 lr 0.000737 wd 0.0500 time 0.3969 (0.4321) data time 0.0006 (0.0051) model time 0.3963 (0.4387) loss 6.0227 (7.3727) grad_norm 1.6277 (2.5863) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:12:54 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [140/300][130/625] eta 0:03:32 lr 0.000737 wd 0.0500 time 0.3970 (0.4302) data time 0.0008 (0.0048) model time 0.3963 (0.4346) loss 7.6472 (7.3437) grad_norm 1.8954 (2.5628) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:12:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][140/625] eta 0:03:27 lr 0.000737 wd 0.0500 time 0.4276 (0.4288) data time 0.0008 (0.0046) model time 0.4268 (0.4317) loss 6.3921 (7.3115) grad_norm 2.7193 (2.5699) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:13:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][150/625] eta 0:03:22 lr 0.000737 wd 0.0500 time 0.3920 (0.4273) data time 0.0007 (0.0044) model time 0.3913 (0.4290) loss 6.7899 (7.3056) grad_norm 3.1538 (2.5895) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:13:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][160/625] eta 0:03:18 lr 0.000737 wd 0.0500 time 0.3964 (0.4259) data time 0.0006 (0.0042) model time 0.3958 (0.4268) loss 7.1273 (7.2944) grad_norm 1.6564 (2.5783) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:13:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][170/625] eta 0:03:13 lr 0.000736 wd 0.0500 time 0.4219 (0.4248) data time 0.0008 (0.0041) model time 0.4211 (0.4248) loss 7.7588 (7.2840) grad_norm 2.2686 (2.5430) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:13:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][180/625] eta 0:03:08 lr 0.000736 wd 0.0500 time 0.3946 (0.4237) data time 0.0011 (0.0039) model time 0.3936 (0.4232) loss 7.5403 (7.3037) grad_norm 1.7594 (2.5168) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:13:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][190/625] eta 0:03:03 lr 0.000736 wd 0.0500 time 0.3981 (0.4226) data time 0.0007 (0.0038) model 
time 0.3974 (0.4217) loss 8.3726 (7.3333) grad_norm 2.2298 (2.5050) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:13:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][200/625] eta 0:02:59 lr 0.000736 wd 0.0500 time 0.4086 (0.4216) data time 0.0008 (0.0037) model time 0.4078 (0.4203) loss 5.9179 (7.3298) grad_norm 2.4142 (2.4949) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:13:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][210/625] eta 0:02:54 lr 0.000736 wd 0.0500 time 0.3946 (0.4206) data time 0.0006 (0.0036) model time 0.3940 (0.4190) loss 7.3202 (7.3447) grad_norm 2.5408 (2.4901) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:13:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][220/625] eta 0:02:50 lr 0.000736 wd 0.0500 time 0.3960 (0.4198) data time 0.0008 (0.0034) model time 0.3951 (0.4180) loss 8.2238 (7.3557) grad_norm 1.5474 (2.4712) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:13:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][230/625] eta 0:02:45 lr 0.000736 wd 0.0500 time 0.4195 (0.4190) data time 0.0009 (0.0033) model time 0.4187 (0.4171) loss 8.1714 (7.3620) grad_norm 1.7465 (2.4561) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:13:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][240/625] eta 0:02:41 lr 0.000736 wd 0.0500 time 0.4015 (0.4183) data time 0.0006 (0.0032) model time 0.4009 (0.4162) loss 7.8157 (7.3666) grad_norm 2.0693 (2.4742) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:13:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][250/625] eta 0:02:36 lr 0.000736 wd 0.0500 time 0.3970 (0.4176) data time 0.0008 (0.0031) model time 0.3962 (0.4154) loss 6.4010 (7.3566) grad_norm 2.2008 (2.4663) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:13:47 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][260/625] eta 0:02:32 lr 0.000735 wd 0.0500 time 0.4222 (0.4171) data time 0.0006 (0.0030) model time 0.4215 (0.4148) loss 6.9860 (7.3567) grad_norm 2.2741 (2.4599) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:13:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][270/625] eta 0:02:28 lr 0.000735 wd 0.0500 time 0.3986 (0.4172) data time 0.0008 (0.0030) model time 0.3978 (0.4150) loss 5.8749 (7.3556) grad_norm 2.7750 (2.4506) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:13:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][280/625] eta 0:02:24 lr 0.000735 wd 0.0500 time 0.5868 (0.4186) data time 0.0008 (0.0029) model time 0.5860 (0.4168) loss 7.8267 (7.3579) grad_norm 4.6795 (2.4777) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:14:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][290/625] eta 0:02:21 lr 0.000735 wd 0.0500 time 0.3985 (0.4209) data time 0.0006 (0.0028) model time 0.3979 (0.4197) loss 5.8331 (7.3562) grad_norm 1.9300 (2.4868) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:14:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][300/625] eta 0:02:17 lr 0.000735 wd 0.0500 time 0.5843 (0.4231) data time 0.0006 (0.0028) model time 0.5837 (0.4223) loss 7.7319 (7.3543) grad_norm 2.2861 (2.4752) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:14:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][310/625] eta 0:02:13 lr 0.000735 wd 0.0500 time 0.4044 (0.4240) data time 0.0008 (0.0027) model time 0.4035 (0.4234) loss 7.9634 (7.3598) grad_norm 2.8264 (2.4662) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:14:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][320/625] eta 0:02:09 lr 0.000735 wd 0.0500 time 0.4047 (0.4234) 
data time 0.0008 (0.0026) model time 0.4039 (0.4226) loss 7.8927 (7.3584) grad_norm 3.5258 (2.4800) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:14:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][330/625] eta 0:02:04 lr 0.000735 wd 0.0500 time 0.4080 (0.4228) data time 0.0006 (0.0026) model time 0.4074 (0.4219) loss 8.2473 (7.3593) grad_norm 2.1812 (2.4891) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:14:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][340/625] eta 0:02:00 lr 0.000735 wd 0.0500 time 0.3941 (0.4221) data time 0.0008 (0.0025) model time 0.3933 (0.4211) loss 5.9691 (7.3643) grad_norm 2.5700 (2.4879) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:14:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][350/625] eta 0:01:55 lr 0.000735 wd 0.0500 time 0.3979 (0.4215) data time 0.0008 (0.0025) model time 0.3970 (0.4204) loss 5.6748 (7.3537) grad_norm 2.9542 (2.4785) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:14:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][360/625] eta 0:01:51 lr 0.000734 wd 0.0500 time 0.4103 (0.4210) data time 0.0008 (0.0024) model time 0.4096 (0.4198) loss 7.0709 (7.3376) grad_norm 1.8797 (2.4834) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:14:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][370/625] eta 0:01:47 lr 0.000734 wd 0.0500 time 0.4062 (0.4205) data time 0.0006 (0.0024) model time 0.4056 (0.4192) loss 7.0777 (7.3400) grad_norm 2.6368 (2.4817) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:14:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][380/625] eta 0:01:42 lr 0.000734 wd 0.0500 time 0.3972 (0.4199) data time 0.0006 (0.0024) model time 0.3966 (0.4186) loss 7.1389 (7.3331) grad_norm 2.1946 (2.4725) loss_scale 1024.0000 (1024.0000) mem 14939MB 
[2024-07-25 02:14:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][390/625] eta 0:01:38 lr 0.000734 wd 0.0500 time 0.4158 (0.4196) data time 0.0008 (0.0024) model time 0.4150 (0.4181) loss 7.6782 (7.3321) grad_norm 2.1380 (2.4781) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:14:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][400/625] eta 0:01:34 lr 0.000734 wd 0.0500 time 0.3969 (0.4191) data time 0.0008 (0.0024) model time 0.3961 (0.4175) loss 7.6945 (7.3278) grad_norm 4.4480 (2.4941) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:14:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][410/625] eta 0:01:30 lr 0.000734 wd 0.0500 time 0.3991 (0.4186) data time 0.0007 (0.0024) model time 0.3984 (0.4170) loss 6.3699 (7.3260) grad_norm 2.3543 (2.5047) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:14:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][420/625] eta 0:01:25 lr 0.000734 wd 0.0500 time 0.4072 (0.4182) data time 0.0008 (0.0023) model time 0.4064 (0.4166) loss 8.2512 (7.3303) grad_norm 2.0922 (2.4950) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:14:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][430/625] eta 0:01:21 lr 0.000734 wd 0.0500 time 0.3959 (0.4178) data time 0.0007 (0.0023) model time 0.3952 (0.4161) loss 5.9742 (7.3165) grad_norm 2.3403 (2.5154) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:15:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][440/625] eta 0:01:17 lr 0.000734 wd 0.0500 time 0.4022 (0.4174) data time 0.0009 (0.0023) model time 0.4014 (0.4157) loss 7.6216 (7.3203) grad_norm 2.2509 (2.5370) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:15:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][450/625] eta 0:01:12 lr 0.000733 wd 0.0500 
time 0.4057 (0.4170) data time 0.0008 (0.0022) model time 0.4049 (0.4153) loss 6.7561 (7.3251) grad_norm 2.3881 (2.5388) loss_scale 2048.0000 (1026.2705) mem 14939MB [2024-07-25 02:15:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][460/625] eta 0:01:08 lr 0.000733 wd 0.0500 time 0.3980 (0.4166) data time 0.0009 (0.0022) model time 0.3971 (0.4149) loss 6.7921 (7.3274) grad_norm 2.9666 (2.5448) loss_scale 2048.0000 (1048.4338) mem 14939MB [2024-07-25 02:15:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][470/625] eta 0:01:04 lr 0.000733 wd 0.0500 time 0.4046 (0.4163) data time 0.0007 (0.0022) model time 0.4039 (0.4146) loss 6.3460 (7.3180) grad_norm 1.9402 (2.5434) loss_scale 2048.0000 (1069.6561) mem 14939MB [2024-07-25 02:15:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][480/625] eta 0:01:00 lr 0.000733 wd 0.0500 time 0.4103 (0.4160) data time 0.0006 (0.0021) model time 0.4096 (0.4142) loss 7.4101 (7.3111) grad_norm 2.0749 (2.5434) loss_scale 2048.0000 (1089.9958) mem 14939MB [2024-07-25 02:15:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][490/625] eta 0:00:56 lr 0.000733 wd 0.0500 time 0.3982 (0.4161) data time 0.0006 (0.0021) model time 0.3976 (0.4144) loss 5.7699 (7.3057) grad_norm 3.5794 (2.5440) loss_scale 2048.0000 (1109.5071) mem 14939MB [2024-07-25 02:15:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][500/625] eta 0:00:52 lr 0.000733 wd 0.0500 time 0.5830 (0.4167) data time 0.0007 (0.0021) model time 0.5824 (0.4151) loss 8.2020 (7.3042) grad_norm 1.8014 (2.5422) loss_scale 2048.0000 (1128.2395) mem 14939MB [2024-07-25 02:15:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][510/625] eta 0:00:48 lr 0.000733 wd 0.0500 time 0.4057 (0.4183) data time 0.0009 (0.0021) model time 0.4049 (0.4168) loss 7.0679 (7.3068) grad_norm 1.8595 (2.5324) loss_scale 2048.0000 
(1146.2387) mem 14939MB [2024-07-25 02:15:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][520/625] eta 0:00:44 lr 0.000733 wd 0.0500 time 0.5645 (0.4201) data time 0.0008 (0.0021) model time 0.5637 (0.4188) loss 6.0944 (7.3081) grad_norm 1.9925 (2.5259) loss_scale 2048.0000 (1163.5470) mem 14939MB [2024-07-25 02:15:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][530/625] eta 0:00:39 lr 0.000733 wd 0.0500 time 0.4356 (0.4205) data time 0.0006 (0.0021) model time 0.4350 (0.4192) loss 6.0604 (7.3096) grad_norm 1.8641 (inf) loss_scale 1024.0000 (1170.5612) mem 14939MB [2024-07-25 02:15:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][540/625] eta 0:00:35 lr 0.000733 wd 0.0500 time 0.3914 (0.4203) data time 0.0006 (0.0021) model time 0.3907 (0.4190) loss 8.6335 (7.3139) grad_norm 2.6507 (inf) loss_scale 1024.0000 (1167.8521) mem 14939MB [2024-07-25 02:15:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][550/625] eta 0:00:31 lr 0.000732 wd 0.0500 time 0.3965 (0.4200) data time 0.0008 (0.0020) model time 0.3956 (0.4186) loss 8.0573 (7.3137) grad_norm 1.9926 (inf) loss_scale 1024.0000 (1165.2414) mem 14939MB [2024-07-25 02:15:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][560/625] eta 0:00:27 lr 0.000732 wd 0.0500 time 0.4049 (0.4196) data time 0.0007 (0.0020) model time 0.4041 (0.4183) loss 8.0981 (7.3192) grad_norm 2.7256 (inf) loss_scale 1024.0000 (1162.7237) mem 14939MB [2024-07-25 02:15:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][570/625] eta 0:00:23 lr 0.000732 wd 0.0500 time 0.3945 (0.4194) data time 0.0008 (0.0020) model time 0.3938 (0.4180) loss 8.6281 (7.3310) grad_norm 1.7669 (inf) loss_scale 1024.0000 (1160.2942) mem 14939MB [2024-07-25 02:16:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][580/625] eta 0:00:18 lr 0.000732 wd 
0.0500 time 0.4055 (0.4191) data time 0.0008 (0.0020) model time 0.4047 (0.4177) loss 7.8214 (7.3376) grad_norm 3.0576 (inf) loss_scale 1024.0000 (1157.9484) mem 14939MB [2024-07-25 02:16:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][590/625] eta 0:00:14 lr 0.000732 wd 0.0500 time 0.4172 (0.4188) data time 0.0006 (0.0020) model time 0.4166 (0.4174) loss 6.1509 (7.3379) grad_norm 2.2103 (inf) loss_scale 1024.0000 (1155.6819) mem 14939MB [2024-07-25 02:16:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][600/625] eta 0:00:10 lr 0.000732 wd 0.0500 time 0.3929 (0.4185) data time 0.0007 (0.0020) model time 0.3922 (0.4171) loss 7.0738 (7.3367) grad_norm 2.5376 (inf) loss_scale 1024.0000 (1153.4908) mem 14939MB [2024-07-25 02:16:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][610/625] eta 0:00:06 lr 0.000732 wd 0.0500 time 0.3961 (0.4183) data time 0.0006 (0.0019) model time 0.3956 (0.4168) loss 7.4218 (7.3390) grad_norm 2.1077 (inf) loss_scale 1024.0000 (1151.3715) mem 14939MB [2024-07-25 02:16:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][620/625] eta 0:00:02 lr 0.000732 wd 0.0500 time 0.4149 (0.4181) data time 0.0006 (0.0019) model time 0.4144 (0.4166) loss 7.6138 (7.3325) grad_norm 4.2852 (inf) loss_scale 1024.0000 (1149.3205) mem 14939MB [2024-07-25 02:16:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 140 training takes 0:04:21 [2024-07-25 02:16:19 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 02:16:20 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 02:16:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.468 (0.468) Loss 0.5938 (0.5938) Acc@1 88.379 (88.379) Acc@5 98.145 (98.145) Mem 14939MB [2024-07-25 02:16:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.9844 (0.7380) Acc@1 78.418 (84.877) Acc@5 95.264 (97.208) Mem 14939MB [2024-07-25 02:16:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0938 (0.8696) Acc@1 74.902 (81.450) Acc@5 93.701 (95.731) Mem 14939MB [2024-07-25 02:16:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.034 Acc@5 95.675 [2024-07-25 02:16:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.0% [2024-07-25 02:16:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.03% [2024-07-25 02:16:23 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 02:16:24 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 02:16:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.5684 (0.5684) Acc@1 89.014 (89.014) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 02:16:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.9243 (0.7140) Acc@1 79.834 (85.458) Acc@5 95.605 (97.492) Mem 14939MB [2024-07-25 02:16:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0596 (0.8431) Acc@1 75.488 (81.989) Acc@5 94.336 (96.140) Mem 14939MB [2024-07-25 02:16:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.604 Acc@5 96.113 [2024-07-25 02:16:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.6% [2024-07-25 02:16:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.60% [2024-07-25 02:16:26 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 02:16:27 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 02:16:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][0/625] eta 0:08:36 lr 0.000732 wd 0.0500 time 0.8264 (0.8264) data time 0.4209 (0.4209) model time 0.0000 (0.0000) loss 7.6064 (7.6064) grad_norm 2.9901 (2.9901) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:16:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][10/625] eta 0:04:29 lr 0.000732 wd 0.0500 time 0.3919 (0.4382) data time 0.0007 (0.0390) model time 0.0000 (0.0000) loss 7.5316 (7.2011) grad_norm 2.1813 (2.8784) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:16:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][20/625] eta 0:04:14 lr 0.000731 wd 0.0500 time 0.4005 (0.4200) data time 0.0006 (0.0211) model time 0.0000 (0.0000) loss 6.6602 (7.2301) grad_norm 1.6448 (2.7145) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:16:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][30/625] eta 0:04:09 lr 0.000731 wd 0.0500 time 0.3951 (0.4200) data time 0.0007 (0.0146) model time 0.0000 (0.0000) loss 7.9095 (7.2131) grad_norm 1.8222 (2.5999) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:16:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][40/625] eta 0:04:02 lr 0.000731 wd 0.0500 time 0.3939 (0.4149) data time 0.0007 (0.0113) model time 0.0000 (0.0000) loss 8.5002 (7.1337) grad_norm 1.6637 (2.5364) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:16:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][50/625] eta 0:03:56 lr 0.000731 wd 0.0500 time 0.4047 (0.4120) data time 0.0024 (0.0093) model time 0.0000 (0.0000) loss 7.3544 (7.1390) grad_norm 1.6796 (2.4258) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:16:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][60/625] eta 0:03:51 lr 0.000731 wd 0.0500 time 
0.3966 (0.4098) data time 0.0007 (0.0079) model time 0.3959 (0.3978) loss 8.3181 (7.1688) grad_norm 1.8594 (2.3567) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:16:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][70/625] eta 0:03:46 lr 0.000731 wd 0.0500 time 0.3936 (0.4089) data time 0.0008 (0.0069) model time 0.3928 (0.4001) loss 8.3808 (7.1402) grad_norm 2.2574 (2.4110) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:17:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][80/625] eta 0:03:43 lr 0.000731 wd 0.0500 time 0.5838 (0.4104) data time 0.0008 (0.0062) model time 0.5830 (0.4067) loss 6.7610 (7.1379) grad_norm 1.6262 (2.3901) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:17:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][90/625] eta 0:03:39 lr 0.000731 wd 0.0500 time 0.4016 (0.4106) data time 0.0009 (0.0056) model time 0.4007 (0.4080) loss 7.0334 (7.1936) grad_norm 3.0402 (2.4676) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:17:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][100/625] eta 0:03:39 lr 0.000731 wd 0.0500 time 0.5889 (0.4182) data time 0.0008 (0.0051) model time 0.5882 (0.4236) loss 6.3064 (7.1889) grad_norm 3.7522 (2.4726) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:17:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][110/625] eta 0:03:37 lr 0.000731 wd 0.0500 time 0.5650 (0.4229) data time 0.0006 (0.0048) model time 0.5643 (0.4313) loss 6.8418 (7.1980) grad_norm 3.4120 (2.4684) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:17:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][120/625] eta 0:03:35 lr 0.000730 wd 0.0500 time 0.5465 (0.4266) data time 0.0006 (0.0044) model time 0.5459 (0.4362) loss 6.5901 (7.2153) grad_norm 3.6191 (2.4937) loss_scale 1024.0000 (1024.0000) 
mem 14939MB [2024-07-25 02:17:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][130/625] eta 0:03:30 lr 0.000730 wd 0.0500 time 0.4145 (0.4262) data time 0.0006 (0.0042) model time 0.4140 (0.4343) loss 6.3661 (7.2066) grad_norm 3.3713 (2.4858) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:17:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][140/625] eta 0:03:25 lr 0.000730 wd 0.0500 time 0.3943 (0.4246) data time 0.0009 (0.0039) model time 0.3934 (0.4308) loss 6.9246 (7.1752) grad_norm 1.6948 (2.4878) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:17:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][150/625] eta 0:03:20 lr 0.000730 wd 0.0500 time 0.3966 (0.4229) data time 0.0008 (0.0038) model time 0.3958 (0.4275) loss 7.1432 (7.1812) grad_norm 2.0165 (2.4716) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:17:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][160/625] eta 0:03:16 lr 0.000730 wd 0.0500 time 0.4064 (0.4217) data time 0.0006 (0.0036) model time 0.4057 (0.4253) loss 7.2716 (7.1854) grad_norm 1.5222 (2.4512) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:17:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][170/625] eta 0:03:11 lr 0.000730 wd 0.0500 time 0.3960 (0.4205) data time 0.0008 (0.0034) model time 0.3952 (0.4232) loss 7.3634 (7.1563) grad_norm 2.2604 (2.4381) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:17:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][180/625] eta 0:03:06 lr 0.000730 wd 0.0500 time 0.3990 (0.4197) data time 0.0009 (0.0033) model time 0.3981 (0.4218) loss 8.6109 (7.1785) grad_norm 2.6206 (2.4108) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:17:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][190/625] eta 0:03:02 lr 0.000730 
wd 0.0500 time 0.4402 (0.4191) data time 0.0007 (0.0032) model time 0.4395 (0.4207) loss 5.9614 (7.1607) grad_norm 2.0150 (2.4196) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:17:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][200/625] eta 0:02:57 lr 0.000730 wd 0.0500 time 0.3940 (0.4186) data time 0.0008 (0.0031) model time 0.3932 (0.4199) loss 7.3717 (7.1574) grad_norm 2.8801 (2.4614) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:17:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][210/625] eta 0:02:53 lr 0.000729 wd 0.0500 time 0.4008 (0.4178) data time 0.0007 (0.0030) model time 0.4001 (0.4186) loss 6.8069 (7.1694) grad_norm 1.8722 (2.4703) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:17:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][220/625] eta 0:02:48 lr 0.000729 wd 0.0500 time 0.4074 (0.4172) data time 0.0008 (0.0029) model time 0.4066 (0.4177) loss 7.7368 (7.1864) grad_norm 3.2360 (2.4865) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:18:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][230/625] eta 0:02:44 lr 0.000729 wd 0.0500 time 0.3944 (0.4164) data time 0.0007 (0.0028) model time 0.3937 (0.4167) loss 5.4295 (7.1752) grad_norm 1.3714 (2.4943) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:18:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][240/625] eta 0:02:40 lr 0.000729 wd 0.0500 time 0.3968 (0.4158) data time 0.0008 (0.0027) model time 0.3960 (0.4158) loss 7.4427 (7.1845) grad_norm 2.1520 (2.4920) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:18:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][250/625] eta 0:02:35 lr 0.000729 wd 0.0500 time 0.4077 (0.4158) data time 0.0007 (0.0027) model time 0.4071 (0.4158) loss 5.6605 (7.1835) grad_norm 2.2185 (2.4927) loss_scale 
1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:18:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][260/625] eta 0:02:31 lr 0.000729 wd 0.0500 time 0.3946 (0.4154) data time 0.0008 (0.0026) model time 0.3938 (0.4152) loss 5.7778 (7.1632) grad_norm 1.7298 (2.4868) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:18:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][270/625] eta 0:02:27 lr 0.000729 wd 0.0500 time 0.4005 (0.4148) data time 0.0007 (0.0025) model time 0.3999 (0.4145) loss 6.8625 (7.1837) grad_norm 2.5106 (2.4960) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:18:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][280/625] eta 0:02:22 lr 0.000729 wd 0.0500 time 0.4076 (0.4143) data time 0.0008 (0.0025) model time 0.4068 (0.4139) loss 7.7573 (7.1808) grad_norm 3.4873 (2.5040) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:18:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][290/625] eta 0:02:18 lr 0.000729 wd 0.0500 time 0.3964 (0.4139) data time 0.0009 (0.0024) model time 0.3956 (0.4133) loss 7.6260 (7.1748) grad_norm 1.7309 (2.5088) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:18:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][300/625] eta 0:02:14 lr 0.000729 wd 0.0500 time 0.3980 (0.4134) data time 0.0007 (0.0024) model time 0.3973 (0.4127) loss 7.8948 (7.1770) grad_norm 2.3074 (2.4936) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:18:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][310/625] eta 0:02:10 lr 0.000728 wd 0.0500 time 0.4076 (0.4136) data time 0.0009 (0.0023) model time 0.4067 (0.4129) loss 7.1812 (7.1665) grad_norm 1.6204 (2.4752) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:18:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][320/625] 
eta 0:02:06 lr 0.000728 wd 0.0500 time 0.5831 (0.4155) data time 0.0007 (0.0023) model time 0.5824 (0.4152) loss 7.4452 (7.1757) grad_norm 2.9026 (2.4656) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:18:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][330/625] eta 0:02:03 lr 0.000728 wd 0.0500 time 0.3964 (0.4177) data time 0.0007 (0.0022) model time 0.3958 (0.4178) loss 6.2306 (7.1758) grad_norm 2.8120 (2.4673) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:18:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][340/625] eta 0:01:59 lr 0.000728 wd 0.0500 time 0.6032 (0.4189) data time 0.0006 (0.0022) model time 0.6025 (0.4191) loss 5.9526 (7.1808) grad_norm 3.3702 (2.4730) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:18:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][350/625] eta 0:01:55 lr 0.000728 wd 0.0500 time 0.4079 (0.4195) data time 0.0007 (0.0022) model time 0.4072 (0.4199) loss 8.1720 (7.1793) grad_norm 3.4090 (2.4821) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:18:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][360/625] eta 0:01:51 lr 0.000728 wd 0.0500 time 0.3968 (0.4190) data time 0.0008 (0.0021) model time 0.3960 (0.4192) loss 6.5345 (7.1906) grad_norm 3.9561 (2.4991) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:19:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][370/625] eta 0:01:46 lr 0.000728 wd 0.0500 time 0.3988 (0.4185) data time 0.0006 (0.0021) model time 0.3982 (0.4186) loss 9.0629 (7.1959) grad_norm 2.4822 (2.5014) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:19:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][380/625] eta 0:01:42 lr 0.000728 wd 0.0500 time 0.3997 (0.4180) data time 0.0007 (0.0021) model time 0.3990 (0.4180) loss 8.3150 (7.1922) grad_norm 2.6196 
(2.4915) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:19:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][390/625] eta 0:01:38 lr 0.000728 wd 0.0500 time 0.3941 (0.4176) data time 0.0007 (0.0020) model time 0.3934 (0.4175) loss 7.8019 (7.2006) grad_norm 1.5170 (2.4783) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:19:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][400/625] eta 0:01:33 lr 0.000727 wd 0.0500 time 0.3986 (0.4171) data time 0.0006 (0.0020) model time 0.3980 (0.4169) loss 8.3889 (7.1984) grad_norm 1.8878 (2.4759) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:19:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][410/625] eta 0:01:29 lr 0.000727 wd 0.0500 time 0.4040 (0.4167) data time 0.0009 (0.0020) model time 0.4031 (0.4165) loss 7.4799 (7.2112) grad_norm 4.6784 (2.4768) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:19:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][420/625] eta 0:01:25 lr 0.000727 wd 0.0500 time 0.3939 (0.4163) data time 0.0009 (0.0020) model time 0.3930 (0.4160) loss 5.9273 (7.2180) grad_norm 3.0394 (2.4883) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:19:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][430/625] eta 0:01:21 lr 0.000727 wd 0.0500 time 0.3999 (0.4160) data time 0.0008 (0.0019) model time 0.3991 (0.4156) loss 6.7454 (7.2059) grad_norm 1.9069 (2.4885) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:19:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][440/625] eta 0:01:16 lr 0.000727 wd 0.0500 time 0.4022 (0.4157) data time 0.0008 (0.0019) model time 0.4014 (0.4152) loss 7.7438 (7.2108) grad_norm 2.7371 (2.4777) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:19:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[141/300][450/625] eta 0:01:12 lr 0.000727 wd 0.0500 time 0.3955 (0.4153) data time 0.0008 (0.0019) model time 0.3947 (0.4148) loss 8.1901 (7.2164) grad_norm 1.9510 (2.4761) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:19:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][460/625] eta 0:01:08 lr 0.000727 wd 0.0500 time 0.4153 (0.4150) data time 0.0009 (0.0019) model time 0.4144 (0.4144) loss 7.8487 (7.2270) grad_norm 2.9747 (2.4729) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:19:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][470/625] eta 0:01:04 lr 0.000727 wd 0.0500 time 0.4087 (0.4150) data time 0.0006 (0.0018) model time 0.4080 (0.4144) loss 7.5544 (7.2261) grad_norm 2.8275 (2.4727) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:19:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][480/625] eta 0:01:00 lr 0.000727 wd 0.0500 time 0.3951 (0.4147) data time 0.0007 (0.0018) model time 0.3944 (0.4141) loss 8.0953 (7.2284) grad_norm 3.3935 (2.4719) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:19:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][490/625] eta 0:00:55 lr 0.000727 wd 0.0500 time 0.3986 (0.4145) data time 0.0006 (0.0018) model time 0.3980 (0.4138) loss 7.8840 (7.2302) grad_norm 1.7725 (2.4751) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:19:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][500/625] eta 0:00:51 lr 0.000726 wd 0.0500 time 0.4039 (0.4142) data time 0.0008 (0.0018) model time 0.4031 (0.4135) loss 5.3077 (7.2273) grad_norm 1.6393 (2.4687) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:19:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][510/625] eta 0:00:47 lr 0.000726 wd 0.0500 time 0.3975 (0.4140) data time 0.0007 (0.0018) model time 0.3969 (0.4132) loss 6.1083 
(7.2223) grad_norm 2.2922 (2.4637) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:20:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][520/625] eta 0:00:43 lr 0.000726 wd 0.0500 time 0.3997 (0.4137) data time 0.0008 (0.0018) model time 0.3989 (0.4129) loss 7.2390 (7.2201) grad_norm 1.7515 (2.4552) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:20:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][530/625] eta 0:00:39 lr 0.000726 wd 0.0500 time 0.4045 (0.4138) data time 0.0008 (0.0018) model time 0.4037 (0.4129) loss 7.8991 (7.2286) grad_norm 2.3910 (2.4464) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:20:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][540/625] eta 0:00:35 lr 0.000726 wd 0.0500 time 0.5926 (0.4150) data time 0.0008 (0.0018) model time 0.5918 (0.4143) loss 7.0873 (7.2306) grad_norm 2.9145 (2.4416) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:20:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][550/625] eta 0:00:31 lr 0.000726 wd 0.0500 time 0.6101 (0.4166) data time 0.0009 (0.0018) model time 0.6092 (0.4160) loss 7.7139 (7.2280) grad_norm 1.8796 (2.4422) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:20:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][560/625] eta 0:00:27 lr 0.000726 wd 0.0500 time 0.5950 (0.4179) data time 0.0009 (0.0018) model time 0.5941 (0.4174) loss 6.3002 (7.2284) grad_norm 2.0056 (2.4491) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:20:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][570/625] eta 0:00:22 lr 0.000726 wd 0.0500 time 0.4080 (0.4178) data time 0.0008 (0.0018) model time 0.4072 (0.4174) loss 7.3134 (7.2387) grad_norm 2.1996 (2.4455) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:20:30 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [141/300][580/625] eta 0:00:18 lr 0.000726 wd 0.0500 time 0.4086 (0.4175) data time 0.0008 (0.0018) model time 0.4078 (0.4170) loss 6.5859 (7.2366) grad_norm 1.7902 (2.4397) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:20:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][590/625] eta 0:00:14 lr 0.000726 wd 0.0500 time 0.3941 (0.4172) data time 0.0007 (0.0017) model time 0.3934 (0.4167) loss 6.7903 (7.2356) grad_norm 2.5264 (2.4379) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:20:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][600/625] eta 0:00:10 lr 0.000725 wd 0.0500 time 0.4090 (0.4170) data time 0.0008 (0.0017) model time 0.4082 (0.4164) loss 7.6592 (7.2351) grad_norm 2.3853 (2.4399) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:20:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][610/625] eta 0:00:06 lr 0.000725 wd 0.0500 time 0.4066 (0.4168) data time 0.0004 (0.0018) model time 0.4062 (0.4162) loss 6.3433 (7.2340) grad_norm 4.5782 (2.4599) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:20:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][620/625] eta 0:00:02 lr 0.000725 wd 0.0500 time 0.3974 (0.4165) data time 0.0006 (0.0017) model time 0.3968 (0.4159) loss 7.2936 (7.2337) grad_norm 2.7541 (2.4659) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:20:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 141 training takes 0:04:20 [2024-07-25 02:20:48 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 02:20:48 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 02:20:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.450 (0.450) Loss 0.5864 (0.5864) Acc@1 88.623 (88.623) Acc@5 98.389 (98.389) Mem 14939MB [2024-07-25 02:20:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.9395 (0.7360) Acc@1 79.541 (84.952) Acc@5 95.459 (97.337) Mem 14939MB [2024-07-25 02:20:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.103) Loss 1.0762 (0.8737) Acc@1 75.635 (81.336) Acc@5 94.043 (95.894) Mem 14939MB [2024-07-25 02:20:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.928 Acc@5 95.839 [2024-07-25 02:20:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.9% [2024-07-25 02:20:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.790 (0.790) Loss 0.5684 (0.5684) Acc@1 88.965 (88.965) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 02:20:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.159) Loss 0.9204 (0.7127) Acc@1 79.883 (85.485) Acc@5 95.703 (97.488) Mem 14939MB [2024-07-25 02:20:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.124) Loss 1.0576 (0.8416) Acc@1 75.391 (82.003) Acc@5 94.385 (96.159) Mem 14939MB [2024-07-25 02:20:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.622 Acc@5 96.129 [2024-07-25 02:20:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.6% [2024-07-25 02:20:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.62% [2024-07-25 02:20:54 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 02:20:55 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 02:20:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][0/625] eta 0:08:05 lr 0.000725 wd 0.0500 time 0.7769 (0.7769) data time 0.4011 (0.4011) model time 0.0000 (0.0000) loss 6.8822 (6.8822) grad_norm 3.3614 (3.3614) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:21:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][10/625] eta 0:04:27 lr 0.000725 wd 0.0500 time 0.4093 (0.4353) data time 0.0007 (0.0373) model time 0.0000 (0.0000) loss 5.9845 (7.3048) grad_norm 1.8390 (2.3940) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:21:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][20/625] eta 0:04:13 lr 0.000725 wd 0.0500 time 0.3946 (0.4182) data time 0.0008 (0.0200) model time 0.0000 (0.0000) loss 7.5623 (7.2475) grad_norm 1.4845 (2.2033) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:21:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][30/625] eta 0:04:05 lr 0.000725 wd 0.0500 time 0.4001 (0.4126) data time 0.0007 (0.0141) model time 0.0000 (0.0000) loss 7.0147 (7.1900) grad_norm 1.5763 (2.1023) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:21:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][40/625] eta 0:03:59 lr 0.000725 wd 0.0500 time 0.4112 (0.4099) data time 0.0009 (0.0109) model time 0.0000 (0.0000) loss 7.7640 (7.1972) grad_norm 2.0740 (2.2058) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:21:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][50/625] eta 0:03:54 lr 0.000725 wd 0.0500 time 0.3991 (0.4084) data time 0.0008 (0.0089) model time 0.0000 (0.0000) loss 7.7939 (7.2348) grad_norm 2.1538 (2.2891) loss_scale 1024.0000 
(1024.0000) mem 14939MB [2024-07-25 02:21:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][60/625] eta 0:03:50 lr 0.000725 wd 0.0500 time 0.3945 (0.4084) data time 0.0008 (0.0076) model time 0.3937 (0.4074) loss 7.7919 (7.2283) grad_norm 1.8115 (2.2601) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:21:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][70/625] eta 0:03:46 lr 0.000724 wd 0.0500 time 0.4108 (0.4075) data time 0.0006 (0.0067) model time 0.4102 (0.4044) loss 6.3704 (7.2039) grad_norm 4.2754 (2.3705) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:21:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][80/625] eta 0:03:41 lr 0.000724 wd 0.0500 time 0.3964 (0.4065) data time 0.0006 (0.0060) model time 0.3958 (0.4024) loss 8.3776 (7.1750) grad_norm 2.0503 (2.5335) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:21:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][90/625] eta 0:03:37 lr 0.000724 wd 0.0500 time 0.4053 (0.4062) data time 0.0008 (0.0055) model time 0.4046 (0.4024) loss 7.8524 (7.1895) grad_norm 2.1150 (2.6276) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:21:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][100/625] eta 0:03:32 lr 0.000724 wd 0.0500 time 0.4073 (0.4057) data time 0.0008 (0.0050) model time 0.4065 (0.4020) loss 8.5363 (7.1612) grad_norm 2.1366 (2.6761) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:21:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][110/625] eta 0:03:28 lr 0.000724 wd 0.0500 time 0.3950 (0.4051) data time 0.0009 (0.0046) model time 0.3941 (0.4013) loss 8.0385 (7.1458) grad_norm 1.5323 (2.6698) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:21:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][120/625] eta 0:03:24 lr 
0.000724 wd 0.0500 time 0.3957 (0.4058) data time 0.0009 (0.0043) model time 0.3948 (0.4029) loss 7.7724 (7.1222) grad_norm 2.1944 (2.6116) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:21:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][130/625] eta 0:03:21 lr 0.000724 wd 0.0500 time 0.5489 (0.4079) data time 0.0008 (0.0041) model time 0.5480 (0.4066) loss 7.0119 (7.1508) grad_norm 2.6158 (2.6445) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:21:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][140/625] eta 0:03:21 lr 0.000724 wd 0.0500 time 0.4028 (0.4146) data time 0.0009 (0.0039) model time 0.4018 (0.4171) loss 6.4743 (7.1488) grad_norm 4.2090 (2.6300) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:21:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][150/625] eta 0:03:19 lr 0.000724 wd 0.0500 time 0.6221 (0.4194) data time 0.0008 (0.0037) model time 0.6212 (0.4240) loss 7.7683 (7.1799) grad_norm 2.5256 (2.6332) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:22:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][160/625] eta 0:03:16 lr 0.000723 wd 0.0500 time 0.4003 (0.4229) data time 0.0007 (0.0035) model time 0.3996 (0.4286) loss 7.7369 (7.2155) grad_norm 1.9599 (2.6383) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:22:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][170/625] eta 0:03:11 lr 0.000723 wd 0.0500 time 0.3979 (0.4215) data time 0.0009 (0.0033) model time 0.3970 (0.4261) loss 7.6817 (7.2418) grad_norm 2.1150 (2.6583) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:22:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][180/625] eta 0:03:07 lr 0.000723 wd 0.0500 time 0.3993 (0.4203) data time 0.0008 (0.0032) model time 0.3985 (0.4241) loss 6.6083 (7.2434) grad_norm 2.0403 (2.6676) 
loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:22:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][190/625] eta 0:03:02 lr 0.000723 wd 0.0500 time 0.4036 (0.4193) data time 0.0006 (0.0031) model time 0.4030 (0.4223) loss 7.0582 (7.2463) grad_norm 1.9522 (2.6422) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:22:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][200/625] eta 0:02:57 lr 0.000723 wd 0.0500 time 0.3996 (0.4183) data time 0.0008 (0.0030) model time 0.3988 (0.4207) loss 8.5607 (7.2299) grad_norm 1.5425 (2.6226) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:22:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][210/625] eta 0:02:53 lr 0.000723 wd 0.0500 time 0.3982 (0.4175) data time 0.0007 (0.0029) model time 0.3975 (0.4195) loss 7.8035 (7.2471) grad_norm 2.0273 (2.6011) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:22:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][220/625] eta 0:02:48 lr 0.000723 wd 0.0500 time 0.4175 (0.4168) data time 0.0007 (0.0028) model time 0.4169 (0.4184) loss 6.6945 (7.2509) grad_norm 1.9278 (2.5871) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:22:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][230/625] eta 0:02:44 lr 0.000723 wd 0.0500 time 0.3979 (0.4167) data time 0.0006 (0.0027) model time 0.3973 (0.4181) loss 7.9118 (7.2691) grad_norm 2.2426 (2.5918) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:22:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][240/625] eta 0:02:40 lr 0.000723 wd 0.0500 time 0.3986 (0.4161) data time 0.0008 (0.0026) model time 0.3978 (0.4173) loss 6.6030 (7.2688) grad_norm 3.3647 (2.5921) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:22:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[142/300][250/625] eta 0:02:35 lr 0.000723 wd 0.0500 time 0.4071 (0.4155) data time 0.0007 (0.0026) model time 0.4064 (0.4164) loss 7.0163 (7.2591) grad_norm 1.9947 (2.6026) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:22:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][260/625] eta 0:02:31 lr 0.000722 wd 0.0500 time 0.3973 (0.4149) data time 0.0008 (0.0025) model time 0.3965 (0.4155) loss 7.0432 (7.2567) grad_norm 1.8556 (2.5887) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:22:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][270/625] eta 0:02:27 lr 0.000722 wd 0.0500 time 0.3974 (0.4143) data time 0.0006 (0.0024) model time 0.3968 (0.4148) loss 8.5806 (7.2633) grad_norm 2.5498 (2.5694) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:22:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][280/625] eta 0:02:22 lr 0.000722 wd 0.0500 time 0.4028 (0.4138) data time 0.0008 (0.0024) model time 0.4020 (0.4141) loss 6.2231 (7.2560) grad_norm 2.3066 (2.5510) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:22:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][290/625] eta 0:02:18 lr 0.000722 wd 0.0500 time 0.3989 (0.4135) data time 0.0008 (0.0023) model time 0.3982 (0.4137) loss 7.9541 (7.2581) grad_norm 2.7348 (2.5519) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:22:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][300/625] eta 0:02:14 lr 0.000722 wd 0.0500 time 0.4006 (0.4131) data time 0.0006 (0.0023) model time 0.4000 (0.4132) loss 7.9449 (7.2611) grad_norm 2.2787 (2.5571) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:23:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][310/625] eta 0:02:10 lr 0.000722 wd 0.0500 time 0.4065 (0.4129) data time 0.0006 (0.0022) model time 0.4059 (0.4128) loss 8.3161 
(7.2564) grad_norm 2.2466 (2.5455) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:23:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][320/625] eta 0:02:05 lr 0.000722 wd 0.0500 time 0.4012 (0.4125) data time 0.0008 (0.0022) model time 0.4004 (0.4124) loss 7.4967 (7.2634) grad_norm 2.6897 (2.5376) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:23:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][330/625] eta 0:02:01 lr 0.000722 wd 0.0500 time 0.4027 (0.4122) data time 0.0007 (0.0022) model time 0.4020 (0.4119) loss 7.7921 (7.2737) grad_norm 2.4630 (2.5443) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:23:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][340/625] eta 0:01:57 lr 0.000722 wd 0.0500 time 0.4050 (0.4124) data time 0.0008 (0.0021) model time 0.4042 (0.4121) loss 7.2948 (7.2708) grad_norm 3.1963 (2.5467) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:23:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][350/625] eta 0:01:53 lr 0.000721 wd 0.0500 time 0.5835 (0.4129) data time 0.0009 (0.0021) model time 0.5825 (0.4128) loss 7.5938 (7.2647) grad_norm 2.1153 (2.5506) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:23:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][360/625] eta 0:01:50 lr 0.000721 wd 0.0500 time 0.5746 (0.4154) data time 0.0006 (0.0021) model time 0.5740 (0.4157) loss 8.2870 (7.2726) grad_norm 4.2806 (2.5629) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:23:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][370/625] eta 0:01:46 lr 0.000721 wd 0.0500 time 0.6328 (0.4173) data time 0.0006 (0.0020) model time 0.6322 (0.4178) loss 7.7812 (7.2686) grad_norm 2.4374 (2.5797) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:23:35 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [142/300][380/625] eta 0:01:42 lr 0.000721 wd 0.0500 time 0.4007 (0.4186) data time 0.0008 (0.0020) model time 0.3999 (0.4193) loss 6.8653 (7.2721) grad_norm 2.7532 (2.5866) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:23:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][390/625] eta 0:01:38 lr 0.000721 wd 0.0500 time 0.4471 (0.4188) data time 0.0008 (0.0020) model time 0.4463 (0.4194) loss 6.1473 (7.2780) grad_norm 3.5697 (2.5760) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:23:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][400/625] eta 0:01:34 lr 0.000721 wd 0.0500 time 0.4076 (0.4183) data time 0.0006 (0.0020) model time 0.4071 (0.4188) loss 7.4450 (7.2715) grad_norm 1.5809 (2.5677) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:23:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][410/625] eta 0:01:29 lr 0.000721 wd 0.0500 time 0.3992 (0.4179) data time 0.0008 (0.0019) model time 0.3984 (0.4183) loss 8.0213 (7.2804) grad_norm 2.2822 (2.5600) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:23:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][420/625] eta 0:01:25 lr 0.000721 wd 0.0500 time 0.4016 (0.4175) data time 0.0007 (0.0019) model time 0.4009 (0.4178) loss 8.2045 (7.2839) grad_norm 3.4599 (2.5576) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:23:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][430/625] eta 0:01:21 lr 0.000721 wd 0.0500 time 0.3977 (0.4170) data time 0.0007 (0.0019) model time 0.3969 (0.4173) loss 7.8333 (7.2907) grad_norm 3.0616 (2.5574) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:23:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][440/625] eta 0:01:17 lr 0.000721 wd 0.0500 time 0.4051 (0.4167) data time 0.0006 (0.0019) model 
time 0.4045 (0.4168) loss 5.8855 (7.2707) grad_norm 2.2226 (2.5523) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:24:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][450/625] eta 0:01:12 lr 0.000720 wd 0.0500 time 0.4044 (0.4167) data time 0.0008 (0.0018) model time 0.4036 (0.4168) loss 7.4961 (7.2697) grad_norm 2.5523 (2.5480) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:24:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][460/625] eta 0:01:08 lr 0.000720 wd 0.0500 time 0.4020 (0.4164) data time 0.0007 (0.0018) model time 0.4013 (0.4164) loss 6.2366 (7.2689) grad_norm 2.4020 (2.5466) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:24:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][470/625] eta 0:01:04 lr 0.000720 wd 0.0500 time 0.4026 (0.4161) data time 0.0006 (0.0018) model time 0.4020 (0.4160) loss 6.1145 (7.2622) grad_norm 3.0831 (2.5411) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:24:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][480/625] eta 0:01:00 lr 0.000720 wd 0.0500 time 0.4041 (0.4158) data time 0.0008 (0.0018) model time 0.4033 (0.4157) loss 8.1986 (7.2609) grad_norm 2.9093 (2.5417) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:24:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][490/625] eta 0:00:56 lr 0.000720 wd 0.0500 time 0.4048 (0.4155) data time 0.0007 (0.0018) model time 0.4040 (0.4154) loss 6.6103 (7.2587) grad_norm 1.7660 (2.5414) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:24:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][500/625] eta 0:00:51 lr 0.000720 wd 0.0500 time 0.4011 (0.4152) data time 0.0005 (0.0017) model time 0.4005 (0.4150) loss 7.0227 (7.2654) grad_norm 2.4309 (2.5421) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:24:27 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][510/625] eta 0:00:47 lr 0.000720 wd 0.0500 time 0.4004 (0.4149) data time 0.0010 (0.0017) model time 0.3994 (0.4147) loss 8.1867 (7.2642) grad_norm 2.2832 (2.5418) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:24:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][520/625] eta 0:00:43 lr 0.000720 wd 0.0500 time 0.3990 (0.4146) data time 0.0008 (0.0017) model time 0.3982 (0.4144) loss 6.7015 (7.2642) grad_norm 2.5314 (2.5612) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:24:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][530/625] eta 0:00:39 lr 0.000720 wd 0.0500 time 0.3988 (0.4144) data time 0.0008 (0.0017) model time 0.3980 (0.4141) loss 7.3267 (7.2678) grad_norm 2.8428 (2.5554) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:24:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][540/625] eta 0:00:35 lr 0.000720 wd 0.0500 time 0.4012 (0.4141) data time 0.0007 (0.0017) model time 0.4006 (0.4138) loss 7.6023 (7.2595) grad_norm 2.6491 (2.5529) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:24:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][550/625] eta 0:00:31 lr 0.000719 wd 0.0500 time 0.3982 (0.4139) data time 0.0007 (0.0017) model time 0.3975 (0.4135) loss 7.6828 (7.2624) grad_norm 2.5632 (2.5592) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:24:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][560/625] eta 0:00:26 lr 0.000719 wd 0.0500 time 0.5751 (0.4140) data time 0.0007 (0.0017) model time 0.5745 (0.4136) loss 7.3165 (7.2647) grad_norm 2.5245 (2.5688) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:24:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][570/625] eta 0:00:22 lr 0.000719 wd 0.0500 time 0.4030 (0.4140) 
data time 0.0007 (0.0016) model time 0.4023 (0.4136) loss 7.8688 (7.2675) grad_norm 2.2796 (2.5674) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:24:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][580/625] eta 0:00:18 lr 0.000719 wd 0.0500 time 0.6064 (0.4156) data time 0.0006 (0.0016) model time 0.6057 (0.4154) loss 8.1236 (7.2720) grad_norm 2.5099 (2.5607) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:25:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][590/625] eta 0:00:14 lr 0.000719 wd 0.0500 time 0.3970 (0.4173) data time 0.0006 (0.0016) model time 0.3964 (0.4172) loss 7.4028 (7.2785) grad_norm 2.5477 (2.5519) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:25:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][600/625] eta 0:00:10 lr 0.000719 wd 0.0500 time 0.4005 (0.4178) data time 0.0006 (0.0016) model time 0.3999 (0.4177) loss 6.7457 (7.2786) grad_norm 2.5243 (2.5544) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:25:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][610/625] eta 0:00:06 lr 0.000719 wd 0.0500 time 0.4340 (0.4179) data time 0.0006 (0.0016) model time 0.4334 (0.4179) loss 7.9702 (7.2782) grad_norm 2.7747 (2.5545) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:25:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][620/625] eta 0:00:02 lr 0.000719 wd 0.0500 time 0.3950 (0.4176) data time 0.0004 (0.0016) model time 0.3946 (0.4175) loss 7.5739 (7.2786) grad_norm 4.7187 (2.5934) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:25:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 142 training takes 0:04:20 [2024-07-25 02:25:16 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 02:25:17 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 02:25:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.448 (0.448) Loss 0.5879 (0.5879) Acc@1 88.037 (88.037) Acc@5 98.389 (98.389) Mem 14939MB [2024-07-25 02:25:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.9609 (0.7310) Acc@1 78.174 (84.788) Acc@5 94.775 (97.252) Mem 14939MB [2024-07-25 02:25:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.0742 (0.8670) Acc@1 75.635 (81.313) Acc@5 94.141 (95.803) Mem 14939MB [2024-07-25 02:25:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.918 Acc@5 95.771 [2024-07-25 02:25:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.9% [2024-07-25 02:25:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.743 (0.743) Loss 0.5664 (0.5664) Acc@1 89.014 (89.014) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 02:25:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.151) Loss 0.9185 (0.7119) Acc@1 80.078 (85.511) Acc@5 95.654 (97.470) Mem 14939MB [2024-07-25 02:25:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.120) Loss 1.0557 (0.8405) Acc@1 75.488 (82.050) Acc@5 94.434 (96.150) Mem 14939MB [2024-07-25 02:25:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.672 Acc@5 96.119 [2024-07-25 02:25:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.7% [2024-07-25 02:25:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.67% [2024-07-25 02:25:22 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 02:25:23 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 02:25:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][0/625] eta 0:08:11 lr 0.000719 wd 0.0500 time 0.7870 (0.7870) data time 0.3836 (0.3836) model time 0.0000 (0.0000) loss 7.5198 (7.5198) grad_norm 4.1566 (4.1566) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:25:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][10/625] eta 0:04:32 lr 0.000719 wd 0.0500 time 0.4163 (0.4433) data time 0.0008 (0.0356) model time 0.0000 (0.0000) loss 6.3058 (7.3560) grad_norm 2.6087 (2.7124) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:25:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][20/625] eta 0:04:16 lr 0.000718 wd 0.0500 time 0.3931 (0.4237) data time 0.0008 (0.0190) model time 0.0000 (0.0000) loss 7.9802 (7.4923) grad_norm 2.8144 (2.4678) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:25:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][30/625] eta 0:04:07 lr 0.000718 wd 0.0500 time 0.3981 (0.4159) data time 0.0009 (0.0133) model time 0.0000 (0.0000) loss 8.0593 (7.4877) grad_norm 3.3904 (2.6094) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:25:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][40/625] eta 0:04:01 lr 0.000718 wd 0.0500 time 0.4092 (0.4125) data time 0.0007 (0.0103) model time 0.0000 (0.0000) loss 6.1983 (7.5164) grad_norm 2.7404 (2.5625) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:25:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][50/625] eta 0:03:55 lr 0.000718 wd 0.0500 time 0.3946 (0.4100) data time 0.0006 (0.0085) 
model time 0.0000 (0.0000) loss 7.7981 (7.5160) grad_norm 2.0316 (2.4612) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:25:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][60/625] eta 0:03:50 lr 0.000718 wd 0.0500 time 0.3967 (0.4083) data time 0.0007 (0.0073) model time 0.3960 (0.3989) loss 7.8853 (7.4989) grad_norm 1.9422 (2.4191) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:25:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][70/625] eta 0:03:46 lr 0.000718 wd 0.0500 time 0.4034 (0.4080) data time 0.0007 (0.0064) model time 0.4027 (0.4021) loss 6.3629 (7.4734) grad_norm 2.8394 (2.4706) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:25:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][80/625] eta 0:03:41 lr 0.000718 wd 0.0500 time 0.3931 (0.4069) data time 0.0009 (0.0057) model time 0.3922 (0.4006) loss 7.2805 (7.5011) grad_norm 2.7792 (2.4954) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:26:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][90/625] eta 0:03:37 lr 0.000718 wd 0.0500 time 0.3938 (0.4071) data time 0.0008 (0.0052) model time 0.3930 (0.4024) loss 7.7245 (7.5116) grad_norm 2.1385 (2.5643) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:26:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][100/625] eta 0:03:33 lr 0.000718 wd 0.0500 time 0.4059 (0.4065) data time 0.0007 (0.0047) model time 0.4051 (0.4020) loss 8.1988 (7.5079) grad_norm 3.1791 (2.5824) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:26:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][110/625] eta 0:03:29 lr 0.000717 wd 0.0500 time 0.3949 (0.4058) data time 0.0008 (0.0044) model time 0.3942 (0.4014) loss 8.6901 (7.4900) grad_norm 2.4562 (2.6547) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:26:13 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][120/625] eta 0:03:24 lr 0.000717 wd 0.0500 time 0.4008 (0.4058) data time 0.0006 (0.0041) model time 0.4002 (0.4019) loss 7.3591 (7.4682) grad_norm 1.9577 (2.6524) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:26:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][130/625] eta 0:03:21 lr 0.000717 wd 0.0500 time 0.4370 (0.4063) data time 0.0008 (0.0039) model time 0.4362 (0.4029) loss 7.2917 (7.4559) grad_norm 2.6521 (2.6258) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:26:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][140/625] eta 0:03:17 lr 0.000717 wd 0.0500 time 0.3961 (0.4064) data time 0.0007 (0.0037) model time 0.3954 (0.4033) loss 8.9182 (7.4501) grad_norm 3.8809 (2.6494) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:26:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][150/625] eta 0:03:13 lr 0.000717 wd 0.0500 time 0.3988 (0.4064) data time 0.0007 (0.0036) model time 0.3981 (0.4036) loss 6.7598 (7.4312) grad_norm 1.7807 (2.6276) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:26:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][160/625] eta 0:03:09 lr 0.000717 wd 0.0500 time 0.4127 (0.4070) data time 0.0006 (0.0034) model time 0.4121 (0.4045) loss 7.6712 (7.4217) grad_norm 2.9354 (2.6201) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:26:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][170/625] eta 0:03:06 lr 0.000717 wd 0.0500 time 0.5724 (0.4110) data time 0.0007 (0.0032) model time 0.5716 (0.4104) loss 6.3217 (7.4183) grad_norm 3.7730 (2.6576) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:26:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][180/625] eta 0:03:05 lr 0.000717 wd 0.0500 time 0.5672 (0.4164) 
data time 0.0008 (0.0031) model time 0.5664 (0.4179) loss 5.7083 (7.3987) grad_norm 2.4584 (2.6635) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:26:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][190/625] eta 0:03:03 lr 0.000717 wd 0.0500 time 0.3932 (0.4208) data time 0.0009 (0.0030) model time 0.3923 (0.4237) loss 5.6773 (7.3785) grad_norm 2.7876 (2.6458) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:26:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][200/625] eta 0:02:59 lr 0.000717 wd 0.0500 time 0.4023 (0.4233) data time 0.0008 (0.0029) model time 0.4015 (0.4268) loss 7.8556 (7.3812) grad_norm 1.7247 (2.6151) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:26:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][210/625] eta 0:02:55 lr 0.000716 wd 0.0500 time 0.3956 (0.4223) data time 0.0008 (0.0028) model time 0.3948 (0.4253) loss 6.7659 (7.3471) grad_norm 1.6402 (2.6201) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:26:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][220/625] eta 0:02:50 lr 0.000716 wd 0.0500 time 0.3981 (0.4219) data time 0.0006 (0.0027) model time 0.3975 (0.4245) loss 7.0256 (7.3300) grad_norm 4.0560 (2.6278) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:27:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][230/625] eta 0:02:46 lr 0.000716 wd 0.0500 time 0.4086 (0.4210) data time 0.0008 (0.0026) model time 0.4078 (0.4232) loss 8.3872 (7.3087) grad_norm 2.1170 (2.6226) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:27:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][240/625] eta 0:02:41 lr 0.000716 wd 0.0500 time 0.4168 (0.4202) data time 0.0008 (0.0026) model time 0.4160 (0.4220) loss 5.4954 (7.2958) grad_norm 6.5823 (2.6332) loss_scale 1024.0000 (1024.0000) mem 14939MB 
[2024-07-25 02:27:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][250/625] eta 0:02:37 lr 0.000716 wd 0.0500 time 0.3982 (0.4194) data time 0.0007 (0.0025) model time 0.3975 (0.4208) loss 7.7592 (7.3010) grad_norm 1.8293 (2.6296) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:27:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][260/625] eta 0:02:32 lr 0.000716 wd 0.0500 time 0.4072 (0.4187) data time 0.0009 (0.0025) model time 0.4063 (0.4198) loss 7.9886 (7.2921) grad_norm 3.4315 (2.6502) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:27:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][270/625] eta 0:02:28 lr 0.000716 wd 0.0500 time 0.3957 (0.4180) data time 0.0007 (0.0024) model time 0.3950 (0.4188) loss 7.0845 (7.2935) grad_norm 2.1924 (2.6421) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:27:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][280/625] eta 0:02:23 lr 0.000716 wd 0.0500 time 0.3956 (0.4173) data time 0.0009 (0.0024) model time 0.3946 (0.4179) loss 7.0258 (7.2887) grad_norm 1.6816 (2.6410) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:27:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][290/625] eta 0:02:19 lr 0.000716 wd 0.0500 time 0.4038 (0.4167) data time 0.0008 (0.0023) model time 0.4031 (0.4171) loss 7.0438 (7.2792) grad_norm 2.0712 (2.6340) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:27:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][300/625] eta 0:02:15 lr 0.000715 wd 0.0500 time 0.3945 (0.4162) data time 0.0008 (0.0023) model time 0.3937 (0.4164) loss 7.8367 (7.2922) grad_norm 2.0643 (2.6371) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:27:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][310/625] eta 0:02:10 lr 0.000715 wd 0.0500 
time 0.3997 (0.4158) data time 0.0006 (0.0022) model time 0.3991 (0.4159) loss 7.4785 (7.2909) grad_norm 3.1159 (2.6464) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:27:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][320/625] eta 0:02:06 lr 0.000715 wd 0.0500 time 0.4410 (0.4156) data time 0.0008 (0.0022) model time 0.4402 (0.4157) loss 7.0445 (7.2995) grad_norm 1.7324 (2.6337) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:27:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][330/625] eta 0:02:02 lr 0.000715 wd 0.0500 time 0.3949 (0.4153) data time 0.0008 (0.0022) model time 0.3941 (0.4152) loss 8.4415 (7.3047) grad_norm 2.8205 (2.6247) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:27:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][340/625] eta 0:01:58 lr 0.000715 wd 0.0500 time 0.3988 (0.4150) data time 0.0008 (0.0022) model time 0.3979 (0.4148) loss 8.1028 (7.3054) grad_norm 3.1559 (2.6352) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:27:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][350/625] eta 0:01:54 lr 0.000715 wd 0.0500 time 0.4046 (0.4146) data time 0.0007 (0.0022) model time 0.4038 (0.4143) loss 7.7695 (7.3105) grad_norm 2.9108 (2.6264) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:27:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][360/625] eta 0:01:49 lr 0.000715 wd 0.0500 time 0.4032 (0.4142) data time 0.0009 (0.0021) model time 0.4023 (0.4138) loss 6.6569 (7.3033) grad_norm 2.5255 (2.6150) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:27:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][370/625] eta 0:01:45 lr 0.000715 wd 0.0500 time 0.3984 (0.4139) data time 0.0006 (0.0021) model time 0.3978 (0.4134) loss 7.1139 (7.2845) grad_norm 1.8124 (2.6064) loss_scale 1024.0000 
(1024.0000) mem 14939MB [2024-07-25 02:28:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][380/625] eta 0:01:41 lr 0.000715 wd 0.0500 time 0.4016 (0.4139) data time 0.0006 (0.0021) model time 0.4010 (0.4134) loss 7.4501 (7.2874) grad_norm 1.9362 (2.5975) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:28:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][390/625] eta 0:01:37 lr 0.000715 wd 0.0500 time 0.5976 (0.4160) data time 0.0008 (0.0020) model time 0.5969 (0.4158) loss 7.9589 (7.2810) grad_norm 2.5552 (2.5959) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:28:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][400/625] eta 0:01:34 lr 0.000714 wd 0.0500 time 0.3926 (0.4183) data time 0.0006 (0.0020) model time 0.3919 (0.4184) loss 8.8381 (7.2877) grad_norm 2.1073 (2.5839) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:28:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][410/625] eta 0:01:30 lr 0.000714 wd 0.0500 time 0.4142 (0.4195) data time 0.0006 (0.0020) model time 0.4136 (0.4198) loss 6.5653 (7.2854) grad_norm 2.0722 (2.5759) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:28:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][420/625] eta 0:01:26 lr 0.000714 wd 0.0500 time 0.3997 (0.4207) data time 0.0006 (0.0020) model time 0.3992 (0.4211) loss 5.8665 (7.2805) grad_norm 1.7657 (2.5752) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:28:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][430/625] eta 0:01:21 lr 0.000714 wd 0.0500 time 0.4245 (0.4204) data time 0.0006 (0.0020) model time 0.4239 (0.4207) loss 8.3879 (7.2847) grad_norm 1.9345 (2.5681) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:28:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][440/625] eta 0:01:17 
lr 0.000714 wd 0.0500 time 0.3785 (0.4204) data time 0.0007 (0.0020) model time 0.3779 (0.4207) loss 7.7835 (7.2932) grad_norm 3.6766 (2.5713) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:28:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][450/625] eta 0:01:13 lr 0.000714 wd 0.0500 time 0.4041 (0.4200) data time 0.0006 (0.0019) model time 0.4035 (0.4202) loss 7.2292 (7.2930) grad_norm 5.1510 (2.5728) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:28:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][460/625] eta 0:01:09 lr 0.000714 wd 0.0500 time 0.3966 (0.4196) data time 0.0006 (0.0019) model time 0.3960 (0.4197) loss 7.3086 (7.3016) grad_norm 2.9040 (2.5917) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:28:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][470/625] eta 0:01:04 lr 0.000714 wd 0.0500 time 0.3991 (0.4192) data time 0.0009 (0.0019) model time 0.3982 (0.4192) loss 6.1604 (7.2987) grad_norm 2.6326 (2.5909) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:28:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][480/625] eta 0:01:00 lr 0.000714 wd 0.0500 time 0.4010 (0.4188) data time 0.0008 (0.0019) model time 0.4002 (0.4187) loss 5.2607 (7.2869) grad_norm 2.1569 (2.5863) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:28:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][490/625] eta 0:00:56 lr 0.000713 wd 0.0500 time 0.3974 (0.4184) data time 0.0008 (0.0019) model time 0.3966 (0.4183) loss 7.2401 (7.2880) grad_norm 2.1307 (2.5791) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:28:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][500/625] eta 0:00:52 lr 0.000713 wd 0.0500 time 0.3961 (0.4180) data time 0.0009 (0.0018) model time 0.3953 (0.4179) loss 7.9414 (7.2807) grad_norm 2.7710 (2.5742) 
loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:28:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][510/625] eta 0:00:48 lr 0.000713 wd 0.0500 time 0.3979 (0.4177) data time 0.0009 (0.0018) model time 0.3970 (0.4174) loss 7.4935 (7.2752) grad_norm 3.0947 (2.5725) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:29:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][520/625] eta 0:00:43 lr 0.000713 wd 0.0500 time 0.3976 (0.4173) data time 0.0006 (0.0018) model time 0.3970 (0.4170) loss 5.4658 (7.2766) grad_norm 3.1595 (2.5922) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:29:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][530/625] eta 0:00:39 lr 0.000713 wd 0.0500 time 0.3953 (0.4170) data time 0.0006 (0.0018) model time 0.3947 (0.4166) loss 7.3486 (7.2729) grad_norm 2.0659 (2.6117) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:29:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][540/625] eta 0:00:35 lr 0.000713 wd 0.0500 time 0.3993 (0.4166) data time 0.0006 (0.0018) model time 0.3986 (0.4163) loss 6.3082 (7.2680) grad_norm 2.2150 (2.6046) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:29:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][550/625] eta 0:00:31 lr 0.000713 wd 0.0500 time 0.3924 (0.4163) data time 0.0008 (0.0018) model time 0.3917 (0.4159) loss 6.8415 (7.2614) grad_norm 1.9304 (2.5982) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:29:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][560/625] eta 0:00:27 lr 0.000713 wd 0.0500 time 0.3959 (0.4160) data time 0.0008 (0.0017) model time 0.3951 (0.4156) loss 6.7917 (7.2629) grad_norm 2.5101 (2.5969) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:29:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[143/300][570/625] eta 0:00:22 lr 0.000713 wd 0.0500 time 0.4062 (0.4158) data time 0.0006 (0.0017) model time 0.4055 (0.4153) loss 7.0780 (7.2592) grad_norm 4.2380 (2.5955) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:29:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][580/625] eta 0:00:18 lr 0.000713 wd 0.0500 time 0.3962 (0.4155) data time 0.0008 (0.0017) model time 0.3954 (0.4149) loss 7.0003 (7.2668) grad_norm 1.8169 (2.5942) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:29:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][590/625] eta 0:00:14 lr 0.000712 wd 0.0500 time 0.3982 (0.4152) data time 0.0007 (0.0017) model time 0.3975 (0.4147) loss 8.0025 (7.2737) grad_norm 1.4446 (2.5829) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:29:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][600/625] eta 0:00:10 lr 0.000712 wd 0.0500 time 0.4048 (0.4153) data time 0.0006 (0.0017) model time 0.4042 (0.4147) loss 6.2701 (7.2646) grad_norm 1.8094 (2.5807) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:29:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][610/625] eta 0:00:06 lr 0.000712 wd 0.0500 time 0.5713 (0.4164) data time 0.0006 (0.0017) model time 0.5708 (0.4159) loss 6.2277 (7.2670) grad_norm 2.7204 (2.5792) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:29:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][620/625] eta 0:00:02 lr 0.000712 wd 0.0500 time 0.5853 (0.4176) data time 0.0006 (0.0017) model time 0.5847 (0.4172) loss 7.9916 (7.2671) grad_norm 1.7841 (2.5820) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:29:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 143 training takes 0:04:21 [2024-07-25 02:29:45 vssd_mesa_retrain_small_e300] (utils.py 118): INFO 
./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 02:29:46 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 02:29:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.442 (0.442) Loss 0.5986 (0.5986) Acc@1 88.574 (88.574) Acc@5 98.438 (98.438) Mem 14939MB [2024-07-25 02:29:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.123) Loss 0.9824 (0.7547) Acc@1 79.297 (84.801) Acc@5 94.971 (97.270) Mem 14939MB [2024-07-25 02:29:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.105) Loss 1.0918 (0.8818) Acc@1 74.414 (81.378) Acc@5 94.287 (95.894) Mem 14939MB [2024-07-25 02:29:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.982 Acc@5 95.817 [2024-07-25 02:29:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.0% [2024-07-25 02:29:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.817 (0.817) Loss 0.5659 (0.5659) Acc@1 89.111 (89.111) Acc@5 98.682 (98.682) Mem 14939MB [2024-07-25 02:29:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.157) Loss 0.9170 (0.7111) Acc@1 80.322 (85.542) Acc@5 95.557 (97.488) Mem 14939MB [2024-07-25 02:29:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.123) Loss 1.0527 (0.8393) Acc@1 75.439 (82.061) Acc@5 94.580 (96.173) Mem 14939MB [2024-07-25 02:29:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.682 Acc@5 96.143 [2024-07-25 02:29:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.7% [2024-07-25 02:29:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO 
New max accuracy ema: 81.68% [2024-07-25 02:29:51 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 02:29:52 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 02:29:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][0/625] eta 0:08:37 lr 0.000712 wd 0.0500 time 0.8276 (0.8276) data time 0.4297 (0.4297) model time 0.0000 (0.0000) loss 6.7732 (6.7732) grad_norm 4.0376 (4.0376) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:29:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][10/625] eta 0:05:08 lr 0.000712 wd 0.0500 time 0.5745 (0.5018) data time 0.0006 (0.0399) model time 0.0000 (0.0000) loss 7.2087 (7.6339) grad_norm 3.2941 (3.4319) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:30:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][20/625] eta 0:04:41 lr 0.000712 wd 0.0500 time 0.4159 (0.4645) data time 0.0007 (0.0216) model time 0.0000 (0.0000) loss 6.8321 (7.2833) grad_norm 2.5299 (3.1059) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:30:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][30/625] eta 0:04:24 lr 0.000712 wd 0.0500 time 0.3988 (0.4445) data time 0.0009 (0.0152) model time 0.0000 (0.0000) loss 6.8925 (7.2909) grad_norm 2.5657 (2.8125) loss_scale 2048.0000 (1189.1613) mem 14939MB [2024-07-25 02:30:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][40/625] eta 0:04:14 lr 0.000712 wd 0.0500 time 0.3983 (0.4352) data time 0.0009 (0.0122) model time 0.0000 (0.0000) loss 8.5179 (7.3580) grad_norm 1.9913 (2.5999) loss_scale 2048.0000 (1398.6341) mem 14939MB [2024-07-25 02:30:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[144/300][50/625] eta 0:04:06 lr 0.000712 wd 0.0500 time 0.4156 (0.4292) data time 0.0006 (0.0102) model time 0.0000 (0.0000) loss 8.3440 (7.3388) grad_norm 2.0900 (2.5636) loss_scale 2048.0000 (1525.9608) mem 14939MB [2024-07-25 02:30:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][60/625] eta 0:04:00 lr 0.000711 wd 0.0500 time 0.3963 (0.4257) data time 0.0006 (0.0087) model time 0.3957 (0.4069) loss 6.6109 (7.2853) grad_norm 4.7039 (2.6854) loss_scale 2048.0000 (1611.5410) mem 14939MB [2024-07-25 02:30:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][70/625] eta 0:03:54 lr 0.000711 wd 0.0500 time 0.3994 (0.4222) data time 0.0009 (0.0076) model time 0.3985 (0.4034) loss 7.4731 (7.2462) grad_norm 2.4402 (2.6799) loss_scale 2048.0000 (1673.0141) mem 14939MB [2024-07-25 02:30:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][80/625] eta 0:03:48 lr 0.000711 wd 0.0500 time 0.4046 (0.4195) data time 0.0009 (0.0068) model time 0.4037 (0.4019) loss 6.0890 (7.2073) grad_norm 2.0191 (2.6105) loss_scale 2048.0000 (1719.3086) mem 14939MB [2024-07-25 02:30:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][90/625] eta 0:03:43 lr 0.000711 wd 0.0500 time 0.3976 (0.4180) data time 0.0006 (0.0066) model time 0.3969 (0.4017) loss 6.1884 (7.1781) grad_norm 2.3742 (2.5706) loss_scale 2048.0000 (1755.4286) mem 14939MB [2024-07-25 02:30:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][100/625] eta 0:03:38 lr 0.000711 wd 0.0500 time 0.4044 (0.4164) data time 0.0007 (0.0060) model time 0.4038 (0.4017) loss 7.2987 (7.1502) grad_norm 4.7201 (2.6020) loss_scale 2048.0000 (1784.3960) mem 14939MB [2024-07-25 02:30:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][110/625] eta 0:03:33 lr 0.000711 wd 0.0500 time 0.4100 (0.4151) data time 0.0008 (0.0055) model time 0.4092 (0.4016) loss 7.3626 (7.1493) 
grad_norm 1.7281 (2.5782) loss_scale 2048.0000 (1808.1441) mem 14939MB [2024-07-25 02:30:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][120/625] eta 0:03:28 lr 0.000711 wd 0.0500 time 0.3990 (0.4138) data time 0.0008 (0.0052) model time 0.3982 (0.4011) loss 7.8559 (7.1886) grad_norm 2.6923 (2.5503) loss_scale 2048.0000 (1827.9669) mem 14939MB [2024-07-25 02:30:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][130/625] eta 0:03:24 lr 0.000711 wd 0.0500 time 0.4053 (0.4130) data time 0.0006 (0.0048) model time 0.4047 (0.4012) loss 7.4131 (7.1967) grad_norm 2.3208 (2.5172) loss_scale 2048.0000 (1844.7634) mem 14939MB [2024-07-25 02:30:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][140/625] eta 0:03:19 lr 0.000711 wd 0.0500 time 0.3991 (0.4122) data time 0.0006 (0.0046) model time 0.3985 (0.4012) loss 7.5770 (7.1691) grad_norm 1.7323 (2.5293) loss_scale 2048.0000 (1859.1773) mem 14939MB [2024-07-25 02:30:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][150/625] eta 0:03:15 lr 0.000710 wd 0.0500 time 0.3962 (0.4113) data time 0.0006 (0.0043) model time 0.3956 (0.4009) loss 6.0949 (7.1799) grad_norm 2.5533 (2.5445) loss_scale 2048.0000 (1871.6821) mem 14939MB [2024-07-25 02:30:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][160/625] eta 0:03:10 lr 0.000710 wd 0.0500 time 0.4029 (0.4106) data time 0.0008 (0.0041) model time 0.4021 (0.4008) loss 7.3800 (7.1924) grad_norm 2.2986 (2.5824) loss_scale 2048.0000 (1882.6335) mem 14939MB [2024-07-25 02:31:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][170/625] eta 0:03:06 lr 0.000710 wd 0.0500 time 0.4012 (0.4101) data time 0.0010 (0.0039) model time 0.4002 (0.4007) loss 8.3557 (7.2187) grad_norm 2.0585 (2.5983) loss_scale 2048.0000 (1892.3041) mem 14939MB [2024-07-25 02:31:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 
367): INFO Train: [144/300][180/625] eta 0:03:02 lr 0.000710 wd 0.0500 time 0.3976 (0.4097) data time 0.0006 (0.0037) model time 0.3969 (0.4009) loss 7.7904 (7.1994) grad_norm 2.1884 (2.5819) loss_scale 2048.0000 (1900.9061) mem 14939MB [2024-07-25 02:31:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][190/625] eta 0:02:58 lr 0.000710 wd 0.0500 time 0.4059 (0.4093) data time 0.0009 (0.0036) model time 0.4050 (0.4008) loss 7.6270 (7.1777) grad_norm 1.6730 (2.5563) loss_scale 2048.0000 (1908.6073) mem 14939MB [2024-07-25 02:31:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][200/625] eta 0:02:54 lr 0.000710 wd 0.0500 time 0.4005 (0.4115) data time 0.0008 (0.0035) model time 0.3997 (0.4043) loss 6.7891 (7.1759) grad_norm 2.7086 (2.5402) loss_scale 2048.0000 (1915.5423) mem 14939MB [2024-07-25 02:31:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][210/625] eta 0:02:52 lr 0.000710 wd 0.0500 time 0.3998 (0.4157) data time 0.0006 (0.0033) model time 0.3992 (0.4103) loss 7.8051 (7.1643) grad_norm 2.0520 (2.5206) loss_scale 2048.0000 (1921.8199) mem 14939MB [2024-07-25 02:31:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][220/625] eta 0:02:49 lr 0.000710 wd 0.0500 time 0.4010 (0.4183) data time 0.0008 (0.0032) model time 0.4001 (0.4139) loss 8.4471 (7.1604) grad_norm 4.0563 (2.5168) loss_scale 2048.0000 (1927.5294) mem 14939MB [2024-07-25 02:31:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][230/625] eta 0:02:45 lr 0.000710 wd 0.0500 time 0.4037 (0.4194) data time 0.0008 (0.0031) model time 0.4029 (0.4154) loss 7.9919 (7.1581) grad_norm 2.3742 (2.5107) loss_scale 2048.0000 (1932.7446) mem 14939MB [2024-07-25 02:31:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][240/625] eta 0:02:41 lr 0.000710 wd 0.0500 time 0.3966 (0.4200) data time 0.0009 (0.0030) model time 0.3957 (0.4164) 
loss 7.4708 (7.1766) grad_norm 3.0947 (2.5274) loss_scale 2048.0000 (1937.5270) mem 14939MB [2024-07-25 02:31:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][250/625] eta 0:02:37 lr 0.000709 wd 0.0500 time 0.3999 (0.4193) data time 0.0006 (0.0030) model time 0.3993 (0.4155) loss 7.5357 (7.1669) grad_norm 3.5823 (2.5398) loss_scale 2048.0000 (1941.9283) mem 14939MB [2024-07-25 02:31:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][260/625] eta 0:02:32 lr 0.000709 wd 0.0500 time 0.3987 (0.4185) data time 0.0006 (0.0030) model time 0.3981 (0.4147) loss 6.1838 (7.1764) grad_norm 3.2113 (2.5704) loss_scale 2048.0000 (1945.9923) mem 14939MB [2024-07-25 02:31:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][270/625] eta 0:02:28 lr 0.000709 wd 0.0500 time 0.4003 (0.4179) data time 0.0008 (0.0029) model time 0.3996 (0.4141) loss 7.4051 (7.1845) grad_norm 3.7989 (2.5827) loss_scale 2048.0000 (1949.7565) mem 14939MB [2024-07-25 02:31:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][280/625] eta 0:02:23 lr 0.000709 wd 0.0500 time 0.4000 (0.4172) data time 0.0006 (0.0028) model time 0.3994 (0.4134) loss 7.5978 (7.1783) grad_norm 1.9623 (2.5744) loss_scale 2048.0000 (1953.2527) mem 14939MB [2024-07-25 02:31:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][290/625] eta 0:02:19 lr 0.000709 wd 0.0500 time 0.3950 (0.4167) data time 0.0010 (0.0027) model time 0.3939 (0.4129) loss 7.6237 (7.1724) grad_norm 1.5142 (2.5829) loss_scale 2048.0000 (1956.5086) mem 14939MB [2024-07-25 02:31:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][300/625] eta 0:02:15 lr 0.000709 wd 0.0500 time 0.3995 (0.4162) data time 0.0007 (0.0027) model time 0.3988 (0.4124) loss 6.9026 (7.1640) grad_norm 2.1976 (2.5699) loss_scale 2048.0000 (1959.5482) mem 14939MB [2024-07-25 02:32:01 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [144/300][310/625] eta 0:02:10 lr 0.000709 wd 0.0500 time 0.4052 (0.4157) data time 0.0006 (0.0026) model time 0.4046 (0.4119) loss 7.0443 (7.1590) grad_norm 1.7615 (2.5659) loss_scale 2048.0000 (1962.3923) mem 14939MB [2024-07-25 02:32:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][320/625] eta 0:02:06 lr 0.000709 wd 0.0500 time 0.4009 (0.4154) data time 0.0008 (0.0026) model time 0.4001 (0.4116) loss 7.5442 (7.1466) grad_norm 3.1561 (2.5748) loss_scale 2048.0000 (1965.0592) mem 14939MB [2024-07-25 02:32:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][330/625] eta 0:02:02 lr 0.000709 wd 0.0500 time 0.4009 (0.4150) data time 0.0008 (0.0025) model time 0.4000 (0.4113) loss 7.5740 (7.1515) grad_norm 2.6128 (2.5673) loss_scale 2048.0000 (1967.5650) mem 14939MB [2024-07-25 02:32:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][340/625] eta 0:01:58 lr 0.000708 wd 0.0500 time 0.4010 (0.4146) data time 0.0006 (0.0025) model time 0.4003 (0.4109) loss 7.8621 (7.1492) grad_norm 4.3918 (2.5765) loss_scale 2048.0000 (1969.9238) mem 14939MB [2024-07-25 02:32:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][350/625] eta 0:01:53 lr 0.000708 wd 0.0500 time 0.3994 (0.4142) data time 0.0007 (0.0024) model time 0.3987 (0.4105) loss 7.6409 (7.1568) grad_norm 1.8202 (2.6157) loss_scale 2048.0000 (1972.1481) mem 14939MB [2024-07-25 02:32:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][360/625] eta 0:01:49 lr 0.000708 wd 0.0500 time 0.3939 (0.4138) data time 0.0008 (0.0024) model time 0.3931 (0.4101) loss 7.4906 (7.1604) grad_norm 2.5784 (2.6303) loss_scale 2048.0000 (1974.2493) mem 14939MB [2024-07-25 02:32:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][370/625] eta 0:01:45 lr 0.000708 wd 0.0500 time 0.4024 (0.4134) data time 0.0007 (0.0024) model 
time 0.4016 (0.4098) loss 9.0446 (7.1792) grad_norm 2.1236 (2.6321) loss_scale 2048.0000 (1976.2372) mem 14939MB [2024-07-25 02:32:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][380/625] eta 0:01:41 lr 0.000708 wd 0.0500 time 0.3956 (0.4131) data time 0.0006 (0.0023) model time 0.3950 (0.4095) loss 8.1062 (7.1808) grad_norm 1.9515 (2.6622) loss_scale 2048.0000 (1978.1207) mem 14939MB [2024-07-25 02:32:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][390/625] eta 0:01:36 lr 0.000708 wd 0.0500 time 0.3983 (0.4127) data time 0.0006 (0.0023) model time 0.3977 (0.4091) loss 7.4527 (7.1849) grad_norm 1.8647 (2.6629) loss_scale 2048.0000 (1979.9079) mem 14939MB [2024-07-25 02:32:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][400/625] eta 0:01:32 lr 0.000708 wd 0.0500 time 0.4074 (0.4124) data time 0.0006 (0.0022) model time 0.4067 (0.4089) loss 7.3524 (7.1877) grad_norm 2.3113 (2.6578) loss_scale 2048.0000 (1981.6060) mem 14939MB [2024-07-25 02:32:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][410/625] eta 0:01:28 lr 0.000708 wd 0.0500 time 0.3965 (0.4121) data time 0.0007 (0.0022) model time 0.3958 (0.4085) loss 7.9907 (7.1843) grad_norm 1.7362 (2.6482) loss_scale 2048.0000 (1983.2214) mem 14939MB [2024-07-25 02:32:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][420/625] eta 0:01:24 lr 0.000708 wd 0.0500 time 0.3969 (0.4132) data time 0.0007 (0.0022) model time 0.3962 (0.4098) loss 7.9465 (7.1857) grad_norm 4.9438 (2.6643) loss_scale 2048.0000 (1984.7601) mem 14939MB [2024-07-25 02:32:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][430/625] eta 0:01:20 lr 0.000708 wd 0.0500 time 0.5880 (0.4153) data time 0.0009 (0.0022) model time 0.5871 (0.4123) loss 6.3531 (7.1907) grad_norm 3.2335 (inf) loss_scale 1024.0000 (1981.4756) mem 14939MB [2024-07-25 02:32:56 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][440/625] eta 0:01:16 lr 0.000707 wd 0.0500 time 0.4016 (0.4161) data time 0.0009 (0.0022) model time 0.4006 (0.4132) loss 6.9331 (7.1848) grad_norm 3.4337 (inf) loss_scale 1024.0000 (1959.7642) mem 14939MB [2024-07-25 02:33:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][450/625] eta 0:01:12 lr 0.000707 wd 0.0500 time 0.6019 (0.4170) data time 0.0007 (0.0022) model time 0.6013 (0.4143) loss 6.1761 (7.1897) grad_norm 1.7374 (inf) loss_scale 1024.0000 (1939.0155) mem 14939MB [2024-07-25 02:33:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][460/625] eta 0:01:08 lr 0.000707 wd 0.0500 time 0.3972 (0.4173) data time 0.0010 (0.0021) model time 0.3962 (0.4147) loss 6.2192 (7.1880) grad_norm 2.2108 (inf) loss_scale 1024.0000 (1919.1670) mem 14939MB [2024-07-25 02:33:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][470/625] eta 0:01:04 lr 0.000707 wd 0.0500 time 0.3936 (0.4170) data time 0.0007 (0.0021) model time 0.3928 (0.4144) loss 7.0945 (7.1968) grad_norm 2.9886 (inf) loss_scale 1024.0000 (1900.1614) mem 14939MB [2024-07-25 02:33:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][480/625] eta 0:01:00 lr 0.000707 wd 0.0500 time 0.4092 (0.4169) data time 0.0009 (0.0021) model time 0.4083 (0.4143) loss 7.8302 (7.2016) grad_norm 3.0450 (inf) loss_scale 1024.0000 (1881.9459) mem 14939MB [2024-07-25 02:33:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][490/625] eta 0:00:56 lr 0.000707 wd 0.0500 time 0.4059 (0.4166) data time 0.0009 (0.0021) model time 0.4050 (0.4139) loss 6.4623 (7.2060) grad_norm 2.0780 (inf) loss_scale 1024.0000 (1864.4725) mem 14939MB [2024-07-25 02:33:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][500/625] eta 0:00:52 lr 0.000707 wd 0.0500 time 0.3948 (0.4164) data time 0.0007 
(0.0021) model time 0.3941 (0.4138) loss 6.5235 (7.2046) grad_norm 2.8243 (inf) loss_scale 1024.0000 (1847.6966) mem 14939MB [2024-07-25 02:33:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][510/625] eta 0:00:47 lr 0.000707 wd 0.0500 time 0.3948 (0.4161) data time 0.0009 (0.0020) model time 0.3939 (0.4135) loss 9.1720 (7.2136) grad_norm 1.8934 (inf) loss_scale 1024.0000 (1831.5773) mem 14939MB [2024-07-25 02:33:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][520/625] eta 0:00:43 lr 0.000707 wd 0.0500 time 0.4151 (0.4158) data time 0.0008 (0.0020) model time 0.4142 (0.4133) loss 5.9576 (7.2039) grad_norm 2.6582 (inf) loss_scale 1024.0000 (1816.0768) mem 14939MB [2024-07-25 02:33:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][530/625] eta 0:00:39 lr 0.000706 wd 0.0500 time 0.4017 (0.4156) data time 0.0009 (0.0020) model time 0.4009 (0.4130) loss 8.2668 (7.2039) grad_norm 2.4402 (inf) loss_scale 1024.0000 (1801.1601) mem 14939MB [2024-07-25 02:33:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][540/625] eta 0:00:35 lr 0.000706 wd 0.0500 time 0.3963 (0.4154) data time 0.0007 (0.0020) model time 0.3956 (0.4128) loss 6.0803 (7.2025) grad_norm 1.9778 (inf) loss_scale 1024.0000 (1786.7948) mem 14939MB [2024-07-25 02:33:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][550/625] eta 0:00:31 lr 0.000706 wd 0.0500 time 0.4173 (0.4151) data time 0.0009 (0.0020) model time 0.4163 (0.4126) loss 7.0927 (7.1963) grad_norm 5.4406 (inf) loss_scale 1024.0000 (1772.9510) mem 14939MB [2024-07-25 02:33:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][560/625] eta 0:00:26 lr 0.000706 wd 0.0500 time 0.4039 (0.4149) data time 0.0008 (0.0019) model time 0.4030 (0.4123) loss 8.3346 (7.1965) grad_norm 2.0881 (inf) loss_scale 1024.0000 (1759.6007) mem 14939MB [2024-07-25 02:33:49 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][570/625] eta 0:00:22 lr 0.000706 wd 0.0500 time 0.3965 (0.4147) data time 0.0007 (0.0019) model time 0.3959 (0.4121) loss 7.7012 (7.1973) grad_norm 1.8967 (inf) loss_scale 1024.0000 (1746.7180) mem 14939MB [2024-07-25 02:33:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][580/625] eta 0:00:18 lr 0.000706 wd 0.0500 time 0.4003 (0.4144) data time 0.0008 (0.0019) model time 0.3995 (0.4119) loss 7.8071 (7.2040) grad_norm 4.0511 (inf) loss_scale 1024.0000 (1734.2788) mem 14939MB [2024-07-25 02:33:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][590/625] eta 0:00:14 lr 0.000706 wd 0.0500 time 0.3969 (0.4142) data time 0.0007 (0.0019) model time 0.3963 (0.4116) loss 8.4492 (7.2060) grad_norm 2.6689 (inf) loss_scale 1024.0000 (1722.2606) mem 14939MB [2024-07-25 02:34:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][600/625] eta 0:00:10 lr 0.000706 wd 0.0500 time 0.3951 (0.4139) data time 0.0009 (0.0019) model time 0.3943 (0.4114) loss 7.2370 (7.2009) grad_norm 2.5371 (inf) loss_scale 1024.0000 (1710.6423) mem 14939MB [2024-07-25 02:34:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][610/625] eta 0:00:06 lr 0.000706 wd 0.0500 time 0.3999 (0.4137) data time 0.0004 (0.0019) model time 0.3995 (0.4111) loss 6.0668 (7.2066) grad_norm 2.3381 (inf) loss_scale 1024.0000 (1699.4043) mem 14939MB [2024-07-25 02:34:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][620/625] eta 0:00:02 lr 0.000706 wd 0.0500 time 0.3960 (0.4135) data time 0.0004 (0.0019) model time 0.3956 (0.4109) loss 5.9324 (7.2098) grad_norm 3.1152 (inf) loss_scale 1024.0000 (1688.5282) mem 14939MB [2024-07-25 02:34:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 144 training takes 0:04:18 [2024-07-25 02:34:10 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 02:34:12 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 02:34:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.438 (0.438) Loss 0.5840 (0.5840) Acc@1 88.379 (88.379) Acc@5 98.291 (98.291) Mem 14939MB [2024-07-25 02:34:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.118) Loss 0.9863 (0.7411) Acc@1 79.004 (84.801) Acc@5 94.922 (97.177) Mem 14939MB [2024-07-25 02:34:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.103) Loss 1.1133 (0.8813) Acc@1 73.584 (81.280) Acc@5 94.043 (95.775) Mem 14939MB [2024-07-25 02:34:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.860 Acc@5 95.707 [2024-07-25 02:34:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.9% [2024-07-25 02:34:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.884 (0.884) Loss 0.5654 (0.5654) Acc@1 89.160 (89.160) Acc@5 98.682 (98.682) Mem 14939MB [2024-07-25 02:34:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.159) Loss 0.9155 (0.7100) Acc@1 80.225 (85.587) Acc@5 95.605 (97.488) Mem 14939MB [2024-07-25 02:34:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.124) Loss 1.0518 (0.8382) Acc@1 75.537 (82.101) Acc@5 94.580 (96.182) Mem 14939MB [2024-07-25 02:34:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.716 Acc@5 96.149 [2024-07-25 02:34:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.7% [2024-07-25 02:34:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): 
INFO New max accuracy ema: 81.72% [2024-07-25 02:34:17 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 02:34:18 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 02:34:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][0/625] eta 0:08:10 lr 0.000705 wd 0.0500 time 0.7853 (0.7853) data time 0.3979 (0.3979) model time 0.0000 (0.0000) loss 8.1674 (8.1674) grad_norm 3.0894 (3.0894) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:34:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][10/625] eta 0:04:27 lr 0.000705 wd 0.0500 time 0.3951 (0.4345) data time 0.0008 (0.0370) model time 0.0000 (0.0000) loss 8.2613 (7.4367) grad_norm 4.0863 (3.5594) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:34:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][20/625] eta 0:04:42 lr 0.000705 wd 0.0500 time 0.5658 (0.4662) data time 0.0008 (0.0198) model time 0.0000 (0.0000) loss 7.2906 (7.6166) grad_norm 1.7005 (2.9558) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:34:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][30/625] eta 0:04:41 lr 0.000705 wd 0.0500 time 0.5621 (0.4732) data time 0.0009 (0.0137) model time 0.0000 (0.0000) loss 6.3655 (7.5607) grad_norm 1.6893 (2.7863) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:34:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][40/625] eta 0:04:30 lr 0.000705 wd 0.0500 time 0.5889 (0.4630) data time 0.0007 (0.0106) model time 0.0000 (0.0000) loss 7.4789 (7.5000) grad_norm 2.1978 (2.6088) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:34:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[145/300][50/625] eta 0:04:28 lr 0.000705 wd 0.0500 time 0.5963 (0.4666) data time 0.0007 (0.0087) model time 0.0000 (0.0000) loss 5.6541 (7.3720) grad_norm 2.2174 (2.5487) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:34:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][60/625] eta 0:04:17 lr 0.000705 wd 0.0500 time 0.4021 (0.4564) data time 0.0006 (0.0074) model time 0.4015 (0.4035) loss 6.3751 (7.2795) grad_norm 1.8154 (2.5771) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:34:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][70/625] eta 0:04:08 lr 0.000705 wd 0.0500 time 0.4019 (0.4484) data time 0.0008 (0.0065) model time 0.4011 (0.4011) loss 7.5963 (7.2706) grad_norm 3.2373 (2.6458) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:34:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][80/625] eta 0:04:01 lr 0.000705 wd 0.0500 time 0.4106 (0.4427) data time 0.0008 (0.0058) model time 0.4098 (0.4010) loss 7.8249 (7.3163) grad_norm 1.9290 (2.6133) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:34:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][90/625] eta 0:03:54 lr 0.000705 wd 0.0500 time 0.4049 (0.4382) data time 0.0008 (0.0053) model time 0.4040 (0.4010) loss 7.4816 (7.3622) grad_norm 2.3049 (2.5902) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:35:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][100/625] eta 0:03:48 lr 0.000704 wd 0.0500 time 0.4003 (0.4344) data time 0.0007 (0.0048) model time 0.3996 (0.4007) loss 8.0282 (7.3609) grad_norm 2.0452 (2.5742) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:35:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][110/625] eta 0:03:42 lr 0.000704 wd 0.0500 time 0.4010 (0.4315) data time 0.0008 (0.0045) model time 0.4001 (0.4007) loss 7.5160 (7.3438) 
grad_norm 2.3229 (2.5333) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:35:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][120/625] eta 0:03:36 lr 0.000704 wd 0.0500 time 0.3983 (0.4290) data time 0.0009 (0.0042) model time 0.3974 (0.4007) loss 7.3349 (7.3238) grad_norm 3.0737 (2.5296) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:35:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][130/625] eta 0:03:31 lr 0.000704 wd 0.0500 time 0.3986 (0.4269) data time 0.0008 (0.0039) model time 0.3979 (0.4007) loss 7.9232 (7.3389) grad_norm 1.6703 (2.4955) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:35:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][140/625] eta 0:03:26 lr 0.000704 wd 0.0500 time 0.3999 (0.4251) data time 0.0009 (0.0037) model time 0.3991 (0.4007) loss 8.4915 (7.3378) grad_norm 2.2074 (2.4876) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:35:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][150/625] eta 0:03:21 lr 0.000704 wd 0.0500 time 0.3941 (0.4236) data time 0.0009 (0.0035) model time 0.3932 (0.4007) loss 7.5475 (7.3310) grad_norm 3.0764 (2.4825) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:35:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][160/625] eta 0:03:16 lr 0.000704 wd 0.0500 time 0.4073 (0.4222) data time 0.0008 (0.0034) model time 0.4065 (0.4006) loss 6.4972 (7.3260) grad_norm 1.8707 (2.4805) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:35:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][170/625] eta 0:03:11 lr 0.000704 wd 0.0500 time 0.3998 (0.4208) data time 0.0006 (0.0032) model time 0.3991 (0.4004) loss 7.4501 (7.3213) grad_norm 2.6261 (2.4794) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:35:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 
367): INFO Train: [145/300][180/625] eta 0:03:06 lr 0.000704 wd 0.0500 time 0.3985 (0.4195) data time 0.0008 (0.0031) model time 0.3977 (0.4001) loss 8.2286 (7.3290) grad_norm 3.0565 (2.4782) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:35:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][190/625] eta 0:03:02 lr 0.000704 wd 0.0500 time 0.4011 (0.4191) data time 0.0008 (0.0030) model time 0.4004 (0.4009) loss 7.2226 (7.3257) grad_norm 2.6640 (2.4657) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:35:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][200/625] eta 0:02:57 lr 0.000703 wd 0.0500 time 0.4013 (0.4182) data time 0.0006 (0.0029) model time 0.4007 (0.4009) loss 6.3153 (7.3157) grad_norm 2.0711 (2.4563) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:35:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][210/625] eta 0:02:53 lr 0.000703 wd 0.0500 time 0.3976 (0.4173) data time 0.0008 (0.0028) model time 0.3967 (0.4007) loss 5.8858 (7.2916) grad_norm 1.7932 (2.4391) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:35:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][220/625] eta 0:02:48 lr 0.000703 wd 0.0500 time 0.3959 (0.4165) data time 0.0006 (0.0027) model time 0.3953 (0.4006) loss 8.5817 (7.2786) grad_norm 2.6857 (2.4488) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:35:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][230/625] eta 0:02:44 lr 0.000703 wd 0.0500 time 0.4051 (0.4158) data time 0.0006 (0.0026) model time 0.4045 (0.4005) loss 6.9147 (7.2503) grad_norm 2.5964 (2.4414) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:35:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][240/625] eta 0:02:41 lr 0.000703 wd 0.0500 time 0.5853 (0.4183) data time 0.0009 (0.0026) model time 0.5844 (0.4045) 
loss 6.8239 (7.2564) grad_norm 2.2510 (2.4541) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:36:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][250/625] eta 0:02:38 lr 0.000703 wd 0.0500 time 0.4016 (0.4226) data time 0.0006 (0.0025) model time 0.4010 (0.4105) loss 7.6391 (7.2653) grad_norm 1.5103 (2.4475) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:36:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][260/625] eta 0:02:34 lr 0.000703 wd 0.0500 time 0.5622 (0.4244) data time 0.0006 (0.0024) model time 0.5615 (0.4133) loss 6.8280 (7.2484) grad_norm 2.0976 (2.4401) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:36:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][270/625] eta 0:02:30 lr 0.000703 wd 0.0500 time 0.3956 (0.4251) data time 0.0008 (0.0024) model time 0.3948 (0.4145) loss 8.3110 (7.2505) grad_norm 2.9233 (2.4352) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:36:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][280/625] eta 0:02:26 lr 0.000703 wd 0.0500 time 0.3954 (0.4241) data time 0.0009 (0.0023) model time 0.3945 (0.4138) loss 7.3695 (7.2513) grad_norm 1.8716 (2.4408) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:36:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][290/625] eta 0:02:21 lr 0.000702 wd 0.0500 time 0.4009 (0.4233) data time 0.0006 (0.0023) model time 0.4003 (0.4132) loss 5.4011 (7.2453) grad_norm 1.9218 (2.4450) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:36:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][300/625] eta 0:02:17 lr 0.000702 wd 0.0500 time 0.3988 (0.4225) data time 0.0008 (0.0022) model time 0.3979 (0.4126) loss 6.4746 (7.2352) grad_norm 4.0622 (2.4532) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:36:29 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [145/300][310/625] eta 0:02:12 lr 0.000702 wd 0.0500 time 0.4019 (0.4218) data time 0.0008 (0.0022) model time 0.4010 (0.4121) loss 7.4282 (7.2415) grad_norm 2.4782 (2.4640) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:36:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][320/625] eta 0:02:08 lr 0.000702 wd 0.0500 time 0.4103 (0.4211) data time 0.0009 (0.0022) model time 0.4094 (0.4116) loss 6.9192 (7.2323) grad_norm 1.6733 (2.4592) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:36:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][330/625] eta 0:02:04 lr 0.000702 wd 0.0500 time 0.4001 (0.4205) data time 0.0007 (0.0021) model time 0.3994 (0.4112) loss 6.4720 (7.2312) grad_norm 2.9148 (2.4760) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:36:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][340/625] eta 0:01:59 lr 0.000702 wd 0.0500 time 0.3978 (0.4200) data time 0.0007 (0.0021) model time 0.3971 (0.4108) loss 8.5730 (7.2483) grad_norm 4.2324 (2.4907) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:36:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][350/625] eta 0:01:55 lr 0.000702 wd 0.0500 time 0.4086 (0.4195) data time 0.0007 (0.0020) model time 0.4079 (0.4105) loss 7.4925 (7.2555) grad_norm 1.7582 (2.4872) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:36:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][360/625] eta 0:01:51 lr 0.000702 wd 0.0500 time 0.3916 (0.4189) data time 0.0007 (0.0020) model time 0.3909 (0.4101) loss 6.7299 (7.2474) grad_norm 2.5711 (2.4896) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:36:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][370/625] eta 0:01:46 lr 0.000702 wd 0.0500 time 0.3929 (0.4184) data time 0.0009 (0.0020) model 
time 0.3920 (0.4098) loss 7.2758 (7.2397) grad_norm 2.2811 (2.4765) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:36:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][380/625] eta 0:01:42 lr 0.000702 wd 0.0500 time 0.4009 (0.4179) data time 0.0006 (0.0020) model time 0.4002 (0.4094) loss 6.8376 (7.2314) grad_norm 2.3975 (2.4713) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:37:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][390/625] eta 0:01:38 lr 0.000701 wd 0.0500 time 0.3986 (0.4175) data time 0.0008 (0.0019) model time 0.3978 (0.4092) loss 8.3122 (7.2294) grad_norm 2.0647 (2.4678) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:37:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][400/625] eta 0:01:33 lr 0.000701 wd 0.0500 time 0.3993 (0.4171) data time 0.0008 (0.0019) model time 0.3985 (0.4090) loss 5.8210 (7.2245) grad_norm 1.8659 (2.4602) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:37:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][410/625] eta 0:01:29 lr 0.000701 wd 0.0500 time 0.4104 (0.4172) data time 0.0009 (0.0020) model time 0.4095 (0.4091) loss 7.0340 (7.2219) grad_norm 2.1278 (2.4538) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:37:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][420/625] eta 0:01:25 lr 0.000701 wd 0.0500 time 0.4014 (0.4168) data time 0.0008 (0.0020) model time 0.4005 (0.4089) loss 6.3582 (7.2061) grad_norm 2.5729 (2.4417) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:37:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][430/625] eta 0:01:21 lr 0.000701 wd 0.0500 time 0.3996 (0.4165) data time 0.0008 (0.0020) model time 0.3987 (0.4086) loss 6.9533 (7.2053) grad_norm 2.2286 (2.4297) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:37:21 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][440/625] eta 0:01:16 lr 0.000701 wd 0.0500 time 0.4035 (0.4161) data time 0.0008 (0.0020) model time 0.4027 (0.4084) loss 7.5969 (7.2008) grad_norm 2.5022 (2.4159) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:37:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][450/625] eta 0:01:12 lr 0.000701 wd 0.0500 time 0.4171 (0.4158) data time 0.0008 (0.0019) model time 0.4164 (0.4083) loss 8.4315 (7.2110) grad_norm 2.1495 (2.4290) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:37:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][460/625] eta 0:01:08 lr 0.000701 wd 0.0500 time 0.6045 (0.4170) data time 0.0008 (0.0019) model time 0.6037 (0.4098) loss 7.4355 (7.2063) grad_norm 2.4649 (2.4293) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:37:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][470/625] eta 0:01:04 lr 0.000701 wd 0.0500 time 0.4227 (0.4190) data time 0.0008 (0.0019) model time 0.4219 (0.4121) loss 8.1275 (7.2069) grad_norm 2.8765 (2.4299) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:37:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][480/625] eta 0:01:00 lr 0.000700 wd 0.0500 time 0.5922 (0.4202) data time 0.0007 (0.0019) model time 0.5915 (0.4136) loss 7.3094 (7.2138) grad_norm 6.0409 (2.4445) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:37:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][490/625] eta 0:00:56 lr 0.000700 wd 0.0500 time 0.5680 (0.4212) data time 0.0008 (0.0019) model time 0.5672 (0.4149) loss 8.4091 (7.2141) grad_norm 3.2475 (2.4415) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:37:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][500/625] eta 0:00:52 lr 0.000700 wd 0.0500 time 0.3984 (0.4208) 
data time 0.0007 (0.0018) model time 0.3977 (0.4145) loss 7.8543 (7.2186) grad_norm 2.6846 (2.4442) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:37:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][510/625] eta 0:00:48 lr 0.000700 wd 0.0500 time 0.4239 (0.4205) data time 0.0007 (0.0018) model time 0.4233 (0.4143) loss 8.3136 (7.2134) grad_norm 5.1851 (2.4840) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:37:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][520/625] eta 0:00:44 lr 0.000700 wd 0.0500 time 0.3934 (0.4202) data time 0.0009 (0.0018) model time 0.3926 (0.4142) loss 7.0379 (7.2223) grad_norm 4.3167 (2.5243) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:38:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][530/625] eta 0:00:39 lr 0.000700 wd 0.0500 time 0.4037 (0.4200) data time 0.0008 (0.0019) model time 0.4029 (0.4140) loss 7.0763 (7.2197) grad_norm 1.7472 (2.5197) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:38:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][540/625] eta 0:00:35 lr 0.000700 wd 0.0500 time 0.4183 (0.4198) data time 0.0006 (0.0018) model time 0.4176 (0.4138) loss 7.4964 (7.2201) grad_norm 4.4359 (2.5186) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:38:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][550/625] eta 0:00:31 lr 0.000700 wd 0.0500 time 0.4001 (0.4195) data time 0.0009 (0.0018) model time 0.3991 (0.4136) loss 7.6056 (7.2251) grad_norm 1.9511 (2.5175) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:38:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][560/625] eta 0:00:27 lr 0.000700 wd 0.0500 time 0.3992 (0.4192) data time 0.0007 (0.0018) model time 0.3985 (0.4133) loss 7.2905 (7.2199) grad_norm 5.2165 (2.5196) loss_scale 1024.0000 (1024.0000) mem 14939MB 
[2024-07-25 02:38:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][570/625] eta 0:00:23 lr 0.000700 wd 0.0500 time 0.3986 (0.4190) data time 0.0009 (0.0018) model time 0.3977 (0.4132) loss 7.8018 (7.2260) grad_norm 1.9550 (2.5281) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:38:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][580/625] eta 0:00:18 lr 0.000699 wd 0.0500 time 0.3950 (0.4186) data time 0.0009 (0.0018) model time 0.3941 (0.4129) loss 8.0530 (7.2340) grad_norm 3.0440 (2.5341) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:38:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][590/625] eta 0:00:14 lr 0.000699 wd 0.0500 time 0.3976 (0.4183) data time 0.0008 (0.0018) model time 0.3968 (0.4126) loss 8.0560 (7.2331) grad_norm 2.7141 (2.5383) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:38:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][600/625] eta 0:00:10 lr 0.000699 wd 0.0500 time 0.4017 (0.4180) data time 0.0006 (0.0018) model time 0.4010 (0.4124) loss 5.5176 (7.2295) grad_norm 2.3623 (2.5444) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:38:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][610/625] eta 0:00:06 lr 0.000699 wd 0.0500 time 0.3965 (0.4177) data time 0.0006 (0.0018) model time 0.3959 (0.4121) loss 7.8050 (7.2287) grad_norm 3.2922 (2.5488) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:38:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][620/625] eta 0:00:02 lr 0.000699 wd 0.0500 time 0.3985 (0.4174) data time 0.0006 (0.0017) model time 0.3979 (0.4119) loss 7.3262 (7.2296) grad_norm 2.2534 (2.5489) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 02:38:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 145 training takes 0:04:21 [2024-07-25 02:38:39 
vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 02:38:40 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 02:38:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.455 (0.455) Loss 0.5698 (0.5698) Acc@1 89.062 (89.062) Acc@5 98.340 (98.340) Mem 14939MB [2024-07-25 02:38:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.125) Loss 0.9404 (0.7205) Acc@1 78.369 (84.992) Acc@5 95.020 (97.235) Mem 14939MB [2024-07-25 02:38:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.107) Loss 1.0820 (0.8551) Acc@1 73.926 (81.338) Acc@5 93.750 (95.833) Mem 14939MB [2024-07-25 02:38:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.000 Acc@5 95.863 [2024-07-25 02:38:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.0% [2024-07-25 02:38:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.752 (0.752) Loss 0.5654 (0.5654) Acc@1 89.209 (89.209) Acc@5 98.682 (98.682) Mem 14939MB [2024-07-25 02:38:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.154) Loss 0.9146 (0.7094) Acc@1 80.225 (85.596) Acc@5 95.654 (97.483) Mem 14939MB [2024-07-25 02:38:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 1.0508 (0.8373) Acc@1 75.537 (82.129) Acc@5 94.580 (96.189) Mem 14939MB [2024-07-25 02:38:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.742 Acc@5 96.155 [2024-07-25 02:38:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.7% [2024-07-25 02:38:45 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.74% [2024-07-25 02:38:46 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 02:38:46 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 02:38:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][0/625] eta 0:07:47 lr 0.000699 wd 0.0500 time 0.7476 (0.7476) data time 0.3611 (0.3611) model time 0.0000 (0.0000) loss 7.8194 (7.8194) grad_norm 1.8596 (1.8596) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:38:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][10/625] eta 0:04:35 lr 0.000699 wd 0.0500 time 0.4695 (0.4487) data time 0.0006 (0.0349) model time 0.0000 (0.0000) loss 7.1436 (7.0626) grad_norm 1.8578 (2.4356) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:38:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][20/625] eta 0:04:19 lr 0.000699 wd 0.0500 time 0.3931 (0.4283) data time 0.0009 (0.0190) model time 0.0000 (0.0000) loss 7.4112 (7.2806) grad_norm 2.4716 (2.3606) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:38:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][30/625] eta 0:04:09 lr 0.000699 wd 0.0500 time 0.3981 (0.4198) data time 0.0009 (0.0134) model time 0.0000 (0.0000) loss 7.2620 (7.2176) grad_norm 2.1616 (2.3249) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:39:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][40/625] eta 0:04:02 lr 0.000699 wd 0.0500 time 0.4026 (0.4150) data time 0.0008 (0.0104) model time 0.0000 (0.0000) loss 6.7801 (7.2444) grad_norm 3.4729 (2.3293) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:39:08 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][50/625] eta 0:03:58 lr 0.000698 wd 0.0500 time 0.4002 (0.4142) data time 0.0007 (0.0086) model time 0.0000 (0.0000) loss 6.3419 (7.1976) grad_norm 2.4976 (2.3118) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:39:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][60/625] eta 0:04:01 lr 0.000698 wd 0.0500 time 0.5683 (0.4279) data time 0.0010 (0.0073) model time 0.5673 (0.4968) loss 6.1839 (7.2142) grad_norm 2.1730 (2.6409) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:39:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][70/625] eta 0:04:03 lr 0.000698 wd 0.0500 time 0.5636 (0.4387) data time 0.0008 (0.0064) model time 0.5628 (0.5003) loss 8.3810 (7.2180) grad_norm 2.8122 (2.6468) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:39:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][80/625] eta 0:03:59 lr 0.000698 wd 0.0500 time 0.6157 (0.4401) data time 0.0008 (0.0058) model time 0.6149 (0.4830) loss 7.2997 (7.1877) grad_norm 4.5003 (2.7124) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:39:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][90/625] eta 0:03:55 lr 0.000698 wd 0.0500 time 0.3930 (0.4395) data time 0.0011 (0.0052) model time 0.3919 (0.4709) loss 7.3845 (7.1710) grad_norm 2.5648 (2.7365) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:39:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][100/625] eta 0:03:48 lr 0.000698 wd 0.0500 time 0.3992 (0.4358) data time 0.0007 (0.0048) model time 0.3985 (0.4568) loss 5.8460 (7.1899) grad_norm 1.6702 (2.8125) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:39:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][110/625] eta 0:03:42 lr 0.000698 wd 0.0500 time 0.4026 (0.4326) data time 0.0007 
(0.0044) model time 0.4018 (0.4474) loss 7.0045 (7.1967) grad_norm 2.1880 (2.8707) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:39:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][120/625] eta 0:03:37 lr 0.000698 wd 0.0500 time 0.4045 (0.4301) data time 0.0009 (0.0041) model time 0.4037 (0.4407) loss 6.9461 (7.1972) grad_norm 2.2776 (2.8384) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:39:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][130/625] eta 0:03:31 lr 0.000698 wd 0.0500 time 0.3988 (0.4278) data time 0.0006 (0.0039) model time 0.3982 (0.4356) loss 8.2452 (7.2201) grad_norm 1.8549 (2.8397) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:39:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][140/625] eta 0:03:26 lr 0.000697 wd 0.0500 time 0.4030 (0.4259) data time 0.0009 (0.0037) model time 0.4021 (0.4316) loss 7.5940 (7.2402) grad_norm 2.3418 (2.8455) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:39:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][150/625] eta 0:03:21 lr 0.000697 wd 0.0500 time 0.3966 (0.4241) data time 0.0007 (0.0035) model time 0.3960 (0.4283) loss 5.3259 (7.2089) grad_norm 2.0360 (2.7987) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:39:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][160/625] eta 0:03:16 lr 0.000697 wd 0.0500 time 0.3941 (0.4235) data time 0.0008 (0.0033) model time 0.3933 (0.4270) loss 7.1192 (7.2057) grad_norm 2.8870 (2.7885) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:39:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][170/625] eta 0:03:12 lr 0.000697 wd 0.0500 time 0.3975 (0.4222) data time 0.0009 (0.0032) model time 0.3966 (0.4247) loss 6.3950 (7.1826) grad_norm 3.1554 (2.8161) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:40:03 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][180/625] eta 0:03:07 lr 0.000697 wd 0.0500 time 0.3962 (0.4209) data time 0.0007 (0.0031) model time 0.3955 (0.4227) loss 7.3517 (7.2023) grad_norm 4.2323 (2.8596) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:40:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][190/625] eta 0:03:02 lr 0.000697 wd 0.0500 time 0.3986 (0.4197) data time 0.0007 (0.0030) model time 0.3979 (0.4208) loss 8.1389 (7.1989) grad_norm 4.3872 (2.8565) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:40:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][200/625] eta 0:02:57 lr 0.000697 wd 0.0500 time 0.4002 (0.4188) data time 0.0007 (0.0029) model time 0.3995 (0.4194) loss 7.2927 (7.2004) grad_norm 1.6810 (2.8327) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:40:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][210/625] eta 0:02:53 lr 0.000697 wd 0.0500 time 0.3968 (0.4178) data time 0.0007 (0.0028) model time 0.3961 (0.4181) loss 7.1489 (7.2042) grad_norm 3.6554 (2.8196) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:40:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][220/625] eta 0:02:48 lr 0.000697 wd 0.0500 time 0.3969 (0.4170) data time 0.0009 (0.0027) model time 0.3960 (0.4169) loss 6.5608 (7.2018) grad_norm 2.3898 (2.7934) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:40:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][230/625] eta 0:02:44 lr 0.000696 wd 0.0500 time 0.4108 (0.4163) data time 0.0007 (0.0026) model time 0.4101 (0.4160) loss 6.2262 (7.1957) grad_norm 3.3210 (2.9147) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:40:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][240/625] eta 0:02:40 lr 0.000696 wd 0.0500 time 0.4018 (0.4156) data time 
0.0008 (0.0025) model time 0.4010 (0.4151) loss 7.8279 (7.2128) grad_norm 2.7026 (2.8833) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:40:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][250/625] eta 0:02:35 lr 0.000696 wd 0.0500 time 0.4040 (0.4151) data time 0.0008 (0.0025) model time 0.4033 (0.4144) loss 7.6128 (7.2056) grad_norm 1.7242 (2.8651) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:40:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][260/625] eta 0:02:31 lr 0.000696 wd 0.0500 time 0.4085 (0.4146) data time 0.0008 (0.0024) model time 0.4076 (0.4137) loss 6.8293 (7.2111) grad_norm 2.7528 (2.8428) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:40:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][270/625] eta 0:02:27 lr 0.000696 wd 0.0500 time 0.4055 (0.4149) data time 0.0008 (0.0025) model time 0.4047 (0.4140) loss 8.0349 (7.2155) grad_norm 2.2751 (2.8279) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:40:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][280/625] eta 0:02:24 lr 0.000696 wd 0.0500 time 0.6154 (0.4174) data time 0.0007 (0.0024) model time 0.6147 (0.4171) loss 8.6003 (7.2262) grad_norm 3.3823 (2.8225) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:40:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][290/625] eta 0:02:20 lr 0.000696 wd 0.0500 time 0.3989 (0.4191) data time 0.0008 (0.0024) model time 0.3981 (0.4190) loss 7.8293 (7.2309) grad_norm 2.1479 (2.8257) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:40:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][300/625] eta 0:02:16 lr 0.000696 wd 0.0500 time 0.6130 (0.4209) data time 0.0007 (0.0023) model time 0.6123 (0.4212) loss 7.6148 (7.2256) grad_norm 2.2827 (2.8289) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:40:58 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][310/625] eta 0:02:12 lr 0.000696 wd 0.0500 time 0.3966 (0.4218) data time 0.0007 (0.0023) model time 0.3959 (0.4222) loss 6.1450 (7.2336) grad_norm 4.9253 (2.8591) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:41:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][320/625] eta 0:02:08 lr 0.000696 wd 0.0500 time 0.3976 (0.4211) data time 0.0007 (0.0023) model time 0.3969 (0.4213) loss 6.5048 (7.2468) grad_norm 3.1230 (2.8457) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:41:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][330/625] eta 0:02:04 lr 0.000695 wd 0.0500 time 0.3979 (0.4204) data time 0.0008 (0.0022) model time 0.3971 (0.4205) loss 7.0298 (7.2424) grad_norm 1.8309 (2.8237) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:41:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][340/625] eta 0:01:59 lr 0.000695 wd 0.0500 time 0.3966 (0.4198) data time 0.0009 (0.0022) model time 0.3957 (0.4197) loss 6.5813 (7.2408) grad_norm 3.4037 (2.8150) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:41:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][350/625] eta 0:01:55 lr 0.000695 wd 0.0500 time 0.4041 (0.4192) data time 0.0007 (0.0022) model time 0.4034 (0.4190) loss 8.2524 (7.2459) grad_norm 2.1525 (2.8407) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:41:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][360/625] eta 0:01:50 lr 0.000695 wd 0.0500 time 0.4007 (0.4187) data time 0.0009 (0.0022) model time 0.3997 (0.4183) loss 6.7143 (7.2463) grad_norm 3.1394 (2.8450) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:41:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][370/625] eta 0:01:46 lr 0.000695 wd 0.0500 time 0.3925 (0.4182) data time 
0.0010 (0.0022) model time 0.3915 (0.4177) loss 8.7115 (7.2496) grad_norm 1.9014 (2.8453) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:41:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][380/625] eta 0:01:42 lr 0.000695 wd 0.0500 time 0.4020 (0.4184) data time 0.0010 (0.0021) model time 0.4011 (0.4179) loss 7.9203 (7.2603) grad_norm 2.9038 (2.8327) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:41:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][390/625] eta 0:01:38 lr 0.000695 wd 0.0500 time 0.3951 (0.4179) data time 0.0007 (0.0021) model time 0.3944 (0.4173) loss 6.4303 (7.2515) grad_norm 2.1509 (2.8252) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:41:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][400/625] eta 0:01:33 lr 0.000695 wd 0.0500 time 0.4010 (0.4174) data time 0.0008 (0.0021) model time 0.4002 (0.4168) loss 6.7996 (7.2538) grad_norm 2.0480 (2.8209) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:41:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][410/625] eta 0:01:29 lr 0.000695 wd 0.0500 time 0.4321 (0.4171) data time 0.0009 (0.0020) model time 0.4312 (0.4164) loss 6.4435 (7.2546) grad_norm 1.9217 (2.8169) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:41:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][420/625] eta 0:01:25 lr 0.000694 wd 0.0500 time 0.3975 (0.4168) data time 0.0009 (0.0020) model time 0.3966 (0.4160) loss 7.3020 (7.2565) grad_norm 2.1605 (2.8125) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:41:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][430/625] eta 0:01:21 lr 0.000694 wd 0.0500 time 0.3971 (0.4165) data time 0.0009 (0.0020) model time 0.3962 (0.4157) loss 6.6330 (7.2586) grad_norm 1.8860 (2.8014) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:41:50 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][440/625] eta 0:01:16 lr 0.000694 wd 0.0500 time 0.3988 (0.4161) data time 0.0008 (0.0020) model time 0.3980 (0.4153) loss 6.6504 (7.2543) grad_norm 2.9687 (2.7903) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:41:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][450/625] eta 0:01:12 lr 0.000694 wd 0.0500 time 0.4012 (0.4158) data time 0.0007 (0.0019) model time 0.4005 (0.4149) loss 5.7110 (7.2469) grad_norm 2.9363 (2.7936) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:41:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][460/625] eta 0:01:08 lr 0.000694 wd 0.0500 time 0.4004 (0.4155) data time 0.0010 (0.0019) model time 0.3994 (0.4145) loss 6.6264 (7.2450) grad_norm 2.8375 (2.8003) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:42:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][470/625] eta 0:01:04 lr 0.000694 wd 0.0500 time 0.4029 (0.4152) data time 0.0007 (0.0019) model time 0.4022 (0.4142) loss 6.7539 (7.2412) grad_norm 1.8988 (2.8072) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:42:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][480/625] eta 0:01:00 lr 0.000694 wd 0.0500 time 0.3980 (0.4148) data time 0.0007 (0.0019) model time 0.3974 (0.4138) loss 6.3835 (7.2397) grad_norm 3.6645 (2.8105) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:42:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][490/625] eta 0:00:56 lr 0.000694 wd 0.0500 time 0.5639 (0.4153) data time 0.0009 (0.0019) model time 0.5630 (0.4143) loss 7.3451 (7.2469) grad_norm 2.6845 (2.8166) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:42:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][500/625] eta 0:00:52 lr 0.000694 wd 0.0500 time 0.5827 (0.4163) data time 
0.0008 (0.0018) model time 0.5819 (0.4154) loss 6.6684 (7.2447) grad_norm 2.1532 (2.8103) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:42:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][510/625] eta 0:00:48 lr 0.000694 wd 0.0500 time 0.5875 (0.4177) data time 0.0008 (0.0018) model time 0.5867 (0.4170) loss 8.1952 (7.2430) grad_norm 2.5870 (2.7987) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:42:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][520/625] eta 0:00:43 lr 0.000693 wd 0.0500 time 0.3995 (0.4184) data time 0.0007 (0.0018) model time 0.3988 (0.4178) loss 7.6692 (7.2428) grad_norm 2.1591 (2.7921) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:42:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][530/625] eta 0:00:39 lr 0.000693 wd 0.0500 time 0.3995 (0.4194) data time 0.0005 (0.0018) model time 0.3990 (0.4189) loss 6.5454 (7.2449) grad_norm 2.7282 (2.7920) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:42:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][540/625] eta 0:00:35 lr 0.000693 wd 0.0500 time 0.4017 (0.4190) data time 0.0006 (0.0018) model time 0.4011 (0.4185) loss 7.5721 (7.2496) grad_norm 3.3100 (2.8038) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:42:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][550/625] eta 0:00:31 lr 0.000693 wd 0.0500 time 0.3945 (0.4187) data time 0.0009 (0.0018) model time 0.3936 (0.4181) loss 7.8002 (7.2484) grad_norm 6.2193 (2.8169) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:42:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][560/625] eta 0:00:27 lr 0.000693 wd 0.0500 time 0.4024 (0.4184) data time 0.0008 (0.0017) model time 0.4015 (0.4178) loss 6.4234 (7.2433) grad_norm 2.2412 (2.8135) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:42:45 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][570/625] eta 0:00:22 lr 0.000693 wd 0.0500 time 0.3990 (0.4181) data time 0.0007 (0.0017) model time 0.3983 (0.4174) loss 6.1166 (7.2480) grad_norm 4.0368 (2.8117) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:42:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][580/625] eta 0:00:18 lr 0.000693 wd 0.0500 time 0.3946 (0.4178) data time 0.0008 (0.0017) model time 0.3937 (0.4170) loss 8.0810 (7.2455) grad_norm 2.3099 (2.8072) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:42:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][590/625] eta 0:00:14 lr 0.000693 wd 0.0500 time 0.3971 (0.4174) data time 0.0008 (0.0017) model time 0.3963 (0.4167) loss 7.6549 (7.2463) grad_norm 1.9687 (2.7992) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:42:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][600/625] eta 0:00:10 lr 0.000693 wd 0.0500 time 0.3970 (0.4175) data time 0.0008 (0.0017) model time 0.3962 (0.4167) loss 7.4513 (7.2454) grad_norm 1.7356 (2.7892) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:43:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][610/625] eta 0:00:06 lr 0.000692 wd 0.0500 time 0.3960 (0.4172) data time 0.0006 (0.0017) model time 0.3954 (0.4164) loss 8.0093 (7.2546) grad_norm 2.0091 (2.7817) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:43:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][620/625] eta 0:00:02 lr 0.000692 wd 0.0500 time 0.4652 (0.4170) data time 0.0004 (0.0017) model time 0.4648 (0.4162) loss 6.6910 (7.2530) grad_norm 1.3598 (2.8053) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:43:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 146 training takes 0:04:20 [2024-07-25 02:43:07 vssd_mesa_retrain_small_e300] (utils.py 
118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 02:43:08 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 02:43:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.491 (0.491) Loss 0.5898 (0.5898) Acc@1 88.672 (88.672) Acc@5 98.389 (98.389) Mem 14939MB [2024-07-25 02:43:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.124) Loss 0.9521 (0.7298) Acc@1 79.492 (85.072) Acc@5 95.361 (97.248) Mem 14939MB [2024-07-25 02:43:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.106) Loss 1.0771 (0.8630) Acc@1 75.146 (81.483) Acc@5 93.896 (95.887) Mem 14939MB [2024-07-25 02:43:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.090 Acc@5 95.863 [2024-07-25 02:43:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.1% [2024-07-25 02:43:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.09% [2024-07-25 02:43:11 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 02:43:11 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 02:43:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.5659 (0.5659) Acc@1 89.111 (89.111) Acc@5 98.682 (98.682) Mem 14939MB [2024-07-25 02:43:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.9131 (0.7086) Acc@1 80.322 (85.622) Acc@5 95.703 (97.496) Mem 14939MB [2024-07-25 02:43:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 1.0488 (0.8365) Acc@1 75.635 (82.145) Acc@5 94.482 (96.201) Mem 14939MB [2024-07-25 02:43:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.768 Acc@5 96.167 [2024-07-25 02:43:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.8% [2024-07-25 02:43:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.77% [2024-07-25 02:43:14 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 02:43:15 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 02:43:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][0/625] eta 0:09:04 lr 0.000692 wd 0.0500 time 0.8709 (0.8709) data time 0.4896 (0.4896) model time 0.0000 (0.0000) loss 6.5489 (6.5489) grad_norm 2.8729 (2.8729) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:43:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][10/625] eta 0:04:32 lr 0.000692 wd 0.0500 time 0.4019 (0.4427) data time 0.0006 (0.0454) model time 0.0000 (0.0000) loss 7.7553 (6.9757) grad_norm 1.5082 (2.4674) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:43:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][20/625] eta 0:04:15 lr 0.000692 wd 0.0500 time 0.3985 (0.4229) data time 0.0008 (0.0244) model time 0.0000 (0.0000) loss 5.8810 (6.9165) grad_norm 2.6956 (2.2676) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:43:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][30/625] eta 0:04:07 lr 0.000692 wd 0.0500 time 0.3980 (0.4157) data time 0.0010 (0.0169) model time 0.0000 (0.0000) loss 7.4937 (7.0997) grad_norm 2.7619 (2.4879) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:43:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][40/625] eta 0:04:00 lr 0.000692 wd 0.0500 time 0.3972 (0.4119) data time 0.0009 (0.0130) model time 0.0000 (0.0000) loss 7.1350 (7.2038) grad_norm 1.9781 (2.4088) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:43:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][50/625] eta 0:03:55 lr 0.000692 wd 0.0500 time 0.3991 (0.4098) data time 0.0009 (0.0106) model time 0.0000 (0.0000) loss 7.8260 (7.2948) grad_norm 2.0334 (2.6066) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:43:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][60/625] eta 0:03:50 lr 0.000692 wd 0.0500 time 0.4013 (0.4085) 
data time 0.0006 (0.0090) model time 0.4006 (0.4014) loss 7.3091 (7.2467) grad_norm 2.8415 (2.6263) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:43:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][70/625] eta 0:03:46 lr 0.000692 wd 0.0500 time 0.3976 (0.4074) data time 0.0009 (0.0079) model time 0.3967 (0.4004) loss 6.1072 (7.2575) grad_norm 1.6673 (2.5788) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:43:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][80/625] eta 0:03:42 lr 0.000691 wd 0.0500 time 0.5723 (0.4086) data time 0.0006 (0.0070) model time 0.5717 (0.4057) loss 6.4085 (7.1549) grad_norm 2.1725 (2.5672) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:43:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][90/625] eta 0:03:41 lr 0.000691 wd 0.0500 time 0.6132 (0.4136) data time 0.0008 (0.0064) model time 0.6124 (0.4177) loss 7.5516 (7.1541) grad_norm 2.6692 (2.5591) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:43:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][100/625] eta 0:03:41 lr 0.000691 wd 0.0500 time 0.5622 (0.4212) data time 0.0007 (0.0058) model time 0.5615 (0.4320) loss 7.3442 (7.1460) grad_norm 1.7312 (2.5414) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:44:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][110/625] eta 0:03:38 lr 0.000691 wd 0.0500 time 0.4450 (0.4247) data time 0.0007 (0.0054) model time 0.4443 (0.4365) loss 7.7892 (7.1530) grad_norm 1.8860 (2.5267) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:44:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][120/625] eta 0:03:37 lr 0.000691 wd 0.0500 time 0.5250 (0.4300) data time 0.0009 (0.0050) model time 0.5241 (0.4438) loss 7.2518 (7.1249) grad_norm 3.2608 (2.6107) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
02:44:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][130/625] eta 0:03:32 lr 0.000691 wd 0.0500 time 0.3999 (0.4292) data time 0.0007 (0.0047) model time 0.3993 (0.4406) loss 8.0241 (7.1584) grad_norm 2.1872 (2.5919) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:44:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][140/625] eta 0:03:27 lr 0.000691 wd 0.0500 time 0.4002 (0.4270) data time 0.0008 (0.0044) model time 0.3994 (0.4359) loss 8.3219 (7.1287) grad_norm 1.8731 (2.5650) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:44:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][150/625] eta 0:03:22 lr 0.000691 wd 0.0500 time 0.3975 (0.4253) data time 0.0009 (0.0042) model time 0.3966 (0.4323) loss 6.4923 (7.1263) grad_norm 2.2209 (2.5528) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:44:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][160/625] eta 0:03:17 lr 0.000691 wd 0.0500 time 0.3974 (0.4237) data time 0.0006 (0.0040) model time 0.3968 (0.4293) loss 6.4263 (7.1129) grad_norm 5.1034 (2.5688) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:44:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][170/625] eta 0:03:12 lr 0.000691 wd 0.0500 time 0.3984 (0.4223) data time 0.0008 (0.0038) model time 0.3975 (0.4267) loss 8.0123 (7.1122) grad_norm 1.9734 (2.6002) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:44:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][180/625] eta 0:03:07 lr 0.000690 wd 0.0500 time 0.4007 (0.4210) data time 0.0006 (0.0036) model time 0.4001 (0.4245) loss 7.5253 (7.1279) grad_norm 1.8003 (2.5889) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:44:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][190/625] eta 0:03:02 lr 0.000690 wd 0.0500 time 0.3951 (0.4199) data 
time 0.0007 (0.0035) model time 0.3945 (0.4227) loss 8.3393 (7.1516) grad_norm 1.5492 (2.5590) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:44:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][200/625] eta 0:02:58 lr 0.000690 wd 0.0500 time 0.3955 (0.4189) data time 0.0008 (0.0034) model time 0.3947 (0.4211) loss 8.0411 (7.1419) grad_norm 1.9073 (2.5350) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:44:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][210/625] eta 0:02:53 lr 0.000690 wd 0.0500 time 0.3999 (0.4180) data time 0.0006 (0.0033) model time 0.3993 (0.4197) loss 7.8765 (7.1393) grad_norm 3.9878 (2.5403) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:44:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][220/625] eta 0:02:48 lr 0.000690 wd 0.0500 time 0.3964 (0.4171) data time 0.0009 (0.0031) model time 0.3955 (0.4184) loss 7.2993 (7.1520) grad_norm 2.3234 (2.5356) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:44:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][230/625] eta 0:02:44 lr 0.000690 wd 0.0500 time 0.3990 (0.4164) data time 0.0008 (0.0030) model time 0.3982 (0.4174) loss 5.7616 (7.1492) grad_norm 1.8561 (2.5199) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:44:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][240/625] eta 0:02:40 lr 0.000690 wd 0.0500 time 0.3999 (0.4157) data time 0.0006 (0.0030) model time 0.3992 (0.4164) loss 7.3356 (7.1565) grad_norm 1.7119 (2.5017) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:44:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][250/625] eta 0:02:35 lr 0.000690 wd 0.0500 time 0.3961 (0.4151) data time 0.0008 (0.0029) model time 0.3953 (0.4155) loss 8.0023 (7.1509) grad_norm 2.7564 (2.4938) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
02:45:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][260/625] eta 0:02:31 lr 0.000690 wd 0.0500 time 0.4006 (0.4146) data time 0.0009 (0.0028) model time 0.3997 (0.4148) loss 5.7244 (7.1510) grad_norm 3.3098 (2.5167) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:45:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][270/625] eta 0:02:26 lr 0.000689 wd 0.0500 time 0.4049 (0.4141) data time 0.0008 (0.0027) model time 0.4040 (0.4141) loss 6.9974 (7.1537) grad_norm 2.9393 (2.5161) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:45:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][280/625] eta 0:02:22 lr 0.000689 wd 0.0500 time 0.3942 (0.4136) data time 0.0006 (0.0027) model time 0.3936 (0.4135) loss 6.3209 (7.1491) grad_norm 3.4616 (2.5338) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:45:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][290/625] eta 0:02:18 lr 0.000689 wd 0.0500 time 0.3990 (0.4131) data time 0.0008 (0.0026) model time 0.3982 (0.4129) loss 7.0079 (7.1476) grad_norm 1.6665 (2.5204) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:45:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][300/625] eta 0:02:14 lr 0.000689 wd 0.0500 time 0.4001 (0.4126) data time 0.0008 (0.0026) model time 0.3993 (0.4123) loss 6.5422 (7.1496) grad_norm 2.4387 (2.5091) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:45:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][310/625] eta 0:02:10 lr 0.000689 wd 0.0500 time 0.5543 (0.4147) data time 0.0007 (0.0025) model time 0.5536 (0.4147) loss 7.8634 (7.1513) grad_norm 2.9111 (2.5027) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:45:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][320/625] eta 0:02:07 lr 0.000689 wd 0.0500 time 0.5758 (0.4168) data 
time 0.0008 (0.0025) model time 0.5750 (0.4172) loss 6.4652 (7.1352) grad_norm 2.7206 (2.5079) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:45:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][330/625] eta 0:02:03 lr 0.000689 wd 0.0500 time 0.5907 (0.4186) data time 0.0009 (0.0024) model time 0.5898 (0.4193) loss 7.1549 (7.1522) grad_norm 2.0972 (2.5066) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:45:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][340/625] eta 0:01:59 lr 0.000689 wd 0.0500 time 0.3975 (0.4196) data time 0.0008 (0.0024) model time 0.3967 (0.4205) loss 7.4054 (7.1572) grad_norm 2.6881 (2.4991) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:45:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][350/625] eta 0:01:55 lr 0.000689 wd 0.0500 time 0.4018 (0.4201) data time 0.0008 (0.0023) model time 0.4010 (0.4210) loss 7.7326 (7.1501) grad_norm 2.7487 (2.4960) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:45:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][360/625] eta 0:01:51 lr 0.000689 wd 0.0500 time 0.4312 (0.4199) data time 0.0007 (0.0023) model time 0.4306 (0.4206) loss 7.5812 (7.1475) grad_norm 1.5538 (2.5056) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:45:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][370/625] eta 0:01:46 lr 0.000688 wd 0.0500 time 0.3943 (0.4193) data time 0.0007 (0.0023) model time 0.3936 (0.4199) loss 7.3248 (7.1502) grad_norm 3.3584 (2.5195) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:45:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][380/625] eta 0:01:42 lr 0.000688 wd 0.0500 time 0.4024 (0.4189) data time 0.0006 (0.0022) model time 0.4019 (0.4193) loss 7.7149 (7.1443) grad_norm 2.1617 (2.5130) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
02:45:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][390/625] eta 0:01:38 lr 0.000688 wd 0.0500 time 0.4007 (0.4185) data time 0.0009 (0.0022) model time 0.3998 (0.4188) loss 6.6780 (7.1429) grad_norm 2.7844 (2.5039) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:46:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][400/625] eta 0:01:34 lr 0.000688 wd 0.0500 time 0.4050 (0.4181) data time 0.0008 (0.0022) model time 0.4042 (0.4184) loss 6.4792 (7.1477) grad_norm 1.9280 (2.4876) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:46:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][410/625] eta 0:01:29 lr 0.000688 wd 0.0500 time 0.4037 (0.4177) data time 0.0008 (0.0021) model time 0.4028 (0.4179) loss 8.2014 (7.1504) grad_norm 1.8383 (2.4752) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:46:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][420/625] eta 0:01:25 lr 0.000688 wd 0.0500 time 0.4218 (0.4173) data time 0.0008 (0.0021) model time 0.4209 (0.4174) loss 7.2293 (7.1463) grad_norm 2.1127 (2.4737) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:46:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][430/625] eta 0:01:21 lr 0.000688 wd 0.0500 time 0.3950 (0.4169) data time 0.0008 (0.0021) model time 0.3942 (0.4170) loss 8.3489 (7.1557) grad_norm 1.7988 (2.4948) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:46:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][440/625] eta 0:01:17 lr 0.000688 wd 0.0500 time 0.4117 (0.4166) data time 0.0007 (0.0021) model time 0.4110 (0.4165) loss 6.7606 (7.1617) grad_norm 2.3353 (2.5133) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:46:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][450/625] eta 0:01:12 lr 0.000688 wd 0.0500 time 0.4137 (0.4163) data 
time 0.0007 (0.0020) model time 0.4130 (0.4162) loss 6.2961 (7.1707) grad_norm 2.5551 (2.5152) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:46:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][460/625] eta 0:01:08 lr 0.000687 wd 0.0500 time 0.3942 (0.4159) data time 0.0007 (0.0020) model time 0.3934 (0.4158) loss 7.1871 (7.1777) grad_norm 1.6959 (2.5066) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:46:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][470/625] eta 0:01:04 lr 0.000687 wd 0.0500 time 0.4106 (0.4156) data time 0.0010 (0.0020) model time 0.4095 (0.4154) loss 7.8081 (7.1860) grad_norm 3.3994 (2.5085) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:46:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][480/625] eta 0:01:00 lr 0.000687 wd 0.0500 time 0.4025 (0.4153) data time 0.0007 (0.0020) model time 0.4018 (0.4151) loss 7.0261 (7.1906) grad_norm 2.4834 (2.5074) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:46:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][490/625] eta 0:00:56 lr 0.000687 wd 0.0500 time 0.3971 (0.4151) data time 0.0008 (0.0020) model time 0.3963 (0.4147) loss 6.9468 (7.1962) grad_norm 2.9185 (2.5138) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:46:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][500/625] eta 0:00:51 lr 0.000687 wd 0.0500 time 0.4015 (0.4148) data time 0.0006 (0.0019) model time 0.4009 (0.4145) loss 6.3859 (7.1966) grad_norm 2.1151 (2.5419) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:46:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][510/625] eta 0:00:47 lr 0.000687 wd 0.0500 time 0.4039 (0.4146) data time 0.0007 (0.0019) model time 0.4032 (0.4141) loss 6.8839 (7.1957) grad_norm 1.7352 (2.5533) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
02:46:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][520/625] eta 0:00:43 lr 0.000687 wd 0.0500 time 0.3953 (0.4142) data time 0.0006 (0.0019) model time 0.3946 (0.4138) loss 7.6756 (7.1972) grad_norm 2.0016 (2.5546) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:46:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][530/625] eta 0:00:39 lr 0.000687 wd 0.0500 time 0.3944 (0.4154) data time 0.0008 (0.0019) model time 0.3936 (0.4151) loss 6.6918 (7.1993) grad_norm 9.4263 (2.5627) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:47:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][540/625] eta 0:00:35 lr 0.000687 wd 0.0500 time 0.5874 (0.4165) data time 0.0008 (0.0019) model time 0.5866 (0.4163) loss 7.1952 (7.2046) grad_norm 2.0152 (2.5624) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:47:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][550/625] eta 0:00:31 lr 0.000687 wd 0.0500 time 0.5804 (0.4175) data time 0.0008 (0.0019) model time 0.5796 (0.4174) loss 5.5038 (7.2018) grad_norm 1.8612 (2.5581) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:47:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][560/625] eta 0:00:27 lr 0.000686 wd 0.0500 time 0.5854 (0.4185) data time 0.0010 (0.0018) model time 0.5844 (0.4184) loss 7.0933 (7.2011) grad_norm 2.1128 (2.5627) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:47:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][570/625] eta 0:00:23 lr 0.000686 wd 0.0500 time 0.3981 (0.4184) data time 0.0006 (0.0018) model time 0.3975 (0.4183) loss 7.1502 (7.2036) grad_norm 1.9238 (2.5592) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:47:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][580/625] eta 0:00:18 lr 0.000686 wd 0.0500 time 0.3988 (0.4181) data 
time 0.0006 (0.0018) model time 0.3982 (0.4180) loss 6.2022 (7.2007) grad_norm 1.5177 (2.5533) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:47:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][590/625] eta 0:00:14 lr 0.000686 wd 0.0500 time 0.4030 (0.4178) data time 0.0008 (0.0018) model time 0.4021 (0.4176) loss 7.0881 (7.2054) grad_norm 2.3644 (2.5510) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:47:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][600/625] eta 0:00:10 lr 0.000686 wd 0.0500 time 0.4011 (0.4175) data time 0.0008 (0.0018) model time 0.4003 (0.4173) loss 7.4624 (7.2090) grad_norm 2.4891 (2.5440) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:47:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][610/625] eta 0:00:06 lr 0.000686 wd 0.0500 time 0.4000 (0.4173) data time 0.0004 (0.0018) model time 0.3996 (0.4170) loss 7.1605 (7.2095) grad_norm 2.5309 (2.5477) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:47:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][620/625] eta 0:00:02 lr 0.000686 wd 0.0500 time 0.4046 (0.4170) data time 0.0006 (0.0018) model time 0.4040 (0.4167) loss 8.3815 (7.2121) grad_norm 5.4312 (2.5543) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:47:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 147 training takes 0:04:20 [2024-07-25 02:47:35 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 02:47:36 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 02:47:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.465 (0.465) Loss 0.6064 (0.6064) Acc@1 88.232 (88.232) Acc@5 98.193 (98.193) Mem 14939MB [2024-07-25 02:47:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.9575 (0.7401) Acc@1 79.639 (85.019) Acc@5 95.312 (97.266) Mem 14939MB [2024-07-25 02:47:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.1055 (0.8716) Acc@1 74.072 (81.513) Acc@5 93.994 (95.931) Mem 14939MB [2024-07-25 02:47:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.072 Acc@5 95.889 [2024-07-25 02:47:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.1% [2024-07-25 02:47:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.794 (0.794) Loss 0.5654 (0.5654) Acc@1 89.209 (89.209) Acc@5 98.682 (98.682) Mem 14939MB [2024-07-25 02:47:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.154) Loss 0.9111 (0.7078) Acc@1 80.371 (85.658) Acc@5 95.752 (97.528) Mem 14939MB [2024-07-25 02:47:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.122) Loss 1.0479 (0.8355) Acc@1 75.537 (82.150) Acc@5 94.434 (96.233) Mem 14939MB [2024-07-25 02:47:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.780 Acc@5 96.195 [2024-07-25 02:47:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.8% [2024-07-25 02:47:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.78% [2024-07-25 02:47:42 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 02:47:43 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 02:47:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][0/625] eta 0:07:45 lr 0.000686 wd 0.0500 time 0.7449 (0.7449) data time 0.3603 (0.3603) model time 0.0000 (0.0000) loss 6.1311 (6.1311) grad_norm 3.1999 (3.1999) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:47:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][10/625] eta 0:04:25 lr 0.000686 wd 0.0500 time 0.4041 (0.4321) data time 0.0009 (0.0336) model time 0.0000 (0.0000) loss 6.1787 (7.0319) grad_norm 1.9999 (2.3230) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:47:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][20/625] eta 0:04:11 lr 0.000686 wd 0.0500 time 0.3979 (0.4161) data time 0.0006 (0.0180) model time 0.0000 (0.0000) loss 8.2335 (6.9791) grad_norm 1.7120 (2.2657) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:47:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][30/625] eta 0:04:04 lr 0.000685 wd 0.0500 time 0.4017 (0.4107) data time 0.0006 (0.0125) model time 0.0000 (0.0000) loss 6.1713 (6.9519) grad_norm 1.9902 (2.5294) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:47:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][40/625] eta 0:03:58 lr 0.000685 wd 0.0500 time 0.4002 (0.4079) data time 0.0008 (0.0096) model time 0.0000 (0.0000) loss 6.4352 (6.9148) grad_norm 2.5204 (2.4356) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:48:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][50/625] eta 0:03:53 lr 0.000685 wd 0.0500 time 0.3968 (0.4066) data time 0.0008 (0.0079) model time 0.0000 (0.0000) loss 6.4658 (6.9172) grad_norm 1.6321 (2.3781) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 02:48:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][60/625] eta 0:03:49 lr 0.000685 wd 0.0500 time 0.4009 (0.4053) data time 0.0006 (0.0068) model time 0.4003 (0.3982) loss 8.7317 (6.9727) grad_norm 3.4768 (2.3363) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:48:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][70/625] eta 0:03:45 lr 0.000685 wd 0.0500 time 0.3950 (0.4063) data time 0.0009 (0.0059) model time 0.3941 (0.4049) loss 7.8599 (7.0162) grad_norm 3.2500 (2.4394) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:48:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][80/625] eta 0:03:41 lr 0.000685 wd 0.0500 time 0.4051 (0.4056) data time 0.0006 (0.0053) model time 0.4045 (0.4032) loss 7.9808 (7.0546) grad_norm 2.1517 (2.4495) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:48:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][90/625] eta 0:03:36 lr 0.000685 wd 0.0500 time 0.4048 (0.4051) data time 0.0009 (0.0048) model time 0.4040 (0.4024) loss 5.8602 (7.0829) grad_norm 2.8917 (2.5118) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:48:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][100/625] eta 0:03:32 lr 0.000685 wd 0.0500 time 0.3991 (0.4050) data time 0.0009 (0.0045) model time 0.3982 (0.4025) loss 6.3985 (7.1279) grad_norm 3.1266 (2.5242) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:48:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][110/625] eta 0:03:28 lr 0.000685 wd 0.0500 time 0.3996 (0.4047) data time 0.0008 (0.0041) model time 0.3988 (0.4021) loss 8.3894 (7.1132) grad_norm 1.8007 (2.4889) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:48:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][120/625] eta 0:03:25 lr 0.000684 wd 0.0500 time 
0.6024 (0.4070) data time 0.0008 (0.0039) model time 0.6016 (0.4063) loss 6.6953 (7.1176) grad_norm 3.9438 (2.4775) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:48:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][130/625] eta 0:03:22 lr 0.000684 wd 0.0500 time 0.5726 (0.4091) data time 0.0006 (0.0036) model time 0.5719 (0.4098) loss 6.1111 (7.1150) grad_norm 1.7007 (2.4827) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:48:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][140/625] eta 0:03:21 lr 0.000684 wd 0.0500 time 0.5856 (0.4152) data time 0.0011 (0.0035) model time 0.5845 (0.4192) loss 7.0527 (7.1227) grad_norm 1.8084 (2.5056) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:48:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][150/625] eta 0:03:19 lr 0.000684 wd 0.0500 time 0.4979 (0.4195) data time 0.0009 (0.0033) model time 0.4970 (0.4251) loss 6.4461 (7.1200) grad_norm 2.4792 (2.4953) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:48:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][160/625] eta 0:03:16 lr 0.000684 wd 0.0500 time 0.6032 (0.4217) data time 0.0009 (0.0032) model time 0.6023 (0.4278) loss 6.4694 (7.0985) grad_norm 2.7122 (2.4748) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:48:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][170/625] eta 0:03:11 lr 0.000684 wd 0.0500 time 0.3995 (0.4204) data time 0.0008 (0.0030) model time 0.3987 (0.4253) loss 6.2579 (7.0918) grad_norm 3.3364 (2.4786) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:48:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][180/625] eta 0:03:06 lr 0.000684 wd 0.0500 time 0.3946 (0.4192) data time 0.0009 (0.0029) model time 0.3937 (0.4233) loss 7.6390 (7.0984) grad_norm 1.9846 (2.4614) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 02:49:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][190/625] eta 0:03:01 lr 0.000684 wd 0.0500 time 0.3972 (0.4182) data time 0.0008 (0.0028) model time 0.3964 (0.4215) loss 6.5095 (7.0997) grad_norm 2.0925 (2.4757) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:49:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][200/625] eta 0:02:57 lr 0.000684 wd 0.0500 time 0.3968 (0.4176) data time 0.0009 (0.0027) model time 0.3959 (0.4203) loss 8.6950 (7.1101) grad_norm 1.9357 (2.4468) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:49:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][210/625] eta 0:02:52 lr 0.000684 wd 0.0500 time 0.3940 (0.4167) data time 0.0007 (0.0026) model time 0.3933 (0.4190) loss 7.7341 (7.1049) grad_norm 2.0870 (2.4257) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:49:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][220/625] eta 0:02:48 lr 0.000683 wd 0.0500 time 0.3958 (0.4162) data time 0.0009 (0.0026) model time 0.3949 (0.4181) loss 8.2924 (7.1195) grad_norm 2.5829 (2.4058) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:49:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][230/625] eta 0:02:44 lr 0.000683 wd 0.0500 time 0.3988 (0.4154) data time 0.0007 (0.0025) model time 0.3982 (0.4169) loss 6.9712 (7.1395) grad_norm 1.9560 (2.4004) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:49:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][240/625] eta 0:02:39 lr 0.000683 wd 0.0500 time 0.3951 (0.4146) data time 0.0007 (0.0024) model time 0.3944 (0.4159) loss 6.6681 (7.1358) grad_norm 2.5469 (2.4516) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:49:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][250/625] eta 0:02:35 lr 0.000683 wd 0.0500 time 
0.4045 (0.4141) data time 0.0009 (0.0024) model time 0.4036 (0.4151) loss 7.9076 (7.1405) grad_norm 3.7510 (2.4868) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:49:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][260/625] eta 0:02:30 lr 0.000683 wd 0.0500 time 0.4032 (0.4135) data time 0.0009 (0.0023) model time 0.4023 (0.4143) loss 7.6160 (7.1346) grad_norm 2.4386 (2.4902) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:49:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][270/625] eta 0:02:26 lr 0.000683 wd 0.0500 time 0.4006 (0.4132) data time 0.0010 (0.0023) model time 0.3996 (0.4137) loss 6.1593 (7.1301) grad_norm 2.1515 (2.4787) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:49:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][280/625] eta 0:02:22 lr 0.000683 wd 0.0500 time 0.4008 (0.4130) data time 0.0008 (0.0022) model time 0.4000 (0.4134) loss 7.9775 (7.1334) grad_norm 1.8673 (2.4677) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:49:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][290/625] eta 0:02:18 lr 0.000683 wd 0.0500 time 0.3969 (0.4130) data time 0.0008 (0.0022) model time 0.3961 (0.4134) loss 8.1443 (7.1398) grad_norm 2.3429 (2.4565) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:49:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][300/625] eta 0:02:14 lr 0.000683 wd 0.0500 time 0.4012 (0.4126) data time 0.0009 (0.0021) model time 0.4003 (0.4129) loss 6.3490 (7.1271) grad_norm 3.0201 (2.4552) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:49:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][310/625] eta 0:02:09 lr 0.000682 wd 0.0500 time 0.4052 (0.4122) data time 0.0007 (0.0021) model time 0.4046 (0.4124) loss 7.3956 (7.1309) grad_norm 2.4980 (2.4459) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 02:49:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][320/625] eta 0:02:05 lr 0.000682 wd 0.0500 time 0.3991 (0.4119) data time 0.0009 (0.0021) model time 0.3982 (0.4119) loss 7.7157 (7.1259) grad_norm 1.9577 (2.4635) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:49:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][330/625] eta 0:02:01 lr 0.000682 wd 0.0500 time 0.3962 (0.4115) data time 0.0006 (0.0020) model time 0.3956 (0.4115) loss 6.0454 (7.1334) grad_norm 2.0406 (2.4519) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:50:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][340/625] eta 0:01:57 lr 0.000682 wd 0.0500 time 0.3985 (0.4116) data time 0.0009 (0.0020) model time 0.3976 (0.4115) loss 8.0684 (7.1287) grad_norm 2.0982 (2.4444) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:50:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][350/625] eta 0:01:53 lr 0.000682 wd 0.0500 time 0.5887 (0.4133) data time 0.0008 (0.0020) model time 0.5879 (0.4135) loss 7.4287 (7.1291) grad_norm 3.0060 (2.4479) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:50:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][360/625] eta 0:01:50 lr 0.000682 wd 0.0500 time 0.5929 (0.4163) data time 0.0009 (0.0019) model time 0.5920 (0.4170) loss 7.6839 (7.1269) grad_norm 1.7084 (2.4424) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:50:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][370/625] eta 0:01:46 lr 0.000682 wd 0.0500 time 0.6084 (0.4180) data time 0.0009 (0.0019) model time 0.6076 (0.4189) loss 7.4625 (7.1323) grad_norm 8.7775 (2.4616) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:50:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][380/625] eta 0:01:42 lr 0.000682 wd 0.0500 time 
0.5887 (0.4193) data time 0.0009 (0.0019) model time 0.5878 (0.4203) loss 8.0438 (7.1324) grad_norm 2.4838 (2.4777) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:50:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][390/625] eta 0:01:38 lr 0.000682 wd 0.0500 time 0.3936 (0.4191) data time 0.0007 (0.0019) model time 0.3929 (0.4201) loss 7.0933 (7.1336) grad_norm 3.1019 (2.4935) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:50:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][400/625] eta 0:01:34 lr 0.000682 wd 0.0500 time 0.4142 (0.4187) data time 0.0007 (0.0018) model time 0.4135 (0.4195) loss 6.6535 (7.1241) grad_norm 2.9369 (2.5029) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:50:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][410/625] eta 0:01:29 lr 0.000681 wd 0.0500 time 0.3983 (0.4182) data time 0.0007 (0.0018) model time 0.3976 (0.4189) loss 7.8121 (7.1276) grad_norm 2.7082 (2.4998) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:50:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][420/625] eta 0:01:25 lr 0.000681 wd 0.0500 time 0.3981 (0.4178) data time 0.0009 (0.0018) model time 0.3972 (0.4183) loss 6.5953 (7.1304) grad_norm 2.8164 (2.5030) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:50:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][430/625] eta 0:01:21 lr 0.000681 wd 0.0500 time 0.4046 (0.4174) data time 0.0009 (0.0018) model time 0.4037 (0.4179) loss 7.2969 (7.1333) grad_norm 1.7449 (2.5127) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:50:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][440/625] eta 0:01:17 lr 0.000681 wd 0.0500 time 0.3979 (0.4170) data time 0.0006 (0.0018) model time 0.3973 (0.4174) loss 7.4818 (7.1359) grad_norm 2.5997 (2.5086) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 02:50:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][450/625] eta 0:01:12 lr 0.000681 wd 0.0500 time 0.4002 (0.4166) data time 0.0008 (0.0017) model time 0.3993 (0.4170) loss 8.3512 (7.1420) grad_norm 4.2457 (2.5197) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:50:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][460/625] eta 0:01:08 lr 0.000681 wd 0.0500 time 0.4072 (0.4164) data time 0.0008 (0.0017) model time 0.4064 (0.4166) loss 7.7320 (7.1522) grad_norm 3.2877 (2.5313) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:50:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][470/625] eta 0:01:04 lr 0.000681 wd 0.0500 time 0.3979 (0.4160) data time 0.0006 (0.0017) model time 0.3973 (0.4162) loss 6.7250 (7.1411) grad_norm 2.8199 (2.5300) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:51:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][480/625] eta 0:01:00 lr 0.000681 wd 0.0500 time 0.3968 (0.4157) data time 0.0007 (0.0017) model time 0.3960 (0.4159) loss 5.8833 (7.1380) grad_norm 3.0026 (2.5342) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:51:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][490/625] eta 0:00:56 lr 0.000681 wd 0.0500 time 0.4057 (0.4155) data time 0.0006 (0.0017) model time 0.4051 (0.4156) loss 7.4308 (7.1424) grad_norm 2.0126 (2.5305) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:51:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][500/625] eta 0:00:51 lr 0.000680 wd 0.0500 time 0.3954 (0.4152) data time 0.0008 (0.0016) model time 0.3946 (0.4152) loss 5.5680 (7.1414) grad_norm 3.7789 (2.5311) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:51:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][510/625] eta 0:00:47 lr 0.000680 wd 0.0500 time 
0.5775 (0.4153) data time 0.0010 (0.0016) model time 0.5765 (0.4153) loss 8.0755 (7.1454) grad_norm 3.0580 (2.5356) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:51:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][520/625] eta 0:00:43 lr 0.000680 wd 0.0500 time 0.4012 (0.4150) data time 0.0009 (0.0016) model time 0.4003 (0.4150) loss 6.5530 (7.1425) grad_norm 2.4348 (2.5438) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:51:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][530/625] eta 0:00:39 lr 0.000680 wd 0.0500 time 0.3974 (0.4147) data time 0.0006 (0.0016) model time 0.3968 (0.4146) loss 7.9025 (7.1408) grad_norm 2.6406 (2.5436) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:51:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][540/625] eta 0:00:35 lr 0.000680 wd 0.0500 time 0.4022 (0.4145) data time 0.0006 (0.0016) model time 0.4016 (0.4144) loss 6.8847 (7.1347) grad_norm 1.8141 (2.5374) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:51:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][550/625] eta 0:00:31 lr 0.000680 wd 0.0500 time 0.4169 (0.4143) data time 0.0006 (0.0016) model time 0.4163 (0.4141) loss 6.4973 (7.1312) grad_norm 2.3959 (2.5426) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:51:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][560/625] eta 0:00:26 lr 0.000680 wd 0.0500 time 0.6062 (0.4144) data time 0.0007 (0.0016) model time 0.6055 (0.4142) loss 5.6539 (7.1284) grad_norm 2.2777 (2.5354) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:51:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][570/625] eta 0:00:22 lr 0.000680 wd 0.0500 time 0.5847 (0.4155) data time 0.0008 (0.0016) model time 0.5839 (0.4154) loss 8.0546 (7.1315) grad_norm 3.1598 (2.5344) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 02:51:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][580/625] eta 0:00:18 lr 0.000680 wd 0.0500 time 0.6054 (0.4171) data time 0.0008 (0.0016) model time 0.6046 (0.4171) loss 7.6968 (7.1345) grad_norm 2.3708 (2.5354) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:51:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][590/625] eta 0:00:14 lr 0.000679 wd 0.0500 time 0.3935 (0.4177) data time 0.0007 (0.0015) model time 0.3929 (0.4178) loss 7.3651 (7.1431) grad_norm 1.6693 (2.5396) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:51:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][600/625] eta 0:00:10 lr 0.000679 wd 0.0500 time 0.3954 (0.4186) data time 0.0009 (0.0015) model time 0.3945 (0.4187) loss 6.9750 (7.1487) grad_norm 3.2446 (2.5418) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:51:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][610/625] eta 0:00:06 lr 0.000679 wd 0.0500 time 0.3947 (0.4184) data time 0.0006 (0.0015) model time 0.3941 (0.4186) loss 7.8618 (7.1573) grad_norm 2.5770 (2.5398) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:52:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][620/625] eta 0:00:02 lr 0.000679 wd 0.0500 time 0.3988 (0.4182) data time 0.0006 (0.0015) model time 0.3982 (0.4183) loss 5.7296 (7.1575) grad_norm 2.4590 (2.5527) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:52:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 148 training takes 0:04:21 [2024-07-25 02:52:04 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 02:52:05 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 02:52:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.453 (0.453) Loss 0.5771 (0.5771) Acc@1 88.574 (88.574) Acc@5 98.535 (98.535) Mem 14939MB [2024-07-25 02:52:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.9092 (0.7227) Acc@1 80.566 (85.281) Acc@5 95.898 (97.417) Mem 14939MB [2024-07-25 02:52:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0830 (0.8580) Acc@1 74.951 (81.743) Acc@5 94.238 (95.994) Mem 14939MB [2024-07-25 02:52:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.336 Acc@5 95.947 [2024-07-25 02:52:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.3% [2024-07-25 02:52:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.34% [2024-07-25 02:52:08 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 02:52:08 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 02:52:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.452 (0.452) Loss 0.5659 (0.5659) Acc@1 89.258 (89.258) Acc@5 98.682 (98.682) Mem 14939MB [2024-07-25 02:52:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.9102 (0.7073) Acc@1 80.371 (85.733) Acc@5 95.703 (97.536) Mem 14939MB [2024-07-25 02:52:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.0449 (0.8345) Acc@1 75.537 (82.213) Acc@5 94.336 (96.236) Mem 14939MB [2024-07-25 02:52:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.826 Acc@5 96.205 [2024-07-25 02:52:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.8% [2024-07-25 02:52:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.83% [2024-07-25 02:52:11 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 02:52:12 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 02:52:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][0/625] eta 0:08:45 lr 0.000679 wd 0.0500 time 0.8407 (0.8407) data time 0.4578 (0.4578) model time 0.0000 (0.0000) loss 7.5551 (7.5551) grad_norm 2.6560 (2.6560) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:52:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][10/625] eta 0:04:42 lr 0.000679 wd 0.0500 time 0.3969 (0.4599) data time 0.0008 (0.0424) model time 0.0000 (0.0000) loss 7.9833 (6.9426) grad_norm 2.8821 (2.8181) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:52:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][20/625] eta 0:04:20 lr 0.000679 wd 0.0500 time 0.4001 (0.4309) data time 0.0008 (0.0227) model time 0.0000 (0.0000) loss 7.4521 (7.0330) grad_norm 2.1040 (2.5175) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:52:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][30/625] eta 0:04:10 lr 0.000679 wd 0.0500 time 0.3975 (0.4209) data time 0.0009 (0.0156) model time 0.0000 (0.0000) loss 6.5176 (7.0274) grad_norm 1.8125 (2.7623) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:52:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][40/625] eta 0:04:03 lr 0.000679 wd 0.0500 time 0.3957 (0.4156) data time 0.0007 (0.0120) model time 0.0000 (0.0000) loss 7.1154 (7.1262) grad_norm 1.9795 (2.8860) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:52:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][50/625] eta 0:03:57 lr 0.000679 wd 0.0500 time 0.4012 (0.4125) data time 0.0006 (0.0098) model time 0.0000 (0.0000) loss 7.1245 (7.0947) grad_norm 2.4691 (2.8579) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:52:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][60/625] eta 0:03:52 lr 0.000678 wd 0.0500 time 0.4069 (0.4108) 
data time 0.0006 (0.0084) model time 0.4063 (0.4012) loss 7.0159 (7.1108) grad_norm 3.9317 (2.8712) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:52:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][70/625] eta 0:03:47 lr 0.000678 wd 0.0500 time 0.3976 (0.4095) data time 0.0007 (0.0073) model time 0.3970 (0.4011) loss 5.6357 (7.1789) grad_norm 2.9291 (2.8859) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:52:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][80/625] eta 0:03:42 lr 0.000678 wd 0.0500 time 0.3959 (0.4082) data time 0.0009 (0.0065) model time 0.3949 (0.4000) loss 7.0201 (7.1501) grad_norm 3.2233 (2.8929) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:52:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][90/625] eta 0:03:37 lr 0.000678 wd 0.0500 time 0.3975 (0.4073) data time 0.0007 (0.0059) model time 0.3968 (0.3999) loss 7.4946 (7.1446) grad_norm 2.3205 (2.8913) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:52:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][100/625] eta 0:03:33 lr 0.000678 wd 0.0500 time 0.3993 (0.4066) data time 0.0009 (0.0054) model time 0.3984 (0.3998) loss 8.9242 (7.1258) grad_norm 2.6992 (2.8717) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:52:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][110/625] eta 0:03:29 lr 0.000678 wd 0.0500 time 0.3994 (0.4059) data time 0.0009 (0.0050) model time 0.3985 (0.3995) loss 7.2481 (7.1504) grad_norm 2.2365 (2.8570) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:53:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][120/625] eta 0:03:24 lr 0.000678 wd 0.0500 time 0.4013 (0.4055) data time 0.0007 (0.0047) model time 0.4006 (0.3995) loss 7.6280 (7.1822) grad_norm 6.4372 (2.8285) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
02:53:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][130/625] eta 0:03:20 lr 0.000678 wd 0.0500 time 0.4009 (0.4052) data time 0.0008 (0.0044) model time 0.4001 (0.3996) loss 6.9688 (7.1902) grad_norm 1.9064 (2.8151) loss_scale 1024.0000 (539.3588) mem 14939MB [2024-07-25 02:53:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][140/625] eta 0:03:16 lr 0.000678 wd 0.0500 time 0.3978 (0.4048) data time 0.0008 (0.0042) model time 0.3969 (0.3996) loss 6.6426 (7.1812) grad_norm 2.3979 (2.7965) loss_scale 1024.0000 (573.7305) mem 14939MB [2024-07-25 02:53:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][150/625] eta 0:03:12 lr 0.000678 wd 0.0500 time 0.4005 (0.4044) data time 0.0007 (0.0039) model time 0.3998 (0.3994) loss 7.0659 (7.1764) grad_norm 2.6291 (2.7671) loss_scale 1024.0000 (603.5497) mem 14939MB [2024-07-25 02:53:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][160/625] eta 0:03:08 lr 0.000677 wd 0.0500 time 0.3966 (0.4061) data time 0.0006 (0.0038) model time 0.3960 (0.4023) loss 8.1626 (7.1999) grad_norm 2.4032 (2.7757) loss_scale 1024.0000 (629.6646) mem 14939MB [2024-07-25 02:53:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][170/625] eta 0:03:06 lr 0.000677 wd 0.0500 time 0.5817 (0.4109) data time 0.0008 (0.0036) model time 0.5809 (0.4093) loss 7.5289 (7.1975) grad_norm 2.4863 (2.7688) loss_scale 1024.0000 (652.7251) mem 14939MB [2024-07-25 02:53:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][180/625] eta 0:03:04 lr 0.000677 wd 0.0500 time 0.3991 (0.4149) data time 0.0007 (0.0034) model time 0.3984 (0.4149) loss 6.3554 (7.2147) grad_norm 1.9396 (2.7192) loss_scale 1024.0000 (673.2376) mem 14939MB [2024-07-25 02:53:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][190/625] eta 0:03:01 lr 0.000677 wd 0.0500 time 0.3984 (0.4180) 
data time 0.0006 (0.0033) model time 0.3978 (0.4190) loss 7.4816 (7.2177) grad_norm 3.6650 (2.7219) loss_scale 1024.0000 (691.6021) mem 14939MB [2024-07-25 02:53:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][200/625] eta 0:02:58 lr 0.000677 wd 0.0500 time 0.3988 (0.4190) data time 0.0007 (0.0032) model time 0.3982 (0.4202) loss 7.1803 (7.2345) grad_norm 2.3393 (2.7337) loss_scale 1024.0000 (708.1393) mem 14939MB [2024-07-25 02:53:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][210/625] eta 0:02:53 lr 0.000677 wd 0.0500 time 0.4010 (0.4181) data time 0.0009 (0.0031) model time 0.4001 (0.4189) loss 6.1256 (7.2352) grad_norm 3.4031 (2.7623) loss_scale 1024.0000 (723.1090) mem 14939MB [2024-07-25 02:53:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][220/625] eta 0:02:48 lr 0.000677 wd 0.0500 time 0.4114 (0.4173) data time 0.0009 (0.0030) model time 0.4104 (0.4178) loss 8.0560 (7.2188) grad_norm 1.9241 (2.7516) loss_scale 1024.0000 (736.7240) mem 14939MB [2024-07-25 02:53:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][230/625] eta 0:02:44 lr 0.000677 wd 0.0500 time 0.3925 (0.4172) data time 0.0009 (0.0029) model time 0.3915 (0.4176) loss 5.5601 (7.2212) grad_norm 2.1605 (2.7470) loss_scale 1024.0000 (749.1602) mem 14939MB [2024-07-25 02:53:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][240/625] eta 0:02:40 lr 0.000677 wd 0.0500 time 0.4020 (0.4165) data time 0.0008 (0.0028) model time 0.4012 (0.4166) loss 5.9801 (7.2066) grad_norm 2.0969 (2.7423) loss_scale 1024.0000 (760.5643) mem 14939MB [2024-07-25 02:53:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][250/625] eta 0:02:35 lr 0.000676 wd 0.0500 time 0.4021 (0.4158) data time 0.0006 (0.0028) model time 0.4015 (0.4157) loss 6.9960 (7.2128) grad_norm 2.5714 (2.7657) loss_scale 1024.0000 (771.0598) mem 14939MB 
[2024-07-25 02:54:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][260/625] eta 0:02:31 lr 0.000676 wd 0.0500 time 0.3978 (0.4151) data time 0.0007 (0.0027) model time 0.3971 (0.4148) loss 8.0517 (7.2192) grad_norm 1.9935 (2.7792) loss_scale 1024.0000 (780.7510) mem 14939MB [2024-07-25 02:54:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][270/625] eta 0:02:27 lr 0.000676 wd 0.0500 time 0.4023 (0.4145) data time 0.0007 (0.0026) model time 0.4016 (0.4141) loss 8.0985 (7.2217) grad_norm 2.5797 (2.7674) loss_scale 1024.0000 (789.7269) mem 14939MB [2024-07-25 02:54:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][280/625] eta 0:02:22 lr 0.000676 wd 0.0500 time 0.4123 (0.4141) data time 0.0006 (0.0026) model time 0.4117 (0.4135) loss 8.0461 (7.2241) grad_norm 2.1844 (2.7532) loss_scale 1024.0000 (798.0641) mem 14939MB [2024-07-25 02:54:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][290/625] eta 0:02:18 lr 0.000676 wd 0.0500 time 0.3975 (0.4136) data time 0.0009 (0.0025) model time 0.3965 (0.4129) loss 7.4148 (7.2171) grad_norm 2.3515 (2.7529) loss_scale 1024.0000 (805.8282) mem 14939MB [2024-07-25 02:54:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][300/625] eta 0:02:14 lr 0.000676 wd 0.0500 time 0.3969 (0.4132) data time 0.0006 (0.0025) model time 0.3963 (0.4124) loss 8.4412 (7.2307) grad_norm 2.4074 (2.7425) loss_scale 1024.0000 (813.0764) mem 14939MB [2024-07-25 02:54:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][310/625] eta 0:02:10 lr 0.000676 wd 0.0500 time 0.4044 (0.4128) data time 0.0009 (0.0024) model time 0.4035 (0.4119) loss 7.9882 (7.2282) grad_norm 1.8321 (inf) loss_scale 512.0000 (813.2733) mem 14939MB [2024-07-25 02:54:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][320/625] eta 0:02:05 lr 0.000676 wd 0.0500 time 0.3968 
(0.4123) data time 0.0008 (0.0024) model time 0.3960 (0.4113) loss 7.6076 (7.2323) grad_norm 2.6998 (inf) loss_scale 512.0000 (803.8879) mem 14939MB [2024-07-25 02:54:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][330/625] eta 0:02:01 lr 0.000676 wd 0.0500 time 0.3986 (0.4120) data time 0.0009 (0.0023) model time 0.3977 (0.4109) loss 5.8823 (7.2154) grad_norm 2.7879 (inf) loss_scale 512.0000 (795.0695) mem 14939MB [2024-07-25 02:54:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][340/625] eta 0:01:57 lr 0.000676 wd 0.0500 time 0.4040 (0.4116) data time 0.0008 (0.0023) model time 0.4032 (0.4105) loss 7.3893 (7.2215) grad_norm 3.6211 (inf) loss_scale 512.0000 (786.7683) mem 14939MB [2024-07-25 02:54:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][350/625] eta 0:01:53 lr 0.000675 wd 0.0500 time 0.3990 (0.4113) data time 0.0009 (0.0022) model time 0.3981 (0.4101) loss 5.7372 (7.2261) grad_norm 1.8469 (inf) loss_scale 512.0000 (778.9402) mem 14939MB [2024-07-25 02:54:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][360/625] eta 0:01:48 lr 0.000675 wd 0.0500 time 0.3985 (0.4110) data time 0.0010 (0.0022) model time 0.3975 (0.4098) loss 7.1585 (7.2175) grad_norm 3.5875 (inf) loss_scale 512.0000 (771.5457) mem 14939MB [2024-07-25 02:54:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][370/625] eta 0:01:44 lr 0.000675 wd 0.0500 time 0.4004 (0.4108) data time 0.0009 (0.0022) model time 0.3995 (0.4096) loss 6.2465 (7.2062) grad_norm 2.9552 (inf) loss_scale 512.0000 (764.5499) mem 14939MB [2024-07-25 02:54:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][380/625] eta 0:01:40 lr 0.000675 wd 0.0500 time 0.3973 (0.4117) data time 0.0007 (0.0021) model time 0.3966 (0.4107) loss 6.6509 (7.1960) grad_norm 1.7398 (inf) loss_scale 512.0000 (757.9213) mem 14939MB [2024-07-25 02:54:53 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][390/625] eta 0:01:37 lr 0.000675 wd 0.0500 time 0.5662 (0.4132) data time 0.0007 (0.0021) model time 0.5655 (0.4123) loss 7.3506 (7.2013) grad_norm 7.8496 (inf) loss_scale 512.0000 (751.6317) mem 14939MB [2024-07-25 02:54:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][400/625] eta 0:01:33 lr 0.000675 wd 0.0500 time 0.5784 (0.4146) data time 0.0006 (0.0021) model time 0.5778 (0.4140) loss 7.8293 (7.2028) grad_norm 3.4552 (inf) loss_scale 512.0000 (745.6559) mem 14939MB [2024-07-25 02:55:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][410/625] eta 0:01:29 lr 0.000675 wd 0.0500 time 0.4040 (0.4159) data time 0.0007 (0.0020) model time 0.4033 (0.4154) loss 6.8408 (7.2042) grad_norm 2.3475 (inf) loss_scale 512.0000 (739.9708) mem 14939MB [2024-07-25 02:55:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][420/625] eta 0:01:25 lr 0.000675 wd 0.0500 time 0.4000 (0.4169) data time 0.0008 (0.0020) model time 0.3992 (0.4166) loss 6.0942 (7.1948) grad_norm 2.4490 (inf) loss_scale 512.0000 (734.5558) mem 14939MB [2024-07-25 02:55:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][430/625] eta 0:01:21 lr 0.000675 wd 0.0500 time 0.4036 (0.4165) data time 0.0007 (0.0020) model time 0.4029 (0.4161) loss 8.6606 (7.1969) grad_norm 3.6942 (inf) loss_scale 512.0000 (729.3921) mem 14939MB [2024-07-25 02:55:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][440/625] eta 0:01:16 lr 0.000674 wd 0.0500 time 0.3956 (0.4161) data time 0.0006 (0.0020) model time 0.3950 (0.4157) loss 6.9233 (7.1976) grad_norm 4.2603 (inf) loss_scale 512.0000 (724.4626) mem 14939MB [2024-07-25 02:55:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][450/625] eta 0:01:12 lr 0.000674 wd 0.0500 time 0.3798 (0.4162) data time 0.0009 (0.0019) model 
time 0.3790 (0.4158) loss 7.3693 (7.2049) grad_norm 2.7995 (inf) loss_scale 512.0000 (719.7517) mem 14939MB [2024-07-25 02:55:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][460/625] eta 0:01:08 lr 0.000674 wd 0.0500 time 0.4034 (0.4159) data time 0.0009 (0.0019) model time 0.4025 (0.4154) loss 7.8975 (7.2011) grad_norm 4.8255 (inf) loss_scale 512.0000 (715.2451) mem 14939MB [2024-07-25 02:55:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][470/625] eta 0:01:04 lr 0.000674 wd 0.0500 time 0.3955 (0.4155) data time 0.0009 (0.0019) model time 0.3946 (0.4150) loss 7.3260 (7.2039) grad_norm 2.4864 (inf) loss_scale 512.0000 (710.9299) mem 14939MB [2024-07-25 02:55:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][480/625] eta 0:01:00 lr 0.000674 wd 0.0500 time 0.3993 (0.4152) data time 0.0006 (0.0019) model time 0.3987 (0.4146) loss 7.9932 (7.2067) grad_norm 2.1609 (inf) loss_scale 512.0000 (706.7942) mem 14939MB [2024-07-25 02:55:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][490/625] eta 0:00:56 lr 0.000674 wd 0.0500 time 0.3956 (0.4149) data time 0.0006 (0.0019) model time 0.3950 (0.4143) loss 6.5959 (7.2102) grad_norm 2.4176 (inf) loss_scale 512.0000 (702.8269) mem 14939MB [2024-07-25 02:55:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][500/625] eta 0:00:51 lr 0.000674 wd 0.0500 time 0.3978 (0.4146) data time 0.0006 (0.0018) model time 0.3972 (0.4139) loss 8.1936 (7.2095) grad_norm 2.8922 (inf) loss_scale 512.0000 (699.0180) mem 14939MB [2024-07-25 02:55:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][510/625] eta 0:00:47 lr 0.000674 wd 0.0500 time 0.4028 (0.4143) data time 0.0006 (0.0018) model time 0.4021 (0.4136) loss 7.8066 (7.2090) grad_norm 2.6811 (inf) loss_scale 512.0000 (695.3581) mem 14939MB [2024-07-25 02:55:48 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [149/300][520/625] eta 0:00:43 lr 0.000674 wd 0.0500 time 0.4006 (0.4141) data time 0.0008 (0.0018) model time 0.3998 (0.4134) loss 6.8896 (7.2070) grad_norm 2.7669 (inf) loss_scale 512.0000 (691.8388) mem 14939MB [2024-07-25 02:55:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][530/625] eta 0:00:39 lr 0.000674 wd 0.0500 time 0.3981 (0.4139) data time 0.0008 (0.0018) model time 0.3972 (0.4131) loss 7.6485 (7.2131) grad_norm 1.9682 (inf) loss_scale 512.0000 (688.4520) mem 14939MB [2024-07-25 02:55:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][540/625] eta 0:00:35 lr 0.000673 wd 0.0500 time 0.4002 (0.4137) data time 0.0009 (0.0018) model time 0.3992 (0.4129) loss 7.8790 (7.2099) grad_norm 2.5207 (inf) loss_scale 512.0000 (685.1904) mem 14939MB [2024-07-25 02:56:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][550/625] eta 0:00:31 lr 0.000673 wd 0.0500 time 0.3967 (0.4134) data time 0.0009 (0.0017) model time 0.3958 (0.4126) loss 5.6013 (7.2088) grad_norm 3.4299 (inf) loss_scale 512.0000 (682.0472) mem 14939MB [2024-07-25 02:56:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][560/625] eta 0:00:26 lr 0.000673 wd 0.0500 time 0.3958 (0.4132) data time 0.0006 (0.0017) model time 0.3952 (0.4124) loss 7.5039 (7.2094) grad_norm 3.2971 (inf) loss_scale 512.0000 (679.0160) mem 14939MB [2024-07-25 02:56:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][570/625] eta 0:00:22 lr 0.000673 wd 0.0500 time 0.3988 (0.4131) data time 0.0009 (0.0017) model time 0.3979 (0.4122) loss 7.3053 (7.2159) grad_norm 2.5561 (inf) loss_scale 512.0000 (676.0911) mem 14939MB [2024-07-25 02:56:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][580/625] eta 0:00:18 lr 0.000673 wd 0.0500 time 0.4033 (0.4128) data time 0.0008 (0.0017) model time 0.4025 (0.4119) loss 
8.1027 (7.2118) grad_norm 1.7304 (inf) loss_scale 512.0000 (673.2668) mem 14939MB [2024-07-25 02:56:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][590/625] eta 0:00:14 lr 0.000673 wd 0.0500 time 0.3980 (0.4126) data time 0.0006 (0.0017) model time 0.3973 (0.4117) loss 7.3422 (7.2161) grad_norm 2.3887 (inf) loss_scale 512.0000 (670.5381) mem 14939MB [2024-07-25 02:56:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][600/625] eta 0:00:10 lr 0.000673 wd 0.0500 time 0.5794 (0.4130) data time 0.0006 (0.0017) model time 0.5787 (0.4122) loss 7.7364 (7.2198) grad_norm 3.2184 (inf) loss_scale 512.0000 (667.9002) mem 14939MB [2024-07-25 02:56:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][610/625] eta 0:00:06 lr 0.000673 wd 0.0500 time 0.5614 (0.4145) data time 0.0004 (0.0017) model time 0.5610 (0.4138) loss 7.3888 (7.2206) grad_norm 2.0406 (inf) loss_scale 512.0000 (665.3486) mem 14939MB [2024-07-25 02:56:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][620/625] eta 0:00:02 lr 0.000673 wd 0.0500 time 0.5843 (0.4153) data time 0.0005 (0.0017) model time 0.5838 (0.4147) loss 8.9247 (7.2206) grad_norm 1.9193 (inf) loss_scale 512.0000 (662.8792) mem 14939MB [2024-07-25 02:56:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 149 training takes 0:04:19 [2024-07-25 02:56:32 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 02:56:32 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 02:56:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.456 (0.456) Loss 0.6040 (0.6040) Acc@1 88.428 (88.428) Acc@5 98.438 (98.438) Mem 14939MB [2024-07-25 02:56:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.9751 (0.7460) Acc@1 79.199 (85.085) Acc@5 95.215 (97.212) Mem 14939MB [2024-07-25 02:56:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.0850 (0.8708) Acc@1 74.658 (81.701) Acc@5 94.141 (95.977) Mem 14939MB [2024-07-25 02:56:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.298 Acc@5 95.931 [2024-07-25 02:56:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.3% [2024-07-25 02:56:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.846 (0.846) Loss 0.5649 (0.5649) Acc@1 89.209 (89.209) Acc@5 98.584 (98.584) Mem 14939MB [2024-07-25 02:56:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.162) Loss 0.9102 (0.7068) Acc@1 80.469 (85.742) Acc@5 95.801 (97.532) Mem 14939MB [2024-07-25 02:56:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.127) Loss 1.0439 (0.8335) Acc@1 75.635 (82.241) Acc@5 94.385 (96.229) Mem 14939MB [2024-07-25 02:56:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.858 Acc@5 96.203 [2024-07-25 02:56:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-07-25 02:56:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.86% [2024-07-25 02:56:38 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 02:56:39 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 02:56:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][0/625] eta 0:08:19 lr 0.000673 wd 0.0500 time 0.7996 (0.7996) data time 0.4214 (0.4214) model time 0.0000 (0.0000) loss 6.7596 (6.7596) grad_norm 3.3971 (3.3971) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:56:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][10/625] eta 0:05:18 lr 0.000672 wd 0.0500 time 0.5675 (0.5174) data time 0.0006 (0.0395) model time 0.0000 (0.0000) loss 7.3848 (6.8885) grad_norm 2.6395 (2.8200) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:56:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][20/625] eta 0:04:46 lr 0.000672 wd 0.0500 time 0.3934 (0.4728) data time 0.0007 (0.0212) model time 0.0000 (0.0000) loss 7.2748 (6.9870) grad_norm 2.0274 (2.6734) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:56:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][30/625] eta 0:04:27 lr 0.000672 wd 0.0500 time 0.3985 (0.4498) data time 0.0009 (0.0146) model time 0.0000 (0.0000) loss 7.6785 (7.0905) grad_norm 2.0080 (2.6120) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:56:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][40/625] eta 0:04:15 lr 0.000672 wd 0.0500 time 0.4008 (0.4375) data time 0.0006 (0.0113) model time 0.0000 (0.0000) loss 7.5041 (7.2564) grad_norm 2.1157 (2.7849) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:57:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][50/625] eta 0:04:07 lr 0.000672 wd 0.0500 time 0.3969 (0.4301) data time 0.0006 (0.0093) model time 0.0000 (0.0000) loss 7.5140 (7.3243) grad_norm 2.3713 (2.6739) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 02:57:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][60/625] eta 0:04:00 lr 0.000672 wd 0.0500 time 0.3958 (0.4249) data time 0.0009 (0.0080) model time 0.3949 (0.3969) loss 6.9640 (7.3037) grad_norm 1.9661 (2.6083) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:57:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][70/625] eta 0:03:53 lr 0.000672 wd 0.0500 time 0.4028 (0.4214) data time 0.0007 (0.0070) model time 0.4022 (0.3979) loss 6.9137 (7.2748) grad_norm 2.0489 (2.5618) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:57:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][80/625] eta 0:03:48 lr 0.000672 wd 0.0500 time 0.3979 (0.4186) data time 0.0010 (0.0062) model time 0.3969 (0.3978) loss 7.7727 (7.3133) grad_norm 2.5209 (3.1096) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:57:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][90/625] eta 0:03:42 lr 0.000672 wd 0.0500 time 0.4043 (0.4168) data time 0.0008 (0.0059) model time 0.4035 (0.3981) loss 7.3591 (7.2951) grad_norm 2.0012 (2.9995) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:57:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][100/625] eta 0:03:37 lr 0.000671 wd 0.0500 time 0.4023 (0.4150) data time 0.0009 (0.0054) model time 0.4014 (0.3980) loss 8.4626 (7.2924) grad_norm 1.9260 (2.9527) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:57:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][110/625] eta 0:03:33 lr 0.000671 wd 0.0500 time 0.4011 (0.4137) data time 0.0007 (0.0050) model time 0.4004 (0.3983) loss 8.1385 (7.3107) grad_norm 1.8337 (2.9188) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:57:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][120/625] eta 0:03:28 lr 0.000671 wd 0.0500 time 
0.3991 (0.4124) data time 0.0007 (0.0047) model time 0.3984 (0.3982) loss 6.5146 (7.3052) grad_norm 2.3618 (2.8882) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:57:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][130/625] eta 0:03:23 lr 0.000671 wd 0.0500 time 0.3979 (0.4119) data time 0.0009 (0.0048) model time 0.3970 (0.3982) loss 7.8929 (7.2949) grad_norm 1.6028 (2.8464) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:57:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][140/625] eta 0:03:19 lr 0.000671 wd 0.0500 time 0.3961 (0.4114) data time 0.0009 (0.0046) model time 0.3952 (0.3989) loss 7.3925 (7.2836) grad_norm 3.4855 (2.8346) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:57:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][150/625] eta 0:03:15 lr 0.000671 wd 0.0500 time 0.4001 (0.4106) data time 0.0009 (0.0043) model time 0.3992 (0.3989) loss 8.0752 (7.3076) grad_norm 2.1534 (2.8080) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:57:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][160/625] eta 0:03:10 lr 0.000671 wd 0.0500 time 0.4006 (0.4100) data time 0.0007 (0.0041) model time 0.3999 (0.3989) loss 7.3034 (7.3009) grad_norm 2.2046 (2.7966) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:57:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][170/625] eta 0:03:06 lr 0.000671 wd 0.0500 time 0.3956 (0.4095) data time 0.0010 (0.0039) model time 0.3947 (0.3990) loss 6.6143 (7.2824) grad_norm 2.6160 (2.7778) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:57:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][180/625] eta 0:03:01 lr 0.000671 wd 0.0500 time 0.3993 (0.4090) data time 0.0007 (0.0038) model time 0.3987 (0.3991) loss 8.0471 (7.3052) grad_norm 1.8380 (2.7541) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 02:57:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][190/625] eta 0:02:57 lr 0.000670 wd 0.0500 time 0.4029 (0.4086) data time 0.0008 (0.0037) model time 0.4021 (0.3991) loss 8.3326 (7.3049) grad_norm 1.8637 (2.7264) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:58:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][200/625] eta 0:02:54 lr 0.000670 wd 0.0500 time 0.3977 (0.4110) data time 0.0006 (0.0036) model time 0.3971 (0.4029) loss 6.2576 (7.3035) grad_norm 2.2452 (2.7588) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:58:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][210/625] eta 0:02:52 lr 0.000670 wd 0.0500 time 0.5864 (0.4165) data time 0.0007 (0.0034) model time 0.5858 (0.4105) loss 6.8734 (7.2946) grad_norm 2.2625 (2.7418) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:58:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][220/625] eta 0:02:49 lr 0.000670 wd 0.0500 time 0.5344 (0.4186) data time 0.0011 (0.0033) model time 0.5333 (0.4136) loss 6.1935 (7.2992) grad_norm 2.4810 (2.7436) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:58:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][230/625] eta 0:02:45 lr 0.000670 wd 0.0500 time 0.3947 (0.4202) data time 0.0008 (0.0032) model time 0.3938 (0.4159) loss 6.8346 (7.2920) grad_norm 1.9263 (2.7461) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:58:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][240/625] eta 0:02:41 lr 0.000670 wd 0.0500 time 0.3969 (0.4201) data time 0.0006 (0.0031) model time 0.3962 (0.4160) loss 5.9704 (7.2755) grad_norm 2.4188 (2.7179) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:58:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][250/625] eta 0:02:37 lr 0.000670 wd 0.0500 time 
0.4051 (0.4194) data time 0.0007 (0.0030) model time 0.4044 (0.4152) loss 6.1958 (7.2580) grad_norm 2.9440 (2.7095) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:58:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][260/625] eta 0:02:32 lr 0.000670 wd 0.0500 time 0.4014 (0.4187) data time 0.0008 (0.0030) model time 0.4006 (0.4144) loss 7.7061 (7.2520) grad_norm 4.1654 (2.6993) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:58:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][270/625] eta 0:02:28 lr 0.000670 wd 0.0500 time 0.4017 (0.4180) data time 0.0006 (0.0029) model time 0.4011 (0.4138) loss 8.0887 (7.2531) grad_norm 2.4352 (2.6975) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:58:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][280/625] eta 0:02:24 lr 0.000670 wd 0.0500 time 0.4053 (0.4174) data time 0.0008 (0.0028) model time 0.4044 (0.4132) loss 7.0759 (7.2536) grad_norm 4.9445 (2.7039) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:58:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][290/625] eta 0:02:19 lr 0.000669 wd 0.0500 time 0.3987 (0.4168) data time 0.0008 (0.0028) model time 0.3979 (0.4126) loss 6.1126 (7.2499) grad_norm 2.0463 (2.7285) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:58:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][300/625] eta 0:02:15 lr 0.000669 wd 0.0500 time 0.3964 (0.4163) data time 0.0008 (0.0027) model time 0.3955 (0.4122) loss 7.4363 (7.2455) grad_norm 2.0247 (2.7160) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:58:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][310/625] eta 0:02:11 lr 0.000669 wd 0.0500 time 0.4125 (0.4159) data time 0.0009 (0.0028) model time 0.4116 (0.4116) loss 6.4403 (7.2446) grad_norm 1.8466 (2.7026) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 02:58:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][320/625] eta 0:02:06 lr 0.000669 wd 0.0500 time 0.3951 (0.4155) data time 0.0007 (0.0027) model time 0.3944 (0.4112) loss 6.3613 (7.2457) grad_norm 1.9506 (2.6950) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:58:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][330/625] eta 0:02:02 lr 0.000669 wd 0.0500 time 0.4009 (0.4151) data time 0.0006 (0.0027) model time 0.4003 (0.4108) loss 8.8133 (7.2537) grad_norm 9.9680 (2.7092) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:59:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][340/625] eta 0:01:58 lr 0.000669 wd 0.0500 time 0.4048 (0.4147) data time 0.0008 (0.0026) model time 0.4040 (0.4105) loss 7.5531 (7.2471) grad_norm 3.7976 (2.7227) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:59:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][350/625] eta 0:01:53 lr 0.000669 wd 0.0500 time 0.3957 (0.4142) data time 0.0006 (0.0026) model time 0.3951 (0.4100) loss 7.6907 (7.2310) grad_norm 1.6997 (2.7181) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:59:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][360/625] eta 0:01:49 lr 0.000669 wd 0.0500 time 0.3944 (0.4139) data time 0.0007 (0.0026) model time 0.3937 (0.4098) loss 6.2833 (7.2218) grad_norm 2.0656 (2.7200) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:59:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][370/625] eta 0:01:45 lr 0.000669 wd 0.0500 time 0.4104 (0.4136) data time 0.0006 (0.0025) model time 0.4098 (0.4096) loss 6.9501 (7.2241) grad_norm 2.4800 (2.7446) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:59:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][380/625] eta 0:01:41 lr 0.000668 wd 0.0500 time 
0.3961 (0.4133) data time 0.0009 (0.0025) model time 0.3952 (0.4093) loss 7.2258 (7.2325) grad_norm 3.0305 (2.7513) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:59:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][390/625] eta 0:01:37 lr 0.000668 wd 0.0500 time 0.4085 (0.4131) data time 0.0007 (0.0024) model time 0.4078 (0.4091) loss 6.7210 (7.2226) grad_norm 2.8264 (2.7558) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:59:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][400/625] eta 0:01:32 lr 0.000668 wd 0.0500 time 0.4090 (0.4128) data time 0.0008 (0.0024) model time 0.4082 (0.4089) loss 7.5701 (7.2367) grad_norm 3.7340 (2.7549) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:59:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][410/625] eta 0:01:28 lr 0.000668 wd 0.0500 time 0.3962 (0.4126) data time 0.0009 (0.0024) model time 0.3953 (0.4087) loss 6.9593 (7.2371) grad_norm 1.8729 (2.7561) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:59:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][420/625] eta 0:01:24 lr 0.000668 wd 0.0500 time 0.5831 (0.4139) data time 0.0009 (0.0023) model time 0.5822 (0.4103) loss 7.7841 (7.2323) grad_norm 1.8171 (2.7579) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:59:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][430/625] eta 0:01:21 lr 0.000668 wd 0.0500 time 0.3948 (0.4156) data time 0.0008 (0.0023) model time 0.3940 (0.4123) loss 7.5091 (7.2312) grad_norm 2.9505 (2.7606) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:59:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][440/625] eta 0:01:17 lr 0.000668 wd 0.0500 time 0.6009 (0.4169) data time 0.0009 (0.0023) model time 0.6000 (0.4138) loss 6.6977 (7.2306) grad_norm 3.1322 (2.7632) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 02:59:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][450/625] eta 0:01:13 lr 0.000668 wd 0.0500 time 0.5604 (0.4176) data time 0.0008 (0.0023) model time 0.5596 (0.4147) loss 7.7909 (7.2394) grad_norm 2.8876 (2.7758) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:59:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][460/625] eta 0:01:08 lr 0.000668 wd 0.0500 time 0.4036 (0.4176) data time 0.0007 (0.0022) model time 0.4029 (0.4147) loss 8.0823 (7.2381) grad_norm 3.2881 (2.7954) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 02:59:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][470/625] eta 0:01:04 lr 0.000668 wd 0.0500 time 0.3969 (0.4172) data time 0.0008 (0.0022) model time 0.3961 (0.4143) loss 6.7161 (7.2384) grad_norm 2.7358 (2.7929) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:00:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][480/625] eta 0:01:00 lr 0.000667 wd 0.0500 time 0.4104 (0.4169) data time 0.0008 (0.0022) model time 0.4096 (0.4140) loss 6.5031 (7.2321) grad_norm 3.6469 (2.7948) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:00:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][490/625] eta 0:00:56 lr 0.000667 wd 0.0500 time 0.3995 (0.4165) data time 0.0009 (0.0021) model time 0.3986 (0.4137) loss 6.9386 (7.2254) grad_norm 3.2407 (2.7909) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:00:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][500/625] eta 0:00:52 lr 0.000667 wd 0.0500 time 0.3970 (0.4162) data time 0.0008 (0.0021) model time 0.3961 (0.4133) loss 7.1606 (7.2234) grad_norm 2.5560 (2.7760) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:00:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][510/625] eta 0:00:47 lr 0.000667 wd 0.0500 time 
0.4014 (0.4160) data time 0.0009 (0.0021) model time 0.4005 (0.4132) loss 7.3086 (7.2199) grad_norm 2.5225 (2.7667) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:00:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][520/625] eta 0:00:43 lr 0.000667 wd 0.0500 time 0.3990 (0.4157) data time 0.0009 (0.0021) model time 0.3981 (0.4128) loss 7.4809 (7.2180) grad_norm 2.5756 (2.7612) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:00:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][530/625] eta 0:00:39 lr 0.000667 wd 0.0500 time 0.4006 (0.4154) data time 0.0006 (0.0021) model time 0.3999 (0.4126) loss 8.3898 (7.2250) grad_norm 2.1446 (2.7513) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:00:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][540/625] eta 0:00:35 lr 0.000667 wd 0.0500 time 0.3977 (0.4152) data time 0.0008 (0.0021) model time 0.3969 (0.4123) loss 6.8938 (7.2231) grad_norm 2.9869 (2.7388) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:00:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][550/625] eta 0:00:31 lr 0.000667 wd 0.0500 time 0.3981 (0.4149) data time 0.0006 (0.0020) model time 0.3975 (0.4120) loss 7.8427 (7.2292) grad_norm 1.9717 (2.7243) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:00:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][560/625] eta 0:00:26 lr 0.000667 wd 0.0500 time 0.3972 (0.4146) data time 0.0008 (0.0020) model time 0.3964 (0.4118) loss 7.4346 (7.2274) grad_norm 2.4887 (2.7173) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:00:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][570/625] eta 0:00:22 lr 0.000666 wd 0.0500 time 0.4002 (0.4144) data time 0.0009 (0.0020) model time 0.3993 (0.4115) loss 7.8374 (7.2273) grad_norm 2.3851 (2.7068) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:00:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][580/625] eta 0:00:18 lr 0.000666 wd 0.0500 time 0.3966 (0.4141) data time 0.0007 (0.0020) model time 0.3959 (0.4113) loss 6.5545 (7.2192) grad_norm 2.6815 (2.7294) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:00:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][590/625] eta 0:00:14 lr 0.000666 wd 0.0500 time 0.3946 (0.4139) data time 0.0008 (0.0020) model time 0.3938 (0.4111) loss 7.1010 (7.2220) grad_norm 2.0766 (2.7222) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:00:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][600/625] eta 0:00:10 lr 0.000666 wd 0.0500 time 0.3978 (0.4137) data time 0.0011 (0.0020) model time 0.3967 (0.4109) loss 7.4635 (7.2284) grad_norm 3.0085 (2.7262) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:00:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][610/625] eta 0:00:06 lr 0.000666 wd 0.0500 time 0.3998 (0.4135) data time 0.0004 (0.0019) model time 0.3994 (0.4107) loss 6.0237 (7.2321) grad_norm 2.0827 (2.7427) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:00:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][620/625] eta 0:00:02 lr 0.000666 wd 0.0500 time 0.3991 (0.4133) data time 0.0004 (0.0019) model time 0.3987 (0.4105) loss 6.0607 (7.2344) grad_norm 2.0376 (2.7326) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:00:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 150 training takes 0:04:18 [2024-07-25 03:00:57 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 03:00:59 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 03:00:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.455 (0.455) Loss 0.6055 (0.6055) Acc@1 89.160 (89.160) Acc@5 98.584 (98.584) Mem 14939MB [2024-07-25 03:01:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.9521 (0.7485) Acc@1 79.053 (85.192) Acc@5 95.410 (97.283) Mem 14939MB [2024-07-25 03:01:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.0635 (0.8743) Acc@1 76.074 (81.652) Acc@5 94.385 (95.973) Mem 14939MB [2024-07-25 03:01:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.262 Acc@5 95.955 [2024-07-25 03:01:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.3% [2024-07-25 03:01:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.765 (0.765) Loss 0.5645 (0.5645) Acc@1 89.258 (89.258) Acc@5 98.584 (98.584) Mem 14939MB [2024-07-25 03:01:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.154) Loss 0.9092 (0.7061) Acc@1 80.322 (85.751) Acc@5 95.801 (97.532) Mem 14939MB [2024-07-25 03:01:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.122) Loss 1.0430 (0.8325) Acc@1 75.684 (82.273) Acc@5 94.336 (96.231) Mem 14939MB [2024-07-25 03:01:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.894 Acc@5 96.219 [2024-07-25 03:01:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-07-25 03:01:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.89% [2024-07-25 03:01:04 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 03:01:05 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 03:01:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][0/625] eta 0:08:09 lr 0.000666 wd 0.0500 time 0.7824 (0.7824) data time 0.3730 (0.3730) model time 0.0000 (0.0000) loss 7.1634 (7.1634) grad_norm 2.5590 (2.5590) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:01:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][10/625] eta 0:04:39 lr 0.000666 wd 0.0500 time 0.6218 (0.4546) data time 0.0008 (0.0347) model time 0.0000 (0.0000) loss 6.8127 (7.0451) grad_norm 1.6279 (2.3090) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:01:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][20/625] eta 0:04:38 lr 0.000666 wd 0.0500 time 0.5842 (0.4607) data time 0.0007 (0.0187) model time 0.0000 (0.0000) loss 7.5081 (7.0580) grad_norm 2.6272 (2.6364) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:01:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][30/625] eta 0:04:35 lr 0.000666 wd 0.0500 time 0.3970 (0.4627) data time 0.0007 (0.0129) model time 0.0000 (0.0000) loss 8.5491 (7.1177) grad_norm 2.8253 (2.5338) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:01:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][40/625] eta 0:04:33 lr 0.000665 wd 0.0500 time 0.5953 (0.4673) data time 0.0009 (0.0101) model time 0.0000 (0.0000) loss 7.4058 (7.1215) grad_norm 2.0525 (2.4437) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:01:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][50/625] eta 0:04:26 lr 0.000665 wd 0.0500 time 0.3990 (0.4632) data time 0.0006 (0.0083) model time 0.0000 (0.0000) loss 6.4069 (7.0810) grad_norm 1.6721 (2.5122) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:01:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][60/625] eta 0:04:15 lr 0.000665 wd 0.0500 time 0.3969 (0.4528) data time 0.0007 (0.0071) model time 0.3961 (0.3990) loss 6.5704 (7.0958) grad_norm 1.8565 (2.4770) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:01:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][70/625] eta 0:04:07 lr 0.000665 wd 0.0500 time 0.3948 (0.4456) data time 0.0010 (0.0062) model time 0.3938 (0.3997) loss 7.2000 (7.1098) grad_norm 2.1635 (2.5320) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:01:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][80/625] eta 0:03:59 lr 0.000665 wd 0.0500 time 0.4026 (0.4399) data time 0.0009 (0.0056) model time 0.4017 (0.3993) loss 7.6754 (7.0857) grad_norm 2.1557 (2.6022) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:01:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][90/625] eta 0:03:52 lr 0.000665 wd 0.0500 time 0.3985 (0.4354) data time 0.0007 (0.0051) model time 0.3978 (0.3990) loss 7.6162 (7.1087) grad_norm 1.6948 (2.6122) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:01:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][100/625] eta 0:03:46 lr 0.000665 wd 0.0500 time 0.3961 (0.4318) data time 0.0009 (0.0047) model time 0.3952 (0.3987) loss 6.9294 (7.1237) grad_norm 2.5092 (2.5681) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:01:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][110/625] eta 0:03:40 lr 0.000665 wd 0.0500 time 0.3982 (0.4288) data time 0.0007 (0.0043) model time 0.3974 (0.3987) loss 8.5457 (7.1566) grad_norm 2.4640 (2.5638) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:01:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][120/625] eta 0:03:35 lr 0.000665 wd 0.0500 time 
0.4025 (0.4265) data time 0.0008 (0.0041) model time 0.4017 (0.3987) loss 7.4302 (7.1885) grad_norm 1.9766 (2.5785) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:02:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][130/625] eta 0:03:30 lr 0.000665 wd 0.0500 time 0.4013 (0.4244) data time 0.0007 (0.0038) model time 0.4006 (0.3987) loss 7.3244 (7.2229) grad_norm 2.2095 (2.5492) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:02:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][140/625] eta 0:03:25 lr 0.000664 wd 0.0500 time 0.3999 (0.4227) data time 0.0010 (0.0036) model time 0.3990 (0.3988) loss 6.5112 (7.2313) grad_norm 1.8992 (2.5288) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:02:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][150/625] eta 0:03:20 lr 0.000664 wd 0.0500 time 0.3994 (0.4212) data time 0.0009 (0.0034) model time 0.3985 (0.3989) loss 7.3157 (7.2361) grad_norm 2.1863 (2.5093) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:02:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][160/625] eta 0:03:15 lr 0.000664 wd 0.0500 time 0.3953 (0.4199) data time 0.0008 (0.0033) model time 0.3945 (0.3989) loss 6.7046 (7.2232) grad_norm 1.8404 (2.4830) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:02:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][170/625] eta 0:03:10 lr 0.000664 wd 0.0500 time 0.3995 (0.4188) data time 0.0008 (0.0032) model time 0.3987 (0.3989) loss 8.2773 (7.2220) grad_norm 2.6311 (2.4624) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:02:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][180/625] eta 0:03:05 lr 0.000664 wd 0.0500 time 0.3957 (0.4176) data time 0.0007 (0.0030) model time 0.3950 (0.3988) loss 6.1075 (7.2042) grad_norm 2.8584 (2.4693) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:02:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][190/625] eta 0:03:01 lr 0.000664 wd 0.0500 time 0.3993 (0.4167) data time 0.0010 (0.0029) model time 0.3983 (0.3988) loss 6.1250 (7.1974) grad_norm 2.0076 (2.4533) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:02:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][200/625] eta 0:02:57 lr 0.000664 wd 0.0500 time 0.3969 (0.4165) data time 0.0008 (0.0028) model time 0.3961 (0.3996) loss 7.4792 (7.1978) grad_norm 2.6701 (2.4415) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:02:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][210/625] eta 0:02:52 lr 0.000664 wd 0.0500 time 0.4105 (0.4158) data time 0.0006 (0.0027) model time 0.4098 (0.3997) loss 7.9330 (7.1913) grad_norm 2.7790 (2.4858) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:02:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][220/625] eta 0:02:48 lr 0.000664 wd 0.0500 time 0.3977 (0.4151) data time 0.0008 (0.0027) model time 0.3969 (0.3997) loss 7.6972 (7.2015) grad_norm 2.3536 (2.5262) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:02:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][230/625] eta 0:02:43 lr 0.000663 wd 0.0500 time 0.3984 (0.4145) data time 0.0008 (0.0026) model time 0.3976 (0.3997) loss 7.6922 (7.2063) grad_norm 3.2420 (2.5304) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:02:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][240/625] eta 0:02:40 lr 0.000663 wd 0.0500 time 0.6140 (0.4177) data time 0.0009 (0.0025) model time 0.6131 (0.4045) loss 7.1540 (7.1863) grad_norm 1.8020 (2.5079) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:02:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][250/625] eta 0:02:37 lr 0.000663 wd 0.0500 time 
0.3987 (0.4201) data time 0.0007 (0.0025) model time 0.3980 (0.4082) loss 7.5284 (7.1787) grad_norm 1.8119 (2.4902) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:02:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][260/625] eta 0:02:33 lr 0.000663 wd 0.0500 time 0.3994 (0.4219) data time 0.0008 (0.0024) model time 0.3987 (0.4109) loss 7.3015 (7.1821) grad_norm 3.9099 (2.4896) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:03:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][270/625] eta 0:02:30 lr 0.000663 wd 0.0500 time 0.4007 (0.4237) data time 0.0008 (0.0023) model time 0.3999 (0.4136) loss 7.2180 (7.1766) grad_norm 3.9220 (2.4816) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:03:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][280/625] eta 0:02:25 lr 0.000663 wd 0.0500 time 0.3973 (0.4228) data time 0.0008 (0.0023) model time 0.3966 (0.4129) loss 7.0385 (7.1792) grad_norm 1.4565 (2.4747) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:03:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][290/625] eta 0:02:21 lr 0.000663 wd 0.0500 time 0.4016 (0.4221) data time 0.0008 (0.0022) model time 0.4008 (0.4124) loss 8.2306 (7.1948) grad_norm 2.4807 (2.4795) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:03:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][300/625] eta 0:02:16 lr 0.000663 wd 0.0500 time 0.3993 (0.4213) data time 0.0007 (0.0022) model time 0.3986 (0.4118) loss 8.0098 (7.2063) grad_norm 2.1063 (2.4694) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:03:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][310/625] eta 0:02:12 lr 0.000663 wd 0.0500 time 0.3980 (0.4206) data time 0.0006 (0.0022) model time 0.3974 (0.4113) loss 7.6002 (7.2073) grad_norm 1.7685 (2.4617) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:03:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][320/625] eta 0:02:08 lr 0.000662 wd 0.0500 time 0.3951 (0.4199) data time 0.0007 (0.0021) model time 0.3944 (0.4108) loss 7.5102 (7.2160) grad_norm 4.6687 (2.4791) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:03:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][330/625] eta 0:02:03 lr 0.000662 wd 0.0500 time 0.3970 (0.4193) data time 0.0007 (0.0021) model time 0.3964 (0.4104) loss 8.5216 (7.2236) grad_norm 1.8694 (2.4861) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:03:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][340/625] eta 0:01:59 lr 0.000662 wd 0.0500 time 0.4031 (0.4188) data time 0.0011 (0.0020) model time 0.4020 (0.4100) loss 8.1084 (7.2221) grad_norm 2.3362 (2.4832) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:03:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][350/625] eta 0:01:55 lr 0.000662 wd 0.0500 time 0.4140 (0.4182) data time 0.0006 (0.0020) model time 0.4134 (0.4096) loss 6.0319 (7.2207) grad_norm 2.0063 (2.4773) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:03:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][360/625] eta 0:01:50 lr 0.000662 wd 0.0500 time 0.3948 (0.4176) data time 0.0008 (0.0020) model time 0.3940 (0.4092) loss 6.4370 (7.2269) grad_norm 1.8608 (2.4750) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:03:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][370/625] eta 0:01:46 lr 0.000662 wd 0.0500 time 0.3938 (0.4171) data time 0.0008 (0.0020) model time 0.3931 (0.4088) loss 7.8362 (7.2296) grad_norm 3.0600 (2.4663) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:03:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][380/625] eta 0:01:42 lr 0.000662 wd 0.0500 time 
0.3953 (0.4166) data time 0.0009 (0.0019) model time 0.3944 (0.4085) loss 6.0710 (7.2190) grad_norm 2.7656 (2.4736) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:03:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][390/625] eta 0:01:37 lr 0.000662 wd 0.0500 time 0.3956 (0.4162) data time 0.0009 (0.0019) model time 0.3947 (0.4082) loss 6.4906 (7.2049) grad_norm 4.7525 (2.4727) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:03:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][400/625] eta 0:01:33 lr 0.000662 wd 0.0500 time 0.3996 (0.4158) data time 0.0009 (0.0019) model time 0.3987 (0.4079) loss 6.9369 (7.2168) grad_norm 2.5931 (2.4906) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:03:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][410/625] eta 0:01:29 lr 0.000662 wd 0.0500 time 0.3945 (0.4154) data time 0.0009 (0.0019) model time 0.3936 (0.4077) loss 8.6442 (7.2177) grad_norm 2.3573 (2.4872) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:04:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][420/625] eta 0:01:25 lr 0.000661 wd 0.0500 time 0.4057 (0.4154) data time 0.0006 (0.0018) model time 0.4051 (0.4079) loss 7.3868 (7.2224) grad_norm 1.9773 (2.5056) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:04:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][430/625] eta 0:01:20 lr 0.000661 wd 0.0500 time 0.3965 (0.4151) data time 0.0007 (0.0018) model time 0.3958 (0.4077) loss 6.9777 (7.2185) grad_norm 2.2639 (2.5079) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:04:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][440/625] eta 0:01:16 lr 0.000661 wd 0.0500 time 0.3984 (0.4148) data time 0.0009 (0.0018) model time 0.3976 (0.4075) loss 7.8044 (7.2167) grad_norm 3.3230 (2.5143) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:04:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][450/625] eta 0:01:12 lr 0.000661 wd 0.0500 time 0.4020 (0.4145) data time 0.0008 (0.0018) model time 0.4012 (0.4073) loss 7.1317 (7.2146) grad_norm 1.6475 (2.5167) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:04:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][460/625] eta 0:01:08 lr 0.000661 wd 0.0500 time 0.6066 (0.4165) data time 0.0008 (0.0018) model time 0.6058 (0.4097) loss 6.2059 (7.2130) grad_norm 3.8166 (2.5238) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:04:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][470/625] eta 0:01:04 lr 0.000661 wd 0.0500 time 0.3984 (0.4181) data time 0.0009 (0.0017) model time 0.3975 (0.4116) loss 7.3427 (7.2135) grad_norm 2.4180 (2.5423) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:04:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][480/625] eta 0:01:00 lr 0.000661 wd 0.0500 time 0.4007 (0.4193) data time 0.0009 (0.0017) model time 0.3998 (0.4131) loss 8.1830 (7.2114) grad_norm 2.1724 (2.5579) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:04:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][490/625] eta 0:00:56 lr 0.000661 wd 0.0500 time 0.5637 (0.4203) data time 0.0009 (0.0017) model time 0.5628 (0.4143) loss 7.1385 (7.2138) grad_norm 2.8575 (2.5645) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:04:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][500/625] eta 0:00:52 lr 0.000661 wd 0.0500 time 0.4182 (0.4199) data time 0.0009 (0.0017) model time 0.4173 (0.4140) loss 7.1022 (7.2145) grad_norm 3.1662 (2.5712) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:04:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][510/625] eta 0:00:48 lr 0.000660 wd 0.0500 time 
0.3977 (0.4196) data time 0.0009 (0.0017) model time 0.3968 (0.4138) loss 6.4948 (7.2180) grad_norm 2.0229 (2.5866) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:04:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][520/625] eta 0:00:44 lr 0.000660 wd 0.0500 time 0.4064 (0.4193) data time 0.0006 (0.0017) model time 0.4057 (0.4136) loss 6.2710 (7.2089) grad_norm 5.5828 (2.6019) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:04:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][530/625] eta 0:00:39 lr 0.000660 wd 0.0500 time 0.3987 (0.4190) data time 0.0008 (0.0017) model time 0.3979 (0.4133) loss 7.8421 (7.2060) grad_norm 4.4230 (2.6519) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:04:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][540/625] eta 0:00:35 lr 0.000660 wd 0.0500 time 0.3985 (0.4187) data time 0.0009 (0.0016) model time 0.3976 (0.4131) loss 7.8505 (7.2014) grad_norm 2.5398 (2.6563) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:04:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][550/625] eta 0:00:31 lr 0.000660 wd 0.0500 time 0.3973 (0.4217) data time 0.0011 (0.0016) model time 0.3962 (0.4165) loss 7.7233 (7.2061) grad_norm 3.1206 (2.6485) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:05:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][560/625] eta 0:00:27 lr 0.000660 wd 0.0500 time 0.4003 (0.4214) data time 0.0006 (0.0016) model time 0.3997 (0.4162) loss 8.7075 (7.2154) grad_norm 2.8215 (2.6460) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:05:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][570/625] eta 0:00:23 lr 0.000660 wd 0.0500 time 0.4145 (0.4210) data time 0.0008 (0.0016) model time 0.4137 (0.4159) loss 7.8918 (7.2076) grad_norm 3.1306 (2.6586) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:05:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][580/625] eta 0:00:18 lr 0.000660 wd 0.0500 time 0.3983 (0.4207) data time 0.0006 (0.0016) model time 0.3977 (0.4156) loss 6.2818 (7.2051) grad_norm 4.7860 (2.6689) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:05:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][590/625] eta 0:00:14 lr 0.000660 wd 0.0500 time 0.3956 (0.4204) data time 0.0009 (0.0016) model time 0.3948 (0.4154) loss 7.9437 (7.2110) grad_norm 2.8082 (2.6798) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:05:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][600/625] eta 0:00:10 lr 0.000660 wd 0.0500 time 0.4072 (0.4200) data time 0.0007 (0.0016) model time 0.4065 (0.4151) loss 6.7345 (7.2104) grad_norm 1.8510 (2.6874) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:05:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][610/625] eta 0:00:06 lr 0.000659 wd 0.0500 time 0.3966 (0.4197) data time 0.0006 (0.0016) model time 0.3960 (0.4148) loss 8.1181 (7.2115) grad_norm 1.9081 (2.6832) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:05:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][620/625] eta 0:00:02 lr 0.000659 wd 0.0500 time 0.3988 (0.4194) data time 0.0004 (0.0016) model time 0.3984 (0.4145) loss 7.6607 (7.2069) grad_norm 3.0193 (2.6862) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:05:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 151 training takes 0:04:22 [2024-07-25 03:05:27 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 03:05:28 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 03:05:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.457 (0.457) Loss 0.5903 (0.5903) Acc@1 88.379 (88.379) Acc@5 98.584 (98.584) Mem 14939MB [2024-07-25 03:05:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.9561 (0.7387) Acc@1 79.053 (85.059) Acc@5 95.459 (97.390) Mem 14939MB [2024-07-25 03:05:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 1.0723 (0.8737) Acc@1 75.000 (81.494) Acc@5 94.092 (95.882) Mem 14939MB [2024-07-25 03:05:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.114 Acc@5 95.803 [2024-07-25 03:05:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.1% [2024-07-25 03:05:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.775 (0.775) Loss 0.5645 (0.5645) Acc@1 89.258 (89.258) Acc@5 98.584 (98.584) Mem 14939MB [2024-07-25 03:05:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.088 (0.154) Loss 0.9082 (0.7057) Acc@1 80.273 (85.720) Acc@5 95.703 (97.514) Mem 14939MB [2024-07-25 03:05:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.121) Loss 1.0420 (0.8318) Acc@1 75.635 (82.238) Acc@5 94.385 (96.222) Mem 14939MB [2024-07-25 03:05:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.870 Acc@5 96.207 [2024-07-25 03:05:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-07-25 03:05:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][0/625] eta 0:12:57 lr 0.000659 wd 0.0500 time 1.2438 (1.2438) data time 0.5946 (0.5946) model time 0.0000 (0.0000) loss 6.6719 (6.6719) grad_norm 1.9119 (1.9119) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:05:39 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][10/625] eta 0:04:53 lr 0.000659 wd 0.0500 time 0.3947 (0.4774) data time 0.0009 (0.0589) model time 0.0000 (0.0000) loss 7.6531 (7.1763) grad_norm 2.7797 (2.1074) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:05:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][20/625] eta 0:04:26 lr 0.000659 wd 0.0500 time 0.3984 (0.4403) data time 0.0009 (0.0313) model time 0.0000 (0.0000) loss 6.8546 (7.2335) grad_norm 2.0411 (2.5010) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:05:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][30/625] eta 0:04:15 lr 0.000659 wd 0.0500 time 0.4106 (0.4292) data time 0.0007 (0.0222) model time 0.0000 (0.0000) loss 6.6842 (7.1759) grad_norm 2.2731 (2.3682) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:05:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][40/625] eta 0:04:06 lr 0.000659 wd 0.0500 time 0.3977 (0.4218) data time 0.0006 (0.0174) model time 0.0000 (0.0000) loss 7.0160 (7.1112) grad_norm 2.8700 (2.4907) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:05:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][50/625] eta 0:04:03 lr 0.000659 wd 0.0500 time 0.5904 (0.4242) data time 0.0009 (0.0142) model time 0.0000 (0.0000) loss 6.6100 (7.0863) grad_norm 3.9955 (2.9059) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:06:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][60/625] eta 0:04:05 lr 0.000659 wd 0.0500 time 0.5894 (0.4340) data time 0.0009 (0.0120) model time 0.5885 (0.4827) loss 7.2592 (7.0578) grad_norm 6.4674 (2.8799) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:06:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][70/625] eta 0:04:05 lr 0.000659 wd 0.0500 time 0.5261 (0.4421) data time 0.0008 
(0.0104) model time 0.5252 (0.4869) loss 8.1688 (7.1064) grad_norm 1.9050 (2.9126) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:06:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][80/625] eta 0:04:03 lr 0.000658 wd 0.0500 time 0.5886 (0.4471) data time 0.0007 (0.0092) model time 0.5879 (0.4851) loss 6.2097 (7.1325) grad_norm 3.9098 (2.9018) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:06:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][90/625] eta 0:03:57 lr 0.000658 wd 0.0500 time 0.4055 (0.4439) data time 0.0008 (0.0086) model time 0.4047 (0.4675) loss 7.6936 (7.1308) grad_norm 2.0947 (2.8676) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:06:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][100/625] eta 0:03:51 lr 0.000658 wd 0.0500 time 0.3933 (0.4409) data time 0.0009 (0.0078) model time 0.3925 (0.4566) loss 6.1306 (7.0817) grad_norm 2.0147 (2.9024) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:06:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][110/625] eta 0:03:46 lr 0.000658 wd 0.0500 time 0.4196 (0.4389) data time 0.0006 (0.0073) model time 0.4190 (0.4499) loss 7.9378 (7.1347) grad_norm 2.8314 (2.9686) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:06:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][120/625] eta 0:03:40 lr 0.000658 wd 0.0500 time 0.4182 (0.4364) data time 0.0007 (0.0067) model time 0.4175 (0.4440) loss 7.9631 (7.1154) grad_norm 2.1586 (3.0123) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:06:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][130/625] eta 0:03:34 lr 0.000658 wd 0.0500 time 0.3957 (0.4339) data time 0.0009 (0.0063) model time 0.3949 (0.4387) loss 6.9145 (7.1361) grad_norm 2.5199 (2.9751) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:06:34 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][140/625] eta 0:03:29 lr 0.000658 wd 0.0500 time 0.3972 (0.4314) data time 0.0009 (0.0059) model time 0.3963 (0.4343) loss 8.0839 (7.1389) grad_norm 2.7072 (2.9479) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:06:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][150/625] eta 0:03:23 lr 0.000658 wd 0.0500 time 0.4027 (0.4294) data time 0.0006 (0.0056) model time 0.4020 (0.4308) loss 6.3401 (7.1257) grad_norm 1.9725 (3.0037) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:06:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][160/625] eta 0:03:18 lr 0.000658 wd 0.0500 time 0.4039 (0.4277) data time 0.0006 (0.0053) model time 0.4033 (0.4281) loss 6.5525 (7.1286) grad_norm 4.5279 (2.9949) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:06:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][170/625] eta 0:03:13 lr 0.000657 wd 0.0500 time 0.3980 (0.4260) data time 0.0007 (0.0050) model time 0.3973 (0.4256) loss 8.8080 (7.1339) grad_norm 2.8831 (3.0044) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:06:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][180/625] eta 0:03:08 lr 0.000657 wd 0.0500 time 0.4030 (0.4247) data time 0.0006 (0.0048) model time 0.4024 (0.4237) loss 7.4977 (7.1331) grad_norm 2.7555 (3.0466) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:06:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][190/625] eta 0:03:04 lr 0.000657 wd 0.0500 time 0.3966 (0.4234) data time 0.0007 (0.0046) model time 0.3959 (0.4220) loss 8.5961 (7.1460) grad_norm 4.5198 (3.0662) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:06:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][200/625] eta 0:02:59 lr 0.000657 wd 0.0500 time 0.4055 (0.4225) data time 
0.0006 (0.0045) model time 0.4048 (0.4207) loss 7.8496 (7.1607) grad_norm 2.7031 (3.0544) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:07:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][210/625] eta 0:02:54 lr 0.000657 wd 0.0500 time 0.4003 (0.4215) data time 0.0009 (0.0043) model time 0.3994 (0.4194) loss 8.3704 (7.1528) grad_norm 2.7592 (3.0585) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:07:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][220/625] eta 0:02:50 lr 0.000657 wd 0.0500 time 0.3961 (0.4211) data time 0.0008 (0.0042) model time 0.3952 (0.4190) loss 7.1459 (7.1532) grad_norm 2.3953 (3.0335) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:07:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][230/625] eta 0:02:46 lr 0.000657 wd 0.0500 time 0.4002 (0.4203) data time 0.0008 (0.0040) model time 0.3994 (0.4180) loss 7.3913 (7.1644) grad_norm 2.6898 (3.0227) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:07:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][240/625] eta 0:02:41 lr 0.000657 wd 0.0500 time 0.4556 (0.4197) data time 0.0008 (0.0039) model time 0.4547 (0.4173) loss 7.1337 (7.1742) grad_norm 2.6843 (3.0213) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:07:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][250/625] eta 0:02:37 lr 0.000657 wd 0.0500 time 0.3931 (0.4190) data time 0.0007 (0.0038) model time 0.3924 (0.4165) loss 7.7499 (7.1750) grad_norm 2.8460 (3.0233) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:07:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][260/625] eta 0:02:32 lr 0.000656 wd 0.0500 time 0.4011 (0.4182) data time 0.0007 (0.0037) model time 0.4004 (0.4157) loss 6.3607 (7.1752) grad_norm 4.1748 (3.0262) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:07:27 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][270/625] eta 0:02:28 lr 0.000656 wd 0.0500 time 0.5230 (0.4188) data time 0.0008 (0.0036) model time 0.5222 (0.4165) loss 6.8049 (7.1702) grad_norm 1.9150 (3.0337) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:07:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][280/625] eta 0:02:25 lr 0.000656 wd 0.0500 time 0.5957 (0.4207) data time 0.0009 (0.0035) model time 0.5949 (0.4187) loss 7.3467 (7.1836) grad_norm 2.1819 (3.0192) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:07:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][290/625] eta 0:02:21 lr 0.000656 wd 0.0500 time 0.5835 (0.4231) data time 0.0007 (0.0034) model time 0.5828 (0.4217) loss 6.1014 (7.1922) grad_norm 3.5165 (3.0088) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:07:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][300/625] eta 0:02:17 lr 0.000656 wd 0.0500 time 0.3927 (0.4240) data time 0.0009 (0.0034) model time 0.3918 (0.4228) loss 8.3280 (7.2019) grad_norm 4.5359 (3.0584) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:07:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][310/625] eta 0:02:13 lr 0.000656 wd 0.0500 time 0.3995 (0.4243) data time 0.0006 (0.0033) model time 0.3989 (0.4232) loss 5.9517 (7.2012) grad_norm 2.8026 (3.0537) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:07:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][320/625] eta 0:02:09 lr 0.000656 wd 0.0500 time 0.3980 (0.4235) data time 0.0009 (0.0032) model time 0.3971 (0.4223) loss 8.1985 (7.2068) grad_norm 3.1596 (3.0425) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:07:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][330/625] eta 0:02:04 lr 0.000656 wd 0.0500 time 0.3971 (0.4229) data time 
0.0006 (0.0031) model time 0.3965 (0.4215) loss 6.5195 (7.2219) grad_norm 2.6959 (3.0610) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:07:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][340/625] eta 0:02:00 lr 0.000656 wd 0.0500 time 0.4645 (0.4224) data time 0.0006 (0.0031) model time 0.4639 (0.4209) loss 7.7174 (7.2243) grad_norm 2.6096 (3.0874) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:08:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][350/625] eta 0:01:55 lr 0.000656 wd 0.0500 time 0.3968 (0.4218) data time 0.0009 (0.0030) model time 0.3959 (0.4203) loss 5.8193 (7.2326) grad_norm 2.3396 (3.0936) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:08:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][360/625] eta 0:01:51 lr 0.000655 wd 0.0500 time 0.4066 (0.4212) data time 0.0008 (0.0030) model time 0.4058 (0.4196) loss 7.8065 (7.2403) grad_norm 3.6521 (3.0848) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:08:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][370/625] eta 0:01:47 lr 0.000655 wd 0.0500 time 0.4065 (0.4207) data time 0.0008 (0.0029) model time 0.4057 (0.4190) loss 8.8136 (7.2410) grad_norm 2.2064 (3.0815) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:08:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][380/625] eta 0:01:42 lr 0.000655 wd 0.0500 time 0.3970 (0.4202) data time 0.0008 (0.0029) model time 0.3963 (0.4185) loss 5.8751 (7.2431) grad_norm 3.8221 (3.0846) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:08:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][390/625] eta 0:01:38 lr 0.000655 wd 0.0500 time 0.4000 (0.4197) data time 0.0009 (0.0028) model time 0.3992 (0.4179) loss 7.9007 (7.2445) grad_norm 2.3975 (3.0942) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:08:22 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][400/625] eta 0:01:34 lr 0.000655 wd 0.0500 time 0.4149 (0.4193) data time 0.0009 (0.0028) model time 0.4140 (0.4175) loss 7.6554 (7.2414) grad_norm 4.9664 (3.1296) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:08:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][410/625] eta 0:01:30 lr 0.000655 wd 0.0500 time 0.4029 (0.4188) data time 0.0008 (0.0027) model time 0.4020 (0.4170) loss 8.7929 (7.2418) grad_norm 3.5720 (3.1632) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:08:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][420/625] eta 0:01:25 lr 0.000655 wd 0.0500 time 0.3965 (0.4183) data time 0.0009 (0.0027) model time 0.3957 (0.4164) loss 5.9900 (7.2462) grad_norm 3.0511 (3.1762) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:08:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][430/625] eta 0:01:21 lr 0.000655 wd 0.0500 time 0.3931 (0.4179) data time 0.0007 (0.0026) model time 0.3924 (0.4159) loss 7.1043 (7.2470) grad_norm 2.1855 (3.1762) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:08:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][440/625] eta 0:01:17 lr 0.000655 wd 0.0500 time 0.3963 (0.4178) data time 0.0007 (0.0026) model time 0.3957 (0.4158) loss 7.6533 (7.2433) grad_norm 3.7645 (3.1693) loss_scale 1024.0000 (522.4490) mem 14939MB [2024-07-25 03:08:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][450/625] eta 0:01:13 lr 0.000654 wd 0.0500 time 0.3969 (0.4174) data time 0.0008 (0.0026) model time 0.3961 (0.4155) loss 7.3234 (7.2374) grad_norm 3.4124 (3.1817) loss_scale 1024.0000 (533.5698) mem 14939MB [2024-07-25 03:08:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][460/625] eta 0:01:08 lr 0.000654 wd 0.0500 time 0.4016 (0.4170) data time 
0.0009 (0.0025) model time 0.4007 (0.4151) loss 8.6539 (7.2433) grad_norm 3.0300 (3.1783) loss_scale 1024.0000 (544.2082) mem 14939MB [2024-07-25 03:08:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][470/625] eta 0:01:04 lr 0.000654 wd 0.0500 time 0.4022 (0.4167) data time 0.0007 (0.0025) model time 0.4015 (0.4147) loss 7.5936 (7.2379) grad_norm 2.0773 (3.1804) loss_scale 1024.0000 (554.3949) mem 14939MB [2024-07-25 03:08:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][480/625] eta 0:01:00 lr 0.000654 wd 0.0500 time 0.3977 (0.4163) data time 0.0009 (0.0025) model time 0.3968 (0.4143) loss 7.6422 (7.2432) grad_norm 3.4114 (3.1931) loss_scale 1024.0000 (564.1580) mem 14939MB [2024-07-25 03:08:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][490/625] eta 0:00:56 lr 0.000654 wd 0.0500 time 0.5940 (0.4168) data time 0.0006 (0.0024) model time 0.5934 (0.4148) loss 7.9004 (7.2546) grad_norm 2.4677 (3.1926) loss_scale 1024.0000 (573.5234) mem 14939MB [2024-07-25 03:09:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][500/625] eta 0:00:52 lr 0.000654 wd 0.0500 time 0.3998 (0.4179) data time 0.0007 (0.0024) model time 0.3991 (0.4161) loss 7.8020 (7.2544) grad_norm 2.5730 (3.1953) loss_scale 1024.0000 (582.5150) mem 14939MB [2024-07-25 03:09:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][510/625] eta 0:00:48 lr 0.000654 wd 0.0500 time 0.3968 (0.4193) data time 0.0008 (0.0024) model time 0.3960 (0.4177) loss 6.3952 (7.2543) grad_norm 3.1546 (3.1934) loss_scale 1024.0000 (591.1546) mem 14939MB [2024-07-25 03:09:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][520/625] eta 0:00:44 lr 0.000654 wd 0.0500 time 0.6037 (0.4208) data time 0.0009 (0.0024) model time 0.6028 (0.4194) loss 8.2879 (7.2500) grad_norm 3.6370 (3.1859) loss_scale 1024.0000 (599.4626) mem 14939MB [2024-07-25 
03:09:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][530/625] eta 0:00:39 lr 0.000654 wd 0.0500 time 0.3996 (0.4208) data time 0.0006 (0.0023) model time 0.3989 (0.4194) loss 7.3441 (7.2459) grad_norm 2.9112 (3.1803) loss_scale 1024.0000 (607.4576) mem 14939MB [2024-07-25 03:09:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][540/625] eta 0:00:35 lr 0.000654 wd 0.0500 time 0.4088 (0.4205) data time 0.0008 (0.0023) model time 0.4080 (0.4190) loss 6.0097 (7.2437) grad_norm 1.8078 (3.1615) loss_scale 1024.0000 (615.1571) mem 14939MB [2024-07-25 03:09:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][550/625] eta 0:00:31 lr 0.000653 wd 0.0500 time 0.3996 (0.4201) data time 0.0006 (0.0023) model time 0.3990 (0.4186) loss 6.1028 (7.2378) grad_norm 1.8601 (3.1740) loss_scale 1024.0000 (622.5771) mem 14939MB [2024-07-25 03:09:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][560/625] eta 0:00:27 lr 0.000653 wd 0.0500 time 0.4018 (0.4197) data time 0.0008 (0.0022) model time 0.4010 (0.4182) loss 7.7818 (7.2371) grad_norm 2.4176 (3.1779) loss_scale 1024.0000 (629.7326) mem 14939MB [2024-07-25 03:09:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][570/625] eta 0:00:23 lr 0.000653 wd 0.0500 time 0.3971 (0.4194) data time 0.0006 (0.0023) model time 0.3964 (0.4178) loss 7.0987 (7.2395) grad_norm 2.1660 (3.1908) loss_scale 1024.0000 (636.6375) mem 14939MB [2024-07-25 03:09:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][580/625] eta 0:00:18 lr 0.000653 wd 0.0500 time 0.3988 (0.4190) data time 0.0008 (0.0022) model time 0.3980 (0.4174) loss 7.4057 (7.2421) grad_norm 3.9847 (3.2028) loss_scale 1024.0000 (643.3046) mem 14939MB [2024-07-25 03:09:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][590/625] eta 0:00:14 lr 0.000653 wd 0.0500 time 0.3946 (0.4186) 
data time 0.0006 (0.0022) model time 0.3940 (0.4170) loss 7.2352 (7.2437) grad_norm 4.2327 (3.2176) loss_scale 1024.0000 (649.7462) mem 14939MB [2024-07-25 03:09:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][600/625] eta 0:00:10 lr 0.000653 wd 0.0500 time 0.3971 (0.4183) data time 0.0008 (0.0022) model time 0.3963 (0.4167) loss 8.1122 (7.2403) grad_norm 2.7516 (3.2101) loss_scale 1024.0000 (655.9734) mem 14939MB [2024-07-25 03:09:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][610/625] eta 0:00:06 lr 0.000653 wd 0.0500 time 0.3986 (0.4180) data time 0.0006 (0.0022) model time 0.3980 (0.4164) loss 6.8768 (7.2400) grad_norm 2.9312 (3.1994) loss_scale 1024.0000 (661.9967) mem 14939MB [2024-07-25 03:09:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][620/625] eta 0:00:02 lr 0.000653 wd 0.0500 time 0.4008 (0.4177) data time 0.0004 (0.0022) model time 0.4003 (0.4160) loss 7.1503 (7.2426) grad_norm 2.3440 (3.1850) loss_scale 1024.0000 (667.8261) mem 14939MB [2024-07-25 03:09:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 152 training takes 0:04:20 [2024-07-25 03:09:55 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 03:09:56 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 03:09:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.454 (0.454) Loss 0.5889 (0.5889) Acc@1 87.842 (87.842) Acc@5 98.291 (98.291) Mem 14939MB [2024-07-25 03:09:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.9541 (0.7276) Acc@1 79.590 (85.036) Acc@5 95.068 (97.119) Mem 14939MB [2024-07-25 03:09:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0820 (0.8596) Acc@1 73.730 (81.569) Acc@5 93.799 (95.782) Mem 14939MB [2024-07-25 03:09:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.140 Acc@5 95.747 [2024-07-25 03:09:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.1% [2024-07-25 03:09:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.769 (0.769) Loss 0.5645 (0.5645) Acc@1 89.209 (89.209) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 03:10:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.155) Loss 0.9067 (0.7050) Acc@1 80.127 (85.724) Acc@5 95.703 (97.523) Mem 14939MB [2024-07-25 03:10:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 1.0410 (0.8311) Acc@1 75.635 (82.257) Acc@5 94.385 (96.224) Mem 14939MB [2024-07-25 03:10:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.890 Acc@5 96.213 [2024-07-25 03:10:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-07-25 03:10:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][0/625] eta 0:12:59 lr 0.000653 wd 0.0500 time 1.2477 (1.2477) data time 0.7840 (0.7840) model time 0.0000 (0.0000) loss 7.8973 (7.8973) grad_norm 2.3060 (2.3060) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:10:06 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][10/625] eta 0:04:53 lr 0.000652 wd 0.0500 time 0.4007 (0.4768) data time 0.0005 (0.0721) model time 0.0000 (0.0000) loss 7.9882 (7.2241) grad_norm 2.8167 (3.0440) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:10:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][20/625] eta 0:04:26 lr 0.000652 wd 0.0500 time 0.3987 (0.4401) data time 0.0008 (0.0382) model time 0.0000 (0.0000) loss 7.6580 (7.0067) grad_norm 1.9172 (2.9864) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:10:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][30/625] eta 0:04:14 lr 0.000652 wd 0.0500 time 0.3993 (0.4280) data time 0.0009 (0.0261) model time 0.0000 (0.0000) loss 7.7414 (7.1159) grad_norm 2.8594 (3.4100) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:10:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][40/625] eta 0:04:06 lr 0.000652 wd 0.0500 time 0.4006 (0.4214) data time 0.0007 (0.0200) model time 0.0000 (0.0000) loss 7.0488 (7.1795) grad_norm 7.5647 (3.4912) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:10:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][50/625] eta 0:03:59 lr 0.000652 wd 0.0500 time 0.3983 (0.4170) data time 0.0007 (0.0163) model time 0.0000 (0.0000) loss 7.9592 (7.1177) grad_norm 1.9492 (3.3558) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:10:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][60/625] eta 0:03:54 lr 0.000652 wd 0.0500 time 0.3964 (0.4144) data time 0.0006 (0.0137) model time 0.3958 (0.4002) loss 7.6323 (7.1292) grad_norm 1.9263 (3.2038) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:10:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][70/625] eta 0:03:48 lr 0.000652 wd 0.0500 time 0.3991 (0.4123) data time 
0.0007 (0.0119) model time 0.3985 (0.3997) loss 6.4108 (7.1498) grad_norm 3.7289 (3.1366) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:10:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][80/625] eta 0:03:43 lr 0.000652 wd 0.0500 time 0.4017 (0.4109) data time 0.0006 (0.0106) model time 0.4010 (0.3997) loss 6.4496 (7.1468) grad_norm 2.0209 (3.1241) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:10:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][90/625] eta 0:03:42 lr 0.000652 wd 0.0500 time 0.3983 (0.4159) data time 0.0009 (0.0095) model time 0.3975 (0.4136) loss 7.3469 (7.1278) grad_norm 1.8486 (3.1159) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:10:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][100/625] eta 0:03:42 lr 0.000652 wd 0.0500 time 0.6039 (0.4233) data time 0.0008 (0.0087) model time 0.6031 (0.4289) loss 8.1227 (7.1658) grad_norm 3.5833 (3.1193) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:10:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][110/625] eta 0:03:39 lr 0.000651 wd 0.0500 time 0.4005 (0.4256) data time 0.0008 (0.0080) model time 0.3996 (0.4321) loss 7.7794 (7.1392) grad_norm 2.8585 (3.0845) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:10:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][120/625] eta 0:03:37 lr 0.000651 wd 0.0500 time 0.4246 (0.4302) data time 0.0006 (0.0074) model time 0.4239 (0.4389) loss 7.5376 (7.1235) grad_norm 2.0261 (3.0581) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:10:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][130/625] eta 0:03:32 lr 0.000651 wd 0.0500 time 0.4052 (0.4289) data time 0.0008 (0.0069) model time 0.4044 (0.4355) loss 7.3395 (7.1349) grad_norm 1.9060 (3.0549) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 
03:11:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][140/625] eta 0:03:27 lr 0.000651 wd 0.0500 time 0.3970 (0.4268) data time 0.0008 (0.0065) model time 0.3962 (0.4315) loss 7.9446 (7.1442) grad_norm 1.9586 (3.0796) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:11:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][150/625] eta 0:03:21 lr 0.000651 wd 0.0500 time 0.4244 (0.4252) data time 0.0008 (0.0061) model time 0.4236 (0.4284) loss 6.5187 (7.1694) grad_norm 2.6391 (3.0456) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:11:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][160/625] eta 0:03:16 lr 0.000651 wd 0.0500 time 0.3964 (0.4236) data time 0.0008 (0.0058) model time 0.3957 (0.4257) loss 8.0338 (7.1724) grad_norm 3.3830 (3.0158) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:11:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][170/625] eta 0:03:12 lr 0.000651 wd 0.0500 time 0.3974 (0.4222) data time 0.0006 (0.0055) model time 0.3968 (0.4236) loss 7.5304 (7.2009) grad_norm 2.0534 (3.0109) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:11:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][180/625] eta 0:03:07 lr 0.000651 wd 0.0500 time 0.4141 (0.4210) data time 0.0008 (0.0052) model time 0.4133 (0.4217) loss 7.3235 (7.1903) grad_norm 2.9531 (3.0133) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:11:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][190/625] eta 0:03:02 lr 0.000651 wd 0.0500 time 0.3974 (0.4198) data time 0.0006 (0.0050) model time 0.3967 (0.4199) loss 8.1073 (7.1790) grad_norm 2.1231 (3.0075) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:11:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][200/625] eta 0:02:58 lr 0.000650 wd 0.0500 time 0.4024 
(0.4189) data time 0.0006 (0.0048) model time 0.4017 (0.4187) loss 6.3308 (7.1624) grad_norm 1.6263 (3.0078) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:11:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][210/625] eta 0:02:53 lr 0.000650 wd 0.0500 time 0.4053 (0.4180) data time 0.0009 (0.0046) model time 0.4043 (0.4175) loss 7.4421 (7.1805) grad_norm 2.2982 (3.0005) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:11:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][220/625] eta 0:02:48 lr 0.000650 wd 0.0500 time 0.3947 (0.4173) data time 0.0007 (0.0045) model time 0.3940 (0.4165) loss 7.4606 (7.1638) grad_norm 2.2065 (2.9830) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:11:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][230/625] eta 0:02:44 lr 0.000650 wd 0.0500 time 0.4032 (0.4174) data time 0.0007 (0.0043) model time 0.4026 (0.4166) loss 7.5418 (7.1466) grad_norm 2.8204 (2.9906) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:11:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][240/625] eta 0:02:40 lr 0.000650 wd 0.0500 time 0.4000 (0.4166) data time 0.0009 (0.0042) model time 0.3991 (0.4156) loss 7.3020 (7.1491) grad_norm 2.5490 (2.9863) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:11:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][250/625] eta 0:02:36 lr 0.000650 wd 0.0500 time 0.3992 (0.4162) data time 0.0007 (0.0041) model time 0.3984 (0.4150) loss 5.6398 (7.1469) grad_norm 2.1223 (2.9766) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:11:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][260/625] eta 0:02:31 lr 0.000650 wd 0.0500 time 0.4052 (0.4156) data time 0.0008 (0.0040) model time 0.4044 (0.4143) loss 6.2048 (7.1539) grad_norm 1.9703 (2.9556) loss_scale 1024.0000 (1024.0000) mem 
14939MB [2024-07-25 03:11:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][270/625] eta 0:02:27 lr 0.000650 wd 0.0500 time 0.3961 (0.4150) data time 0.0008 (0.0039) model time 0.3953 (0.4136) loss 7.3063 (7.1534) grad_norm 2.7218 (2.9552) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:11:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][280/625] eta 0:02:22 lr 0.000650 wd 0.0500 time 0.3976 (0.4144) data time 0.0009 (0.0038) model time 0.3967 (0.4129) loss 7.1667 (7.1688) grad_norm 1.9489 (2.9548) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:12:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][290/625] eta 0:02:18 lr 0.000650 wd 0.0500 time 0.4115 (0.4140) data time 0.0008 (0.0037) model time 0.4108 (0.4124) loss 7.3352 (7.1642) grad_norm 2.4561 (2.9508) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:12:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][300/625] eta 0:02:14 lr 0.000649 wd 0.0500 time 0.5774 (0.4142) data time 0.0008 (0.0036) model time 0.5766 (0.4126) loss 7.5268 (7.1633) grad_norm 2.4508 (2.9621) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:12:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][310/625] eta 0:02:10 lr 0.000649 wd 0.0500 time 0.5688 (0.4155) data time 0.0011 (0.0035) model time 0.5676 (0.4143) loss 7.7439 (7.1648) grad_norm 1.9604 (2.9814) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:12:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][320/625] eta 0:02:07 lr 0.000649 wd 0.0500 time 0.5830 (0.4171) data time 0.0008 (0.0034) model time 0.5822 (0.4162) loss 6.0603 (7.1649) grad_norm 4.1454 (2.9856) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:12:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][330/625] eta 0:02:03 lr 0.000649 wd 
0.0500 time 0.6143 (0.4187) data time 0.0009 (0.0033) model time 0.6134 (0.4180) loss 5.6095 (7.1597) grad_norm 2.4620 (2.9881) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:12:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][340/625] eta 0:01:59 lr 0.000649 wd 0.0500 time 0.3981 (0.4195) data time 0.0007 (0.0033) model time 0.3974 (0.4190) loss 6.0682 (7.1546) grad_norm 2.3704 (2.9721) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:12:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][350/625] eta 0:01:55 lr 0.000649 wd 0.0500 time 0.3978 (0.4195) data time 0.0007 (0.0032) model time 0.3971 (0.4190) loss 5.9450 (7.1624) grad_norm 2.8931 (2.9576) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:12:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][360/625] eta 0:01:51 lr 0.000649 wd 0.0500 time 0.3917 (0.4190) data time 0.0010 (0.0031) model time 0.3908 (0.4184) loss 8.0362 (7.1658) grad_norm 2.6430 (2.9502) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:12:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][370/625] eta 0:01:46 lr 0.000649 wd 0.0500 time 0.3971 (0.4185) data time 0.0007 (0.0031) model time 0.3965 (0.4178) loss 6.1705 (7.1717) grad_norm 3.3446 (2.9373) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:12:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][380/625] eta 0:01:42 lr 0.000649 wd 0.0500 time 0.3996 (0.4181) data time 0.0007 (0.0030) model time 0.3989 (0.4173) loss 6.8460 (7.1781) grad_norm 2.1789 (2.9330) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:12:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][390/625] eta 0:01:38 lr 0.000648 wd 0.0500 time 0.3997 (0.4176) data time 0.0007 (0.0030) model time 0.3990 (0.4167) loss 7.3729 (7.1774) grad_norm 3.0896 (2.9267) loss_scale 
1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:12:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][400/625] eta 0:01:33 lr 0.000648 wd 0.0500 time 0.3963 (0.4171) data time 0.0008 (0.0029) model time 0.3954 (0.4161) loss 6.1278 (7.1828) grad_norm 3.0074 (2.9298) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:12:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][410/625] eta 0:01:29 lr 0.000648 wd 0.0500 time 0.4012 (0.4167) data time 0.0009 (0.0029) model time 0.4002 (0.4157) loss 5.7570 (7.1700) grad_norm 2.4589 (2.9260) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:12:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][420/625] eta 0:01:25 lr 0.000648 wd 0.0500 time 0.4037 (0.4163) data time 0.0007 (0.0028) model time 0.4030 (0.4152) loss 6.5159 (7.1698) grad_norm 3.5722 (2.9233) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:13:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][430/625] eta 0:01:21 lr 0.000648 wd 0.0500 time 0.3974 (0.4159) data time 0.0009 (0.0028) model time 0.3964 (0.4148) loss 5.8092 (7.1717) grad_norm 2.5678 (2.9181) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:13:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][440/625] eta 0:01:16 lr 0.000648 wd 0.0500 time 0.4060 (0.4156) data time 0.0007 (0.0027) model time 0.4053 (0.4145) loss 6.5135 (7.1658) grad_norm 2.7876 (2.9155) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:13:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][450/625] eta 0:01:12 lr 0.000648 wd 0.0500 time 0.4133 (0.4156) data time 0.0007 (0.0027) model time 0.4126 (0.4145) loss 8.1301 (7.1644) grad_norm 2.4717 (2.9178) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:13:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][460/625] 
eta 0:01:08 lr 0.000648 wd 0.0500 time 0.3950 (0.4153) data time 0.0009 (0.0027) model time 0.3941 (0.4141) loss 7.8116 (7.1645) grad_norm 2.0808 (2.9119) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:13:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][470/625] eta 0:01:04 lr 0.000648 wd 0.0500 time 0.3983 (0.4150) data time 0.0008 (0.0026) model time 0.3975 (0.4138) loss 8.2164 (7.1699) grad_norm 1.9033 (2.9027) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:13:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][480/625] eta 0:01:00 lr 0.000648 wd 0.0500 time 0.4031 (0.4148) data time 0.0006 (0.0026) model time 0.4025 (0.4136) loss 7.1332 (7.1655) grad_norm 3.1460 (2.9007) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:13:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][490/625] eta 0:00:55 lr 0.000647 wd 0.0500 time 0.3940 (0.4146) data time 0.0006 (0.0025) model time 0.3933 (0.4134) loss 8.6402 (7.1765) grad_norm 2.9395 (2.9195) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:13:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][500/625] eta 0:00:51 lr 0.000647 wd 0.0500 time 0.3950 (0.4143) data time 0.0007 (0.0025) model time 0.3943 (0.4131) loss 7.0978 (7.1757) grad_norm 3.9978 (2.9327) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:13:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][510/625] eta 0:00:47 lr 0.000647 wd 0.0500 time 0.4036 (0.4141) data time 0.0007 (0.0025) model time 0.4029 (0.4128) loss 8.2485 (7.1761) grad_norm 4.5710 (2.9388) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:13:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][520/625] eta 0:00:43 lr 0.000647 wd 0.0500 time 0.3947 (0.4138) data time 0.0008 (0.0025) model time 0.3939 (0.4125) loss 7.3406 (7.1726) grad_norm 2.2751 
(2.9406) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:13:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][530/625] eta 0:00:39 lr 0.000647 wd 0.0500 time 0.3984 (0.4146) data time 0.0009 (0.0024) model time 0.3975 (0.4134) loss 7.9251 (7.1780) grad_norm 2.5154 (2.9380) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:13:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][540/625] eta 0:00:35 lr 0.000647 wd 0.0500 time 0.5495 (0.4163) data time 0.0006 (0.0024) model time 0.5489 (0.4152) loss 6.6609 (7.1842) grad_norm 1.9583 (2.9275) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:13:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][550/625] eta 0:00:31 lr 0.000647 wd 0.0500 time 0.3986 (0.4169) data time 0.0007 (0.0024) model time 0.3979 (0.4159) loss 6.1732 (7.1836) grad_norm 2.6749 (2.9279) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:13:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][560/625] eta 0:00:27 lr 0.000647 wd 0.0500 time 0.4073 (0.4179) data time 0.0007 (0.0024) model time 0.4066 (0.4170) loss 6.5411 (7.1832) grad_norm 3.2166 (2.9274) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:14:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][570/625] eta 0:00:22 lr 0.000647 wd 0.0500 time 0.3990 (0.4178) data time 0.0007 (0.0023) model time 0.3983 (0.4169) loss 7.5689 (7.1905) grad_norm 2.5611 (2.9175) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:14:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][580/625] eta 0:00:18 lr 0.000646 wd 0.0500 time 0.3981 (0.4175) data time 0.0009 (0.0023) model time 0.3972 (0.4166) loss 6.3918 (7.1876) grad_norm 2.8283 (2.9181) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:14:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[153/300][590/625] eta 0:00:14 lr 0.000646 wd 0.0500 time 0.3953 (0.4172) data time 0.0006 (0.0023) model time 0.3948 (0.4163) loss 7.1396 (7.1932) grad_norm 3.0833 (2.9250) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:14:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][600/625] eta 0:00:10 lr 0.000646 wd 0.0500 time 0.4143 (0.4170) data time 0.0006 (0.0023) model time 0.4137 (0.4160) loss 6.5974 (7.1934) grad_norm 2.7077 (2.9231) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:14:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][610/625] eta 0:00:06 lr 0.000646 wd 0.0500 time 0.3923 (0.4167) data time 0.0004 (0.0022) model time 0.3919 (0.4157) loss 6.6771 (7.1964) grad_norm 2.1015 (2.9133) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:14:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][620/625] eta 0:00:02 lr 0.000646 wd 0.0500 time 0.3978 (0.4164) data time 0.0004 (0.0022) model time 0.3975 (0.4154) loss 8.4829 (7.1983) grad_norm 2.3818 (2.9099) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:14:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 153 training takes 0:04:20 [2024-07-25 03:14:21 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 03:14:22 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 03:14:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.465 (0.465) Loss 0.6016 (0.6016) Acc@1 88.281 (88.281) Acc@5 98.340 (98.340) Mem 14939MB [2024-07-25 03:14:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.9775 (0.7472) Acc@1 78.760 (84.996) Acc@5 95.068 (97.314) Mem 14939MB [2024-07-25 03:14:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0957 (0.8766) Acc@1 73.730 (81.515) Acc@5 94.482 (95.975) Mem 14939MB [2024-07-25 03:14:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.104 Acc@5 95.941 [2024-07-25 03:14:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.1% [2024-07-25 03:14:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.879 (0.879) Loss 0.5635 (0.5635) Acc@1 89.355 (89.355) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 03:14:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.163) Loss 0.9058 (0.7044) Acc@1 80.322 (85.782) Acc@5 95.752 (97.541) Mem 14939MB [2024-07-25 03:14:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.126) Loss 1.0400 (0.8305) Acc@1 75.537 (82.292) Acc@5 94.482 (96.243) Mem 14939MB [2024-07-25 03:14:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.918 Acc@5 96.227 [2024-07-25 03:14:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-07-25 03:14:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.92% [2024-07-25 03:14:28 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 03:14:29 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 03:14:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][0/625] eta 0:08:09 lr 0.000646 wd 0.0500 time 0.7826 (0.7826) data time 0.4064 (0.4064) model time 0.0000 (0.0000) loss 7.5869 (7.5869) grad_norm 2.0930 (2.0930) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:14:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][10/625] eta 0:04:28 lr 0.000646 wd 0.0500 time 0.4050 (0.4358) data time 0.0009 (0.0378) model time 0.0000 (0.0000) loss 6.0997 (7.2371) grad_norm 4.0882 (2.6715) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:14:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][20/625] eta 0:04:14 lr 0.000646 wd 0.0500 time 0.3966 (0.4212) data time 0.0007 (0.0202) model time 0.0000 (0.0000) loss 7.6368 (7.2358) grad_norm 2.5836 (2.7506) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:14:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][30/625] eta 0:04:07 lr 0.000646 wd 0.0500 time 0.3992 (0.4152) data time 0.0006 (0.0140) model time 0.0000 (0.0000) loss 6.9834 (7.2544) grad_norm 2.3765 (3.0214) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:14:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][40/625] eta 0:04:01 lr 0.000646 wd 0.0500 time 0.4137 (0.4122) data time 0.0009 (0.0112) model time 0.0000 (0.0000) loss 7.8953 (7.1786) grad_norm 2.2710 (2.8770) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:14:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][50/625] eta 0:03:55 lr 0.000645 wd 0.0500 time 0.3947 (0.4099) data time 0.0008 (0.0092) model time 0.0000 (0.0000) loss 5.6402 (7.1579) grad_norm 2.1976 (2.7727) loss_scale 1024.0000 
(1024.0000) mem 14939MB [2024-07-25 03:14:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][60/625] eta 0:03:50 lr 0.000645 wd 0.0500 time 0.3983 (0.4085) data time 0.0009 (0.0078) model time 0.3974 (0.4004) loss 7.9239 (7.1716) grad_norm 5.8621 (2.7821) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:14:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][70/625] eta 0:03:46 lr 0.000645 wd 0.0500 time 0.3985 (0.4073) data time 0.0009 (0.0069) model time 0.3976 (0.3999) loss 7.2764 (7.1583) grad_norm 2.3842 (3.0716) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:15:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][80/625] eta 0:03:41 lr 0.000645 wd 0.0500 time 0.3983 (0.4064) data time 0.0008 (0.0061) model time 0.3975 (0.3997) loss 7.8706 (7.2134) grad_norm 2.6285 (3.0743) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:15:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][90/625] eta 0:03:37 lr 0.000645 wd 0.0500 time 0.3966 (0.4059) data time 0.0007 (0.0056) model time 0.3959 (0.3999) loss 8.0086 (7.2030) grad_norm 2.1161 (2.9883) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:15:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][100/625] eta 0:03:32 lr 0.000645 wd 0.0500 time 0.4083 (0.4053) data time 0.0008 (0.0051) model time 0.4074 (0.3998) loss 6.6565 (7.2019) grad_norm 3.4061 (2.9651) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:15:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][110/625] eta 0:03:28 lr 0.000645 wd 0.0500 time 0.3958 (0.4048) data time 0.0007 (0.0047) model time 0.3952 (0.3995) loss 6.1978 (7.1872) grad_norm 3.4812 (3.0649) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:15:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][120/625] eta 0:03:24 lr 
0.000645 wd 0.0500 time 0.3983 (0.4059) data time 0.0007 (0.0044) model time 0.3976 (0.4020) loss 6.1704 (7.1665) grad_norm 2.4527 (3.0711) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:15:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][130/625] eta 0:03:23 lr 0.000645 wd 0.0500 time 0.4146 (0.4119) data time 0.0007 (0.0042) model time 0.4139 (0.4123) loss 8.3659 (7.1641) grad_norm 4.5012 (3.1068) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:15:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][140/625] eta 0:03:23 lr 0.000644 wd 0.0500 time 0.5879 (0.4189) data time 0.0007 (0.0039) model time 0.5872 (0.4230) loss 8.0626 (7.2019) grad_norm 2.4856 (3.2259) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:15:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][150/625] eta 0:03:20 lr 0.000644 wd 0.0500 time 0.5561 (0.4223) data time 0.0006 (0.0037) model time 0.5555 (0.4278) loss 6.8234 (7.2038) grad_norm 2.0413 (3.1835) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:15:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][160/625] eta 0:03:17 lr 0.000644 wd 0.0500 time 0.5121 (0.4237) data time 0.0009 (0.0035) model time 0.5112 (0.4292) loss 8.0081 (7.2315) grad_norm 2.7716 (3.1614) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:15:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][170/625] eta 0:03:12 lr 0.000644 wd 0.0500 time 0.4051 (0.4224) data time 0.0007 (0.0034) model time 0.4044 (0.4267) loss 8.0805 (7.2285) grad_norm 1.9371 (3.1273) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:15:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][180/625] eta 0:03:07 lr 0.000644 wd 0.0500 time 0.3982 (0.4211) data time 0.0006 (0.0033) model time 0.3976 (0.4245) loss 7.6536 (7.2202) grad_norm 2.8229 (3.1090) 
loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:15:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][190/625] eta 0:03:02 lr 0.000644 wd 0.0500 time 0.4011 (0.4200) data time 0.0008 (0.0032) model time 0.4003 (0.4227) loss 7.3493 (7.2306) grad_norm 9.9188 (3.1323) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:15:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][200/625] eta 0:02:58 lr 0.000644 wd 0.0500 time 0.3999 (0.4190) data time 0.0008 (0.0031) model time 0.3991 (0.4212) loss 5.6411 (7.2339) grad_norm 4.1426 (3.2169) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:15:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][210/625] eta 0:02:53 lr 0.000644 wd 0.0500 time 0.4015 (0.4182) data time 0.0007 (0.0030) model time 0.4007 (0.4198) loss 8.3893 (7.2309) grad_norm 4.1515 (3.2211) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:16:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][220/625] eta 0:02:49 lr 0.000644 wd 0.0500 time 0.4007 (0.4179) data time 0.0008 (0.0029) model time 0.3999 (0.4193) loss 8.0320 (7.2477) grad_norm 2.7623 (3.2358) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:16:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][230/625] eta 0:02:44 lr 0.000644 wd 0.0500 time 0.4022 (0.4172) data time 0.0011 (0.0028) model time 0.4011 (0.4182) loss 7.3966 (7.2252) grad_norm 3.1774 (3.2301) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:16:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][240/625] eta 0:02:40 lr 0.000643 wd 0.0500 time 0.3987 (0.4165) data time 0.0007 (0.0028) model time 0.3979 (0.4172) loss 6.0218 (7.2333) grad_norm 2.5601 (3.2073) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:16:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[154/300][250/625] eta 0:02:35 lr 0.000643 wd 0.0500 time 0.4000 (0.4158) data time 0.0009 (0.0027) model time 0.3991 (0.4163) loss 5.3074 (7.2179) grad_norm 1.9842 (3.1955) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:16:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][260/625] eta 0:02:31 lr 0.000643 wd 0.0500 time 0.3972 (0.4152) data time 0.0007 (0.0026) model time 0.3965 (0.4154) loss 8.1599 (7.2259) grad_norm 2.6097 (3.1766) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:16:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][270/625] eta 0:02:27 lr 0.000643 wd 0.0500 time 0.3959 (0.4146) data time 0.0007 (0.0026) model time 0.3952 (0.4146) loss 8.5190 (7.2243) grad_norm 2.3436 (3.1537) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:16:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][280/625] eta 0:02:22 lr 0.000643 wd 0.0500 time 0.3948 (0.4140) data time 0.0009 (0.0025) model time 0.3939 (0.4139) loss 7.1388 (7.2244) grad_norm 4.1241 (3.1333) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:16:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][290/625] eta 0:02:18 lr 0.000643 wd 0.0500 time 0.3986 (0.4135) data time 0.0006 (0.0025) model time 0.3979 (0.4132) loss 6.4664 (7.2189) grad_norm 3.4633 (3.1317) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:16:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][300/625] eta 0:02:14 lr 0.000643 wd 0.0500 time 0.3980 (0.4130) data time 0.0009 (0.0024) model time 0.3971 (0.4126) loss 6.7361 (7.2281) grad_norm 4.7589 (3.1218) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:16:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][310/625] eta 0:02:09 lr 0.000643 wd 0.0500 time 0.3970 (0.4125) data time 0.0007 (0.0024) model time 0.3964 (0.4120) loss 6.3533 
(7.2169) grad_norm 2.7386 (3.1238) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:16:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][320/625] eta 0:02:05 lr 0.000643 wd 0.0500 time 0.4035 (0.4121) data time 0.0009 (0.0023) model time 0.4026 (0.4114) loss 7.1718 (7.2178) grad_norm 5.7519 (3.1444) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:16:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][330/625] eta 0:02:01 lr 0.000642 wd 0.0500 time 0.3988 (0.4117) data time 0.0009 (0.0023) model time 0.3979 (0.4110) loss 8.4534 (7.2213) grad_norm 2.5707 (3.1344) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:16:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][340/625] eta 0:01:57 lr 0.000642 wd 0.0500 time 0.4059 (0.4119) data time 0.0009 (0.0022) model time 0.4051 (0.4112) loss 5.3520 (7.2078) grad_norm 1.8475 (3.1233) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:16:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][350/625] eta 0:01:53 lr 0.000642 wd 0.0500 time 0.6007 (0.4139) data time 0.0008 (0.0022) model time 0.5998 (0.4136) loss 8.1110 (7.2168) grad_norm 2.9135 (3.1069) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:16:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][360/625] eta 0:01:50 lr 0.000642 wd 0.0500 time 0.5855 (0.4160) data time 0.0009 (0.0022) model time 0.5846 (0.4160) loss 7.4510 (7.2247) grad_norm 2.1967 (3.0988) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:17:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][370/625] eta 0:01:46 lr 0.000642 wd 0.0500 time 0.6078 (0.4171) data time 0.0009 (0.0021) model time 0.6069 (0.4173) loss 5.9150 (7.2239) grad_norm 2.7686 (3.0893) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:17:08 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [154/300][380/625] eta 0:01:42 lr 0.000642 wd 0.0500 time 0.4028 (0.4177) data time 0.0006 (0.0021) model time 0.4022 (0.4179) loss 6.4926 (7.2175) grad_norm 4.2766 (3.0884) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:17:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][390/625] eta 0:01:38 lr 0.000642 wd 0.0500 time 0.3982 (0.4176) data time 0.0010 (0.0021) model time 0.3972 (0.4177) loss 7.9071 (7.2202) grad_norm 1.7006 (3.0829) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:17:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][400/625] eta 0:01:33 lr 0.000642 wd 0.0500 time 0.3999 (0.4171) data time 0.0008 (0.0021) model time 0.3991 (0.4171) loss 6.5514 (7.2130) grad_norm 3.5425 (3.0709) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:17:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][410/625] eta 0:01:29 lr 0.000642 wd 0.0500 time 0.3974 (0.4167) data time 0.0007 (0.0020) model time 0.3967 (0.4166) loss 6.1598 (7.2180) grad_norm 3.1247 (3.0617) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:17:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][420/625] eta 0:01:25 lr 0.000641 wd 0.0500 time 0.3976 (0.4162) data time 0.0008 (0.0020) model time 0.3968 (0.4161) loss 8.0495 (7.2283) grad_norm 2.8695 (3.0711) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:17:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][430/625] eta 0:01:21 lr 0.000641 wd 0.0500 time 0.3996 (0.4159) data time 0.0009 (0.0020) model time 0.3987 (0.4157) loss 7.4700 (7.2339) grad_norm 2.4233 (3.0683) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:17:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][440/625] eta 0:01:16 lr 0.000641 wd 0.0500 time 0.4051 (0.4159) data time 0.0009 (0.0020) model 
time 0.4042 (0.4157) loss 7.2835 (7.2382) grad_norm 3.7816 (3.0666) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:17:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][450/625] eta 0:01:12 lr 0.000641 wd 0.0500 time 0.3980 (0.4155) data time 0.0007 (0.0019) model time 0.3973 (0.4152) loss 8.5468 (7.2412) grad_norm 3.3855 (3.0626) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:17:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][460/625] eta 0:01:08 lr 0.000641 wd 0.0500 time 0.3968 (0.4152) data time 0.0009 (0.0019) model time 0.3959 (0.4148) loss 7.4214 (7.2454) grad_norm 2.3175 (3.0593) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:17:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][470/625] eta 0:01:04 lr 0.000641 wd 0.0500 time 0.4015 (0.4149) data time 0.0010 (0.0019) model time 0.4005 (0.4145) loss 6.3688 (7.2468) grad_norm 2.1975 (3.0471) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:17:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][480/625] eta 0:01:00 lr 0.000641 wd 0.0500 time 0.4024 (0.4146) data time 0.0007 (0.0019) model time 0.4016 (0.4141) loss 6.5241 (7.2516) grad_norm 2.5354 (3.0450) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:17:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][490/625] eta 0:00:55 lr 0.000641 wd 0.0500 time 0.4039 (0.4143) data time 0.0008 (0.0019) model time 0.4031 (0.4138) loss 6.8929 (7.2463) grad_norm 3.4231 (3.0352) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:17:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][500/625] eta 0:00:51 lr 0.000641 wd 0.0500 time 0.4003 (0.4141) data time 0.0007 (0.0019) model time 0.3996 (0.4135) loss 7.8990 (7.2538) grad_norm 3.3365 (3.0300) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:18:01 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][510/625] eta 0:00:47 lr 0.000641 wd 0.0500 time 0.3946 (0.4137) data time 0.0007 (0.0018) model time 0.3939 (0.4131) loss 5.7783 (7.2557) grad_norm 2.9710 (3.0243) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:18:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][520/625] eta 0:00:43 lr 0.000640 wd 0.0500 time 0.3989 (0.4135) data time 0.0007 (0.0018) model time 0.3982 (0.4128) loss 6.5652 (7.2428) grad_norm 2.0097 (3.0332) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:18:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][530/625] eta 0:00:39 lr 0.000640 wd 0.0500 time 0.4043 (0.4132) data time 0.0009 (0.0018) model time 0.4034 (0.4125) loss 7.8248 (7.2482) grad_norm 2.5021 (3.0226) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:18:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][540/625] eta 0:00:35 lr 0.000640 wd 0.0500 time 0.3999 (0.4130) data time 0.0006 (0.0018) model time 0.3993 (0.4123) loss 6.7447 (7.2456) grad_norm 2.7781 (3.0202) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:18:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][550/625] eta 0:00:30 lr 0.000640 wd 0.0500 time 0.4003 (0.4128) data time 0.0009 (0.0018) model time 0.3994 (0.4121) loss 7.2340 (7.2403) grad_norm 4.3468 (3.0343) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:18:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][560/625] eta 0:00:26 lr 0.000640 wd 0.0500 time 0.5747 (0.4132) data time 0.0007 (0.0018) model time 0.5741 (0.4125) loss 6.8805 (7.2339) grad_norm 4.5882 (3.0321) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:18:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][570/625] eta 0:00:22 lr 0.000640 wd 0.0500 time 0.5895 (0.4142) 
data time 0.0007 (0.0017) model time 0.5888 (0.4136) loss 6.9986 (7.2301) grad_norm 2.7498 (3.0596) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:18:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][580/625] eta 0:00:18 lr 0.000640 wd 0.0500 time 0.5914 (0.4158) data time 0.0006 (0.0017) model time 0.5907 (0.4153) loss 7.0172 (7.2326) grad_norm 2.4613 (3.0561) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:18:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][590/625] eta 0:00:14 lr 0.000640 wd 0.0500 time 0.3973 (0.4162) data time 0.0009 (0.0017) model time 0.3964 (0.4158) loss 6.7927 (7.2298) grad_norm 3.5884 (3.0542) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:18:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][600/625] eta 0:00:10 lr 0.000640 wd 0.0500 time 0.3996 (0.4162) data time 0.0007 (0.0017) model time 0.3989 (0.4158) loss 5.9636 (7.2356) grad_norm 2.0097 (3.0449) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:18:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][610/625] eta 0:00:06 lr 0.000639 wd 0.0500 time 0.3964 (0.4161) data time 0.0004 (0.0017) model time 0.3960 (0.4157) loss 7.8801 (7.2433) grad_norm 3.6901 (3.0448) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:18:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][620/625] eta 0:00:02 lr 0.000639 wd 0.0500 time 0.3987 (0.4158) data time 0.0006 (0.0017) model time 0.3981 (0.4154) loss 6.4059 (7.2384) grad_norm 2.9710 (3.0384) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:18:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 154 training takes 0:04:19 [2024-07-25 03:18:49 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 03:18:50 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 03:18:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.469 (0.469) Loss 0.5811 (0.5811) Acc@1 88.281 (88.281) Acc@5 98.438 (98.438) Mem 14939MB [2024-07-25 03:18:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.123) Loss 0.9292 (0.7173) Acc@1 79.443 (85.236) Acc@5 95.703 (97.439) Mem 14939MB [2024-07-25 03:18:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 1.0752 (0.8543) Acc@1 74.170 (81.717) Acc@5 93.994 (95.973) Mem 14939MB [2024-07-25 03:18:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.312 Acc@5 95.947 [2024-07-25 03:18:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.3% [2024-07-25 03:18:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.864 (0.864) Loss 0.5625 (0.5625) Acc@1 89.453 (89.453) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 03:18:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.158) Loss 0.9053 (0.7038) Acc@1 80.322 (85.813) Acc@5 95.654 (97.536) Mem 14939MB [2024-07-25 03:18:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.124) Loss 1.0381 (0.8295) Acc@1 75.537 (82.331) Acc@5 94.629 (96.263) Mem 14939MB [2024-07-25 03:18:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.946 Acc@5 96.241 [2024-07-25 03:18:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-07-25 03:18:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.95% [2024-07-25 03:18:56 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 03:18:57 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 03:18:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][0/625] eta 0:08:31 lr 0.000639 wd 0.0500 time 0.8180 (0.8180) data time 0.4233 (0.4233) model time 0.0000 (0.0000) loss 6.7995 (6.7995) grad_norm 2.3392 (2.3392) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:19:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][10/625] eta 0:04:31 lr 0.000639 wd 0.0500 time 0.4057 (0.4407) data time 0.0006 (0.0394) model time 0.0000 (0.0000) loss 7.3576 (7.2103) grad_norm 2.1169 (2.5359) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:19:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][20/625] eta 0:04:16 lr 0.000639 wd 0.0500 time 0.3963 (0.4232) data time 0.0009 (0.0211) model time 0.0000 (0.0000) loss 8.3158 (7.1234) grad_norm 2.4785 (2.7764) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:19:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][30/625] eta 0:04:07 lr 0.000639 wd 0.0500 time 0.4056 (0.4158) data time 0.0009 (0.0146) model time 0.0000 (0.0000) loss 7.7386 (7.0660) grad_norm 3.3870 (3.0226) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:19:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][40/625] eta 0:04:01 lr 0.000639 wd 0.0500 time 0.3977 (0.4120) data time 0.0008 (0.0113) model time 0.0000 (0.0000) loss 6.3017 (7.1001) grad_norm 2.0020 (2.8902) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:19:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][50/625] eta 0:03:55 lr 0.000639 wd 0.0500 time 0.3980 (0.4097) data time 0.0007 (0.0092) 
model time 0.0000 (0.0000) loss 8.8575 (7.2299) grad_norm 2.2614 (2.7555) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:19:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][60/625] eta 0:03:50 lr 0.000639 wd 0.0500 time 0.4013 (0.4082) data time 0.0007 (0.0079) model time 0.4006 (0.3995) loss 7.3619 (7.2145) grad_norm 1.6681 (2.7148) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:19:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][70/625] eta 0:03:45 lr 0.000639 wd 0.0500 time 0.3934 (0.4070) data time 0.0007 (0.0069) model time 0.3927 (0.3992) loss 8.3386 (7.2735) grad_norm 1.9355 (2.6151) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:19:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][80/625] eta 0:03:41 lr 0.000638 wd 0.0500 time 0.3971 (0.4064) data time 0.0009 (0.0062) model time 0.3962 (0.3998) loss 6.7167 (7.2926) grad_norm 2.1031 (2.6766) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:19:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][90/625] eta 0:03:37 lr 0.000638 wd 0.0500 time 0.4018 (0.4059) data time 0.0006 (0.0056) model time 0.4012 (0.4000) loss 8.7028 (7.2742) grad_norm 3.1412 (2.6362) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:19:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][100/625] eta 0:03:32 lr 0.000638 wd 0.0500 time 0.3974 (0.4052) data time 0.0007 (0.0051) model time 0.3968 (0.3997) loss 8.5933 (7.2889) grad_norm 3.3237 (2.6170) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:19:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][110/625] eta 0:03:28 lr 0.000638 wd 0.0500 time 0.4052 (0.4048) data time 0.0007 (0.0048) model time 0.4045 (0.3998) loss 6.5766 (7.2713) grad_norm 2.1324 (2.8197) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:19:46 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][120/625] eta 0:03:24 lr 0.000638 wd 0.0500 time 0.4001 (0.4047) data time 0.0009 (0.0044) model time 0.3993 (0.4001) loss 8.1080 (7.2534) grad_norm 2.6450 (2.8397) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:19:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][130/625] eta 0:03:20 lr 0.000638 wd 0.0500 time 0.3972 (0.4045) data time 0.0007 (0.0042) model time 0.3965 (0.4002) loss 8.2645 (7.2281) grad_norm 2.9369 (2.9118) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:19:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][140/625] eta 0:03:16 lr 0.000638 wd 0.0500 time 0.4023 (0.4042) data time 0.0008 (0.0039) model time 0.4016 (0.4001) loss 8.7562 (7.2561) grad_norm 1.8625 (2.9317) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:19:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][150/625] eta 0:03:11 lr 0.000638 wd 0.0500 time 0.4020 (0.4040) data time 0.0008 (0.0037) model time 0.4012 (0.4001) loss 6.3233 (7.2406) grad_norm 1.8942 (2.9464) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:20:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][160/625] eta 0:03:08 lr 0.000638 wd 0.0500 time 0.3986 (0.4062) data time 0.0007 (0.0036) model time 0.3979 (0.4037) loss 7.2452 (7.2455) grad_norm 3.6534 (3.0021) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:20:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][170/625] eta 0:03:07 lr 0.000637 wd 0.0500 time 0.6184 (0.4115) data time 0.0010 (0.0034) model time 0.6174 (0.4114) loss 6.1350 (7.2478) grad_norm 2.8512 (2.9829) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:20:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][180/625] eta 0:03:05 lr 0.000637 wd 0.0500 time 0.4017 (0.4163) 
data time 0.0009 (0.0033) model time 0.4008 (0.4180) loss 7.7407 (7.2352) grad_norm 2.9906 (2.9607) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:20:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][190/625] eta 0:03:02 lr 0.000637 wd 0.0500 time 0.3979 (0.4185) data time 0.0007 (0.0032) model time 0.3973 (0.4207) loss 8.0338 (7.2255) grad_norm 2.1974 (2.9330) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:20:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][200/625] eta 0:02:58 lr 0.000637 wd 0.0500 time 0.4011 (0.4193) data time 0.0008 (0.0030) model time 0.4003 (0.4216) loss 6.5271 (7.2242) grad_norm 3.4601 (2.9456) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:20:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][210/625] eta 0:02:53 lr 0.000637 wd 0.0500 time 0.4002 (0.4184) data time 0.0008 (0.0029) model time 0.3994 (0.4202) loss 7.1437 (7.2357) grad_norm 1.8677 (2.9490) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:20:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][220/625] eta 0:02:49 lr 0.000637 wd 0.0500 time 0.3996 (0.4176) data time 0.0007 (0.0029) model time 0.3989 (0.4191) loss 7.9924 (7.2303) grad_norm 2.2035 (2.9533) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:20:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][230/625] eta 0:02:44 lr 0.000637 wd 0.0500 time 0.3959 (0.4169) data time 0.0006 (0.0028) model time 0.3953 (0.4180) loss 7.5139 (7.2154) grad_norm 2.1393 (2.9282) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:20:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][240/625] eta 0:02:40 lr 0.000637 wd 0.0500 time 0.4160 (0.4162) data time 0.0008 (0.0027) model time 0.4152 (0.4170) loss 6.2640 (7.2260) grad_norm 3.0001 (2.9131) loss_scale 1024.0000 (1024.0000) mem 14939MB 
[2024-07-25 03:20:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][250/625] eta 0:02:35 lr 0.000637 wd 0.0500 time 0.3993 (0.4155) data time 0.0008 (0.0026) model time 0.3985 (0.4161) loss 6.2549 (7.2194) grad_norm 2.4965 (2.9086) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:20:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][260/625] eta 0:02:31 lr 0.000637 wd 0.0500 time 0.3974 (0.4150) data time 0.0009 (0.0026) model time 0.3965 (0.4153) loss 8.3709 (7.2134) grad_norm 3.7993 (2.9115) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:20:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][270/625] eta 0:02:27 lr 0.000636 wd 0.0500 time 0.4020 (0.4144) data time 0.0007 (0.0025) model time 0.4013 (0.4145) loss 7.0300 (7.1866) grad_norm 2.3126 (2.8992) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:20:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][280/625] eta 0:02:22 lr 0.000636 wd 0.0500 time 0.3989 (0.4138) data time 0.0009 (0.0024) model time 0.3980 (0.4138) loss 7.5707 (7.1807) grad_norm 3.2529 (2.8790) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:20:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][290/625] eta 0:02:18 lr 0.000636 wd 0.0500 time 0.4044 (0.4134) data time 0.0007 (0.0024) model time 0.4037 (0.4132) loss 6.2542 (7.1839) grad_norm 10.6885 (2.8910) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:21:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][300/625] eta 0:02:14 lr 0.000636 wd 0.0500 time 0.3991 (0.4129) data time 0.0006 (0.0023) model time 0.3985 (0.4126) loss 6.0662 (7.1757) grad_norm 4.5812 (2.9088) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:21:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][310/625] eta 0:02:09 lr 0.000636 wd 0.0500 
time 0.3976 (0.4126) data time 0.0007 (0.0023) model time 0.3970 (0.4122) loss 7.2779 (7.1791) grad_norm 2.3563 (2.9109) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:21:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][320/625] eta 0:02:05 lr 0.000636 wd 0.0500 time 0.3965 (0.4122) data time 0.0008 (0.0023) model time 0.3956 (0.4117) loss 7.8330 (7.1803) grad_norm 2.4020 (2.8986) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:21:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][330/625] eta 0:02:01 lr 0.000636 wd 0.0500 time 0.4121 (0.4119) data time 0.0006 (0.0023) model time 0.4115 (0.4113) loss 7.1448 (7.1707) grad_norm 1.7177 (2.8953) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:21:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][340/625] eta 0:01:57 lr 0.000636 wd 0.0500 time 0.4001 (0.4117) data time 0.0008 (0.0022) model time 0.3993 (0.4110) loss 6.8816 (7.1797) grad_norm 2.1889 (2.8874) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:21:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][350/625] eta 0:01:53 lr 0.000636 wd 0.0500 time 0.3975 (0.4115) data time 0.0006 (0.0022) model time 0.3969 (0.4108) loss 7.8766 (7.1807) grad_norm 4.1458 (2.8987) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:21:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][360/625] eta 0:01:48 lr 0.000635 wd 0.0500 time 0.4024 (0.4113) data time 0.0008 (0.0023) model time 0.4016 (0.4104) loss 7.8911 (7.1823) grad_norm 2.7121 (2.9071) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:21:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][370/625] eta 0:01:44 lr 0.000635 wd 0.0500 time 0.3995 (0.4110) data time 0.0007 (0.0022) model time 0.3988 (0.4101) loss 5.9927 (7.1895) grad_norm 1.7666 (2.8993) loss_scale 1024.0000 
(1024.0000) mem 14939MB [2024-07-25 03:21:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][380/625] eta 0:01:40 lr 0.000635 wd 0.0500 time 0.5202 (0.4120) data time 0.0006 (0.0022) model time 0.5196 (0.4112) loss 7.3215 (7.2000) grad_norm 2.0763 (2.8899) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:21:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][390/625] eta 0:01:37 lr 0.000635 wd 0.0500 time 0.6033 (0.4140) data time 0.0008 (0.0022) model time 0.6024 (0.4135) loss 7.8864 (7.1994) grad_norm 3.5495 (2.9115) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:21:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][400/625] eta 0:01:33 lr 0.000635 wd 0.0500 time 0.5560 (0.4159) data time 0.0009 (0.0021) model time 0.5551 (0.4157) loss 7.1966 (7.2000) grad_norm 2.3147 (2.9553) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:21:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][410/625] eta 0:01:29 lr 0.000635 wd 0.0500 time 0.3980 (0.4168) data time 0.0007 (0.0022) model time 0.3973 (0.4166) loss 6.9140 (7.2080) grad_norm 2.3074 (2.9518) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:21:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][420/625] eta 0:01:25 lr 0.000635 wd 0.0500 time 0.3989 (0.4168) data time 0.0007 (0.0022) model time 0.3982 (0.4166) loss 7.5262 (7.2127) grad_norm 2.6079 (2.9411) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:21:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][430/625] eta 0:01:21 lr 0.000635 wd 0.0500 time 0.4052 (0.4165) data time 0.0006 (0.0022) model time 0.4046 (0.4162) loss 8.9098 (7.2205) grad_norm 2.0204 (2.9255) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:22:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][440/625] eta 0:01:16 
lr 0.000635 wd 0.0500 time 0.3981 (0.4162) data time 0.0007 (0.0021) model time 0.3974 (0.4158) loss 7.7206 (7.2187) grad_norm 2.0508 (2.9162) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:22:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][450/625] eta 0:01:12 lr 0.000635 wd 0.0500 time 0.4053 (0.4159) data time 0.0009 (0.0021) model time 0.4044 (0.4155) loss 8.2375 (7.2234) grad_norm 2.2153 (2.9168) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:22:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][460/625] eta 0:01:08 lr 0.000634 wd 0.0500 time 0.4047 (0.4156) data time 0.0007 (0.0021) model time 0.4040 (0.4151) loss 6.3155 (7.2150) grad_norm 2.4022 (2.9028) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:22:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][470/625] eta 0:01:04 lr 0.000634 wd 0.0500 time 0.3944 (0.4153) data time 0.0007 (0.0021) model time 0.3938 (0.4147) loss 7.3950 (7.2059) grad_norm 2.3095 (2.8923) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:22:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][480/625] eta 0:01:00 lr 0.000634 wd 0.0500 time 0.3977 (0.4149) data time 0.0007 (0.0020) model time 0.3970 (0.4143) loss 6.6384 (7.2109) grad_norm 5.1073 (2.8865) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:22:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][490/625] eta 0:00:55 lr 0.000634 wd 0.0500 time 0.3996 (0.4146) data time 0.0009 (0.0020) model time 0.3987 (0.4140) loss 7.1928 (7.2131) grad_norm 2.1042 (2.8958) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:22:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][500/625] eta 0:00:51 lr 0.000634 wd 0.0500 time 0.3987 (0.4143) data time 0.0006 (0.0020) model time 0.3981 (0.4137) loss 8.7454 (7.2122) grad_norm 2.0181 (2.8896) 
loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:22:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][510/625] eta 0:00:47 lr 0.000634 wd 0.0500 time 0.4046 (0.4140) data time 0.0007 (0.0020) model time 0.4039 (0.4134) loss 5.9239 (7.2118) grad_norm 2.9855 (2.8863) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:22:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][520/625] eta 0:00:43 lr 0.000634 wd 0.0500 time 0.4019 (0.4138) data time 0.0006 (0.0019) model time 0.4013 (0.4131) loss 6.7341 (7.2225) grad_norm 4.9351 (2.8873) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:22:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][530/625] eta 0:00:39 lr 0.000634 wd 0.0500 time 0.3932 (0.4135) data time 0.0006 (0.0019) model time 0.3926 (0.4128) loss 7.5875 (7.2294) grad_norm 2.5992 (2.8817) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:22:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][540/625] eta 0:00:35 lr 0.000634 wd 0.0500 time 0.3967 (0.4132) data time 0.0008 (0.0019) model time 0.3959 (0.4124) loss 7.0365 (7.2267) grad_norm 3.1555 (2.8697) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:22:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][550/625] eta 0:00:30 lr 0.000633 wd 0.0500 time 0.4050 (0.4129) data time 0.0008 (0.0019) model time 0.4042 (0.4121) loss 7.7237 (7.2300) grad_norm 2.3009 (2.8653) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:22:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][560/625] eta 0:00:26 lr 0.000633 wd 0.0500 time 0.3935 (0.4127) data time 0.0007 (0.0019) model time 0.3929 (0.4119) loss 7.8408 (7.2275) grad_norm 1.9260 (2.8541) loss_scale 2048.0000 (1031.3012) mem 14939MB [2024-07-25 03:22:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[155/300][570/625] eta 0:00:22 lr 0.000633 wd 0.0500 time 0.3962 (0.4124) data time 0.0009 (0.0019) model time 0.3953 (0.4116) loss 5.7469 (7.2308) grad_norm 2.7606 (2.8605) loss_scale 2048.0000 (1049.1068) mem 14939MB [2024-07-25 03:22:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][580/625] eta 0:00:18 lr 0.000633 wd 0.0500 time 0.4003 (0.4122) data time 0.0008 (0.0018) model time 0.3995 (0.4113) loss 6.7179 (7.2321) grad_norm 4.2535 (2.8724) loss_scale 2048.0000 (1066.2995) mem 14939MB [2024-07-25 03:23:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][590/625] eta 0:00:14 lr 0.000633 wd 0.0500 time 0.4039 (0.4120) data time 0.0006 (0.0018) model time 0.4033 (0.4111) loss 7.6829 (7.2407) grad_norm 8.0251 (2.8860) loss_scale 2048.0000 (1082.9103) mem 14939MB [2024-07-25 03:23:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][600/625] eta 0:00:10 lr 0.000633 wd 0.0500 time 0.5672 (0.4124) data time 0.0006 (0.0018) model time 0.5666 (0.4115) loss 5.9579 (7.2378) grad_norm 2.1574 (2.8808) loss_scale 2048.0000 (1098.9684) mem 14939MB [2024-07-25 03:23:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][610/625] eta 0:00:06 lr 0.000633 wd 0.0500 time 0.5635 (0.4135) data time 0.0006 (0.0018) model time 0.5629 (0.4127) loss 7.2276 (7.2402) grad_norm 3.1684 (2.8765) loss_scale 2048.0000 (1114.5008) mem 14939MB [2024-07-25 03:23:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][620/625] eta 0:00:02 lr 0.000633 wd 0.0500 time 0.5569 (0.4151) data time 0.0007 (0.0018) model time 0.5562 (0.4145) loss 7.6640 (7.2416) grad_norm 2.2681 (2.8742) loss_scale 2048.0000 (1129.5330) mem 14939MB [2024-07-25 03:23:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 155 training takes 0:04:19 [2024-07-25 03:23:16 vssd_mesa_retrain_small_e300] (utils.py 118): INFO 
./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 03:23:17 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 03:23:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.482 (0.482) Loss 0.5869 (0.5869) Acc@1 88.330 (88.330) Acc@5 98.242 (98.242) Mem 14939MB [2024-07-25 03:23:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.123) Loss 0.9438 (0.7265) Acc@1 79.639 (85.405) Acc@5 95.654 (97.368) Mem 14939MB [2024-07-25 03:23:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 1.0557 (0.8553) Acc@1 76.221 (82.020) Acc@5 94.385 (95.987) Mem 14939MB [2024-07-25 03:23:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.656 Acc@5 95.957 [2024-07-25 03:23:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.7% [2024-07-25 03:23:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.66% [2024-07-25 03:23:20 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 03:23:21 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 03:23:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.468 (0.468) Loss 0.5625 (0.5625) Acc@1 89.453 (89.453) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 03:23:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.122) Loss 0.9048 (0.7034) Acc@1 80.518 (85.831) Acc@5 95.703 (97.536) Mem 14939MB [2024-07-25 03:23:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.106) Loss 1.0361 (0.8288) Acc@1 75.537 (82.389) Acc@5 94.678 (96.270) Mem 14939MB [2024-07-25 03:23:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.012 Acc@5 96.245 [2024-07-25 03:23:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.0% [2024-07-25 03:23:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.01% [2024-07-25 03:23:23 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 03:23:24 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 03:23:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][0/625] eta 0:07:39 lr 0.000633 wd 0.0500 time 0.7357 (0.7357) data time 0.3568 (0.3568) model time 0.0000 (0.0000) loss 8.4281 (8.4281) grad_norm 2.4448 (2.4448) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:23:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][10/625] eta 0:04:35 lr 0.000633 wd 0.0500 time 0.4016 (0.4481) data time 0.0008 (0.0334) model time 0.0000 (0.0000) loss 6.0665 (7.1579) grad_norm 2.5022 (2.1571) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:23:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][20/625] eta 0:04:22 lr 0.000632 wd 0.0500 time 0.4167 (0.4339) data time 0.0007 (0.0180) model time 0.0000 (0.0000) loss 8.0099 (7.1376) grad_norm 1.9606 (2.2080) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:23:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][30/625] eta 0:04:11 lr 0.000632 wd 0.0500 time 0.3991 (0.4230) data time 0.0006 (0.0125) model time 0.0000 (0.0000) loss 6.1437 (7.1220) grad_norm 1.5774 (2.2536) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:23:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][40/625] eta 0:04:03 lr 0.000632 wd 0.0500 time 0.3960 (0.4171) data time 0.0007 (0.0097) model time 0.0000 (0.0000) loss 8.3122 (7.1959) grad_norm 3.4441 (2.3038) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:23:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][50/625] eta 0:03:58 lr 0.000632 wd 0.0500 time 0.4001 (0.4140) data time 0.0008 (0.0081) model time 0.0000 (0.0000) loss 7.3763 (7.2028) grad_norm 2.7869 (2.2742) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:23:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][60/625] eta 0:03:52 lr 0.000632 wd 0.0500 time 
0.3978 (0.4119) data time 0.0009 (0.0069) model time 0.3970 (0.3999) loss 8.0757 (7.2365) grad_norm 2.8515 (2.3796) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:23:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][70/625] eta 0:03:47 lr 0.000632 wd 0.0500 time 0.4000 (0.4101) data time 0.0006 (0.0061) model time 0.3994 (0.3992) loss 5.9662 (7.2025) grad_norm 3.3931 (2.4310) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:23:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][80/625] eta 0:03:42 lr 0.000632 wd 0.0500 time 0.3987 (0.4088) data time 0.0007 (0.0054) model time 0.3980 (0.3992) loss 6.9685 (7.2127) grad_norm 4.0901 (2.5686) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:24:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][90/625] eta 0:03:38 lr 0.000632 wd 0.0500 time 0.3981 (0.4079) data time 0.0006 (0.0049) model time 0.3974 (0.3993) loss 7.8281 (7.2003) grad_norm 1.8136 (2.5683) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:24:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][100/625] eta 0:03:33 lr 0.000632 wd 0.0500 time 0.3959 (0.4072) data time 0.0007 (0.0045) model time 0.3952 (0.3993) loss 7.4393 (7.2091) grad_norm 3.0293 (2.5781) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:24:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][110/625] eta 0:03:29 lr 0.000631 wd 0.0500 time 0.4014 (0.4066) data time 0.0009 (0.0042) model time 0.4006 (0.3995) loss 7.8656 (7.1934) grad_norm 4.0494 (2.6012) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:24:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][120/625] eta 0:03:25 lr 0.000631 wd 0.0500 time 0.3964 (0.4060) data time 0.0008 (0.0039) model time 0.3956 (0.3993) loss 5.2748 (7.1757) grad_norm 2.0202 (2.5898) loss_scale 2048.0000 (2048.0000) 
mem 14939MB [2024-07-25 03:24:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][130/625] eta 0:03:20 lr 0.000631 wd 0.0500 time 0.3971 (0.4057) data time 0.0006 (0.0037) model time 0.3966 (0.3995) loss 7.8229 (7.1852) grad_norm 3.0706 (2.5886) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:24:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][140/625] eta 0:03:16 lr 0.000631 wd 0.0500 time 0.3944 (0.4053) data time 0.0009 (0.0035) model time 0.3935 (0.3994) loss 6.7349 (7.1909) grad_norm 3.7146 (2.6445) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:24:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][150/625] eta 0:03:12 lr 0.000631 wd 0.0500 time 0.3998 (0.4049) data time 0.0009 (0.0033) model time 0.3990 (0.3994) loss 6.3806 (7.2036) grad_norm 3.8883 (2.6877) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:24:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][160/625] eta 0:03:08 lr 0.000631 wd 0.0500 time 0.4007 (0.4056) data time 0.0009 (0.0032) model time 0.3998 (0.4008) loss 7.4502 (7.2000) grad_norm 3.7111 (2.7584) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:24:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][170/625] eta 0:03:04 lr 0.000631 wd 0.0500 time 0.4071 (0.4055) data time 0.0006 (0.0031) model time 0.4064 (0.4009) loss 6.9540 (7.1803) grad_norm 2.3459 (2.7719) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:24:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][180/625] eta 0:03:00 lr 0.000631 wd 0.0500 time 0.3959 (0.4052) data time 0.0007 (0.0029) model time 0.3952 (0.4009) loss 6.5640 (7.1666) grad_norm 3.2247 (2.7534) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:24:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][190/625] eta 0:02:56 lr 0.000631 
wd 0.0500 time 0.3990 (0.4049) data time 0.0007 (0.0028) model time 0.3983 (0.4007) loss 7.6197 (7.1680) grad_norm 6.6569 (2.7820) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:24:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][200/625] eta 0:02:53 lr 0.000631 wd 0.0500 time 0.5674 (0.4082) data time 0.0009 (0.0027) model time 0.5665 (0.4053) loss 7.7752 (7.1453) grad_norm 3.1381 (2.8298) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:24:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][210/625] eta 0:02:50 lr 0.000630 wd 0.0500 time 0.6140 (0.4116) data time 0.0007 (0.0026) model time 0.6133 (0.4099) loss 8.2439 (7.1437) grad_norm 5.0373 (2.8485) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:24:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][220/625] eta 0:02:47 lr 0.000630 wd 0.0500 time 0.4077 (0.4139) data time 0.0008 (0.0026) model time 0.4069 (0.4129) loss 8.1395 (7.1440) grad_norm 2.5189 (2.8445) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:25:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][230/625] eta 0:02:44 lr 0.000630 wd 0.0500 time 0.5238 (0.4153) data time 0.0007 (0.0025) model time 0.5231 (0.4147) loss 7.0474 (7.1316) grad_norm 2.5631 (2.8356) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:25:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][240/625] eta 0:02:39 lr 0.000630 wd 0.0500 time 0.3974 (0.4147) data time 0.0009 (0.0024) model time 0.3965 (0.4140) loss 6.9501 (7.1309) grad_norm 1.9866 (2.8323) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:25:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][250/625] eta 0:02:35 lr 0.000630 wd 0.0500 time 0.4028 (0.4143) data time 0.0008 (0.0025) model time 0.4020 (0.4133) loss 5.9004 (7.1242) grad_norm 2.8940 (2.8462) loss_scale 
2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:25:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][260/625] eta 0:02:31 lr 0.000630 wd 0.0500 time 0.3995 (0.4137) data time 0.0008 (0.0024) model time 0.3987 (0.4126) loss 7.6544 (7.1355) grad_norm 2.4537 (2.8232) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:25:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][270/625] eta 0:02:26 lr 0.000630 wd 0.0500 time 0.3963 (0.4131) data time 0.0006 (0.0023) model time 0.3957 (0.4119) loss 7.2450 (7.1414) grad_norm 2.1566 (2.8212) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:25:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][280/625] eta 0:02:22 lr 0.000630 wd 0.0500 time 0.4017 (0.4127) data time 0.0006 (0.0023) model time 0.4011 (0.4114) loss 6.4499 (7.1364) grad_norm 2.5439 (2.8143) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:25:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][290/625] eta 0:02:18 lr 0.000630 wd 0.0500 time 0.3978 (0.4122) data time 0.0006 (0.0022) model time 0.3972 (0.4108) loss 6.5455 (7.1220) grad_norm 2.1477 (2.8256) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:25:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][300/625] eta 0:02:13 lr 0.000629 wd 0.0500 time 0.3968 (0.4118) data time 0.0008 (0.0022) model time 0.3960 (0.4104) loss 6.6358 (7.1316) grad_norm 2.7462 (2.8358) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:25:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][310/625] eta 0:02:09 lr 0.000629 wd 0.0500 time 0.3977 (0.4115) data time 0.0007 (0.0022) model time 0.3971 (0.4101) loss 7.7528 (7.1518) grad_norm 7.8485 (2.8448) loss_scale 2048.0000 (2048.0000) mem 14939MB [2024-07-25 03:25:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][320/625] 
eta 0:02:05 lr 0.000629 wd 0.0500 time 0.3999 (0.4112) data time 0.0010 (0.0021) model time 0.3989 (0.4096) loss 6.1559 (7.1495) grad_norm 2.1929 (inf) loss_scale 1024.0000 (2028.8598) mem 14939MB [2024-07-25 03:25:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][330/625] eta 0:02:01 lr 0.000629 wd 0.0500 time 0.3984 (0.4107) data time 0.0007 (0.0021) model time 0.3978 (0.4091) loss 6.6634 (7.1391) grad_norm 1.8166 (inf) loss_scale 512.0000 (1983.0332) mem 14939MB [2024-07-25 03:25:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][340/625] eta 0:01:56 lr 0.000629 wd 0.0500 time 0.3968 (0.4104) data time 0.0009 (0.0021) model time 0.3959 (0.4087) loss 7.0805 (7.1430) grad_norm 2.2509 (inf) loss_scale 512.0000 (1939.8944) mem 14939MB [2024-07-25 03:25:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][350/625] eta 0:01:52 lr 0.000629 wd 0.0500 time 0.3970 (0.4101) data time 0.0009 (0.0020) model time 0.3961 (0.4084) loss 7.3610 (7.1382) grad_norm 2.1699 (inf) loss_scale 512.0000 (1899.2137) mem 14939MB [2024-07-25 03:25:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][360/625] eta 0:01:48 lr 0.000629 wd 0.0500 time 0.3924 (0.4098) data time 0.0008 (0.0020) model time 0.3916 (0.4081) loss 6.9666 (7.1472) grad_norm 1.7942 (inf) loss_scale 512.0000 (1860.7867) mem 14939MB [2024-07-25 03:25:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][370/625] eta 0:01:44 lr 0.000629 wd 0.0500 time 0.4494 (0.4099) data time 0.0010 (0.0020) model time 0.4484 (0.4082) loss 6.2986 (7.1565) grad_norm 3.5975 (inf) loss_scale 512.0000 (1824.4313) mem 14939MB [2024-07-25 03:26:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][380/625] eta 0:01:40 lr 0.000629 wd 0.0500 time 0.4162 (0.4102) data time 0.0009 (0.0020) model time 0.4153 (0.4087) loss 8.2411 (7.1611) grad_norm 2.0211 (inf) loss_scale 
512.0000 (1789.9843) mem 14939MB [2024-07-25 03:26:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][390/625] eta 0:01:36 lr 0.000628 wd 0.0500 time 0.3959 (0.4101) data time 0.0008 (0.0019) model time 0.3952 (0.4085) loss 8.5139 (7.1639) grad_norm 3.9122 (inf) loss_scale 512.0000 (1757.2992) mem 14939MB [2024-07-25 03:26:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][400/625] eta 0:01:32 lr 0.000628 wd 0.0500 time 0.4046 (0.4101) data time 0.0008 (0.0019) model time 0.4038 (0.4085) loss 7.6300 (7.1742) grad_norm 4.1230 (inf) loss_scale 512.0000 (1726.2444) mem 14939MB [2024-07-25 03:26:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][410/625] eta 0:01:28 lr 0.000628 wd 0.0500 time 0.4163 (0.4099) data time 0.0007 (0.0019) model time 0.4156 (0.4083) loss 7.1124 (7.1809) grad_norm 2.6975 (inf) loss_scale 512.0000 (1696.7007) mem 14939MB [2024-07-25 03:26:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][420/625] eta 0:01:24 lr 0.000628 wd 0.0500 time 0.3932 (0.4117) data time 0.0008 (0.0019) model time 0.3924 (0.4104) loss 7.6335 (7.1845) grad_norm 1.8147 (inf) loss_scale 512.0000 (1668.5606) mem 14939MB [2024-07-25 03:26:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][430/625] eta 0:01:20 lr 0.000628 wd 0.0500 time 0.5544 (0.4141) data time 0.0009 (0.0018) model time 0.5535 (0.4131) loss 7.5048 (7.1975) grad_norm 2.5526 (inf) loss_scale 512.0000 (1641.7262) mem 14939MB [2024-07-25 03:26:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][440/625] eta 0:01:17 lr 0.000628 wd 0.0500 time 0.3946 (0.4164) data time 0.0007 (0.0018) model time 0.3939 (0.4157) loss 6.0873 (7.1887) grad_norm 2.1185 (inf) loss_scale 512.0000 (1616.1088) mem 14939MB [2024-07-25 03:26:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][450/625] eta 0:01:13 lr 0.000628 wd 
0.0500 time 0.5809 (0.4175) data time 0.0009 (0.0018) model time 0.5800 (0.4169) loss 7.4443 (7.1982) grad_norm 2.9863 (inf) loss_scale 512.0000 (1591.6275) mem 14939MB [2024-07-25 03:26:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][460/625] eta 0:01:08 lr 0.000628 wd 0.0500 time 0.4267 (0.4172) data time 0.0006 (0.0018) model time 0.4261 (0.4165) loss 6.7119 (7.1951) grad_norm 3.2733 (inf) loss_scale 512.0000 (1568.2082) mem 14939MB [2024-07-25 03:26:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][470/625] eta 0:01:04 lr 0.000628 wd 0.0500 time 0.3980 (0.4169) data time 0.0006 (0.0019) model time 0.3973 (0.4161) loss 6.7978 (7.1930) grad_norm 1.8113 (inf) loss_scale 512.0000 (1545.7834) mem 14939MB [2024-07-25 03:26:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][480/625] eta 0:01:00 lr 0.000628 wd 0.0500 time 0.3997 (0.4165) data time 0.0008 (0.0019) model time 0.3989 (0.4157) loss 7.6397 (7.1904) grad_norm 2.3134 (inf) loss_scale 512.0000 (1524.2911) mem 14939MB [2024-07-25 03:26:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][490/625] eta 0:00:56 lr 0.000627 wd 0.0500 time 0.4143 (0.4162) data time 0.0009 (0.0018) model time 0.4134 (0.4154) loss 8.3194 (7.1921) grad_norm 3.5839 (inf) loss_scale 512.0000 (1503.6741) mem 14939MB [2024-07-25 03:26:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][500/625] eta 0:00:51 lr 0.000627 wd 0.0500 time 0.3912 (0.4159) data time 0.0008 (0.0018) model time 0.3904 (0.4150) loss 7.2292 (7.1942) grad_norm 3.3159 (inf) loss_scale 512.0000 (1483.8802) mem 14939MB [2024-07-25 03:26:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][510/625] eta 0:00:47 lr 0.000627 wd 0.0500 time 0.3956 (0.4156) data time 0.0007 (0.0018) model time 0.3949 (0.4146) loss 7.7200 (7.1865) grad_norm 4.2838 (inf) loss_scale 512.0000 (1464.8611) mem 14939MB 
[2024-07-25 03:27:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][520/625] eta 0:00:43 lr 0.000627 wd 0.0500 time 0.4170 (0.4153) data time 0.0007 (0.0018) model time 0.4163 (0.4143) loss 8.1406 (7.1892) grad_norm 1.8620 (inf) loss_scale 512.0000 (1446.5720) mem 14939MB [2024-07-25 03:27:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][530/625] eta 0:00:39 lr 0.000627 wd 0.0500 time 0.3995 (0.4150) data time 0.0009 (0.0018) model time 0.3987 (0.4140) loss 8.5079 (7.1926) grad_norm 4.0857 (inf) loss_scale 512.0000 (1428.9718) mem 14939MB [2024-07-25 03:27:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][540/625] eta 0:00:35 lr 0.000627 wd 0.0500 time 0.4000 (0.4148) data time 0.0008 (0.0018) model time 0.3992 (0.4137) loss 7.1500 (7.1905) grad_norm 2.1279 (inf) loss_scale 512.0000 (1412.0222) mem 14939MB [2024-07-25 03:27:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][550/625] eta 0:00:31 lr 0.000627 wd 0.0500 time 0.4254 (0.4146) data time 0.0006 (0.0018) model time 0.4248 (0.4135) loss 8.6624 (7.1921) grad_norm 2.7864 (inf) loss_scale 512.0000 (1395.6878) mem 14939MB [2024-07-25 03:27:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][560/625] eta 0:00:26 lr 0.000627 wd 0.0500 time 0.4000 (0.4143) data time 0.0007 (0.0018) model time 0.3993 (0.4133) loss 7.6348 (7.1965) grad_norm 1.6815 (inf) loss_scale 512.0000 (1379.9358) mem 14939MB [2024-07-25 03:27:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][570/625] eta 0:00:22 lr 0.000627 wd 0.0500 time 0.3991 (0.4141) data time 0.0006 (0.0017) model time 0.3985 (0.4130) loss 6.8403 (7.1936) grad_norm 2.3965 (inf) loss_scale 512.0000 (1364.7356) mem 14939MB [2024-07-25 03:27:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][580/625] eta 0:00:18 lr 0.000626 wd 0.0500 time 0.3986 (0.4139) data 
time 0.0007 (0.0017) model time 0.3979 (0.4128) loss 6.3287 (7.1903) grad_norm 3.5972 (inf) loss_scale 512.0000 (1350.0585) mem 14939MB [2024-07-25 03:27:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][590/625] eta 0:00:14 lr 0.000626 wd 0.0500 time 0.3990 (0.4137) data time 0.0008 (0.0017) model time 0.3982 (0.4125) loss 6.6495 (7.1877) grad_norm 1.8496 (inf) loss_scale 512.0000 (1335.8782) mem 14939MB [2024-07-25 03:27:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][600/625] eta 0:00:10 lr 0.000626 wd 0.0500 time 0.3985 (0.4138) data time 0.0008 (0.0017) model time 0.3977 (0.4126) loss 7.3063 (7.1837) grad_norm 1.8420 (inf) loss_scale 512.0000 (1322.1697) mem 14939MB [2024-07-25 03:27:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][610/625] eta 0:00:06 lr 0.000626 wd 0.0500 time 0.4019 (0.4135) data time 0.0006 (0.0017) model time 0.4013 (0.4124) loss 6.6606 (7.1793) grad_norm 2.6927 (inf) loss_scale 512.0000 (1308.9100) mem 14939MB [2024-07-25 03:27:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][620/625] eta 0:00:02 lr 0.000626 wd 0.0500 time 0.3953 (0.4133) data time 0.0004 (0.0017) model time 0.3949 (0.4121) loss 7.5665 (7.1830) grad_norm 1.7543 (inf) loss_scale 512.0000 (1296.0773) mem 14939MB [2024-07-25 03:27:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 156 training takes 0:04:18 [2024-07-25 03:27:43 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 03:27:43 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 03:27:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.472 (0.472) Loss 0.5981 (0.5981) Acc@1 88.721 (88.721) Acc@5 98.340 (98.340) Mem 14939MB [2024-07-25 03:27:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.122) Loss 0.9951 (0.7468) Acc@1 78.320 (85.125) Acc@5 94.824 (97.368) Mem 14939MB [2024-07-25 03:27:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.105) Loss 1.0859 (0.8741) Acc@1 75.537 (81.817) Acc@5 93.945 (95.989) Mem 14939MB [2024-07-25 03:27:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.446 Acc@5 95.943 [2024-07-25 03:27:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.4% [2024-07-25 03:27:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.873 (0.873) Loss 0.5615 (0.5615) Acc@1 89.453 (89.453) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 03:27:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.160) Loss 0.9028 (0.7026) Acc@1 80.664 (85.844) Acc@5 95.801 (97.567) Mem 14939MB [2024-07-25 03:27:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.125) Loss 1.0352 (0.8278) Acc@1 75.586 (82.427) Acc@5 94.678 (96.280) Mem 14939MB [2024-07-25 03:27:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.046 Acc@5 96.257 [2024-07-25 03:27:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.0% [2024-07-25 03:27:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.05% [2024-07-25 03:27:49 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 03:27:50 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 03:27:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][0/625] eta 0:08:19 lr 0.000626 wd 0.0500 time 0.7994 (0.7994) data time 0.4213 (0.4213) model time 0.0000 (0.0000) loss 7.8455 (7.8455) grad_norm 3.2534 (3.2534) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:27:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][10/625] eta 0:04:48 lr 0.000626 wd 0.0500 time 0.5721 (0.4695) data time 0.0007 (0.0391) model time 0.0000 (0.0000) loss 8.1379 (7.3388) grad_norm 2.2464 (3.3033) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:28:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][20/625] eta 0:04:54 lr 0.000626 wd 0.0500 time 0.5946 (0.4876) data time 0.0008 (0.0209) model time 0.0000 (0.0000) loss 7.6890 (7.3077) grad_norm 1.9853 (3.2117) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:28:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][30/625] eta 0:04:55 lr 0.000626 wd 0.0500 time 0.5731 (0.4971) data time 0.0008 (0.0145) model time 0.0000 (0.0000) loss 7.3591 (7.2178) grad_norm 1.5949 (2.8816) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:28:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][40/625] eta 0:04:48 lr 0.000626 wd 0.0500 time 0.4007 (0.4939) data time 0.0009 (0.0112) model time 0.0000 (0.0000) loss 8.2461 (7.1709) grad_norm 1.7257 (2.6447) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:28:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][50/625] eta 0:04:35 lr 0.000625 wd 0.0500 time 0.3964 (0.4785) data time 0.0008 (0.0092) model time 0.0000 (0.0000) loss 7.7197 (7.1725) grad_norm 2.4885 (2.6419) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:28:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][60/625] eta 0:04:22 lr 0.000625 wd 0.0500 time 0.3966 (0.4652) data time 0.0008 (0.0078) model time 0.3958 (0.3966) loss 5.7302 (7.1841) grad_norm 2.1280 (2.6168) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:28:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][70/625] eta 0:04:12 lr 0.000625 wd 0.0500 time 0.3963 (0.4556) data time 0.0009 (0.0069) model time 0.3954 (0.3963) loss 7.9503 (7.1993) grad_norm 2.7535 (2.8783) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:28:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][80/625] eta 0:04:04 lr 0.000625 wd 0.0500 time 0.4029 (0.4487) data time 0.0009 (0.0061) model time 0.4021 (0.3972) loss 8.0449 (7.2255) grad_norm 2.1097 (2.8258) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:28:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][90/625] eta 0:03:57 lr 0.000625 wd 0.0500 time 0.4044 (0.4432) data time 0.0009 (0.0055) model time 0.4035 (0.3974) loss 6.6086 (7.2142) grad_norm 3.7487 (2.8962) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:28:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][100/625] eta 0:03:50 lr 0.000625 wd 0.0500 time 0.3959 (0.4388) data time 0.0009 (0.0051) model time 0.3950 (0.3974) loss 7.9320 (7.2205) grad_norm 3.1901 (2.8757) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:28:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][110/625] eta 0:03:44 lr 0.000625 wd 0.0500 time 0.4029 (0.4352) data time 0.0009 (0.0047) model time 0.4021 (0.3975) loss 8.1627 (7.2005) grad_norm 2.2971 (2.8411) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:28:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][120/625] eta 0:03:38 lr 0.000625 wd 0.0500 time 
0.3988 (0.4323) data time 0.0006 (0.0044) model time 0.3981 (0.3977) loss 8.0151 (7.2267) grad_norm 2.1628 (2.8086) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:28:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][130/625] eta 0:03:32 lr 0.000625 wd 0.0500 time 0.3974 (0.4299) data time 0.0006 (0.0041) model time 0.3968 (0.3980) loss 7.3321 (7.2003) grad_norm 2.5662 (2.7750) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:28:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][140/625] eta 0:03:27 lr 0.000624 wd 0.0500 time 0.4021 (0.4286) data time 0.0008 (0.0039) model time 0.4013 (0.3995) loss 5.4839 (7.1980) grad_norm 2.3279 (2.7755) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:28:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][150/625] eta 0:03:22 lr 0.000624 wd 0.0500 time 0.3969 (0.4269) data time 0.0008 (0.0037) model time 0.3960 (0.3997) loss 6.2987 (7.1849) grad_norm 4.8002 (2.8875) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:28:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][160/625] eta 0:03:17 lr 0.000624 wd 0.0500 time 0.3943 (0.4251) data time 0.0009 (0.0035) model time 0.3934 (0.3995) loss 6.9239 (7.1941) grad_norm 5.1017 (2.9335) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:29:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][170/625] eta 0:03:12 lr 0.000624 wd 0.0500 time 0.3992 (0.4238) data time 0.0008 (0.0034) model time 0.3984 (0.3996) loss 7.9326 (7.1715) grad_norm 6.0776 (2.9283) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:29:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][180/625] eta 0:03:07 lr 0.000624 wd 0.0500 time 0.3960 (0.4224) data time 0.0008 (0.0032) model time 0.3952 (0.3995) loss 7.9913 (7.1835) grad_norm 1.8660 (2.9158) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:29:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][190/625] eta 0:03:03 lr 0.000624 wd 0.0500 time 0.3942 (0.4212) data time 0.0008 (0.0031) model time 0.3935 (0.3994) loss 8.5550 (7.1991) grad_norm 2.3858 (2.8958) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:29:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][200/625] eta 0:02:58 lr 0.000624 wd 0.0500 time 0.4001 (0.4204) data time 0.0009 (0.0030) model time 0.3992 (0.3997) loss 7.4064 (7.2005) grad_norm 2.2668 (2.8756) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:29:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][210/625] eta 0:02:54 lr 0.000624 wd 0.0500 time 0.3964 (0.4195) data time 0.0006 (0.0029) model time 0.3959 (0.3997) loss 7.3964 (7.2011) grad_norm 3.3886 (2.8747) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:29:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][220/625] eta 0:02:49 lr 0.000624 wd 0.0500 time 0.4079 (0.4189) data time 0.0006 (0.0028) model time 0.4073 (0.4001) loss 6.6356 (7.1920) grad_norm 2.5980 (2.9036) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:29:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][230/625] eta 0:02:45 lr 0.000624 wd 0.0500 time 0.4414 (0.4191) data time 0.0006 (0.0028) model time 0.4408 (0.4013) loss 5.9762 (7.1887) grad_norm 4.4343 (2.9027) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:29:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][240/625] eta 0:02:42 lr 0.000623 wd 0.0500 time 0.6054 (0.4233) data time 0.0006 (0.0027) model time 0.6047 (0.4076) loss 7.7431 (7.2009) grad_norm 3.2942 (2.9260) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:29:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][250/625] eta 0:02:39 lr 0.000623 wd 0.0500 time 
0.5997 (0.4262) data time 0.0008 (0.0026) model time 0.5989 (0.4119) loss 7.3939 (7.2195) grad_norm 6.7980 (2.9631) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:29:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][260/625] eta 0:02:36 lr 0.000623 wd 0.0500 time 0.4060 (0.4280) data time 0.0006 (0.0026) model time 0.4054 (0.4147) loss 6.2243 (7.2241) grad_norm 2.1100 (2.9419) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:29:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][270/625] eta 0:02:31 lr 0.000623 wd 0.0500 time 0.4048 (0.4276) data time 0.0006 (0.0025) model time 0.4041 (0.4148) loss 7.6149 (7.2349) grad_norm 3.7393 (2.9361) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:29:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][280/625] eta 0:02:27 lr 0.000623 wd 0.0500 time 0.3938 (0.4267) data time 0.0006 (0.0024) model time 0.3932 (0.4142) loss 8.0282 (7.2472) grad_norm 2.9628 (2.9341) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:29:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][290/625] eta 0:02:22 lr 0.000623 wd 0.0500 time 0.3966 (0.4257) data time 0.0008 (0.0024) model time 0.3958 (0.4135) loss 7.7858 (7.2535) grad_norm 3.6353 (2.9296) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:29:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][300/625] eta 0:02:18 lr 0.000623 wd 0.0500 time 0.4138 (0.4248) data time 0.0008 (0.0023) model time 0.4130 (0.4129) loss 6.7334 (7.2481) grad_norm 5.0153 (2.9474) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:30:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][310/625] eta 0:02:13 lr 0.000623 wd 0.0500 time 0.4066 (0.4242) data time 0.0009 (0.0023) model time 0.4057 (0.4126) loss 7.5468 (7.2510) grad_norm 7.6491 (2.9725) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:30:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][320/625] eta 0:02:09 lr 0.000623 wd 0.0500 time 0.4018 (0.4235) data time 0.0008 (0.0023) model time 0.4010 (0.4121) loss 6.4810 (7.2393) grad_norm 2.7477 (2.9671) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:30:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][330/625] eta 0:02:04 lr 0.000622 wd 0.0500 time 0.3997 (0.4227) data time 0.0006 (0.0022) model time 0.3991 (0.4116) loss 7.1189 (7.2447) grad_norm 4.1745 (2.9743) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:30:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][340/625] eta 0:02:00 lr 0.000622 wd 0.0500 time 0.3966 (0.4221) data time 0.0006 (0.0022) model time 0.3960 (0.4112) loss 7.1472 (7.2501) grad_norm 2.0537 (2.9974) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:30:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][350/625] eta 0:01:55 lr 0.000622 wd 0.0500 time 0.4009 (0.4215) data time 0.0008 (0.0021) model time 0.4001 (0.4108) loss 6.2731 (7.2495) grad_norm 2.0423 (3.0895) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:30:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][360/625] eta 0:01:51 lr 0.000622 wd 0.0500 time 0.4012 (0.4214) data time 0.0008 (0.0021) model time 0.4004 (0.4110) loss 7.8906 (7.2586) grad_norm 2.7752 (3.0783) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:30:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][370/625] eta 0:01:47 lr 0.000622 wd 0.0500 time 0.3954 (0.4208) data time 0.0009 (0.0021) model time 0.3945 (0.4106) loss 7.7124 (7.2474) grad_norm 2.4235 (3.0657) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:30:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][380/625] eta 0:01:42 lr 0.000622 wd 0.0500 time 
0.4061 (0.4202) data time 0.0008 (0.0020) model time 0.4052 (0.4103) loss 7.4847 (7.2465) grad_norm 2.7120 (3.0668) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:30:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][390/625] eta 0:01:38 lr 0.000622 wd 0.0500 time 0.4053 (0.4197) data time 0.0009 (0.0020) model time 0.4044 (0.4099) loss 6.4776 (7.2467) grad_norm 3.0810 (3.0655) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:30:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][400/625] eta 0:01:34 lr 0.000622 wd 0.0500 time 0.4043 (0.4192) data time 0.0007 (0.0020) model time 0.4037 (0.4096) loss 8.1562 (7.2517) grad_norm 2.7386 (3.0613) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:30:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][410/625] eta 0:01:30 lr 0.000622 wd 0.0500 time 0.3978 (0.4187) data time 0.0008 (0.0020) model time 0.3970 (0.4092) loss 6.7321 (7.2431) grad_norm 3.1385 (3.0573) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:30:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][420/625] eta 0:01:25 lr 0.000622 wd 0.0500 time 0.4143 (0.4182) data time 0.0006 (0.0019) model time 0.4138 (0.4090) loss 7.1385 (7.2570) grad_norm 3.3152 (3.0462) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:30:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][430/625] eta 0:01:21 lr 0.000621 wd 0.0500 time 0.3984 (0.4178) data time 0.0008 (0.0019) model time 0.3976 (0.4087) loss 7.3021 (7.2522) grad_norm 4.1031 (3.0640) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:30:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][440/625] eta 0:01:17 lr 0.000621 wd 0.0500 time 0.3962 (0.4174) data time 0.0009 (0.0019) model time 0.3954 (0.4085) loss 7.2290 (7.2571) grad_norm 1.9713 (3.0571) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:30:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][450/625] eta 0:01:13 lr 0.000621 wd 0.0500 time 0.4069 (0.4174) data time 0.0007 (0.0019) model time 0.4061 (0.4086) loss 6.9377 (7.2511) grad_norm 3.4978 (3.0564) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:31:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][460/625] eta 0:01:09 lr 0.000621 wd 0.0500 time 0.6130 (0.4189) data time 0.0008 (0.0018) model time 0.6123 (0.4106) loss 7.3334 (7.2522) grad_norm 2.2187 (3.0497) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:31:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][470/625] eta 0:01:05 lr 0.000621 wd 0.0500 time 0.5588 (0.4207) data time 0.0008 (0.0018) model time 0.5580 (0.4127) loss 6.6401 (7.2511) grad_norm 3.0060 (3.0491) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:31:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][480/625] eta 0:01:01 lr 0.000621 wd 0.0500 time 0.3995 (0.4219) data time 0.0007 (0.0018) model time 0.3988 (0.4143) loss 7.7940 (7.2531) grad_norm 3.6357 (3.0672) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:31:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][490/625] eta 0:00:56 lr 0.000621 wd 0.0500 time 0.4115 (0.4219) data time 0.0008 (0.0018) model time 0.4107 (0.4144) loss 7.2914 (7.2510) grad_norm 1.8686 (3.0546) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:31:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][500/625] eta 0:00:52 lr 0.000621 wd 0.0500 time 0.4083 (0.4215) data time 0.0008 (0.0018) model time 0.4075 (0.4141) loss 8.0849 (7.2420) grad_norm 2.0700 (3.0364) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:31:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][510/625] eta 0:00:48 lr 0.000621 wd 0.0500 time 
0.3989 (0.4211) data time 0.0009 (0.0018) model time 0.3980 (0.4138) loss 7.0722 (7.2418) grad_norm 3.0539 (3.0305) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:31:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][520/625] eta 0:00:44 lr 0.000620 wd 0.0500 time 0.4132 (0.4207) data time 0.0010 (0.0017) model time 0.4122 (0.4135) loss 6.8057 (7.2398) grad_norm 2.7981 (3.0219) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:31:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][530/625] eta 0:00:39 lr 0.000620 wd 0.0500 time 0.3995 (0.4203) data time 0.0006 (0.0017) model time 0.3989 (0.4132) loss 6.4244 (7.2348) grad_norm 2.2203 (3.0109) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:31:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][540/625] eta 0:00:35 lr 0.000620 wd 0.0500 time 0.4061 (0.4200) data time 0.0006 (0.0017) model time 0.4055 (0.4129) loss 7.0660 (7.2291) grad_norm 3.2192 (3.0131) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:31:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][550/625] eta 0:00:31 lr 0.000620 wd 0.0500 time 0.4009 (0.4196) data time 0.0006 (0.0017) model time 0.4003 (0.4127) loss 7.1701 (7.2331) grad_norm 2.2601 (3.0207) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:31:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][560/625] eta 0:00:27 lr 0.000620 wd 0.0500 time 0.4036 (0.4193) data time 0.0006 (0.0017) model time 0.4030 (0.4124) loss 5.9788 (7.2205) grad_norm 3.3322 (3.0275) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:31:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][570/625] eta 0:00:23 lr 0.000620 wd 0.0500 time 0.3943 (0.4190) data time 0.0008 (0.0017) model time 0.3935 (0.4122) loss 7.7922 (7.2194) grad_norm 2.2347 (3.0268) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:31:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][580/625] eta 0:00:18 lr 0.000620 wd 0.0500 time 0.3991 (0.4189) data time 0.0008 (0.0016) model time 0.3984 (0.4122) loss 7.5407 (7.2210) grad_norm 2.3085 (3.0174) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:31:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][590/625] eta 0:00:14 lr 0.000620 wd 0.0500 time 0.3996 (0.4186) data time 0.0008 (0.0016) model time 0.3987 (0.4120) loss 7.7156 (7.2237) grad_norm 4.3782 (3.0144) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:32:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][600/625] eta 0:00:10 lr 0.000620 wd 0.0500 time 0.4010 (0.4183) data time 0.0008 (0.0016) model time 0.4002 (0.4118) loss 7.9176 (7.2189) grad_norm 3.4430 (3.0107) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:32:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][610/625] eta 0:00:06 lr 0.000619 wd 0.0500 time 0.3994 (0.4180) data time 0.0004 (0.0016) model time 0.3990 (0.4115) loss 6.1757 (7.2117) grad_norm 4.4972 (3.0052) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:32:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][620/625] eta 0:00:02 lr 0.000619 wd 0.0500 time 0.3990 (0.4177) data time 0.0006 (0.0016) model time 0.3985 (0.4113) loss 7.5727 (7.2094) grad_norm 2.9952 (3.0215) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:32:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 157 training takes 0:04:20 [2024-07-25 03:32:11 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 03:32:12 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 03:32:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.466 (0.466) Loss 0.5864 (0.5864) Acc@1 89.258 (89.258) Acc@5 98.730 (98.730) Mem 14939MB [2024-07-25 03:32:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.9355 (0.7294) Acc@1 79.590 (85.067) Acc@5 95.605 (97.390) Mem 14939MB [2024-07-25 03:32:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 1.1035 (0.8644) Acc@1 75.195 (81.631) Acc@5 93.604 (95.933) Mem 14939MB [2024-07-25 03:32:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.250 Acc@5 95.911 [2024-07-25 03:32:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.3% [2024-07-25 03:32:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.881 (0.881) Loss 0.5610 (0.5610) Acc@1 89.404 (89.404) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 03:32:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.161) Loss 0.9019 (0.7021) Acc@1 80.762 (85.902) Acc@5 95.947 (97.572) Mem 14939MB [2024-07-25 03:32:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.125) Loss 1.0342 (0.8272) Acc@1 75.635 (82.454) Acc@5 94.775 (96.308) Mem 14939MB [2024-07-25 03:32:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.058 Acc@5 96.273 [2024-07-25 03:32:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.1% [2024-07-25 03:32:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.06% [2024-07-25 03:32:17 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 03:32:18 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 03:32:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][0/625] eta 0:11:01 lr 0.000619 wd 0.0500 time 1.0581 (1.0581) data time 0.6807 (0.6807) model time 0.0000 (0.0000) loss 7.5718 (7.5718) grad_norm 5.3606 (5.3606) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:32:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][10/625] eta 0:04:43 lr 0.000619 wd 0.0500 time 0.4024 (0.4608) data time 0.0006 (0.0627) model time 0.0000 (0.0000) loss 6.3945 (7.0717) grad_norm 3.1769 (3.2367) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:32:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][20/625] eta 0:04:21 lr 0.000619 wd 0.0500 time 0.3987 (0.4316) data time 0.0008 (0.0333) model time 0.0000 (0.0000) loss 8.1012 (7.0415) grad_norm 3.6772 (3.0107) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:32:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][30/625] eta 0:04:10 lr 0.000619 wd 0.0500 time 0.4016 (0.4215) data time 0.0006 (0.0228) model time 0.0000 (0.0000) loss 7.1021 (7.0719) grad_norm 3.5219 (2.9987) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:32:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][40/625] eta 0:04:03 lr 0.000619 wd 0.0500 time 0.3973 (0.4167) data time 0.0008 (0.0175) model time 0.0000 (0.0000) loss 7.2599 (7.1400) grad_norm 2.2009 (2.9230) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:32:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][50/625] eta 0:04:04 lr 0.000619 wd 0.0500 time 0.5676 (0.4256) data time 0.0006 (0.0142) model time 0.0000 (0.0000) loss 7.2494 (7.1838) grad_norm 2.2566 (2.9834) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:32:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][60/625] eta 0:04:07 lr 0.000619 wd 0.0500 time 0.5236 (0.4386) data time 0.0006 (0.0120) model time 0.5230 (0.5037) loss 8.2987 (7.2155) grad_norm 5.1614 (3.0247) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:32:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][70/625] eta 0:04:10 lr 0.000619 wd 0.0500 time 0.5968 (0.4514) data time 0.0008 (0.0105) model time 0.5960 (0.5162) loss 6.4098 (7.2529) grad_norm 3.0173 (3.0273) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:32:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][80/625] eta 0:04:05 lr 0.000618 wd 0.0500 time 0.3955 (0.4510) data time 0.0006 (0.0093) model time 0.3949 (0.4932) loss 7.2604 (7.2207) grad_norm 2.7285 (3.2753) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:32:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][90/625] eta 0:03:59 lr 0.000618 wd 0.0500 time 0.4018 (0.4468) data time 0.0009 (0.0083) model time 0.4009 (0.4728) loss 8.5283 (7.2095) grad_norm 2.3173 (3.2418) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:33:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][100/625] eta 0:03:53 lr 0.000618 wd 0.0500 time 0.6040 (0.4443) data time 0.0006 (0.0076) model time 0.6034 (0.4624) loss 8.1797 (7.2097) grad_norm 2.4092 (3.1648) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:33:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][110/625] eta 0:03:46 lr 0.000618 wd 0.0500 time 0.3988 (0.4400) data time 0.0006 (0.0070) model time 0.3982 (0.4514) loss 5.9804 (7.2254) grad_norm 1.7902 (3.1284) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:33:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][120/625] eta 0:03:40 lr 0.000618 wd 0.0500 time 
0.4008 (0.4367) data time 0.0007 (0.0065) model time 0.4001 (0.4439) loss 7.9253 (7.2208) grad_norm 3.9643 (3.1458) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:33:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][130/625] eta 0:03:34 lr 0.000618 wd 0.0500 time 0.3980 (0.4339) data time 0.0008 (0.0061) model time 0.3972 (0.4383) loss 6.5308 (7.2097) grad_norm 2.6570 (3.1405) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:33:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][140/625] eta 0:03:29 lr 0.000618 wd 0.0500 time 0.4002 (0.4317) data time 0.0008 (0.0057) model time 0.3995 (0.4342) loss 7.4256 (7.1709) grad_norm 3.1937 (3.1064) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:33:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][150/625] eta 0:03:24 lr 0.000618 wd 0.0500 time 0.4106 (0.4298) data time 0.0006 (0.0054) model time 0.4100 (0.4310) loss 6.0460 (7.1595) grad_norm 3.2122 (3.0832) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:33:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][160/625] eta 0:03:18 lr 0.000618 wd 0.0500 time 0.3952 (0.4279) data time 0.0006 (0.0051) model time 0.3946 (0.4281) loss 7.7840 (7.1607) grad_norm 2.2137 (3.0535) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:33:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][170/625] eta 0:03:13 lr 0.000618 wd 0.0500 time 0.4066 (0.4263) data time 0.0007 (0.0049) model time 0.4059 (0.4257) loss 7.6412 (7.1668) grad_norm 3.2208 (3.0896) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:33:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][180/625] eta 0:03:09 lr 0.000617 wd 0.0500 time 0.4046 (0.4249) data time 0.0007 (0.0046) model time 0.4038 (0.4237) loss 8.4628 (7.1917) grad_norm 2.9094 (3.0766) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:33:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][190/625] eta 0:03:04 lr 0.000617 wd 0.0500 time 0.3942 (0.4235) data time 0.0007 (0.0044) model time 0.3934 (0.4218) loss 7.8034 (7.2021) grad_norm 3.2923 (3.0922) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:33:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][200/625] eta 0:02:59 lr 0.000617 wd 0.0500 time 0.3944 (0.4222) data time 0.0006 (0.0043) model time 0.3937 (0.4202) loss 7.4265 (7.1895) grad_norm 3.4820 (3.1031) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:33:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][210/625] eta 0:02:54 lr 0.000617 wd 0.0500 time 0.3990 (0.4211) data time 0.0008 (0.0041) model time 0.3982 (0.4188) loss 7.2066 (7.2010) grad_norm 2.9266 (3.0799) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:33:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][220/625] eta 0:02:50 lr 0.000617 wd 0.0500 time 0.3954 (0.4200) data time 0.0008 (0.0039) model time 0.3945 (0.4175) loss 8.4555 (7.1889) grad_norm 1.9283 (3.0612) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:33:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][230/625] eta 0:02:45 lr 0.000617 wd 0.0500 time 0.3980 (0.4192) data time 0.0008 (0.0038) model time 0.3972 (0.4165) loss 6.6954 (7.1842) grad_norm 2.4301 (3.0274) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:33:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][240/625] eta 0:02:41 lr 0.000617 wd 0.0500 time 0.4029 (0.4184) data time 0.0008 (0.0037) model time 0.4020 (0.4156) loss 8.0364 (7.1836) grad_norm 2.8772 (3.0106) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:34:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][250/625] eta 0:02:36 lr 0.000617 wd 0.0500 time 
0.3948 (0.4176) data time 0.0007 (0.0036) model time 0.3941 (0.4147) loss 7.4139 (7.1845) grad_norm 2.1951 (2.9799) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:34:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][260/625] eta 0:02:32 lr 0.000617 wd 0.0500 time 0.3991 (0.4169) data time 0.0007 (0.0035) model time 0.3984 (0.4139) loss 8.3291 (7.1981) grad_norm 1.7420 (2.9534) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:34:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][270/625] eta 0:02:28 lr 0.000616 wd 0.0500 time 0.5885 (0.4189) data time 0.0008 (0.0034) model time 0.5877 (0.4165) loss 5.7920 (7.1899) grad_norm 3.0865 (2.9296) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:34:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][280/625] eta 0:02:25 lr 0.000616 wd 0.0500 time 0.5770 (0.4210) data time 0.0008 (0.0033) model time 0.5762 (0.4190) loss 7.5347 (7.1798) grad_norm 2.2512 (2.9068) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:34:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][290/625] eta 0:02:21 lr 0.000616 wd 0.0500 time 0.3939 (0.4239) data time 0.0007 (0.0032) model time 0.3932 (0.4226) loss 8.0714 (7.1794) grad_norm 2.2191 (2.8896) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:34:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][300/625] eta 0:02:17 lr 0.000616 wd 0.0500 time 0.3979 (0.4242) data time 0.0007 (0.0031) model time 0.3973 (0.4231) loss 7.2000 (7.1656) grad_norm 2.6968 (2.8702) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:34:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][310/625] eta 0:02:13 lr 0.000616 wd 0.0500 time 0.4007 (0.4242) data time 0.0006 (0.0031) model time 0.4000 (0.4230) loss 5.4120 (7.1641) grad_norm 2.3037 (2.8531) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:34:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][320/625] eta 0:02:09 lr 0.000616 wd 0.0500 time 0.3966 (0.4234) data time 0.0007 (0.0030) model time 0.3959 (0.4221) loss 7.8571 (7.1655) grad_norm 2.8284 (2.8476) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:34:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][330/625] eta 0:02:04 lr 0.000616 wd 0.0500 time 0.3977 (0.4233) data time 0.0009 (0.0029) model time 0.3968 (0.4220) loss 6.9910 (7.1556) grad_norm 2.2271 (2.8392) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:34:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][340/625] eta 0:02:00 lr 0.000616 wd 0.0500 time 0.3986 (0.4226) data time 0.0008 (0.0029) model time 0.3978 (0.4212) loss 7.0857 (7.1592) grad_norm 2.4922 (2.8319) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:34:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][350/625] eta 0:01:56 lr 0.000616 wd 0.0500 time 0.4006 (0.4220) data time 0.0007 (0.0028) model time 0.3999 (0.4205) loss 7.1882 (7.1655) grad_norm 2.4705 (2.8116) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:34:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][360/625] eta 0:01:51 lr 0.000615 wd 0.0500 time 0.3940 (0.4214) data time 0.0007 (0.0028) model time 0.3933 (0.4198) loss 7.0512 (7.1581) grad_norm 3.5810 (2.8339) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:34:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][370/625] eta 0:01:47 lr 0.000615 wd 0.0500 time 0.4014 (0.4208) data time 0.0008 (0.0027) model time 0.4006 (0.4191) loss 8.2615 (7.1573) grad_norm 3.0874 (2.8442) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:34:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][380/625] eta 0:01:42 lr 0.000615 wd 0.0500 time 
0.3957 (0.4203) data time 0.0007 (0.0027) model time 0.3951 (0.4186) loss 7.5076 (7.1512) grad_norm 2.2357 (2.8304) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:35:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][390/625] eta 0:01:38 lr 0.000615 wd 0.0500 time 0.3982 (0.4198) data time 0.0010 (0.0026) model time 0.3973 (0.4181) loss 7.5569 (7.1543) grad_norm 1.6424 (2.8176) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:35:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][400/625] eta 0:01:34 lr 0.000615 wd 0.0500 time 0.3997 (0.4194) data time 0.0010 (0.0026) model time 0.3988 (0.4176) loss 7.5696 (7.1576) grad_norm 1.8215 (2.7974) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:35:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][410/625] eta 0:01:30 lr 0.000615 wd 0.0500 time 0.3957 (0.4190) data time 0.0007 (0.0026) model time 0.3950 (0.4171) loss 6.7566 (7.1559) grad_norm 2.4037 (2.7834) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:35:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][420/625] eta 0:01:25 lr 0.000615 wd 0.0500 time 0.4012 (0.4186) data time 0.0008 (0.0026) model time 0.4005 (0.4167) loss 8.0591 (7.1571) grad_norm 2.4967 (2.7834) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:35:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][430/625] eta 0:01:21 lr 0.000615 wd 0.0500 time 0.3989 (0.4182) data time 0.0009 (0.0025) model time 0.3980 (0.4162) loss 8.1294 (7.1682) grad_norm 2.6781 (2.7792) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:35:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][440/625] eta 0:01:17 lr 0.000615 wd 0.0500 time 0.3964 (0.4178) data time 0.0008 (0.0025) model time 0.3957 (0.4158) loss 6.7380 (7.1669) grad_norm 2.1005 (2.7663) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:35:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][450/625] eta 0:01:13 lr 0.000615 wd 0.0500 time 0.3988 (0.4174) data time 0.0008 (0.0025) model time 0.3980 (0.4154) loss 7.0747 (7.1546) grad_norm 2.8422 (2.7591) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:35:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][460/625] eta 0:01:08 lr 0.000614 wd 0.0500 time 0.4029 (0.4171) data time 0.0006 (0.0024) model time 0.4024 (0.4150) loss 8.0272 (7.1613) grad_norm 1.9463 (2.7595) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:35:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][470/625] eta 0:01:04 lr 0.000614 wd 0.0500 time 0.3938 (0.4167) data time 0.0007 (0.0024) model time 0.3931 (0.4146) loss 7.8854 (7.1626) grad_norm 1.9321 (2.7534) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:35:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][480/625] eta 0:01:00 lr 0.000614 wd 0.0500 time 0.4011 (0.4163) data time 0.0008 (0.0024) model time 0.4003 (0.4143) loss 7.2325 (7.1556) grad_norm 2.2530 (2.7546) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:35:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][490/625] eta 0:00:56 lr 0.000614 wd 0.0500 time 0.5664 (0.4172) data time 0.0008 (0.0023) model time 0.5656 (0.4152) loss 8.3385 (7.1572) grad_norm 2.6848 (2.7499) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:35:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][500/625] eta 0:00:52 lr 0.000614 wd 0.0500 time 0.5940 (0.4186) data time 0.0010 (0.0023) model time 0.5930 (0.4168) loss 7.3517 (7.1635) grad_norm 1.8807 (2.7408) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:35:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][510/625] eta 0:00:48 lr 0.000614 wd 0.0500 time 
0.5922 (0.4203) data time 0.0007 (0.0023) model time 0.5915 (0.4188) loss 6.7314 (7.1608) grad_norm 3.2473 (2.7377) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:35:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][520/625] eta 0:00:44 lr 0.000614 wd 0.0500 time 0.5694 (0.4211) data time 0.0008 (0.0022) model time 0.5686 (0.4196) loss 6.3718 (7.1619) grad_norm 2.2525 (2.7285) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:36:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][530/625] eta 0:00:39 lr 0.000614 wd 0.0500 time 0.3949 (0.4207) data time 0.0008 (0.0022) model time 0.3941 (0.4192) loss 7.2317 (7.1622) grad_norm 2.2764 (2.7190) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:36:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][540/625] eta 0:00:35 lr 0.000614 wd 0.0500 time 0.3972 (0.4203) data time 0.0008 (0.0022) model time 0.3965 (0.4188) loss 8.3547 (7.1676) grad_norm 1.8860 (2.7133) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:36:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][550/625] eta 0:00:31 lr 0.000613 wd 0.0500 time 0.4097 (0.4202) data time 0.0007 (0.0022) model time 0.4091 (0.4188) loss 8.1121 (7.1750) grad_norm 4.4090 (2.7199) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:36:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][560/625] eta 0:00:27 lr 0.000613 wd 0.0500 time 0.3968 (0.4199) data time 0.0009 (0.0022) model time 0.3960 (0.4184) loss 6.1314 (7.1748) grad_norm 1.5485 (2.7118) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:36:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][570/625] eta 0:00:23 lr 0.000613 wd 0.0500 time 0.3968 (0.4196) data time 0.0010 (0.0022) model time 0.3958 (0.4180) loss 7.8099 (7.1771) grad_norm 3.1715 (2.7076) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:36:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][580/625] eta 0:00:18 lr 0.000613 wd 0.0500 time 0.4008 (0.4192) data time 0.0006 (0.0021) model time 0.4003 (0.4176) loss 8.0625 (7.1750) grad_norm 3.4014 (2.7153) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:36:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][590/625] eta 0:00:14 lr 0.000613 wd 0.0500 time 0.3977 (0.4189) data time 0.0008 (0.0021) model time 0.3969 (0.4173) loss 6.0309 (7.1723) grad_norm 1.8801 (2.7209) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:36:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][600/625] eta 0:00:10 lr 0.000613 wd 0.0500 time 0.3963 (0.4187) data time 0.0008 (0.0021) model time 0.3955 (0.4171) loss 7.7035 (7.1750) grad_norm 1.8583 (2.7121) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:36:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][610/625] eta 0:00:06 lr 0.000613 wd 0.0500 time 0.3983 (0.4184) data time 0.0004 (0.0021) model time 0.3979 (0.4168) loss 5.8838 (7.1772) grad_norm 2.2124 (2.7057) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:36:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][620/625] eta 0:00:02 lr 0.000613 wd 0.0500 time 0.3967 (0.4181) data time 0.0006 (0.0021) model time 0.3961 (0.4165) loss 7.2883 (7.1701) grad_norm 2.7530 (2.6993) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:36:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 158 training takes 0:04:21 [2024-07-25 03:36:39 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 03:36:40 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 03:36:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.472 (0.472) Loss 0.6050 (0.6050) Acc@1 88.232 (88.232) Acc@5 98.340 (98.340) Mem 14939MB [2024-07-25 03:36:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.121) Loss 0.9443 (0.7359) Acc@1 79.346 (85.338) Acc@5 95.752 (97.412) Mem 14939MB [2024-07-25 03:36:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0898 (0.8681) Acc@1 74.512 (81.799) Acc@5 94.531 (96.082) Mem 14939MB [2024-07-25 03:36:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.472 Acc@5 96.037 [2024-07-25 03:36:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.5% [2024-07-25 03:36:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.818 (0.818) Loss 0.5620 (0.5620) Acc@1 89.355 (89.355) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 03:36:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.157) Loss 0.9009 (0.7021) Acc@1 80.811 (85.915) Acc@5 95.996 (97.599) Mem 14939MB [2024-07-25 03:36:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 1.0332 (0.8266) Acc@1 75.391 (82.450) Acc@5 94.775 (96.315) Mem 14939MB [2024-07-25 03:36:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.064 Acc@5 96.281 [2024-07-25 03:36:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.1% [2024-07-25 03:36:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.06% [2024-07-25 03:36:46 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 03:36:47 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 03:36:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][0/625] eta 0:08:36 lr 0.000613 wd 0.0500 time 0.8272 (0.8272) data time 0.4449 (0.4449) model time 0.0000 (0.0000) loss 7.1194 (7.1194) grad_norm 1.9882 (1.9882) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:36:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][10/625] eta 0:04:29 lr 0.000613 wd 0.0500 time 0.3970 (0.4381) data time 0.0007 (0.0421) model time 0.0000 (0.0000) loss 7.4325 (7.1575) grad_norm 4.4748 (2.8156) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:36:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][20/625] eta 0:04:15 lr 0.000612 wd 0.0500 time 0.3946 (0.4216) data time 0.0008 (0.0228) model time 0.0000 (0.0000) loss 6.5996 (7.2579) grad_norm 5.8936 (3.3613) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:36:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][30/625] eta 0:04:06 lr 0.000612 wd 0.0500 time 0.4049 (0.4144) data time 0.0009 (0.0157) model time 0.0000 (0.0000) loss 7.4548 (7.1612) grad_norm 2.6040 (3.2739) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:37:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][40/625] eta 0:04:00 lr 0.000612 wd 0.0500 time 0.3991 (0.4104) data time 0.0007 (0.0121) model time 0.0000 (0.0000) loss 7.2306 (7.1758) grad_norm 2.0109 (3.2530) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:37:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][50/625] eta 0:03:54 lr 0.000612 wd 0.0500 time 0.3992 (0.4082) data time 0.0007 (0.0099) model time 0.0000 (0.0000) loss 6.3920 (7.1498) grad_norm 2.1042 (3.3214) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:37:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][60/625] eta 0:03:50 lr 0.000612 wd 0.0500 time 0.4099 (0.4073) data time 0.0006 (0.0085) model time 0.4092 (0.4016) loss 7.8938 (7.1729) grad_norm 1.9387 (3.1825) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:37:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][70/625] eta 0:03:45 lr 0.000612 wd 0.0500 time 0.3928 (0.4063) data time 0.0008 (0.0074) model time 0.3920 (0.4003) loss 7.4873 (7.1669) grad_norm 2.4313 (3.0566) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:37:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][80/625] eta 0:03:42 lr 0.000612 wd 0.0500 time 0.5717 (0.4091) data time 0.0009 (0.0066) model time 0.5708 (0.4097) loss 7.4304 (7.1190) grad_norm 2.5065 (2.9654) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:37:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][90/625] eta 0:03:42 lr 0.000612 wd 0.0500 time 0.6117 (0.4158) data time 0.0006 (0.0060) model time 0.6111 (0.4247) loss 7.8708 (7.1075) grad_norm 1.6937 (2.8968) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:37:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][100/625] eta 0:03:43 lr 0.000612 wd 0.0500 time 0.5637 (0.4252) data time 0.0006 (0.0055) model time 0.5631 (0.4416) loss 8.2915 (7.1343) grad_norm 3.6941 (2.8754) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:37:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][110/625] eta 0:03:42 lr 0.000611 wd 0.0500 time 0.3963 (0.4319) data time 0.0008 (0.0051) model time 0.3955 (0.4511) loss 7.7897 (7.1238) grad_norm 3.0361 (2.8310) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:37:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][120/625] eta 0:03:38 lr 0.000611 wd 0.0500 time 
0.3994 (0.4318) data time 0.0006 (0.0047) model time 0.3988 (0.4481) loss 8.3739 (7.1441) grad_norm 2.7972 (2.9640) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:37:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][130/625] eta 0:03:32 lr 0.000611 wd 0.0500 time 0.3967 (0.4293) data time 0.0006 (0.0044) model time 0.3961 (0.4418) loss 7.5643 (7.1633) grad_norm 1.8585 (2.9923) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:37:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][140/625] eta 0:03:27 lr 0.000611 wd 0.0500 time 0.4079 (0.4273) data time 0.0007 (0.0042) model time 0.4072 (0.4372) loss 7.9626 (7.2124) grad_norm 3.5456 (2.9477) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:37:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][150/625] eta 0:03:22 lr 0.000611 wd 0.0500 time 0.4114 (0.4256) data time 0.0006 (0.0040) model time 0.4107 (0.4335) loss 6.9768 (7.2153) grad_norm 1.6598 (2.8857) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:37:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][160/625] eta 0:03:17 lr 0.000611 wd 0.0500 time 0.3951 (0.4239) data time 0.0007 (0.0038) model time 0.3944 (0.4303) loss 5.5771 (7.1785) grad_norm 2.0437 (2.8434) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:37:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][170/625] eta 0:03:12 lr 0.000611 wd 0.0500 time 0.3991 (0.4225) data time 0.0009 (0.0036) model time 0.3982 (0.4276) loss 6.2801 (7.1816) grad_norm 5.0405 (2.8500) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:38:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][180/625] eta 0:03:07 lr 0.000611 wd 0.0500 time 0.4136 (0.4213) data time 0.0008 (0.0035) model time 0.4127 (0.4255) loss 7.2591 (7.1918) grad_norm 2.3723 (2.8293) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:38:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][190/625] eta 0:03:02 lr 0.000611 wd 0.0500 time 0.3937 (0.4202) data time 0.0007 (0.0034) model time 0.3930 (0.4236) loss 8.2336 (7.1731) grad_norm 2.7541 (2.8283) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:38:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][200/625] eta 0:02:58 lr 0.000611 wd 0.0500 time 0.4004 (0.4193) data time 0.0008 (0.0032) model time 0.3996 (0.4221) loss 7.9099 (7.1770) grad_norm 4.5956 (2.8453) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:38:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][210/625] eta 0:02:53 lr 0.000610 wd 0.0500 time 0.4020 (0.4185) data time 0.0006 (0.0031) model time 0.4014 (0.4209) loss 6.5042 (7.1664) grad_norm 1.8573 (2.8873) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:38:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][220/625] eta 0:02:49 lr 0.000610 wd 0.0500 time 0.3991 (0.4177) data time 0.0007 (0.0030) model time 0.3984 (0.4196) loss 8.5492 (7.1749) grad_norm 2.7408 (2.8867) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:38:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][230/625] eta 0:02:44 lr 0.000610 wd 0.0500 time 0.4012 (0.4172) data time 0.0007 (0.0029) model time 0.4005 (0.4188) loss 7.6695 (7.1814) grad_norm 3.1252 (2.8808) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:38:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][240/625] eta 0:02:40 lr 0.000610 wd 0.0500 time 0.3984 (0.4164) data time 0.0007 (0.0029) model time 0.3977 (0.4177) loss 6.8137 (7.1699) grad_norm 3.2488 (2.8673) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:38:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][250/625] eta 0:02:35 lr 0.000610 wd 0.0500 time 
0.3942 (0.4158) data time 0.0006 (0.0028) model time 0.3936 (0.4168) loss 6.4750 (7.1675) grad_norm 2.0713 (2.8600) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:38:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][260/625] eta 0:02:31 lr 0.000610 wd 0.0500 time 0.3946 (0.4153) data time 0.0010 (0.0027) model time 0.3937 (0.4161) loss 6.1525 (7.1744) grad_norm 3.4837 (2.8612) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:38:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][270/625] eta 0:02:27 lr 0.000610 wd 0.0500 time 0.4101 (0.4147) data time 0.0009 (0.0026) model time 0.4092 (0.4153) loss 7.9467 (7.1863) grad_norm 2.4428 (2.8591) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:38:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][280/625] eta 0:02:22 lr 0.000610 wd 0.0500 time 0.3981 (0.4141) data time 0.0007 (0.0026) model time 0.3974 (0.4145) loss 7.7413 (7.1846) grad_norm 2.1687 (2.8782) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:38:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][290/625] eta 0:02:18 lr 0.000610 wd 0.0500 time 0.4002 (0.4136) data time 0.0009 (0.0025) model time 0.3993 (0.4138) loss 5.8347 (7.1846) grad_norm 1.8955 (2.8702) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:38:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][300/625] eta 0:02:14 lr 0.000609 wd 0.0500 time 0.4069 (0.4137) data time 0.0008 (0.0025) model time 0.4061 (0.4138) loss 7.4277 (7.1849) grad_norm 2.4942 (2.8642) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:38:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][310/625] eta 0:02:10 lr 0.000609 wd 0.0500 time 0.3964 (0.4158) data time 0.0007 (0.0024) model time 0.3957 (0.4163) loss 7.4300 (7.1850) grad_norm 4.4904 (2.8560) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:39:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][320/625] eta 0:02:07 lr 0.000609 wd 0.0500 time 0.5856 (0.4185) data time 0.0009 (0.0024) model time 0.5847 (0.4195) loss 7.0726 (7.1939) grad_norm 2.3780 (2.8437) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:39:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][330/625] eta 0:02:04 lr 0.000609 wd 0.0500 time 0.3997 (0.4209) data time 0.0007 (0.0023) model time 0.3990 (0.4222) loss 6.3592 (7.1772) grad_norm 2.3781 (2.8261) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:39:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][340/625] eta 0:01:59 lr 0.000609 wd 0.0500 time 0.3999 (0.4208) data time 0.0008 (0.0023) model time 0.3991 (0.4220) loss 8.0116 (7.1781) grad_norm 1.7723 (2.8349) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:39:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][350/625] eta 0:01:55 lr 0.000609 wd 0.0500 time 0.4004 (0.4202) data time 0.0009 (0.0023) model time 0.3995 (0.4213) loss 7.3850 (7.1681) grad_norm 3.2044 (2.8271) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:39:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][360/625] eta 0:01:51 lr 0.000609 wd 0.0500 time 0.3987 (0.4197) data time 0.0007 (0.0022) model time 0.3980 (0.4206) loss 6.5424 (7.1589) grad_norm 4.8072 (2.8277) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:39:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][370/625] eta 0:01:46 lr 0.000609 wd 0.0500 time 0.3987 (0.4191) data time 0.0007 (0.0022) model time 0.3980 (0.4199) loss 6.5410 (7.1384) grad_norm 3.2561 (2.8287) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:39:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][380/625] eta 0:01:42 lr 0.000609 wd 0.0500 time 
0.4034 (0.4187) data time 0.0008 (0.0022) model time 0.4026 (0.4193) loss 5.6558 (7.1449) grad_norm 2.2526 (2.8245) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:39:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][390/625] eta 0:01:38 lr 0.000609 wd 0.0500 time 0.3970 (0.4182) data time 0.0009 (0.0021) model time 0.3960 (0.4187) loss 7.0895 (7.1529) grad_norm 2.3977 (2.8184) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:39:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][400/625] eta 0:01:33 lr 0.000608 wd 0.0500 time 0.3986 (0.4177) data time 0.0007 (0.0021) model time 0.3979 (0.4181) loss 6.9598 (7.1494) grad_norm 2.7927 (2.8141) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:39:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][410/625] eta 0:01:29 lr 0.000608 wd 0.0500 time 0.4254 (0.4173) data time 0.0007 (0.0021) model time 0.4247 (0.4176) loss 6.2633 (7.1400) grad_norm 3.1763 (2.8060) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:39:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][420/625] eta 0:01:25 lr 0.000608 wd 0.0500 time 0.3925 (0.4170) data time 0.0009 (0.0020) model time 0.3916 (0.4173) loss 7.9608 (7.1440) grad_norm 2.9495 (2.7994) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:39:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][430/625] eta 0:01:21 lr 0.000608 wd 0.0500 time 0.3998 (0.4168) data time 0.0010 (0.0020) model time 0.3988 (0.4170) loss 8.3700 (7.1526) grad_norm 3.7986 (2.7934) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:39:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][440/625] eta 0:01:17 lr 0.000608 wd 0.0500 time 0.4014 (0.4166) data time 0.0008 (0.0020) model time 0.4005 (0.4167) loss 6.0916 (7.1476) grad_norm 2.1658 (2.7840) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:39:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][450/625] eta 0:01:12 lr 0.000608 wd 0.0500 time 0.4014 (0.4163) data time 0.0009 (0.0020) model time 0.4005 (0.4164) loss 5.9626 (7.1464) grad_norm 1.6494 (2.7813) loss_scale 1024.0000 (517.6763) mem 14939MB [2024-07-25 03:39:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][460/625] eta 0:01:08 lr 0.000608 wd 0.0500 time 0.3978 (0.4159) data time 0.0009 (0.0020) model time 0.3970 (0.4159) loss 6.3772 (7.1425) grad_norm 2.1100 (2.7671) loss_scale 1024.0000 (528.6594) mem 14939MB [2024-07-25 03:40:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][470/625] eta 0:01:04 lr 0.000608 wd 0.0500 time 0.4008 (0.4156) data time 0.0009 (0.0019) model time 0.3999 (0.4155) loss 7.0412 (7.1436) grad_norm 2.3751 (2.7551) loss_scale 1024.0000 (539.1762) mem 14939MB [2024-07-25 03:40:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][480/625] eta 0:01:00 lr 0.000608 wd 0.0500 time 0.3966 (0.4153) data time 0.0008 (0.0019) model time 0.3957 (0.4152) loss 6.6803 (7.1376) grad_norm 2.9559 (2.7563) loss_scale 1024.0000 (549.2557) mem 14939MB [2024-07-25 03:40:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][490/625] eta 0:00:56 lr 0.000607 wd 0.0500 time 0.3972 (0.4150) data time 0.0007 (0.0019) model time 0.3965 (0.4148) loss 5.3064 (7.1295) grad_norm 2.5953 (2.7450) loss_scale 1024.0000 (558.9246) mem 14939MB [2024-07-25 03:40:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][500/625] eta 0:00:51 lr 0.000607 wd 0.0500 time 0.3976 (0.4147) data time 0.0007 (0.0019) model time 0.3969 (0.4145) loss 7.6596 (7.1316) grad_norm 2.6004 (2.7370) loss_scale 1024.0000 (568.2076) mem 14939MB [2024-07-25 03:40:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][510/625] eta 0:00:47 lr 0.000607 wd 0.0500 
time 0.3965 (0.4144) data time 0.0010 (0.0019) model time 0.3955 (0.4142) loss 8.0439 (7.1388) grad_norm 5.4523 (2.7521) loss_scale 1024.0000 (577.1272) mem 14939MB [2024-07-25 03:40:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][520/625] eta 0:00:43 lr 0.000607 wd 0.0500 time 0.4005 (0.4146) data time 0.0009 (0.0018) model time 0.3995 (0.4143) loss 7.0686 (7.1439) grad_norm 1.8406 (2.7446) loss_scale 1024.0000 (585.7044) mem 14939MB [2024-07-25 03:40:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][530/625] eta 0:00:39 lr 0.000607 wd 0.0500 time 0.3959 (0.4162) data time 0.0008 (0.0018) model time 0.3950 (0.4161) loss 6.7811 (7.1461) grad_norm 1.5188 (2.7435) loss_scale 1024.0000 (593.9586) mem 14939MB [2024-07-25 03:40:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][540/625] eta 0:00:35 lr 0.000607 wd 0.0500 time 0.3988 (0.4172) data time 0.0006 (0.0018) model time 0.3982 (0.4171) loss 5.8581 (7.1462) grad_norm 3.2610 (2.7431) loss_scale 1024.0000 (601.9076) mem 14939MB [2024-07-25 03:40:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][550/625] eta 0:00:31 lr 0.000607 wd 0.0500 time 0.5922 (0.4189) data time 0.0009 (0.0018) model time 0.5913 (0.4189) loss 7.6963 (7.1473) grad_norm 3.7887 (2.7372) loss_scale 1024.0000 (609.5681) mem 14939MB [2024-07-25 03:40:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][560/625] eta 0:00:27 lr 0.000607 wd 0.0500 time 0.4030 (0.4187) data time 0.0006 (0.0018) model time 0.4023 (0.4188) loss 6.8748 (7.1517) grad_norm 1.7680 (2.7276) loss_scale 1024.0000 (616.9554) mem 14939MB [2024-07-25 03:40:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][570/625] eta 0:00:23 lr 0.000607 wd 0.0500 time 0.3956 (0.4184) data time 0.0007 (0.0018) model time 0.3949 (0.4184) loss 7.3495 (7.1491) grad_norm 3.3047 (2.7221) loss_scale 1024.0000 
(624.0841) mem 14939MB [2024-07-25 03:40:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][580/625] eta 0:00:18 lr 0.000606 wd 0.0500 time 0.3975 (0.4181) data time 0.0008 (0.0018) model time 0.3968 (0.4180) loss 7.3572 (7.1483) grad_norm 5.2993 (2.7224) loss_scale 1024.0000 (630.9673) mem 14939MB [2024-07-25 03:40:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][590/625] eta 0:00:14 lr 0.000606 wd 0.0500 time 0.3984 (0.4178) data time 0.0007 (0.0017) model time 0.3977 (0.4177) loss 7.0151 (7.1430) grad_norm 4.0938 (2.7146) loss_scale 1024.0000 (637.6176) mem 14939MB [2024-07-25 03:40:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][600/625] eta 0:00:10 lr 0.000606 wd 0.0500 time 0.3939 (0.4177) data time 0.0007 (0.0017) model time 0.3933 (0.4176) loss 6.8712 (7.1445) grad_norm 1.6091 (2.7109) loss_scale 1024.0000 (644.0466) mem 14939MB [2024-07-25 03:41:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][610/625] eta 0:00:06 lr 0.000606 wd 0.0500 time 0.4097 (0.4174) data time 0.0006 (0.0017) model time 0.4091 (0.4173) loss 8.4439 (7.1465) grad_norm 5.9634 (2.7174) loss_scale 1024.0000 (650.2651) mem 14939MB [2024-07-25 03:41:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][620/625] eta 0:00:02 lr 0.000606 wd 0.0500 time 0.4016 (0.4171) data time 0.0006 (0.0017) model time 0.4010 (0.4170) loss 7.7542 (7.1488) grad_norm 4.8025 (2.7388) loss_scale 1024.0000 (656.2834) mem 14939MB [2024-07-25 03:41:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 159 training takes 0:04:20 [2024-07-25 03:41:07 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 03:41:08 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 03:41:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.474 (0.474) Loss 0.5879 (0.5879) Acc@1 89.062 (89.062) Acc@5 98.682 (98.682) Mem 14939MB [2024-07-25 03:41:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.122) Loss 0.9624 (0.7465) Acc@1 79.199 (85.405) Acc@5 96.191 (97.456) Mem 14939MB [2024-07-25 03:41:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 1.0791 (0.8749) Acc@1 74.707 (81.934) Acc@5 94.238 (96.124) Mem 14939MB [2024-07-25 03:41:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.534 Acc@5 96.099 [2024-07-25 03:41:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.5% [2024-07-25 03:41:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.763 (0.763) Loss 0.5620 (0.5620) Acc@1 89.404 (89.404) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 03:41:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.155) Loss 0.8994 (0.7017) Acc@1 80.908 (85.942) Acc@5 96.094 (97.603) Mem 14939MB [2024-07-25 03:41:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 1.0312 (0.8258) Acc@1 75.439 (82.478) Acc@5 94.971 (96.329) Mem 14939MB [2024-07-25 03:41:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.076 Acc@5 96.291 [2024-07-25 03:41:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.1% [2024-07-25 03:41:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.08% [2024-07-25 03:41:14 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 03:41:15 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 03:41:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][0/625] eta 0:08:12 lr 0.000606 wd 0.0500 time 0.7880 (0.7880) data time 0.4121 (0.4121) model time 0.0000 (0.0000) loss 5.8230 (5.8230) grad_norm 3.5010 (3.5010) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:41:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][10/625] eta 0:04:26 lr 0.000606 wd 0.0500 time 0.4027 (0.4337) data time 0.0008 (0.0382) model time 0.0000 (0.0000) loss 8.4556 (6.6542) grad_norm 3.1038 (inf) loss_scale 512.0000 (744.7273) mem 14939MB [2024-07-25 03:41:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][20/625] eta 0:04:12 lr 0.000606 wd 0.0500 time 0.3979 (0.4173) data time 0.0007 (0.0204) model time 0.0000 (0.0000) loss 6.2709 (6.7504) grad_norm 3.4444 (inf) loss_scale 512.0000 (633.9048) mem 14939MB [2024-07-25 03:41:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][30/625] eta 0:04:05 lr 0.000606 wd 0.0500 time 0.3998 (0.4133) data time 0.0006 (0.0141) model time 0.0000 (0.0000) loss 8.2925 (6.9674) grad_norm 1.9161 (inf) loss_scale 512.0000 (594.5806) mem 14939MB [2024-07-25 03:41:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][40/625] eta 0:04:00 lr 0.000606 wd 0.0500 time 0.4010 (0.4111) data time 0.0008 (0.0119) model time 0.0000 (0.0000) loss 7.6370 (7.0366) grad_norm 2.2671 (inf) loss_scale 512.0000 (574.4390) mem 14939MB [2024-07-25 03:41:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][50/625] eta 0:03:54 lr 0.000605 wd 0.0500 time 0.3932 (0.4086) data time 0.0007 (0.0098) model time 0.0000 
(0.0000) loss 7.0107 (7.0511) grad_norm 2.2700 (inf) loss_scale 512.0000 (562.1961) mem 14939MB [2024-07-25 03:41:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][60/625] eta 0:03:51 lr 0.000605 wd 0.0500 time 0.4067 (0.4103) data time 0.0008 (0.0083) model time 0.4059 (0.4181) loss 7.2702 (7.0944) grad_norm 4.0378 (inf) loss_scale 512.0000 (553.9672) mem 14939MB [2024-07-25 03:41:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][70/625] eta 0:03:47 lr 0.000605 wd 0.0500 time 0.4179 (0.4097) data time 0.0008 (0.0073) model time 0.4171 (0.4118) loss 6.4829 (7.0637) grad_norm 2.9687 (inf) loss_scale 512.0000 (548.0563) mem 14939MB [2024-07-25 03:41:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][80/625] eta 0:03:42 lr 0.000605 wd 0.0500 time 0.3973 (0.4083) data time 0.0009 (0.0065) model time 0.3964 (0.4069) loss 7.1625 (7.0106) grad_norm 2.1911 (inf) loss_scale 512.0000 (543.6049) mem 14939MB [2024-07-25 03:41:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][90/625] eta 0:03:37 lr 0.000605 wd 0.0500 time 0.4023 (0.4074) data time 0.0009 (0.0059) model time 0.4015 (0.4051) loss 6.9938 (7.0435) grad_norm 2.7727 (inf) loss_scale 512.0000 (540.1319) mem 14939MB [2024-07-25 03:41:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][100/625] eta 0:03:33 lr 0.000605 wd 0.0500 time 0.4000 (0.4070) data time 0.0008 (0.0054) model time 0.3992 (0.4046) loss 8.0688 (7.0351) grad_norm 1.7761 (inf) loss_scale 512.0000 (537.3465) mem 14939MB [2024-07-25 03:42:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][110/625] eta 0:03:29 lr 0.000605 wd 0.0500 time 0.3989 (0.4065) data time 0.0007 (0.0050) model time 0.3982 (0.4039) loss 6.7746 (7.0428) grad_norm 1.6513 (inf) loss_scale 512.0000 (535.0631) mem 14939MB [2024-07-25 03:42:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO 
Train: [160/300][120/625] eta 0:03:26 lr 0.000605 wd 0.0500 time 0.5020 (0.4084) data time 0.0009 (0.0047) model time 0.5011 (0.4074) loss 5.4151 (7.0301) grad_norm 2.9803 (inf) loss_scale 512.0000 (533.1570) mem 14939MB [2024-07-25 03:42:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][130/625] eta 0:03:24 lr 0.000605 wd 0.0500 time 0.5894 (0.4122) data time 0.0009 (0.0044) model time 0.5885 (0.4136) loss 7.3210 (7.0610) grad_norm 1.9787 (inf) loss_scale 512.0000 (531.5420) mem 14939MB [2024-07-25 03:42:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][140/625] eta 0:03:23 lr 0.000605 wd 0.0500 time 0.5938 (0.4200) data time 0.0009 (0.0041) model time 0.5929 (0.4255) loss 7.0206 (7.0510) grad_norm 3.1051 (inf) loss_scale 512.0000 (530.1560) mem 14939MB [2024-07-25 03:42:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][150/625] eta 0:03:21 lr 0.000604 wd 0.0500 time 0.3966 (0.4233) data time 0.0006 (0.0039) model time 0.3960 (0.4298) loss 7.2476 (7.0860) grad_norm 1.9860 (inf) loss_scale 512.0000 (528.9536) mem 14939MB [2024-07-25 03:42:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][160/625] eta 0:03:16 lr 0.000604 wd 0.0500 time 0.3993 (0.4227) data time 0.0007 (0.0037) model time 0.3986 (0.4283) loss 5.8963 (7.0948) grad_norm 2.5171 (inf) loss_scale 512.0000 (527.9006) mem 14939MB [2024-07-25 03:42:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][170/625] eta 0:03:11 lr 0.000604 wd 0.0500 time 0.3959 (0.4213) data time 0.0009 (0.0036) model time 0.3951 (0.4257) loss 6.9478 (7.0940) grad_norm 2.3384 (inf) loss_scale 512.0000 (526.9708) mem 14939MB [2024-07-25 03:42:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][180/625] eta 0:03:06 lr 0.000604 wd 0.0500 time 0.3992 (0.4201) data time 0.0009 (0.0034) model time 0.3984 (0.4237) loss 8.2748 (7.1053) grad_norm 2.8984 
(inf) loss_scale 512.0000 (526.1436) mem 14939MB [2024-07-25 03:42:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][190/625] eta 0:03:02 lr 0.000604 wd 0.0500 time 0.3983 (0.4189) data time 0.0009 (0.0033) model time 0.3974 (0.4218) loss 6.2245 (7.1044) grad_norm 2.1134 (inf) loss_scale 512.0000 (525.4031) mem 14939MB [2024-07-25 03:42:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][200/625] eta 0:02:57 lr 0.000604 wd 0.0500 time 0.3980 (0.4180) data time 0.0007 (0.0032) model time 0.3973 (0.4202) loss 6.3516 (7.1153) grad_norm 2.3734 (inf) loss_scale 512.0000 (524.7363) mem 14939MB [2024-07-25 03:42:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][210/625] eta 0:02:53 lr 0.000604 wd 0.0500 time 0.3976 (0.4171) data time 0.0008 (0.0031) model time 0.3968 (0.4189) loss 7.5690 (7.1222) grad_norm 3.0393 (inf) loss_scale 512.0000 (524.1327) mem 14939MB [2024-07-25 03:42:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][220/625] eta 0:02:48 lr 0.000604 wd 0.0500 time 0.4009 (0.4163) data time 0.0007 (0.0030) model time 0.4002 (0.4176) loss 7.3546 (7.1186) grad_norm 1.9556 (inf) loss_scale 512.0000 (523.5837) mem 14939MB [2024-07-25 03:42:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][230/625] eta 0:02:44 lr 0.000604 wd 0.0500 time 0.3983 (0.4156) data time 0.0009 (0.0029) model time 0.3974 (0.4166) loss 8.3423 (7.1289) grad_norm 3.3960 (inf) loss_scale 512.0000 (523.0823) mem 14939MB [2024-07-25 03:42:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][240/625] eta 0:02:39 lr 0.000603 wd 0.0500 time 0.4025 (0.4149) data time 0.0007 (0.0028) model time 0.4018 (0.4157) loss 6.3345 (7.1309) grad_norm 1.7895 (inf) loss_scale 512.0000 (522.6224) mem 14939MB [2024-07-25 03:42:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][250/625] eta 0:02:35 lr 
0.000603 wd 0.0500 time 0.3975 (0.4143) data time 0.0008 (0.0027) model time 0.3967 (0.4148) loss 6.8571 (7.1221) grad_norm 3.0267 (inf) loss_scale 512.0000 (522.1992) mem 14939MB [2024-07-25 03:43:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][260/625] eta 0:02:30 lr 0.000603 wd 0.0500 time 0.3994 (0.4137) data time 0.0009 (0.0027) model time 0.3986 (0.4140) loss 7.6299 (7.1234) grad_norm 3.1562 (inf) loss_scale 512.0000 (521.8084) mem 14939MB [2024-07-25 03:43:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][270/625] eta 0:02:26 lr 0.000603 wd 0.0500 time 0.3997 (0.4132) data time 0.0008 (0.0026) model time 0.3988 (0.4134) loss 6.5048 (7.1147) grad_norm 2.9910 (inf) loss_scale 512.0000 (521.4465) mem 14939MB [2024-07-25 03:43:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][280/625] eta 0:02:22 lr 0.000603 wd 0.0500 time 0.3968 (0.4135) data time 0.0010 (0.0025) model time 0.3958 (0.4136) loss 7.7573 (7.1130) grad_norm 2.8505 (inf) loss_scale 512.0000 (521.1103) mem 14939MB [2024-07-25 03:43:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][290/625] eta 0:02:18 lr 0.000603 wd 0.0500 time 0.4030 (0.4130) data time 0.0007 (0.0025) model time 0.4023 (0.4130) loss 6.4049 (7.1190) grad_norm 3.0268 (inf) loss_scale 512.0000 (520.7973) mem 14939MB [2024-07-25 03:43:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][300/625] eta 0:02:14 lr 0.000603 wd 0.0500 time 0.3982 (0.4125) data time 0.0006 (0.0024) model time 0.3975 (0.4124) loss 7.4765 (7.1237) grad_norm 3.1379 (inf) loss_scale 512.0000 (520.5050) mem 14939MB [2024-07-25 03:43:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][310/625] eta 0:02:09 lr 0.000603 wd 0.0500 time 0.4013 (0.4122) data time 0.0010 (0.0024) model time 0.4002 (0.4119) loss 7.5335 (7.1286) grad_norm 2.1429 (inf) loss_scale 512.0000 (520.2315) mem 
14939MB [2024-07-25 03:43:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][320/625] eta 0:02:05 lr 0.000603 wd 0.0500 time 0.3953 (0.4118) data time 0.0008 (0.0023) model time 0.3945 (0.4115) loss 6.1944 (7.1345) grad_norm 1.9582 (inf) loss_scale 512.0000 (519.9751) mem 14939MB [2024-07-25 03:43:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][330/625] eta 0:02:01 lr 0.000602 wd 0.0500 time 0.3955 (0.4114) data time 0.0007 (0.0023) model time 0.3948 (0.4110) loss 7.8086 (7.1336) grad_norm 2.6454 (inf) loss_scale 512.0000 (519.7341) mem 14939MB [2024-07-25 03:43:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][340/625] eta 0:01:57 lr 0.000602 wd 0.0500 time 0.5746 (0.4119) data time 0.0008 (0.0022) model time 0.5739 (0.4116) loss 6.0680 (7.1277) grad_norm 1.5738 (inf) loss_scale 512.0000 (519.5073) mem 14939MB [2024-07-25 03:43:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][350/625] eta 0:01:53 lr 0.000602 wd 0.0500 time 0.5632 (0.4137) data time 0.0008 (0.0022) model time 0.5624 (0.4137) loss 7.6194 (7.1324) grad_norm 2.3674 (inf) loss_scale 512.0000 (519.2934) mem 14939MB [2024-07-25 03:43:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][360/625] eta 0:01:50 lr 0.000602 wd 0.0500 time 0.5942 (0.4165) data time 0.0007 (0.0022) model time 0.5934 (0.4169) loss 8.0391 (7.1413) grad_norm 3.0405 (inf) loss_scale 512.0000 (519.0914) mem 14939MB [2024-07-25 03:43:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][370/625] eta 0:01:46 lr 0.000602 wd 0.0500 time 0.3967 (0.4181) data time 0.0009 (0.0021) model time 0.3958 (0.4186) loss 6.6718 (7.1327) grad_norm 3.0443 (inf) loss_scale 512.0000 (518.9003) mem 14939MB [2024-07-25 03:43:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][380/625] eta 0:01:42 lr 0.000602 wd 0.0500 time 0.3999 (0.4180) 
data time 0.0007 (0.0021) model time 0.3992 (0.4185) loss 7.8279 (7.1391) grad_norm 2.7748 (inf) loss_scale 512.0000 (518.7192) mem 14939MB [2024-07-25 03:43:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][390/625] eta 0:01:38 lr 0.000602 wd 0.0500 time 0.3988 (0.4175) data time 0.0008 (0.0021) model time 0.3980 (0.4180) loss 6.8171 (7.1338) grad_norm 3.0876 (inf) loss_scale 512.0000 (518.5473) mem 14939MB [2024-07-25 03:44:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][400/625] eta 0:01:33 lr 0.000602 wd 0.0500 time 0.3981 (0.4171) data time 0.0008 (0.0020) model time 0.3973 (0.4174) loss 7.0277 (7.1360) grad_norm 5.2939 (inf) loss_scale 512.0000 (518.3840) mem 14939MB [2024-07-25 03:44:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][410/625] eta 0:01:29 lr 0.000602 wd 0.0500 time 0.3977 (0.4167) data time 0.0010 (0.0020) model time 0.3967 (0.4169) loss 6.1563 (7.1375) grad_norm 5.7544 (inf) loss_scale 512.0000 (518.2287) mem 14939MB [2024-07-25 03:44:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][420/625] eta 0:01:25 lr 0.000602 wd 0.0500 time 0.3984 (0.4163) data time 0.0006 (0.0020) model time 0.3977 (0.4165) loss 6.3754 (7.1377) grad_norm 4.4662 (inf) loss_scale 512.0000 (518.0808) mem 14939MB [2024-07-25 03:44:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][430/625] eta 0:01:21 lr 0.000601 wd 0.0500 time 0.4015 (0.4160) data time 0.0008 (0.0020) model time 0.4007 (0.4161) loss 5.8007 (7.1372) grad_norm 2.9309 (inf) loss_scale 512.0000 (517.9397) mem 14939MB [2024-07-25 03:44:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][440/625] eta 0:01:16 lr 0.000601 wd 0.0500 time 0.4013 (0.4156) data time 0.0007 (0.0019) model time 0.4006 (0.4156) loss 7.4831 (7.1425) grad_norm 2.1252 (inf) loss_scale 512.0000 (517.8050) mem 14939MB [2024-07-25 03:44:22 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][450/625] eta 0:01:12 lr 0.000601 wd 0.0500 time 0.3989 (0.4153) data time 0.0007 (0.0019) model time 0.3982 (0.4153) loss 6.7835 (7.1358) grad_norm 1.7865 (inf) loss_scale 512.0000 (517.6763) mem 14939MB [2024-07-25 03:44:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][460/625] eta 0:01:08 lr 0.000601 wd 0.0500 time 0.3944 (0.4150) data time 0.0009 (0.0019) model time 0.3935 (0.4149) loss 5.3675 (7.1330) grad_norm 3.1313 (inf) loss_scale 512.0000 (517.5531) mem 14939MB [2024-07-25 03:44:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][470/625] eta 0:01:04 lr 0.000601 wd 0.0500 time 0.4018 (0.4147) data time 0.0007 (0.0019) model time 0.4011 (0.4145) loss 7.5433 (7.1371) grad_norm 3.4300 (inf) loss_scale 512.0000 (517.4352) mem 14939MB [2024-07-25 03:44:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][480/625] eta 0:01:00 lr 0.000601 wd 0.0500 time 0.3987 (0.4143) data time 0.0008 (0.0019) model time 0.3979 (0.4141) loss 6.1248 (7.1326) grad_norm 3.1993 (inf) loss_scale 512.0000 (517.3222) mem 14939MB [2024-07-25 03:44:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][490/625] eta 0:00:55 lr 0.000601 wd 0.0500 time 0.3967 (0.4140) data time 0.0007 (0.0018) model time 0.3960 (0.4137) loss 6.4905 (7.1447) grad_norm 1.6851 (inf) loss_scale 512.0000 (517.2138) mem 14939MB [2024-07-25 03:44:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][500/625] eta 0:00:51 lr 0.000601 wd 0.0500 time 0.3997 (0.4140) data time 0.0009 (0.0018) model time 0.3988 (0.4137) loss 7.6693 (7.1496) grad_norm 2.5168 (inf) loss_scale 512.0000 (517.1098) mem 14939MB [2024-07-25 03:44:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][510/625] eta 0:00:47 lr 0.000601 wd 0.0500 time 0.3950 (0.4137) data time 0.0006 (0.0018) model 
time 0.3944 (0.4134) loss 6.5470 (7.1517) grad_norm 2.1093 (inf) loss_scale 512.0000 (517.0098) mem 14939MB [2024-07-25 03:44:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][520/625] eta 0:00:43 lr 0.000600 wd 0.0500 time 0.4001 (0.4135) data time 0.0007 (0.0018) model time 0.3994 (0.4131) loss 6.4434 (7.1532) grad_norm 2.1680 (inf) loss_scale 512.0000 (516.9136) mem 14939MB [2024-07-25 03:44:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][530/625] eta 0:00:39 lr 0.000600 wd 0.0500 time 0.3996 (0.4132) data time 0.0009 (0.0018) model time 0.3987 (0.4128) loss 6.2146 (7.1563) grad_norm 1.5977 (inf) loss_scale 512.0000 (516.8211) mem 14939MB [2024-07-25 03:44:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][540/625] eta 0:00:35 lr 0.000600 wd 0.0500 time 0.3979 (0.4130) data time 0.0006 (0.0018) model time 0.3972 (0.4125) loss 8.2358 (7.1599) grad_norm 2.8585 (inf) loss_scale 512.0000 (516.7320) mem 14939MB [2024-07-25 03:45:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][550/625] eta 0:00:30 lr 0.000600 wd 0.0500 time 0.4005 (0.4128) data time 0.0008 (0.0017) model time 0.3996 (0.4123) loss 8.8541 (7.1658) grad_norm 2.4033 (inf) loss_scale 512.0000 (516.6461) mem 14939MB [2024-07-25 03:45:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][560/625] eta 0:00:26 lr 0.000600 wd 0.0500 time 0.6022 (0.4132) data time 0.0008 (0.0017) model time 0.6014 (0.4128) loss 6.7577 (7.1592) grad_norm 3.6611 (inf) loss_scale 512.0000 (516.5633) mem 14939MB [2024-07-25 03:45:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][570/625] eta 0:00:22 lr 0.000600 wd 0.0500 time 0.5911 (0.4141) data time 0.0009 (0.0017) model time 0.5902 (0.4137) loss 7.4477 (7.1627) grad_norm 3.9361 (inf) loss_scale 512.0000 (516.4834) mem 14939MB [2024-07-25 03:45:16 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [160/300][580/625] eta 0:00:18 lr 0.000600 wd 0.0500 time 0.5711 (0.4163) data time 0.0007 (0.0017) model time 0.5703 (0.4161) loss 6.6676 (7.1723) grad_norm 4.7194 (inf) loss_scale 512.0000 (516.4062) mem 14939MB [2024-07-25 03:45:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][590/625] eta 0:00:14 lr 0.000600 wd 0.0500 time 0.3975 (0.4171) data time 0.0009 (0.0017) model time 0.3965 (0.4170) loss 7.9190 (7.1690) grad_norm 2.3035 (inf) loss_scale 512.0000 (516.3316) mem 14939MB [2024-07-25 03:45:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][600/625] eta 0:00:10 lr 0.000600 wd 0.0500 time 0.3971 (0.4171) data time 0.0008 (0.0017) model time 0.3963 (0.4170) loss 6.2363 (7.1653) grad_norm 2.0470 (inf) loss_scale 512.0000 (516.2596) mem 14939MB [2024-07-25 03:45:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][610/625] eta 0:00:06 lr 0.000599 wd 0.0500 time 0.3995 (0.4169) data time 0.0006 (0.0017) model time 0.3989 (0.4167) loss 6.8373 (7.1697) grad_norm 4.3801 (inf) loss_scale 512.0000 (516.1899) mem 14939MB [2024-07-25 03:45:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][620/625] eta 0:00:02 lr 0.000599 wd 0.0500 time 0.4190 (0.4167) data time 0.0004 (0.0017) model time 0.4186 (0.4165) loss 6.5691 (7.1712) grad_norm 2.2932 (inf) loss_scale 512.0000 (516.1224) mem 14939MB [2024-07-25 03:45:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 160 training takes 0:04:20 [2024-07-25 03:45:35 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 03:45:36 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 03:45:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.466 (0.466) Loss 0.5947 (0.5947) Acc@1 88.867 (88.867) Acc@5 98.438 (98.438) Mem 14939MB [2024-07-25 03:45:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.9385 (0.7311) Acc@1 79.492 (85.427) Acc@5 95.654 (97.399) Mem 14939MB [2024-07-25 03:45:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0479 (0.8639) Acc@1 76.318 (81.945) Acc@5 94.238 (95.987) Mem 14939MB [2024-07-25 03:45:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.520 Acc@5 95.947 [2024-07-25 03:45:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.5% [2024-07-25 03:45:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.819 (0.819) Loss 0.5610 (0.5610) Acc@1 89.502 (89.502) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 03:45:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.160) Loss 0.8979 (0.7009) Acc@1 80.859 (85.977) Acc@5 96.143 (97.607) Mem 14939MB [2024-07-25 03:45:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.126) Loss 1.0303 (0.8248) Acc@1 75.488 (82.510) Acc@5 95.020 (96.331) Mem 14939MB [2024-07-25 03:45:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.116 Acc@5 96.287 [2024-07-25 03:45:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.1% [2024-07-25 03:45:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.12% [2024-07-25 03:45:41 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 03:45:42 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 03:45:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][0/625] eta 0:07:42 lr 0.000599 wd 0.0500 time 0.7405 (0.7405) data time 0.3645 (0.3645) model time 0.0000 (0.0000) loss 6.0645 (6.0645) grad_norm 1.7845 (1.7845) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:45:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][10/625] eta 0:04:25 lr 0.000599 wd 0.0500 time 0.3953 (0.4324) data time 0.0011 (0.0340) model time 0.0000 (0.0000) loss 7.3926 (6.9644) grad_norm 1.9299 (2.4462) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:45:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][20/625] eta 0:04:12 lr 0.000599 wd 0.0500 time 0.3990 (0.4171) data time 0.0007 (0.0183) model time 0.0000 (0.0000) loss 7.6582 (6.9300) grad_norm 3.5944 (2.7470) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:45:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][30/625] eta 0:04:08 lr 0.000599 wd 0.0500 time 0.3983 (0.4181) data time 0.0008 (0.0127) model time 0.0000 (0.0000) loss 7.4900 (7.0492) grad_norm 2.4245 (2.9056) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:45:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][40/625] eta 0:04:01 lr 0.000599 wd 0.0500 time 0.3990 (0.4137) data time 0.0007 (0.0098) model time 0.0000 (0.0000) loss 7.8314 (7.1176) grad_norm 2.2829 (2.8178) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:46:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][50/625] eta 0:03:56 lr 0.000599 wd 0.0500 time 0.4001 (0.4112) data time 0.0010 (0.0082) model time 0.0000 (0.0000) loss 6.9577 (7.1062) grad_norm 3.5364 (2.8292) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:46:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][60/625] eta 0:03:51 lr 0.000599 wd 0.0500 time 0.3979 (0.4092) data time 0.0010 (0.0071) model time 0.3969 (0.3983) loss 5.7619 (7.0883) grad_norm 2.8498 (3.1074) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:46:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][70/625] eta 0:03:46 lr 0.000599 wd 0.0500 time 0.3967 (0.4079) data time 0.0010 (0.0062) model time 0.3957 (0.3984) loss 8.1903 (7.1325) grad_norm 2.4218 (3.1693) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:46:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][80/625] eta 0:03:41 lr 0.000598 wd 0.0500 time 0.3992 (0.4070) data time 0.0009 (0.0056) model time 0.3983 (0.3988) loss 7.9286 (7.0984) grad_norm 3.2966 (3.1337) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:46:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][90/625] eta 0:03:37 lr 0.000598 wd 0.0500 time 0.3978 (0.4062) data time 0.0007 (0.0050) model time 0.3972 (0.3989) loss 8.6919 (7.0725) grad_norm 3.0347 (3.1381) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:46:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][100/625] eta 0:03:32 lr 0.000598 wd 0.0500 time 0.3981 (0.4055) data time 0.0009 (0.0046) model time 0.3972 (0.3988) loss 6.0838 (7.0638) grad_norm 2.5158 (3.1626) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:46:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][110/625] eta 0:03:28 lr 0.000598 wd 0.0500 time 0.3993 (0.4048) data time 0.0009 (0.0043) model time 0.3984 (0.3985) loss 7.4864 (7.0658) grad_norm 2.0203 (3.1289) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:46:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][120/625] eta 0:03:24 lr 0.000598 wd 0.0500 time 
0.3962 (0.4044) data time 0.0009 (0.0040) model time 0.3953 (0.3985) loss 5.8003 (7.0655) grad_norm 2.9677 (3.1326) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:46:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][130/625] eta 0:03:19 lr 0.000598 wd 0.0500 time 0.3948 (0.4039) data time 0.0007 (0.0038) model time 0.3941 (0.3983) loss 7.8929 (7.0579) grad_norm 2.0329 (3.1231) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:46:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][140/625] eta 0:03:15 lr 0.000598 wd 0.0500 time 0.4086 (0.4035) data time 0.0007 (0.0036) model time 0.4079 (0.3982) loss 5.9926 (7.0682) grad_norm 2.1750 (3.1132) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:46:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][150/625] eta 0:03:11 lr 0.000598 wd 0.0500 time 0.3964 (0.4032) data time 0.0007 (0.0034) model time 0.3957 (0.3983) loss 8.1792 (7.0833) grad_norm 2.7787 (3.0852) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:46:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][160/625] eta 0:03:09 lr 0.000598 wd 0.0500 time 0.5900 (0.4072) data time 0.0008 (0.0032) model time 0.5892 (0.4044) loss 8.4039 (7.1009) grad_norm 4.7342 (3.1372) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:46:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][170/625] eta 0:03:07 lr 0.000598 wd 0.0500 time 0.3978 (0.4113) data time 0.0006 (0.0031) model time 0.3972 (0.4104) loss 6.5353 (7.0963) grad_norm 2.1209 (3.1113) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:46:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][180/625] eta 0:03:05 lr 0.000597 wd 0.0500 time 0.3995 (0.4175) data time 0.0010 (0.0030) model time 0.3985 (0.4191) loss 8.3851 (7.1160) grad_norm 2.1733 (3.1120) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:47:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][190/625] eta 0:03:02 lr 0.000597 wd 0.0500 time 0.3988 (0.4200) data time 0.0008 (0.0029) model time 0.3979 (0.4223) loss 7.4193 (7.1256) grad_norm 2.9270 (3.0769) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:47:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][200/625] eta 0:02:58 lr 0.000597 wd 0.0500 time 0.3997 (0.4190) data time 0.0009 (0.0028) model time 0.3988 (0.4208) loss 6.1833 (7.1286) grad_norm 1.9011 (3.0482) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:47:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][210/625] eta 0:02:53 lr 0.000597 wd 0.0500 time 0.4022 (0.4181) data time 0.0006 (0.0027) model time 0.4016 (0.4194) loss 6.6789 (7.1226) grad_norm 3.0800 (3.0488) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:47:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][220/625] eta 0:02:48 lr 0.000597 wd 0.0500 time 0.3985 (0.4172) data time 0.0007 (0.0026) model time 0.3978 (0.4181) loss 8.2124 (7.1400) grad_norm 2.9313 (3.0721) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:47:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][230/625] eta 0:02:44 lr 0.000597 wd 0.0500 time 0.4013 (0.4165) data time 0.0009 (0.0025) model time 0.4004 (0.4171) loss 8.2777 (7.1560) grad_norm 2.4506 (3.0596) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:47:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][240/625] eta 0:02:40 lr 0.000597 wd 0.0500 time 0.3977 (0.4158) data time 0.0009 (0.0025) model time 0.3968 (0.4161) loss 8.6601 (7.1547) grad_norm 2.6050 (3.0881) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:47:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][250/625] eta 0:02:35 lr 0.000597 wd 0.0500 time 
0.3987 (0.4158) data time 0.0010 (0.0024) model time 0.3978 (0.4160) loss 7.4403 (7.1505) grad_norm 2.2855 (3.0846) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:47:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][260/625] eta 0:02:31 lr 0.000597 wd 0.0500 time 0.4035 (0.4152) data time 0.0009 (0.0024) model time 0.4026 (0.4153) loss 7.6733 (7.1547) grad_norm 2.2011 (3.0781) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:47:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][270/625] eta 0:02:27 lr 0.000596 wd 0.0500 time 0.3960 (0.4147) data time 0.0008 (0.0023) model time 0.3953 (0.4146) loss 6.6746 (7.1323) grad_norm 2.5573 (3.0672) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:47:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][280/625] eta 0:02:22 lr 0.000596 wd 0.0500 time 0.4008 (0.4143) data time 0.0008 (0.0023) model time 0.3999 (0.4140) loss 6.7176 (7.1451) grad_norm 2.9281 (3.0492) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:47:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][290/625] eta 0:02:18 lr 0.000596 wd 0.0500 time 0.3979 (0.4138) data time 0.0007 (0.0022) model time 0.3972 (0.4134) loss 6.2228 (7.1284) grad_norm 1.6998 (3.0432) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:47:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][300/625] eta 0:02:14 lr 0.000596 wd 0.0500 time 0.3980 (0.4134) data time 0.0009 (0.0022) model time 0.3971 (0.4129) loss 8.2270 (7.1305) grad_norm 2.5767 (3.0353) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:47:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][310/625] eta 0:02:10 lr 0.000596 wd 0.0500 time 0.3990 (0.4129) data time 0.0009 (0.0021) model time 0.3981 (0.4123) loss 5.2553 (7.1330) grad_norm 3.3460 (3.0174) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:47:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][320/625] eta 0:02:05 lr 0.000596 wd 0.0500 time 0.4112 (0.4125) data time 0.0009 (0.0021) model time 0.4103 (0.4119) loss 5.9575 (7.1215) grad_norm 1.7274 (2.9929) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:47:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][330/625] eta 0:02:01 lr 0.000596 wd 0.0500 time 0.3961 (0.4121) data time 0.0008 (0.0021) model time 0.3953 (0.4114) loss 6.4500 (7.1096) grad_norm 2.5688 (2.9688) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:48:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][340/625] eta 0:01:57 lr 0.000596 wd 0.0500 time 0.3972 (0.4117) data time 0.0007 (0.0020) model time 0.3965 (0.4109) loss 5.7504 (7.1018) grad_norm 1.8605 (2.9405) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:48:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][350/625] eta 0:01:53 lr 0.000596 wd 0.0500 time 0.3999 (0.4114) data time 0.0008 (0.0020) model time 0.3991 (0.4105) loss 7.1786 (7.1146) grad_norm 4.2826 (2.9511) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:48:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][360/625] eta 0:01:48 lr 0.000595 wd 0.0500 time 0.3940 (0.4111) data time 0.0009 (0.0020) model time 0.3931 (0.4101) loss 6.8817 (7.1264) grad_norm 2.7929 (2.9565) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:48:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][370/625] eta 0:01:44 lr 0.000595 wd 0.0500 time 0.3946 (0.4109) data time 0.0006 (0.0019) model time 0.3941 (0.4099) loss 6.2430 (7.1326) grad_norm 3.6403 (2.9564) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:48:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][380/625] eta 0:01:40 lr 0.000595 wd 0.0500 time 
0.3975 (0.4114) data time 0.0006 (0.0019) model time 0.3968 (0.4105) loss 6.1435 (7.1389) grad_norm 2.3435 (2.9332) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:48:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][390/625] eta 0:01:36 lr 0.000595 wd 0.0500 time 0.6332 (0.4126) data time 0.0007 (0.0019) model time 0.6325 (0.4119) loss 6.8477 (7.1367) grad_norm 2.2812 (2.9239) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:48:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][400/625] eta 0:01:33 lr 0.000595 wd 0.0500 time 0.3979 (0.4150) data time 0.0006 (0.0019) model time 0.3973 (0.4146) loss 7.3078 (7.1454) grad_norm 2.1242 (2.9169) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:48:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][410/625] eta 0:01:29 lr 0.000595 wd 0.0500 time 0.3953 (0.4164) data time 0.0007 (0.0018) model time 0.3947 (0.4162) loss 5.9771 (7.1532) grad_norm 1.9575 (2.9143) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:48:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][420/625] eta 0:01:25 lr 0.000595 wd 0.0500 time 0.3959 (0.4160) data time 0.0009 (0.0018) model time 0.3950 (0.4157) loss 8.0236 (7.1621) grad_norm 7.6957 (2.9246) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:48:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][430/625] eta 0:01:21 lr 0.000595 wd 0.0500 time 0.3963 (0.4156) data time 0.0007 (0.0018) model time 0.3956 (0.4152) loss 7.6063 (7.1644) grad_norm 2.3790 (2.9195) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:48:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][440/625] eta 0:01:16 lr 0.000595 wd 0.0500 time 0.3988 (0.4151) data time 0.0008 (0.0018) model time 0.3980 (0.4147) loss 7.5602 (7.1577) grad_norm 2.5055 (2.9169) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:48:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][450/625] eta 0:01:12 lr 0.000595 wd 0.0500 time 0.4119 (0.4148) data time 0.0007 (0.0018) model time 0.4112 (0.4144) loss 6.8582 (7.1527) grad_norm 2.3146 (2.9022) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:48:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][460/625] eta 0:01:08 lr 0.000594 wd 0.0500 time 0.3996 (0.4145) data time 0.0009 (0.0017) model time 0.3987 (0.4139) loss 7.2742 (7.1552) grad_norm 1.4691 (2.8857) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:48:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][470/625] eta 0:01:04 lr 0.000594 wd 0.0500 time 0.3980 (0.4145) data time 0.0007 (0.0017) model time 0.3973 (0.4140) loss 7.9740 (7.1620) grad_norm 2.3901 (2.8838) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:49:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][480/625] eta 0:01:00 lr 0.000594 wd 0.0500 time 0.3983 (0.4142) data time 0.0006 (0.0017) model time 0.3977 (0.4136) loss 6.2527 (7.1625) grad_norm 2.1702 (2.8716) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:49:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][490/625] eta 0:00:55 lr 0.000594 wd 0.0500 time 0.3970 (0.4139) data time 0.0006 (0.0017) model time 0.3964 (0.4133) loss 6.4364 (7.1574) grad_norm 3.5223 (2.8724) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:49:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][500/625] eta 0:00:51 lr 0.000594 wd 0.0500 time 0.3965 (0.4136) data time 0.0009 (0.0017) model time 0.3957 (0.4129) loss 7.9161 (7.1478) grad_norm 1.8114 (2.8680) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:49:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][510/625] eta 0:00:47 lr 0.000594 wd 0.0500 time 
0.3986 (0.4133) data time 0.0008 (0.0017) model time 0.3977 (0.4126) loss 7.4687 (7.1463) grad_norm 1.6784 (2.8523) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:49:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][520/625] eta 0:00:43 lr 0.000594 wd 0.0500 time 0.3987 (0.4130) data time 0.0008 (0.0017) model time 0.3979 (0.4123) loss 8.1537 (7.1530) grad_norm 1.7183 (2.8436) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:49:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][530/625] eta 0:00:39 lr 0.000594 wd 0.0500 time 0.4005 (0.4128) data time 0.0008 (0.0017) model time 0.3997 (0.4120) loss 7.1406 (7.1559) grad_norm 2.1005 (2.8310) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:49:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][540/625] eta 0:00:35 lr 0.000594 wd 0.0500 time 0.4005 (0.4125) data time 0.0009 (0.0016) model time 0.3996 (0.4117) loss 7.6739 (7.1575) grad_norm 3.5592 (2.8303) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:49:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][550/625] eta 0:00:30 lr 0.000593 wd 0.0500 time 0.3935 (0.4123) data time 0.0009 (0.0016) model time 0.3926 (0.4115) loss 7.4726 (7.1627) grad_norm 2.9567 (2.8397) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:49:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][560/625] eta 0:00:26 lr 0.000593 wd 0.0500 time 0.3990 (0.4121) data time 0.0007 (0.0016) model time 0.3983 (0.4112) loss 9.0359 (7.1642) grad_norm 4.1456 (2.8466) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:49:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][570/625] eta 0:00:22 lr 0.000593 wd 0.0500 time 0.4027 (0.4119) data time 0.0007 (0.0016) model time 0.4020 (0.4110) loss 6.0257 (7.1550) grad_norm 2.5918 (2.8526) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:49:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][580/625] eta 0:00:18 lr 0.000593 wd 0.0500 time 0.3991 (0.4117) data time 0.0006 (0.0016) model time 0.3984 (0.4108) loss 8.0586 (7.1539) grad_norm 2.0708 (2.8449) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:49:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][590/625] eta 0:00:14 lr 0.000593 wd 0.0500 time 0.4020 (0.4115) data time 0.0006 (0.0016) model time 0.4013 (0.4106) loss 7.8506 (7.1606) grad_norm 1.7754 (2.8362) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:49:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][600/625] eta 0:00:10 lr 0.000593 wd 0.0500 time 0.4027 (0.4120) data time 0.0008 (0.0016) model time 0.4020 (0.4111) loss 6.9992 (7.1628) grad_norm 2.2819 (2.8237) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:49:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][610/625] eta 0:00:06 lr 0.000593 wd 0.0500 time 0.3940 (0.4131) data time 0.0005 (0.0016) model time 0.3936 (0.4123) loss 7.5121 (7.1621) grad_norm 1.7807 (2.8111) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:50:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][620/625] eta 0:00:02 lr 0.000593 wd 0.0500 time 0.3933 (0.4144) data time 0.0006 (0.0015) model time 0.3927 (0.4137) loss 7.1214 (7.1633) grad_norm 3.1457 (2.8047) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:50:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 161 training takes 0:04:19 [2024-07-25 03:50:01 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 03:50:02 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 03:50:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.510 (0.510) Loss 0.6035 (0.6035) Acc@1 88.574 (88.574) Acc@5 98.291 (98.291) Mem 14939MB [2024-07-25 03:50:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.124) Loss 0.9336 (0.7318) Acc@1 79.736 (85.409) Acc@5 95.654 (97.425) Mem 14939MB [2024-07-25 03:50:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.106) Loss 1.0742 (0.8598) Acc@1 74.756 (81.966) Acc@5 93.994 (96.036) Mem 14939MB [2024-07-25 03:50:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.610 Acc@5 96.029 [2024-07-25 03:50:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.6% [2024-07-25 03:50:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.845 (0.845) Loss 0.5610 (0.5610) Acc@1 89.502 (89.502) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 03:50:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.160) Loss 0.8965 (0.7005) Acc@1 80.713 (85.969) Acc@5 96.094 (97.616) Mem 14939MB [2024-07-25 03:50:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.125) Loss 1.0293 (0.8240) Acc@1 75.488 (82.564) Acc@5 95.068 (96.354) Mem 14939MB [2024-07-25 03:50:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.166 Acc@5 96.313 [2024-07-25 03:50:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.2% [2024-07-25 03:50:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.17% [2024-07-25 03:50:08 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 03:50:09 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 03:50:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][0/625] eta 0:07:55 lr 0.000593 wd 0.0500 time 0.7610 (0.7610) data time 0.3831 (0.3831) model time 0.0000 (0.0000) loss 6.1686 (6.1686) grad_norm 2.3781 (2.3781) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:50:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][10/625] eta 0:04:43 lr 0.000593 wd 0.0500 time 0.4023 (0.4604) data time 0.0008 (0.0356) model time 0.0000 (0.0000) loss 6.4111 (6.9139) grad_norm 2.6222 (2.1742) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:50:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][20/625] eta 0:04:20 lr 0.000592 wd 0.0500 time 0.3938 (0.4307) data time 0.0010 (0.0192) model time 0.0000 (0.0000) loss 7.7621 (6.8927) grad_norm 2.4057 (2.1076) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:50:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][30/625] eta 0:04:10 lr 0.000592 wd 0.0500 time 0.4008 (0.4209) data time 0.0009 (0.0133) model time 0.0000 (0.0000) loss 7.1196 (6.9731) grad_norm 3.9367 (2.3109) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:50:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][40/625] eta 0:04:03 lr 0.000592 wd 0.0500 time 0.3982 (0.4158) data time 0.0006 (0.0103) model time 0.0000 (0.0000) loss 6.9169 (7.0610) grad_norm 3.3333 (2.6227) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:50:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][50/625] eta 0:03:57 lr 0.000592 wd 0.0500 time 0.3973 (0.4128) data time 0.0008 (0.0084) model time 0.0000 (0.0000) loss 7.6362 (7.0609) grad_norm 3.1067 (2.7328) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:50:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][60/625] eta 0:03:52 lr 0.000592 wd 0.0500 time 0.4059 (0.4110) data time 0.0008 (0.0072) model time 0.4051 (0.4010) loss 8.4765 (7.0785) grad_norm 3.3682 (2.7025) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:50:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][70/625] eta 0:03:47 lr 0.000592 wd 0.0500 time 0.4017 (0.4094) data time 0.0008 (0.0063) model time 0.4009 (0.3999) loss 7.3505 (7.1013) grad_norm 3.2288 (2.6963) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:50:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][80/625] eta 0:03:42 lr 0.000592 wd 0.0500 time 0.3980 (0.4082) data time 0.0007 (0.0056) model time 0.3973 (0.3995) loss 7.3752 (7.0897) grad_norm 2.8254 (2.7416) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:50:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][90/625] eta 0:03:37 lr 0.000592 wd 0.0500 time 0.3980 (0.4073) data time 0.0006 (0.0051) model time 0.3974 (0.3993) loss 6.0373 (7.0804) grad_norm 2.1804 (2.7093) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:50:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][100/625] eta 0:03:33 lr 0.000592 wd 0.0500 time 0.4020 (0.4066) data time 0.0008 (0.0047) model time 0.4011 (0.3993) loss 6.4430 (7.1166) grad_norm 3.1438 (2.6777) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:50:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][110/625] eta 0:03:29 lr 0.000591 wd 0.0500 time 0.4011 (0.4062) data time 0.0008 (0.0044) model time 0.4003 (0.3997) loss 8.4428 (7.1073) grad_norm 2.4432 (2.8515) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:50:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][120/625] eta 0:03:24 lr 0.000591 wd 0.0500 time 
0.4035 (0.4059) data time 0.0006 (0.0041) model time 0.4029 (0.4000) loss 7.7614 (7.1325) grad_norm 5.4550 (2.9422) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:51:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][130/625] eta 0:03:20 lr 0.000591 wd 0.0500 time 0.4048 (0.4056) data time 0.0006 (0.0038) model time 0.4043 (0.4000) loss 6.5406 (7.1046) grad_norm 4.6212 (2.9459) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:51:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][140/625] eta 0:03:16 lr 0.000591 wd 0.0500 time 0.4032 (0.4056) data time 0.0009 (0.0037) model time 0.4023 (0.4004) loss 7.1964 (7.1157) grad_norm 3.0418 (2.9808) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:51:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][150/625] eta 0:03:12 lr 0.000591 wd 0.0500 time 0.3995 (0.4054) data time 0.0009 (0.0035) model time 0.3986 (0.4005) loss 6.8134 (7.0873) grad_norm 1.8908 (2.9389) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:51:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][160/625] eta 0:03:08 lr 0.000591 wd 0.0500 time 0.4084 (0.4053) data time 0.0006 (0.0034) model time 0.4078 (0.4007) loss 7.0235 (7.0700) grad_norm 1.7102 (2.8811) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:51:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][170/625] eta 0:03:04 lr 0.000591 wd 0.0500 time 0.3989 (0.4049) data time 0.0006 (0.0032) model time 0.3983 (0.4006) loss 6.2590 (7.0663) grad_norm 1.6572 (2.8317) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:51:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][180/625] eta 0:03:00 lr 0.000591 wd 0.0500 time 0.3981 (0.4047) data time 0.0008 (0.0031) model time 0.3973 (0.4004) loss 7.5420 (7.0918) grad_norm 6.2258 (2.8201) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:51:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][190/625] eta 0:02:55 lr 0.000591 wd 0.0500 time 0.4029 (0.4044) data time 0.0006 (0.0030) model time 0.4023 (0.4003) loss 6.3657 (7.0915) grad_norm 1.6444 (2.8233) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:51:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][200/625] eta 0:02:53 lr 0.000591 wd 0.0500 time 0.4003 (0.4088) data time 0.0008 (0.0029) model time 0.3995 (0.4064) loss 5.8672 (7.0937) grad_norm 1.7853 (2.8115) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:51:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][210/625] eta 0:02:51 lr 0.000590 wd 0.0500 time 0.5958 (0.4141) data time 0.0005 (0.0028) model time 0.5953 (0.4135) loss 7.4619 (7.0918) grad_norm 1.8376 (2.8018) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:51:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][220/625] eta 0:02:49 lr 0.000590 wd 0.0500 time 0.4022 (0.4174) data time 0.0006 (0.0027) model time 0.4016 (0.4178) loss 8.4106 (7.1038) grad_norm 2.2416 (2.7777) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:51:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][230/625] eta 0:02:45 lr 0.000590 wd 0.0500 time 0.4049 (0.4179) data time 0.0006 (0.0026) model time 0.4042 (0.4184) loss 6.8469 (7.1109) grad_norm 2.5153 (2.7495) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:51:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][240/625] eta 0:02:40 lr 0.000590 wd 0.0500 time 0.4178 (0.4173) data time 0.0008 (0.0025) model time 0.4170 (0.4176) loss 6.2842 (7.1153) grad_norm 2.2027 (2.7264) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:51:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][250/625] eta 0:02:36 lr 0.000590 wd 0.0500 time 
0.4026 (0.4166) data time 0.0008 (0.0025) model time 0.4018 (0.4166) loss 7.3693 (7.1242) grad_norm 3.1616 (2.7208) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:51:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][260/625] eta 0:02:31 lr 0.000590 wd 0.0500 time 0.3993 (0.4160) data time 0.0008 (0.0024) model time 0.3986 (0.4159) loss 8.5392 (7.1160) grad_norm 3.3246 (2.7270) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:52:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][270/625] eta 0:02:27 lr 0.000590 wd 0.0500 time 0.4031 (0.4154) data time 0.0006 (0.0023) model time 0.4025 (0.4151) loss 5.9220 (7.1138) grad_norm 2.0519 (2.6968) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:52:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][280/625] eta 0:02:23 lr 0.000590 wd 0.0500 time 0.4110 (0.4150) data time 0.0006 (0.0023) model time 0.4104 (0.4145) loss 7.7173 (7.1131) grad_norm 2.4888 (2.6964) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:52:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][290/625] eta 0:02:18 lr 0.000590 wd 0.0500 time 0.3932 (0.4145) data time 0.0006 (0.0023) model time 0.3926 (0.4139) loss 6.9044 (7.1241) grad_norm 1.6998 (2.6973) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:52:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][300/625] eta 0:02:14 lr 0.000589 wd 0.0500 time 0.4061 (0.4141) data time 0.0006 (0.0022) model time 0.4055 (0.4134) loss 7.0094 (7.1159) grad_norm 3.3259 (2.6822) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:52:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][310/625] eta 0:02:10 lr 0.000589 wd 0.0500 time 0.3979 (0.4136) data time 0.0006 (0.0022) model time 0.3973 (0.4128) loss 8.1267 (7.1102) grad_norm 4.3130 (2.6900) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:52:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][320/625] eta 0:02:06 lr 0.000589 wd 0.0500 time 0.3971 (0.4132) data time 0.0007 (0.0021) model time 0.3964 (0.4123) loss 6.9370 (7.1105) grad_norm 3.0351 (2.7078) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:52:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][330/625] eta 0:02:01 lr 0.000589 wd 0.0500 time 0.3999 (0.4128) data time 0.0007 (0.0021) model time 0.3992 (0.4119) loss 7.7721 (7.1072) grad_norm 2.8445 (2.7331) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:52:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][340/625] eta 0:01:57 lr 0.000589 wd 0.0500 time 0.4028 (0.4125) data time 0.0008 (0.0021) model time 0.4020 (0.4115) loss 8.1060 (7.1024) grad_norm 1.8865 (2.7282) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:52:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][350/625] eta 0:01:53 lr 0.000589 wd 0.0500 time 0.3952 (0.4122) data time 0.0006 (0.0020) model time 0.3946 (0.4111) loss 7.9915 (7.0982) grad_norm 2.1738 (2.7454) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:52:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][360/625] eta 0:01:49 lr 0.000589 wd 0.0500 time 0.3996 (0.4118) data time 0.0008 (0.0020) model time 0.3988 (0.4107) loss 7.0787 (7.1051) grad_norm 2.3532 (2.7387) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:52:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][370/625] eta 0:01:44 lr 0.000589 wd 0.0500 time 0.4005 (0.4116) data time 0.0006 (0.0020) model time 0.3999 (0.4104) loss 8.2279 (7.1056) grad_norm 2.0063 (2.7338) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:52:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][380/625] eta 0:01:40 lr 0.000589 wd 0.0500 time 
0.4025 (0.4112) data time 0.0008 (0.0019) model time 0.4016 (0.4100) loss 6.9890 (7.1003) grad_norm 3.0061 (2.7371) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:52:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][390/625] eta 0:01:36 lr 0.000589 wd 0.0500 time 0.3981 (0.4109) data time 0.0006 (0.0019) model time 0.3974 (0.4097) loss 7.9601 (7.1072) grad_norm 3.3784 (2.7318) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:52:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][400/625] eta 0:01:32 lr 0.000588 wd 0.0500 time 0.4055 (0.4107) data time 0.0007 (0.0019) model time 0.4048 (0.4094) loss 7.7077 (7.1109) grad_norm 2.0065 (2.7305) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:52:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][410/625] eta 0:01:28 lr 0.000588 wd 0.0500 time 0.3998 (0.4106) data time 0.0009 (0.0019) model time 0.3989 (0.4093) loss 6.7510 (7.1098) grad_norm 4.2750 (2.7368) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:53:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][420/625] eta 0:01:24 lr 0.000588 wd 0.0500 time 0.3964 (0.4126) data time 0.0008 (0.0019) model time 0.3957 (0.4115) loss 6.1433 (7.1089) grad_norm 2.0638 (2.7364) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:53:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][430/625] eta 0:01:20 lr 0.000588 wd 0.0500 time 0.5712 (0.4147) data time 0.0009 (0.0019) model time 0.5703 (0.4139) loss 8.2207 (7.1144) grad_norm 4.6230 (2.7514) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:53:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][440/625] eta 0:01:16 lr 0.000588 wd 0.0500 time 0.5945 (0.4161) data time 0.0009 (0.0019) model time 0.5936 (0.4156) loss 6.4730 (7.1165) grad_norm 4.9584 (2.7844) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:53:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][450/625] eta 0:01:12 lr 0.000588 wd 0.0500 time 0.3896 (0.4169) data time 0.0009 (0.0018) model time 0.3887 (0.4165) loss 8.1002 (7.1171) grad_norm 3.7202 (2.7848) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:53:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][460/625] eta 0:01:08 lr 0.000588 wd 0.0500 time 0.4040 (0.4166) data time 0.0006 (0.0018) model time 0.4034 (0.4160) loss 7.5243 (7.1146) grad_norm 2.6516 (2.7844) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:53:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][470/625] eta 0:01:04 lr 0.000588 wd 0.0500 time 0.4036 (0.4162) data time 0.0009 (0.0018) model time 0.4027 (0.4156) loss 7.1324 (7.1026) grad_norm 2.2047 (2.7721) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:53:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][480/625] eta 0:01:00 lr 0.000588 wd 0.0500 time 0.3960 (0.4159) data time 0.0006 (0.0018) model time 0.3954 (0.4152) loss 7.3071 (7.1040) grad_norm 1.7884 (2.7673) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:53:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][490/625] eta 0:00:56 lr 0.000587 wd 0.0500 time 0.3977 (0.4155) data time 0.0006 (0.0018) model time 0.3971 (0.4148) loss 6.4765 (7.1121) grad_norm 5.3446 (2.7679) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:53:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][500/625] eta 0:00:51 lr 0.000587 wd 0.0500 time 0.3976 (0.4151) data time 0.0009 (0.0017) model time 0.3967 (0.4144) loss 7.7276 (7.1166) grad_norm 4.8624 (2.7821) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:53:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][510/625] eta 0:00:47 lr 0.000587 wd 0.0500 time 
0.4002 (0.4148) data time 0.0006 (0.0017) model time 0.3996 (0.4141) loss 6.9081 (7.1117) grad_norm 2.2059 (2.7847) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:53:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][520/625] eta 0:00:43 lr 0.000587 wd 0.0500 time 0.3953 (0.4145) data time 0.0009 (0.0017) model time 0.3944 (0.4137) loss 7.5350 (7.1142) grad_norm 3.7033 (2.7804) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:53:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][530/625] eta 0:00:39 lr 0.000587 wd 0.0500 time 0.4101 (0.4142) data time 0.0006 (0.0017) model time 0.4095 (0.4134) loss 5.8809 (7.1083) grad_norm 2.8858 (2.7767) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:53:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][540/625] eta 0:00:35 lr 0.000587 wd 0.0500 time 0.3962 (0.4140) data time 0.0007 (0.0017) model time 0.3955 (0.4131) loss 5.6005 (7.1124) grad_norm 1.6365 (2.7664) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:53:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][550/625] eta 0:00:31 lr 0.000587 wd 0.0500 time 0.3981 (0.4137) data time 0.0007 (0.0017) model time 0.3974 (0.4128) loss 7.5385 (7.1161) grad_norm 2.1668 (2.7592) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:54:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][560/625] eta 0:00:26 lr 0.000587 wd 0.0500 time 0.4041 (0.4135) data time 0.0006 (0.0016) model time 0.4034 (0.4126) loss 7.9426 (7.1187) grad_norm 2.2187 (2.7581) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:54:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][570/625] eta 0:00:22 lr 0.000587 wd 0.0500 time 0.3958 (0.4133) data time 0.0008 (0.0016) model time 0.3951 (0.4124) loss 7.1902 (7.1180) grad_norm 1.7345 (2.7600) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 03:54:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][580/625] eta 0:00:18 lr 0.000586 wd 0.0500 time 0.3982 (0.4131) data time 0.0009 (0.0016) model time 0.3973 (0.4122) loss 8.0488 (7.1174) grad_norm 3.0533 (2.7638) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:54:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][590/625] eta 0:00:14 lr 0.000586 wd 0.0500 time 0.4004 (0.4129) data time 0.0007 (0.0016) model time 0.3996 (0.4119) loss 7.7992 (7.1134) grad_norm 2.3155 (2.7654) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:54:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][600/625] eta 0:00:10 lr 0.000586 wd 0.0500 time 0.3957 (0.4127) data time 0.0006 (0.0016) model time 0.3950 (0.4117) loss 7.5827 (7.1181) grad_norm 2.0305 (2.7559) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:54:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][610/625] eta 0:00:06 lr 0.000586 wd 0.0500 time 0.4013 (0.4125) data time 0.0004 (0.0016) model time 0.4009 (0.4115) loss 7.7161 (7.1193) grad_norm 2.7545 (2.7490) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:54:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][620/625] eta 0:00:02 lr 0.000586 wd 0.0500 time 0.3981 (0.4123) data time 0.0005 (0.0016) model time 0.3976 (0.4113) loss 6.9493 (7.1211) grad_norm 3.6165 (2.7467) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:54:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 162 training takes 0:04:17 [2024-07-25 03:54:27 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 03:54:28 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 03:54:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.467 (0.467) Loss 0.5801 (0.5801) Acc@1 88.623 (88.623) Acc@5 98.438 (98.438) Mem 14939MB [2024-07-25 03:54:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.122) Loss 0.9209 (0.7099) Acc@1 79.590 (85.467) Acc@5 95.557 (97.456) Mem 14939MB [2024-07-25 03:54:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0332 (0.8408) Acc@1 75.146 (82.057) Acc@5 94.580 (96.126) Mem 14939MB [2024-07-25 03:54:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.726 Acc@5 96.077 [2024-07-25 03:54:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.7% [2024-07-25 03:54:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.73% [2024-07-25 03:54:30 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 03:54:31 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 03:54:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.458 (0.458) Loss 0.5610 (0.5610) Acc@1 89.453 (89.453) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 03:54:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.8950 (0.6998) Acc@1 80.762 (85.982) Acc@5 96.094 (97.638) Mem 14939MB [2024-07-25 03:54:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0273 (0.8228) Acc@1 75.732 (82.610) Acc@5 95.068 (96.368) Mem 14939MB [2024-07-25 03:54:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.202 Acc@5 96.333 [2024-07-25 03:54:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.2% [2024-07-25 03:54:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.20% [2024-07-25 03:54:34 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 03:54:35 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 03:54:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][0/625] eta 0:08:33 lr 0.000586 wd 0.0500 time 0.8220 (0.8220) data time 0.4353 (0.4353) model time 0.0000 (0.0000) loss 8.0924 (8.0924) grad_norm 2.6282 (2.6282) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:54:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][10/625] eta 0:04:49 lr 0.000586 wd 0.0500 time 0.6004 (0.4705) data time 0.0009 (0.0405) model time 0.0000 (0.0000) loss 6.8452 (7.1731) grad_norm 2.6141 (2.6269) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:54:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][20/625] eta 0:04:47 lr 0.000586 wd 0.0500 time 0.5928 (0.4746) data time 0.0007 (0.0218) model time 0.0000 (0.0000) loss 8.3048 (7.1585) grad_norm 1.8410 (2.4059) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:54:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][30/625] eta 0:04:52 lr 0.000586 wd 0.0500 time 0.5064 (0.4913) data time 0.0009 (0.0151) model time 0.0000 (0.0000) loss 7.0261 (7.2256) grad_norm 2.4104 (2.4767) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:54:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][40/625] eta 0:04:44 lr 0.000586 wd 0.0500 time 0.4066 (0.4867) data time 0.0008 (0.0116) model time 0.0000 (0.0000) loss 6.8161 (7.2132) grad_norm 2.8586 (2.4524) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:54:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][50/625] eta 0:04:31 lr 0.000585 wd 0.0500 time 0.3944 (0.4720) data time 0.0007 (0.0095) model time 0.0000 (0.0000) loss 5.5843 (7.2485) grad_norm 3.7044 (2.7250) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:55:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][60/625] eta 0:04:20 lr 0.000585 wd 0.0500 time 0.4085 (0.4602) 
data time 0.0009 (0.0081) model time 0.4076 (0.3992) loss 6.3198 (7.2424) grad_norm 2.2495 (2.6584) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:55:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][70/625] eta 0:04:10 lr 0.000585 wd 0.0500 time 0.4034 (0.4520) data time 0.0010 (0.0071) model time 0.4024 (0.3998) loss 7.2263 (7.1762) grad_norm 4.7291 (2.8059) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:55:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][80/625] eta 0:04:02 lr 0.000585 wd 0.0500 time 0.3979 (0.4453) data time 0.0010 (0.0064) model time 0.3969 (0.3989) loss 8.2286 (7.2325) grad_norm 2.7709 (2.8352) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:55:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][90/625] eta 0:03:55 lr 0.000585 wd 0.0500 time 0.3983 (0.4408) data time 0.0008 (0.0058) model time 0.3976 (0.3999) loss 7.5385 (7.2416) grad_norm 2.6547 (2.8066) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:55:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][100/625] eta 0:03:49 lr 0.000585 wd 0.0500 time 0.4052 (0.4367) data time 0.0007 (0.0053) model time 0.4045 (0.3996) loss 6.4477 (7.2355) grad_norm 2.2832 (2.7832) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:55:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][110/625] eta 0:03:43 lr 0.000585 wd 0.0500 time 0.3989 (0.4334) data time 0.0009 (0.0049) model time 0.3980 (0.3996) loss 7.6429 (7.2446) grad_norm 3.0455 (2.7979) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 03:55:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][120/625] eta 0:03:37 lr 0.000585 wd 0.0500 time 0.3965 (0.4313) data time 0.0009 (0.0048) model time 0.3956 (0.4002) loss 7.3635 (7.2313) grad_norm 3.4544 (2.8090) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
03:55:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][130/625] eta 0:03:32 lr 0.000585 wd 0.0500 time 0.4159 (0.4289) data time 0.0009 (0.0045) model time 0.4150 (0.4001) loss 7.0702 (7.2490) grad_norm 2.2911 (2.7632) loss_scale 1024.0000 (515.9084) mem 14939MB [2024-07-25 03:55:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][140/625] eta 0:03:26 lr 0.000585 wd 0.0500 time 0.3952 (0.4267) data time 0.0010 (0.0043) model time 0.3942 (0.3998) loss 7.3764 (7.2255) grad_norm 4.3349 (2.7207) loss_scale 1024.0000 (551.9433) mem 14939MB [2024-07-25 03:55:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][150/625] eta 0:03:21 lr 0.000584 wd 0.0500 time 0.3993 (0.4252) data time 0.0009 (0.0040) model time 0.3983 (0.4000) loss 5.6650 (7.1883) grad_norm 2.4793 (2.7160) loss_scale 1024.0000 (583.2053) mem 14939MB [2024-07-25 03:55:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][160/625] eta 0:03:16 lr 0.000584 wd 0.0500 time 0.4026 (0.4236) data time 0.0006 (0.0038) model time 0.4020 (0.4000) loss 6.5097 (7.1979) grad_norm 1.7349 (2.7053) loss_scale 1024.0000 (610.5839) mem 14939MB [2024-07-25 03:55:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][170/625] eta 0:03:12 lr 0.000584 wd 0.0500 time 0.4018 (0.4222) data time 0.0007 (0.0037) model time 0.4012 (0.3999) loss 7.1788 (7.1820) grad_norm 3.4017 (2.6909) loss_scale 1024.0000 (634.7602) mem 14939MB [2024-07-25 03:55:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][180/625] eta 0:03:07 lr 0.000584 wd 0.0500 time 0.3965 (0.4210) data time 0.0009 (0.0035) model time 0.3956 (0.3998) loss 6.9502 (7.1620) grad_norm 1.8436 (2.6695) loss_scale 1024.0000 (656.2652) mem 14939MB [2024-07-25 03:55:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][190/625] eta 0:03:03 lr 0.000584 wd 0.0500 time 0.3990 (0.4211) 
data time 0.0009 (0.0034) model time 0.3981 (0.4013) loss 7.7330 (7.1679) grad_norm 1.9437 (2.6570) loss_scale 1024.0000 (675.5183) mem 14939MB [2024-07-25 03:55:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][200/625] eta 0:02:58 lr 0.000584 wd 0.0500 time 0.3994 (0.4200) data time 0.0010 (0.0033) model time 0.3984 (0.4011) loss 8.2459 (7.1746) grad_norm 2.5812 (2.6616) loss_scale 1024.0000 (692.8557) mem 14939MB [2024-07-25 03:56:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][210/625] eta 0:02:53 lr 0.000584 wd 0.0500 time 0.4007 (0.4192) data time 0.0008 (0.0032) model time 0.3999 (0.4012) loss 6.0600 (7.1779) grad_norm 2.7409 (2.6625) loss_scale 1024.0000 (708.5498) mem 14939MB [2024-07-25 03:56:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][220/625] eta 0:02:49 lr 0.000584 wd 0.0500 time 0.3987 (0.4183) data time 0.0006 (0.0031) model time 0.3981 (0.4011) loss 6.8457 (7.1730) grad_norm 2.1946 (2.6464) loss_scale 1024.0000 (722.8235) mem 14939MB [2024-07-25 03:56:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][230/625] eta 0:02:45 lr 0.000584 wd 0.0500 time 0.3986 (0.4181) data time 0.0007 (0.0030) model time 0.3979 (0.4017) loss 7.1102 (7.1594) grad_norm 1.8854 (2.6140) loss_scale 1024.0000 (735.8615) mem 14939MB [2024-07-25 03:56:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][240/625] eta 0:02:41 lr 0.000583 wd 0.0500 time 0.6034 (0.4205) data time 0.0009 (0.0029) model time 0.6024 (0.4056) loss 7.5505 (7.1734) grad_norm 2.5265 (2.6055) loss_scale 1024.0000 (747.8174) mem 14939MB [2024-07-25 03:56:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][250/625] eta 0:02:39 lr 0.000583 wd 0.0500 time 0.6039 (0.4241) data time 0.0007 (0.0028) model time 0.6032 (0.4108) loss 6.0159 (7.1731) grad_norm 2.1307 (2.5979) loss_scale 1024.0000 (758.8207) mem 14939MB 
[2024-07-25 03:56:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][260/625] eta 0:02:35 lr 0.000583 wd 0.0500 time 0.3947 (0.4258) data time 0.0008 (0.0027) model time 0.3938 (0.4134) loss 6.7071 (7.1518) grad_norm 1.9494 (2.5941) loss_scale 1024.0000 (768.9808) mem 14939MB [2024-07-25 03:56:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][270/625] eta 0:02:31 lr 0.000583 wd 0.0500 time 0.3977 (0.4256) data time 0.0008 (0.0027) model time 0.3970 (0.4138) loss 6.4884 (7.1560) grad_norm 2.2034 (2.6014) loss_scale 1024.0000 (778.3911) mem 14939MB [2024-07-25 03:56:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][280/625] eta 0:02:26 lr 0.000583 wd 0.0500 time 0.3997 (0.4248) data time 0.0009 (0.0026) model time 0.3988 (0.4132) loss 6.9467 (7.1561) grad_norm 1.8810 (2.5831) loss_scale 1024.0000 (787.1317) mem 14939MB [2024-07-25 03:56:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][290/625] eta 0:02:22 lr 0.000583 wd 0.0500 time 0.3925 (0.4239) data time 0.0008 (0.0026) model time 0.3918 (0.4126) loss 7.0929 (7.1476) grad_norm 2.5037 (2.5821) loss_scale 1024.0000 (795.2715) mem 14939MB [2024-07-25 03:56:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][300/625] eta 0:02:17 lr 0.000583 wd 0.0500 time 0.3986 (0.4232) data time 0.0008 (0.0025) model time 0.3978 (0.4121) loss 7.7061 (7.1429) grad_norm 2.6150 (2.5906) loss_scale 1024.0000 (802.8704) mem 14939MB [2024-07-25 03:56:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][310/625] eta 0:02:13 lr 0.000583 wd 0.0500 time 0.4014 (0.4224) data time 0.0007 (0.0024) model time 0.4007 (0.4116) loss 7.1135 (7.1324) grad_norm 2.8435 (2.5909) loss_scale 1024.0000 (809.9807) mem 14939MB [2024-07-25 03:56:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][320/625] eta 0:02:08 lr 0.000583 wd 0.0500 time 
0.4000 (0.4217) data time 0.0007 (0.0024) model time 0.3993 (0.4111) loss 8.1411 (7.1387) grad_norm 1.8517 (2.5771) loss_scale 1024.0000 (816.6480) mem 14939MB [2024-07-25 03:56:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][330/625] eta 0:02:04 lr 0.000582 wd 0.0500 time 0.3988 (0.4210) data time 0.0009 (0.0024) model time 0.3979 (0.4107) loss 7.4510 (7.1442) grad_norm 3.7822 (2.5947) loss_scale 1024.0000 (822.9124) mem 14939MB [2024-07-25 03:56:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][340/625] eta 0:01:59 lr 0.000582 wd 0.0500 time 0.4046 (0.4204) data time 0.0007 (0.0023) model time 0.4039 (0.4103) loss 7.9171 (7.1484) grad_norm 2.6625 (2.5962) loss_scale 1024.0000 (828.8094) mem 14939MB [2024-07-25 03:57:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][350/625] eta 0:01:55 lr 0.000582 wd 0.0500 time 0.3975 (0.4198) data time 0.0010 (0.0023) model time 0.3964 (0.4099) loss 7.1036 (7.1561) grad_norm 1.6404 (2.6043) loss_scale 1024.0000 (834.3704) mem 14939MB [2024-07-25 03:57:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][360/625] eta 0:01:51 lr 0.000582 wd 0.0500 time 0.3976 (0.4193) data time 0.0007 (0.0022) model time 0.3968 (0.4095) loss 6.1086 (7.1524) grad_norm 2.6177 (2.6143) loss_scale 1024.0000 (839.6233) mem 14939MB [2024-07-25 03:57:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][370/625] eta 0:01:46 lr 0.000582 wd 0.0500 time 0.4006 (0.4187) data time 0.0009 (0.0022) model time 0.3997 (0.4092) loss 7.9084 (7.1621) grad_norm 3.3960 (2.6204) loss_scale 1024.0000 (844.5930) mem 14939MB [2024-07-25 03:57:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][380/625] eta 0:01:42 lr 0.000582 wd 0.0500 time 0.3955 (0.4183) data time 0.0009 (0.0022) model time 0.3945 (0.4089) loss 7.6136 (7.1678) grad_norm 8.9817 (2.6435) loss_scale 1024.0000 (849.3018) mem 
14939MB [2024-07-25 03:57:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][390/625] eta 0:01:38 lr 0.000582 wd 0.0500 time 0.4013 (0.4178) data time 0.0009 (0.0021) model time 0.4004 (0.4086) loss 7.4530 (7.1613) grad_norm 6.8326 (2.6617) loss_scale 1024.0000 (853.7698) mem 14939MB [2024-07-25 03:57:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][400/625] eta 0:01:33 lr 0.000582 wd 0.0500 time 0.3963 (0.4173) data time 0.0009 (0.0021) model time 0.3954 (0.4083) loss 7.4526 (7.1617) grad_norm 5.0490 (2.6896) loss_scale 1024.0000 (858.0150) mem 14939MB [2024-07-25 03:57:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][410/625] eta 0:01:29 lr 0.000582 wd 0.0500 time 0.3976 (0.4172) data time 0.0009 (0.0021) model time 0.3967 (0.4084) loss 7.5880 (7.1597) grad_norm 2.1542 (2.6848) loss_scale 1024.0000 (862.0535) mem 14939MB [2024-07-25 03:57:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][420/625] eta 0:01:25 lr 0.000582 wd 0.0500 time 0.4013 (0.4168) data time 0.0008 (0.0021) model time 0.4005 (0.4082) loss 7.1462 (7.1559) grad_norm 3.5758 (2.6819) loss_scale 1024.0000 (865.9002) mem 14939MB [2024-07-25 03:57:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][430/625] eta 0:01:21 lr 0.000581 wd 0.0500 time 0.3981 (0.4164) data time 0.0006 (0.0020) model time 0.3975 (0.4079) loss 7.1153 (7.1403) grad_norm 2.3659 (2.7016) loss_scale 1024.0000 (869.5684) mem 14939MB [2024-07-25 03:57:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][440/625] eta 0:01:16 lr 0.000581 wd 0.0500 time 0.3969 (0.4161) data time 0.0007 (0.0020) model time 0.3962 (0.4078) loss 8.0654 (7.1373) grad_norm 1.7063 (2.6921) loss_scale 1024.0000 (873.0703) mem 14939MB [2024-07-25 03:57:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][450/625] eta 0:01:12 lr 0.000581 wd 0.0500 
time 0.6137 (0.4162) data time 0.0006 (0.0020) model time 0.6131 (0.4081) loss 6.9009 (7.1346) grad_norm 3.1851 (2.6920) loss_scale 1024.0000 (876.4169) mem 14939MB [2024-07-25 03:57:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][460/625] eta 0:01:08 lr 0.000581 wd 0.0500 time 0.3974 (0.4174) data time 0.0009 (0.0020) model time 0.3965 (0.4096) loss 7.3766 (7.1332) grad_norm 2.4814 (2.6923) loss_scale 1024.0000 (879.6182) mem 14939MB [2024-07-25 03:57:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][470/625] eta 0:01:05 lr 0.000581 wd 0.0500 time 0.5961 (0.4196) data time 0.0007 (0.0019) model time 0.5954 (0.4122) loss 5.5565 (7.1324) grad_norm 2.4779 (2.6941) loss_scale 1024.0000 (882.6837) mem 14939MB [2024-07-25 03:57:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][480/625] eta 0:01:01 lr 0.000581 wd 0.0500 time 0.3997 (0.4211) data time 0.0006 (0.0019) model time 0.3991 (0.4141) loss 5.9122 (7.1306) grad_norm 2.7579 (2.6907) loss_scale 1024.0000 (885.6216) mem 14939MB [2024-07-25 03:58:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][490/625] eta 0:00:56 lr 0.000581 wd 0.0500 time 0.3975 (0.4211) data time 0.0006 (0.0019) model time 0.3969 (0.4142) loss 6.7071 (7.1366) grad_norm 2.5177 (2.6943) loss_scale 1024.0000 (888.4399) mem 14939MB [2024-07-25 03:58:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][500/625] eta 0:00:52 lr 0.000581 wd 0.0500 time 0.3948 (0.4207) data time 0.0009 (0.0019) model time 0.3939 (0.4139) loss 7.5928 (7.1385) grad_norm 2.0887 (2.6955) loss_scale 1024.0000 (891.1457) mem 14939MB [2024-07-25 03:58:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][510/625] eta 0:00:48 lr 0.000581 wd 0.0500 time 0.3980 (0.4203) data time 0.0008 (0.0018) model time 0.3972 (0.4136) loss 8.2060 (7.1341) grad_norm 1.8359 (2.6856) loss_scale 1024.0000 
(893.7456) mem 14939MB [2024-07-25 03:58:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][520/625] eta 0:00:44 lr 0.000580 wd 0.0500 time 0.3984 (0.4199) data time 0.0008 (0.0018) model time 0.3975 (0.4132) loss 6.9539 (7.1326) grad_norm 2.7659 (2.6830) loss_scale 1024.0000 (896.2457) mem 14939MB [2024-07-25 03:58:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][530/625] eta 0:00:39 lr 0.000580 wd 0.0500 time 0.3946 (0.4195) data time 0.0008 (0.0018) model time 0.3937 (0.4129) loss 5.5264 (7.1257) grad_norm 2.1113 (2.6712) loss_scale 1024.0000 (898.6516) mem 14939MB [2024-07-25 03:58:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][540/625] eta 0:00:35 lr 0.000580 wd 0.0500 time 0.3988 (0.4192) data time 0.0009 (0.0018) model time 0.3979 (0.4127) loss 6.6845 (7.1259) grad_norm 2.3227 (2.6778) loss_scale 1024.0000 (900.9686) mem 14939MB [2024-07-25 03:58:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][550/625] eta 0:00:31 lr 0.000580 wd 0.0500 time 0.4043 (0.4189) data time 0.0006 (0.0018) model time 0.4036 (0.4125) loss 7.6807 (7.1286) grad_norm 1.9798 (2.6876) loss_scale 1024.0000 (903.2015) mem 14939MB [2024-07-25 03:58:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][560/625] eta 0:00:27 lr 0.000580 wd 0.0500 time 0.3972 (0.4185) data time 0.0006 (0.0018) model time 0.3966 (0.4122) loss 7.0425 (7.1301) grad_norm 2.0832 (2.6882) loss_scale 1024.0000 (905.3547) mem 14939MB [2024-07-25 03:58:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][570/625] eta 0:00:23 lr 0.000580 wd 0.0500 time 0.4026 (0.4182) data time 0.0008 (0.0017) model time 0.4017 (0.4119) loss 6.8359 (7.1357) grad_norm 2.4225 (2.6924) loss_scale 1024.0000 (907.4326) mem 14939MB [2024-07-25 03:58:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][580/625] eta 0:00:18 lr 
0.000580 wd 0.0500 time 0.3952 (0.4179) data time 0.0007 (0.0017) model time 0.3945 (0.4117) loss 7.8734 (7.1412) grad_norm 2.8080 (2.6921) loss_scale 1024.0000 (909.4389) mem 14939MB [2024-07-25 03:58:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][590/625] eta 0:00:14 lr 0.000580 wd 0.0500 time 0.4044 (0.4175) data time 0.0006 (0.0017) model time 0.4037 (0.4114) loss 6.9588 (7.1390) grad_norm 2.5010 (2.6878) loss_scale 1024.0000 (911.3773) mem 14939MB [2024-07-25 03:58:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][600/625] eta 0:00:10 lr 0.000580 wd 0.0500 time 0.4027 (0.4173) data time 0.0009 (0.0017) model time 0.4018 (0.4112) loss 5.6193 (7.1310) grad_norm 2.0160 (2.6810) loss_scale 1024.0000 (913.2512) mem 14939MB [2024-07-25 03:58:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][610/625] eta 0:00:06 lr 0.000580 wd 0.0500 time 0.3956 (0.4170) data time 0.0006 (0.0017) model time 0.3950 (0.4110) loss 6.8070 (7.1368) grad_norm 3.4403 (2.6941) loss_scale 1024.0000 (915.0638) mem 14939MB [2024-07-25 03:58:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][620/625] eta 0:00:02 lr 0.000579 wd 0.0500 time 0.3974 (0.4167) data time 0.0006 (0.0017) model time 0.3968 (0.4107) loss 8.2700 (7.1415) grad_norm 2.6485 (2.6954) loss_scale 1024.0000 (916.8180) mem 14939MB [2024-07-25 03:58:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 163 training takes 0:04:20 [2024-07-25 03:58:55 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 03:58:56 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 03:58:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.460 (0.460) Loss 0.5698 (0.5698) Acc@1 89.453 (89.453) Acc@5 98.584 (98.584) Mem 14939MB [2024-07-25 03:58:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.9014 (0.7065) Acc@1 79.834 (85.507) Acc@5 96.094 (97.541) Mem 14939MB [2024-07-25 03:58:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 1.0400 (0.8362) Acc@1 75.586 (82.110) Acc@5 94.482 (96.122) Mem 14939MB [2024-07-25 03:58:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.674 Acc@5 96.099 [2024-07-25 03:58:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.7% [2024-07-25 03:58:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.742 (0.742) Loss 0.5601 (0.5601) Acc@1 89.502 (89.502) Acc@5 98.682 (98.682) Mem 14939MB [2024-07-25 03:59:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.153) Loss 0.8936 (0.6988) Acc@1 80.762 (86.022) Acc@5 96.094 (97.638) Mem 14939MB [2024-07-25 03:59:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.121) Loss 1.0264 (0.8218) Acc@1 75.781 (82.647) Acc@5 95.068 (96.359) Mem 14939MB [2024-07-25 03:59:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.232 Acc@5 96.337 [2024-07-25 03:59:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.2% [2024-07-25 03:59:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.23% [2024-07-25 03:59:02 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 03:59:03 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 03:59:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][0/625] eta 0:07:40 lr 0.000579 wd 0.0500 time 0.7374 (0.7374) data time 0.3573 (0.3573) model time 0.0000 (0.0000) loss 6.4927 (6.4927) grad_norm 2.4893 (2.4893) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:59:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][10/625] eta 0:04:25 lr 0.000579 wd 0.0500 time 0.3947 (0.4316) data time 0.0007 (0.0333) model time 0.0000 (0.0000) loss 6.4890 (6.7810) grad_norm 5.0934 (2.4308) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:59:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][20/625] eta 0:04:12 lr 0.000579 wd 0.0500 time 0.3975 (0.4177) data time 0.0006 (0.0183) model time 0.0000 (0.0000) loss 5.9131 (6.7679) grad_norm 2.2735 (2.5838) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:59:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][30/625] eta 0:04:05 lr 0.000579 wd 0.0500 time 0.4003 (0.4127) data time 0.0007 (0.0127) model time 0.0000 (0.0000) loss 8.0427 (6.7766) grad_norm 3.0862 (2.7789) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:59:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][40/625] eta 0:03:59 lr 0.000579 wd 0.0500 time 0.3962 (0.4099) data time 0.0007 (0.0102) model time 0.0000 (0.0000) loss 8.0253 (6.8911) grad_norm 3.4920 (3.2494) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:59:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][50/625] eta 0:04:00 lr 0.000579 wd 0.0500 time 0.5920 (0.4175) data time 0.0007 (0.0085) model time 0.0000 (0.0000) loss 7.2392 (6.9572) grad_norm 2.7918 (3.0733) loss_scale 1024.0000 
(1024.0000) mem 14939MB [2024-07-25 03:59:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][60/625] eta 0:04:00 lr 0.000579 wd 0.0500 time 0.5779 (0.4259) data time 0.0008 (0.0072) model time 0.5771 (0.4677) loss 8.2385 (6.9417) grad_norm 7.1748 (3.0460) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:59:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][70/625] eta 0:04:02 lr 0.000579 wd 0.0500 time 0.5699 (0.4361) data time 0.0008 (0.0063) model time 0.5690 (0.4824) loss 7.7611 (7.0041) grad_norm 4.2120 (3.1405) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:59:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][80/625] eta 0:03:56 lr 0.000578 wd 0.0500 time 0.4022 (0.4331) data time 0.0006 (0.0057) model time 0.4015 (0.4586) loss 7.1795 (7.0643) grad_norm 3.4820 (3.2140) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:59:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][90/625] eta 0:03:50 lr 0.000578 wd 0.0500 time 0.3959 (0.4314) data time 0.0007 (0.0052) model time 0.3952 (0.4481) loss 8.1579 (7.0865) grad_norm 1.9042 (3.1743) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:59:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][100/625] eta 0:03:44 lr 0.000578 wd 0.0500 time 0.3985 (0.4282) data time 0.0006 (0.0047) model time 0.3979 (0.4381) loss 7.8384 (7.1145) grad_norm 2.1644 (3.1765) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:59:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][110/625] eta 0:03:39 lr 0.000578 wd 0.0500 time 0.4006 (0.4258) data time 0.0006 (0.0044) model time 0.3999 (0.4318) loss 7.7183 (7.1785) grad_norm 2.3145 (3.1578) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:59:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][120/625] eta 0:03:34 lr 
0.000578 wd 0.0500 time 0.3959 (0.4238) data time 0.0009 (0.0041) model time 0.3950 (0.4274) loss 7.5902 (7.1570) grad_norm 2.2083 (3.1263) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 03:59:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][130/625] eta 0:03:28 lr 0.000578 wd 0.0500 time 0.3977 (0.4220) data time 0.0006 (0.0039) model time 0.3970 (0.4238) loss 7.4911 (7.1503) grad_norm 2.4330 (3.1319) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:00:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][140/625] eta 0:03:23 lr 0.000578 wd 0.0500 time 0.3982 (0.4203) data time 0.0007 (0.0037) model time 0.3975 (0.4210) loss 8.2092 (7.1460) grad_norm 2.9056 (3.1073) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:00:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][150/625] eta 0:03:19 lr 0.000578 wd 0.0500 time 0.3965 (0.4190) data time 0.0011 (0.0035) model time 0.3954 (0.4188) loss 7.4270 (7.1773) grad_norm 2.3486 (3.0737) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:00:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][160/625] eta 0:03:14 lr 0.000578 wd 0.0500 time 0.3970 (0.4177) data time 0.0009 (0.0033) model time 0.3962 (0.4168) loss 7.5974 (7.1969) grad_norm 2.1214 (3.0377) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:00:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][170/625] eta 0:03:09 lr 0.000578 wd 0.0500 time 0.3814 (0.4176) data time 0.0008 (0.0032) model time 0.3806 (0.4167) loss 7.5192 (7.2071) grad_norm 2.6127 (3.0628) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:00:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][180/625] eta 0:03:05 lr 0.000577 wd 0.0500 time 0.3964 (0.4167) data time 0.0006 (0.0031) model time 0.3958 (0.4154) loss 7.5367 (7.1958) grad_norm 4.8486 (3.0594) 
loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:00:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][190/625] eta 0:03:00 lr 0.000577 wd 0.0500 time 0.3959 (0.4158) data time 0.0007 (0.0030) model time 0.3952 (0.4142) loss 5.1672 (7.1993) grad_norm 4.1004 (3.0600) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:00:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][200/625] eta 0:02:56 lr 0.000577 wd 0.0500 time 0.3993 (0.4149) data time 0.0007 (0.0028) model time 0.3987 (0.4131) loss 7.7067 (7.2074) grad_norm 2.5646 (3.0491) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:00:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][210/625] eta 0:02:51 lr 0.000577 wd 0.0500 time 0.3985 (0.4142) data time 0.0009 (0.0028) model time 0.3976 (0.4122) loss 8.7126 (7.2056) grad_norm 5.9360 (3.0762) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:00:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][220/625] eta 0:02:47 lr 0.000577 wd 0.0500 time 0.3949 (0.4136) data time 0.0009 (0.0027) model time 0.3940 (0.4114) loss 7.3725 (7.2003) grad_norm 2.5890 (3.0731) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:00:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][230/625] eta 0:02:43 lr 0.000577 wd 0.0500 time 0.4013 (0.4130) data time 0.0008 (0.0026) model time 0.4005 (0.4108) loss 7.7885 (7.1953) grad_norm 2.2790 (3.0633) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:00:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][240/625] eta 0:02:38 lr 0.000577 wd 0.0500 time 0.3941 (0.4124) data time 0.0007 (0.0025) model time 0.3934 (0.4101) loss 7.7143 (7.1787) grad_norm 2.9999 (3.0742) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:00:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[164/300][250/625] eta 0:02:34 lr 0.000577 wd 0.0500 time 0.3984 (0.4120) data time 0.0008 (0.0025) model time 0.3976 (0.4096) loss 9.1386 (7.1838) grad_norm 2.0733 (3.0469) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:00:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][260/625] eta 0:02:30 lr 0.000577 wd 0.0500 time 0.3995 (0.4115) data time 0.0007 (0.0024) model time 0.3988 (0.4091) loss 7.3766 (7.1680) grad_norm 2.3656 (3.0257) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:00:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][270/625] eta 0:02:26 lr 0.000576 wd 0.0500 time 0.5988 (0.4130) data time 0.0009 (0.0023) model time 0.5979 (0.4110) loss 7.6993 (7.1906) grad_norm 3.0704 (3.0124) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:00:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][280/625] eta 0:02:22 lr 0.000576 wd 0.0500 time 0.5758 (0.4143) data time 0.0007 (0.0023) model time 0.5751 (0.4127) loss 7.1954 (7.1787) grad_norm 2.7492 (2.9861) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:01:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][290/625] eta 0:02:20 lr 0.000576 wd 0.0500 time 0.5925 (0.4184) data time 0.0008 (0.0023) model time 0.5917 (0.4176) loss 7.5859 (7.1809) grad_norm 2.3670 (2.9604) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:01:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][300/625] eta 0:02:16 lr 0.000576 wd 0.0500 time 0.4006 (0.4190) data time 0.0007 (0.0022) model time 0.3999 (0.4183) loss 6.0260 (7.1793) grad_norm 2.9179 (2.9437) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:01:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][310/625] eta 0:02:11 lr 0.000576 wd 0.0500 time 0.4024 (0.4188) data time 0.0007 (0.0022) model time 0.4017 (0.4181) loss 6.0061 
(7.1782) grad_norm 2.8083 (2.9433) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:01:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][320/625] eta 0:02:07 lr 0.000576 wd 0.0500 time 0.3997 (0.4183) data time 0.0008 (0.0022) model time 0.3988 (0.4174) loss 7.8214 (7.1856) grad_norm 2.7071 (2.9492) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:01:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][330/625] eta 0:02:03 lr 0.000576 wd 0.0500 time 0.4008 (0.4178) data time 0.0008 (0.0022) model time 0.4000 (0.4168) loss 7.2721 (7.1928) grad_norm 3.5747 (2.9445) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:01:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][340/625] eta 0:01:58 lr 0.000576 wd 0.0500 time 0.4002 (0.4173) data time 0.0009 (0.0021) model time 0.3993 (0.4162) loss 6.3747 (7.1969) grad_norm 2.6400 (2.9333) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:01:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][350/625] eta 0:01:54 lr 0.000576 wd 0.0500 time 0.3949 (0.4168) data time 0.0009 (0.0021) model time 0.3941 (0.4156) loss 6.5707 (7.1918) grad_norm 2.7383 (2.9182) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:01:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][360/625] eta 0:01:50 lr 0.000576 wd 0.0500 time 0.4211 (0.4164) data time 0.0008 (0.0021) model time 0.4202 (0.4152) loss 5.7236 (7.1812) grad_norm 3.0214 (2.9105) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:01:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][370/625] eta 0:01:46 lr 0.000575 wd 0.0500 time 0.3987 (0.4160) data time 0.0009 (0.0020) model time 0.3979 (0.4147) loss 5.4252 (7.1693) grad_norm 2.6729 (2.9209) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:01:41 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [164/300][380/625] eta 0:01:41 lr 0.000575 wd 0.0500 time 0.4028 (0.4156) data time 0.0008 (0.0020) model time 0.4021 (0.4143) loss 7.8034 (7.1675) grad_norm 1.8554 (2.9139) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:01:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][390/625] eta 0:01:37 lr 0.000575 wd 0.0500 time 0.6032 (0.4158) data time 0.0007 (0.0020) model time 0.6025 (0.4145) loss 8.0011 (7.1716) grad_norm 2.0906 (2.9007) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:01:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][400/625] eta 0:01:33 lr 0.000575 wd 0.0500 time 0.4044 (0.4154) data time 0.0006 (0.0020) model time 0.4038 (0.4140) loss 6.5532 (7.1691) grad_norm 3.7104 (2.9178) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:01:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][410/625] eta 0:01:29 lr 0.000575 wd 0.0500 time 0.3986 (0.4150) data time 0.0009 (0.0019) model time 0.3978 (0.4136) loss 7.9107 (7.1675) grad_norm 2.5709 (2.9153) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:01:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][420/625] eta 0:01:25 lr 0.000575 wd 0.0500 time 0.4135 (0.4147) data time 0.0007 (0.0019) model time 0.4128 (0.4133) loss 6.6675 (7.1609) grad_norm 3.4385 (2.9179) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:02:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][430/625] eta 0:01:20 lr 0.000575 wd 0.0500 time 0.3992 (0.4143) data time 0.0007 (0.0019) model time 0.3985 (0.4129) loss 7.2082 (7.1550) grad_norm 2.2860 (2.9170) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:02:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][440/625] eta 0:01:16 lr 0.000575 wd 0.0500 time 0.4005 (0.4140) data time 0.0008 (0.0019) model 
time 0.3997 (0.4126) loss 7.4893 (7.1511) grad_norm 2.2944 (2.9163) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:02:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][450/625] eta 0:01:12 lr 0.000575 wd 0.0500 time 0.4014 (0.4138) data time 0.0007 (0.0019) model time 0.4007 (0.4123) loss 7.3466 (7.1526) grad_norm 1.9324 (2.9153) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:02:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][460/625] eta 0:01:08 lr 0.000574 wd 0.0500 time 0.3987 (0.4135) data time 0.0007 (0.0018) model time 0.3980 (0.4120) loss 6.1829 (7.1530) grad_norm 3.8415 (2.9051) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:02:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][470/625] eta 0:01:04 lr 0.000574 wd 0.0500 time 0.4035 (0.4132) data time 0.0007 (0.0018) model time 0.4028 (0.4117) loss 8.6170 (7.1574) grad_norm 2.2420 (2.8987) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:02:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][480/625] eta 0:00:59 lr 0.000574 wd 0.0500 time 0.4004 (0.4129) data time 0.0007 (0.0018) model time 0.3998 (0.4114) loss 7.1493 (7.1478) grad_norm 2.0051 (2.9001) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:02:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][490/625] eta 0:00:55 lr 0.000574 wd 0.0500 time 0.5998 (0.4138) data time 0.0007 (0.0018) model time 0.5991 (0.4124) loss 5.8243 (7.1503) grad_norm 3.2998 (2.9045) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:02:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][500/625] eta 0:00:51 lr 0.000574 wd 0.0500 time 0.5789 (0.4152) data time 0.0009 (0.0018) model time 0.5780 (0.4139) loss 7.7966 (7.1462) grad_norm 4.9823 (2.9062) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:02:35 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][510/625] eta 0:00:47 lr 0.000574 wd 0.0500 time 0.3971 (0.4166) data time 0.0009 (0.0017) model time 0.3962 (0.4155) loss 7.5869 (7.1496) grad_norm 1.7367 (2.9075) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:02:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][520/625] eta 0:00:43 lr 0.000574 wd 0.0500 time 0.3980 (0.4169) data time 0.0006 (0.0017) model time 0.3974 (0.4159) loss 7.0577 (7.1523) grad_norm 1.9536 (2.8964) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:02:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][530/625] eta 0:00:39 lr 0.000574 wd 0.0500 time 0.4077 (0.4169) data time 0.0008 (0.0017) model time 0.4069 (0.4159) loss 5.5370 (7.1496) grad_norm 3.1670 (2.8915) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:02:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][540/625] eta 0:00:35 lr 0.000574 wd 0.0500 time 0.3946 (0.4165) data time 0.0007 (0.0017) model time 0.3939 (0.4154) loss 6.6044 (7.1464) grad_norm 4.5566 (2.8974) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:02:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][550/625] eta 0:00:31 lr 0.000573 wd 0.0500 time 0.3961 (0.4163) data time 0.0009 (0.0017) model time 0.3952 (0.4152) loss 6.4415 (7.1396) grad_norm 2.1947 (2.9056) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:02:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][560/625] eta 0:00:27 lr 0.000573 wd 0.0500 time 0.4128 (0.4160) data time 0.0009 (0.0017) model time 0.4119 (0.4148) loss 6.5777 (7.1362) grad_norm 3.3782 (2.9005) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:03:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][570/625] eta 0:00:22 lr 0.000573 wd 0.0500 time 0.4009 (0.4157) 
data time 0.0009 (0.0017) model time 0.4000 (0.4146) loss 6.8001 (7.1370) grad_norm 2.6002 (2.8984) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:03:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][580/625] eta 0:00:18 lr 0.000573 wd 0.0500 time 0.3986 (0.4155) data time 0.0009 (0.0017) model time 0.3977 (0.4143) loss 7.4960 (7.1367) grad_norm 2.1088 (2.8895) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:03:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][590/625] eta 0:00:14 lr 0.000573 wd 0.0500 time 0.4114 (0.4153) data time 0.0007 (0.0016) model time 0.4107 (0.4141) loss 7.1706 (7.1409) grad_norm 3.8061 (2.8834) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:03:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][600/625] eta 0:00:10 lr 0.000573 wd 0.0500 time 0.4026 (0.4151) data time 0.0008 (0.0016) model time 0.4018 (0.4139) loss 7.2010 (7.1469) grad_norm 3.0276 (2.8786) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:03:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][610/625] eta 0:00:06 lr 0.000573 wd 0.0500 time 0.4001 (0.4152) data time 0.0006 (0.0016) model time 0.3996 (0.4139) loss 7.0975 (7.1496) grad_norm 3.3363 (2.8867) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:03:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][620/625] eta 0:00:02 lr 0.000573 wd 0.0500 time 0.4347 (0.4149) data time 0.0006 (0.0016) model time 0.4341 (0.4137) loss 6.2231 (7.1527) grad_norm 1.8639 (2.8861) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:03:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 164 training takes 0:04:19 [2024-07-25 04:03:22 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 04:03:23 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 04:03:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.465 (0.465) Loss 0.5923 (0.5923) Acc@1 88.086 (88.086) Acc@5 98.584 (98.584) Mem 14939MB [2024-07-25 04:03:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.9570 (0.7252) Acc@1 79.199 (85.431) Acc@5 95.557 (97.456) Mem 14939MB [2024-07-25 04:03:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 1.0537 (0.8521) Acc@1 74.121 (81.992) Acc@5 94.580 (96.129) Mem 14939MB [2024-07-25 04:03:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.624 Acc@5 96.105 [2024-07-25 04:03:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.6% [2024-07-25 04:03:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.888 (0.888) Loss 0.5601 (0.5601) Acc@1 89.502 (89.502) Acc@5 98.730 (98.730) Mem 14939MB [2024-07-25 04:03:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.163) Loss 0.8926 (0.6982) Acc@1 80.762 (86.053) Acc@5 96.094 (97.634) Mem 14939MB [2024-07-25 04:03:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.126) Loss 1.0244 (0.8209) Acc@1 75.879 (82.682) Acc@5 94.971 (96.361) Mem 14939MB [2024-07-25 04:03:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.262 Acc@5 96.335 [2024-07-25 04:03:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-07-25 04:03:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.26% [2024-07-25 04:03:28 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 04:03:29 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 04:03:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][0/625] eta 0:09:25 lr 0.000573 wd 0.0500 time 0.9051 (0.9051) data time 0.5138 (0.5138) model time 0.0000 (0.0000) loss 7.1117 (7.1117) grad_norm 2.0020 (2.0020) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:03:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][10/625] eta 0:04:34 lr 0.000573 wd 0.0500 time 0.4008 (0.4466) data time 0.0008 (0.0475) model time 0.0000 (0.0000) loss 6.4852 (7.0315) grad_norm 2.9898 (2.6370) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:03:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][20/625] eta 0:04:16 lr 0.000572 wd 0.0500 time 0.4110 (0.4242) data time 0.0008 (0.0253) model time 0.0000 (0.0000) loss 7.4631 (7.1088) grad_norm 1.9942 (2.5233) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:03:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][30/625] eta 0:04:07 lr 0.000572 wd 0.0500 time 0.4052 (0.4162) data time 0.0008 (0.0174) model time 0.0000 (0.0000) loss 6.7161 (7.1614) grad_norm 3.0375 (2.5688) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:03:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][40/625] eta 0:04:01 lr 0.000572 wd 0.0500 time 0.4005 (0.4136) data time 0.0007 (0.0134) model time 0.0000 (0.0000) loss 7.2269 (7.0726) grad_norm 1.8971 (2.6294) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:03:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][50/625] eta 0:03:56 lr 0.000572 wd 0.0500 time 0.4018 (0.4110) data time 0.0006 (0.0110) 
model time 0.0000 (0.0000) loss 6.4446 (7.0396) grad_norm 1.5843 (2.6143) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:03:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][60/625] eta 0:03:51 lr 0.000572 wd 0.0500 time 0.4198 (0.4101) data time 0.0008 (0.0094) model time 0.4190 (0.4043) loss 7.9660 (7.0859) grad_norm 2.3318 (2.6331) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:03:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][70/625] eta 0:03:46 lr 0.000572 wd 0.0500 time 0.4013 (0.4088) data time 0.0009 (0.0082) model time 0.4004 (0.4023) loss 7.7136 (7.1100) grad_norm 3.5599 (2.5672) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:04:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][80/625] eta 0:03:42 lr 0.000572 wd 0.0500 time 0.4037 (0.4076) data time 0.0006 (0.0073) model time 0.4031 (0.4011) loss 7.1871 (7.0723) grad_norm 1.5805 (2.5567) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:04:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][90/625] eta 0:03:41 lr 0.000572 wd 0.0500 time 0.6000 (0.4149) data time 0.0006 (0.0066) model time 0.5993 (0.4190) loss 7.4865 (7.0807) grad_norm 1.7616 (2.5180) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:04:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][100/625] eta 0:03:41 lr 0.000572 wd 0.0500 time 0.6154 (0.4220) data time 0.0008 (0.0060) model time 0.6146 (0.4323) loss 6.6219 (7.1348) grad_norm 2.7426 (2.5008) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:04:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][110/625] eta 0:03:40 lr 0.000572 wd 0.0500 time 0.3982 (0.4273) data time 0.0006 (0.0056) model time 0.3976 (0.4403) loss 6.5447 (7.1358) grad_norm 1.8384 (2.5233) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:04:21 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][120/625] eta 0:03:36 lr 0.000571 wd 0.0500 time 0.5874 (0.4281) data time 0.0008 (0.0052) model time 0.5865 (0.4396) loss 8.7394 (7.1140) grad_norm 3.8619 (2.5783) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:04:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][130/625] eta 0:03:31 lr 0.000571 wd 0.0500 time 0.4081 (0.4265) data time 0.0008 (0.0049) model time 0.4073 (0.4355) loss 7.5612 (7.1022) grad_norm 2.2510 (2.5664) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:04:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][140/625] eta 0:03:25 lr 0.000571 wd 0.0500 time 0.3993 (0.4247) data time 0.0009 (0.0046) model time 0.3985 (0.4315) loss 8.0832 (7.1191) grad_norm 4.3044 (2.6033) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:04:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][150/625] eta 0:03:21 lr 0.000571 wd 0.0500 time 0.4047 (0.4240) data time 0.0006 (0.0043) model time 0.4041 (0.4297) loss 7.4839 (7.1196) grad_norm 3.6493 (2.6230) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:04:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][160/625] eta 0:03:16 lr 0.000571 wd 0.0500 time 0.3933 (0.4226) data time 0.0006 (0.0041) model time 0.3927 (0.4270) loss 6.1534 (7.1061) grad_norm 2.1984 (2.6155) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:04:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][170/625] eta 0:03:11 lr 0.000571 wd 0.0500 time 0.4009 (0.4214) data time 0.0008 (0.0039) model time 0.4001 (0.4250) loss 7.3751 (7.0956) grad_norm 2.7630 (2.6270) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:04:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][180/625] eta 0:03:07 lr 0.000571 wd 0.0500 time 0.4044 (0.4203) 
data time 0.0009 (0.0038) model time 0.4035 (0.4231) loss 7.4806 (7.0957) grad_norm 4.0055 (2.6768) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:04:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][190/625] eta 0:03:02 lr 0.000571 wd 0.0500 time 0.3994 (0.4194) data time 0.0008 (0.0036) model time 0.3986 (0.4215) loss 7.5579 (7.1128) grad_norm 2.8774 (2.6633) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:04:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][200/625] eta 0:02:57 lr 0.000571 wd 0.0500 time 0.3997 (0.4185) data time 0.0008 (0.0035) model time 0.3989 (0.4201) loss 6.8383 (7.1235) grad_norm 3.0827 (2.6837) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:04:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][210/625] eta 0:02:53 lr 0.000570 wd 0.0500 time 0.4079 (0.4176) data time 0.0008 (0.0034) model time 0.4071 (0.4188) loss 6.9314 (7.1241) grad_norm 3.9748 (2.6978) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:05:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][220/625] eta 0:02:48 lr 0.000570 wd 0.0500 time 0.4006 (0.4168) data time 0.0007 (0.0033) model time 0.4000 (0.4176) loss 6.4128 (7.1219) grad_norm 1.8126 (2.6887) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:05:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][230/625] eta 0:02:44 lr 0.000570 wd 0.0500 time 0.3992 (0.4161) data time 0.0009 (0.0032) model time 0.3983 (0.4166) loss 7.4494 (7.1230) grad_norm 3.4476 (2.6876) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:05:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][240/625] eta 0:02:39 lr 0.000570 wd 0.0500 time 0.4014 (0.4155) data time 0.0008 (0.0031) model time 0.4005 (0.4157) loss 7.9092 (7.1290) grad_norm 2.2741 (2.6925) loss_scale 1024.0000 (1024.0000) mem 14939MB 
[2024-07-25 04:05:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][250/625] eta 0:02:35 lr 0.000570 wd 0.0500 time 0.4000 (0.4148) data time 0.0008 (0.0031) model time 0.3992 (0.4148) loss 7.0184 (7.1362) grad_norm 2.9081 (2.7047) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:05:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][260/625] eta 0:02:31 lr 0.000570 wd 0.0500 time 0.4024 (0.4143) data time 0.0008 (0.0030) model time 0.4016 (0.4140) loss 8.0203 (7.1442) grad_norm 2.0309 (2.7168) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:05:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][270/625] eta 0:02:26 lr 0.000570 wd 0.0500 time 0.3988 (0.4138) data time 0.0007 (0.0030) model time 0.3981 (0.4133) loss 7.9938 (7.1585) grad_norm 2.4628 (2.7201) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:05:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][280/625] eta 0:02:22 lr 0.000570 wd 0.0500 time 0.3944 (0.4132) data time 0.0009 (0.0029) model time 0.3935 (0.4126) loss 8.0068 (7.1520) grad_norm 2.1017 (2.7621) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:05:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][290/625] eta 0:02:18 lr 0.000570 wd 0.0500 time 0.4021 (0.4128) data time 0.0008 (0.0028) model time 0.4013 (0.4121) loss 7.3383 (7.1592) grad_norm 2.3714 (2.7675) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:05:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][300/625] eta 0:02:14 lr 0.000570 wd 0.0500 time 0.5989 (0.4130) data time 0.0006 (0.0028) model time 0.5982 (0.4123) loss 6.6340 (7.1695) grad_norm 1.8975 (2.7535) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:05:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][310/625] eta 0:02:10 lr 0.000569 wd 0.0500 
time 0.5888 (0.4140) data time 0.0006 (0.0027) model time 0.5882 (0.4135) loss 7.0233 (7.1809) grad_norm 2.9720 (2.7570) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:05:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][320/625] eta 0:02:06 lr 0.000569 wd 0.0500 time 0.5853 (0.4163) data time 0.0009 (0.0027) model time 0.5844 (0.4162) loss 5.6744 (7.1729) grad_norm 2.1161 (2.7471) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:05:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][330/625] eta 0:02:03 lr 0.000569 wd 0.0500 time 0.4021 (0.4186) data time 0.0008 (0.0026) model time 0.4013 (0.4189) loss 6.5136 (7.1516) grad_norm 1.9352 (2.7499) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:05:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][340/625] eta 0:01:59 lr 0.000569 wd 0.0500 time 0.3955 (0.4187) data time 0.0009 (0.0026) model time 0.3946 (0.4190) loss 7.0419 (7.1571) grad_norm 1.8670 (2.7304) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:05:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][350/625] eta 0:01:54 lr 0.000569 wd 0.0500 time 0.4038 (0.4182) data time 0.0009 (0.0025) model time 0.4029 (0.4183) loss 7.7538 (7.1578) grad_norm 3.5179 (2.7261) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:06:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][360/625] eta 0:01:50 lr 0.000569 wd 0.0500 time 0.3964 (0.4177) data time 0.0007 (0.0025) model time 0.3957 (0.4177) loss 7.9848 (7.1606) grad_norm 2.8999 (2.7229) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:06:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][370/625] eta 0:01:46 lr 0.000569 wd 0.0500 time 0.3928 (0.4175) data time 0.0008 (0.0024) model time 0.3920 (0.4175) loss 8.2048 (7.1558) grad_norm 2.9058 (2.7316) loss_scale 1024.0000 
(1024.0000) mem 14939MB [2024-07-25 04:06:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][380/625] eta 0:01:42 lr 0.000569 wd 0.0500 time 0.3968 (0.4171) data time 0.0007 (0.0024) model time 0.3962 (0.4170) loss 6.8272 (7.1564) grad_norm 3.1372 (2.7343) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:06:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][390/625] eta 0:01:37 lr 0.000569 wd 0.0500 time 0.3960 (0.4167) data time 0.0007 (0.0024) model time 0.3953 (0.4165) loss 6.7627 (7.1650) grad_norm 4.0593 (2.7373) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:06:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][400/625] eta 0:01:33 lr 0.000568 wd 0.0500 time 0.3988 (0.4162) data time 0.0006 (0.0023) model time 0.3982 (0.4159) loss 6.6290 (7.1532) grad_norm 2.0557 (2.7392) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:06:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][410/625] eta 0:01:29 lr 0.000568 wd 0.0500 time 0.4004 (0.4159) data time 0.0006 (0.0023) model time 0.3997 (0.4155) loss 6.4132 (7.1470) grad_norm 2.5570 (2.7293) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:06:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][420/625] eta 0:01:25 lr 0.000568 wd 0.0500 time 0.3951 (0.4155) data time 0.0006 (0.0023) model time 0.3945 (0.4151) loss 7.9574 (7.1506) grad_norm 5.3712 (2.7309) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:06:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][430/625] eta 0:01:20 lr 0.000568 wd 0.0500 time 0.4019 (0.4152) data time 0.0007 (0.0022) model time 0.4013 (0.4147) loss 6.7584 (7.1534) grad_norm 3.7465 (2.7530) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:06:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][440/625] eta 0:01:16 
lr 0.000568 wd 0.0500 time 0.4022 (0.4148) data time 0.0008 (0.0022) model time 0.4015 (0.4143) loss 7.1960 (7.1558) grad_norm 2.9097 (2.7791) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:06:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][450/625] eta 0:01:12 lr 0.000568 wd 0.0500 time 0.3964 (0.4145) data time 0.0008 (0.0022) model time 0.3956 (0.4139) loss 8.0543 (7.1593) grad_norm 2.3924 (2.7806) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:06:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][460/625] eta 0:01:08 lr 0.000568 wd 0.0500 time 0.3979 (0.4142) data time 0.0007 (0.0021) model time 0.3972 (0.4136) loss 7.1914 (7.1605) grad_norm 2.4535 (2.7700) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:06:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][470/625] eta 0:01:04 lr 0.000568 wd 0.0500 time 0.4012 (0.4139) data time 0.0009 (0.0021) model time 0.4004 (0.4132) loss 6.1336 (7.1623) grad_norm 2.6077 (2.7656) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:06:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][480/625] eta 0:00:59 lr 0.000568 wd 0.0500 time 0.3968 (0.4136) data time 0.0008 (0.0021) model time 0.3960 (0.4129) loss 7.1438 (7.1734) grad_norm 2.4953 (2.7604) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:06:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][490/625] eta 0:00:55 lr 0.000567 wd 0.0500 time 0.4060 (0.4134) data time 0.0009 (0.0021) model time 0.4052 (0.4126) loss 6.2735 (7.1693) grad_norm 3.2571 (2.7539) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:06:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][500/625] eta 0:00:51 lr 0.000567 wd 0.0500 time 0.4016 (0.4131) data time 0.0009 (0.0021) model time 0.4007 (0.4123) loss 8.4814 (7.1742) grad_norm 2.3763 (2.7572) 
loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:07:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][510/625] eta 0:00:47 lr 0.000567 wd 0.0500 time 0.3947 (0.4129) data time 0.0007 (0.0020) model time 0.3940 (0.4120) loss 7.6322 (7.1678) grad_norm 3.3277 (2.7610) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:07:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][520/625] eta 0:00:43 lr 0.000567 wd 0.0500 time 0.4081 (0.4127) data time 0.0008 (0.0020) model time 0.4073 (0.4118) loss 7.2389 (7.1598) grad_norm 2.1421 (2.7627) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:07:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][530/625] eta 0:00:39 lr 0.000567 wd 0.0500 time 0.3985 (0.4143) data time 0.0009 (0.0020) model time 0.3976 (0.4136) loss 7.3042 (7.1597) grad_norm 4.6730 (2.7683) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:07:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][540/625] eta 0:00:35 lr 0.000567 wd 0.0500 time 0.5768 (0.4154) data time 0.0009 (0.0020) model time 0.5759 (0.4148) loss 7.6824 (7.1642) grad_norm 1.9680 (2.7759) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:07:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][550/625] eta 0:00:31 lr 0.000567 wd 0.0500 time 0.5684 (0.4167) data time 0.0009 (0.0020) model time 0.5675 (0.4163) loss 5.9809 (7.1586) grad_norm 3.9142 (2.7673) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:07:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][560/625] eta 0:00:27 lr 0.000567 wd 0.0500 time 0.4061 (0.4171) data time 0.0007 (0.0019) model time 0.4054 (0.4166) loss 7.5400 (7.1576) grad_norm 5.4928 (2.7805) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:07:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[165/300][570/625] eta 0:00:22 lr 0.000567 wd 0.0500 time 0.3971 (0.4167) data time 0.0008 (0.0019) model time 0.3962 (0.4163) loss 6.5231 (7.1539) grad_norm 3.5965 (2.7897) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:07:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][580/625] eta 0:00:18 lr 0.000567 wd 0.0500 time 0.3990 (0.4164) data time 0.0007 (0.0019) model time 0.3983 (0.4159) loss 5.8826 (7.1470) grad_norm 1.9940 (2.7989) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:07:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][590/625] eta 0:00:14 lr 0.000566 wd 0.0500 time 0.3991 (0.4164) data time 0.0007 (0.0019) model time 0.3984 (0.4159) loss 7.2712 (7.1494) grad_norm 2.4154 (2.8013) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:07:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][600/625] eta 0:00:10 lr 0.000566 wd 0.0500 time 0.3955 (0.4162) data time 0.0006 (0.0019) model time 0.3949 (0.4156) loss 8.7644 (7.1590) grad_norm 1.8037 (2.7925) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:07:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][610/625] eta 0:00:06 lr 0.000566 wd 0.0500 time 0.3952 (0.4159) data time 0.0005 (0.0019) model time 0.3947 (0.4153) loss 6.3787 (7.1544) grad_norm 2.7120 (2.7945) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:07:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][620/625] eta 0:00:02 lr 0.000566 wd 0.0500 time 0.4009 (0.4156) data time 0.0006 (0.0018) model time 0.4003 (0.4149) loss 7.6743 (7.1575) grad_norm 3.1078 (2.7968) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:07:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 165 training takes 0:04:19 [2024-07-25 04:07:49 vssd_mesa_retrain_small_e300] (utils.py 118): INFO 
./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 04:07:50 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 04:07:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.6016 (0.6016) Acc@1 89.014 (89.014) Acc@5 98.486 (98.486) Mem 14939MB [2024-07-25 04:07:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.9263 (0.7322) Acc@1 79.736 (85.338) Acc@5 95.654 (97.354) Mem 14939MB [2024-07-25 04:07:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0225 (0.8559) Acc@1 76.758 (82.075) Acc@5 94.727 (96.110) Mem 14939MB [2024-07-25 04:07:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.726 Acc@5 96.061 [2024-07-25 04:07:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.7% [2024-07-25 04:07:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.73% [2024-07-25 04:07:52 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 04:07:53 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 04:07:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.438 (0.438) Loss 0.5596 (0.5596) Acc@1 89.453 (89.453) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 04:07:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.118) Loss 0.8906 (0.6977) Acc@1 80.713 (86.080) Acc@5 96.094 (97.630) Mem 14939MB [2024-07-25 04:07:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.0215 (0.8200) Acc@1 75.830 (82.706) Acc@5 95.020 (96.377) Mem 14939MB [2024-07-25 04:07:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.290 Acc@5 96.343 [2024-07-25 04:07:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-07-25 04:07:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.29% [2024-07-25 04:07:56 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 04:07:57 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 04:07:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][0/625] eta 0:08:28 lr 0.000566 wd 0.0500 time 0.8142 (0.8142) data time 0.4362 (0.4362) model time 0.0000 (0.0000) loss 6.3654 (6.3654) grad_norm 3.1166 (3.1166) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:08:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][10/625] eta 0:04:27 lr 0.000566 wd 0.0500 time 0.3941 (0.4355) data time 0.0006 (0.0404) model time 0.0000 (0.0000) loss 8.7462 (7.4483) grad_norm 12.6447 (3.3995) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:08:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][20/625] eta 0:04:12 lr 0.000566 wd 0.0500 time 0.3981 (0.4178) data time 0.0006 (0.0216) model time 0.0000 (0.0000) loss 7.3539 (7.2011) grad_norm 2.4192 (3.0096) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:08:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][30/625] eta 0:04:05 lr 0.000566 wd 0.0500 time 0.4095 (0.4122) data time 0.0009 (0.0150) model time 0.0000 (0.0000) loss 6.1991 (7.1858) grad_norm 2.3094 (2.7401) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:08:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][40/625] eta 0:03:59 lr 0.000566 wd 0.0500 time 0.3949 (0.4095) data time 0.0008 (0.0116) model time 0.0000 (0.0000) loss 8.2412 (7.1408) grad_norm 2.5589 (2.7630) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:08:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][50/625] eta 0:03:54 lr 0.000566 wd 0.0500 time 0.3981 (0.4076) data time 0.0008 (0.0095) model time 0.0000 (0.0000) loss 6.8716 (7.1769) grad_norm 2.3436 (3.0503) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:08:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][60/625] eta 0:03:49 lr 0.000565 wd 0.0500 time 
0.3994 (0.4063) data time 0.0009 (0.0081) model time 0.3985 (0.3986) loss 6.0915 (7.1567) grad_norm 3.8731 (3.1644) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:08:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][70/625] eta 0:03:45 lr 0.000565 wd 0.0500 time 0.4009 (0.4055) data time 0.0006 (0.0071) model time 0.4003 (0.3993) loss 8.3378 (7.1995) grad_norm 4.1140 (3.1627) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:08:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][80/625] eta 0:03:40 lr 0.000565 wd 0.0500 time 0.3969 (0.4050) data time 0.0008 (0.0063) model time 0.3961 (0.3997) loss 7.6002 (7.1538) grad_norm 2.9603 (3.2281) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:08:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][90/625] eta 0:03:37 lr 0.000565 wd 0.0500 time 0.4005 (0.4069) data time 0.0008 (0.0057) model time 0.3997 (0.4050) loss 6.7834 (7.1232) grad_norm 2.4974 (3.2683) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:08:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][100/625] eta 0:03:33 lr 0.000565 wd 0.0500 time 0.3976 (0.4061) data time 0.0007 (0.0052) model time 0.3969 (0.4038) loss 6.5470 (7.1287) grad_norm 4.2283 (3.2238) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:08:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][110/625] eta 0:03:28 lr 0.000565 wd 0.0500 time 0.3939 (0.4056) data time 0.0008 (0.0048) model time 0.3932 (0.4030) loss 5.9345 (7.1379) grad_norm 2.6413 (3.1499) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:08:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][120/625] eta 0:03:25 lr 0.000565 wd 0.0500 time 0.5458 (0.4078) data time 0.0008 (0.0045) model time 0.5450 (0.4071) loss 5.5994 (7.1458) grad_norm 1.7289 (3.0983) loss_scale 1024.0000 (1024.0000) 
mem 14939MB [2024-07-25 04:08:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][130/625] eta 0:03:23 lr 0.000565 wd 0.0500 time 0.6083 (0.4114) data time 0.0007 (0.0042) model time 0.6077 (0.4129) loss 8.2405 (7.1384) grad_norm 2.1600 (3.0456) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:08:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][140/625] eta 0:03:23 lr 0.000565 wd 0.0500 time 0.5598 (0.4188) data time 0.0007 (0.0040) model time 0.5591 (0.4243) loss 7.0464 (7.1324) grad_norm 2.9145 (3.0618) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:09:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][150/625] eta 0:03:20 lr 0.000564 wd 0.0500 time 0.5741 (0.4212) data time 0.0006 (0.0038) model time 0.5736 (0.4273) loss 7.6165 (7.1439) grad_norm 2.1980 (3.0096) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:09:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][160/625] eta 0:03:15 lr 0.000564 wd 0.0500 time 0.3998 (0.4211) data time 0.0007 (0.0036) model time 0.3991 (0.4265) loss 8.9188 (7.1433) grad_norm 1.7183 (2.9745) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:09:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][170/625] eta 0:03:11 lr 0.000564 wd 0.0500 time 0.3973 (0.4202) data time 0.0007 (0.0035) model time 0.3966 (0.4246) loss 7.7570 (7.1139) grad_norm 1.8831 (2.9710) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:09:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][180/625] eta 0:03:06 lr 0.000564 wd 0.0500 time 0.3979 (0.4190) data time 0.0009 (0.0034) model time 0.3970 (0.4225) loss 5.9825 (7.1010) grad_norm 2.3580 (2.9761) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:09:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][190/625] eta 0:03:01 lr 0.000564 
wd 0.0500 time 0.3973 (0.4180) data time 0.0007 (0.0032) model time 0.3965 (0.4208) loss 6.6027 (7.1078) grad_norm 3.4188 (2.9675) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:09:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][200/625] eta 0:02:57 lr 0.000564 wd 0.0500 time 0.3965 (0.4170) data time 0.0009 (0.0031) model time 0.3955 (0.4193) loss 7.3343 (7.1167) grad_norm 3.1078 (2.9425) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:09:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][210/625] eta 0:02:52 lr 0.000564 wd 0.0500 time 0.3992 (0.4161) data time 0.0006 (0.0030) model time 0.3986 (0.4179) loss 6.5337 (7.1089) grad_norm 2.2077 (2.9338) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:09:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][220/625] eta 0:02:48 lr 0.000564 wd 0.0500 time 0.4008 (0.4154) data time 0.0008 (0.0029) model time 0.4000 (0.4168) loss 7.6646 (7.1108) grad_norm 2.4864 (2.9021) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:09:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][230/625] eta 0:02:43 lr 0.000564 wd 0.0500 time 0.3959 (0.4147) data time 0.0007 (0.0028) model time 0.3952 (0.4158) loss 5.8107 (7.1131) grad_norm 3.6756 (2.9235) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:09:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][240/625] eta 0:02:39 lr 0.000563 wd 0.0500 time 0.3954 (0.4141) data time 0.0008 (0.0028) model time 0.3946 (0.4149) loss 7.0579 (7.1305) grad_norm 2.5518 (2.9167) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:09:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][250/625] eta 0:02:35 lr 0.000563 wd 0.0500 time 0.4002 (0.4135) data time 0.0008 (0.0027) model time 0.3994 (0.4141) loss 5.7520 (7.1263) grad_norm 1.8940 (2.9094) loss_scale 
1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:09:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][260/625] eta 0:02:30 lr 0.000563 wd 0.0500 time 0.3953 (0.4130) data time 0.0008 (0.0026) model time 0.3945 (0.4134) loss 6.3492 (7.1342) grad_norm 1.9649 (2.8796) loss_scale 2048.0000 (1047.5402) mem 14939MB [2024-07-25 04:09:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][270/625] eta 0:02:26 lr 0.000563 wd 0.0500 time 0.3986 (0.4126) data time 0.0009 (0.0025) model time 0.3978 (0.4128) loss 7.4849 (7.1319) grad_norm 2.8694 (2.8822) loss_scale 2048.0000 (1084.4576) mem 14939MB [2024-07-25 04:09:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][280/625] eta 0:02:22 lr 0.000563 wd 0.0500 time 0.4074 (0.4122) data time 0.0008 (0.0025) model time 0.4066 (0.4122) loss 7.9076 (7.1470) grad_norm 2.0771 (2.8845) loss_scale 2048.0000 (1118.7473) mem 14939MB [2024-07-25 04:09:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][290/625] eta 0:02:17 lr 0.000563 wd 0.0500 time 0.3995 (0.4118) data time 0.0008 (0.0024) model time 0.3988 (0.4117) loss 7.1265 (7.1413) grad_norm 3.7593 (2.8753) loss_scale 2048.0000 (1150.6804) mem 14939MB [2024-07-25 04:10:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][300/625] eta 0:02:13 lr 0.000563 wd 0.0500 time 0.4008 (0.4114) data time 0.0006 (0.0024) model time 0.4002 (0.4112) loss 6.9399 (7.1342) grad_norm 1.9466 (2.8740) loss_scale 2048.0000 (1180.4917) mem 14939MB [2024-07-25 04:10:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][310/625] eta 0:02:09 lr 0.000563 wd 0.0500 time 0.4039 (0.4117) data time 0.0009 (0.0023) model time 0.4029 (0.4116) loss 7.2484 (7.1287) grad_norm 4.5698 (2.8667) loss_scale 2048.0000 (1208.3859) mem 14939MB [2024-07-25 04:10:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][320/625] 
eta 0:02:05 lr 0.000563 wd 0.0500 time 0.3966 (0.4112) data time 0.0008 (0.0023) model time 0.3958 (0.4110) loss 6.9810 (7.1313) grad_norm 1.6624 (2.8552) loss_scale 2048.0000 (1234.5421) mem 14939MB [2024-07-25 04:10:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][330/625] eta 0:02:01 lr 0.000563 wd 0.0500 time 0.3999 (0.4109) data time 0.0008 (0.0023) model time 0.3991 (0.4106) loss 7.6533 (7.1448) grad_norm 2.7471 (2.8520) loss_scale 2048.0000 (1259.1178) mem 14939MB [2024-07-25 04:10:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][340/625] eta 0:01:57 lr 0.000562 wd 0.0500 time 0.4039 (0.4111) data time 0.0006 (0.0022) model time 0.4033 (0.4108) loss 6.3328 (7.1399) grad_norm 1.7968 (2.8438) loss_scale 2048.0000 (1282.2522) mem 14939MB [2024-07-25 04:10:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][350/625] eta 0:01:53 lr 0.000562 wd 0.0500 time 0.5950 (0.4125) data time 0.0006 (0.0022) model time 0.5944 (0.4124) loss 6.8379 (7.1399) grad_norm 1.9009 (2.8277) loss_scale 2048.0000 (1304.0684) mem 14939MB [2024-07-25 04:10:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][360/625] eta 0:01:50 lr 0.000562 wd 0.0500 time 0.5946 (0.4151) data time 0.0009 (0.0021) model time 0.5938 (0.4154) loss 7.9916 (7.1460) grad_norm 1.9603 (2.8109) loss_scale 2048.0000 (1324.6759) mem 14939MB [2024-07-25 04:10:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][370/625] eta 0:01:46 lr 0.000562 wd 0.0500 time 0.5696 (0.4170) data time 0.0009 (0.0021) model time 0.5687 (0.4176) loss 7.4414 (7.1448) grad_norm 2.6546 (2.7967) loss_scale 2048.0000 (1344.1725) mem 14939MB [2024-07-25 04:10:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][380/625] eta 0:01:42 lr 0.000562 wd 0.0500 time 0.3997 (0.4166) data time 0.0007 (0.0021) model time 0.3991 (0.4170) loss 8.3379 (7.1440) grad_norm 3.5612 
(2.7951) loss_scale 2048.0000 (1362.6457) mem 14939MB [2024-07-25 04:10:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][390/625] eta 0:01:37 lr 0.000562 wd 0.0500 time 0.3969 (0.4161) data time 0.0006 (0.0021) model time 0.3963 (0.4164) loss 7.5171 (7.1523) grad_norm 2.7903 (2.7811) loss_scale 2048.0000 (1380.1739) mem 14939MB [2024-07-25 04:10:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][400/625] eta 0:01:33 lr 0.000562 wd 0.0500 time 0.3937 (0.4156) data time 0.0006 (0.0020) model time 0.3931 (0.4159) loss 7.2871 (7.1572) grad_norm 2.0923 (2.7931) loss_scale 2048.0000 (1396.8279) mem 14939MB [2024-07-25 04:10:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][410/625] eta 0:01:29 lr 0.000562 wd 0.0500 time 0.3958 (0.4152) data time 0.0008 (0.0020) model time 0.3951 (0.4154) loss 7.5412 (7.1578) grad_norm 2.5930 (2.7884) loss_scale 2048.0000 (1412.6715) mem 14939MB [2024-07-25 04:10:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][420/625] eta 0:01:25 lr 0.000562 wd 0.0500 time 0.3996 (0.4148) data time 0.0008 (0.0020) model time 0.3988 (0.4149) loss 8.2729 (7.1656) grad_norm 2.7402 (2.7814) loss_scale 2048.0000 (1427.7625) mem 14939MB [2024-07-25 04:10:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][430/625] eta 0:01:20 lr 0.000561 wd 0.0500 time 0.3992 (0.4145) data time 0.0007 (0.0020) model time 0.3985 (0.4144) loss 7.0048 (7.1514) grad_norm 8.1645 (2.7802) loss_scale 2048.0000 (1442.1531) mem 14939MB [2024-07-25 04:10:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][440/625] eta 0:01:16 lr 0.000561 wd 0.0500 time 0.3965 (0.4141) data time 0.0006 (0.0019) model time 0.3958 (0.4140) loss 6.5081 (7.1598) grad_norm 1.9634 (2.7754) loss_scale 2048.0000 (1455.8912) mem 14939MB [2024-07-25 04:11:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[166/300][450/625] eta 0:01:12 lr 0.000561 wd 0.0500 time 0.3994 (0.4138) data time 0.0006 (0.0019) model time 0.3988 (0.4136) loss 6.1909 (7.1596) grad_norm 3.7016 (2.7795) loss_scale 2048.0000 (1469.0200) mem 14939MB [2024-07-25 04:11:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][460/625] eta 0:01:08 lr 0.000561 wd 0.0500 time 0.4253 (0.4135) data time 0.0006 (0.0019) model time 0.4247 (0.4133) loss 6.3659 (7.1576) grad_norm 2.1026 (2.7748) loss_scale 2048.0000 (1481.5792) mem 14939MB [2024-07-25 04:11:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][470/625] eta 0:01:04 lr 0.000561 wd 0.0500 time 0.3982 (0.4132) data time 0.0009 (0.0019) model time 0.3973 (0.4129) loss 8.0824 (7.1465) grad_norm 3.5756 (2.8112) loss_scale 2048.0000 (1493.6051) mem 14939MB [2024-07-25 04:11:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][480/625] eta 0:00:59 lr 0.000561 wd 0.0500 time 0.3974 (0.4129) data time 0.0006 (0.0018) model time 0.3968 (0.4126) loss 6.1184 (7.1455) grad_norm 3.4790 (2.8273) loss_scale 2048.0000 (1505.1310) mem 14939MB [2024-07-25 04:11:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][490/625] eta 0:00:55 lr 0.000561 wd 0.0500 time 0.4064 (0.4127) data time 0.0006 (0.0018) model time 0.4057 (0.4123) loss 5.7569 (7.1476) grad_norm 1.9688 (2.8237) loss_scale 2048.0000 (1516.1874) mem 14939MB [2024-07-25 04:11:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][500/625] eta 0:00:51 lr 0.000561 wd 0.0500 time 0.3965 (0.4124) data time 0.0008 (0.0018) model time 0.3957 (0.4120) loss 6.3864 (7.1333) grad_norm 2.8852 (2.8200) loss_scale 2048.0000 (1526.8024) mem 14939MB [2024-07-25 04:11:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][510/625] eta 0:00:47 lr 0.000561 wd 0.0500 time 0.3986 (0.4122) data time 0.0008 (0.0018) model time 0.3978 (0.4117) loss 7.8265 
(7.1376) grad_norm 2.7695 (2.8187) loss_scale 2048.0000 (1537.0020) mem 14939MB [2024-07-25 04:11:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][520/625] eta 0:00:43 lr 0.000561 wd 0.0500 time 0.4052 (0.4120) data time 0.0008 (0.0018) model time 0.4044 (0.4114) loss 7.1113 (7.1364) grad_norm 3.0376 (2.8150) loss_scale 2048.0000 (1546.8100) mem 14939MB [2024-07-25 04:11:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][530/625] eta 0:00:39 lr 0.000560 wd 0.0500 time 0.3980 (0.4121) data time 0.0008 (0.0018) model time 0.3972 (0.4116) loss 7.2103 (7.1468) grad_norm 1.6156 (2.8014) loss_scale 2048.0000 (1556.2486) mem 14939MB [2024-07-25 04:11:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][540/625] eta 0:00:35 lr 0.000560 wd 0.0500 time 0.3973 (0.4119) data time 0.0007 (0.0018) model time 0.3966 (0.4114) loss 7.2695 (7.1500) grad_norm 3.1217 (2.7932) loss_scale 2048.0000 (1565.3383) mem 14939MB [2024-07-25 04:11:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][550/625] eta 0:00:30 lr 0.000560 wd 0.0500 time 0.3972 (0.4116) data time 0.0007 (0.0018) model time 0.3965 (0.4111) loss 7.6492 (7.1468) grad_norm 2.4833 (2.7965) loss_scale 2048.0000 (1574.0980) mem 14939MB [2024-07-25 04:11:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][560/625] eta 0:00:26 lr 0.000560 wd 0.0500 time 0.3984 (0.4117) data time 0.0006 (0.0017) model time 0.3978 (0.4111) loss 5.8649 (7.1439) grad_norm 2.0714 (2.7844) loss_scale 2048.0000 (1582.5455) mem 14939MB [2024-07-25 04:11:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][570/625] eta 0:00:22 lr 0.000560 wd 0.0500 time 0.5768 (0.4128) data time 0.0006 (0.0017) model time 0.5762 (0.4123) loss 5.6681 (7.1359) grad_norm 3.0585 (2.7932) loss_scale 2048.0000 (1590.6970) mem 14939MB [2024-07-25 04:11:58 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [166/300][580/625] eta 0:00:18 lr 0.000560 wd 0.0500 time 0.6085 (0.4150) data time 0.0007 (0.0017) model time 0.6078 (0.4147) loss 6.1489 (7.1390) grad_norm 2.6792 (2.7951) loss_scale 2048.0000 (1598.5680) mem 14939MB [2024-07-25 04:12:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][590/625] eta 0:00:14 lr 0.000560 wd 0.0500 time 0.3979 (0.4160) data time 0.0006 (0.0017) model time 0.3973 (0.4158) loss 6.2706 (7.1325) grad_norm 2.3229 (2.7918) loss_scale 2048.0000 (1606.1726) mem 14939MB [2024-07-25 04:12:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][600/625] eta 0:00:10 lr 0.000560 wd 0.0500 time 0.3981 (0.4160) data time 0.0008 (0.0017) model time 0.3973 (0.4158) loss 7.8749 (7.1347) grad_norm 1.6835 (2.7809) loss_scale 2048.0000 (1613.5241) mem 14939MB [2024-07-25 04:12:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][610/625] eta 0:00:06 lr 0.000560 wd 0.0500 time 0.3917 (0.4157) data time 0.0008 (0.0017) model time 0.3909 (0.4155) loss 7.4182 (7.1376) grad_norm 1.5921 (inf) loss_scale 1024.0000 (1612.2553) mem 14939MB [2024-07-25 04:12:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][620/625] eta 0:00:02 lr 0.000559 wd 0.0500 time 0.4070 (0.4155) data time 0.0005 (0.0017) model time 0.4065 (0.4152) loss 6.5722 (7.1369) grad_norm 2.3112 (inf) loss_scale 1024.0000 (1602.7826) mem 14939MB [2024-07-25 04:12:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 166 training takes 0:04:19 [2024-07-25 04:12:16 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 04:12:17 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 04:12:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.446 (0.446) Loss 0.5962 (0.5962) Acc@1 88.818 (88.818) Acc@5 98.486 (98.486) Mem 14939MB [2024-07-25 04:12:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.118) Loss 0.9268 (0.7213) Acc@1 79.883 (85.587) Acc@5 95.654 (97.488) Mem 14939MB [2024-07-25 04:12:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.0469 (0.8458) Acc@1 75.879 (82.196) Acc@5 94.189 (96.166) Mem 14939MB [2024-07-25 04:12:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.786 Acc@5 96.107 [2024-07-25 04:12:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.8% [2024-07-25 04:12:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.79% [2024-07-25 04:12:20 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 04:12:20 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 04:12:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.453 (0.453) Loss 0.5596 (0.5596) Acc@1 89.453 (89.453) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 04:12:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.8892 (0.6973) Acc@1 80.908 (86.088) Acc@5 96.143 (97.603) Mem 14939MB [2024-07-25 04:12:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.0215 (0.8192) Acc@1 75.781 (82.703) Acc@5 95.068 (96.366) Mem 14939MB [2024-07-25 04:12:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.286 Acc@5 96.333 [2024-07-25 04:12:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-07-25 04:12:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][0/625] eta 0:13:06 lr 0.000559 wd 0.0500 time 1.2581 (1.2581) data time 0.6458 (0.6458) model time 0.0000 (0.0000) loss 6.5868 (6.5868) grad_norm 5.8687 (5.8687) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:12:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][10/625] eta 0:04:54 lr 0.000559 wd 0.0500 time 0.4005 (0.4785) data time 0.0007 (0.0596) model time 0.0000 (0.0000) loss 6.7517 (6.9460) grad_norm 3.1889 (2.7821) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:12:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][20/625] eta 0:04:26 lr 0.000559 wd 0.0500 time 0.3998 (0.4411) data time 0.0006 (0.0316) model time 0.0000 (0.0000) loss 7.5894 (7.0953) grad_norm 2.1184 (2.5354) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:12:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][30/625] eta 0:04:14 lr 0.000559 wd 0.0500 time 0.3998 (0.4280) data time 0.0009 (0.0217) model time 0.0000 (0.0000) loss 7.4816 (7.0263) grad_norm 
5.2115 (2.8941) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:12:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][40/625] eta 0:04:06 lr 0.000559 wd 0.0500 time 0.3999 (0.4214) data time 0.0006 (0.0166) model time 0.0000 (0.0000) loss 7.6708 (7.0632) grad_norm 3.5151 (3.0491) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:12:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][50/625] eta 0:04:00 lr 0.000559 wd 0.0500 time 0.4002 (0.4179) data time 0.0008 (0.0136) model time 0.0000 (0.0000) loss 6.7514 (7.0362) grad_norm 2.1366 (2.8951) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:12:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][60/625] eta 0:03:56 lr 0.000559 wd 0.0500 time 0.3933 (0.4185) data time 0.0008 (0.0115) model time 0.3925 (0.4209) loss 6.6494 (7.0656) grad_norm 2.8387 (2.8853) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:12:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][70/625] eta 0:03:50 lr 0.000559 wd 0.0500 time 0.3977 (0.4160) data time 0.0006 (0.0100) model time 0.3971 (0.4104) loss 7.4212 (7.0230) grad_norm 1.8846 (2.9292) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:12:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][80/625] eta 0:03:45 lr 0.000559 wd 0.0500 time 0.4074 (0.4139) data time 0.0008 (0.0089) model time 0.4066 (0.4062) loss 7.0913 (7.0877) grad_norm 2.2240 (2.9255) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:13:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][90/625] eta 0:03:40 lr 0.000558 wd 0.0500 time 0.3991 (0.4122) data time 0.0008 (0.0080) model time 0.3982 (0.4040) loss 7.5671 (7.0497) grad_norm 3.1730 (2.9169) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:13:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[167/300][100/625] eta 0:03:35 lr 0.000558 wd 0.0500 time 0.3980 (0.4110) data time 0.0009 (0.0073) model time 0.3970 (0.4030) loss 8.2437 (7.0899) grad_norm 2.9192 (2.9008) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 04:13:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][110/625] eta 0:03:31 lr 0.000558 wd 0.0500 time 0.4040 (0.4098) data time 0.0009 (0.0067) model time 0.4031 (0.4020) loss 7.4337 (7.0992) grad_norm 1.6372 (inf) loss_scale 512.0000 (977.8739) mem 14939MB [2024-07-25 04:13:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][120/625] eta 0:03:26 lr 0.000558 wd 0.0500 time 0.3945 (0.4090) data time 0.0010 (0.0063) model time 0.3935 (0.4015) loss 7.2955 (7.0895) grad_norm 6.5095 (inf) loss_scale 512.0000 (939.3719) mem 14939MB [2024-07-25 04:13:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][130/625] eta 0:03:22 lr 0.000558 wd 0.0500 time 0.3975 (0.4083) data time 0.0006 (0.0059) model time 0.3968 (0.4013) loss 6.3395 (7.1066) grad_norm 2.9389 (inf) loss_scale 512.0000 (906.7481) mem 14939MB [2024-07-25 04:13:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][140/625] eta 0:03:17 lr 0.000558 wd 0.0500 time 0.3966 (0.4077) data time 0.0009 (0.0055) model time 0.3957 (0.4010) loss 7.1536 (7.0802) grad_norm 2.4889 (inf) loss_scale 512.0000 (878.7518) mem 14939MB [2024-07-25 04:13:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][150/625] eta 0:03:13 lr 0.000558 wd 0.0500 time 0.3953 (0.4071) data time 0.0008 (0.0052) model time 0.3944 (0.4008) loss 6.7132 (7.0761) grad_norm 2.9486 (inf) loss_scale 512.0000 (854.4636) mem 14939MB [2024-07-25 04:13:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][160/625] eta 0:03:10 lr 0.000558 wd 0.0500 time 0.5757 (0.4098) data time 0.0006 (0.0049) model time 0.5751 (0.4051) loss 8.2080 (7.0622) grad_norm 2.1076 (inf) 
loss_scale 512.0000 (833.1925) mem 14939MB [2024-07-25 04:13:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][170/625] eta 0:03:08 lr 0.000558 wd 0.0500 time 0.3955 (0.4143) data time 0.0006 (0.0047) model time 0.3948 (0.4119) loss 6.8651 (7.0793) grad_norm 2.9334 (inf) loss_scale 512.0000 (814.4094) mem 14939MB [2024-07-25 04:13:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][180/625] eta 0:03:06 lr 0.000557 wd 0.0500 time 0.5805 (0.4199) data time 0.0008 (0.0045) model time 0.5796 (0.4199) loss 6.6727 (7.0877) grad_norm 2.5169 (inf) loss_scale 512.0000 (797.7017) mem 14939MB [2024-07-25 04:13:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][190/625] eta 0:03:03 lr 0.000557 wd 0.0500 time 0.4038 (0.4218) data time 0.0009 (0.0043) model time 0.4029 (0.4223) loss 5.4119 (7.0823) grad_norm 2.0727 (inf) loss_scale 512.0000 (782.7435) mem 14939MB [2024-07-25 04:13:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][200/625] eta 0:02:58 lr 0.000557 wd 0.0500 time 0.4018 (0.4208) data time 0.0007 (0.0041) model time 0.4012 (0.4209) loss 6.8303 (7.0830) grad_norm 3.7513 (inf) loss_scale 512.0000 (769.2736) mem 14939MB [2024-07-25 04:13:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][210/625] eta 0:02:54 lr 0.000557 wd 0.0500 time 0.4001 (0.4200) data time 0.0008 (0.0040) model time 0.3993 (0.4198) loss 7.2070 (7.0940) grad_norm 2.9295 (inf) loss_scale 512.0000 (757.0806) mem 14939MB [2024-07-25 04:13:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][220/625] eta 0:02:49 lr 0.000557 wd 0.0500 time 0.4164 (0.4193) data time 0.0008 (0.0039) model time 0.4156 (0.4188) loss 7.2945 (7.0993) grad_norm 4.6243 (inf) loss_scale 512.0000 (745.9910) mem 14939MB [2024-07-25 04:14:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][230/625] eta 0:02:45 lr 0.000557 
wd 0.0500 time 0.3953 (0.4185) data time 0.0008 (0.0037) model time 0.3945 (0.4177) loss 7.7286 (7.1132) grad_norm 2.6144 (inf) loss_scale 512.0000 (735.8615) mem 14939MB [2024-07-25 04:14:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][240/625] eta 0:02:40 lr 0.000557 wd 0.0500 time 0.3964 (0.4180) data time 0.0007 (0.0036) model time 0.3957 (0.4170) loss 8.0382 (7.1309) grad_norm 2.6958 (inf) loss_scale 512.0000 (726.5726) mem 14939MB [2024-07-25 04:14:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][250/625] eta 0:02:36 lr 0.000557 wd 0.0500 time 0.4107 (0.4172) data time 0.0008 (0.0035) model time 0.4099 (0.4161) loss 8.0808 (7.1432) grad_norm 2.3973 (inf) loss_scale 512.0000 (718.0239) mem 14939MB [2024-07-25 04:14:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][260/625] eta 0:02:32 lr 0.000557 wd 0.0500 time 0.3948 (0.4165) data time 0.0008 (0.0034) model time 0.3940 (0.4152) loss 8.1167 (7.1491) grad_norm 2.3748 (inf) loss_scale 512.0000 (710.1303) mem 14939MB [2024-07-25 04:14:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][270/625] eta 0:02:27 lr 0.000557 wd 0.0500 time 0.3981 (0.4158) data time 0.0007 (0.0033) model time 0.3974 (0.4144) loss 7.2642 (7.1545) grad_norm 2.2705 (inf) loss_scale 512.0000 (702.8192) mem 14939MB [2024-07-25 04:14:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][280/625] eta 0:02:23 lr 0.000556 wd 0.0500 time 0.3940 (0.4161) data time 0.0006 (0.0033) model time 0.3934 (0.4147) loss 6.7811 (7.1657) grad_norm 2.5634 (inf) loss_scale 512.0000 (696.0285) mem 14939MB [2024-07-25 04:14:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][290/625] eta 0:02:19 lr 0.000556 wd 0.0500 time 0.3973 (0.4155) data time 0.0008 (0.0032) model time 0.3965 (0.4140) loss 7.1751 (7.1753) grad_norm 4.3469 (inf) loss_scale 512.0000 (689.7045) mem 14939MB 
[2024-07-25 04:14:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][300/625] eta 0:02:14 lr 0.000556 wd 0.0500 time 0.4013 (0.4150) data time 0.0007 (0.0031) model time 0.4007 (0.4135) loss 7.9824 (7.1924) grad_norm 1.9565 (inf) loss_scale 512.0000 (683.8007) mem 14939MB [2024-07-25 04:14:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][310/625] eta 0:02:10 lr 0.000556 wd 0.0500 time 0.3976 (0.4146) data time 0.0006 (0.0030) model time 0.3969 (0.4130) loss 7.5371 (7.1945) grad_norm 3.5126 (inf) loss_scale 512.0000 (678.2765) mem 14939MB [2024-07-25 04:14:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][320/625] eta 0:02:06 lr 0.000556 wd 0.0500 time 0.4001 (0.4144) data time 0.0007 (0.0030) model time 0.3994 (0.4128) loss 5.9149 (7.1760) grad_norm 2.4644 (inf) loss_scale 512.0000 (673.0966) mem 14939MB [2024-07-25 04:14:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][330/625] eta 0:02:02 lr 0.000556 wd 0.0500 time 0.4044 (0.4141) data time 0.0007 (0.0029) model time 0.4038 (0.4124) loss 8.6754 (7.1888) grad_norm 3.6556 (inf) loss_scale 512.0000 (668.2296) mem 14939MB [2024-07-25 04:14:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][340/625] eta 0:01:57 lr 0.000556 wd 0.0500 time 0.3969 (0.4137) data time 0.0011 (0.0028) model time 0.3958 (0.4120) loss 6.3229 (7.1831) grad_norm 2.5160 (inf) loss_scale 512.0000 (663.6481) mem 14939MB [2024-07-25 04:14:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][350/625] eta 0:01:53 lr 0.000556 wd 0.0500 time 0.3984 (0.4133) data time 0.0007 (0.0028) model time 0.3977 (0.4116) loss 6.2283 (7.1796) grad_norm 2.4522 (inf) loss_scale 512.0000 (659.3276) mem 14939MB [2024-07-25 04:14:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][360/625] eta 0:01:49 lr 0.000556 wd 0.0500 time 0.4025 (0.4130) data time 
0.0008 (0.0027) model time 0.4017 (0.4112) loss 6.9977 (7.1693) grad_norm 2.2583 (inf) loss_scale 512.0000 (655.2465) mem 14939MB [2024-07-25 04:14:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][370/625] eta 0:01:45 lr 0.000555 wd 0.0500 time 0.3985 (0.4127) data time 0.0008 (0.0027) model time 0.3978 (0.4109) loss 5.8179 (7.1740) grad_norm 2.2666 (inf) loss_scale 512.0000 (651.3854) mem 14939MB [2024-07-25 04:15:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][380/625] eta 0:01:41 lr 0.000555 wd 0.0500 time 0.3964 (0.4133) data time 0.0006 (0.0026) model time 0.3958 (0.4116) loss 7.6239 (7.1820) grad_norm 1.5867 (inf) loss_scale 512.0000 (647.7270) mem 14939MB [2024-07-25 04:15:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][390/625] eta 0:01:37 lr 0.000555 wd 0.0500 time 0.5914 (0.4157) data time 0.0009 (0.0026) model time 0.5905 (0.4144) loss 5.7477 (7.1775) grad_norm 1.9073 (inf) loss_scale 512.0000 (644.2558) mem 14939MB [2024-07-25 04:15:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][400/625] eta 0:01:33 lr 0.000555 wd 0.0500 time 0.3942 (0.4175) data time 0.0006 (0.0026) model time 0.3936 (0.4165) loss 6.8117 (7.1751) grad_norm 2.1380 (inf) loss_scale 512.0000 (640.9576) mem 14939MB [2024-07-25 04:15:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][410/625] eta 0:01:29 lr 0.000555 wd 0.0500 time 0.4019 (0.4181) data time 0.0008 (0.0025) model time 0.4010 (0.4171) loss 6.4411 (7.1752) grad_norm 1.9466 (inf) loss_scale 512.0000 (637.8200) mem 14939MB [2024-07-25 04:15:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][420/625] eta 0:01:25 lr 0.000555 wd 0.0500 time 0.3979 (0.4176) data time 0.0009 (0.0025) model time 0.3970 (0.4166) loss 7.2475 (7.1742) grad_norm 2.5524 (inf) loss_scale 512.0000 (634.8314) mem 14939MB [2024-07-25 04:15:23 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][430/625] eta 0:01:21 lr 0.000555 wd 0.0500 time 0.3978 (0.4172) data time 0.0007 (0.0024) model time 0.3971 (0.4161) loss 5.9784 (7.1709) grad_norm 3.2259 (inf) loss_scale 512.0000 (631.9814) mem 14939MB [2024-07-25 04:15:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][440/625] eta 0:01:17 lr 0.000555 wd 0.0500 time 0.3981 (0.4168) data time 0.0006 (0.0024) model time 0.3974 (0.4157) loss 7.2136 (7.1682) grad_norm 1.7012 (inf) loss_scale 512.0000 (629.2608) mem 14939MB [2024-07-25 04:15:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][450/625] eta 0:01:12 lr 0.000555 wd 0.0500 time 0.3993 (0.4164) data time 0.0008 (0.0024) model time 0.3985 (0.4153) loss 7.9706 (7.1639) grad_norm 4.4907 (inf) loss_scale 512.0000 (626.6608) mem 14939MB [2024-07-25 04:15:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][460/625] eta 0:01:08 lr 0.000555 wd 0.0500 time 0.3967 (0.4160) data time 0.0007 (0.0023) model time 0.3960 (0.4148) loss 6.0375 (7.1518) grad_norm 3.9935 (inf) loss_scale 512.0000 (624.1735) mem 14939MB [2024-07-25 04:15:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][470/625] eta 0:01:04 lr 0.000554 wd 0.0500 time 0.4002 (0.4157) data time 0.0009 (0.0023) model time 0.3993 (0.4145) loss 8.3064 (7.1540) grad_norm 4.4152 (inf) loss_scale 512.0000 (621.7919) mem 14939MB [2024-07-25 04:15:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][480/625] eta 0:01:00 lr 0.000554 wd 0.0500 time 0.3987 (0.4154) data time 0.0006 (0.0023) model time 0.3981 (0.4141) loss 6.1359 (7.1595) grad_norm 2.3160 (inf) loss_scale 512.0000 (619.5094) mem 14939MB [2024-07-25 04:15:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][490/625] eta 0:00:56 lr 0.000554 wd 0.0500 time 0.3958 (0.4150) data time 0.0008 (0.0023) model 
time 0.3950 (0.4137) loss 8.1454 (7.1554) grad_norm 3.0133 (inf) loss_scale 512.0000 (617.3198) mem 14939MB [2024-07-25 04:15:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][500/625] eta 0:00:51 lr 0.000554 wd 0.0500 time 0.3989 (0.4151) data time 0.0009 (0.0022) model time 0.3980 (0.4139) loss 8.0270 (7.1592) grad_norm 2.2816 (inf) loss_scale 512.0000 (615.2176) mem 14939MB [2024-07-25 04:15:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][510/625] eta 0:00:47 lr 0.000554 wd 0.0500 time 0.3983 (0.4148) data time 0.0007 (0.0022) model time 0.3976 (0.4136) loss 6.6280 (7.1582) grad_norm 2.2671 (inf) loss_scale 512.0000 (613.1977) mem 14939MB [2024-07-25 04:15:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][520/625] eta 0:00:43 lr 0.000554 wd 0.0500 time 0.4132 (0.4146) data time 0.0009 (0.0022) model time 0.4123 (0.4133) loss 7.1168 (7.1621) grad_norm 3.9994 (inf) loss_scale 512.0000 (611.2553) mem 14939MB [2024-07-25 04:16:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][530/625] eta 0:00:39 lr 0.000554 wd 0.0500 time 0.3955 (0.4143) data time 0.0006 (0.0022) model time 0.3949 (0.4130) loss 6.7057 (7.1675) grad_norm 3.1526 (inf) loss_scale 512.0000 (609.3861) mem 14939MB [2024-07-25 04:16:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][540/625] eta 0:00:35 lr 0.000554 wd 0.0500 time 0.3999 (0.4141) data time 0.0006 (0.0021) model time 0.3992 (0.4128) loss 6.8871 (7.1662) grad_norm 2.3274 (inf) loss_scale 512.0000 (607.5860) mem 14939MB [2024-07-25 04:16:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][550/625] eta 0:00:31 lr 0.000554 wd 0.0500 time 0.4215 (0.4139) data time 0.0008 (0.0021) model time 0.4207 (0.4126) loss 6.1425 (7.1634) grad_norm 2.4934 (inf) loss_scale 512.0000 (605.8512) mem 14939MB [2024-07-25 04:16:15 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [167/300][560/625] eta 0:00:26 lr 0.000553 wd 0.0500 time 0.3972 (0.4138) data time 0.0009 (0.0021) model time 0.3963 (0.4124) loss 8.6167 (7.1638) grad_norm 1.9820 (inf) loss_scale 512.0000 (604.1783) mem 14939MB [2024-07-25 04:16:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][570/625] eta 0:00:22 lr 0.000553 wd 0.0500 time 0.4012 (0.4137) data time 0.0006 (0.0021) model time 0.4006 (0.4123) loss 7.1633 (7.1651) grad_norm 3.0999 (inf) loss_scale 512.0000 (602.5639) mem 14939MB [2024-07-25 04:16:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][580/625] eta 0:00:18 lr 0.000553 wd 0.0500 time 0.3992 (0.4135) data time 0.0008 (0.0020) model time 0.3984 (0.4121) loss 6.5966 (7.1605) grad_norm 2.7543 (inf) loss_scale 512.0000 (601.0052) mem 14939MB [2024-07-25 04:16:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][590/625] eta 0:00:14 lr 0.000553 wd 0.0500 time 0.4028 (0.4133) data time 0.0009 (0.0020) model time 0.4020 (0.4119) loss 6.4446 (7.1574) grad_norm 1.8419 (inf) loss_scale 512.0000 (599.4992) mem 14939MB [2024-07-25 04:16:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][600/625] eta 0:00:10 lr 0.000553 wd 0.0500 time 0.4003 (0.4137) data time 0.0008 (0.0020) model time 0.3994 (0.4124) loss 7.5838 (7.1591) grad_norm 2.2406 (inf) loss_scale 512.0000 (598.0433) mem 14939MB [2024-07-25 04:16:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][610/625] eta 0:00:06 lr 0.000553 wd 0.0500 time 0.3972 (0.4152) data time 0.0006 (0.0020) model time 0.3966 (0.4140) loss 6.6670 (7.1476) grad_norm 9.0225 (inf) loss_scale 512.0000 (596.6350) mem 14939MB [2024-07-25 04:16:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][620/625] eta 0:00:02 lr 0.000553 wd 0.0500 time 0.3935 (0.4165) data time 0.0005 (0.0020) model time 0.3930 (0.4154) loss 
8.1225 (7.1456) grad_norm 2.7408 (inf) loss_scale 512.0000 (595.2721) mem 14939MB [2024-07-25 04:16:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 167 training takes 0:04:20 [2024-07-25 04:16:43 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 04:16:44 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 04:16:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.478 (0.478) Loss 0.6016 (0.6016) Acc@1 88.525 (88.525) Acc@5 98.340 (98.340) Mem 14939MB [2024-07-25 04:16:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.122) Loss 0.9595 (0.7350) Acc@1 79.395 (85.365) Acc@5 95.850 (97.292) Mem 14939MB [2024-07-25 04:16:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.105) Loss 1.0742 (0.8702) Acc@1 75.732 (81.866) Acc@5 93.799 (95.915) Mem 14939MB [2024-07-25 04:16:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.532 Acc@5 95.887 [2024-07-25 04:16:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.5% [2024-07-25 04:16:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.874 (0.874) Loss 0.5591 (0.5591) Acc@1 89.551 (89.551) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 04:16:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.160) Loss 0.8882 (0.6966) Acc@1 81.006 (86.075) Acc@5 96.143 (97.612) Mem 14939MB [2024-07-25 04:16:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.125) Loss 1.0205 (0.8186) Acc@1 75.684 (82.731) Acc@5 95.068 (96.377) Mem 14939MB [2024-07-25 04:16:50 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 445): INFO * Acc@1 82.324 Acc@5 96.343 [2024-07-25 04:16:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-07-25 04:16:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.32% [2024-07-25 04:16:50 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 04:16:51 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 04:16:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][0/625] eta 0:13:46 lr 0.000553 wd 0.0500 time 1.3223 (1.3223) data time 0.9449 (0.9449) model time 0.0000 (0.0000) loss 7.9360 (7.9360) grad_norm 2.6447 (2.6447) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:16:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][10/625] eta 0:05:08 lr 0.000553 wd 0.0500 time 0.4032 (0.5015) data time 0.0008 (0.0869) model time 0.0000 (0.0000) loss 7.6929 (7.1357) grad_norm 2.0912 (2.5543) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:17:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][20/625] eta 0:04:35 lr 0.000553 wd 0.0500 time 0.4012 (0.4548) data time 0.0009 (0.0460) model time 0.0000 (0.0000) loss 8.2798 (7.1973) grad_norm 2.0340 (2.6530) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:17:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][30/625] eta 0:04:24 lr 0.000552 wd 0.0500 time 0.3978 (0.4443) data time 0.0007 (0.0315) model time 0.0000 (0.0000) loss 7.3461 (7.1953) grad_norm 2.3513 (2.8228) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:17:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][40/625] eta 0:04:13 lr 
0.000552 wd 0.0500 time 0.4009 (0.4337) data time 0.0007 (0.0240) model time 0.0000 (0.0000) loss 6.9573 (7.1943) grad_norm 1.5574 (2.9029) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:17:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][50/625] eta 0:04:06 lr 0.000552 wd 0.0500 time 0.4042 (0.4280) data time 0.0009 (0.0206) model time 0.0000 (0.0000) loss 7.5695 (7.2184) grad_norm 1.8563 (2.9055) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:17:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][60/625] eta 0:03:59 lr 0.000552 wd 0.0500 time 0.4009 (0.4235) data time 0.0006 (0.0173) model time 0.4003 (0.3999) loss 8.1066 (7.2270) grad_norm 2.1915 (2.8528) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:17:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][70/625] eta 0:03:53 lr 0.000552 wd 0.0500 time 0.4031 (0.4206) data time 0.0007 (0.0151) model time 0.4024 (0.4007) loss 7.4083 (7.2543) grad_norm 3.7282 (2.8826) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:17:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][80/625] eta 0:03:47 lr 0.000552 wd 0.0500 time 0.4008 (0.4183) data time 0.0007 (0.0133) model time 0.4001 (0.4008) loss 6.4536 (7.2229) grad_norm 2.3271 (3.0727) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:17:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][90/625] eta 0:03:42 lr 0.000552 wd 0.0500 time 0.4010 (0.4164) data time 0.0009 (0.0120) model time 0.4001 (0.4006) loss 7.2448 (7.2693) grad_norm 4.7829 (3.1138) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:17:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][100/625] eta 0:03:37 lr 0.000552 wd 0.0500 time 0.3963 (0.4148) data time 0.0008 (0.0109) model time 0.3955 (0.4004) loss 7.4094 (7.2554) grad_norm 4.3916 (3.1549) loss_scale 512.0000 
(512.0000) mem 14939MB [2024-07-25 04:17:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][110/625] eta 0:03:32 lr 0.000552 wd 0.0500 time 0.4066 (0.4134) data time 0.0007 (0.0100) model time 0.4060 (0.4001) loss 6.9248 (7.2469) grad_norm 3.1952 (3.1048) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:17:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][120/625] eta 0:03:28 lr 0.000551 wd 0.0500 time 0.4005 (0.4123) data time 0.0008 (0.0092) model time 0.3997 (0.3999) loss 6.6344 (7.2372) grad_norm 4.6155 (3.0710) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:17:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][130/625] eta 0:03:23 lr 0.000551 wd 0.0500 time 0.4007 (0.4113) data time 0.0008 (0.0086) model time 0.3999 (0.3997) loss 7.0946 (7.2483) grad_norm 1.7354 (inf) loss_scale 256.0000 (496.3664) mem 14939MB [2024-07-25 04:17:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][140/625] eta 0:03:19 lr 0.000551 wd 0.0500 time 0.4009 (0.4104) data time 0.0007 (0.0080) model time 0.4003 (0.3996) loss 6.1710 (7.2238) grad_norm 2.2996 (inf) loss_scale 256.0000 (479.3191) mem 14939MB [2024-07-25 04:17:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][150/625] eta 0:03:14 lr 0.000551 wd 0.0500 time 0.3972 (0.4096) data time 0.0006 (0.0076) model time 0.3966 (0.3993) loss 7.0694 (7.2190) grad_norm 2.1380 (inf) loss_scale 256.0000 (464.5298) mem 14939MB [2024-07-25 04:17:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][160/625] eta 0:03:10 lr 0.000551 wd 0.0500 time 0.3948 (0.4089) data time 0.0008 (0.0071) model time 0.3940 (0.3991) loss 5.5103 (7.2146) grad_norm 2.5852 (inf) loss_scale 256.0000 (451.5776) mem 14939MB [2024-07-25 04:18:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][170/625] eta 0:03:05 lr 0.000551 wd 0.0500 
time 0.4000 (0.4084) data time 0.0006 (0.0068) model time 0.3994 (0.3991) loss 7.2402 (7.2154) grad_norm 2.6710 (inf) loss_scale 256.0000 (440.1404) mem 14939MB [2024-07-25 04:18:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][180/625] eta 0:03:01 lr 0.000551 wd 0.0500 time 0.3997 (0.4079) data time 0.0007 (0.0064) model time 0.3990 (0.3991) loss 6.7193 (7.2063) grad_norm 2.2087 (inf) loss_scale 256.0000 (429.9669) mem 14939MB [2024-07-25 04:18:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][190/625] eta 0:02:57 lr 0.000551 wd 0.0500 time 0.3989 (0.4075) data time 0.0007 (0.0062) model time 0.3983 (0.3991) loss 5.9664 (7.1958) grad_norm 1.9617 (inf) loss_scale 256.0000 (420.8586) mem 14939MB [2024-07-25 04:18:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][200/625] eta 0:02:54 lr 0.000551 wd 0.0500 time 0.3995 (0.4107) data time 0.0009 (0.0059) model time 0.3986 (0.4040) loss 6.8494 (7.1782) grad_norm 2.4060 (inf) loss_scale 256.0000 (412.6567) mem 14939MB [2024-07-25 04:18:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][210/625] eta 0:02:51 lr 0.000551 wd 0.0500 time 0.4013 (0.4143) data time 0.0008 (0.0057) model time 0.4005 (0.4091) loss 8.0663 (7.1841) grad_norm 3.3053 (inf) loss_scale 256.0000 (405.2322) mem 14939MB [2024-07-25 04:18:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][220/625] eta 0:02:49 lr 0.000550 wd 0.0500 time 0.5428 (0.4178) data time 0.0007 (0.0054) model time 0.5421 (0.4138) loss 7.3035 (7.1753) grad_norm 1.8757 (inf) loss_scale 256.0000 (398.4796) mem 14939MB [2024-07-25 04:18:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][230/625] eta 0:02:44 lr 0.000550 wd 0.0500 time 0.3935 (0.4176) data time 0.0007 (0.0052) model time 0.3928 (0.4138) loss 5.2698 (7.1623) grad_norm 1.8795 (inf) loss_scale 256.0000 (392.3117) mem 14939MB [2024-07-25 
04:18:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][240/625] eta 0:02:40 lr 0.000550 wd 0.0500 time 0.4014 (0.4170) data time 0.0007 (0.0051) model time 0.4007 (0.4132) loss 7.1510 (7.1741) grad_norm 4.0382 (inf) loss_scale 256.0000 (386.6556) mem 14939MB [2024-07-25 04:18:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][250/625] eta 0:02:36 lr 0.000550 wd 0.0500 time 0.4017 (0.4172) data time 0.0006 (0.0049) model time 0.4011 (0.4136) loss 6.1928 (7.1835) grad_norm 1.9726 (inf) loss_scale 256.0000 (381.4502) mem 14939MB [2024-07-25 04:18:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][260/625] eta 0:02:32 lr 0.000550 wd 0.0500 time 0.4000 (0.4166) data time 0.0009 (0.0047) model time 0.3992 (0.4129) loss 7.5555 (7.1847) grad_norm 2.1819 (inf) loss_scale 256.0000 (376.6437) mem 14939MB [2024-07-25 04:18:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][270/625] eta 0:02:27 lr 0.000550 wd 0.0500 time 0.4081 (0.4160) data time 0.0007 (0.0046) model time 0.4074 (0.4123) loss 6.2721 (7.1882) grad_norm 2.0308 (inf) loss_scale 256.0000 (372.1919) mem 14939MB [2024-07-25 04:18:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][280/625] eta 0:02:23 lr 0.000550 wd 0.0500 time 0.3932 (0.4154) data time 0.0008 (0.0045) model time 0.3924 (0.4117) loss 5.6574 (7.1771) grad_norm 2.4534 (inf) loss_scale 256.0000 (368.0569) mem 14939MB [2024-07-25 04:18:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][290/625] eta 0:02:19 lr 0.000550 wd 0.0500 time 0.4010 (0.4151) data time 0.0007 (0.0044) model time 0.4002 (0.4114) loss 6.8082 (7.1787) grad_norm 2.3503 (inf) loss_scale 256.0000 (364.2062) mem 14939MB [2024-07-25 04:18:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][300/625] eta 0:02:14 lr 0.000550 wd 0.0500 time 0.3981 (0.4146) data time 0.0007 
(0.0043) model time 0.3974 (0.4109) loss 6.6854 (7.1761) grad_norm 2.3299 (inf) loss_scale 256.0000 (360.6113) mem 14939MB [2024-07-25 04:18:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][310/625] eta 0:02:10 lr 0.000549 wd 0.0500 time 0.3982 (0.4142) data time 0.0008 (0.0042) model time 0.3974 (0.4105) loss 7.1248 (7.1805) grad_norm 1.7202 (inf) loss_scale 256.0000 (357.2476) mem 14939MB [2024-07-25 04:19:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][320/625] eta 0:02:06 lr 0.000549 wd 0.0500 time 0.4005 (0.4138) data time 0.0006 (0.0041) model time 0.3999 (0.4101) loss 8.2659 (7.1850) grad_norm 1.8389 (inf) loss_scale 256.0000 (354.0935) mem 14939MB [2024-07-25 04:19:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][330/625] eta 0:02:01 lr 0.000549 wd 0.0500 time 0.4007 (0.4134) data time 0.0008 (0.0040) model time 0.3999 (0.4097) loss 7.0925 (7.1731) grad_norm 2.9347 (inf) loss_scale 256.0000 (351.1299) mem 14939MB [2024-07-25 04:19:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][340/625] eta 0:01:57 lr 0.000549 wd 0.0500 time 0.3956 (0.4130) data time 0.0008 (0.0039) model time 0.3948 (0.4094) loss 7.4508 (7.1563) grad_norm 1.9293 (inf) loss_scale 256.0000 (348.3402) mem 14939MB [2024-07-25 04:19:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][350/625] eta 0:01:53 lr 0.000549 wd 0.0500 time 0.3997 (0.4127) data time 0.0008 (0.0038) model time 0.3989 (0.4091) loss 7.9849 (7.1553) grad_norm 3.9560 (inf) loss_scale 256.0000 (345.7094) mem 14939MB [2024-07-25 04:19:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][360/625] eta 0:01:49 lr 0.000549 wd 0.0500 time 0.3978 (0.4123) data time 0.0008 (0.0037) model time 0.3970 (0.4088) loss 7.1110 (7.1496) grad_norm 2.8596 (inf) loss_scale 256.0000 (343.2244) mem 14939MB [2024-07-25 04:19:23 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [168/300][370/625] eta 0:01:45 lr 0.000549 wd 0.0500 time 0.3959 (0.4121) data time 0.0008 (0.0037) model time 0.3951 (0.4087) loss 8.3457 (7.1546) grad_norm 4.5321 (inf) loss_scale 256.0000 (340.8733) mem 14939MB [2024-07-25 04:19:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][380/625] eta 0:01:40 lr 0.000549 wd 0.0500 time 0.3997 (0.4119) data time 0.0008 (0.0036) model time 0.3989 (0.4084) loss 6.2058 (7.1516) grad_norm 3.8824 (inf) loss_scale 256.0000 (338.6457) mem 14939MB [2024-07-25 04:19:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][390/625] eta 0:01:36 lr 0.000549 wd 0.0500 time 0.4044 (0.4116) data time 0.0007 (0.0035) model time 0.4038 (0.4082) loss 7.6181 (7.1439) grad_norm 5.4068 (inf) loss_scale 256.0000 (336.5320) mem 14939MB [2024-07-25 04:19:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][400/625] eta 0:01:32 lr 0.000549 wd 0.0500 time 0.3952 (0.4114) data time 0.0007 (0.0035) model time 0.3945 (0.4080) loss 7.4147 (7.1465) grad_norm 3.1859 (inf) loss_scale 256.0000 (334.5237) mem 14939MB [2024-07-25 04:19:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][410/625] eta 0:01:28 lr 0.000548 wd 0.0500 time 0.4178 (0.4113) data time 0.0009 (0.0034) model time 0.4169 (0.4079) loss 6.7376 (7.1451) grad_norm 2.6808 (inf) loss_scale 256.0000 (332.6131) mem 14939MB [2024-07-25 04:19:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][420/625] eta 0:01:24 lr 0.000548 wd 0.0500 time 0.5745 (0.4128) data time 0.0008 (0.0033) model time 0.5737 (0.4097) loss 7.2298 (7.1387) grad_norm 1.8288 (inf) loss_scale 256.0000 (330.7933) mem 14939MB [2024-07-25 04:19:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][430/625] eta 0:01:20 lr 0.000548 wd 0.0500 time 0.5912 (0.4146) data time 0.0008 (0.0033) model time 0.5904 (0.4119) loss 
6.4317 (7.1322) grad_norm 6.9914 (inf) loss_scale 256.0000 (329.0580) mem 14939MB [2024-07-25 04:19:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][440/625] eta 0:01:16 lr 0.000548 wd 0.0500 time 0.5943 (0.4159) data time 0.0007 (0.0032) model time 0.5936 (0.4133) loss 8.2602 (7.1211) grad_norm 2.6174 (inf) loss_scale 256.0000 (327.4014) mem 14939MB [2024-07-25 04:19:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][450/625] eta 0:01:12 lr 0.000548 wd 0.0500 time 0.3981 (0.4158) data time 0.0009 (0.0032) model time 0.3972 (0.4133) loss 7.5601 (7.1135) grad_norm 2.4030 (inf) loss_scale 256.0000 (325.8182) mem 14939MB [2024-07-25 04:20:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][460/625] eta 0:01:08 lr 0.000548 wd 0.0500 time 0.4004 (0.4154) data time 0.0008 (0.0031) model time 0.3996 (0.4129) loss 7.6925 (7.1147) grad_norm 2.0602 (inf) loss_scale 256.0000 (324.3037) mem 14939MB [2024-07-25 04:20:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][470/625] eta 0:01:04 lr 0.000548 wd 0.0500 time 0.4007 (0.4155) data time 0.0008 (0.0031) model time 0.3999 (0.4130) loss 7.0675 (7.1098) grad_norm 1.9171 (inf) loss_scale 256.0000 (322.8535) mem 14939MB [2024-07-25 04:20:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][480/625] eta 0:01:00 lr 0.000548 wd 0.0500 time 0.3955 (0.4152) data time 0.0009 (0.0030) model time 0.3946 (0.4127) loss 7.4156 (7.1119) grad_norm 4.0172 (inf) loss_scale 256.0000 (321.4636) mem 14939MB [2024-07-25 04:20:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][490/625] eta 0:00:56 lr 0.000548 wd 0.0500 time 0.4020 (0.4149) data time 0.0009 (0.0030) model time 0.4012 (0.4124) loss 7.0291 (7.1099) grad_norm 1.9158 (inf) loss_scale 256.0000 (320.1303) mem 14939MB [2024-07-25 04:20:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[168/300][500/625] eta 0:00:51 lr 0.000547 wd 0.0500 time 0.3998 (0.4145) data time 0.0007 (0.0030) model time 0.3991 (0.4121) loss 7.9788 (7.1103) grad_norm 2.2806 (inf) loss_scale 256.0000 (318.8503) mem 14939MB [2024-07-25 04:20:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][510/625] eta 0:00:47 lr 0.000547 wd 0.0500 time 0.3994 (0.4142) data time 0.0006 (0.0029) model time 0.3988 (0.4117) loss 5.7867 (7.1101) grad_norm 2.1351 (inf) loss_scale 256.0000 (317.6204) mem 14939MB [2024-07-25 04:20:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][520/625] eta 0:00:43 lr 0.000547 wd 0.0500 time 0.3983 (0.4140) data time 0.0009 (0.0029) model time 0.3974 (0.4115) loss 7.7696 (7.1126) grad_norm 2.4938 (inf) loss_scale 256.0000 (316.4376) mem 14939MB [2024-07-25 04:20:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][530/625] eta 0:00:39 lr 0.000547 wd 0.0500 time 0.4121 (0.4137) data time 0.0009 (0.0028) model time 0.4112 (0.4112) loss 8.1988 (7.1056) grad_norm 2.0002 (inf) loss_scale 256.0000 (315.2994) mem 14939MB [2024-07-25 04:20:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][540/625] eta 0:00:35 lr 0.000547 wd 0.0500 time 0.3975 (0.4134) data time 0.0007 (0.0028) model time 0.3968 (0.4110) loss 6.5618 (7.1067) grad_norm 2.2529 (inf) loss_scale 256.0000 (314.2033) mem 14939MB [2024-07-25 04:20:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][550/625] eta 0:00:30 lr 0.000547 wd 0.0500 time 0.3970 (0.4132) data time 0.0008 (0.0028) model time 0.3962 (0.4107) loss 8.4600 (7.1127) grad_norm 2.5200 (inf) loss_scale 256.0000 (313.1470) mem 14939MB [2024-07-25 04:20:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][560/625] eta 0:00:26 lr 0.000547 wd 0.0500 time 0.4023 (0.4130) data time 0.0007 (0.0027) model time 0.4016 (0.4105) loss 6.5702 (7.1069) grad_norm 2.5107 (inf) 
loss_scale 256.0000 (312.1283) mem 14939MB [2024-07-25 04:20:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][570/625] eta 0:00:22 lr 0.000547 wd 0.0500 time 0.4023 (0.4127) data time 0.0007 (0.0027) model time 0.4016 (0.4103) loss 6.4293 (7.1076) grad_norm 2.6647 (inf) loss_scale 256.0000 (311.1454) mem 14939MB [2024-07-25 04:20:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][580/625] eta 0:00:18 lr 0.000547 wd 0.0500 time 0.3991 (0.4125) data time 0.0008 (0.0027) model time 0.3983 (0.4101) loss 8.6135 (7.1090) grad_norm 3.4974 (inf) loss_scale 256.0000 (310.1962) mem 14939MB [2024-07-25 04:20:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][590/625] eta 0:00:14 lr 0.000546 wd 0.0500 time 0.4018 (0.4123) data time 0.0007 (0.0026) model time 0.4011 (0.4098) loss 6.0829 (7.1051) grad_norm 4.1745 (inf) loss_scale 256.0000 (309.2792) mem 14939MB [2024-07-25 04:20:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][600/625] eta 0:00:10 lr 0.000546 wd 0.0500 time 0.3967 (0.4120) data time 0.0007 (0.0026) model time 0.3960 (0.4096) loss 6.1540 (7.1044) grad_norm 3.0345 (inf) loss_scale 256.0000 (308.3927) mem 14939MB [2024-07-25 04:21:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][610/625] eta 0:00:06 lr 0.000546 wd 0.0500 time 0.3968 (0.4118) data time 0.0004 (0.0026) model time 0.3964 (0.4094) loss 6.8285 (7.1042) grad_norm 4.6878 (inf) loss_scale 256.0000 (307.5352) mem 14939MB [2024-07-25 04:21:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][620/625] eta 0:00:02 lr 0.000546 wd 0.0500 time 0.4027 (0.4116) data time 0.0004 (0.0026) model time 0.4022 (0.4092) loss 6.3479 (7.1053) grad_norm 3.9026 (inf) loss_scale 256.0000 (306.7053) mem 14939MB [2024-07-25 04:21:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 168 training takes 0:04:17 [2024-07-25 
04:21:08 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 04:21:09 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 04:21:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.471 (0.471) Loss 0.5664 (0.5664) Acc@1 88.965 (88.965) Acc@5 98.730 (98.730) Mem 14939MB [2024-07-25 04:21:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.122) Loss 0.9214 (0.7082) Acc@1 79.883 (85.400) Acc@5 96.094 (97.496) Mem 14939MB [2024-07-25 04:21:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 1.0225 (0.8365) Acc@1 75.586 (82.001) Acc@5 94.873 (96.166) Mem 14939MB [2024-07-25 04:21:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.650 Acc@5 96.125 [2024-07-25 04:21:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.6% [2024-07-25 04:21:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.819 (0.819) Loss 0.5576 (0.5576) Acc@1 89.648 (89.648) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 04:21:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.156) Loss 0.8862 (0.6956) Acc@1 81.104 (86.124) Acc@5 96.143 (97.616) Mem 14939MB [2024-07-25 04:21:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 1.0195 (0.8175) Acc@1 75.732 (82.750) Acc@5 95.020 (96.377) Mem 14939MB [2024-07-25 04:21:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.344 Acc@5 96.345 [2024-07-25 04:21:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-07-25 04:21:14 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.34% [2024-07-25 04:21:14 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 04:21:15 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 04:21:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][0/625] eta 0:12:16 lr 0.000546 wd 0.0500 time 1.1782 (1.1782) data time 0.6024 (0.6024) model time 0.0000 (0.0000) loss 7.8384 (7.8384) grad_norm 2.1036 (2.1036) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:21:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][10/625] eta 0:05:07 lr 0.000546 wd 0.0500 time 0.3958 (0.5000) data time 0.0008 (0.0557) model time 0.0000 (0.0000) loss 7.7929 (7.5388) grad_norm 2.6182 (2.5371) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:21:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][20/625] eta 0:05:03 lr 0.000546 wd 0.0500 time 0.5817 (0.5017) data time 0.0007 (0.0296) model time 0.0000 (0.0000) loss 6.8050 (7.2124) grad_norm 2.2350 (2.6443) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:21:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][30/625] eta 0:05:03 lr 0.000546 wd 0.0500 time 0.5848 (0.5097) data time 0.0008 (0.0203) model time 0.0000 (0.0000) loss 7.3118 (7.1249) grad_norm 3.9538 (3.4703) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:21:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][40/625] eta 0:04:49 lr 0.000546 wd 0.0500 time 0.4000 (0.4957) data time 0.0009 (0.0156) model time 0.0000 (0.0000) loss 8.0023 (7.1309) grad_norm 2.2871 (3.4478) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:21:40 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][50/625] eta 0:04:35 lr 0.000546 wd 0.0500 time 0.3962 (0.4800) data time 0.0009 (0.0127) model time 0.0000 (0.0000) loss 6.2237 (7.1727) grad_norm 2.9356 (3.3294) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:21:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][60/625] eta 0:04:24 lr 0.000545 wd 0.0500 time 0.4130 (0.4673) data time 0.0007 (0.0108) model time 0.4123 (0.4017) loss 6.2383 (7.1176) grad_norm 4.6302 (3.3344) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:21:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][70/625] eta 0:04:14 lr 0.000545 wd 0.0500 time 0.3989 (0.4578) data time 0.0008 (0.0094) model time 0.3980 (0.4001) loss 7.5812 (7.0412) grad_norm 3.4181 (3.3231) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:21:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][80/625] eta 0:04:05 lr 0.000545 wd 0.0500 time 0.4017 (0.4511) data time 0.0008 (0.0084) model time 0.4008 (0.4009) loss 8.1701 (7.0859) grad_norm 2.4783 (3.5679) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:21:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][90/625] eta 0:03:58 lr 0.000545 wd 0.0500 time 0.4080 (0.4455) data time 0.0006 (0.0075) model time 0.4074 (0.4007) loss 8.0380 (7.0722) grad_norm 2.6662 (3.5250) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:22:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][100/625] eta 0:03:51 lr 0.000545 wd 0.0500 time 0.3978 (0.4409) data time 0.0009 (0.0069) model time 0.3969 (0.4001) loss 7.0907 (7.0931) grad_norm 2.0238 (3.4027) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:22:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][110/625] eta 0:03:45 lr 0.000545 wd 0.0500 time 0.3996 (0.4375) data time 0.0007 
(0.0063) model time 0.3989 (0.4005) loss 7.8076 (7.1009) grad_norm 7.0879 (3.3981) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:22:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][120/625] eta 0:03:39 lr 0.000545 wd 0.0500 time 0.4008 (0.4344) data time 0.0006 (0.0059) model time 0.4002 (0.4003) loss 7.4467 (7.1320) grad_norm 2.3959 (3.3312) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:22:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][130/625] eta 0:03:33 lr 0.000545 wd 0.0500 time 0.3996 (0.4318) data time 0.0009 (0.0055) model time 0.3987 (0.4002) loss 7.5771 (7.1447) grad_norm 3.2228 (3.2692) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:22:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][140/625] eta 0:03:28 lr 0.000545 wd 0.0500 time 0.3971 (0.4297) data time 0.0007 (0.0052) model time 0.3964 (0.4002) loss 7.4637 (7.1449) grad_norm 3.9425 (3.2422) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:22:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][150/625] eta 0:03:23 lr 0.000545 wd 0.0500 time 0.4025 (0.4278) data time 0.0006 (0.0049) model time 0.4019 (0.4002) loss 6.1753 (7.1314) grad_norm 6.3883 (3.2543) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:22:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][160/625] eta 0:03:18 lr 0.000544 wd 0.0500 time 0.3971 (0.4261) data time 0.0007 (0.0047) model time 0.3964 (0.4002) loss 6.7601 (7.1202) grad_norm 4.2355 (3.2523) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:22:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][170/625] eta 0:03:13 lr 0.000544 wd 0.0500 time 0.4039 (0.4247) data time 0.0007 (0.0044) model time 0.4033 (0.4002) loss 5.6834 (7.0966) grad_norm 1.9229 (3.2074) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:22:32 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][180/625] eta 0:03:08 lr 0.000544 wd 0.0500 time 0.3989 (0.4233) data time 0.0008 (0.0042) model time 0.3980 (0.4002) loss 8.4600 (7.1209) grad_norm 1.7220 (3.1501) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:22:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][190/625] eta 0:03:03 lr 0.000544 wd 0.0500 time 0.3980 (0.4221) data time 0.0008 (0.0041) model time 0.3973 (0.4002) loss 8.4473 (7.1340) grad_norm 2.7307 (3.1283) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:22:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][200/625] eta 0:02:59 lr 0.000544 wd 0.0500 time 0.4017 (0.4212) data time 0.0007 (0.0039) model time 0.4010 (0.4003) loss 8.5090 (7.1437) grad_norm 2.4409 (3.1233) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:22:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][210/625] eta 0:02:54 lr 0.000544 wd 0.0500 time 0.4036 (0.4204) data time 0.0009 (0.0038) model time 0.4027 (0.4005) loss 7.6366 (7.1273) grad_norm 3.7439 (3.0994) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:22:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][220/625] eta 0:02:49 lr 0.000544 wd 0.0500 time 0.3998 (0.4195) data time 0.0007 (0.0036) model time 0.3991 (0.4004) loss 6.8781 (7.1446) grad_norm 1.9290 (3.1179) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:22:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][230/625] eta 0:02:46 lr 0.000544 wd 0.0500 time 0.5906 (0.4208) data time 0.0009 (0.0035) model time 0.5897 (0.4031) loss 5.9065 (7.1412) grad_norm 2.2024 (3.1426) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:22:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][240/625] eta 0:02:43 lr 0.000544 wd 0.0500 time 0.5756 (0.4242) data time 
0.0007 (0.0034) model time 0.5749 (0.4083) loss 6.8777 (7.1363) grad_norm 1.9247 (3.1152) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:23:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][250/625] eta 0:02:40 lr 0.000543 wd 0.0500 time 0.4006 (0.4276) data time 0.0006 (0.0033) model time 0.4000 (0.4134) loss 7.1427 (7.1430) grad_norm 1.8883 (3.0851) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:23:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][260/625] eta 0:02:36 lr 0.000543 wd 0.0500 time 0.5384 (0.4292) data time 0.0006 (0.0032) model time 0.5377 (0.4160) loss 6.2259 (7.1423) grad_norm 2.0016 (3.0636) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:23:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][270/625] eta 0:02:32 lr 0.000543 wd 0.0500 time 0.4050 (0.4282) data time 0.0010 (0.0031) model time 0.4040 (0.4153) loss 5.8992 (7.1335) grad_norm 1.8946 (3.0259) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:23:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][280/625] eta 0:02:27 lr 0.000543 wd 0.0500 time 0.3942 (0.4273) data time 0.0009 (0.0031) model time 0.3933 (0.4146) loss 7.8964 (7.1309) grad_norm 2.4692 (3.0019) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:23:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][290/625] eta 0:02:22 lr 0.000543 wd 0.0500 time 0.3992 (0.4263) data time 0.0007 (0.0030) model time 0.3986 (0.4140) loss 6.9923 (7.1178) grad_norm 1.8446 (3.0016) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:23:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][300/625] eta 0:02:18 lr 0.000543 wd 0.0500 time 0.3944 (0.4254) data time 0.0008 (0.0029) model time 0.3936 (0.4133) loss 7.1202 (7.1131) grad_norm 2.7833 (2.9941) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:23:27 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][310/625] eta 0:02:13 lr 0.000543 wd 0.0500 time 0.3957 (0.4245) data time 0.0009 (0.0029) model time 0.3948 (0.4127) loss 5.0473 (7.1032) grad_norm 2.7836 (2.9711) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:23:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][320/625] eta 0:02:09 lr 0.000543 wd 0.0500 time 0.4023 (0.4237) data time 0.0008 (0.0028) model time 0.4015 (0.4122) loss 7.1355 (7.0957) grad_norm 1.7050 (2.9388) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:23:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][330/625] eta 0:02:04 lr 0.000543 wd 0.0500 time 0.3973 (0.4230) data time 0.0006 (0.0027) model time 0.3967 (0.4116) loss 6.7055 (7.1074) grad_norm 3.7375 (2.9426) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:23:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][340/625] eta 0:02:00 lr 0.000543 wd 0.0500 time 0.3980 (0.4222) data time 0.0007 (0.0027) model time 0.3973 (0.4112) loss 6.3696 (7.1097) grad_norm 2.9252 (2.9499) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:23:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][350/625] eta 0:01:55 lr 0.000542 wd 0.0500 time 0.3987 (0.4215) data time 0.0009 (0.0026) model time 0.3978 (0.4107) loss 7.0730 (7.1035) grad_norm 3.0885 (2.9399) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:23:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][360/625] eta 0:01:51 lr 0.000542 wd 0.0500 time 0.3972 (0.4209) data time 0.0008 (0.0026) model time 0.3964 (0.4103) loss 8.4409 (7.1010) grad_norm 1.6021 (2.9509) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:23:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][370/625] eta 0:01:47 lr 0.000542 wd 0.0500 time 0.3982 (0.4203) data time 
0.0008 (0.0025) model time 0.3974 (0.4099) loss 5.9467 (7.0958) grad_norm 2.8365 (2.9536) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:23:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][380/625] eta 0:01:42 lr 0.000542 wd 0.0500 time 0.3975 (0.4198) data time 0.0007 (0.0025) model time 0.3968 (0.4095) loss 7.7629 (7.0995) grad_norm 2.1245 (2.9668) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:23:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][390/625] eta 0:01:38 lr 0.000542 wd 0.0500 time 0.3957 (0.4192) data time 0.0007 (0.0025) model time 0.3949 (0.4092) loss 6.2620 (7.0935) grad_norm 3.1963 (2.9607) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:24:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][400/625] eta 0:01:34 lr 0.000542 wd 0.0500 time 0.3984 (0.4187) data time 0.0007 (0.0024) model time 0.3977 (0.4089) loss 6.1611 (7.0972) grad_norm 2.6846 (2.9471) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:24:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][410/625] eta 0:01:29 lr 0.000542 wd 0.0500 time 0.4025 (0.4183) data time 0.0009 (0.0024) model time 0.4017 (0.4086) loss 5.7430 (7.1053) grad_norm 2.6370 (2.9751) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:24:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][420/625] eta 0:01:25 lr 0.000542 wd 0.0500 time 0.3980 (0.4179) data time 0.0009 (0.0023) model time 0.3970 (0.4084) loss 7.6675 (7.1019) grad_norm 2.3022 (2.9758) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:24:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][430/625] eta 0:01:21 lr 0.000542 wd 0.0500 time 0.3996 (0.4174) data time 0.0007 (0.0023) model time 0.3989 (0.4081) loss 6.6715 (7.0978) grad_norm 3.0107 (2.9744) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:24:19 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][440/625] eta 0:01:17 lr 0.000541 wd 0.0500 time 0.4006 (0.4170) data time 0.0007 (0.0023) model time 0.4000 (0.4079) loss 6.6919 (7.0891) grad_norm 2.9936 (2.9743) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:24:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][450/625] eta 0:01:13 lr 0.000541 wd 0.0500 time 0.5799 (0.4177) data time 0.0009 (0.0023) model time 0.5790 (0.4088) loss 6.1866 (7.0871) grad_norm 2.1481 (2.9761) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:24:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][460/625] eta 0:01:09 lr 0.000541 wd 0.0500 time 0.3941 (0.4184) data time 0.0007 (0.0022) model time 0.3934 (0.4098) loss 6.1880 (7.0806) grad_norm 2.6790 (2.9688) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:24:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][470/625] eta 0:01:05 lr 0.000541 wd 0.0500 time 0.3987 (0.4206) data time 0.0008 (0.0022) model time 0.3979 (0.4125) loss 7.5418 (7.0853) grad_norm 2.1922 (2.9537) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:24:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][480/625] eta 0:01:01 lr 0.000541 wd 0.0500 time 0.5910 (0.4218) data time 0.0007 (0.0022) model time 0.5902 (0.4140) loss 7.6462 (7.0906) grad_norm 4.1521 (2.9428) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:24:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][490/625] eta 0:00:56 lr 0.000541 wd 0.0500 time 0.3940 (0.4217) data time 0.0009 (0.0021) model time 0.3931 (0.4141) loss 7.9527 (7.0931) grad_norm 2.4671 (2.9732) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:24:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][500/625] eta 0:00:52 lr 0.000541 wd 0.0500 time 0.4015 (0.4213) data time 
0.0008 (0.0021) model time 0.4006 (0.4137) loss 7.5825 (7.1014) grad_norm 2.7700 (2.9848) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:24:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][510/625] eta 0:00:48 lr 0.000541 wd 0.0500 time 0.4050 (0.4209) data time 0.0007 (0.0021) model time 0.4043 (0.4134) loss 7.3724 (7.1053) grad_norm 2.7471 (2.9766) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:24:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][520/625] eta 0:00:44 lr 0.000541 wd 0.0500 time 0.3956 (0.4205) data time 0.0006 (0.0021) model time 0.3949 (0.4132) loss 6.5570 (7.1086) grad_norm 2.8397 (2.9720) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:24:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][530/625] eta 0:00:39 lr 0.000540 wd 0.0500 time 0.3955 (0.4202) data time 0.0009 (0.0021) model time 0.3947 (0.4129) loss 7.9807 (7.1110) grad_norm 2.0736 (2.9680) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:25:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][540/625] eta 0:00:35 lr 0.000540 wd 0.0500 time 0.3992 (0.4199) data time 0.0007 (0.0020) model time 0.3985 (0.4127) loss 7.0691 (7.1126) grad_norm 1.8513 (2.9556) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:25:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][550/625] eta 0:00:31 lr 0.000540 wd 0.0500 time 0.4001 (0.4195) data time 0.0009 (0.0020) model time 0.3992 (0.4124) loss 6.4531 (7.1141) grad_norm 1.8937 (2.9391) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:25:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][560/625] eta 0:00:27 lr 0.000540 wd 0.0500 time 0.3962 (0.4191) data time 0.0009 (0.0020) model time 0.3953 (0.4121) loss 6.1280 (7.1115) grad_norm 3.4001 (2.9404) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:25:14 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][570/625] eta 0:00:23 lr 0.000540 wd 0.0500 time 0.4013 (0.4188) data time 0.0007 (0.0020) model time 0.4007 (0.4119) loss 6.2601 (7.1066) grad_norm 2.4995 (2.9431) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:25:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][580/625] eta 0:00:18 lr 0.000540 wd 0.0500 time 0.3991 (0.4185) data time 0.0007 (0.0020) model time 0.3984 (0.4116) loss 6.6850 (7.1137) grad_norm 1.6228 (2.9350) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:25:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][590/625] eta 0:00:14 lr 0.000540 wd 0.0500 time 0.4018 (0.4182) data time 0.0007 (0.0019) model time 0.4011 (0.4114) loss 7.0679 (7.1146) grad_norm 3.5113 (2.9465) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:25:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][600/625] eta 0:00:10 lr 0.000540 wd 0.0500 time 0.4046 (0.4179) data time 0.0006 (0.0019) model time 0.4040 (0.4112) loss 7.5869 (7.1157) grad_norm 2.9906 (2.9406) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:25:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][610/625] eta 0:00:06 lr 0.000540 wd 0.0500 time 0.3984 (0.4176) data time 0.0004 (0.0019) model time 0.3980 (0.4110) loss 6.3650 (7.1156) grad_norm 2.2731 (2.9341) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:25:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][620/625] eta 0:00:02 lr 0.000540 wd 0.0500 time 0.4002 (0.4173) data time 0.0004 (0.0019) model time 0.3998 (0.4107) loss 7.2174 (7.1125) grad_norm 3.3218 (2.9242) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:25:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 169 training takes 0:04:20 [2024-07-25 04:25:36 vssd_mesa_retrain_small_e300] (utils.py 
118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 04:25:37 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 04:25:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.468 (0.468) Loss 0.5562 (0.5562) Acc@1 88.721 (88.721) Acc@5 98.682 (98.682) Mem 14939MB [2024-07-25 04:25:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.122) Loss 0.9023 (0.7006) Acc@1 79.395 (85.143) Acc@5 95.996 (97.572) Mem 14939MB [2024-07-25 04:25:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 1.0449 (0.8318) Acc@1 75.293 (81.871) Acc@5 94.385 (96.198) Mem 14939MB [2024-07-25 04:25:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.604 Acc@5 96.165 [2024-07-25 04:25:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.6% [2024-07-25 04:25:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.838 (0.838) Loss 0.5576 (0.5576) Acc@1 89.648 (89.648) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 04:25:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.158) Loss 0.8843 (0.6950) Acc@1 81.152 (86.151) Acc@5 96.240 (97.621) Mem 14939MB [2024-07-25 04:25:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.123) Loss 1.0195 (0.8165) Acc@1 75.635 (82.782) Acc@5 94.971 (96.382) Mem 14939MB [2024-07-25 04:25:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.380 Acc@5 96.351 [2024-07-25 04:25:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-07-25 04:25:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 
281): INFO New max accuracy ema: 82.38% [2024-07-25 04:25:42 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 04:25:43 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 04:25:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][0/625] eta 0:07:28 lr 0.000539 wd 0.0500 time 0.7170 (0.7170) data time 0.3360 (0.3360) model time 0.0000 (0.0000) loss 7.7693 (7.7693) grad_norm 7.1021 (7.1021) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:25:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][10/625] eta 0:04:24 lr 0.000539 wd 0.0500 time 0.4013 (0.4294) data time 0.0008 (0.0315) model time 0.0000 (0.0000) loss 7.9080 (7.0414) grad_norm 1.8654 (3.6562) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:25:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][20/625] eta 0:04:11 lr 0.000539 wd 0.0500 time 0.3969 (0.4165) data time 0.0006 (0.0169) model time 0.0000 (0.0000) loss 8.1952 (6.9385) grad_norm 3.9579 (3.2674) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:25:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][30/625] eta 0:04:04 lr 0.000539 wd 0.0500 time 0.3989 (0.4109) data time 0.0007 (0.0118) model time 0.0000 (0.0000) loss 8.5698 (6.9608) grad_norm 3.1350 (3.2822) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:26:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][40/625] eta 0:03:58 lr 0.000539 wd 0.0500 time 0.3985 (0.4082) data time 0.0008 (0.0091) model time 0.0000 (0.0000) loss 7.4406 (7.0538) grad_norm 2.2124 (3.0965) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:26:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[170/300][50/625] eta 0:04:00 lr 0.000539 wd 0.0500 time 0.6024 (0.4180) data time 0.0007 (0.0075) model time 0.0000 (0.0000) loss 7.5717 (7.0426) grad_norm 2.9829 (3.1728) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:26:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][60/625] eta 0:04:03 lr 0.000539 wd 0.0500 time 0.5527 (0.4306) data time 0.0009 (0.0064) model time 0.5518 (0.4938) loss 6.9828 (7.0233) grad_norm 2.5408 (3.0509) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:26:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][70/625] eta 0:04:03 lr 0.000539 wd 0.0500 time 0.5937 (0.4390) data time 0.0007 (0.0057) model time 0.5930 (0.4915) loss 7.2491 (7.0441) grad_norm 6.4828 (3.0683) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:26:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][80/625] eta 0:03:59 lr 0.000539 wd 0.0500 time 0.3983 (0.4403) data time 0.0006 (0.0051) model time 0.3977 (0.4773) loss 7.3878 (7.1158) grad_norm 2.6281 (3.0339) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:26:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][90/625] eta 0:03:53 lr 0.000539 wd 0.0500 time 0.4145 (0.4362) data time 0.0007 (0.0046) model time 0.4139 (0.4585) loss 8.7843 (7.1826) grad_norm 5.3108 (3.0183) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:26:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][100/625] eta 0:03:47 lr 0.000538 wd 0.0500 time 0.3956 (0.4325) data time 0.0008 (0.0043) model time 0.3947 (0.4464) loss 7.2798 (7.1933) grad_norm 4.0482 (3.0314) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:26:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][110/625] eta 0:03:41 lr 0.000538 wd 0.0500 time 0.4025 (0.4294) data time 0.0007 (0.0040) model time 0.4019 (0.4382) loss 7.1774 (7.1917) grad_norm 
3.2694 (3.0105) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:26:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][120/625] eta 0:03:35 lr 0.000538 wd 0.0500 time 0.3973 (0.4269) data time 0.0008 (0.0037) model time 0.3966 (0.4324) loss 7.2173 (7.2015) grad_norm 1.8637 (3.0103) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:26:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][130/625] eta 0:03:30 lr 0.000538 wd 0.0500 time 0.3966 (0.4247) data time 0.0007 (0.0035) model time 0.3959 (0.4280) loss 6.9362 (7.2015) grad_norm 2.3674 (3.0111) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:26:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][140/625] eta 0:03:25 lr 0.000538 wd 0.0500 time 0.3977 (0.4229) data time 0.0010 (0.0033) model time 0.3967 (0.4247) loss 7.0547 (7.1856) grad_norm 1.7061 (2.9714) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:26:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][150/625] eta 0:03:20 lr 0.000538 wd 0.0500 time 0.3991 (0.4212) data time 0.0007 (0.0032) model time 0.3984 (0.4219) loss 7.8817 (7.1749) grad_norm 1.6273 (2.9160) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:26:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][160/625] eta 0:03:15 lr 0.000538 wd 0.0500 time 0.3961 (0.4199) data time 0.0008 (0.0030) model time 0.3953 (0.4199) loss 6.6433 (7.1526) grad_norm 1.9802 (2.8941) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:26:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][170/625] eta 0:03:10 lr 0.000538 wd 0.0500 time 0.3989 (0.4188) data time 0.0007 (0.0029) model time 0.3981 (0.4182) loss 6.2693 (7.1257) grad_norm 3.6647 (2.9095) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:26:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[170/300][180/625] eta 0:03:05 lr 0.000538 wd 0.0500 time 0.4132 (0.4178) data time 0.0007 (0.0028) model time 0.4126 (0.4168) loss 8.2988 (7.1583) grad_norm 3.1557 (2.9013) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:27:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][190/625] eta 0:03:01 lr 0.000537 wd 0.0500 time 0.3947 (0.4170) data time 0.0008 (0.0027) model time 0.3939 (0.4157) loss 5.7069 (7.1618) grad_norm 3.4092 (2.8965) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:27:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][200/625] eta 0:02:56 lr 0.000537 wd 0.0500 time 0.3982 (0.4162) data time 0.0006 (0.0026) model time 0.3975 (0.4146) loss 6.6132 (7.1564) grad_norm 2.9165 (2.8922) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:27:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][210/625] eta 0:02:52 lr 0.000537 wd 0.0500 time 0.4034 (0.4154) data time 0.0007 (0.0025) model time 0.4027 (0.4136) loss 6.3032 (7.1593) grad_norm 2.5363 (2.8845) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:27:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][220/625] eta 0:02:48 lr 0.000537 wd 0.0500 time 0.3973 (0.4156) data time 0.0006 (0.0024) model time 0.3967 (0.4139) loss 6.2649 (7.1372) grad_norm 4.8280 (2.8682) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:27:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][230/625] eta 0:02:43 lr 0.000537 wd 0.0500 time 0.3984 (0.4152) data time 0.0008 (0.0024) model time 0.3976 (0.4134) loss 7.2923 (7.1462) grad_norm 2.2512 (2.8527) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:27:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][240/625] eta 0:02:39 lr 0.000537 wd 0.0500 time 0.4521 (0.4148) data time 0.0006 (0.0023) model time 0.4515 (0.4129) loss 8.1153 (7.1415) grad_norm 
7.0659 (2.8587) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:27:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][250/625] eta 0:02:35 lr 0.000537 wd 0.0500 time 0.3983 (0.4142) data time 0.0008 (0.0023) model time 0.3975 (0.4122) loss 7.7334 (7.1308) grad_norm 4.5131 (2.8699) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:27:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][260/625] eta 0:02:30 lr 0.000537 wd 0.0500 time 0.3972 (0.4137) data time 0.0009 (0.0023) model time 0.3963 (0.4116) loss 6.8280 (7.1348) grad_norm 2.4774 (2.8793) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:27:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][270/625] eta 0:02:27 lr 0.000537 wd 0.0500 time 0.5519 (0.4158) data time 0.0009 (0.0023) model time 0.5510 (0.4142) loss 5.8579 (7.1416) grad_norm 2.2631 (2.8723) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:27:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][280/625] eta 0:02:24 lr 0.000537 wd 0.0500 time 0.6066 (0.4178) data time 0.0006 (0.0022) model time 0.6060 (0.4166) loss 6.0043 (7.1299) grad_norm 1.8929 (2.8570) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:27:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][290/625] eta 0:02:20 lr 0.000536 wd 0.0500 time 0.3972 (0.4200) data time 0.0009 (0.0022) model time 0.3963 (0.4193) loss 7.0756 (7.1363) grad_norm 1.7757 (2.8559) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:27:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][300/625] eta 0:02:16 lr 0.000536 wd 0.0500 time 0.3975 (0.4199) data time 0.0007 (0.0021) model time 0.3968 (0.4193) loss 6.5695 (7.1261) grad_norm 1.8335 (2.8399) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:27:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[170/300][310/625] eta 0:02:12 lr 0.000536 wd 0.0500 time 0.4037 (0.4193) data time 0.0008 (0.0021) model time 0.4029 (0.4185) loss 6.2127 (7.1144) grad_norm 1.8495 (2.8318) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:27:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][320/625] eta 0:02:07 lr 0.000536 wd 0.0500 time 0.3964 (0.4188) data time 0.0006 (0.0021) model time 0.3958 (0.4180) loss 7.1267 (7.1111) grad_norm 1.9761 (2.8185) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:28:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][330/625] eta 0:02:03 lr 0.000536 wd 0.0500 time 0.3963 (0.4182) data time 0.0008 (0.0020) model time 0.3954 (0.4172) loss 7.4967 (7.1100) grad_norm 1.8199 (2.8106) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:28:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][340/625] eta 0:01:59 lr 0.000536 wd 0.0500 time 0.3987 (0.4178) data time 0.0008 (0.0020) model time 0.3979 (0.4167) loss 7.3487 (7.1109) grad_norm 2.4876 (2.8040) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:28:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][350/625] eta 0:01:54 lr 0.000536 wd 0.0500 time 0.3970 (0.4172) data time 0.0006 (0.0020) model time 0.3964 (0.4161) loss 6.2110 (7.1175) grad_norm 6.5239 (2.8184) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:28:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][360/625] eta 0:01:50 lr 0.000536 wd 0.0500 time 0.3978 (0.4168) data time 0.0007 (0.0019) model time 0.3971 (0.4155) loss 7.3264 (7.1052) grad_norm 9.0896 (2.8308) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:28:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][370/625] eta 0:01:46 lr 0.000536 wd 0.0500 time 0.4044 (0.4163) data time 0.0009 (0.0019) model time 0.4036 (0.4150) loss 8.2083 (7.1148) grad_norm 
3.9636 (2.8445) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:28:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][380/625] eta 0:01:41 lr 0.000535 wd 0.0500 time 0.3915 (0.4158) data time 0.0008 (0.0019) model time 0.3907 (0.4144) loss 7.7135 (7.1194) grad_norm 3.3630 (2.8659) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:28:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][390/625] eta 0:01:37 lr 0.000535 wd 0.0500 time 0.3941 (0.4154) data time 0.0006 (0.0019) model time 0.3935 (0.4140) loss 7.6909 (7.1153) grad_norm 3.3666 (2.8811) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:28:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][400/625] eta 0:01:33 lr 0.000535 wd 0.0500 time 0.3942 (0.4149) data time 0.0009 (0.0018) model time 0.3933 (0.4135) loss 7.5662 (7.1102) grad_norm 2.7826 (2.8743) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:28:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][410/625] eta 0:01:29 lr 0.000535 wd 0.0500 time 0.4039 (0.4145) data time 0.0007 (0.0018) model time 0.4032 (0.4130) loss 7.1544 (7.1136) grad_norm 2.0998 (2.8621) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:28:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][420/625] eta 0:01:24 lr 0.000535 wd 0.0500 time 0.4001 (0.4143) data time 0.0009 (0.0018) model time 0.3993 (0.4128) loss 7.7070 (7.1152) grad_norm 2.4922 (2.8556) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:28:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][430/625] eta 0:01:20 lr 0.000535 wd 0.0500 time 0.4002 (0.4139) data time 0.0006 (0.0018) model time 0.3995 (0.4124) loss 6.1172 (7.1178) grad_norm 2.3028 (2.8549) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:28:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[170/300][440/625] eta 0:01:16 lr 0.000535 wd 0.0500 time 0.3980 (0.4141) data time 0.0008 (0.0018) model time 0.3971 (0.4126) loss 7.7899 (7.1137) grad_norm 2.4456 (2.8500) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:28:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][450/625] eta 0:01:12 lr 0.000535 wd 0.0500 time 0.3976 (0.4138) data time 0.0010 (0.0017) model time 0.3966 (0.4122) loss 7.1942 (7.1073) grad_norm 2.3205 (2.8344) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:28:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][460/625] eta 0:01:08 lr 0.000535 wd 0.0500 time 0.4079 (0.4135) data time 0.0009 (0.0017) model time 0.4070 (0.4119) loss 6.4008 (7.1044) grad_norm 1.6817 (2.8218) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:28:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][470/625] eta 0:01:04 lr 0.000535 wd 0.0500 time 0.4017 (0.4133) data time 0.0009 (0.0017) model time 0.4008 (0.4117) loss 7.3311 (7.1063) grad_norm 2.1390 (2.8195) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:29:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][480/625] eta 0:00:59 lr 0.000534 wd 0.0500 time 0.4002 (0.4130) data time 0.0007 (0.0017) model time 0.3995 (0.4114) loss 7.4467 (7.1055) grad_norm 2.1634 (2.8167) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:29:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][490/625] eta 0:00:55 lr 0.000534 wd 0.0500 time 0.3968 (0.4139) data time 0.0009 (0.0017) model time 0.3959 (0.4125) loss 6.6457 (7.1039) grad_norm 2.1239 (2.8056) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:29:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][500/625] eta 0:00:51 lr 0.000534 wd 0.0500 time 0.5886 (0.4157) data time 0.0009 (0.0017) model time 0.5877 (0.4144) loss 7.5353 (7.1037) grad_norm 
2.5704 (2.8078) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:29:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][510/625] eta 0:00:47 lr 0.000534 wd 0.0500 time 0.6033 (0.4172) data time 0.0008 (0.0016) model time 0.6025 (0.4161) loss 7.4040 (7.1017) grad_norm 3.2927 (2.8206) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:29:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][520/625] eta 0:00:43 lr 0.000534 wd 0.0500 time 0.3963 (0.4175) data time 0.0008 (0.0016) model time 0.3955 (0.4164) loss 8.5296 (7.1081) grad_norm 3.9734 (2.8308) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:29:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][530/625] eta 0:00:39 lr 0.000534 wd 0.0500 time 0.3971 (0.4171) data time 0.0006 (0.0016) model time 0.3965 (0.4161) loss 7.1984 (7.1084) grad_norm 2.5036 (2.8372) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:29:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][540/625] eta 0:00:35 lr 0.000534 wd 0.0500 time 0.4040 (0.4168) data time 0.0008 (0.0016) model time 0.4032 (0.4157) loss 6.6070 (7.1126) grad_norm 1.8119 (2.8275) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:29:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][550/625] eta 0:00:31 lr 0.000534 wd 0.0500 time 0.4003 (0.4165) data time 0.0010 (0.0016) model time 0.3993 (0.4154) loss 6.1640 (7.1140) grad_norm 2.0185 (2.8166) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:29:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][560/625] eta 0:00:27 lr 0.000534 wd 0.0500 time 0.3965 (0.4162) data time 0.0008 (0.0016) model time 0.3957 (0.4150) loss 7.7996 (7.1143) grad_norm 1.7548 (2.8033) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:29:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[170/300][570/625] eta 0:00:22 lr 0.000533 wd 0.0500 time 0.3986 (0.4159) data time 0.0007 (0.0016) model time 0.3979 (0.4147) loss 7.7243 (7.1156) grad_norm 3.6480 (2.7992) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:29:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][580/625] eta 0:00:18 lr 0.000533 wd 0.0500 time 0.4135 (0.4156) data time 0.0007 (0.0016) model time 0.4128 (0.4144) loss 6.7329 (7.1166) grad_norm 3.8460 (2.8269) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:29:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][590/625] eta 0:00:14 lr 0.000533 wd 0.0500 time 0.4001 (0.4154) data time 0.0009 (0.0016) model time 0.3992 (0.4141) loss 7.3673 (7.1152) grad_norm 1.8972 (2.8193) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:29:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][600/625] eta 0:00:10 lr 0.000533 wd 0.0500 time 0.3984 (0.4151) data time 0.0006 (0.0016) model time 0.3978 (0.4139) loss 7.7215 (7.1192) grad_norm 2.4026 (2.8135) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:29:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][610/625] eta 0:00:06 lr 0.000533 wd 0.0500 time 0.4021 (0.4149) data time 0.0006 (0.0016) model time 0.4015 (0.4136) loss 8.2327 (7.1144) grad_norm 2.4239 (2.8239) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:30:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][620/625] eta 0:00:02 lr 0.000533 wd 0.0500 time 0.3991 (0.4147) data time 0.0006 (0.0016) model time 0.3985 (0.4133) loss 6.1449 (7.1036) grad_norm 2.2777 (2.8244) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:30:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 170 training takes 0:04:19 [2024-07-25 04:30:02 vssd_mesa_retrain_small_e300] (utils.py 118): INFO 
./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 04:30:04 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 04:30:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.443 (0.443) Loss 0.5684 (0.5684) Acc@1 89.160 (89.160) Acc@5 98.389 (98.389) Mem 14939MB [2024-07-25 04:30:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.9058 (0.7057) Acc@1 79.688 (85.316) Acc@5 96.143 (97.483) Mem 14939MB [2024-07-25 04:30:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0469 (0.8332) Acc@1 75.146 (81.999) Acc@5 94.238 (96.157) Mem 14939MB [2024-07-25 04:30:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.696 Acc@5 96.131 [2024-07-25 04:30:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.7% [2024-07-25 04:30:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.779 (0.779) Loss 0.5571 (0.5571) Acc@1 89.795 (89.795) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 04:30:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.156) Loss 0.8843 (0.6943) Acc@1 81.152 (86.146) Acc@5 96.240 (97.625) Mem 14939MB [2024-07-25 04:30:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 1.0186 (0.8155) Acc@1 75.586 (82.796) Acc@5 95.068 (96.401) Mem 14939MB [2024-07-25 04:30:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.388 Acc@5 96.365 [2024-07-25 04:30:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-07-25 04:30:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO 
New max accuracy ema: 82.39% [2024-07-25 04:30:09 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 04:30:10 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 04:30:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][0/625] eta 0:07:42 lr 0.000533 wd 0.0500 time 0.7401 (0.7401) data time 0.3561 (0.3561) model time 0.0000 (0.0000) loss 6.7919 (6.7919) grad_norm 2.9539 (2.9539) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:30:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][10/625] eta 0:04:25 lr 0.000533 wd 0.0500 time 0.4117 (0.4310) data time 0.0006 (0.0332) model time 0.0000 (0.0000) loss 7.5848 (6.8495) grad_norm 3.4144 (2.3908) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:30:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][20/625] eta 0:04:11 lr 0.000533 wd 0.0500 time 0.3986 (0.4154) data time 0.0007 (0.0178) model time 0.0000 (0.0000) loss 7.0524 (6.9674) grad_norm 4.6772 (2.8548) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:30:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][30/625] eta 0:04:03 lr 0.000533 wd 0.0500 time 0.3997 (0.4100) data time 0.0006 (0.0123) model time 0.0000 (0.0000) loss 6.0432 (7.1170) grad_norm 3.0369 (2.9626) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:30:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][40/625] eta 0:03:58 lr 0.000532 wd 0.0500 time 0.3990 (0.4076) data time 0.0007 (0.0095) model time 0.0000 (0.0000) loss 6.1825 (7.1170) grad_norm 3.0296 (3.0034) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:30:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[171/300][50/625] eta 0:03:53 lr 0.000532 wd 0.0500 time 0.3970 (0.4056) data time 0.0009 (0.0078) model time 0.0000 (0.0000) loss 8.1083 (7.1945) grad_norm 2.7408 (3.0091) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:30:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][60/625] eta 0:03:48 lr 0.000532 wd 0.0500 time 0.4213 (0.4047) data time 0.0007 (0.0067) model time 0.4207 (0.3992) loss 7.1767 (7.1964) grad_norm 3.5740 (2.9424) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:30:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][70/625] eta 0:03:44 lr 0.000532 wd 0.0500 time 0.3983 (0.4039) data time 0.0010 (0.0059) model time 0.3973 (0.3987) loss 6.1125 (7.1814) grad_norm 2.7285 (2.9078) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:30:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][80/625] eta 0:03:39 lr 0.000532 wd 0.0500 time 0.3974 (0.4031) data time 0.0010 (0.0053) model time 0.3965 (0.3980) loss 7.2932 (7.1570) grad_norm 3.1325 (2.9013) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:30:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][90/625] eta 0:03:39 lr 0.000532 wd 0.0500 time 0.6033 (0.4103) data time 0.0008 (0.0048) model time 0.6025 (0.4154) loss 7.7866 (7.1868) grad_norm 2.9649 (2.8649) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:30:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][100/625] eta 0:03:41 lr 0.000532 wd 0.0500 time 0.5805 (0.4215) data time 0.0006 (0.0044) model time 0.5798 (0.4367) loss 6.3676 (7.1863) grad_norm 2.3548 (2.8418) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:30:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][110/625] eta 0:03:37 lr 0.000532 wd 0.0500 time 0.3981 (0.4226) data time 0.0006 (0.0041) model time 0.3975 (0.4361) loss 6.0964 (7.1221) grad_norm 
1.9181 (2.9018) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:31:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][120/625] eta 0:03:34 lr 0.000532 wd 0.0500 time 0.4152 (0.4257) data time 0.0008 (0.0039) model time 0.4143 (0.4394) loss 8.0579 (7.1125) grad_norm 2.5954 (2.9020) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:31:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][130/625] eta 0:03:29 lr 0.000531 wd 0.0500 time 0.3963 (0.4236) data time 0.0007 (0.0036) model time 0.3956 (0.4341) loss 6.3852 (7.1263) grad_norm 3.1523 (2.8877) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:31:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][140/625] eta 0:03:24 lr 0.000531 wd 0.0500 time 0.3962 (0.4219) data time 0.0009 (0.0034) model time 0.3953 (0.4302) loss 6.5639 (7.1097) grad_norm 2.0474 (2.8495) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:31:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][150/625] eta 0:03:19 lr 0.000531 wd 0.0500 time 0.3971 (0.4203) data time 0.0006 (0.0033) model time 0.3965 (0.4269) loss 5.9364 (7.0857) grad_norm 3.6007 (2.8185) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:31:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][160/625] eta 0:03:14 lr 0.000531 wd 0.0500 time 0.4028 (0.4191) data time 0.0006 (0.0031) model time 0.4022 (0.4245) loss 7.0920 (7.0821) grad_norm 1.7040 (2.7953) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:31:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][170/625] eta 0:03:10 lr 0.000531 wd 0.0500 time 0.3983 (0.4181) data time 0.0006 (0.0030) model time 0.3977 (0.4225) loss 6.7588 (7.0804) grad_norm 3.9144 (2.8034) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:31:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[171/300][180/625] eta 0:03:05 lr 0.000531 wd 0.0500 time 0.4027 (0.4172) data time 0.0009 (0.0029) model time 0.4018 (0.4208) loss 6.9691 (7.0638) grad_norm 2.4357 (2.8265) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:31:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][190/625] eta 0:03:01 lr 0.000531 wd 0.0500 time 0.3954 (0.4162) data time 0.0013 (0.0028) model time 0.3941 (0.4191) loss 6.9900 (7.0601) grad_norm 2.1303 (2.8489) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:31:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][200/625] eta 0:02:56 lr 0.000531 wd 0.0500 time 0.3774 (0.4163) data time 0.0009 (0.0027) model time 0.3765 (0.4190) loss 7.5811 (7.0736) grad_norm 1.7794 (2.8701) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:31:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][210/625] eta 0:02:52 lr 0.000531 wd 0.0500 time 0.4001 (0.4154) data time 0.0006 (0.0026) model time 0.3994 (0.4176) loss 7.2504 (7.0635) grad_norm 4.1314 (2.8585) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:31:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][220/625] eta 0:02:47 lr 0.000531 wd 0.0500 time 0.3957 (0.4147) data time 0.0008 (0.0025) model time 0.3949 (0.4165) loss 8.2541 (7.0834) grad_norm 2.3926 (2.8554) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:31:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][230/625] eta 0:02:43 lr 0.000530 wd 0.0500 time 0.4021 (0.4140) data time 0.0008 (0.0025) model time 0.4014 (0.4154) loss 6.8957 (7.0647) grad_norm 5.0585 (2.8438) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:31:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][240/625] eta 0:02:39 lr 0.000530 wd 0.0500 time 0.3967 (0.4133) data time 0.0007 (0.0024) model time 0.3960 (0.4145) loss 5.7036 (7.0432) grad_norm 
2.8604 (2.8618) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:31:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][250/625] eta 0:02:34 lr 0.000530 wd 0.0500 time 0.3956 (0.4127) data time 0.0009 (0.0023) model time 0.3947 (0.4136) loss 8.0307 (7.0569) grad_norm 1.6767 (2.8592) loss_scale 512.0000 (259.0598) mem 14939MB [2024-07-25 04:31:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][260/625] eta 0:02:30 lr 0.000530 wd 0.0500 time 0.3973 (0.4122) data time 0.0006 (0.0023) model time 0.3967 (0.4128) loss 7.7480 (7.0564) grad_norm 3.3117 (2.8542) loss_scale 512.0000 (268.7510) mem 14939MB [2024-07-25 04:32:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][270/625] eta 0:02:26 lr 0.000530 wd 0.0500 time 0.3971 (0.4117) data time 0.0009 (0.0022) model time 0.3962 (0.4122) loss 7.6742 (7.0663) grad_norm 2.5846 (2.8422) loss_scale 512.0000 (277.7269) mem 14939MB [2024-07-25 04:32:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][280/625] eta 0:02:21 lr 0.000530 wd 0.0500 time 0.3995 (0.4112) data time 0.0009 (0.0022) model time 0.3987 (0.4116) loss 7.4590 (7.0751) grad_norm 2.1584 (2.8161) loss_scale 512.0000 (286.0641) mem 14939MB [2024-07-25 04:32:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][290/625] eta 0:02:17 lr 0.000530 wd 0.0500 time 0.3991 (0.4109) data time 0.0009 (0.0021) model time 0.3982 (0.4111) loss 5.7951 (7.0698) grad_norm 3.4989 (2.8153) loss_scale 512.0000 (293.8282) mem 14939MB [2024-07-25 04:32:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][300/625] eta 0:02:13 lr 0.000530 wd 0.0500 time 0.6110 (0.4112) data time 0.0009 (0.0021) model time 0.6101 (0.4115) loss 7.2075 (7.0684) grad_norm 3.9473 (2.8338) loss_scale 512.0000 (301.0764) mem 14939MB [2024-07-25 04:32:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[171/300][310/625] eta 0:02:10 lr 0.000530 wd 0.0500 time 0.5945 (0.4135) data time 0.0006 (0.0021) model time 0.5939 (0.4141) loss 7.5761 (7.0686) grad_norm 3.8326 (2.8332) loss_scale 512.0000 (307.8585) mem 14939MB [2024-07-25 04:32:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][320/625] eta 0:02:06 lr 0.000529 wd 0.0500 time 0.5603 (0.4155) data time 0.0008 (0.0020) model time 0.5595 (0.4164) loss 7.9646 (7.0819) grad_norm 2.2169 (2.8245) loss_scale 512.0000 (314.2181) mem 14939MB [2024-07-25 04:32:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][330/625] eta 0:02:02 lr 0.000529 wd 0.0500 time 0.4000 (0.4161) data time 0.0008 (0.0020) model time 0.3992 (0.4171) loss 6.4823 (7.0843) grad_norm 1.5951 (2.8182) loss_scale 512.0000 (320.1934) mem 14939MB [2024-07-25 04:32:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][340/625] eta 0:01:58 lr 0.000529 wd 0.0500 time 0.3973 (0.4166) data time 0.0007 (0.0020) model time 0.3966 (0.4176) loss 8.4888 (7.0891) grad_norm 2.4717 (2.8135) loss_scale 512.0000 (325.8182) mem 14939MB [2024-07-25 04:32:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][350/625] eta 0:01:54 lr 0.000529 wd 0.0500 time 0.3987 (0.4161) data time 0.0008 (0.0020) model time 0.3979 (0.4169) loss 6.7936 (7.0826) grad_norm 2.4145 (2.7976) loss_scale 512.0000 (331.1225) mem 14939MB [2024-07-25 04:32:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][360/625] eta 0:01:50 lr 0.000529 wd 0.0500 time 0.3987 (0.4156) data time 0.0008 (0.0019) model time 0.3978 (0.4164) loss 7.7775 (7.0872) grad_norm 4.1046 (2.7988) loss_scale 512.0000 (336.1330) mem 14939MB [2024-07-25 04:32:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][370/625] eta 0:01:45 lr 0.000529 wd 0.0500 time 0.3975 (0.4152) data time 0.0007 (0.0019) model time 0.3968 (0.4158) loss 7.6768 (7.0906) grad_norm 
2.7877 (2.8836) loss_scale 512.0000 (340.8733) mem 14939MB [2024-07-25 04:32:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][380/625] eta 0:01:41 lr 0.000529 wd 0.0500 time 0.3967 (0.4148) data time 0.0008 (0.0019) model time 0.3959 (0.4152) loss 7.0382 (7.0813) grad_norm 2.0374 (2.8837) loss_scale 512.0000 (345.3648) mem 14939MB [2024-07-25 04:32:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][390/625] eta 0:01:37 lr 0.000529 wd 0.0500 time 0.3961 (0.4144) data time 0.0008 (0.0018) model time 0.3953 (0.4147) loss 7.2789 (7.0868) grad_norm 1.7899 (2.8750) loss_scale 512.0000 (349.6266) mem 14939MB [2024-07-25 04:32:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][400/625] eta 0:01:33 lr 0.000529 wd 0.0500 time 0.4029 (0.4139) data time 0.0008 (0.0018) model time 0.4020 (0.4142) loss 7.2055 (7.0929) grad_norm 2.1662 (2.8688) loss_scale 512.0000 (353.6758) mem 14939MB [2024-07-25 04:33:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][410/625] eta 0:01:28 lr 0.000529 wd 0.0500 time 0.3975 (0.4136) data time 0.0007 (0.0018) model time 0.3968 (0.4137) loss 6.2422 (7.0891) grad_norm 3.3164 (2.8656) loss_scale 512.0000 (357.5280) mem 14939MB [2024-07-25 04:33:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][420/625] eta 0:01:24 lr 0.000528 wd 0.0500 time 0.5946 (0.4137) data time 0.0007 (0.0018) model time 0.5940 (0.4139) loss 7.9857 (7.0964) grad_norm 1.6901 (2.8493) loss_scale 512.0000 (361.1971) mem 14939MB [2024-07-25 04:33:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][430/625] eta 0:01:20 lr 0.000528 wd 0.0500 time 0.4008 (0.4134) data time 0.0008 (0.0018) model time 0.4001 (0.4135) loss 7.2291 (7.0917) grad_norm 4.7675 (2.8450) loss_scale 512.0000 (364.6961) mem 14939MB [2024-07-25 04:33:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[171/300][440/625] eta 0:01:16 lr 0.000528 wd 0.0500 time 0.3996 (0.4131) data time 0.0009 (0.0017) model time 0.3988 (0.4131) loss 8.0162 (7.1026) grad_norm 3.4928 (2.8467) loss_scale 512.0000 (368.0363) mem 14939MB [2024-07-25 04:33:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][450/625] eta 0:01:12 lr 0.000528 wd 0.0500 time 0.3997 (0.4128) data time 0.0007 (0.0017) model time 0.3990 (0.4127) loss 7.9006 (7.1051) grad_norm 1.8817 (2.8493) loss_scale 512.0000 (371.2284) mem 14939MB [2024-07-25 04:33:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][460/625] eta 0:01:08 lr 0.000528 wd 0.0500 time 0.4004 (0.4125) data time 0.0008 (0.0017) model time 0.3996 (0.4124) loss 8.0124 (7.1042) grad_norm 2.7405 (2.8594) loss_scale 512.0000 (374.2820) mem 14939MB [2024-07-25 04:33:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][470/625] eta 0:01:03 lr 0.000528 wd 0.0500 time 0.3996 (0.4122) data time 0.0008 (0.0017) model time 0.3988 (0.4121) loss 5.4091 (7.1020) grad_norm 3.0849 (2.8636) loss_scale 512.0000 (377.2059) mem 14939MB [2024-07-25 04:33:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][480/625] eta 0:00:59 lr 0.000528 wd 0.0500 time 0.3979 (0.4119) data time 0.0009 (0.0017) model time 0.3970 (0.4118) loss 8.2988 (7.1047) grad_norm 2.8925 (2.8598) loss_scale 512.0000 (380.0083) mem 14939MB [2024-07-25 04:33:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][490/625] eta 0:00:55 lr 0.000528 wd 0.0500 time 0.4020 (0.4117) data time 0.0009 (0.0017) model time 0.4012 (0.4115) loss 7.7074 (7.1115) grad_norm 20.0372 (2.8905) loss_scale 512.0000 (382.6965) mem 14939MB [2024-07-25 04:33:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][500/625] eta 0:00:51 lr 0.000528 wd 0.0500 time 0.3984 (0.4115) data time 0.0008 (0.0016) model time 0.3977 (0.4113) loss 7.1858 (7.1075) 
grad_norm 4.3504 (2.8909) loss_scale 512.0000 (385.2774) mem 14939MB [2024-07-25 04:33:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][510/625] eta 0:00:47 lr 0.000527 wd 0.0500 time 0.4008 (0.4113) data time 0.0008 (0.0016) model time 0.4000 (0.4110) loss 6.5642 (7.0991) grad_norm 4.8866 (2.9262) loss_scale 512.0000 (387.7573) mem 14939MB [2024-07-25 04:33:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][520/625] eta 0:00:43 lr 0.000527 wd 0.0500 time 0.5947 (0.4114) data time 0.0007 (0.0016) model time 0.5940 (0.4111) loss 7.3328 (7.1084) grad_norm 3.8179 (2.9418) loss_scale 512.0000 (390.1420) mem 14939MB [2024-07-25 04:33:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][530/625] eta 0:00:39 lr 0.000527 wd 0.0500 time 0.3975 (0.4126) data time 0.0007 (0.0016) model time 0.3968 (0.4125) loss 6.1896 (7.1022) grad_norm 2.5808 (2.9322) loss_scale 512.0000 (392.4369) mem 14939MB [2024-07-25 04:33:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][540/625] eta 0:00:35 lr 0.000527 wd 0.0500 time 0.5788 (0.4140) data time 0.0009 (0.0016) model time 0.5779 (0.4139) loss 6.1653 (7.1023) grad_norm 1.8806 (2.9179) loss_scale 512.0000 (394.6470) mem 14939MB [2024-07-25 04:33:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][550/625] eta 0:00:31 lr 0.000527 wd 0.0500 time 0.4066 (0.4151) data time 0.0006 (0.0016) model time 0.4060 (0.4151) loss 7.9410 (7.0910) grad_norm 2.3122 (2.9229) loss_scale 512.0000 (396.7768) mem 14939MB [2024-07-25 04:34:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][560/625] eta 0:00:27 lr 0.000527 wd 0.0500 time 0.4041 (0.4154) data time 0.0007 (0.0016) model time 0.4034 (0.4154) loss 7.7725 (7.0939) grad_norm 2.7530 (2.9212) loss_scale 512.0000 (398.8307) mem 14939MB [2024-07-25 04:34:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO 
Train: [171/300][570/625] eta 0:00:22 lr 0.000527 wd 0.0500 time 0.3982 (0.4151) data time 0.0007 (0.0016) model time 0.3975 (0.4151) loss 6.6712 (7.0915) grad_norm 1.7233 (2.9094) loss_scale 512.0000 (400.8126) mem 14939MB [2024-07-25 04:34:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][580/625] eta 0:00:18 lr 0.000527 wd 0.0500 time 0.3990 (0.4149) data time 0.0006 (0.0015) model time 0.3983 (0.4148) loss 7.5896 (7.0908) grad_norm 2.5404 (2.9119) loss_scale 512.0000 (402.7263) mem 14939MB [2024-07-25 04:34:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][590/625] eta 0:00:14 lr 0.000527 wd 0.0500 time 0.4063 (0.4146) data time 0.0006 (0.0015) model time 0.4057 (0.4145) loss 7.8689 (7.0901) grad_norm 1.9141 (2.9119) loss_scale 512.0000 (404.5753) mem 14939MB [2024-07-25 04:34:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][600/625] eta 0:00:10 lr 0.000527 wd 0.0500 time 0.3929 (0.4144) data time 0.0009 (0.0015) model time 0.3920 (0.4143) loss 7.5825 (7.0848) grad_norm 2.0053 (2.9114) loss_scale 512.0000 (406.3627) mem 14939MB [2024-07-25 04:34:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][610/625] eta 0:00:06 lr 0.000526 wd 0.0500 time 0.3944 (0.4142) data time 0.0008 (0.0015) model time 0.3936 (0.4140) loss 7.8899 (7.0887) grad_norm 2.2277 (2.9076) loss_scale 512.0000 (408.0917) mem 14939MB [2024-07-25 04:34:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][620/625] eta 0:00:02 lr 0.000526 wd 0.0500 time 0.3975 (0.4139) data time 0.0004 (0.0015) model time 0.3971 (0.4137) loss 6.4664 (7.0888) grad_norm 1.6418 (2.9241) loss_scale 512.0000 (409.7649) mem 14939MB [2024-07-25 04:34:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 171 training takes 0:04:18 [2024-07-25 04:34:29 vssd_mesa_retrain_small_e300] (utils.py 118): INFO 
./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 04:34:30 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 04:34:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.024 (1.024) Loss 0.5859 (0.5859) Acc@1 88.477 (88.477) Acc@5 98.438 (98.438) Mem 14939MB [2024-07-25 04:34:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.171) Loss 0.9312 (0.7193) Acc@1 79.980 (85.587) Acc@5 95.898 (97.514) Mem 14939MB [2024-07-25 04:34:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.130) Loss 1.0518 (0.8500) Acc@1 75.684 (82.178) Acc@5 94.775 (96.119) Mem 14939MB [2024-07-25 04:34:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.762 Acc@5 96.095 [2024-07-25 04:34:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.8% [2024-07-25 04:34:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.770 (0.770) Loss 0.5566 (0.5566) Acc@1 89.844 (89.844) Acc@5 98.730 (98.730) Mem 14939MB [2024-07-25 04:34:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.154) Loss 0.8823 (0.6933) Acc@1 81.201 (86.186) Acc@5 96.191 (97.625) Mem 14939MB [2024-07-25 04:34:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 1.0176 (0.8144) Acc@1 75.684 (82.812) Acc@5 95.117 (96.405) Mem 14939MB [2024-07-25 04:34:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.406 Acc@5 96.373 [2024-07-25 04:34:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-07-25 04:34:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO 
New max accuracy ema: 82.41% [2024-07-25 04:34:36 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 04:34:37 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 04:34:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][0/625] eta 0:07:34 lr 0.000526 wd 0.0500 time 0.7277 (0.7277) data time 0.3472 (0.3472) model time 0.0000 (0.0000) loss 6.3119 (6.3119) grad_norm 2.9399 (2.9399) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:34:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][10/625] eta 0:04:23 lr 0.000526 wd 0.0500 time 0.3993 (0.4291) data time 0.0009 (0.0325) model time 0.0000 (0.0000) loss 7.3363 (6.8786) grad_norm 2.7764 (3.0959) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:34:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][20/625] eta 0:04:11 lr 0.000526 wd 0.0500 time 0.3958 (0.4153) data time 0.0009 (0.0175) model time 0.0000 (0.0000) loss 8.4840 (6.9879) grad_norm 4.0677 (3.1627) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:34:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][30/625] eta 0:04:04 lr 0.000526 wd 0.0500 time 0.4129 (0.4106) data time 0.0006 (0.0121) model time 0.0000 (0.0000) loss 6.3948 (6.9621) grad_norm 2.8949 (3.1155) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:34:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][40/625] eta 0:03:58 lr 0.000526 wd 0.0500 time 0.3948 (0.4079) data time 0.0009 (0.0094) model time 0.0000 (0.0000) loss 6.2538 (6.9974) grad_norm 3.5655 (3.8210) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:34:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[172/300][50/625] eta 0:03:53 lr 0.000526 wd 0.0500 time 0.3977 (0.4058) data time 0.0007 (0.0077) model time 0.0000 (0.0000) loss 7.2774 (7.0086) grad_norm 2.4387 (3.6986) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:35:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][60/625] eta 0:03:48 lr 0.000526 wd 0.0500 time 0.4003 (0.4046) data time 0.0009 (0.0066) model time 0.3995 (0.3977) loss 7.6496 (7.0224) grad_norm 2.3798 (3.4749) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:35:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][70/625] eta 0:03:44 lr 0.000526 wd 0.0500 time 0.3955 (0.4041) data time 0.0009 (0.0058) model time 0.3946 (0.3989) loss 7.9464 (7.0718) grad_norm 2.4988 (3.4041) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:35:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][80/625] eta 0:03:39 lr 0.000525 wd 0.0500 time 0.3963 (0.4034) data time 0.0009 (0.0052) model time 0.3954 (0.3982) loss 7.1665 (7.1026) grad_norm 3.2190 (3.3349) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:35:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][90/625] eta 0:03:35 lr 0.000525 wd 0.0500 time 0.4032 (0.4030) data time 0.0009 (0.0048) model time 0.4023 (0.3983) loss 6.2114 (7.0415) grad_norm 1.8967 (3.2782) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:35:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][100/625] eta 0:03:31 lr 0.000525 wd 0.0500 time 0.3987 (0.4028) data time 0.0009 (0.0044) model time 0.3979 (0.3987) loss 6.6706 (7.0266) grad_norm 2.0757 (3.2233) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:35:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][110/625] eta 0:03:27 lr 0.000525 wd 0.0500 time 0.4053 (0.4025) data time 0.0007 (0.0041) model time 0.4046 (0.3987) loss 6.1502 (7.0322) grad_norm 
2.3213 (3.1490) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:35:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][120/625] eta 0:03:24 lr 0.000525 wd 0.0500 time 0.3988 (0.4054) data time 0.0009 (0.0038) model time 0.3979 (0.4042) loss 7.0812 (7.0510) grad_norm 3.2745 (3.1119) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:35:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][130/625] eta 0:03:24 lr 0.000525 wd 0.0500 time 0.5910 (0.4122) data time 0.0009 (0.0036) model time 0.5901 (0.4153) loss 7.0428 (7.0533) grad_norm 2.0698 (3.1050) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:35:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][140/625] eta 0:03:23 lr 0.000525 wd 0.0500 time 0.5984 (0.4188) data time 0.0007 (0.0034) model time 0.5977 (0.4252) loss 7.5310 (7.0387) grad_norm 2.2300 (3.0821) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:35:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][150/625] eta 0:03:19 lr 0.000525 wd 0.0500 time 0.5912 (0.4200) data time 0.0007 (0.0032) model time 0.5905 (0.4262) loss 7.8373 (7.0420) grad_norm 1.8629 (3.0326) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:35:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][160/625] eta 0:03:15 lr 0.000525 wd 0.0500 time 0.3965 (0.4197) data time 0.0009 (0.0031) model time 0.3956 (0.4252) loss 6.8227 (7.0281) grad_norm 3.1574 (2.9942) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:35:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][170/625] eta 0:03:10 lr 0.000524 wd 0.0500 time 0.3977 (0.4186) data time 0.0006 (0.0030) model time 0.3971 (0.4231) loss 8.8349 (7.0077) grad_norm 2.8001 (2.9650) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:35:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[172/300][180/625] eta 0:03:05 lr 0.000524 wd 0.0500 time 0.3873 (0.4174) data time 0.0009 (0.0028) model time 0.3865 (0.4210) loss 6.8622 (7.0276) grad_norm inf (inf) loss_scale 256.0000 (510.5856) mem 14939MB [2024-07-25 04:35:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][190/625] eta 0:03:01 lr 0.000524 wd 0.0500 time 0.4013 (0.4174) data time 0.0008 (0.0027) model time 0.4005 (0.4207) loss 6.5249 (7.0086) grad_norm 2.2707 (inf) loss_scale 256.0000 (497.2565) mem 14939MB [2024-07-25 04:36:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][200/625] eta 0:02:56 lr 0.000524 wd 0.0500 time 0.3959 (0.4165) data time 0.0007 (0.0027) model time 0.3952 (0.4192) loss 7.5274 (7.0296) grad_norm 1.7935 (inf) loss_scale 256.0000 (485.2537) mem 14939MB [2024-07-25 04:36:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][210/625] eta 0:02:52 lr 0.000524 wd 0.0500 time 0.4007 (0.4157) data time 0.0009 (0.0026) model time 0.3998 (0.4180) loss 7.4327 (7.0264) grad_norm 2.7949 (inf) loss_scale 256.0000 (474.3886) mem 14939MB [2024-07-25 04:36:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][220/625] eta 0:02:48 lr 0.000524 wd 0.0500 time 0.4009 (0.4150) data time 0.0006 (0.0025) model time 0.4003 (0.4168) loss 7.1218 (7.0299) grad_norm 2.2697 (inf) loss_scale 256.0000 (464.5068) mem 14939MB [2024-07-25 04:36:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][230/625] eta 0:02:43 lr 0.000524 wd 0.0500 time 0.4011 (0.4143) data time 0.0006 (0.0024) model time 0.4005 (0.4158) loss 7.3764 (7.0466) grad_norm 2.1850 (inf) loss_scale 256.0000 (455.4805) mem 14939MB [2024-07-25 04:36:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][240/625] eta 0:02:39 lr 0.000524 wd 0.0500 time 0.3993 (0.4137) data time 0.0007 (0.0024) model time 0.3987 (0.4148) loss 7.1349 (7.0559) grad_norm 2.6675 (inf) 
loss_scale 256.0000 (447.2033) mem 14939MB [2024-07-25 04:36:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][250/625] eta 0:02:34 lr 0.000524 wd 0.0500 time 0.3983 (0.4131) data time 0.0006 (0.0023) model time 0.3976 (0.4140) loss 6.5980 (7.0498) grad_norm 13.2848 (inf) loss_scale 256.0000 (439.5857) mem 14939MB [2024-07-25 04:36:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][260/625] eta 0:02:30 lr 0.000524 wd 0.0500 time 0.4064 (0.4126) data time 0.0006 (0.0023) model time 0.4058 (0.4133) loss 7.8945 (7.0614) grad_norm 2.0317 (inf) loss_scale 256.0000 (432.5517) mem 14939MB [2024-07-25 04:36:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][270/625] eta 0:02:26 lr 0.000523 wd 0.0500 time 0.4011 (0.4121) data time 0.0009 (0.0022) model time 0.4002 (0.4126) loss 7.7742 (7.0562) grad_norm 2.6338 (inf) loss_scale 256.0000 (426.0369) mem 14939MB [2024-07-25 04:36:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][280/625] eta 0:02:22 lr 0.000523 wd 0.0500 time 0.4026 (0.4116) data time 0.0006 (0.0022) model time 0.4020 (0.4120) loss 7.0650 (7.0757) grad_norm 2.9718 (inf) loss_scale 256.0000 (419.9858) mem 14939MB [2024-07-25 04:36:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][290/625] eta 0:02:17 lr 0.000523 wd 0.0500 time 0.3976 (0.4112) data time 0.0009 (0.0021) model time 0.3967 (0.4114) loss 5.7786 (7.0830) grad_norm 2.4787 (inf) loss_scale 256.0000 (414.3505) mem 14939MB [2024-07-25 04:36:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][300/625] eta 0:02:13 lr 0.000523 wd 0.0500 time 0.3976 (0.4108) data time 0.0007 (0.0021) model time 0.3969 (0.4109) loss 6.1712 (7.0908) grad_norm 2.9788 (inf) loss_scale 256.0000 (409.0897) mem 14939MB [2024-07-25 04:36:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][310/625] eta 0:02:09 lr 
0.000523 wd 0.0500 time 0.4195 (0.4105) data time 0.0006 (0.0020) model time 0.4188 (0.4105) loss 7.7936 (7.0989) grad_norm 3.1559 (inf) loss_scale 256.0000 (404.1672) mem 14939MB [2024-07-25 04:36:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][320/625] eta 0:02:05 lr 0.000523 wd 0.0500 time 0.3989 (0.4102) data time 0.0008 (0.0020) model time 0.3980 (0.4100) loss 6.3786 (7.0995) grad_norm 2.1426 (inf) loss_scale 256.0000 (399.5514) mem 14939MB [2024-07-25 04:36:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][330/625] eta 0:02:00 lr 0.000523 wd 0.0500 time 0.3968 (0.4098) data time 0.0008 (0.0020) model time 0.3960 (0.4096) loss 7.6454 (7.0945) grad_norm 3.2363 (inf) loss_scale 256.0000 (395.2145) mem 14939MB [2024-07-25 04:36:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][340/625] eta 0:01:57 lr 0.000523 wd 0.0500 time 0.5963 (0.4107) data time 0.0009 (0.0020) model time 0.5954 (0.4106) loss 7.3205 (7.0919) grad_norm 3.1224 (inf) loss_scale 256.0000 (391.1320) mem 14939MB [2024-07-25 04:37:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][350/625] eta 0:01:53 lr 0.000523 wd 0.0500 time 0.6027 (0.4123) data time 0.0009 (0.0019) model time 0.6018 (0.4125) loss 7.7723 (7.0832) grad_norm 3.8545 (inf) loss_scale 256.0000 (387.2821) mem 14939MB [2024-07-25 04:37:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][360/625] eta 0:01:49 lr 0.000522 wd 0.0500 time 0.5589 (0.4150) data time 0.0007 (0.0019) model time 0.5583 (0.4156) loss 7.5217 (7.0856) grad_norm 2.3392 (inf) loss_scale 256.0000 (383.6454) mem 14939MB [2024-07-25 04:37:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][370/625] eta 0:01:46 lr 0.000522 wd 0.0500 time 0.5832 (0.4166) data time 0.0007 (0.0019) model time 0.5825 (0.4174) loss 6.1168 (7.0694) grad_norm 1.9095 (inf) loss_scale 256.0000 (380.2049) mem 
14939MB [2024-07-25 04:37:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][380/625] eta 0:01:41 lr 0.000522 wd 0.0500 time 0.3964 (0.4161) data time 0.0008 (0.0019) model time 0.3955 (0.4168) loss 6.3347 (7.0607) grad_norm 2.3800 (inf) loss_scale 256.0000 (376.9449) mem 14939MB [2024-07-25 04:37:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][390/625] eta 0:01:37 lr 0.000522 wd 0.0500 time 0.3979 (0.4156) data time 0.0008 (0.0018) model time 0.3971 (0.4162) loss 7.8321 (7.0664) grad_norm 2.0117 (inf) loss_scale 256.0000 (373.8517) mem 14939MB [2024-07-25 04:37:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][400/625] eta 0:01:33 lr 0.000522 wd 0.0500 time 0.3976 (0.4153) data time 0.0008 (0.0018) model time 0.3967 (0.4157) loss 7.1319 (7.0724) grad_norm 2.3788 (inf) loss_scale 256.0000 (370.9127) mem 14939MB [2024-07-25 04:37:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][410/625] eta 0:01:29 lr 0.000522 wd 0.0500 time 0.3974 (0.4154) data time 0.0006 (0.0018) model time 0.3968 (0.4158) loss 7.4475 (7.0701) grad_norm 2.0357 (inf) loss_scale 256.0000 (368.1168) mem 14939MB [2024-07-25 04:37:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][420/625] eta 0:01:25 lr 0.000522 wd 0.0500 time 0.4022 (0.4150) data time 0.0006 (0.0018) model time 0.4015 (0.4153) loss 5.7680 (7.0692) grad_norm 2.0689 (inf) loss_scale 256.0000 (365.4537) mem 14939MB [2024-07-25 04:37:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][430/625] eta 0:01:20 lr 0.000522 wd 0.0500 time 0.3982 (0.4145) data time 0.0007 (0.0017) model time 0.3975 (0.4148) loss 6.6704 (7.0705) grad_norm 2.0365 (inf) loss_scale 256.0000 (362.9142) mem 14939MB [2024-07-25 04:37:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][440/625] eta 0:01:16 lr 0.000522 wd 0.0500 time 0.4040 (0.4142) 
data time 0.0009 (0.0017) model time 0.4031 (0.4143) loss 7.9533 (7.0748) grad_norm 1.8944 (inf) loss_scale 256.0000 (360.4898) mem 14939MB [2024-07-25 04:37:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][450/625] eta 0:01:12 lr 0.000522 wd 0.0500 time 0.4123 (0.4139) data time 0.0006 (0.0017) model time 0.4118 (0.4139) loss 7.6399 (7.0734) grad_norm 2.7409 (inf) loss_scale 256.0000 (358.1729) mem 14939MB [2024-07-25 04:37:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][460/625] eta 0:01:08 lr 0.000521 wd 0.0500 time 0.3966 (0.4135) data time 0.0008 (0.0017) model time 0.3958 (0.4135) loss 6.7803 (7.0744) grad_norm 1.9227 (inf) loss_scale 256.0000 (355.9566) mem 14939MB [2024-07-25 04:37:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][470/625] eta 0:01:04 lr 0.000521 wd 0.0500 time 0.3972 (0.4132) data time 0.0008 (0.0017) model time 0.3964 (0.4132) loss 8.0998 (7.0773) grad_norm 1.7783 (inf) loss_scale 256.0000 (353.8344) mem 14939MB [2024-07-25 04:37:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][480/625] eta 0:00:59 lr 0.000521 wd 0.0500 time 0.4265 (0.4130) data time 0.0007 (0.0017) model time 0.4257 (0.4129) loss 6.8936 (7.0742) grad_norm 3.1671 (inf) loss_scale 256.0000 (351.8004) mem 14939MB [2024-07-25 04:37:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][490/625] eta 0:00:55 lr 0.000521 wd 0.0500 time 0.3965 (0.4127) data time 0.0007 (0.0016) model time 0.3959 (0.4125) loss 5.7495 (7.0718) grad_norm 3.1838 (inf) loss_scale 256.0000 (349.8493) mem 14939MB [2024-07-25 04:38:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][500/625] eta 0:00:51 lr 0.000521 wd 0.0500 time 0.3981 (0.4124) data time 0.0007 (0.0016) model time 0.3974 (0.4123) loss 6.2230 (7.0705) grad_norm 2.4250 (inf) loss_scale 256.0000 (347.9760) mem 14939MB [2024-07-25 04:38:07 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][510/625] eta 0:00:47 lr 0.000521 wd 0.0500 time 0.4114 (0.4122) data time 0.0009 (0.0016) model time 0.4106 (0.4120) loss 7.9725 (7.0682) grad_norm 1.6968 (inf) loss_scale 256.0000 (346.1761) mem 14939MB [2024-07-25 04:38:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][520/625] eta 0:00:43 lr 0.000521 wd 0.0500 time 0.3937 (0.4120) data time 0.0007 (0.0016) model time 0.3929 (0.4117) loss 6.8080 (7.0756) grad_norm 2.7106 (inf) loss_scale 256.0000 (344.4453) mem 14939MB [2024-07-25 04:38:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][530/625] eta 0:00:39 lr 0.000521 wd 0.0500 time 0.3986 (0.4117) data time 0.0009 (0.0016) model time 0.3978 (0.4114) loss 7.1879 (7.0824) grad_norm 5.5694 (inf) loss_scale 256.0000 (342.7797) mem 14939MB [2024-07-25 04:38:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][540/625] eta 0:00:34 lr 0.000521 wd 0.0500 time 0.3997 (0.4114) data time 0.0006 (0.0016) model time 0.3991 (0.4111) loss 6.2686 (7.0837) grad_norm 3.0740 (inf) loss_scale 256.0000 (341.1756) mem 14939MB [2024-07-25 04:38:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][550/625] eta 0:00:30 lr 0.000520 wd 0.0500 time 0.3967 (0.4112) data time 0.0007 (0.0016) model time 0.3960 (0.4108) loss 7.3163 (7.0883) grad_norm 2.5924 (inf) loss_scale 256.0000 (339.6298) mem 14939MB [2024-07-25 04:38:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][560/625] eta 0:00:26 lr 0.000520 wd 0.0500 time 0.4007 (0.4113) data time 0.0007 (0.0016) model time 0.4000 (0.4109) loss 7.0217 (7.0907) grad_norm 1.9148 (inf) loss_scale 256.0000 (338.1390) mem 14939MB [2024-07-25 04:38:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][570/625] eta 0:00:22 lr 0.000520 wd 0.0500 time 0.5911 (0.4127) data time 0.0008 (0.0015) model 
time 0.5903 (0.4125) loss 5.5455 (7.0906) grad_norm 2.9567 (inf) loss_scale 256.0000 (336.7005) mem 14939MB [2024-07-25 04:38:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][580/625] eta 0:00:18 lr 0.000520 wd 0.0500 time 0.5718 (0.4144) data time 0.0007 (0.0015) model time 0.5712 (0.4143) loss 7.3519 (7.0915) grad_norm 2.3862 (inf) loss_scale 256.0000 (335.3115) mem 14939MB [2024-07-25 04:38:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][590/625] eta 0:00:14 lr 0.000520 wd 0.0500 time 0.5949 (0.4156) data time 0.0006 (0.0015) model time 0.5942 (0.4156) loss 7.2922 (7.0931) grad_norm 3.0735 (inf) loss_scale 256.0000 (333.9695) mem 14939MB [2024-07-25 04:38:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][600/625] eta 0:00:10 lr 0.000520 wd 0.0500 time 0.3984 (0.4154) data time 0.0009 (0.0015) model time 0.3975 (0.4153) loss 7.2702 (7.0977) grad_norm 2.5334 (inf) loss_scale 256.0000 (332.6722) mem 14939MB [2024-07-25 04:38:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][610/625] eta 0:00:06 lr 0.000520 wd 0.0500 time 0.3962 (0.4151) data time 0.0006 (0.0015) model time 0.3956 (0.4150) loss 6.9932 (7.0989) grad_norm 2.4052 (inf) loss_scale 256.0000 (331.4173) mem 14939MB [2024-07-25 04:38:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][620/625] eta 0:00:02 lr 0.000520 wd 0.0500 time 0.4044 (0.4149) data time 0.0006 (0.0015) model time 0.4038 (0.4147) loss 6.9554 (7.0953) grad_norm 4.6161 (inf) loss_scale 256.0000 (330.2029) mem 14939MB [2024-07-25 04:38:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 172 training takes 0:04:19 [2024-07-25 04:38:56 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 04:38:57 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 04:38:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.459 (0.459) Loss 0.5825 (0.5825) Acc@1 88.965 (88.965) Acc@5 98.486 (98.486) Mem 14939MB [2024-07-25 04:38:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.9023 (0.7133) Acc@1 80.273 (85.636) Acc@5 95.898 (97.505) Mem 14939MB [2024-07-25 04:38:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0391 (0.8391) Acc@1 75.830 (82.280) Acc@5 94.824 (96.250) Mem 14939MB [2024-07-25 04:38:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.902 Acc@5 96.215 [2024-07-25 04:38:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-07-25 04:38:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.90% [2024-07-25 04:38:59 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 04:39:00 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 04:39:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.452 (0.452) Loss 0.5557 (0.5557) Acc@1 89.893 (89.893) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 04:39:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.8809 (0.6926) Acc@1 81.348 (86.248) Acc@5 96.191 (97.643) Mem 14939MB [2024-07-25 04:39:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.103) Loss 1.0176 (0.8134) Acc@1 75.586 (82.864) Acc@5 95.068 (96.419) Mem 14939MB [2024-07-25 04:39:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.444 Acc@5 96.377 [2024-07-25 04:39:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-07-25 04:39:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.44% [2024-07-25 04:39:03 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 04:39:04 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 04:39:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][0/625] eta 0:07:48 lr 0.000520 wd 0.0500 time 0.7500 (0.7500) data time 0.3720 (0.3720) model time 0.0000 (0.0000) loss 6.8564 (6.8564) grad_norm 2.8243 (2.8243) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:39:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][10/625] eta 0:04:25 lr 0.000520 wd 0.0500 time 0.3931 (0.4324) data time 0.0008 (0.0346) model time 0.0000 (0.0000) loss 5.7496 (6.9123) grad_norm 2.1944 (2.3935) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:39:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][20/625] eta 0:04:12 lr 0.000519 wd 0.0500 time 0.3975 (0.4165) data time 0.0007 (0.0185) model time 0.0000 (0.0000) loss 7.6159 (7.0891) grad_norm 1.8654 (2.6955) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:39:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][30/625] eta 0:04:04 lr 0.000519 wd 0.0500 time 0.3990 (0.4112) data time 0.0009 (0.0128) model time 0.0000 (0.0000) loss 6.5556 (7.1351) grad_norm 2.5428 (3.3562) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:39:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][40/625] eta 0:03:58 lr 0.000519 wd 0.0500 time 0.3977 (0.4085) data time 0.0006 (0.0099) model time 0.0000 (0.0000) loss 7.5957 (7.1258) grad_norm 2.6863 (3.1521) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:39:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][50/625] eta 0:03:54 lr 0.000519 wd 0.0500 time 0.3967 (0.4070) data time 0.0009 (0.0081) model time 0.0000 (0.0000) loss 6.8933 (7.1040) grad_norm 2.4518 (3.0029) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:39:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][60/625] eta 0:03:49 lr 0.000519 wd 0.0500 time 0.3978 (0.4059) 
data time 0.0006 (0.0069) model time 0.3972 (0.3995) loss 6.1324 (7.0913) grad_norm 3.1213 (2.9222) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:39:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][70/625] eta 0:03:44 lr 0.000519 wd 0.0500 time 0.3962 (0.4052) data time 0.0008 (0.0061) model time 0.3953 (0.3998) loss 7.4080 (7.0937) grad_norm 24.7220 (3.1617) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:39:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][80/625] eta 0:03:40 lr 0.000519 wd 0.0500 time 0.3965 (0.4043) data time 0.0010 (0.0055) model time 0.3956 (0.3989) loss 7.6381 (7.1054) grad_norm 2.1651 (3.2234) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:39:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][90/625] eta 0:03:36 lr 0.000519 wd 0.0500 time 0.4086 (0.4038) data time 0.0008 (0.0050) model time 0.4078 (0.3988) loss 6.0825 (7.0629) grad_norm 2.9942 (3.2028) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:39:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][100/625] eta 0:03:31 lr 0.000519 wd 0.0500 time 0.3981 (0.4032) data time 0.0009 (0.0046) model time 0.3972 (0.3984) loss 7.6862 (7.0680) grad_norm 2.9191 (3.1571) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:39:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][110/625] eta 0:03:27 lr 0.000519 wd 0.0500 time 0.3962 (0.4027) data time 0.0009 (0.0043) model time 0.3953 (0.3980) loss 7.0618 (7.0897) grad_norm 4.3194 (3.1279) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:39:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][120/625] eta 0:03:23 lr 0.000518 wd 0.0500 time 0.3977 (0.4033) data time 0.0007 (0.0040) model time 0.3970 (0.3997) loss 6.2265 (7.0934) grad_norm 4.0004 (3.0913) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
04:39:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][130/625] eta 0:03:19 lr 0.000518 wd 0.0500 time 0.3987 (0.4029) data time 0.0009 (0.0037) model time 0.3978 (0.3993) loss 7.0535 (7.0851) grad_norm 2.9339 (3.0563) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:40:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][140/625] eta 0:03:15 lr 0.000518 wd 0.0500 time 0.3968 (0.4025) data time 0.0007 (0.0035) model time 0.3961 (0.3990) loss 7.6523 (7.1218) grad_norm 3.3147 (3.0053) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:40:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][150/625] eta 0:03:11 lr 0.000518 wd 0.0500 time 0.3975 (0.4024) data time 0.0007 (0.0034) model time 0.3967 (0.3991) loss 8.5245 (7.1370) grad_norm 4.5004 (2.9795) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:40:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][160/625] eta 0:03:09 lr 0.000518 wd 0.0500 time 0.3990 (0.4068) data time 0.0009 (0.0032) model time 0.3981 (0.4057) loss 6.1614 (7.1417) grad_norm 3.5111 (2.9427) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:40:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][170/625] eta 0:03:07 lr 0.000518 wd 0.0500 time 0.3988 (0.4117) data time 0.0009 (0.0031) model time 0.3979 (0.4128) loss 7.9183 (7.1545) grad_norm 7.0047 (3.0341) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:40:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][180/625] eta 0:03:04 lr 0.000518 wd 0.0500 time 0.3985 (0.4154) data time 0.0007 (0.0030) model time 0.3979 (0.4178) loss 6.0771 (7.1319) grad_norm 3.6068 (3.0129) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:40:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][190/625] eta 0:03:01 lr 0.000518 wd 0.0500 time 0.4011 (0.4172) data 
time 0.0009 (0.0029) model time 0.4002 (0.4199) loss 6.9959 (7.1151) grad_norm 2.5598 (3.0408) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:40:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][200/625] eta 0:02:56 lr 0.000518 wd 0.0500 time 0.3976 (0.4163) data time 0.0009 (0.0028) model time 0.3968 (0.4185) loss 6.1176 (7.1106) grad_norm 3.3937 (3.0349) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:40:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][210/625] eta 0:02:52 lr 0.000517 wd 0.0500 time 0.4009 (0.4155) data time 0.0007 (0.0027) model time 0.4002 (0.4172) loss 6.4477 (7.0940) grad_norm 2.4656 (3.0196) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:40:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][220/625] eta 0:02:47 lr 0.000517 wd 0.0500 time 0.3988 (0.4147) data time 0.0007 (0.0026) model time 0.3982 (0.4161) loss 7.5448 (7.1005) grad_norm 3.7351 (2.9867) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:40:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][230/625] eta 0:02:43 lr 0.000517 wd 0.0500 time 0.3966 (0.4140) data time 0.0008 (0.0025) model time 0.3958 (0.4150) loss 7.9293 (7.0936) grad_norm 2.0817 (2.9460) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:40:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][240/625] eta 0:02:39 lr 0.000517 wd 0.0500 time 0.4009 (0.4134) data time 0.0008 (0.0025) model time 0.4001 (0.4142) loss 6.6557 (7.0888) grad_norm 1.8654 (2.9101) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:40:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][250/625] eta 0:02:34 lr 0.000517 wd 0.0500 time 0.3997 (0.4129) data time 0.0009 (0.0024) model time 0.3988 (0.4134) loss 7.6900 (7.0842) grad_norm 3.0451 (2.9017) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
04:40:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][260/625] eta 0:02:30 lr 0.000517 wd 0.0500 time 0.3975 (0.4124) data time 0.0006 (0.0024) model time 0.3968 (0.4128) loss 6.7778 (7.0946) grad_norm 2.1624 (2.8849) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:40:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][270/625] eta 0:02:26 lr 0.000517 wd 0.0500 time 0.3994 (0.4120) data time 0.0007 (0.0023) model time 0.3987 (0.4122) loss 6.6218 (7.0970) grad_norm 2.0383 (2.8838) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:40:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][280/625] eta 0:02:21 lr 0.000517 wd 0.0500 time 0.4000 (0.4116) data time 0.0009 (0.0023) model time 0.3991 (0.4116) loss 6.1894 (7.0957) grad_norm 4.3201 (2.8817) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:41:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][290/625] eta 0:02:17 lr 0.000517 wd 0.0500 time 0.4018 (0.4112) data time 0.0011 (0.0022) model time 0.4007 (0.4111) loss 7.9835 (7.0982) grad_norm 2.4680 (2.8695) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:41:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][300/625] eta 0:02:13 lr 0.000517 wd 0.0500 time 0.3987 (0.4108) data time 0.0007 (0.0022) model time 0.3980 (0.4106) loss 5.5309 (7.0843) grad_norm 8.5455 (2.8868) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:41:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][310/625] eta 0:02:09 lr 0.000516 wd 0.0500 time 0.3983 (0.4105) data time 0.0007 (0.0021) model time 0.3976 (0.4102) loss 6.2922 (7.0773) grad_norm 2.9606 (2.8910) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:41:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][320/625] eta 0:02:05 lr 0.000516 wd 0.0500 time 0.3940 (0.4102) data 
time 0.0009 (0.0021) model time 0.3930 (0.4099) loss 7.3107 (7.0774) grad_norm 3.2022 (2.8810) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:41:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][330/625] eta 0:02:00 lr 0.000516 wd 0.0500 time 0.4021 (0.4099) data time 0.0009 (0.0021) model time 0.4012 (0.4095) loss 6.2750 (7.0753) grad_norm 1.9573 (2.8670) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:41:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][340/625] eta 0:01:56 lr 0.000516 wd 0.0500 time 0.4006 (0.4101) data time 0.0009 (0.0020) model time 0.3997 (0.4097) loss 6.7812 (7.0777) grad_norm 2.2801 (2.8584) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:41:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][350/625] eta 0:01:52 lr 0.000516 wd 0.0500 time 0.3924 (0.4098) data time 0.0008 (0.0020) model time 0.3916 (0.4093) loss 6.0180 (7.0862) grad_norm 2.8858 (2.8569) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:41:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][360/625] eta 0:01:48 lr 0.000516 wd 0.0500 time 0.3970 (0.4095) data time 0.0011 (0.0020) model time 0.3959 (0.4089) loss 6.5939 (7.0909) grad_norm 2.0024 (2.8497) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:41:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][370/625] eta 0:01:44 lr 0.000516 wd 0.0500 time 0.3995 (0.4092) data time 0.0009 (0.0019) model time 0.3986 (0.4086) loss 6.6548 (7.0887) grad_norm 4.7682 (2.8716) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:41:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][380/625] eta 0:01:40 lr 0.000516 wd 0.0500 time 0.5952 (0.4108) data time 0.0008 (0.0019) model time 0.5943 (0.4104) loss 7.5874 (7.0893) grad_norm 3.1013 (2.8780) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
04:41:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][390/625] eta 0:01:36 lr 0.000516 wd 0.0500 time 0.3979 (0.4124) data time 0.0008 (0.0019) model time 0.3971 (0.4122) loss 6.8491 (7.0878) grad_norm 2.1858 (2.8862) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:41:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][400/625] eta 0:01:33 lr 0.000515 wd 0.0500 time 0.5376 (0.4145) data time 0.0009 (0.0019) model time 0.5367 (0.4147) loss 6.8911 (7.0908) grad_norm 2.4472 (2.8734) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:41:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][410/625] eta 0:01:29 lr 0.000515 wd 0.0500 time 0.3976 (0.4152) data time 0.0008 (0.0018) model time 0.3969 (0.4154) loss 6.4393 (7.0951) grad_norm 2.7115 (2.8918) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:41:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][420/625] eta 0:01:25 lr 0.000515 wd 0.0500 time 0.4113 (0.4149) data time 0.0006 (0.0018) model time 0.4106 (0.4150) loss 6.1275 (7.0925) grad_norm 1.8716 (2.8739) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:42:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][430/625] eta 0:01:20 lr 0.000515 wd 0.0500 time 0.3972 (0.4145) data time 0.0007 (0.0018) model time 0.3965 (0.4146) loss 6.8638 (7.0874) grad_norm 2.5754 (2.8619) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:42:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][440/625] eta 0:01:16 lr 0.000515 wd 0.0500 time 0.4150 (0.4142) data time 0.0008 (0.0018) model time 0.4142 (0.4142) loss 7.4795 (7.0761) grad_norm 3.2865 (2.8497) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:42:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][450/625] eta 0:01:12 lr 0.000515 wd 0.0500 time 0.3893 (0.4139) data 
time 0.0007 (0.0018) model time 0.3886 (0.4138) loss 5.9043 (7.0698) grad_norm 2.0638 (2.8417) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:42:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][460/625] eta 0:01:08 lr 0.000515 wd 0.0500 time 0.3978 (0.4136) data time 0.0007 (0.0017) model time 0.3971 (0.4134) loss 6.7823 (7.0706) grad_norm 3.2421 (2.8769) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:42:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][470/625] eta 0:01:04 lr 0.000515 wd 0.0500 time 0.3995 (0.4132) data time 0.0007 (0.0017) model time 0.3988 (0.4131) loss 7.5257 (7.0776) grad_norm 3.3004 (2.9049) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:42:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][480/625] eta 0:00:59 lr 0.000515 wd 0.0500 time 0.3987 (0.4130) data time 0.0009 (0.0017) model time 0.3978 (0.4128) loss 6.4063 (7.0763) grad_norm 2.3319 (2.9193) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:42:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][490/625] eta 0:00:55 lr 0.000515 wd 0.0500 time 0.3965 (0.4127) data time 0.0007 (0.0017) model time 0.3958 (0.4125) loss 6.3535 (7.0642) grad_norm 1.8589 (2.9141) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:42:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][500/625] eta 0:00:51 lr 0.000514 wd 0.0500 time 0.4079 (0.4125) data time 0.0007 (0.0017) model time 0.4072 (0.4122) loss 6.9722 (7.0659) grad_norm 3.3999 (2.9103) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:42:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][510/625] eta 0:00:47 lr 0.000514 wd 0.0500 time 0.3989 (0.4123) data time 0.0008 (0.0017) model time 0.3981 (0.4119) loss 6.8246 (7.0608) grad_norm 1.8241 (2.9045) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
04:42:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][520/625] eta 0:00:43 lr 0.000514 wd 0.0500 time 0.3962 (0.4120) data time 0.0008 (0.0016) model time 0.3954 (0.4116) loss 7.9567 (7.0669) grad_norm 2.4742 (2.8891) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:42:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][530/625] eta 0:00:39 lr 0.000514 wd 0.0500 time 0.4004 (0.4118) data time 0.0008 (0.0016) model time 0.3996 (0.4114) loss 8.2872 (7.0599) grad_norm 2.0316 (2.8771) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:42:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][540/625] eta 0:00:34 lr 0.000514 wd 0.0500 time 0.4003 (0.4116) data time 0.0009 (0.0016) model time 0.3994 (0.4111) loss 7.5159 (7.0592) grad_norm 3.4219 (2.8834) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:42:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][550/625] eta 0:00:30 lr 0.000514 wd 0.0500 time 0.3964 (0.4114) data time 0.0007 (0.0016) model time 0.3957 (0.4109) loss 8.1880 (7.0595) grad_norm 1.7522 (2.8791) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:42:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][560/625] eta 0:00:26 lr 0.000514 wd 0.0500 time 0.4002 (0.4115) data time 0.0008 (0.0016) model time 0.3993 (0.4110) loss 7.5121 (7.0622) grad_norm 1.5275 (2.8630) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:42:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][570/625] eta 0:00:22 lr 0.000514 wd 0.0500 time 0.4019 (0.4112) data time 0.0008 (0.0016) model time 0.4011 (0.4107) loss 6.7325 (7.0622) grad_norm 2.0406 (2.8517) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:43:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][580/625] eta 0:00:18 lr 0.000514 wd 0.0500 time 0.3978 (0.4111) data 
time 0.0008 (0.0016) model time 0.3970 (0.4105) loss 7.4592 (7.0606) grad_norm 2.5563 (2.8417) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:43:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][590/625] eta 0:00:14 lr 0.000513 wd 0.0500 time 0.4001 (0.4109) data time 0.0006 (0.0016) model time 0.3994 (0.4103) loss 6.4954 (7.0641) grad_norm 1.7937 (2.8377) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:43:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][600/625] eta 0:00:10 lr 0.000513 wd 0.0500 time 0.5980 (0.4119) data time 0.0007 (0.0016) model time 0.5974 (0.4114) loss 7.4539 (7.0659) grad_norm 1.9279 (2.8356) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:43:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][610/625] eta 0:00:06 lr 0.000513 wd 0.0500 time 0.5853 (0.4134) data time 0.0004 (0.0016) model time 0.5850 (0.4131) loss 6.0063 (7.0633) grad_norm 3.8857 (2.8358) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:43:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][620/625] eta 0:00:02 lr 0.000513 wd 0.0500 time 0.5809 (0.4147) data time 0.0004 (0.0015) model time 0.5806 (0.4144) loss 5.8873 (7.0588) grad_norm 4.6365 (2.8440) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:43:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 173 training takes 0:04:19 [2024-07-25 04:43:23 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 04:43:24 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 04:43:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.453 (0.453) Loss 0.5718 (0.5718) Acc@1 89.160 (89.160) Acc@5 98.584 (98.584) Mem 14939MB [2024-07-25 04:43:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.122) Loss 0.9360 (0.7142) Acc@1 79.639 (85.538) Acc@5 95.850 (97.461) Mem 14939MB [2024-07-25 04:43:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 1.0127 (0.8371) Acc@1 76.904 (82.382) Acc@5 94.629 (96.159) Mem 14939MB [2024-07-25 04:43:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.944 Acc@5 96.147 [2024-07-25 04:43:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-07-25 04:43:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.94% [2024-07-25 04:43:26 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 04:43:27 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 04:43:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.459 (0.459) Loss 0.5552 (0.5552) Acc@1 89.941 (89.941) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 04:43:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8804 (0.6922) Acc@1 81.250 (86.248) Acc@5 96.191 (97.647) Mem 14939MB [2024-07-25 04:43:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 1.0166 (0.8127) Acc@1 75.830 (82.892) Acc@5 95.068 (96.436) Mem 14939MB [2024-07-25 04:43:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.482 Acc@5 96.393 [2024-07-25 04:43:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.5% [2024-07-25 04:43:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.48% [2024-07-25 04:43:30 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 04:43:31 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 04:43:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][0/625] eta 0:08:22 lr 0.000513 wd 0.0500 time 0.8037 (0.8037) data time 0.4216 (0.4216) model time 0.0000 (0.0000) loss 7.6822 (7.6822) grad_norm 2.0636 (2.0636) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:43:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][10/625] eta 0:04:40 lr 0.000513 wd 0.0500 time 0.3955 (0.4556) data time 0.0006 (0.0393) model time 0.0000 (0.0000) loss 6.1427 (7.1533) grad_norm 2.2072 (2.9833) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:43:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][20/625] eta 0:04:19 lr 0.000513 wd 0.0500 time 0.4036 (0.4285) data time 0.0007 (0.0209) model time 0.0000 (0.0000) loss 7.6318 (7.2473) grad_norm 2.5469 (2.8246) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:43:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][30/625] eta 0:04:10 lr 0.000513 wd 0.0500 time 0.3927 (0.4209) data time 0.0007 (0.0145) model time 0.0000 (0.0000) loss 7.0503 (7.2052) grad_norm 2.0998 (2.6947) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:43:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][40/625] eta 0:04:03 lr 0.000513 wd 0.0500 time 0.4011 (0.4158) data time 0.0007 (0.0112) model time 0.0000 (0.0000) loss 7.0463 (7.1869) grad_norm 2.7381 (2.6331) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:43:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][50/625] eta 0:03:57 lr 0.000513 wd 0.0500 time 0.4049 (0.4124) data time 0.0007 (0.0092) model time 0.0000 (0.0000) loss 7.9150 (7.1905) grad_norm 1.6551 (2.7811) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:43:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][60/625] eta 0:03:53 lr 0.000512 wd 0.0500 time 0.4081 (0.4128) 
data time 0.0008 (0.0078) model time 0.4073 (0.4136) loss 7.4426 (7.2431) grad_norm 2.7015 (2.8061) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:44:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][70/625] eta 0:03:48 lr 0.000512 wd 0.0500 time 0.3972 (0.4115) data time 0.0009 (0.0075) model time 0.3963 (0.4060) loss 8.1643 (7.1922) grad_norm 1.9779 (2.7156) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:44:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][80/625] eta 0:03:43 lr 0.000512 wd 0.0500 time 0.3997 (0.4100) data time 0.0007 (0.0067) model time 0.3991 (0.4034) loss 6.5876 (7.1916) grad_norm 1.8983 (2.6888) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:44:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][90/625] eta 0:03:38 lr 0.000512 wd 0.0500 time 0.3962 (0.4089) data time 0.0007 (0.0061) model time 0.3954 (0.4023) loss 6.7310 (7.1790) grad_norm 2.5093 (2.7022) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:44:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][100/625] eta 0:03:34 lr 0.000512 wd 0.0500 time 0.4009 (0.4080) data time 0.0009 (0.0057) model time 0.4000 (0.4013) loss 6.5757 (7.1478) grad_norm 3.0108 (2.7193) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:44:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][110/625] eta 0:03:29 lr 0.000512 wd 0.0500 time 0.4082 (0.4073) data time 0.0007 (0.0053) model time 0.4075 (0.4009) loss 6.5863 (7.1135) grad_norm 3.5647 (2.7733) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:44:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][120/625] eta 0:03:25 lr 0.000512 wd 0.0500 time 0.3993 (0.4068) data time 0.0009 (0.0049) model time 0.3984 (0.4008) loss 7.1828 (7.1016) grad_norm 3.2663 (2.8904) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
04:44:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][130/625] eta 0:03:21 lr 0.000512 wd 0.0500 time 0.3998 (0.4062) data time 0.0009 (0.0046) model time 0.3989 (0.4006) loss 8.6721 (7.0944) grad_norm 4.1890 (2.9283) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:44:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][140/625] eta 0:03:16 lr 0.000512 wd 0.0500 time 0.4006 (0.4058) data time 0.0009 (0.0043) model time 0.3997 (0.4005) loss 6.7357 (7.1016) grad_norm 2.4975 (2.9094) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:44:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][150/625] eta 0:03:12 lr 0.000511 wd 0.0500 time 0.3956 (0.4054) data time 0.0010 (0.0041) model time 0.3947 (0.4004) loss 6.3492 (7.0821) grad_norm 2.0422 (2.9136) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:44:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][160/625] eta 0:03:08 lr 0.000511 wd 0.0500 time 0.3988 (0.4051) data time 0.0009 (0.0039) model time 0.3979 (0.4003) loss 7.3782 (7.0655) grad_norm 1.7734 (2.8969) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:44:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][170/625] eta 0:03:04 lr 0.000511 wd 0.0500 time 0.3981 (0.4048) data time 0.0009 (0.0037) model time 0.3972 (0.4001) loss 5.8945 (7.0290) grad_norm 2.6740 (2.8982) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:44:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][180/625] eta 0:02:59 lr 0.000511 wd 0.0500 time 0.3957 (0.4044) data time 0.0008 (0.0036) model time 0.3949 (0.3999) loss 7.7866 (7.0345) grad_norm 1.9849 (2.8916) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:44:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][190/625] eta 0:02:56 lr 0.000511 wd 0.0500 time 0.3989 (0.4054) data 
time 0.0009 (0.0034) model time 0.3980 (0.4015) loss 8.1794 (7.0154) grad_norm 3.4657 (2.9057) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:44:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][200/625] eta 0:02:53 lr 0.000511 wd 0.0500 time 0.3983 (0.4086) data time 0.0008 (0.0033) model time 0.3975 (0.4059) loss 9.0111 (7.0056) grad_norm 4.1298 (2.8895) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:44:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][210/625] eta 0:02:51 lr 0.000511 wd 0.0500 time 0.5950 (0.4134) data time 0.0008 (0.0032) model time 0.5941 (0.4124) loss 7.1192 (7.0003) grad_norm 1.9583 (2.8766) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:45:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][220/625] eta 0:02:48 lr 0.000511 wd 0.0500 time 0.4003 (0.4161) data time 0.0008 (0.0031) model time 0.3995 (0.4159) loss 5.5048 (6.9968) grad_norm 3.1792 (2.9245) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:45:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][230/625] eta 0:02:44 lr 0.000511 wd 0.0500 time 0.3963 (0.4154) data time 0.0007 (0.0030) model time 0.3956 (0.4150) loss 7.1804 (6.9945) grad_norm 2.5332 (3.0089) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:45:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][240/625] eta 0:02:39 lr 0.000511 wd 0.0500 time 0.3972 (0.4148) data time 0.0007 (0.0030) model time 0.3964 (0.4141) loss 7.0104 (6.9945) grad_norm 2.1617 (3.0010) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:45:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][250/625] eta 0:02:35 lr 0.000510 wd 0.0500 time 0.4003 (0.4142) data time 0.0007 (0.0029) model time 0.3996 (0.4133) loss 7.6063 (7.0032) grad_norm 4.5625 (3.0077) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
04:45:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][260/625] eta 0:02:30 lr 0.000510 wd 0.0500 time 0.3987 (0.4136) data time 0.0006 (0.0028) model time 0.3981 (0.4127) loss 7.0711 (7.0008) grad_norm 3.6267 (2.9987) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:45:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][270/625] eta 0:02:26 lr 0.000510 wd 0.0500 time 0.3982 (0.4130) data time 0.0006 (0.0027) model time 0.3976 (0.4120) loss 7.8133 (7.0009) grad_norm 2.6511 (2.9761) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:45:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][280/625] eta 0:02:22 lr 0.000510 wd 0.0500 time 0.3859 (0.4132) data time 0.0007 (0.0027) model time 0.3852 (0.4122) loss 7.1585 (7.0104) grad_norm 5.2582 (3.0196) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:45:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][290/625] eta 0:02:18 lr 0.000510 wd 0.0500 time 0.4004 (0.4127) data time 0.0007 (0.0026) model time 0.3998 (0.4116) loss 6.2821 (7.0206) grad_norm 1.6959 (3.0002) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:45:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][300/625] eta 0:02:14 lr 0.000510 wd 0.0500 time 0.4039 (0.4123) data time 0.0007 (0.0025) model time 0.4031 (0.4111) loss 7.5356 (7.0118) grad_norm 1.7401 (2.9897) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:45:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][310/625] eta 0:02:09 lr 0.000510 wd 0.0500 time 0.4053 (0.4119) data time 0.0007 (0.0025) model time 0.4046 (0.4106) loss 6.2566 (7.0176) grad_norm 3.4229 (2.9916) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:45:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][320/625] eta 0:02:05 lr 0.000510 wd 0.0500 time 0.4019 (0.4115) data 
time 0.0007 (0.0024) model time 0.4013 (0.4102) loss 7.2852 (7.0236) grad_norm 2.4198 (2.9751) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:45:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][330/625] eta 0:02:01 lr 0.000510 wd 0.0500 time 0.4025 (0.4112) data time 0.0007 (0.0024) model time 0.4018 (0.4098) loss 7.1209 (7.0297) grad_norm 1.8661 (2.9610) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:45:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][340/625] eta 0:01:57 lr 0.000509 wd 0.0500 time 0.4002 (0.4108) data time 0.0009 (0.0023) model time 0.3994 (0.4094) loss 7.6574 (7.0391) grad_norm 3.0556 (2.9633) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:45:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][350/625] eta 0:01:52 lr 0.000509 wd 0.0500 time 0.3998 (0.4105) data time 0.0007 (0.0023) model time 0.3991 (0.4090) loss 8.4960 (7.0459) grad_norm 2.4963 (2.9581) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:45:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][360/625] eta 0:01:48 lr 0.000509 wd 0.0500 time 0.3998 (0.4102) data time 0.0007 (0.0023) model time 0.3991 (0.4087) loss 7.7549 (7.0527) grad_norm 2.3207 (2.9434) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:46:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][370/625] eta 0:01:44 lr 0.000509 wd 0.0500 time 0.3987 (0.4099) data time 0.0007 (0.0022) model time 0.3981 (0.4084) loss 7.9893 (7.0517) grad_norm 5.1596 (2.9491) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:46:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][380/625] eta 0:01:40 lr 0.000509 wd 0.0500 time 0.4107 (0.4096) data time 0.0007 (0.0022) model time 0.4100 (0.4081) loss 7.3414 (7.0502) grad_norm 2.5471 (2.9543) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
04:46:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][390/625] eta 0:01:36 lr 0.000509 wd 0.0500 time 0.3978 (0.4093) data time 0.0007 (0.0022) model time 0.3972 (0.4078) loss 7.4847 (7.0478) grad_norm 2.4266 (2.9649) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:46:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][400/625] eta 0:01:32 lr 0.000509 wd 0.0500 time 0.3981 (0.4091) data time 0.0008 (0.0021) model time 0.3973 (0.4075) loss 6.7019 (7.0503) grad_norm 1.8370 (2.9588) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:46:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][410/625] eta 0:01:27 lr 0.000509 wd 0.0500 time 0.3981 (0.4093) data time 0.0007 (0.0021) model time 0.3974 (0.4078) loss 6.9038 (7.0444) grad_norm 1.7795 (2.9469) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:46:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][420/625] eta 0:01:24 lr 0.000509 wd 0.0500 time 0.6244 (0.4113) data time 0.0008 (0.0021) model time 0.6237 (0.4100) loss 9.0574 (7.0519) grad_norm 2.1668 (2.9355) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:46:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][430/625] eta 0:01:20 lr 0.000509 wd 0.0500 time 0.5900 (0.4138) data time 0.0009 (0.0021) model time 0.5891 (0.4129) loss 7.4536 (7.0440) grad_norm 3.6955 (2.9254) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:46:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][440/625] eta 0:01:16 lr 0.000508 wd 0.0500 time 0.6255 (0.4153) data time 0.0009 (0.0020) model time 0.6246 (0.4146) loss 8.1379 (7.0466) grad_norm 2.1473 (2.9232) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:46:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][450/625] eta 0:01:12 lr 0.000508 wd 0.0500 time 0.3992 (0.4150) data 
time 0.0006 (0.0020) model time 0.3986 (0.4142) loss 6.6061 (7.0470) grad_norm 2.0890 (2.9178) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:46:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][460/625] eta 0:01:08 lr 0.000508 wd 0.0500 time 0.3987 (0.4147) data time 0.0006 (0.0020) model time 0.3980 (0.4138) loss 6.5369 (7.0459) grad_norm 2.6861 (2.9078) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:46:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][470/625] eta 0:01:04 lr 0.000508 wd 0.0500 time 0.4022 (0.4144) data time 0.0008 (0.0020) model time 0.4014 (0.4135) loss 5.8815 (7.0400) grad_norm 1.9947 (2.8968) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:46:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][480/625] eta 0:01:00 lr 0.000508 wd 0.0500 time 0.3991 (0.4140) data time 0.0008 (0.0020) model time 0.3983 (0.4131) loss 7.7901 (7.0404) grad_norm 1.8746 (2.8977) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:46:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][490/625] eta 0:00:55 lr 0.000508 wd 0.0500 time 0.3958 (0.4138) data time 0.0006 (0.0019) model time 0.3951 (0.4128) loss 6.7958 (7.0425) grad_norm 2.7893 (2.8914) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:46:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][500/625] eta 0:00:51 lr 0.000508 wd 0.0500 time 0.3954 (0.4137) data time 0.0006 (0.0019) model time 0.3948 (0.4127) loss 7.1558 (7.0392) grad_norm 4.3423 (2.8889) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:47:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][510/625] eta 0:00:47 lr 0.000508 wd 0.0500 time 0.3989 (0.4134) data time 0.0008 (0.0019) model time 0.3981 (0.4124) loss 6.4546 (7.0383) grad_norm 2.7915 (2.8801) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
04:47:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][520/625] eta 0:00:43 lr 0.000508 wd 0.0500 time 0.3953 (0.4131) data time 0.0006 (0.0019) model time 0.3947 (0.4121) loss 7.2892 (7.0414) grad_norm 2.7246 (2.8758) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:47:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][530/625] eta 0:00:39 lr 0.000508 wd 0.0500 time 0.4056 (0.4129) data time 0.0006 (0.0019) model time 0.4050 (0.4118) loss 6.6349 (7.0433) grad_norm 2.3173 (2.8659) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:47:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][540/625] eta 0:00:35 lr 0.000507 wd 0.0500 time 0.3928 (0.4126) data time 0.0009 (0.0018) model time 0.3919 (0.4115) loss 6.3873 (7.0378) grad_norm 1.6868 (2.8547) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:47:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][550/625] eta 0:00:30 lr 0.000507 wd 0.0500 time 0.3995 (0.4123) data time 0.0008 (0.0018) model time 0.3987 (0.4113) loss 7.3658 (7.0408) grad_norm 2.7883 (2.8515) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:47:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][560/625] eta 0:00:26 lr 0.000507 wd 0.0500 time 0.3978 (0.4121) data time 0.0006 (0.0018) model time 0.3972 (0.4110) loss 8.3488 (7.0417) grad_norm 3.7756 (2.8486) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:47:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][570/625] eta 0:00:22 lr 0.000507 wd 0.0500 time 0.3964 (0.4119) data time 0.0008 (0.0018) model time 0.3956 (0.4108) loss 6.6553 (7.0397) grad_norm 3.9989 (2.8558) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:47:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][580/625] eta 0:00:18 lr 0.000507 wd 0.0500 time 0.3967 (0.4117) data 
time 0.0008 (0.0018) model time 0.3959 (0.4105) loss 7.0538 (7.0447) grad_norm 2.9711 (2.8609) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:47:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][590/625] eta 0:00:14 lr 0.000507 wd 0.0500 time 0.4066 (0.4115) data time 0.0007 (0.0018) model time 0.4059 (0.4104) loss 6.2733 (7.0438) grad_norm 2.4881 (2.8574) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:47:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][600/625] eta 0:00:10 lr 0.000507 wd 0.0500 time 0.3990 (0.4113) data time 0.0008 (0.0017) model time 0.3982 (0.4101) loss 6.8848 (7.0444) grad_norm 1.8455 (2.8681) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:47:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][610/625] eta 0:00:06 lr 0.000507 wd 0.0500 time 0.3979 (0.4111) data time 0.0006 (0.0017) model time 0.3973 (0.4099) loss 5.7645 (7.0443) grad_norm 2.2066 (2.8635) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:47:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][620/625] eta 0:00:02 lr 0.000507 wd 0.0500 time 0.3997 (0.4109) data time 0.0005 (0.0017) model time 0.3991 (0.4098) loss 6.3339 (7.0481) grad_norm 2.8518 (2.8660) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:47:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 174 training takes 0:04:16 [2024-07-25 04:47:47 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 04:47:48 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 04:47:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.453 (0.453) Loss 0.5747 (0.5747) Acc@1 89.453 (89.453) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 04:47:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.9229 (0.7135) Acc@1 79.053 (85.591) Acc@5 96.045 (97.470) Mem 14939MB [2024-07-25 04:47:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.103) Loss 1.0684 (0.8384) Acc@1 75.732 (82.196) Acc@5 94.092 (96.187) Mem 14939MB [2024-07-25 04:47:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.818 Acc@5 96.161 [2024-07-25 04:47:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.8% [2024-07-25 04:47:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.820 (0.820) Loss 0.5552 (0.5552) Acc@1 89.893 (89.893) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 04:47:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.155) Loss 0.8789 (0.6917) Acc@1 81.152 (86.253) Acc@5 96.143 (97.670) Mem 14939MB [2024-07-25 04:47:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 1.0137 (0.8117) Acc@1 75.830 (82.903) Acc@5 94.922 (96.452) Mem 14939MB [2024-07-25 04:47:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.486 Acc@5 96.411 [2024-07-25 04:47:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.5% [2024-07-25 04:47:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.49% [2024-07-25 04:47:54 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 04:47:55 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 04:47:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][0/625] eta 0:07:49 lr 0.000507 wd 0.0500 time 0.7505 (0.7505) data time 0.3721 (0.3721) model time 0.0000 (0.0000) loss 7.7831 (7.7831) grad_norm 1.9832 (1.9832) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:48:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][10/625] eta 0:05:04 lr 0.000506 wd 0.0500 time 0.4019 (0.4955) data time 0.0007 (0.0346) model time 0.0000 (0.0000) loss 8.2472 (7.2777) grad_norm 2.4431 (2.9002) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:48:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][20/625] eta 0:05:00 lr 0.000506 wd 0.0500 time 0.5754 (0.4960) data time 0.0008 (0.0187) model time 0.0000 (0.0000) loss 5.8038 (6.9181) grad_norm 3.0854 (2.7764) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:48:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][30/625] eta 0:04:55 lr 0.000506 wd 0.0500 time 0.5698 (0.4972) data time 0.0009 (0.0129) model time 0.0000 (0.0000) loss 6.7323 (7.0366) grad_norm 2.7706 (2.5887) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:48:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][40/625] eta 0:04:45 lr 0.000506 wd 0.0500 time 0.4041 (0.4882) data time 0.0008 (0.0100) model time 0.0000 (0.0000) loss 6.7270 (7.0812) grad_norm 2.1349 (2.6176) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:48:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][50/625] eta 0:04:30 lr 0.000506 wd 0.0500 time 0.3970 (0.4711) data time 0.0008 (0.0082) model time 0.0000 (0.0000) loss 6.0731 (7.0669) grad_norm 3.4573 (2.7224) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 04:48:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][60/625] eta 0:04:19 lr 0.000506 wd 0.0500 time 0.3992 (0.4595) data time 0.0006 (0.0070) model time 0.3986 (0.3993) loss 7.5174 (7.0458) grad_norm 3.4074 (2.8049) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:48:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][70/625] eta 0:04:10 lr 0.000506 wd 0.0500 time 0.4051 (0.4512) data time 0.0007 (0.0062) model time 0.4044 (0.3993) loss 5.8229 (7.0507) grad_norm 3.9479 (3.4605) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:48:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][80/625] eta 0:04:02 lr 0.000506 wd 0.0500 time 0.3964 (0.4447) data time 0.0009 (0.0055) model time 0.3956 (0.3988) loss 7.1532 (7.0241) grad_norm 2.5082 (3.3878) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:48:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][90/625] eta 0:03:55 lr 0.000506 wd 0.0500 time 0.3978 (0.4396) data time 0.0008 (0.0050) model time 0.3969 (0.3985) loss 7.3900 (7.0451) grad_norm 4.5600 (3.3538) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:48:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][100/625] eta 0:03:48 lr 0.000505 wd 0.0500 time 0.4024 (0.4358) data time 0.0008 (0.0046) model time 0.4016 (0.3989) loss 8.0951 (7.0385) grad_norm 2.4001 (3.2612) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:48:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][110/625] eta 0:03:42 lr 0.000505 wd 0.0500 time 0.4004 (0.4326) data time 0.0009 (0.0043) model time 0.3996 (0.3990) loss 7.5116 (7.0424) grad_norm 1.5828 (3.2162) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:48:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][120/625] eta 0:03:37 lr 0.000505 wd 0.0500 time 
0.3967 (0.4298) data time 0.0006 (0.0040) model time 0.3960 (0.3989) loss 7.2115 (7.0607) grad_norm 2.4730 (3.4250) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:48:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][130/625] eta 0:03:31 lr 0.000505 wd 0.0500 time 0.3996 (0.4277) data time 0.0008 (0.0037) model time 0.3988 (0.3991) loss 6.3751 (7.0288) grad_norm 2.1965 (3.3911) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:48:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][140/625] eta 0:03:26 lr 0.000505 wd 0.0500 time 0.4005 (0.4257) data time 0.0006 (0.0035) model time 0.3999 (0.3990) loss 7.4716 (7.0531) grad_norm 4.0162 (3.3417) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:48:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][150/625] eta 0:03:21 lr 0.000505 wd 0.0500 time 0.3999 (0.4240) data time 0.0007 (0.0034) model time 0.3992 (0.3991) loss 7.6046 (7.0519) grad_norm 3.4653 (3.3279) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:49:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][160/625] eta 0:03:16 lr 0.000505 wd 0.0500 time 0.4101 (0.4225) data time 0.0007 (0.0032) model time 0.4094 (0.3991) loss 7.0180 (7.0551) grad_norm 2.5635 (3.3087) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:49:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][170/625] eta 0:03:11 lr 0.000505 wd 0.0500 time 0.3991 (0.4213) data time 0.0009 (0.0031) model time 0.3982 (0.3992) loss 7.3124 (7.0558) grad_norm 2.7765 (3.2839) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:49:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][180/625] eta 0:03:06 lr 0.000505 wd 0.0500 time 0.3965 (0.4201) data time 0.0006 (0.0030) model time 0.3959 (0.3992) loss 7.5140 (7.0613) grad_norm 25.5843 (3.3665) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 04:49:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][190/625] eta 0:03:02 lr 0.000505 wd 0.0500 time 0.4000 (0.4191) data time 0.0006 (0.0029) model time 0.3993 (0.3993) loss 7.2356 (7.0717) grad_norm 1.6725 (3.3417) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:49:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][200/625] eta 0:02:57 lr 0.000504 wd 0.0500 time 0.3981 (0.4181) data time 0.0007 (0.0028) model time 0.3974 (0.3992) loss 7.8005 (7.0729) grad_norm 2.2277 (3.3258) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:49:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][210/625] eta 0:02:53 lr 0.000504 wd 0.0500 time 0.3969 (0.4172) data time 0.0007 (0.0027) model time 0.3962 (0.3991) loss 7.8710 (7.0863) grad_norm 2.9158 (3.2820) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:49:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][220/625] eta 0:02:48 lr 0.000504 wd 0.0500 time 0.4043 (0.4164) data time 0.0009 (0.0026) model time 0.4034 (0.3991) loss 6.8495 (7.0610) grad_norm 3.5758 (3.2653) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:49:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][230/625] eta 0:02:44 lr 0.000504 wd 0.0500 time 0.3996 (0.4171) data time 0.0008 (0.0025) model time 0.3988 (0.4010) loss 6.0208 (7.0547) grad_norm 5.1050 (3.2551) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:49:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][240/625] eta 0:02:41 lr 0.000504 wd 0.0500 time 0.3991 (0.4200) data time 0.0008 (0.0024) model time 0.3983 (0.4054) loss 8.2901 (7.0572) grad_norm 3.5112 (3.2559) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:49:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][250/625] eta 0:02:38 lr 0.000504 wd 0.0500 time 
0.3957 (0.4224) data time 0.0007 (0.0024) model time 0.3951 (0.4090) loss 7.4114 (7.0610) grad_norm 2.0892 (3.2173) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:49:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][260/625] eta 0:02:34 lr 0.000504 wd 0.0500 time 0.3960 (0.4239) data time 0.0007 (0.0023) model time 0.3953 (0.4115) loss 5.7741 (7.0676) grad_norm 2.7921 (3.1905) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:49:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][270/625] eta 0:02:30 lr 0.000504 wd 0.0500 time 0.3964 (0.4229) data time 0.0006 (0.0023) model time 0.3957 (0.4108) loss 6.2678 (7.0598) grad_norm 2.1628 (3.1756) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:49:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][280/625] eta 0:02:25 lr 0.000504 wd 0.0500 time 0.4062 (0.4221) data time 0.0006 (0.0022) model time 0.4056 (0.4103) loss 6.2124 (7.0697) grad_norm 2.9690 (3.1759) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:49:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][290/625] eta 0:02:21 lr 0.000503 wd 0.0500 time 0.3980 (0.4213) data time 0.0007 (0.0022) model time 0.3973 (0.4098) loss 8.4529 (7.0905) grad_norm 4.7678 (3.2079) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:50:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][300/625] eta 0:02:16 lr 0.000503 wd 0.0500 time 0.3941 (0.4206) data time 0.0008 (0.0021) model time 0.3933 (0.4094) loss 5.4708 (7.0799) grad_norm 3.6121 (3.2020) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 04:50:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][310/625] eta 0:02:12 lr 0.000503 wd 0.0500 time 0.4046 (0.4200) data time 0.0008 (0.0021) model time 0.4038 (0.4090) loss 8.8546 (7.0890) grad_norm 2.2214 (3.1725) loss_scale 512.0000 (260.9389) mem 
14939MB [2024-07-25 04:50:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][320/625] eta 0:02:07 lr 0.000503 wd 0.0500 time 0.3945 (0.4194) data time 0.0009 (0.0021) model time 0.3936 (0.4087) loss 6.0393 (7.0977) grad_norm 1.8647 (3.1402) loss_scale 512.0000 (268.7601) mem 14939MB [2024-07-25 04:50:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][330/625] eta 0:02:03 lr 0.000503 wd 0.0500 time 0.3973 (0.4188) data time 0.0007 (0.0020) model time 0.3967 (0.4084) loss 6.9370 (7.0941) grad_norm 1.7881 (3.1240) loss_scale 512.0000 (276.1088) mem 14939MB [2024-07-25 04:50:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][340/625] eta 0:01:59 lr 0.000503 wd 0.0500 time 0.4061 (0.4182) data time 0.0008 (0.0020) model time 0.4052 (0.4080) loss 7.5145 (7.0918) grad_norm 2.5620 (3.1187) loss_scale 512.0000 (283.0264) mem 14939MB [2024-07-25 04:50:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][350/625] eta 0:01:54 lr 0.000503 wd 0.0500 time 0.3977 (0.4178) data time 0.0006 (0.0020) model time 0.3970 (0.4078) loss 7.9664 (7.0910) grad_norm 5.9317 (3.1183) loss_scale 512.0000 (289.5499) mem 14939MB [2024-07-25 04:50:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][360/625] eta 0:01:50 lr 0.000503 wd 0.0500 time 0.4006 (0.4173) data time 0.0007 (0.0019) model time 0.3999 (0.4076) loss 8.4591 (7.0953) grad_norm 2.3388 (3.0995) loss_scale 512.0000 (295.7119) mem 14939MB [2024-07-25 04:50:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][370/625] eta 0:01:46 lr 0.000503 wd 0.0500 time 0.4113 (0.4169) data time 0.0007 (0.0019) model time 0.4106 (0.4074) loss 6.2195 (7.0974) grad_norm 2.4672 (3.0888) loss_scale 512.0000 (301.5418) mem 14939MB [2024-07-25 04:50:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][380/625] eta 0:01:42 lr 0.000503 wd 0.0500 time 
0.3952 (0.4164) data time 0.0009 (0.0019) model time 0.3943 (0.4071) loss 6.7566 (7.0877) grad_norm 2.1801 (3.0945) loss_scale 512.0000 (307.0656) mem 14939MB [2024-07-25 04:50:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][390/625] eta 0:01:37 lr 0.000502 wd 0.0500 time 0.3979 (0.4160) data time 0.0008 (0.0019) model time 0.3971 (0.4069) loss 8.1756 (7.0987) grad_norm 2.7504 (3.0883) loss_scale 512.0000 (312.3069) mem 14939MB [2024-07-25 04:50:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][400/625] eta 0:01:33 lr 0.000502 wd 0.0500 time 0.4166 (0.4157) data time 0.0008 (0.0018) model time 0.4158 (0.4067) loss 6.9317 (7.1038) grad_norm 2.5916 (3.0659) loss_scale 512.0000 (317.2868) mem 14939MB [2024-07-25 04:50:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][410/625] eta 0:01:29 lr 0.000502 wd 0.0500 time 0.4070 (0.4154) data time 0.0009 (0.0018) model time 0.4060 (0.4066) loss 6.5717 (7.0943) grad_norm 2.1289 (3.0450) loss_scale 512.0000 (322.0243) mem 14939MB [2024-07-25 04:50:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][420/625] eta 0:01:25 lr 0.000502 wd 0.0500 time 0.4030 (0.4151) data time 0.0006 (0.0018) model time 0.4023 (0.4065) loss 5.7818 (7.0915) grad_norm 3.5516 (3.0463) loss_scale 512.0000 (326.5368) mem 14939MB [2024-07-25 04:50:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][430/625] eta 0:01:20 lr 0.000502 wd 0.0500 time 0.4007 (0.4148) data time 0.0007 (0.0018) model time 0.4000 (0.4063) loss 7.1737 (7.0873) grad_norm 1.7506 (3.0282) loss_scale 512.0000 (330.8399) mem 14939MB [2024-07-25 04:50:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][440/625] eta 0:01:16 lr 0.000502 wd 0.0500 time 0.4011 (0.4145) data time 0.0009 (0.0018) model time 0.4002 (0.4061) loss 7.1153 (7.0937) grad_norm 3.1662 (3.0220) loss_scale 512.0000 (334.9478) mem 
14939MB [2024-07-25 04:51:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][450/625] eta 0:01:12 lr 0.000502 wd 0.0500 time 0.3971 (0.4150) data time 0.0008 (0.0018) model time 0.3963 (0.4069) loss 7.2493 (7.0955) grad_norm 3.4993 (3.0526) loss_scale 512.0000 (338.8736) mem 14939MB [2024-07-25 04:51:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][460/625] eta 0:01:08 lr 0.000502 wd 0.0500 time 0.5938 (0.4169) data time 0.0008 (0.0017) model time 0.5929 (0.4093) loss 7.3462 (7.0992) grad_norm 2.1353 (3.0599) loss_scale 512.0000 (342.6291) mem 14939MB [2024-07-25 04:51:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][470/625] eta 0:01:04 lr 0.000502 wd 0.0500 time 0.3957 (0.4185) data time 0.0006 (0.0017) model time 0.3951 (0.4112) loss 6.0202 (7.0993) grad_norm 2.1168 (3.0522) loss_scale 512.0000 (346.2251) mem 14939MB [2024-07-25 04:51:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][480/625] eta 0:01:00 lr 0.000501 wd 0.0500 time 0.4027 (0.4201) data time 0.0006 (0.0017) model time 0.4020 (0.4130) loss 6.8992 (7.0968) grad_norm 2.1995 (3.0546) loss_scale 512.0000 (349.6715) mem 14939MB [2024-07-25 04:51:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][490/625] eta 0:00:56 lr 0.000501 wd 0.0500 time 0.4011 (0.4196) data time 0.0007 (0.0017) model time 0.4004 (0.4127) loss 7.6610 (7.0945) grad_norm 2.0259 (3.0414) loss_scale 512.0000 (352.9776) mem 14939MB [2024-07-25 04:51:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][500/625] eta 0:00:52 lr 0.000501 wd 0.0500 time 0.4020 (0.4193) data time 0.0007 (0.0017) model time 0.4013 (0.4124) loss 5.6885 (7.0859) grad_norm 1.8529 (3.0304) loss_scale 512.0000 (356.1517) mem 14939MB [2024-07-25 04:51:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][510/625] eta 0:00:48 lr 0.000501 wd 0.0500 time 
0.3981 (0.4189) data time 0.0009 (0.0017) model time 0.3972 (0.4122) loss 7.2088 (7.0854) grad_norm 4.1281 (3.0731) loss_scale 512.0000 (359.2016) mem 14939MB [2024-07-25 04:51:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][520/625] eta 0:00:43 lr 0.000501 wd 0.0500 time 0.3987 (0.4186) data time 0.0008 (0.0017) model time 0.3979 (0.4120) loss 5.8645 (7.0830) grad_norm 1.8303 (3.0912) loss_scale 512.0000 (362.1344) mem 14939MB [2024-07-25 04:51:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][530/625] eta 0:00:39 lr 0.000501 wd 0.0500 time 0.4121 (0.4183) data time 0.0006 (0.0017) model time 0.4115 (0.4117) loss 6.2903 (7.0874) grad_norm 1.9662 (3.0868) loss_scale 512.0000 (364.9567) mem 14939MB [2024-07-25 04:51:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][540/625] eta 0:00:35 lr 0.000501 wd 0.0500 time 0.4005 (0.4180) data time 0.0009 (0.0016) model time 0.3995 (0.4115) loss 7.2726 (7.0875) grad_norm 1.8383 (3.0796) loss_scale 512.0000 (367.6747) mem 14939MB [2024-07-25 04:51:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][550/625] eta 0:00:31 lr 0.000501 wd 0.0500 time 0.3985 (0.4176) data time 0.0009 (0.0016) model time 0.3976 (0.4112) loss 7.1431 (7.0825) grad_norm 1.8311 (3.0722) loss_scale 512.0000 (370.2940) mem 14939MB [2024-07-25 04:51:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][560/625] eta 0:00:27 lr 0.000501 wd 0.0500 time 0.3984 (0.4175) data time 0.0007 (0.0016) model time 0.3977 (0.4111) loss 6.1215 (7.0801) grad_norm 2.3379 (3.0609) loss_scale 512.0000 (372.8200) mem 14939MB [2024-07-25 04:51:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][570/625] eta 0:00:22 lr 0.000501 wd 0.0500 time 0.3979 (0.4172) data time 0.0008 (0.0016) model time 0.3971 (0.4109) loss 7.5564 (7.0795) grad_norm 2.1054 (3.0450) loss_scale 512.0000 (375.2574) mem 
14939MB [2024-07-25 04:51:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][580/625] eta 0:00:18 lr 0.000500 wd 0.0500 time 0.3942 (0.4169) data time 0.0009 (0.0017) model time 0.3933 (0.4107) loss 7.5672 (7.0760) grad_norm 1.9612 (3.0466) loss_scale 512.0000 (377.6110) mem 14939MB [2024-07-25 04:52:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][590/625] eta 0:00:14 lr 0.000500 wd 0.0500 time 0.3972 (0.4166) data time 0.0007 (0.0017) model time 0.3965 (0.4104) loss 7.1207 (7.0719) grad_norm 2.3245 (3.0443) loss_scale 512.0000 (379.8849) mem 14939MB [2024-07-25 04:52:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][600/625] eta 0:00:10 lr 0.000500 wd 0.0500 time 0.3975 (0.4164) data time 0.0007 (0.0017) model time 0.3968 (0.4102) loss 6.6879 (7.0697) grad_norm 2.7800 (3.0346) loss_scale 512.0000 (382.0832) mem 14939MB [2024-07-25 04:52:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][610/625] eta 0:00:06 lr 0.000500 wd 0.0500 time 0.3983 (0.4161) data time 0.0004 (0.0017) model time 0.3979 (0.4100) loss 6.1488 (7.0651) grad_norm 1.9520 (3.0226) loss_scale 512.0000 (384.2095) mem 14939MB [2024-07-25 04:52:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][620/625] eta 0:00:02 lr 0.000500 wd 0.0500 time 0.4006 (0.4158) data time 0.0004 (0.0016) model time 0.4002 (0.4098) loss 6.0070 (7.0629) grad_norm 2.0475 (3.0194) loss_scale 512.0000 (386.2673) mem 14939MB [2024-07-25 04:52:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 175 training takes 0:04:19 [2024-07-25 04:52:14 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 04:52:15 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 04:52:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.449 (0.449) Loss 0.5601 (0.5601) Acc@1 89.697 (89.697) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 04:52:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.9238 (0.7108) Acc@1 79.492 (85.693) Acc@5 95.752 (97.559) Mem 14939MB [2024-07-25 04:52:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0605 (0.8369) Acc@1 75.244 (82.313) Acc@5 94.385 (96.289) Mem 14939MB [2024-07-25 04:52:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.948 Acc@5 96.251 [2024-07-25 04:52:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-07-25 04:52:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.95% [2024-07-25 04:52:18 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 04:52:19 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 04:52:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.443 (0.443) Loss 0.5547 (0.5547) Acc@1 89.844 (89.844) Acc@5 98.730 (98.730) Mem 14939MB [2024-07-25 04:52:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.140) Loss 0.8784 (0.6914) Acc@1 81.152 (86.253) Acc@5 96.094 (97.652) Mem 14939MB [2024-07-25 04:52:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.114) Loss 1.0137 (0.8109) Acc@1 75.928 (82.917) Acc@5 94.922 (96.445) Mem 14939MB [2024-07-25 04:52:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.504 Acc@5 96.403 [2024-07-25 04:52:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.5% [2024-07-25 04:52:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.50% [2024-07-25 04:52:22 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 04:52:23 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 04:52:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][0/625] eta 0:08:24 lr 0.000500 wd 0.0500 time 0.8075 (0.8075) data time 0.4254 (0.4254) model time 0.0000 (0.0000) loss 8.0481 (8.0481) grad_norm 3.4088 (3.4088) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:52:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][10/625] eta 0:04:29 lr 0.000500 wd 0.0500 time 0.4012 (0.4384) data time 0.0006 (0.0394) model time 0.0000 (0.0000) loss 7.1715 (6.8985) grad_norm 1.7777 (2.2909) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:52:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][20/625] eta 0:04:14 lr 0.000500 wd 0.0500 time 0.3991 (0.4208) data time 0.0006 (0.0210) model time 0.0000 (0.0000) loss 6.8030 (7.0154) grad_norm 2.3652 (2.2881) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:52:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][30/625] eta 0:04:06 lr 0.000500 wd 0.0500 time 0.4040 (0.4144) data time 0.0008 (0.0145) model time 0.0000 (0.0000) loss 6.5844 (6.9127) grad_norm 2.0193 (2.2491) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:52:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][40/625] eta 0:04:00 lr 0.000500 wd 0.0500 time 0.4005 (0.4114) data time 0.0007 (0.0112) model time 0.0000 (0.0000) loss 7.0235 (6.8739) grad_norm 2.1610 (2.2201) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:52:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][50/625] eta 0:04:02 lr 0.000499 wd 0.0500 time 0.5957 (0.4222) data time 0.0006 (0.0093) model time 0.0000 (0.0000) loss 6.5737 (6.8633) grad_norm 3.3583 (2.5271) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:52:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][60/625] eta 0:04:04 lr 0.000499 wd 0.0500 time 0.5680 (0.4336) 
data time 0.0008 (0.0080) model time 0.5672 (0.4908) loss 6.9814 (6.8472) grad_norm 2.3334 (2.6085) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:52:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][70/625] eta 0:04:02 lr 0.000499 wd 0.0500 time 0.5277 (0.4373) data time 0.0008 (0.0070) model time 0.5269 (0.4748) loss 7.0809 (6.9127) grad_norm 2.5972 (2.5940) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:52:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][80/625] eta 0:03:56 lr 0.000499 wd 0.0500 time 0.3968 (0.4340) data time 0.0007 (0.0062) model time 0.3961 (0.4532) loss 7.5804 (6.9224) grad_norm 2.6388 (2.7937) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:53:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][90/625] eta 0:03:50 lr 0.000499 wd 0.0500 time 0.3974 (0.4301) data time 0.0008 (0.0056) model time 0.3966 (0.4392) loss 7.1501 (6.9287) grad_norm 3.0760 (2.8877) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:53:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][100/625] eta 0:03:44 lr 0.000499 wd 0.0500 time 0.4009 (0.4273) data time 0.0008 (0.0052) model time 0.4001 (0.4315) loss 6.0975 (6.9442) grad_norm 2.6127 (2.9355) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:53:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][110/625] eta 0:03:38 lr 0.000499 wd 0.0500 time 0.4016 (0.4249) data time 0.0006 (0.0048) model time 0.4010 (0.4263) loss 5.8958 (6.9402) grad_norm 2.6523 (3.0789) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:53:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][120/625] eta 0:03:33 lr 0.000499 wd 0.0500 time 0.3968 (0.4229) data time 0.0007 (0.0044) model time 0.3960 (0.4225) loss 5.7347 (6.9370) grad_norm 2.1170 (3.0327) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
04:53:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][130/625] eta 0:03:28 lr 0.000499 wd 0.0500 time 0.3990 (0.4212) data time 0.0008 (0.0042) model time 0.3982 (0.4196) loss 5.8206 (6.9405) grad_norm 2.8306 (3.0276) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:53:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][140/625] eta 0:03:23 lr 0.000498 wd 0.0500 time 0.3979 (0.4196) data time 0.0011 (0.0040) model time 0.3968 (0.4172) loss 7.9625 (6.9676) grad_norm 3.0081 (3.0269) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:53:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][150/625] eta 0:03:18 lr 0.000498 wd 0.0500 time 0.3977 (0.4184) data time 0.0008 (0.0038) model time 0.3969 (0.4156) loss 7.3055 (6.9769) grad_norm 2.0648 (3.0826) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:53:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][160/625] eta 0:03:14 lr 0.000498 wd 0.0500 time 0.3959 (0.4173) data time 0.0009 (0.0036) model time 0.3950 (0.4141) loss 6.8948 (6.9893) grad_norm 2.2513 (3.1116) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:53:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][170/625] eta 0:03:09 lr 0.000498 wd 0.0500 time 0.4063 (0.4163) data time 0.0006 (0.0034) model time 0.4058 (0.4129) loss 7.1924 (6.9943) grad_norm 2.9988 (3.0863) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:53:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][180/625] eta 0:03:04 lr 0.000498 wd 0.0500 time 0.3969 (0.4153) data time 0.0009 (0.0033) model time 0.3960 (0.4117) loss 7.1503 (6.9994) grad_norm 1.7139 (3.0389) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:53:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][190/625] eta 0:03:00 lr 0.000498 wd 0.0500 time 0.3981 (0.4144) data 
time 0.0006 (0.0031) model time 0.3974 (0.4107) loss 6.4188 (7.0260) grad_norm 3.2477 (3.0405) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:53:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][200/625] eta 0:02:56 lr 0.000498 wd 0.0500 time 0.4017 (0.4146) data time 0.0008 (0.0030) model time 0.4009 (0.4111) loss 7.6124 (7.0405) grad_norm 2.3633 (3.0509) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:53:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][210/625] eta 0:02:51 lr 0.000498 wd 0.0500 time 0.4013 (0.4138) data time 0.0008 (0.0029) model time 0.4005 (0.4103) loss 5.7187 (7.0319) grad_norm 2.0087 (3.1383) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:53:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][220/625] eta 0:02:47 lr 0.000498 wd 0.0500 time 0.3968 (0.4131) data time 0.0007 (0.0029) model time 0.3962 (0.4095) loss 7.3632 (7.0359) grad_norm 2.3090 (3.1425) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:53:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][230/625] eta 0:02:42 lr 0.000498 wd 0.0500 time 0.3983 (0.4126) data time 0.0009 (0.0028) model time 0.3974 (0.4090) loss 6.3194 (7.0326) grad_norm 2.0585 (3.1318) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:54:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][240/625] eta 0:02:38 lr 0.000497 wd 0.0500 time 0.3989 (0.4121) data time 0.0009 (0.0027) model time 0.3980 (0.4085) loss 7.8574 (7.0328) grad_norm 1.8050 (3.1064) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:54:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][250/625] eta 0:02:34 lr 0.000497 wd 0.0500 time 0.3985 (0.4117) data time 0.0007 (0.0026) model time 0.3978 (0.4080) loss 7.8845 (7.0343) grad_norm 1.9291 (3.0846) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
04:54:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][260/625] eta 0:02:30 lr 0.000497 wd 0.0500 time 0.5732 (0.4118) data time 0.0009 (0.0026) model time 0.5723 (0.4084) loss 8.1603 (7.0397) grad_norm 2.3497 (3.0804) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:54:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][270/625] eta 0:02:26 lr 0.000497 wd 0.0500 time 0.6179 (0.4140) data time 0.0006 (0.0025) model time 0.6173 (0.4112) loss 7.1820 (7.0450) grad_norm 1.8955 (3.0477) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:54:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][280/625] eta 0:02:23 lr 0.000497 wd 0.0500 time 0.5685 (0.4169) data time 0.0007 (0.0024) model time 0.5679 (0.4148) loss 7.5872 (7.0410) grad_norm 2.1296 (3.0144) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:54:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][290/625] eta 0:02:20 lr 0.000497 wd 0.0500 time 0.5657 (0.4189) data time 0.0007 (0.0024) model time 0.5651 (0.4173) loss 6.9332 (7.0270) grad_norm 2.4469 (2.9907) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:54:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][300/625] eta 0:02:16 lr 0.000497 wd 0.0500 time 0.3961 (0.4196) data time 0.0006 (0.0023) model time 0.3955 (0.4182) loss 6.2488 (7.0266) grad_norm 2.0209 (2.9704) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:54:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][310/625] eta 0:02:11 lr 0.000497 wd 0.0500 time 0.3989 (0.4189) data time 0.0008 (0.0023) model time 0.3981 (0.4174) loss 7.2895 (7.0336) grad_norm 1.5301 (2.9517) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:54:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][320/625] eta 0:02:07 lr 0.000497 wd 0.0500 time 0.4026 (0.4183) data 
time 0.0009 (0.0023) model time 0.4017 (0.4166) loss 6.6962 (7.0293) grad_norm 2.5141 (2.9507) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:54:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][330/625] eta 0:02:03 lr 0.000496 wd 0.0500 time 0.4010 (0.4177) data time 0.0006 (0.0022) model time 0.4004 (0.4160) loss 8.1346 (7.0249) grad_norm 2.3141 (2.9586) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:54:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][340/625] eta 0:01:58 lr 0.000496 wd 0.0500 time 0.4050 (0.4172) data time 0.0008 (0.0022) model time 0.4041 (0.4154) loss 6.6911 (7.0251) grad_norm 3.3797 (2.9362) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:54:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][350/625] eta 0:01:54 lr 0.000496 wd 0.0500 time 0.3985 (0.4167) data time 0.0008 (0.0021) model time 0.3977 (0.4149) loss 7.6524 (7.0298) grad_norm 4.3219 (2.9450) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:54:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][360/625] eta 0:01:50 lr 0.000496 wd 0.0500 time 0.3969 (0.4162) data time 0.0006 (0.0021) model time 0.3963 (0.4144) loss 8.2706 (7.0255) grad_norm 3.6263 (2.9395) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:54:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][370/625] eta 0:01:46 lr 0.000496 wd 0.0500 time 0.3973 (0.4158) data time 0.0009 (0.0021) model time 0.3964 (0.4138) loss 6.8252 (7.0225) grad_norm 1.9771 (2.9141) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:55:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][380/625] eta 0:01:41 lr 0.000496 wd 0.0500 time 0.3994 (0.4153) data time 0.0007 (0.0020) model time 0.3987 (0.4133) loss 7.2423 (7.0197) grad_norm 2.2819 (2.9092) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
04:55:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][390/625] eta 0:01:37 lr 0.000496 wd 0.0500 time 0.3965 (0.4149) data time 0.0007 (0.0020) model time 0.3958 (0.4129) loss 7.4645 (7.0180) grad_norm 2.7501 (2.8967) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:55:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][400/625] eta 0:01:33 lr 0.000496 wd 0.0500 time 0.3962 (0.4145) data time 0.0009 (0.0020) model time 0.3953 (0.4124) loss 6.2939 (7.0269) grad_norm 1.6364 (2.8740) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:55:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][410/625] eta 0:01:29 lr 0.000496 wd 0.0500 time 0.3993 (0.4141) data time 0.0007 (0.0020) model time 0.3986 (0.4120) loss 5.9095 (7.0231) grad_norm 2.3660 (2.8609) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:55:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][420/625] eta 0:01:24 lr 0.000496 wd 0.0500 time 0.3966 (0.4140) data time 0.0007 (0.0019) model time 0.3959 (0.4120) loss 7.9341 (7.0219) grad_norm 25.9129 (2.9080) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:55:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][430/625] eta 0:01:20 lr 0.000495 wd 0.0500 time 0.3988 (0.4136) data time 0.0007 (0.0019) model time 0.3981 (0.4116) loss 7.3009 (7.0232) grad_norm 2.8426 (2.9141) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:55:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][440/625] eta 0:01:16 lr 0.000495 wd 0.0500 time 0.3998 (0.4133) data time 0.0007 (0.0019) model time 0.3991 (0.4112) loss 7.8092 (7.0204) grad_norm 11.9718 (2.9548) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:55:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][450/625] eta 0:01:12 lr 0.000495 wd 0.0500 time 0.4011 (0.4129) 
data time 0.0008 (0.0019) model time 0.4003 (0.4108) loss 7.7994 (7.0339) grad_norm 4.0600 (2.9640) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:55:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][460/625] eta 0:01:08 lr 0.000495 wd 0.0500 time 0.3953 (0.4126) data time 0.0007 (0.0018) model time 0.3947 (0.4105) loss 5.8533 (7.0360) grad_norm 2.2501 (2.9724) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:55:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][470/625] eta 0:01:03 lr 0.000495 wd 0.0500 time 0.3964 (0.4123) data time 0.0008 (0.0018) model time 0.3955 (0.4102) loss 6.2822 (7.0296) grad_norm 2.6522 (2.9681) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:55:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][480/625] eta 0:00:59 lr 0.000495 wd 0.0500 time 0.3975 (0.4120) data time 0.0007 (0.0018) model time 0.3968 (0.4099) loss 7.1492 (7.0333) grad_norm 1.9063 (2.9554) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:55:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][490/625] eta 0:00:55 lr 0.000495 wd 0.0500 time 0.3962 (0.4128) data time 0.0007 (0.0018) model time 0.3955 (0.4109) loss 6.9219 (7.0320) grad_norm 1.9670 (2.9437) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:55:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][500/625] eta 0:00:51 lr 0.000495 wd 0.0500 time 0.5814 (0.4144) data time 0.0008 (0.0018) model time 0.5805 (0.4126) loss 7.0750 (7.0327) grad_norm 3.0454 (2.9330) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:55:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][510/625] eta 0:00:47 lr 0.000495 wd 0.0500 time 0.5921 (0.4162) data time 0.0009 (0.0018) model time 0.5912 (0.4146) loss 6.5201 (7.0323) grad_norm 2.0762 (2.9377) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
04:56:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][520/625] eta 0:00:43 lr 0.000494 wd 0.0500 time 0.3951 (0.4165) data time 0.0008 (0.0018) model time 0.3943 (0.4149) loss 7.2417 (7.0404) grad_norm 4.0260 (2.9335) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:56:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][530/625] eta 0:00:39 lr 0.000494 wd 0.0500 time 0.4009 (0.4162) data time 0.0008 (0.0017) model time 0.4001 (0.4146) loss 7.3523 (7.0472) grad_norm 2.4725 (2.9351) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:56:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][540/625] eta 0:00:35 lr 0.000494 wd 0.0500 time 0.3962 (0.4159) data time 0.0006 (0.0017) model time 0.3956 (0.4143) loss 7.7699 (7.0492) grad_norm 1.8433 (2.9332) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:56:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][550/625] eta 0:00:31 lr 0.000494 wd 0.0500 time 0.3954 (0.4157) data time 0.0006 (0.0017) model time 0.3948 (0.4141) loss 7.3351 (7.0491) grad_norm 2.5501 (2.9397) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:56:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][560/625] eta 0:00:27 lr 0.000494 wd 0.0500 time 0.4322 (0.4157) data time 0.0007 (0.0017) model time 0.4316 (0.4141) loss 7.4765 (7.0479) grad_norm 2.2810 (2.9338) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:56:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][570/625] eta 0:00:22 lr 0.000494 wd 0.0500 time 0.4253 (0.4156) data time 0.0006 (0.0017) model time 0.4247 (0.4140) loss 6.6664 (7.0475) grad_norm 3.3825 (2.9406) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:56:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][580/625] eta 0:00:18 lr 0.000494 wd 0.0500 time 0.3963 (0.4153) data 
time 0.0007 (0.0017) model time 0.3956 (0.4137) loss 7.2930 (7.0457) grad_norm 3.1361 (2.9431) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:56:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][590/625] eta 0:00:14 lr 0.000494 wd 0.0500 time 0.4016 (0.4151) data time 0.0006 (0.0017) model time 0.4010 (0.4135) loss 7.4140 (7.0536) grad_norm 2.2748 (2.9397) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:56:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][600/625] eta 0:00:10 lr 0.000494 wd 0.0500 time 0.4126 (0.4149) data time 0.0009 (0.0016) model time 0.4117 (0.4133) loss 6.1854 (7.0541) grad_norm 1.9855 (2.9268) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:56:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][610/625] eta 0:00:06 lr 0.000494 wd 0.0500 time 0.3950 (0.4148) data time 0.0004 (0.0017) model time 0.3946 (0.4131) loss 7.6182 (7.0592) grad_norm 2.3709 (2.9238) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:56:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][620/625] eta 0:00:02 lr 0.000493 wd 0.0500 time 0.3983 (0.4146) data time 0.0004 (0.0016) model time 0.3979 (0.4129) loss 7.2321 (7.0580) grad_norm 2.3618 (2.9175) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:56:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 176 training takes 0:04:19 [2024-07-25 04:56:42 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 04:56:43 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 04:56:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.448 (0.448) Loss 0.5776 (0.5776) Acc@1 89.404 (89.404) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 04:56:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.9243 (0.7284) Acc@1 80.469 (85.720) Acc@5 95.947 (97.443) Mem 14939MB [2024-07-25 04:56:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.0439 (0.8542) Acc@1 76.270 (82.361) Acc@5 94.385 (96.184) Mem 14939MB [2024-07-25 04:56:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.022 Acc@5 96.135 [2024-07-25 04:56:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.0% [2024-07-25 04:56:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.02% [2024-07-25 04:56:45 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 04:56:46 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 04:56:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.457 (0.457) Loss 0.5537 (0.5537) Acc@1 89.941 (89.941) Acc@5 98.730 (98.730) Mem 14939MB [2024-07-25 04:56:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.8770 (0.6903) Acc@1 81.201 (86.284) Acc@5 95.996 (97.643) Mem 14939MB [2024-07-25 04:56:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.0127 (0.8098) Acc@1 75.830 (82.952) Acc@5 95.020 (96.452) Mem 14939MB [2024-07-25 04:56:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.540 Acc@5 96.411 [2024-07-25 04:56:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.5% [2024-07-25 04:56:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.54% [2024-07-25 04:56:49 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 04:56:50 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 04:56:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][0/625] eta 0:07:48 lr 0.000493 wd 0.0500 time 0.7502 (0.7502) data time 0.3696 (0.3696) model time 0.0000 (0.0000) loss 6.5749 (6.5749) grad_norm 1.7800 (1.7800) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:56:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][10/625] eta 0:04:24 lr 0.000493 wd 0.0500 time 0.3954 (0.4308) data time 0.0006 (0.0344) model time 0.0000 (0.0000) loss 6.2143 (6.9801) grad_norm 3.1347 (2.4427) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:56:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][20/625] eta 0:04:11 lr 0.000493 wd 0.0500 time 0.4016 (0.4161) data time 0.0009 (0.0184) model time 0.0000 (0.0000) loss 7.6342 (6.9530) grad_norm 1.7199 (2.5454) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:57:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][30/625] eta 0:04:04 lr 0.000493 wd 0.0500 time 0.4025 (0.4109) data time 0.0008 (0.0128) model time 0.0000 (0.0000) loss 8.0004 (7.0200) grad_norm 2.9140 (2.5384) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:57:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][40/625] eta 0:03:58 lr 0.000493 wd 0.0500 time 0.3972 (0.4081) data time 0.0008 (0.0099) model time 0.0000 (0.0000) loss 7.6730 (6.9942) grad_norm 1.8482 (2.5811) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:57:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][50/625] eta 0:03:53 lr 0.000493 wd 0.0500 time 0.3977 (0.4065) data time 0.0007 (0.0081) model time 0.0000 (0.0000) loss 5.6075 (6.9834) grad_norm 8.2643 (2.6231) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:57:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][60/625] eta 0:03:49 lr 0.000493 wd 0.0500 time 0.3990 (0.4054) 
data time 0.0007 (0.0070) model time 0.3984 (0.3991) loss 7.1037 (6.9564) grad_norm 2.4170 (2.5992) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:57:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][70/625] eta 0:03:44 lr 0.000493 wd 0.0500 time 0.4122 (0.4048) data time 0.0006 (0.0061) model time 0.4116 (0.3996) loss 7.2842 (6.9947) grad_norm 2.2859 (2.5615) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:57:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][80/625] eta 0:03:41 lr 0.000493 wd 0.0500 time 0.4028 (0.4070) data time 0.0007 (0.0055) model time 0.4021 (0.4071) loss 7.1944 (7.0195) grad_norm 1.9703 (2.5385) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:57:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][90/625] eta 0:03:41 lr 0.000492 wd 0.0500 time 0.4030 (0.4140) data time 0.0008 (0.0050) model time 0.4022 (0.4227) loss 6.7843 (6.9981) grad_norm 2.0698 (2.7217) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:57:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][100/625] eta 0:03:41 lr 0.000492 wd 0.0500 time 0.5901 (0.4217) data time 0.0009 (0.0046) model time 0.5893 (0.4362) loss 6.8613 (7.0172) grad_norm 2.4269 (2.6870) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:57:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][110/625] eta 0:03:39 lr 0.000492 wd 0.0500 time 0.3989 (0.4271) data time 0.0008 (0.0042) model time 0.3981 (0.4438) loss 7.4324 (7.0218) grad_norm 1.8720 (2.7146) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:57:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][120/625] eta 0:03:35 lr 0.000492 wd 0.0500 time 0.4041 (0.4265) data time 0.0007 (0.0040) model time 0.4034 (0.4402) loss 6.5500 (6.9995) grad_norm 3.3888 (2.6947) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
04:57:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][130/625] eta 0:03:30 lr 0.000492 wd 0.0500 time 0.4003 (0.4245) data time 0.0007 (0.0037) model time 0.3996 (0.4351) loss 6.2024 (7.0136) grad_norm 2.4040 (2.7011) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:57:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][140/625] eta 0:03:25 lr 0.000492 wd 0.0500 time 0.4000 (0.4228) data time 0.0009 (0.0035) model time 0.3991 (0.4311) loss 6.3565 (7.0100) grad_norm 1.9940 (2.7367) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:57:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][150/625] eta 0:03:20 lr 0.000492 wd 0.0500 time 0.4057 (0.4227) data time 0.0006 (0.0034) model time 0.4051 (0.4300) loss 6.0169 (6.9941) grad_norm 4.4396 (2.7852) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:57:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][160/625] eta 0:03:15 lr 0.000492 wd 0.0500 time 0.3944 (0.4211) data time 0.0010 (0.0032) model time 0.3933 (0.4270) loss 7.4523 (7.0030) grad_norm 2.5206 (2.7601) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:58:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][170/625] eta 0:03:11 lr 0.000492 wd 0.0500 time 0.4012 (0.4198) data time 0.0010 (0.0031) model time 0.4002 (0.4245) loss 7.7405 (7.0026) grad_norm 3.5023 (2.8873) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:58:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][180/625] eta 0:03:06 lr 0.000492 wd 0.0500 time 0.3996 (0.4186) data time 0.0007 (0.0030) model time 0.3988 (0.4224) loss 6.4836 (7.0133) grad_norm 2.2151 (2.8758) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:58:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][190/625] eta 0:03:01 lr 0.000491 wd 0.0500 time 0.3986 (0.4176) data 
time 0.0008 (0.0029) model time 0.3978 (0.4207) loss 8.2420 (7.0228) grad_norm 1.6382 (2.8793) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:58:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][200/625] eta 0:02:57 lr 0.000491 wd 0.0500 time 0.4035 (0.4167) data time 0.0006 (0.0028) model time 0.4029 (0.4192) loss 6.3969 (7.0172) grad_norm 2.7654 (2.8944) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:58:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][210/625] eta 0:02:52 lr 0.000491 wd 0.0500 time 0.3956 (0.4160) data time 0.0008 (0.0027) model time 0.3948 (0.4181) loss 7.2207 (7.0117) grad_norm 2.5610 (2.8647) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:58:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][220/625] eta 0:02:48 lr 0.000491 wd 0.0500 time 0.4005 (0.4152) data time 0.0009 (0.0026) model time 0.3996 (0.4169) loss 7.5639 (7.0196) grad_norm 5.6584 (2.9249) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:58:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][230/625] eta 0:02:43 lr 0.000491 wd 0.0500 time 0.4030 (0.4147) data time 0.0006 (0.0025) model time 0.4024 (0.4161) loss 7.5542 (7.0195) grad_norm 4.1934 (2.9232) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:58:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][240/625] eta 0:02:39 lr 0.000491 wd 0.0500 time 0.4001 (0.4140) data time 0.0006 (0.0025) model time 0.3995 (0.4151) loss 6.2918 (7.0154) grad_norm 3.2987 (2.9373) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:58:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][250/625] eta 0:02:35 lr 0.000491 wd 0.0500 time 0.3969 (0.4134) data time 0.0008 (0.0024) model time 0.3961 (0.4142) loss 7.3357 (7.0210) grad_norm 2.6247 (2.9239) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
04:58:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][260/625] eta 0:02:30 lr 0.000491 wd 0.0500 time 0.4166 (0.4130) data time 0.0007 (0.0023) model time 0.4159 (0.4136) loss 7.9944 (7.0381) grad_norm 2.0074 (2.8975) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:58:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][270/625] eta 0:02:26 lr 0.000491 wd 0.0500 time 0.3989 (0.4124) data time 0.0009 (0.0023) model time 0.3980 (0.4129) loss 6.3257 (7.0438) grad_norm 1.8922 (2.8816) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:58:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][280/625] eta 0:02:22 lr 0.000490 wd 0.0500 time 0.4087 (0.4120) data time 0.0007 (0.0022) model time 0.4080 (0.4123) loss 7.6656 (7.0511) grad_norm 1.8818 (2.8593) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:58:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][290/625] eta 0:02:17 lr 0.000490 wd 0.0500 time 0.3991 (0.4116) data time 0.0007 (0.0022) model time 0.3984 (0.4118) loss 7.7615 (7.0368) grad_norm 1.7172 (2.8342) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:58:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][300/625] eta 0:02:14 lr 0.000490 wd 0.0500 time 0.5875 (0.4124) data time 0.0008 (0.0022) model time 0.5867 (0.4127) loss 7.1915 (7.0449) grad_norm 2.4375 (2.8186) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:58:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][310/625] eta 0:02:10 lr 0.000490 wd 0.0500 time 0.5979 (0.4133) data time 0.0008 (0.0021) model time 0.5970 (0.4137) loss 6.2061 (7.0521) grad_norm 2.8719 (2.8047) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:59:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][320/625] eta 0:02:06 lr 0.000490 wd 0.0500 time 0.5722 (0.4150) data 
time 0.0007 (0.0021) model time 0.5715 (0.4157) loss 6.9253 (7.0461) grad_norm 3.2982 (2.7988) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:59:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][330/625] eta 0:02:02 lr 0.000490 wd 0.0500 time 0.4017 (0.4156) data time 0.0006 (0.0020) model time 0.4011 (0.4163) loss 6.8582 (7.0493) grad_norm 2.5882 (2.7946) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:59:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][340/625] eta 0:01:58 lr 0.000490 wd 0.0500 time 0.4031 (0.4155) data time 0.0009 (0.0020) model time 0.4022 (0.4162) loss 7.4874 (7.0460) grad_norm 2.6608 (2.7907) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:59:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][350/625] eta 0:01:54 lr 0.000490 wd 0.0500 time 0.3957 (0.4152) data time 0.0007 (0.0020) model time 0.3950 (0.4158) loss 7.0106 (7.0466) grad_norm 2.4876 (2.7946) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:59:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][360/625] eta 0:01:49 lr 0.000490 wd 0.0500 time 0.4364 (0.4149) data time 0.0009 (0.0020) model time 0.4355 (0.4153) loss 6.7651 (7.0459) grad_norm 1.7020 (2.7964) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:59:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][370/625] eta 0:01:45 lr 0.000490 wd 0.0500 time 0.3951 (0.4148) data time 0.0009 (0.0019) model time 0.3942 (0.4152) loss 7.4834 (7.0489) grad_norm 1.8928 (2.8015) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:59:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][380/625] eta 0:01:41 lr 0.000489 wd 0.0500 time 0.3978 (0.4144) data time 0.0009 (0.0019) model time 0.3969 (0.4147) loss 8.4525 (7.0510) grad_norm 1.9979 (2.7983) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
04:59:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][390/625] eta 0:01:37 lr 0.000489 wd 0.0500 time 0.3941 (0.4141) data time 0.0008 (0.0019) model time 0.3933 (0.4143) loss 6.5653 (7.0447) grad_norm 2.5005 (2.7957) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:59:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][400/625] eta 0:01:33 lr 0.000489 wd 0.0500 time 0.3968 (0.4137) data time 0.0007 (0.0019) model time 0.3961 (0.4138) loss 8.5091 (7.0445) grad_norm 2.1147 (2.7878) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:59:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][410/625] eta 0:01:28 lr 0.000489 wd 0.0500 time 0.3956 (0.4133) data time 0.0009 (0.0019) model time 0.3947 (0.4133) loss 7.6944 (7.0401) grad_norm 4.1541 (2.8031) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:59:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][420/625] eta 0:01:24 lr 0.000489 wd 0.0500 time 0.4005 (0.4130) data time 0.0009 (0.0018) model time 0.3996 (0.4129) loss 7.5920 (7.0467) grad_norm 3.5298 (2.7935) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:59:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][430/625] eta 0:01:20 lr 0.000489 wd 0.0500 time 0.3955 (0.4127) data time 0.0007 (0.0018) model time 0.3948 (0.4126) loss 7.5028 (7.0491) grad_norm 1.8176 (2.7909) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:59:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][440/625] eta 0:01:16 lr 0.000489 wd 0.0500 time 0.3951 (0.4125) data time 0.0009 (0.0018) model time 0.3942 (0.4123) loss 6.7123 (7.0413) grad_norm 2.7271 (2.8172) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:59:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][450/625] eta 0:01:12 lr 0.000489 wd 0.0500 time 0.3992 (0.4122) data 
time 0.0007 (0.0018) model time 0.3986 (0.4120) loss 6.0715 (7.0398) grad_norm 1.8866 (2.8247) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 04:59:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][460/625] eta 0:01:07 lr 0.000489 wd 0.0500 time 0.3963 (0.4119) data time 0.0009 (0.0018) model time 0.3954 (0.4117) loss 6.7497 (7.0383) grad_norm 1.6891 (2.8071) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:00:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][470/625] eta 0:01:03 lr 0.000488 wd 0.0500 time 0.3939 (0.4117) data time 0.0009 (0.0017) model time 0.3930 (0.4113) loss 6.5820 (7.0356) grad_norm 2.6751 (2.8281) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:00:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][480/625] eta 0:00:59 lr 0.000488 wd 0.0500 time 0.3995 (0.4114) data time 0.0006 (0.0017) model time 0.3989 (0.4110) loss 7.7281 (7.0277) grad_norm 2.2059 (2.8232) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:00:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][490/625] eta 0:00:55 lr 0.000488 wd 0.0500 time 0.3999 (0.4111) data time 0.0007 (0.0017) model time 0.3992 (0.4107) loss 6.9821 (7.0283) grad_norm 2.1408 (2.8128) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:00:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][500/625] eta 0:00:51 lr 0.000488 wd 0.0500 time 0.4020 (0.4109) data time 0.0008 (0.0017) model time 0.4012 (0.4105) loss 6.2835 (7.0329) grad_norm 3.1393 (2.8188) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:00:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][510/625] eta 0:00:47 lr 0.000488 wd 0.0500 time 0.3998 (0.4107) data time 0.0007 (0.0017) model time 0.3992 (0.4102) loss 6.3534 (7.0283) grad_norm 1.6211 (2.8070) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
05:00:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][520/625] eta 0:00:43 lr 0.000488 wd 0.0500 time 0.3958 (0.4108) data time 0.0009 (0.0017) model time 0.3949 (0.4103) loss 7.1957 (7.0307) grad_norm 3.1629 (2.8149) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:00:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][530/625] eta 0:00:39 lr 0.000488 wd 0.0500 time 0.3948 (0.4123) data time 0.0009 (0.0016) model time 0.3940 (0.4120) loss 5.7022 (7.0334) grad_norm 10.5864 (2.8281) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:00:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][540/625] eta 0:00:35 lr 0.000488 wd 0.0500 time 0.6154 (0.4141) data time 0.0006 (0.0016) model time 0.6148 (0.4139) loss 7.8097 (7.0254) grad_norm 2.3038 (2.8370) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:00:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][550/625] eta 0:00:31 lr 0.000488 wd 0.0500 time 0.3979 (0.4145) data time 0.0007 (0.0016) model time 0.3972 (0.4144) loss 6.4987 (7.0285) grad_norm 2.3111 (2.8494) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:00:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][560/625] eta 0:00:26 lr 0.000488 wd 0.0500 time 0.3958 (0.4148) data time 0.0006 (0.0016) model time 0.3952 (0.4147) loss 8.2317 (7.0339) grad_norm 2.6264 (2.8512) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:00:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][570/625] eta 0:00:22 lr 0.000487 wd 0.0500 time 0.3996 (0.4146) data time 0.0008 (0.0016) model time 0.3988 (0.4144) loss 7.2833 (7.0412) grad_norm 11.1677 (2.8620) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:00:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][580/625] eta 0:00:18 lr 0.000487 wd 0.0500 time 0.3953 (0.4143) 
data time 0.0007 (0.0016) model time 0.3946 (0.4141) loss 6.6818 (7.0396) grad_norm 2.8017 (2.8630) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:00:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][590/625] eta 0:00:14 lr 0.000487 wd 0.0500 time 0.3989 (0.4143) data time 0.0006 (0.0016) model time 0.3983 (0.4140) loss 5.7918 (7.0297) grad_norm 2.5052 (2.8615) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:00:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][600/625] eta 0:00:10 lr 0.000487 wd 0.0500 time 0.3983 (0.4140) data time 0.0008 (0.0016) model time 0.3975 (0.4138) loss 7.4975 (7.0326) grad_norm 2.1956 (2.8658) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:01:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][610/625] eta 0:00:06 lr 0.000487 wd 0.0500 time 0.3981 (0.4138) data time 0.0006 (0.0016) model time 0.3975 (0.4136) loss 8.0716 (7.0346) grad_norm 1.9825 (2.8629) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:01:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][620/625] eta 0:00:02 lr 0.000487 wd 0.0500 time 0.3971 (0.4136) data time 0.0006 (0.0015) model time 0.3965 (0.4133) loss 7.0334 (7.0304) grad_norm 3.1161 (2.8654) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:01:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 177 training takes 0:04:18 [2024-07-25 05:01:08 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 05:01:09 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 05:01:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.453 (0.453) Loss 0.5806 (0.5806) Acc@1 89.209 (89.209) Acc@5 98.389 (98.389) Mem 14939MB [2024-07-25 05:01:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.9302 (0.7221) Acc@1 79.541 (85.516) Acc@5 95.898 (97.483) Mem 14939MB [2024-07-25 05:01:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0684 (0.8460) Acc@1 74.805 (82.164) Acc@5 94.580 (96.210) Mem 14939MB [2024-07-25 05:01:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.850 Acc@5 96.191 [2024-07-25 05:01:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.8% [2024-07-25 05:01:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.841 (0.841) Loss 0.5537 (0.5537) Acc@1 89.941 (89.941) Acc@5 98.730 (98.730) Mem 14939MB [2024-07-25 05:01:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.157) Loss 0.8755 (0.6896) Acc@1 81.250 (86.288) Acc@5 96.045 (97.656) Mem 14939MB [2024-07-25 05:01:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 1.0098 (0.8088) Acc@1 75.781 (82.954) Acc@5 95.117 (96.466) Mem 14939MB [2024-07-25 05:01:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.548 Acc@5 96.423 [2024-07-25 05:01:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.5% [2024-07-25 05:01:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.55% [2024-07-25 05:01:14 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 05:01:15 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 05:01:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][0/625] eta 0:07:55 lr 0.000487 wd 0.0500 time 0.7604 (0.7604) data time 0.3716 (0.3716) model time 0.0000 (0.0000) loss 6.8058 (6.8058) grad_norm 3.2039 (3.2039) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:01:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][10/625] eta 0:04:25 lr 0.000487 wd 0.0500 time 0.3997 (0.4318) data time 0.0008 (0.0346) model time 0.0000 (0.0000) loss 7.4084 (7.3581) grad_norm 4.1496 (3.1052) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:01:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][20/625] eta 0:04:12 lr 0.000487 wd 0.0500 time 0.3960 (0.4166) data time 0.0007 (0.0185) model time 0.0000 (0.0000) loss 6.3533 (7.1772) grad_norm 2.5195 (2.9542) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:01:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][30/625] eta 0:04:05 lr 0.000487 wd 0.0500 time 0.4062 (0.4122) data time 0.0008 (0.0129) model time 0.0000 (0.0000) loss 7.4772 (7.0937) grad_norm 5.1845 (3.1973) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:01:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][40/625] eta 0:03:59 lr 0.000486 wd 0.0500 time 0.4087 (0.4099) data time 0.0009 (0.0100) model time 0.0000 (0.0000) loss 7.3167 (6.9959) grad_norm 3.2064 (3.1335) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:01:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][50/625] eta 0:03:55 lr 0.000486 wd 0.0500 time 0.4047 (0.4088) data time 0.0009 (0.0082) model time 0.0000 (0.0000) loss 7.4530 (7.0173) grad_norm 1.8606 (2.9828) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 05:01:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][60/625] eta 0:03:50 lr 0.000486 wd 0.0500 time 0.4031 (0.4073) data time 0.0009 (0.0070) model time 0.4022 (0.3985) loss 7.4533 (7.0814) grad_norm 2.2463 (2.9187) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:01:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][70/625] eta 0:03:45 lr 0.000486 wd 0.0500 time 0.4031 (0.4062) data time 0.0007 (0.0061) model time 0.4025 (0.3986) loss 7.8060 (7.0768) grad_norm 1.8572 (2.8491) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:01:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][80/625] eta 0:03:40 lr 0.000486 wd 0.0500 time 0.3927 (0.4051) data time 0.0008 (0.0055) model time 0.3919 (0.3978) loss 8.3121 (7.1017) grad_norm 3.2398 (2.8212) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:01:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][90/625] eta 0:03:36 lr 0.000486 wd 0.0500 time 0.3974 (0.4043) data time 0.0008 (0.0050) model time 0.3966 (0.3976) loss 6.3822 (7.0852) grad_norm 3.0517 (2.8793) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:01:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][100/625] eta 0:03:31 lr 0.000486 wd 0.0500 time 0.4005 (0.4038) data time 0.0009 (0.0046) model time 0.3996 (0.3977) loss 6.0552 (7.0606) grad_norm 2.0515 (2.8626) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:02:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][110/625] eta 0:03:27 lr 0.000486 wd 0.0500 time 0.3979 (0.4033) data time 0.0007 (0.0043) model time 0.3973 (0.3977) loss 7.0649 (7.0767) grad_norm 2.9349 (2.8343) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:02:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][120/625] eta 0:03:25 lr 0.000486 wd 0.0500 time 
0.5501 (0.4074) data time 0.0009 (0.0040) model time 0.5492 (0.4054) loss 6.4219 (7.0581) grad_norm 4.4052 (2.8314) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:02:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][130/625] eta 0:03:25 lr 0.000485 wd 0.0500 time 0.5914 (0.4146) data time 0.0009 (0.0038) model time 0.5906 (0.4173) loss 7.0240 (7.0644) grad_norm 3.4637 (2.8382) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:02:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][140/625] eta 0:03:23 lr 0.000485 wd 0.0500 time 0.5537 (0.4203) data time 0.0007 (0.0036) model time 0.5530 (0.4259) loss 6.2629 (7.0288) grad_norm 1.9711 (inf) loss_scale 256.0000 (508.3688) mem 14939MB [2024-07-25 05:02:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][150/625] eta 0:03:20 lr 0.000485 wd 0.0500 time 0.5856 (0.4227) data time 0.0006 (0.0034) model time 0.5850 (0.4288) loss 8.5720 (7.0262) grad_norm 1.8774 (inf) loss_scale 256.0000 (491.6556) mem 14939MB [2024-07-25 05:02:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][160/625] eta 0:03:15 lr 0.000485 wd 0.0500 time 0.3959 (0.4211) data time 0.0007 (0.0032) model time 0.3951 (0.4258) loss 7.4430 (7.0352) grad_norm 2.0620 (inf) loss_scale 256.0000 (477.0186) mem 14939MB [2024-07-25 05:02:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][170/625] eta 0:03:11 lr 0.000485 wd 0.0500 time 0.3984 (0.4200) data time 0.0008 (0.0031) model time 0.3976 (0.4238) loss 7.2459 (7.0118) grad_norm 3.0234 (inf) loss_scale 256.0000 (464.0936) mem 14939MB [2024-07-25 05:02:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][180/625] eta 0:03:06 lr 0.000485 wd 0.0500 time 0.3990 (0.4189) data time 0.0007 (0.0030) model time 0.3984 (0.4220) loss 6.9183 (7.0010) grad_norm 3.0336 (inf) loss_scale 256.0000 (452.5967) mem 14939MB [2024-07-25 
05:02:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][190/625] eta 0:03:01 lr 0.000485 wd 0.0500 time 0.3958 (0.4179) data time 0.0006 (0.0029) model time 0.3952 (0.4203) loss 5.4842 (6.9969) grad_norm 2.6128 (inf) loss_scale 256.0000 (442.3037) mem 14939MB [2024-07-25 05:02:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][200/625] eta 0:02:57 lr 0.000485 wd 0.0500 time 0.3988 (0.4170) data time 0.0007 (0.0028) model time 0.3981 (0.4189) loss 5.3390 (6.9759) grad_norm 3.2519 (inf) loss_scale 256.0000 (433.0348) mem 14939MB [2024-07-25 05:02:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][210/625] eta 0:02:52 lr 0.000485 wd 0.0500 time 0.3986 (0.4162) data time 0.0007 (0.0027) model time 0.3978 (0.4176) loss 6.3064 (6.9778) grad_norm 2.1999 (inf) loss_scale 256.0000 (424.6445) mem 14939MB [2024-07-25 05:02:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][220/625] eta 0:02:48 lr 0.000485 wd 0.0500 time 0.3984 (0.4154) data time 0.0006 (0.0026) model time 0.3977 (0.4164) loss 6.9810 (7.0001) grad_norm 2.0216 (inf) loss_scale 256.0000 (417.0136) mem 14939MB [2024-07-25 05:02:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][230/625] eta 0:02:43 lr 0.000484 wd 0.0500 time 0.4067 (0.4147) data time 0.0007 (0.0025) model time 0.4060 (0.4155) loss 6.7038 (7.0084) grad_norm 2.0462 (inf) loss_scale 256.0000 (410.0433) mem 14939MB [2024-07-25 05:02:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][240/625] eta 0:02:39 lr 0.000484 wd 0.0500 time 0.3997 (0.4141) data time 0.0009 (0.0025) model time 0.3988 (0.4146) loss 6.0705 (7.0092) grad_norm 2.2979 (inf) loss_scale 256.0000 (403.6515) mem 14939MB [2024-07-25 05:02:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][250/625] eta 0:02:35 lr 0.000484 wd 0.0500 time 0.3993 (0.4135) data time 0.0008 
(0.0024) model time 0.3985 (0.4138) loss 6.0611 (6.9981) grad_norm 2.7787 (inf) loss_scale 256.0000 (397.7689) mem 14939MB [2024-07-25 05:03:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][260/625] eta 0:02:30 lr 0.000484 wd 0.0500 time 0.3964 (0.4130) data time 0.0007 (0.0023) model time 0.3957 (0.4130) loss 8.0629 (7.0022) grad_norm 3.6428 (inf) loss_scale 256.0000 (392.3372) mem 14939MB [2024-07-25 05:03:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][270/625] eta 0:02:26 lr 0.000484 wd 0.0500 time 0.3966 (0.4127) data time 0.0008 (0.0024) model time 0.3957 (0.4125) loss 7.3789 (7.0036) grad_norm 2.3964 (inf) loss_scale 256.0000 (387.3063) mem 14939MB [2024-07-25 05:03:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][280/625] eta 0:02:22 lr 0.000484 wd 0.0500 time 0.4039 (0.4122) data time 0.0007 (0.0023) model time 0.4032 (0.4119) loss 7.5718 (7.0139) grad_norm 2.1335 (inf) loss_scale 256.0000 (382.6335) mem 14939MB [2024-07-25 05:03:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][290/625] eta 0:02:17 lr 0.000484 wd 0.0500 time 0.4025 (0.4118) data time 0.0009 (0.0023) model time 0.4016 (0.4114) loss 6.8375 (7.0160) grad_norm 3.0788 (inf) loss_scale 256.0000 (378.2818) mem 14939MB [2024-07-25 05:03:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][300/625] eta 0:02:13 lr 0.000484 wd 0.0500 time 0.3970 (0.4114) data time 0.0006 (0.0022) model time 0.3963 (0.4109) loss 7.6658 (7.0299) grad_norm 1.4932 (inf) loss_scale 256.0000 (374.2193) mem 14939MB [2024-07-25 05:03:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][310/625] eta 0:02:09 lr 0.000484 wd 0.0500 time 0.3994 (0.4110) data time 0.0007 (0.0022) model time 0.3986 (0.4104) loss 6.7539 (7.0233) grad_norm 2.3964 (inf) loss_scale 256.0000 (370.4180) mem 14939MB [2024-07-25 05:03:27 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [178/300][320/625] eta 0:02:05 lr 0.000484 wd 0.0500 time 0.3973 (0.4106) data time 0.0010 (0.0022) model time 0.3963 (0.4099) loss 7.1136 (7.0170) grad_norm 3.6974 (inf) loss_scale 256.0000 (366.8536) mem 14939MB [2024-07-25 05:03:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][330/625] eta 0:02:01 lr 0.000483 wd 0.0500 time 0.3983 (0.4103) data time 0.0010 (0.0021) model time 0.3973 (0.4096) loss 6.7835 (7.0095) grad_norm 7.1366 (inf) loss_scale 256.0000 (363.5045) mem 14939MB [2024-07-25 05:03:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][340/625] eta 0:01:57 lr 0.000483 wd 0.0500 time 0.5681 (0.4114) data time 0.0010 (0.0021) model time 0.5671 (0.4108) loss 7.8822 (7.0123) grad_norm 4.5030 (inf) loss_scale 256.0000 (360.3519) mem 14939MB [2024-07-25 05:03:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][350/625] eta 0:01:53 lr 0.000483 wd 0.0500 time 0.6160 (0.4132) data time 0.0006 (0.0021) model time 0.6154 (0.4129) loss 7.1451 (7.0129) grad_norm 2.3975 (inf) loss_scale 256.0000 (357.3789) mem 14939MB [2024-07-25 05:03:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][360/625] eta 0:01:49 lr 0.000483 wd 0.0500 time 0.4000 (0.4147) data time 0.0009 (0.0020) model time 0.3991 (0.4147) loss 7.1393 (7.0028) grad_norm 2.8183 (inf) loss_scale 256.0000 (354.5706) mem 14939MB [2024-07-25 05:03:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][370/625] eta 0:01:46 lr 0.000483 wd 0.0500 time 0.5891 (0.4164) data time 0.0006 (0.0020) model time 0.5885 (0.4166) loss 7.6541 (7.0124) grad_norm 2.2002 (inf) loss_scale 256.0000 (351.9137) mem 14939MB [2024-07-25 05:03:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][380/625] eta 0:01:41 lr 0.000483 wd 0.0500 time 0.4000 (0.4160) data time 0.0008 (0.0020) model time 0.3992 (0.4161) loss 
8.3972 (7.0163) grad_norm 2.1757 (inf) loss_scale 256.0000 (349.3963) mem 14939MB [2024-07-25 05:03:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][390/625] eta 0:01:37 lr 0.000483 wd 0.0500 time 0.3989 (0.4157) data time 0.0009 (0.0019) model time 0.3980 (0.4157) loss 5.9383 (7.0036) grad_norm 2.3103 (inf) loss_scale 256.0000 (347.0077) mem 14939MB [2024-07-25 05:04:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][400/625] eta 0:01:33 lr 0.000483 wd 0.0500 time 0.4013 (0.4153) data time 0.0008 (0.0019) model time 0.4005 (0.4153) loss 6.0529 (7.0035) grad_norm 2.6984 (inf) loss_scale 256.0000 (344.7382) mem 14939MB [2024-07-25 05:04:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][410/625] eta 0:01:29 lr 0.000483 wd 0.0500 time 0.4101 (0.4150) data time 0.0007 (0.0019) model time 0.4094 (0.4149) loss 7.6484 (7.0011) grad_norm 1.8328 (inf) loss_scale 256.0000 (342.5791) mem 14939MB [2024-07-25 05:04:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][420/625] eta 0:01:25 lr 0.000482 wd 0.0500 time 0.3970 (0.4147) data time 0.0008 (0.0019) model time 0.3961 (0.4145) loss 5.7275 (6.9935) grad_norm 2.7628 (inf) loss_scale 256.0000 (340.5226) mem 14939MB [2024-07-25 05:04:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][430/625] eta 0:01:20 lr 0.000482 wd 0.0500 time 0.4026 (0.4144) data time 0.0009 (0.0019) model time 0.4018 (0.4141) loss 7.5522 (6.9932) grad_norm 2.8332 (inf) loss_scale 256.0000 (338.5615) mem 14939MB [2024-07-25 05:04:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][440/625] eta 0:01:16 lr 0.000482 wd 0.0500 time 0.3966 (0.4140) data time 0.0009 (0.0018) model time 0.3957 (0.4137) loss 7.2340 (6.9909) grad_norm 2.0274 (inf) loss_scale 256.0000 (336.6893) mem 14939MB [2024-07-25 05:04:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[178/300][450/625] eta 0:01:12 lr 0.000482 wd 0.0500 time 0.3975 (0.4136) data time 0.0008 (0.0018) model time 0.3967 (0.4133) loss 7.0001 (6.9832) grad_norm 2.0619 (inf) loss_scale 256.0000 (334.9002) mem 14939MB [2024-07-25 05:04:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][460/625] eta 0:01:08 lr 0.000482 wd 0.0500 time 0.3953 (0.4133) data time 0.0008 (0.0018) model time 0.3945 (0.4129) loss 6.1896 (6.9791) grad_norm 2.1200 (inf) loss_scale 256.0000 (333.1887) mem 14939MB [2024-07-25 05:04:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][470/625] eta 0:01:04 lr 0.000482 wd 0.0500 time 0.3999 (0.4130) data time 0.0008 (0.0018) model time 0.3991 (0.4125) loss 7.2919 (6.9847) grad_norm 2.7320 (inf) loss_scale 256.0000 (331.5499) mem 14939MB [2024-07-25 05:04:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][480/625] eta 0:00:59 lr 0.000482 wd 0.0500 time 0.3987 (0.4127) data time 0.0007 (0.0018) model time 0.3981 (0.4122) loss 6.1208 (6.9807) grad_norm 3.7126 (inf) loss_scale 256.0000 (329.9792) mem 14939MB [2024-07-25 05:04:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][490/625] eta 0:00:55 lr 0.000482 wd 0.0500 time 0.3986 (0.4125) data time 0.0009 (0.0017) model time 0.3977 (0.4119) loss 6.8932 (6.9832) grad_norm 2.2147 (inf) loss_scale 256.0000 (328.4725) mem 14939MB [2024-07-25 05:04:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][500/625] eta 0:00:51 lr 0.000482 wd 0.0500 time 0.4066 (0.4122) data time 0.0008 (0.0017) model time 0.4058 (0.4116) loss 7.0846 (6.9883) grad_norm 2.7903 (inf) loss_scale 256.0000 (327.0259) mem 14939MB [2024-07-25 05:04:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][510/625] eta 0:00:47 lr 0.000482 wd 0.0500 time 0.3957 (0.4119) data time 0.0008 (0.0017) model time 0.3949 (0.4113) loss 6.9049 (6.9898) grad_norm 2.4760 (inf) 
loss_scale 256.0000 (325.6360) mem 14939MB [2024-07-25 05:04:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][520/625] eta 0:00:43 lr 0.000481 wd 0.0500 time 0.4060 (0.4117) data time 0.0008 (0.0017) model time 0.4052 (0.4110) loss 6.7332 (6.9923) grad_norm 4.4441 (inf) loss_scale 256.0000 (324.2994) mem 14939MB [2024-07-25 05:04:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][530/625] eta 0:00:39 lr 0.000481 wd 0.0500 time 0.3959 (0.4115) data time 0.0009 (0.0017) model time 0.3951 (0.4108) loss 8.1126 (6.9986) grad_norm 3.1518 (inf) loss_scale 256.0000 (323.0132) mem 14939MB [2024-07-25 05:04:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][540/625] eta 0:00:34 lr 0.000481 wd 0.0500 time 0.3965 (0.4112) data time 0.0008 (0.0017) model time 0.3957 (0.4105) loss 8.4080 (6.9966) grad_norm 3.1375 (inf) loss_scale 256.0000 (321.7745) mem 14939MB [2024-07-25 05:05:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][550/625] eta 0:00:30 lr 0.000481 wd 0.0500 time 0.3963 (0.4110) data time 0.0007 (0.0017) model time 0.3956 (0.4102) loss 5.6646 (6.9971) grad_norm 2.4418 (inf) loss_scale 256.0000 (320.5808) mem 14939MB [2024-07-25 05:05:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][560/625] eta 0:00:26 lr 0.000481 wd 0.0500 time 0.5950 (0.4117) data time 0.0008 (0.0016) model time 0.5942 (0.4110) loss 5.7700 (6.9983) grad_norm 6.0255 (inf) loss_scale 256.0000 (319.4296) mem 14939MB [2024-07-25 05:05:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][570/625] eta 0:00:22 lr 0.000481 wd 0.0500 time 0.5895 (0.4130) data time 0.0006 (0.0016) model time 0.5888 (0.4124) loss 7.1873 (7.0037) grad_norm 2.0952 (inf) loss_scale 256.0000 (318.3187) mem 14939MB [2024-07-25 05:05:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][580/625] eta 0:00:18 lr 0.000481 
wd 0.0500 time 0.3968 (0.4139) data time 0.0007 (0.0016) model time 0.3961 (0.4134) loss 6.6956 (7.0084) grad_norm 2.2693 (inf) loss_scale 256.0000 (317.2461) mem 14939MB [2024-07-25 05:05:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][590/625] eta 0:00:14 lr 0.000481 wd 0.0500 time 0.3968 (0.4148) data time 0.0007 (0.0016) model time 0.3962 (0.4144) loss 7.5453 (7.0122) grad_norm 2.0271 (inf) loss_scale 256.0000 (316.2098) mem 14939MB [2024-07-25 05:05:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][600/625] eta 0:00:10 lr 0.000481 wd 0.0500 time 0.3969 (0.4146) data time 0.0007 (0.0016) model time 0.3962 (0.4141) loss 5.9474 (7.0102) grad_norm 3.3077 (inf) loss_scale 256.0000 (315.2080) mem 14939MB [2024-07-25 05:05:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][610/625] eta 0:00:06 lr 0.000480 wd 0.0500 time 0.4090 (0.4143) data time 0.0006 (0.0016) model time 0.4084 (0.4138) loss 6.8213 (7.0101) grad_norm 2.1564 (inf) loss_scale 256.0000 (314.2390) mem 14939MB [2024-07-25 05:05:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][620/625] eta 0:00:02 lr 0.000480 wd 0.0500 time 0.3925 (0.4141) data time 0.0006 (0.0016) model time 0.3919 (0.4136) loss 7.3266 (7.0080) grad_norm 2.1301 (inf) loss_scale 256.0000 (313.3011) mem 14939MB [2024-07-25 05:05:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 178 training takes 0:04:18 [2024-07-25 05:05:34 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 05:05:35 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 05:05:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.542 (0.542) Loss 0.5796 (0.5796) Acc@1 89.648 (89.648) Acc@5 98.584 (98.584) Mem 14939MB [2024-07-25 05:05:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.127) Loss 0.9116 (0.7154) Acc@1 80.518 (85.920) Acc@5 95.410 (97.492) Mem 14939MB [2024-07-25 05:05:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.107) Loss 1.0352 (0.8363) Acc@1 75.537 (82.503) Acc@5 94.727 (96.236) Mem 14939MB [2024-07-25 05:05:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.140 Acc@5 96.221 [2024-07-25 05:05:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.1% [2024-07-25 05:05:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.14% [2024-07-25 05:05:37 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 05:05:38 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 05:05:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.661 (0.661) Loss 0.5532 (0.5532) Acc@1 89.941 (89.941) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 05:05:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.138) Loss 0.8760 (0.6892) Acc@1 81.201 (86.315) Acc@5 95.996 (97.665) Mem 14939MB [2024-07-25 05:05:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.113) Loss 1.0088 (0.8081) Acc@1 75.879 (82.980) Acc@5 95.166 (96.482) Mem 14939MB [2024-07-25 05:05:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.570 Acc@5 96.439 [2024-07-25 05:05:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.6% [2024-07-25 05:05:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.57% [2024-07-25 05:05:41 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 05:05:42 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 05:05:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][0/625] eta 0:07:44 lr 0.000480 wd 0.0500 time 0.7438 (0.7438) data time 0.3648 (0.3648) model time 0.0000 (0.0000) loss 7.8102 (7.8102) grad_norm 3.4914 (3.4914) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:05:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][10/625] eta 0:04:25 lr 0.000480 wd 0.0500 time 0.3974 (0.4311) data time 0.0008 (0.0342) model time 0.0000 (0.0000) loss 7.9992 (7.0461) grad_norm 2.0707 (2.4022) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:05:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][20/625] eta 0:04:12 lr 0.000480 wd 0.0500 time 0.3998 (0.4169) data time 0.0007 (0.0183) model time 0.0000 (0.0000) loss 8.5071 (7.2131) grad_norm 3.5711 (2.3035) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:05:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][30/625] eta 0:04:05 lr 0.000480 wd 0.0500 time 0.3949 (0.4122) data time 0.0007 (0.0127) model time 0.0000 (0.0000) loss 7.0492 (7.1323) grad_norm 1.8618 (2.3517) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:05:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][40/625] eta 0:03:59 lr 0.000480 wd 0.0500 time 0.4007 (0.4094) data time 0.0009 (0.0098) model time 0.0000 (0.0000) loss 6.3031 (7.0177) grad_norm 2.7237 (2.3054) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:06:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][50/625] eta 0:03:54 lr 0.000480 wd 0.0500 time 0.3975 (0.4076) data time 0.0006 (0.0081) model time 0.0000 (0.0000) loss 5.5960 (6.9574) grad_norm 2.6595 (2.4084) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:06:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][60/625] eta 0:03:49 lr 0.000480 wd 0.0500 time 0.3982 (0.4066) 
data time 0.0008 (0.0069) model time 0.3974 (0.4003) loss 7.2515 (6.9822) grad_norm 3.2152 (2.4126) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:06:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][70/625] eta 0:03:46 lr 0.000480 wd 0.0500 time 0.3989 (0.4085) data time 0.0008 (0.0061) model time 0.3981 (0.4097) loss 6.7788 (6.9638) grad_norm 1.9561 (2.4042) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:06:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][80/625] eta 0:03:42 lr 0.000479 wd 0.0500 time 0.3999 (0.4075) data time 0.0007 (0.0054) model time 0.3993 (0.4064) loss 7.1817 (6.9939) grad_norm 3.5404 (2.3925) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:06:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][90/625] eta 0:03:37 lr 0.000479 wd 0.0500 time 0.4015 (0.4067) data time 0.0006 (0.0049) model time 0.4009 (0.4045) loss 6.2556 (6.9503) grad_norm 2.9317 (2.4691) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:06:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][100/625] eta 0:03:33 lr 0.000479 wd 0.0500 time 0.3985 (0.4061) data time 0.0006 (0.0045) model time 0.3979 (0.4036) loss 6.4073 (6.9999) grad_norm 2.2317 (2.5494) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:06:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][110/625] eta 0:03:28 lr 0.000479 wd 0.0500 time 0.3997 (0.4057) data time 0.0008 (0.0042) model time 0.3989 (0.4032) loss 6.5129 (6.9900) grad_norm 3.9748 (2.6175) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:06:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][120/625] eta 0:03:24 lr 0.000479 wd 0.0500 time 0.4013 (0.4055) data time 0.0006 (0.0039) model time 0.4007 (0.4031) loss 7.9574 (7.0055) grad_norm 2.5020 (2.6396) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
05:06:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][130/625] eta 0:03:20 lr 0.000479 wd 0.0500 time 0.3974 (0.4051) data time 0.0007 (0.0037) model time 0.3967 (0.4027) loss 7.7655 (7.0236) grad_norm 3.0934 (2.6353) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:06:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][140/625] eta 0:03:16 lr 0.000479 wd 0.0500 time 0.3952 (0.4048) data time 0.0009 (0.0035) model time 0.3943 (0.4024) loss 7.8160 (7.0424) grad_norm 2.5820 (2.6523) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:06:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][150/625] eta 0:03:12 lr 0.000479 wd 0.0500 time 0.3975 (0.4044) data time 0.0007 (0.0033) model time 0.3968 (0.4019) loss 7.7644 (7.0114) grad_norm 1.7396 (2.6617) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:06:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][160/625] eta 0:03:10 lr 0.000479 wd 0.0500 time 0.5702 (0.4098) data time 0.0006 (0.0032) model time 0.5695 (0.4099) loss 6.1695 (6.9979) grad_norm 3.8820 (2.6919) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:06:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][170/625] eta 0:03:08 lr 0.000479 wd 0.0500 time 0.3963 (0.4142) data time 0.0009 (0.0030) model time 0.3954 (0.4161) loss 6.4880 (7.0053) grad_norm 2.0628 (2.7422) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:06:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][180/625] eta 0:03:06 lr 0.000478 wd 0.0500 time 0.3991 (0.4195) data time 0.0007 (0.0029) model time 0.3984 (0.4233) loss 6.9256 (7.0009) grad_norm 2.4591 (2.7252) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:07:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][190/625] eta 0:03:02 lr 0.000478 wd 0.0500 time 0.4005 (0.4205) data 
time 0.0007 (0.0028) model time 0.3998 (0.4243) loss 7.2757 (6.9883) grad_norm 2.8771 (2.7711) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:07:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][200/625] eta 0:02:58 lr 0.000478 wd 0.0500 time 0.3936 (0.4195) data time 0.0007 (0.0027) model time 0.3929 (0.4227) loss 7.9676 (6.9888) grad_norm 2.9612 (2.7971) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:07:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][210/625] eta 0:02:53 lr 0.000478 wd 0.0500 time 0.3987 (0.4186) data time 0.0006 (0.0026) model time 0.3981 (0.4212) loss 6.4635 (6.9758) grad_norm 2.6800 (2.7769) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:07:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][220/625] eta 0:02:49 lr 0.000478 wd 0.0500 time 0.3975 (0.4181) data time 0.0010 (0.0029) model time 0.3965 (0.4199) loss 6.5228 (6.9823) grad_norm 2.6275 (2.7916) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:07:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][230/625] eta 0:02:44 lr 0.000478 wd 0.0500 time 0.3973 (0.4173) data time 0.0007 (0.0028) model time 0.3966 (0.4187) loss 9.0633 (6.9847) grad_norm 3.1987 (2.7898) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:07:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][240/625] eta 0:02:40 lr 0.000478 wd 0.0500 time 0.3977 (0.4167) data time 0.0008 (0.0027) model time 0.3969 (0.4178) loss 6.8289 (6.9823) grad_norm 2.2891 (2.7846) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:07:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][250/625] eta 0:02:36 lr 0.000478 wd 0.0500 time 0.4013 (0.4160) data time 0.0009 (0.0026) model time 0.4004 (0.4169) loss 6.4274 (6.9803) grad_norm 1.7410 (2.7590) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
05:07:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][260/625] eta 0:02:31 lr 0.000478 wd 0.0500 time 0.3992 (0.4155) data time 0.0008 (0.0026) model time 0.3984 (0.4162) loss 7.6539 (6.9811) grad_norm 3.0879 (2.7790) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:07:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][270/625] eta 0:02:27 lr 0.000478 wd 0.0500 time 0.4004 (0.4150) data time 0.0009 (0.0025) model time 0.3995 (0.4155) loss 8.4071 (6.9798) grad_norm 2.5229 (2.7783) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:07:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][280/625] eta 0:02:23 lr 0.000477 wd 0.0500 time 0.3995 (0.4146) data time 0.0007 (0.0025) model time 0.3988 (0.4149) loss 7.0127 (6.9892) grad_norm 2.5357 (2.7938) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:07:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][290/625] eta 0:02:19 lr 0.000477 wd 0.0500 time 0.3994 (0.4150) data time 0.0009 (0.0024) model time 0.3985 (0.4154) loss 8.1823 (6.9953) grad_norm 1.7693 (2.7865) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:07:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][300/625] eta 0:02:14 lr 0.000477 wd 0.0500 time 0.3993 (0.4146) data time 0.0009 (0.0024) model time 0.3984 (0.4149) loss 7.8664 (6.9959) grad_norm 2.0661 (2.7921) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:07:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][310/625] eta 0:02:10 lr 0.000477 wd 0.0500 time 0.3967 (0.4141) data time 0.0009 (0.0023) model time 0.3959 (0.4142) loss 7.2972 (7.0004) grad_norm 1.8774 (2.7693) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:07:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][320/625] eta 0:02:06 lr 0.000477 wd 0.0500 time 0.3988 (0.4137) data 
time 0.0008 (0.0023) model time 0.3980 (0.4137) loss 6.2468 (6.9915) grad_norm 2.4273 (2.7516) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:07:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][330/625] eta 0:02:01 lr 0.000477 wd 0.0500 time 0.4012 (0.4133) data time 0.0008 (0.0022) model time 0.4004 (0.4132) loss 5.9975 (6.9909) grad_norm 2.2700 (2.7548) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:08:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][340/625] eta 0:01:57 lr 0.000477 wd 0.0500 time 0.3938 (0.4129) data time 0.0007 (0.0022) model time 0.3931 (0.4126) loss 6.9301 (6.9892) grad_norm 1.8726 (2.7441) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:08:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][350/625] eta 0:01:53 lr 0.000477 wd 0.0500 time 0.4009 (0.4125) data time 0.0009 (0.0022) model time 0.4000 (0.4122) loss 6.1148 (6.9900) grad_norm 2.7247 (2.7569) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:08:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][360/625] eta 0:01:49 lr 0.000477 wd 0.0500 time 0.4032 (0.4122) data time 0.0007 (0.0021) model time 0.4025 (0.4118) loss 7.2540 (6.9845) grad_norm 2.4635 (2.7595) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:08:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][370/625] eta 0:01:45 lr 0.000476 wd 0.0500 time 0.3983 (0.4120) data time 0.0006 (0.0021) model time 0.3976 (0.4115) loss 7.4977 (6.9864) grad_norm 2.4051 (2.7895) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:08:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][380/625] eta 0:01:41 lr 0.000476 wd 0.0500 time 0.5667 (0.4137) data time 0.0009 (0.0021) model time 0.5658 (0.4135) loss 7.8776 (7.0038) grad_norm 4.4991 (2.8227) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
05:08:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][390/625] eta 0:01:37 lr 0.000476 wd 0.0500 time 0.3966 (0.4143) data time 0.0009 (0.0020) model time 0.3958 (0.4141) loss 7.8817 (7.0130) grad_norm 2.9400 (2.8404) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:08:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][400/625] eta 0:01:33 lr 0.000476 wd 0.0500 time 0.5573 (0.4165) data time 0.0007 (0.0020) model time 0.5566 (0.4167) loss 7.0855 (7.0169) grad_norm 1.7986 (2.8515) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:08:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][410/625] eta 0:01:29 lr 0.000476 wd 0.0500 time 0.3985 (0.4170) data time 0.0007 (0.0020) model time 0.3978 (0.4173) loss 6.1093 (7.0143) grad_norm 3.3812 (2.8455) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:08:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][420/625] eta 0:01:25 lr 0.000476 wd 0.0500 time 0.3991 (0.4166) data time 0.0009 (0.0020) model time 0.3982 (0.4168) loss 6.6768 (7.0021) grad_norm 2.9983 (2.8462) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:08:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][430/625] eta 0:01:21 lr 0.000476 wd 0.0500 time 0.4127 (0.4163) data time 0.0006 (0.0019) model time 0.4121 (0.4163) loss 7.7566 (6.9976) grad_norm 5.3728 (2.8756) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:08:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][440/625] eta 0:01:16 lr 0.000476 wd 0.0500 time 0.3988 (0.4159) data time 0.0008 (0.0019) model time 0.3980 (0.4159) loss 6.6816 (7.0062) grad_norm 7.1551 (2.8929) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:08:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][450/625] eta 0:01:12 lr 0.000476 wd 0.0500 time 0.3961 (0.4156) data 
time 0.0009 (0.0019) model time 0.3953 (0.4155) loss 6.6808 (7.0023) grad_norm 2.7893 (2.8928) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:08:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][460/625] eta 0:01:08 lr 0.000476 wd 0.0500 time 0.3961 (0.4153) data time 0.0008 (0.0019) model time 0.3953 (0.4151) loss 7.0830 (6.9980) grad_norm 2.5952 (2.8975) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:08:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][470/625] eta 0:01:04 lr 0.000475 wd 0.0500 time 0.3977 (0.4149) data time 0.0009 (0.0018) model time 0.3968 (0.4147) loss 6.0851 (6.9960) grad_norm 2.0584 (2.9025) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:09:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][480/625] eta 0:01:00 lr 0.000475 wd 0.0500 time 0.3965 (0.4146) data time 0.0007 (0.0018) model time 0.3958 (0.4144) loss 5.8259 (6.9867) grad_norm 2.2915 (2.9193) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:09:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][490/625] eta 0:00:55 lr 0.000475 wd 0.0500 time 0.3999 (0.4143) data time 0.0009 (0.0018) model time 0.3990 (0.4140) loss 6.8748 (6.9813) grad_norm 2.3925 (2.9311) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:09:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][500/625] eta 0:00:51 lr 0.000475 wd 0.0500 time 0.3965 (0.4140) data time 0.0009 (0.0018) model time 0.3956 (0.4136) loss 6.8407 (6.9773) grad_norm 2.3515 (2.9371) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:09:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][510/625] eta 0:00:47 lr 0.000475 wd 0.0500 time 0.4013 (0.4142) data time 0.0007 (0.0018) model time 0.4006 (0.4138) loss 8.0695 (6.9777) grad_norm 1.9485 (2.9385) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
05:09:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][520/625] eta 0:00:43 lr 0.000475 wd 0.0500 time 0.4024 (0.4139) data time 0.0008 (0.0018) model time 0.4016 (0.4135) loss 7.1418 (6.9834) grad_norm 2.8312 (2.9304) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:09:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][530/625] eta 0:00:39 lr 0.000475 wd 0.0500 time 0.4012 (0.4136) data time 0.0006 (0.0018) model time 0.4006 (0.4132) loss 7.1160 (6.9850) grad_norm 2.5540 (2.9203) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:09:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][540/625] eta 0:00:35 lr 0.000475 wd 0.0500 time 0.3962 (0.4133) data time 0.0008 (0.0017) model time 0.3954 (0.4129) loss 7.0167 (6.9870) grad_norm 2.4408 (2.9251) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:09:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][550/625] eta 0:00:30 lr 0.000475 wd 0.0500 time 0.4033 (0.4131) data time 0.0008 (0.0017) model time 0.4026 (0.4126) loss 7.1612 (6.9845) grad_norm 2.8276 (2.9230) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:09:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][560/625] eta 0:00:26 lr 0.000474 wd 0.0500 time 0.4026 (0.4128) data time 0.0006 (0.0017) model time 0.4020 (0.4123) loss 8.4884 (6.9900) grad_norm 3.5726 (2.9219) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:09:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][570/625] eta 0:00:22 lr 0.000474 wd 0.0500 time 0.3980 (0.4126) data time 0.0008 (0.0017) model time 0.3973 (0.4120) loss 6.4626 (6.9934) grad_norm 2.9836 (2.9359) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:09:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][580/625] eta 0:00:18 lr 0.000474 wd 0.0500 time 0.4051 (0.4124) data 
time 0.0009 (0.0017) model time 0.4042 (0.4118) loss 7.4618 (6.9930) grad_norm 2.5738 (2.9279) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:09:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][590/625] eta 0:00:14 lr 0.000474 wd 0.0500 time 0.3968 (0.4121) data time 0.0005 (0.0017) model time 0.3962 (0.4115) loss 7.7334 (6.9940) grad_norm 2.9063 (2.9202) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:09:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][600/625] eta 0:00:10 lr 0.000474 wd 0.0500 time 0.5500 (0.4134) data time 0.0009 (0.0017) model time 0.5491 (0.4129) loss 6.4328 (6.9982) grad_norm 2.6459 (2.9260) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:09:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][610/625] eta 0:00:06 lr 0.000474 wd 0.0500 time 0.5682 (0.4145) data time 0.0005 (0.0016) model time 0.5677 (0.4140) loss 8.1639 (7.0038) grad_norm 10.1158 (2.9457) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:10:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][620/625] eta 0:00:02 lr 0.000474 wd 0.0500 time 0.5646 (0.4154) data time 0.0004 (0.0016) model time 0.5643 (0.4151) loss 6.4193 (7.0004) grad_norm 1.9082 (2.9452) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:10:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 179 training takes 0:04:19 [2024-07-25 05:10:02 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 05:10:02 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 05:10:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.478 (0.478) Loss 0.5967 (0.5967) Acc@1 88.965 (88.965) Acc@5 98.486 (98.486) Mem 14939MB [2024-07-25 05:10:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.122) Loss 0.9351 (0.7365) Acc@1 80.322 (85.773) Acc@5 95.850 (97.523) Mem 14939MB [2024-07-25 05:10:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 1.0625 (0.8592) Acc@1 75.293 (82.354) Acc@5 94.678 (96.238) Mem 14939MB [2024-07-25 05:10:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.012 Acc@5 96.221 [2024-07-25 05:10:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.0% [2024-07-25 05:10:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.803 (0.803) Loss 0.5522 (0.5522) Acc@1 89.941 (89.941) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 05:10:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.155) Loss 0.8750 (0.6883) Acc@1 81.299 (86.337) Acc@5 96.045 (97.678) Mem 14939MB [2024-07-25 05:10:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 1.0088 (0.8069) Acc@1 75.879 (82.994) Acc@5 95.215 (96.494) Mem 14939MB [2024-07-25 05:10:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.596 Acc@5 96.451 [2024-07-25 05:10:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.6% [2024-07-25 05:10:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.60% [2024-07-25 05:10:08 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 05:10:09 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 05:10:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][0/625] eta 0:08:46 lr 0.000474 wd 0.0500 time 0.8416 (0.8416) data time 0.4552 (0.4552) model time 0.0000 (0.0000) loss 6.6602 (6.6602) grad_norm 3.5823 (3.5823) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:10:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][10/625] eta 0:04:51 lr 0.000474 wd 0.0500 time 0.4038 (0.4744) data time 0.0009 (0.0423) model time 0.0000 (0.0000) loss 7.4866 (7.3708) grad_norm 3.8311 (3.1479) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:10:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][20/625] eta 0:04:24 lr 0.000474 wd 0.0500 time 0.3988 (0.4379) data time 0.0006 (0.0226) model time 0.0000 (0.0000) loss 7.5510 (7.3338) grad_norm 3.3125 (3.0821) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:10:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][30/625] eta 0:04:13 lr 0.000474 wd 0.0500 time 0.3965 (0.4254) data time 0.0006 (0.0155) model time 0.0000 (0.0000) loss 7.5216 (7.2828) grad_norm 3.8063 (3.0641) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:10:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][40/625] eta 0:04:06 lr 0.000473 wd 0.0500 time 0.3948 (0.4221) data time 0.0006 (0.0120) model time 0.0000 (0.0000) loss 6.3644 (7.1665) grad_norm 2.4326 (2.8982) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:10:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][50/625] eta 0:03:59 lr 0.000473 wd 0.0500 time 0.3999 (0.4174) data time 0.0008 (0.0098) model time 0.0000 (0.0000) loss 5.8427 (7.1302) grad_norm 2.2374 (2.7899) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 05:10:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][60/625] eta 0:03:54 lr 0.000473 wd 0.0500 time 0.3978 (0.4144) data time 0.0009 (0.0084) model time 0.3970 (0.3980) loss 6.6703 (7.0940) grad_norm 2.5402 (2.7734) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:10:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][70/625] eta 0:03:48 lr 0.000473 wd 0.0500 time 0.3975 (0.4122) data time 0.0007 (0.0073) model time 0.3968 (0.3979) loss 6.1249 (7.0568) grad_norm 2.3843 (2.7430) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:10:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][80/625] eta 0:03:43 lr 0.000473 wd 0.0500 time 0.3984 (0.4106) data time 0.0006 (0.0065) model time 0.3978 (0.3982) loss 6.7144 (7.0291) grad_norm 1.5409 (2.7031) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:10:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][90/625] eta 0:03:39 lr 0.000473 wd 0.0500 time 0.3979 (0.4095) data time 0.0008 (0.0059) model time 0.3971 (0.3986) loss 7.5028 (7.0251) grad_norm 2.0015 (2.7085) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:10:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][100/625] eta 0:03:34 lr 0.000473 wd 0.0500 time 0.3985 (0.4084) data time 0.0006 (0.0054) model time 0.3979 (0.3984) loss 5.1998 (6.9993) grad_norm 2.4592 (2.7397) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:10:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][110/625] eta 0:03:29 lr 0.000473 wd 0.0500 time 0.3976 (0.4076) data time 0.0008 (0.0050) model time 0.3969 (0.3983) loss 6.0472 (6.9933) grad_norm 2.2293 (2.7025) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:10:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][120/625] eta 0:03:25 lr 0.000473 wd 0.0500 time 
0.3979 (0.4068) data time 0.0009 (0.0047) model time 0.3970 (0.3982) loss 7.2967 (6.9563) grad_norm 3.4219 (2.7661) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:11:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][130/625] eta 0:03:21 lr 0.000472 wd 0.0500 time 0.3991 (0.4062) data time 0.0008 (0.0044) model time 0.3983 (0.3981) loss 7.6029 (6.9404) grad_norm 2.9872 (2.7662) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:11:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][140/625] eta 0:03:16 lr 0.000472 wd 0.0500 time 0.4006 (0.4059) data time 0.0007 (0.0041) model time 0.3999 (0.3984) loss 7.2951 (6.9739) grad_norm 1.7553 (2.7701) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:11:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][150/625] eta 0:03:12 lr 0.000472 wd 0.0500 time 0.3965 (0.4055) data time 0.0006 (0.0039) model time 0.3960 (0.3985) loss 6.6246 (6.9674) grad_norm 4.7320 (2.7584) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:11:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][160/625] eta 0:03:08 lr 0.000472 wd 0.0500 time 0.3996 (0.4051) data time 0.0006 (0.0037) model time 0.3991 (0.3985) loss 6.0399 (6.9600) grad_norm 2.3975 (2.7283) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:11:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][170/625] eta 0:03:04 lr 0.000472 wd 0.0500 time 0.4035 (0.4047) data time 0.0007 (0.0035) model time 0.4029 (0.3985) loss 7.6942 (6.9720) grad_norm 2.1426 (2.6967) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:11:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][180/625] eta 0:02:59 lr 0.000472 wd 0.0500 time 0.3957 (0.4044) data time 0.0006 (0.0034) model time 0.3951 (0.3984) loss 6.9334 (6.9932) grad_norm 3.0147 (2.7418) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 05:11:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][190/625] eta 0:02:56 lr 0.000472 wd 0.0500 time 0.3996 (0.4051) data time 0.0007 (0.0032) model time 0.3990 (0.3997) loss 7.3474 (6.9867) grad_norm 2.6491 (2.7444) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:11:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][200/625] eta 0:02:54 lr 0.000472 wd 0.0500 time 0.5717 (0.4095) data time 0.0008 (0.0031) model time 0.5709 (0.4060) loss 7.0014 (6.9992) grad_norm 2.5462 (2.7410) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:11:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][210/625] eta 0:02:50 lr 0.000472 wd 0.0500 time 0.5845 (0.4116) data time 0.0009 (0.0030) model time 0.5836 (0.4089) loss 6.3144 (6.9943) grad_norm 2.1551 (2.7505) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:11:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][220/625] eta 0:02:47 lr 0.000472 wd 0.0500 time 0.3982 (0.4133) data time 0.0006 (0.0029) model time 0.3975 (0.4113) loss 6.2079 (6.9928) grad_norm 2.1192 (2.7734) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:11:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][230/625] eta 0:02:43 lr 0.000471 wd 0.0500 time 0.3993 (0.4132) data time 0.0006 (0.0028) model time 0.3987 (0.4112) loss 6.0116 (6.9821) grad_norm 1.9583 (2.7797) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:11:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][240/625] eta 0:02:38 lr 0.000471 wd 0.0500 time 0.3988 (0.4127) data time 0.0008 (0.0028) model time 0.3979 (0.4106) loss 6.4183 (6.9930) grad_norm 1.9952 (2.7728) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:11:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][250/625] eta 0:02:34 lr 0.000471 wd 0.0500 time 
0.4009 (0.4123) data time 0.0008 (0.0027) model time 0.4000 (0.4101) loss 7.9498 (6.9904) grad_norm 2.7756 (2.7580) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:11:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][260/625] eta 0:02:30 lr 0.000471 wd 0.0500 time 0.4032 (0.4124) data time 0.0007 (0.0026) model time 0.4024 (0.4104) loss 5.6801 (6.9856) grad_norm 3.1777 (2.7534) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:12:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][270/625] eta 0:02:26 lr 0.000471 wd 0.0500 time 0.3990 (0.4120) data time 0.0008 (0.0026) model time 0.3982 (0.4099) loss 7.0216 (6.9816) grad_norm 2.2536 (2.7451) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:12:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][280/625] eta 0:02:21 lr 0.000471 wd 0.0500 time 0.4015 (0.4116) data time 0.0008 (0.0025) model time 0.4007 (0.4094) loss 6.1202 (6.9705) grad_norm 2.1957 (2.7409) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:12:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][290/625] eta 0:02:17 lr 0.000471 wd 0.0500 time 0.3977 (0.4112) data time 0.0008 (0.0024) model time 0.3969 (0.4090) loss 5.5308 (6.9610) grad_norm 1.9148 (2.7459) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:12:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][300/625] eta 0:02:13 lr 0.000471 wd 0.0500 time 0.4162 (0.4108) data time 0.0007 (0.0024) model time 0.4156 (0.4086) loss 7.1312 (6.9638) grad_norm 2.2491 (2.7390) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:12:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][310/625] eta 0:02:09 lr 0.000471 wd 0.0500 time 0.4024 (0.4105) data time 0.0006 (0.0023) model time 0.4018 (0.4082) loss 6.1738 (6.9594) grad_norm 5.1076 (2.7538) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 05:12:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][320/625] eta 0:02:05 lr 0.000470 wd 0.0500 time 0.3961 (0.4101) data time 0.0009 (0.0023) model time 0.3952 (0.4079) loss 7.2467 (6.9648) grad_norm 4.5475 (2.7645) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:12:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][330/625] eta 0:02:00 lr 0.000470 wd 0.0500 time 0.4036 (0.4098) data time 0.0009 (0.0023) model time 0.4028 (0.4075) loss 6.1808 (6.9560) grad_norm 2.7116 (2.7659) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:12:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][340/625] eta 0:01:56 lr 0.000470 wd 0.0500 time 0.3963 (0.4095) data time 0.0008 (0.0022) model time 0.3955 (0.4072) loss 8.0413 (6.9505) grad_norm 4.0642 (2.7742) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:12:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][350/625] eta 0:01:52 lr 0.000470 wd 0.0500 time 0.3986 (0.4092) data time 0.0008 (0.0022) model time 0.3978 (0.4069) loss 6.5964 (6.9515) grad_norm 2.0584 (2.7914) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:12:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][360/625] eta 0:01:48 lr 0.000470 wd 0.0500 time 0.3970 (0.4089) data time 0.0007 (0.0022) model time 0.3963 (0.4066) loss 6.8415 (6.9547) grad_norm 2.5621 (2.7903) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:12:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][370/625] eta 0:01:44 lr 0.000470 wd 0.0500 time 0.4239 (0.4087) data time 0.0009 (0.0021) model time 0.4230 (0.4064) loss 7.3280 (6.9520) grad_norm 1.6606 (2.7755) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:12:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][380/625] eta 0:01:40 lr 0.000470 wd 0.0500 time 
0.3998 (0.4085) data time 0.0007 (0.0021) model time 0.3990 (0.4062) loss 8.1380 (6.9634) grad_norm 2.3425 (2.7601) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:12:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][390/625] eta 0:01:35 lr 0.000470 wd 0.0500 time 0.3963 (0.4083) data time 0.0009 (0.0021) model time 0.3954 (0.4060) loss 7.1642 (6.9708) grad_norm 1.9429 (2.7573) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:12:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][400/625] eta 0:01:31 lr 0.000470 wd 0.0500 time 0.4013 (0.4081) data time 0.0011 (0.0021) model time 0.4002 (0.4058) loss 7.1729 (6.9726) grad_norm 1.8734 (2.7462) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:12:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][410/625] eta 0:01:27 lr 0.000470 wd 0.0500 time 0.3963 (0.4086) data time 0.0009 (0.0020) model time 0.3954 (0.4065) loss 7.9748 (6.9747) grad_norm 3.9864 (2.7432) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:13:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][420/625] eta 0:01:24 lr 0.000469 wd 0.0500 time 0.3955 (0.4110) data time 0.0009 (0.0020) model time 0.3947 (0.4092) loss 5.3390 (6.9722) grad_norm 2.2621 (2.7470) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:13:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][430/625] eta 0:01:20 lr 0.000469 wd 0.0500 time 0.3987 (0.4124) data time 0.0007 (0.0020) model time 0.3980 (0.4108) loss 5.7387 (6.9734) grad_norm 2.0886 (2.7356) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:13:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][440/625] eta 0:01:16 lr 0.000469 wd 0.0500 time 0.6131 (0.4142) data time 0.0007 (0.0020) model time 0.6124 (0.4129) loss 6.8457 (6.9763) grad_norm 1.7816 (2.7264) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 05:13:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][450/625] eta 0:01:12 lr 0.000469 wd 0.0500 time 0.3993 (0.4143) data time 0.0006 (0.0019) model time 0.3986 (0.4130) loss 7.3345 (6.9736) grad_norm 4.0532 (2.7276) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:13:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][460/625] eta 0:01:08 lr 0.000469 wd 0.0500 time 0.3966 (0.4140) data time 0.0008 (0.0019) model time 0.3958 (0.4127) loss 7.1606 (6.9764) grad_norm 2.1551 (2.7278) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:13:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][470/625] eta 0:01:04 lr 0.000469 wd 0.0500 time 0.3961 (0.4137) data time 0.0008 (0.0019) model time 0.3953 (0.4123) loss 7.4997 (6.9769) grad_norm 1.8445 (2.7238) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:13:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][480/625] eta 0:00:59 lr 0.000469 wd 0.0500 time 0.3988 (0.4134) data time 0.0008 (0.0019) model time 0.3980 (0.4120) loss 6.1464 (6.9774) grad_norm 2.0593 (2.7317) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:13:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][490/625] eta 0:00:55 lr 0.000469 wd 0.0500 time 0.3996 (0.4135) data time 0.0007 (0.0019) model time 0.3989 (0.4121) loss 7.0002 (6.9853) grad_norm 1.9350 (2.7283) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:13:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][500/625] eta 0:00:51 lr 0.000469 wd 0.0500 time 0.4010 (0.4132) data time 0.0007 (0.0018) model time 0.4003 (0.4118) loss 7.6268 (6.9898) grad_norm 2.0227 (2.7248) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:13:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][510/625] eta 0:00:47 lr 0.000469 wd 0.0500 time 
0.4005 (0.4129) data time 0.0006 (0.0018) model time 0.3999 (0.4115) loss 5.7156 (6.9872) grad_norm 3.5849 (2.7295) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:13:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][520/625] eta 0:00:43 lr 0.000468 wd 0.0500 time 0.4002 (0.4127) data time 0.0008 (0.0018) model time 0.3994 (0.4113) loss 8.2215 (6.9882) grad_norm 1.9953 (2.7277) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:13:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][530/625] eta 0:00:39 lr 0.000468 wd 0.0500 time 0.4001 (0.4125) data time 0.0007 (0.0018) model time 0.3994 (0.4110) loss 6.1223 (6.9819) grad_norm 3.4721 (2.7222) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:13:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][540/625] eta 0:00:35 lr 0.000468 wd 0.0500 time 0.3997 (0.4123) data time 0.0008 (0.0018) model time 0.3989 (0.4108) loss 6.0847 (6.9818) grad_norm 2.0691 (2.7341) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:13:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][550/625] eta 0:00:30 lr 0.000468 wd 0.0500 time 0.3975 (0.4120) data time 0.0006 (0.0017) model time 0.3969 (0.4106) loss 6.7336 (6.9809) grad_norm 3.8421 (2.7414) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:14:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][560/625] eta 0:00:26 lr 0.000468 wd 0.0500 time 0.3956 (0.4118) data time 0.0007 (0.0017) model time 0.3949 (0.4103) loss 6.6487 (6.9733) grad_norm 2.3200 (2.7368) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:14:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][570/625] eta 0:00:22 lr 0.000468 wd 0.0500 time 0.3973 (0.4116) data time 0.0006 (0.0017) model time 0.3967 (0.4101) loss 6.7798 (6.9762) grad_norm 1.7623 (2.7313) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 05:14:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][580/625] eta 0:00:18 lr 0.000468 wd 0.0500 time 0.4019 (0.4114) data time 0.0008 (0.0017) model time 0.4011 (0.4099) loss 8.0346 (6.9806) grad_norm 2.1718 (2.7284) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:14:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][590/625] eta 0:00:14 lr 0.000468 wd 0.0500 time 0.3956 (0.4112) data time 0.0008 (0.0017) model time 0.3948 (0.4097) loss 6.8367 (6.9824) grad_norm 1.6752 (2.7253) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:14:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][600/625] eta 0:00:10 lr 0.000468 wd 0.0500 time 0.3989 (0.4110) data time 0.0009 (0.0017) model time 0.3980 (0.4095) loss 6.3821 (6.9846) grad_norm 2.7114 (2.7307) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:14:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][610/625] eta 0:00:06 lr 0.000467 wd 0.0500 time 0.4064 (0.4108) data time 0.0004 (0.0017) model time 0.4060 (0.4093) loss 5.9780 (6.9816) grad_norm 3.0442 (2.7422) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:14:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][620/625] eta 0:00:02 lr 0.000467 wd 0.0500 time 0.3957 (0.4107) data time 0.0006 (0.0017) model time 0.3950 (0.4091) loss 7.0678 (6.9771) grad_norm 2.8016 (2.7439) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:14:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 180 training takes 0:04:16 [2024-07-25 05:14:25 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 05:14:26 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 05:14:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.5718 (0.5718) Acc@1 89.307 (89.307) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 05:14:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.9004 (0.7132) Acc@1 80.029 (85.804) Acc@5 95.996 (97.545) Mem 14939MB [2024-07-25 05:14:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0361 (0.8335) Acc@1 75.977 (82.587) Acc@5 94.189 (96.310) Mem 14939MB [2024-07-25 05:14:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.252 Acc@5 96.319 [2024-07-25 05:14:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-07-25 05:14:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.25% [2024-07-25 05:14:29 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 05:14:30 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 05:14:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.545 (0.545) Loss 0.5527 (0.5527) Acc@1 89.941 (89.941) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 05:14:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.128) Loss 0.8740 (0.6877) Acc@1 81.348 (86.364) Acc@5 96.094 (97.683) Mem 14939MB [2024-07-25 05:14:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.108) Loss 1.0078 (0.8062) Acc@1 75.830 (83.022) Acc@5 95.166 (96.512) Mem 14939MB [2024-07-25 05:14:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.614 Acc@5 96.467 [2024-07-25 05:14:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.6% [2024-07-25 05:14:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.61% [2024-07-25 05:14:32 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 05:14:33 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 05:14:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][0/625] eta 0:09:04 lr 0.000467 wd 0.0500 time 0.8717 (0.8717) data time 0.4978 (0.4978) model time 0.0000 (0.0000) loss 8.3973 (8.3973) grad_norm 3.1307 (3.1307) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:14:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][10/625] eta 0:05:10 lr 0.000467 wd 0.0500 time 0.5596 (0.5050) data time 0.0008 (0.0460) model time 0.0000 (0.0000) loss 6.3863 (7.1769) grad_norm 2.0761 (2.8656) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:14:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][20/625] eta 0:04:59 lr 0.000467 wd 0.0500 time 0.5838 (0.4953) data time 0.0010 (0.0245) model time 0.0000 (0.0000) loss 6.4651 (6.9169) grad_norm 3.5321 (2.8976) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:14:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][30/625] eta 0:04:50 lr 0.000467 wd 0.0500 time 0.3956 (0.4882) data time 0.0007 (0.0169) model time 0.0000 (0.0000) loss 6.6583 (6.9167) grad_norm 2.5342 (2.7079) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:14:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][40/625] eta 0:04:40 lr 0.000467 wd 0.0500 time 0.4003 (0.4802) data time 0.0007 (0.0130) model time 0.0000 (0.0000) loss 7.5779 (7.0386) grad_norm 3.2426 (2.6280) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:14:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][50/625] eta 0:04:27 lr 0.000467 wd 0.0500 time 0.4004 (0.4647) data time 0.0009 (0.0106) model time 0.0000 (0.0000) loss 7.4437 (7.0835) grad_norm 3.0256 (2.6153) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:15:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][60/625] eta 0:04:16 lr 0.000467 wd 0.0500 time 0.3983 (0.4542) 
data time 0.0006 (0.0090) model time 0.3977 (0.4001) loss 7.5186 (7.0711) grad_norm 6.2461 (2.8151) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:15:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][70/625] eta 0:04:08 lr 0.000467 wd 0.0500 time 0.3986 (0.4469) data time 0.0008 (0.0079) model time 0.3978 (0.4005) loss 7.0066 (7.1052) grad_norm 3.6729 (2.9803) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:15:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][80/625] eta 0:04:00 lr 0.000467 wd 0.0500 time 0.4015 (0.4412) data time 0.0008 (0.0070) model time 0.4007 (0.4004) loss 7.4297 (7.0785) grad_norm 2.0862 (2.9427) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:15:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][90/625] eta 0:03:53 lr 0.000466 wd 0.0500 time 0.3971 (0.4373) data time 0.0007 (0.0063) model time 0.3965 (0.4015) loss 6.4592 (7.0410) grad_norm 1.9844 (2.8906) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:15:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][100/625] eta 0:03:47 lr 0.000466 wd 0.0500 time 0.3974 (0.4336) data time 0.0006 (0.0058) model time 0.3967 (0.4009) loss 6.9894 (7.0375) grad_norm 1.5379 (2.8155) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:15:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][110/625] eta 0:03:41 lr 0.000466 wd 0.0500 time 0.4004 (0.4305) data time 0.0008 (0.0053) model time 0.3996 (0.4007) loss 6.8486 (7.0203) grad_norm 2.5788 (2.7632) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:15:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][120/625] eta 0:03:36 lr 0.000466 wd 0.0500 time 0.4082 (0.4280) data time 0.0006 (0.0050) model time 0.4076 (0.4005) loss 7.0852 (7.0061) grad_norm 2.5804 (2.7633) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
05:15:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][130/625] eta 0:03:30 lr 0.000466 wd 0.0500 time 0.3996 (0.4258) data time 0.0009 (0.0047) model time 0.3987 (0.4002) loss 6.3578 (7.0228) grad_norm 2.0772 (2.7365) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:15:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][140/625] eta 0:03:25 lr 0.000466 wd 0.0500 time 0.3988 (0.4240) data time 0.0008 (0.0044) model time 0.3980 (0.4002) loss 7.0511 (7.0030) grad_norm 3.2867 (2.7803) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:15:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][150/625] eta 0:03:20 lr 0.000466 wd 0.0500 time 0.3991 (0.4225) data time 0.0007 (0.0042) model time 0.3983 (0.4002) loss 7.3769 (7.0060) grad_norm 8.9549 (2.8389) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:15:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][160/625] eta 0:03:15 lr 0.000466 wd 0.0500 time 0.3996 (0.4211) data time 0.0008 (0.0040) model time 0.3988 (0.4001) loss 8.0061 (7.0400) grad_norm 2.0865 (2.8489) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:15:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][170/625] eta 0:03:11 lr 0.000466 wd 0.0500 time 0.3937 (0.4201) data time 0.0008 (0.0038) model time 0.3929 (0.4002) loss 6.6320 (7.0292) grad_norm 2.6627 (2.8707) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:15:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][180/625] eta 0:03:06 lr 0.000465 wd 0.0500 time 0.4013 (0.4191) data time 0.0008 (0.0037) model time 0.4005 (0.4002) loss 7.0773 (7.0172) grad_norm 2.5140 (2.8975) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:15:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][190/625] eta 0:03:01 lr 0.000465 wd 0.0500 time 0.3965 (0.4180) data 
time 0.0006 (0.0035) model time 0.3959 (0.4000) loss 5.6857 (6.9922) grad_norm 2.0058 (2.8637) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:15:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][200/625] eta 0:02:57 lr 0.000465 wd 0.0500 time 0.3943 (0.4170) data time 0.0006 (0.0034) model time 0.3937 (0.3998) loss 6.6725 (6.9980) grad_norm 4.5465 (2.8451) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:16:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][210/625] eta 0:02:53 lr 0.000465 wd 0.0500 time 0.3953 (0.4171) data time 0.0006 (0.0033) model time 0.3947 (0.4011) loss 7.3623 (7.0119) grad_norm 1.9579 (2.8246) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:16:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][220/625] eta 0:02:48 lr 0.000465 wd 0.0500 time 0.4062 (0.4165) data time 0.0006 (0.0032) model time 0.4056 (0.4011) loss 6.9188 (7.0039) grad_norm 2.2529 (2.8007) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:16:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][230/625] eta 0:02:45 lr 0.000465 wd 0.0500 time 0.3936 (0.4186) data time 0.0006 (0.0031) model time 0.3930 (0.4046) loss 5.6882 (6.9926) grad_norm 2.9046 (2.7959) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:16:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][240/625] eta 0:02:41 lr 0.000465 wd 0.0500 time 0.3951 (0.4205) data time 0.0008 (0.0030) model time 0.3943 (0.4078) loss 6.8817 (6.9942) grad_norm 2.1152 (2.7735) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:16:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][250/625] eta 0:02:38 lr 0.000465 wd 0.0500 time 0.5754 (0.4236) data time 0.0008 (0.0029) model time 0.5745 (0.4122) loss 7.0876 (6.9948) grad_norm 5.0461 (2.7927) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
05:16:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][260/625] eta 0:02:34 lr 0.000465 wd 0.0500 time 0.4062 (0.4242) data time 0.0007 (0.0028) model time 0.4054 (0.4134) loss 6.4187 (6.9949) grad_norm 2.9349 (2.7822) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:16:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][270/625] eta 0:02:30 lr 0.000465 wd 0.0500 time 0.3976 (0.4233) data time 0.0008 (0.0028) model time 0.3968 (0.4128) loss 6.8195 (6.9835) grad_norm 2.2408 (2.7820) loss_scale 512.0000 (262.6125) mem 14939MB [2024-07-25 05:16:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][280/625] eta 0:02:25 lr 0.000464 wd 0.0500 time 0.4007 (0.4226) data time 0.0006 (0.0027) model time 0.4001 (0.4123) loss 7.2677 (6.9931) grad_norm 3.4241 (2.7718) loss_scale 512.0000 (271.4875) mem 14939MB [2024-07-25 05:16:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][290/625] eta 0:02:21 lr 0.000464 wd 0.0500 time 0.3994 (0.4218) data time 0.0006 (0.0026) model time 0.3988 (0.4117) loss 6.9225 (6.9951) grad_norm 3.0381 (2.7629) loss_scale 512.0000 (279.7526) mem 14939MB [2024-07-25 05:16:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][300/625] eta 0:02:16 lr 0.000464 wd 0.0500 time 0.4068 (0.4211) data time 0.0008 (0.0026) model time 0.4060 (0.4113) loss 7.8123 (7.0067) grad_norm 4.4127 (2.7566) loss_scale 512.0000 (287.4684) mem 14939MB [2024-07-25 05:16:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][310/625] eta 0:02:12 lr 0.000464 wd 0.0500 time 0.3966 (0.4204) data time 0.0009 (0.0025) model time 0.3957 (0.4108) loss 7.4129 (7.0109) grad_norm 1.9610 (2.8290) loss_scale 512.0000 (294.6881) mem 14939MB [2024-07-25 05:16:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][320/625] eta 0:02:08 lr 0.000464 wd 0.0500 time 0.4056 (0.4198) data 
time 0.0006 (0.0025) model time 0.4050 (0.4104) loss 7.7298 (7.0111) grad_norm 2.6367 (2.8315) loss_scale 512.0000 (301.4579) mem 14939MB [2024-07-25 05:16:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][330/625] eta 0:02:03 lr 0.000464 wd 0.0500 time 0.4005 (0.4192) data time 0.0006 (0.0024) model time 0.3999 (0.4100) loss 7.1420 (7.0056) grad_norm 1.8537 (2.8390) loss_scale 512.0000 (307.8187) mem 14939MB [2024-07-25 05:16:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][340/625] eta 0:01:59 lr 0.000464 wd 0.0500 time 0.4141 (0.4187) data time 0.0008 (0.0024) model time 0.4134 (0.4096) loss 6.1412 (7.0007) grad_norm 2.6624 (2.8271) loss_scale 512.0000 (313.8065) mem 14939MB [2024-07-25 05:17:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][350/625] eta 0:01:54 lr 0.000464 wd 0.0500 time 0.3969 (0.4181) data time 0.0008 (0.0023) model time 0.3961 (0.4092) loss 7.0539 (7.0001) grad_norm 1.8207 (2.8209) loss_scale 512.0000 (319.4530) mem 14939MB [2024-07-25 05:17:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][360/625] eta 0:01:50 lr 0.000464 wd 0.0500 time 0.4052 (0.4176) data time 0.0006 (0.0023) model time 0.4046 (0.4090) loss 5.7914 (6.9870) grad_norm 3.2275 (2.8158) loss_scale 512.0000 (324.7867) mem 14939MB [2024-07-25 05:17:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][370/625] eta 0:01:46 lr 0.000464 wd 0.0500 time 0.3980 (0.4171) data time 0.0006 (0.0023) model time 0.3974 (0.4086) loss 7.3227 (6.9830) grad_norm 2.6610 (2.9128) loss_scale 512.0000 (329.8329) mem 14939MB [2024-07-25 05:17:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][380/625] eta 0:01:42 lr 0.000463 wd 0.0500 time 0.3983 (0.4166) data time 0.0008 (0.0022) model time 0.3976 (0.4083) loss 7.4062 (6.9821) grad_norm 2.0628 (2.9167) loss_scale 512.0000 (334.6142) mem 14939MB [2024-07-25 
05:17:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][390/625] eta 0:01:37 lr 0.000463 wd 0.0500 time 0.3979 (0.4161) data time 0.0006 (0.0022) model time 0.3973 (0.4079) loss 6.9490 (6.9878) grad_norm 2.2314 (2.9123) loss_scale 512.0000 (339.1509) mem 14939MB [2024-07-25 05:17:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][400/625] eta 0:01:33 lr 0.000463 wd 0.0500 time 0.4003 (0.4157) data time 0.0006 (0.0021) model time 0.3997 (0.4077) loss 7.2757 (6.9840) grad_norm 2.0099 (2.9148) loss_scale 512.0000 (343.4613) mem 14939MB [2024-07-25 05:17:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][410/625] eta 0:01:29 lr 0.000463 wd 0.0500 time 0.3953 (0.4153) data time 0.0008 (0.0021) model time 0.3946 (0.4074) loss 6.0438 (6.9823) grad_norm 4.4365 (2.9068) loss_scale 512.0000 (347.5620) mem 14939MB [2024-07-25 05:17:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][420/625] eta 0:01:25 lr 0.000463 wd 0.0500 time 0.6148 (0.4154) data time 0.0006 (0.0021) model time 0.6141 (0.4077) loss 6.3149 (6.9769) grad_norm 4.0656 (2.9065) loss_scale 512.0000 (351.4679) mem 14939MB [2024-07-25 05:17:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][430/625] eta 0:01:20 lr 0.000463 wd 0.0500 time 0.3979 (0.4150) data time 0.0007 (0.0021) model time 0.3972 (0.4074) loss 8.1420 (6.9910) grad_norm 3.8050 (2.9083) loss_scale 512.0000 (355.1926) mem 14939MB [2024-07-25 05:17:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][440/625] eta 0:01:16 lr 0.000463 wd 0.0500 time 0.3958 (0.4146) data time 0.0008 (0.0020) model time 0.3950 (0.4071) loss 7.2247 (6.9994) grad_norm 2.2840 (2.9012) loss_scale 512.0000 (358.7483) mem 14939MB [2024-07-25 05:17:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][450/625] eta 0:01:12 lr 0.000463 wd 0.0500 time 0.5975 (0.4157) data 
time 0.0008 (0.0020) model time 0.5967 (0.4085) loss 6.1074 (7.0052) grad_norm 3.3733 (2.8972) loss_scale 512.0000 (362.1463) mem 14939MB [2024-07-25 05:17:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][460/625] eta 0:01:08 lr 0.000463 wd 0.0500 time 0.5893 (0.4172) data time 0.0008 (0.0020) model time 0.5884 (0.4104) loss 6.4350 (7.0054) grad_norm 3.4117 (2.8875) loss_scale 512.0000 (365.3970) mem 14939MB [2024-07-25 05:17:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][470/625] eta 0:01:04 lr 0.000462 wd 0.0500 time 0.5450 (0.4187) data time 0.0009 (0.0020) model time 0.5441 (0.4122) loss 7.8488 (7.0107) grad_norm 2.9890 (2.8786) loss_scale 512.0000 (368.5096) mem 14939MB [2024-07-25 05:17:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][480/625] eta 0:01:00 lr 0.000462 wd 0.0500 time 0.4021 (0.4194) data time 0.0006 (0.0019) model time 0.4015 (0.4131) loss 6.3216 (7.0122) grad_norm 1.9503 (2.8901) loss_scale 512.0000 (371.4927) mem 14939MB [2024-07-25 05:17:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][490/625] eta 0:00:56 lr 0.000462 wd 0.0500 time 0.4044 (0.4190) data time 0.0009 (0.0019) model time 0.4035 (0.4128) loss 7.6029 (7.0171) grad_norm 3.4275 (2.8772) loss_scale 512.0000 (374.3544) mem 14939MB [2024-07-25 05:18:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][500/625] eta 0:00:52 lr 0.000462 wd 0.0500 time 0.3991 (0.4186) data time 0.0009 (0.0019) model time 0.3982 (0.4125) loss 6.2378 (7.0210) grad_norm 3.5220 (2.8716) loss_scale 512.0000 (377.1018) mem 14939MB [2024-07-25 05:18:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][510/625] eta 0:00:48 lr 0.000462 wd 0.0500 time 0.3969 (0.4183) data time 0.0009 (0.0019) model time 0.3961 (0.4122) loss 6.5736 (7.0208) grad_norm 2.8600 (2.8812) loss_scale 512.0000 (379.7417) mem 14939MB [2024-07-25 
05:18:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][520/625] eta 0:00:43 lr 0.000462 wd 0.0500 time 0.4005 (0.4179) data time 0.0007 (0.0019) model time 0.3999 (0.4119) loss 7.6029 (7.0196) grad_norm 3.1772 (2.8922) loss_scale 512.0000 (382.2802) mem 14939MB [2024-07-25 05:18:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][530/625] eta 0:00:39 lr 0.000462 wd 0.0500 time 0.3978 (0.4176) data time 0.0007 (0.0019) model time 0.3971 (0.4116) loss 5.8870 (7.0195) grad_norm 1.8726 (2.8908) loss_scale 512.0000 (384.7232) mem 14939MB [2024-07-25 05:18:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][540/625] eta 0:00:35 lr 0.000462 wd 0.0500 time 0.3974 (0.4172) data time 0.0008 (0.0018) model time 0.3966 (0.4113) loss 7.0458 (7.0168) grad_norm 2.5140 (2.8830) loss_scale 512.0000 (387.0758) mem 14939MB [2024-07-25 05:18:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][550/625] eta 0:00:31 lr 0.000462 wd 0.0500 time 0.3976 (0.4169) data time 0.0009 (0.0018) model time 0.3968 (0.4111) loss 7.4384 (7.0106) grad_norm 2.0262 (2.8752) loss_scale 512.0000 (389.3430) mem 14939MB [2024-07-25 05:18:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][560/625] eta 0:00:27 lr 0.000462 wd 0.0500 time 0.3947 (0.4166) data time 0.0009 (0.0018) model time 0.3939 (0.4109) loss 7.4357 (7.0054) grad_norm 1.8590 (2.8742) loss_scale 512.0000 (391.5294) mem 14939MB [2024-07-25 05:18:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][570/625] eta 0:00:22 lr 0.000461 wd 0.0500 time 0.3934 (0.4163) data time 0.0009 (0.0018) model time 0.3925 (0.4106) loss 7.9167 (7.0077) grad_norm 1.6713 (2.8670) loss_scale 512.0000 (393.6392) mem 14939MB [2024-07-25 05:18:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][580/625] eta 0:00:18 lr 0.000461 wd 0.0500 time 0.3978 (0.4160) data 
time 0.0009 (0.0018) model time 0.3970 (0.4104) loss 8.4638 (7.0123) grad_norm 2.0192 (2.8564) loss_scale 512.0000 (395.6764) mem 14939MB [2024-07-25 05:18:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][590/625] eta 0:00:14 lr 0.000461 wd 0.0500 time 0.3944 (0.4157) data time 0.0008 (0.0018) model time 0.3935 (0.4102) loss 6.1861 (7.0173) grad_norm 2.1856 (2.8476) loss_scale 512.0000 (397.6447) mem 14939MB [2024-07-25 05:18:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][600/625] eta 0:00:10 lr 0.000461 wd 0.0500 time 0.4003 (0.4155) data time 0.0007 (0.0017) model time 0.3997 (0.4100) loss 7.6650 (7.0222) grad_norm 2.8452 (2.8638) loss_scale 512.0000 (399.5474) mem 14939MB [2024-07-25 05:18:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][610/625] eta 0:00:06 lr 0.000461 wd 0.0500 time 0.3995 (0.4153) data time 0.0004 (0.0017) model time 0.3992 (0.4099) loss 6.9125 (7.0236) grad_norm 2.1075 (2.8630) loss_scale 512.0000 (401.3879) mem 14939MB [2024-07-25 05:18:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][620/625] eta 0:00:02 lr 0.000461 wd 0.0500 time 0.4014 (0.4150) data time 0.0004 (0.0017) model time 0.4010 (0.4097) loss 6.7934 (7.0238) grad_norm 2.2731 (2.8530) loss_scale 512.0000 (403.1691) mem 14939MB [2024-07-25 05:18:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 181 training takes 0:04:19 [2024-07-25 05:18:53 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 05:18:54 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 05:18:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.455 (0.455) Loss 0.5610 (0.5610) Acc@1 89.307 (89.307) Acc@5 98.535 (98.535) Mem 14939MB [2024-07-25 05:18:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.8838 (0.7013) Acc@1 80.615 (86.022) Acc@5 96.094 (97.572) Mem 14939MB [2024-07-25 05:18:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.0166 (0.8208) Acc@1 75.781 (82.701) Acc@5 95.020 (96.405) Mem 14939MB [2024-07-25 05:18:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.378 Acc@5 96.357 [2024-07-25 05:18:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-07-25 05:18:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.38% [2024-07-25 05:18:56 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 05:18:57 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 05:18:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.5527 (0.5527) Acc@1 89.941 (89.941) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 05:18:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8735 (0.6872) Acc@1 81.396 (86.377) Acc@5 96.143 (97.678) Mem 14939MB [2024-07-25 05:18:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 1.0049 (0.8053) Acc@1 75.977 (83.043) Acc@5 95.264 (96.526) Mem 14939MB [2024-07-25 05:19:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.638 Acc@5 96.483 [2024-07-25 05:19:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.6% [2024-07-25 05:19:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.64% [2024-07-25 05:19:00 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 05:19:01 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 05:19:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][0/625] eta 0:07:45 lr 0.000461 wd 0.0500 time 0.7455 (0.7455) data time 0.3611 (0.3611) model time 0.0000 (0.0000) loss 6.2189 (6.2189) grad_norm 2.2675 (2.2675) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:19:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][10/625] eta 0:04:24 lr 0.000461 wd 0.0500 time 0.3975 (0.4297) data time 0.0007 (0.0336) model time 0.0000 (0.0000) loss 7.5999 (6.9952) grad_norm 2.4368 (2.6652) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:19:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][20/625] eta 0:04:11 lr 0.000461 wd 0.0500 time 0.3942 (0.4151) data time 0.0009 (0.0182) model time 0.0000 (0.0000) loss 7.7360 (7.1248) grad_norm 2.2896 (2.7233) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:19:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][30/625] eta 0:04:03 lr 0.000461 wd 0.0500 time 0.4015 (0.4098) data time 0.0008 (0.0127) model time 0.0000 (0.0000) loss 6.6591 (7.0335) grad_norm 3.0879 (2.7074) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:19:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][40/625] eta 0:04:01 lr 0.000460 wd 0.0500 time 0.5965 (0.4121) data time 0.0007 (0.0098) model time 0.0000 (0.0000) loss 6.3948 (7.0618) grad_norm 3.1092 (2.7188) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:19:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][50/625] eta 0:04:04 lr 0.000460 wd 0.0500 time 0.5980 (0.4255) data time 0.0009 (0.0080) model time 0.0000 (0.0000) loss 7.3122 (7.0633) grad_norm 2.1138 (2.6300) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:19:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][60/625] eta 0:04:03 lr 0.000460 wd 0.0500 time 0.5244 (0.4317) 
data time 0.0008 (0.0069) model time 0.5236 (0.4624) loss 7.0717 (7.0863) grad_norm 4.0469 (2.5675) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:19:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][70/625] eta 0:04:04 lr 0.000460 wd 0.0500 time 0.4129 (0.4399) data time 0.0009 (0.0060) model time 0.4120 (0.4758) loss 8.5971 (7.0661) grad_norm 2.9146 (2.6921) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:19:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][80/625] eta 0:03:59 lr 0.000460 wd 0.0500 time 0.3970 (0.4388) data time 0.0007 (0.0054) model time 0.3963 (0.4605) loss 8.3294 (7.0885) grad_norm 1.8781 (2.7049) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:19:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][90/625] eta 0:03:52 lr 0.000460 wd 0.0500 time 0.3961 (0.4345) data time 0.0007 (0.0049) model time 0.3954 (0.4449) loss 7.5745 (7.1166) grad_norm 2.9177 (2.6785) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:19:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][100/625] eta 0:03:46 lr 0.000460 wd 0.0500 time 0.4128 (0.4315) data time 0.0007 (0.0045) model time 0.4121 (0.4366) loss 7.8065 (7.0667) grad_norm 1.7638 (2.6449) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:19:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][110/625] eta 0:03:40 lr 0.000460 wd 0.0500 time 0.3971 (0.4288) data time 0.0007 (0.0042) model time 0.3964 (0.4306) loss 6.1508 (7.0395) grad_norm 1.8998 (2.6007) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:19:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][120/625] eta 0:03:35 lr 0.000460 wd 0.0500 time 0.3992 (0.4265) data time 0.0006 (0.0039) model time 0.3986 (0.4262) loss 6.8019 (7.0143) grad_norm 2.9424 (2.5968) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
05:19:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][130/625] eta 0:03:30 lr 0.000460 wd 0.0500 time 0.3968 (0.4245) data time 0.0007 (0.0037) model time 0.3962 (0.4229) loss 9.3884 (7.0112) grad_norm 2.0897 (2.5603) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:20:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][140/625] eta 0:03:25 lr 0.000459 wd 0.0500 time 0.3975 (0.4227) data time 0.0007 (0.0035) model time 0.3968 (0.4202) loss 7.2664 (7.0228) grad_norm 1.9916 (2.5976) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:20:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][150/625] eta 0:03:20 lr 0.000459 wd 0.0500 time 0.3971 (0.4212) data time 0.0009 (0.0034) model time 0.3962 (0.4180) loss 6.4341 (6.9921) grad_norm 2.2393 (2.6124) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:20:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][160/625] eta 0:03:15 lr 0.000459 wd 0.0500 time 0.3999 (0.4207) data time 0.0008 (0.0032) model time 0.3991 (0.4175) loss 7.3071 (6.9732) grad_norm 3.3877 (2.7893) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:20:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][170/625] eta 0:03:10 lr 0.000459 wd 0.0500 time 0.4005 (0.4193) data time 0.0006 (0.0031) model time 0.3999 (0.4157) loss 6.2969 (6.9798) grad_norm 3.6663 (2.8260) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:20:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][180/625] eta 0:03:06 lr 0.000459 wd 0.0500 time 0.3991 (0.4182) data time 0.0007 (0.0030) model time 0.3984 (0.4144) loss 6.0020 (6.9789) grad_norm 2.4368 (2.8164) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:20:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][190/625] eta 0:03:01 lr 0.000459 wd 0.0500 time 0.3987 (0.4172) data 
time 0.0009 (0.0029) model time 0.3977 (0.4132) loss 6.4750 (6.9567) grad_norm 2.4838 (2.8231) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:20:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][200/625] eta 0:02:56 lr 0.000459 wd 0.0500 time 0.4002 (0.4163) data time 0.0007 (0.0028) model time 0.3995 (0.4122) loss 7.7674 (6.9620) grad_norm 2.8181 (2.8219) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:20:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][210/625] eta 0:02:52 lr 0.000459 wd 0.0500 time 0.4015 (0.4155) data time 0.0009 (0.0027) model time 0.4006 (0.4114) loss 7.6450 (6.9735) grad_norm 5.4070 (2.8831) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:20:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][220/625] eta 0:02:48 lr 0.000459 wd 0.0500 time 0.4009 (0.4149) data time 0.0007 (0.0026) model time 0.4002 (0.4107) loss 5.5978 (6.9681) grad_norm 2.5892 (2.8756) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:20:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][230/625] eta 0:02:43 lr 0.000458 wd 0.0500 time 0.3997 (0.4142) data time 0.0009 (0.0025) model time 0.3989 (0.4100) loss 6.0271 (6.9700) grad_norm 2.2275 (2.9237) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:20:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][240/625] eta 0:02:39 lr 0.000458 wd 0.0500 time 0.3951 (0.4136) data time 0.0009 (0.0026) model time 0.3942 (0.4093) loss 7.6790 (6.9669) grad_norm 1.9239 (2.9654) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:20:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][250/625] eta 0:02:34 lr 0.000458 wd 0.0500 time 0.3993 (0.4131) data time 0.0009 (0.0025) model time 0.3984 (0.4088) loss 6.1133 (6.9651) grad_norm 4.6501 (3.0237) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
05:20:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][260/625] eta 0:02:30 lr 0.000458 wd 0.0500 time 0.3975 (0.4126) data time 0.0006 (0.0025) model time 0.3969 (0.4083) loss 7.8470 (6.9774) grad_norm 3.6779 (3.0495) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:20:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][270/625] eta 0:02:27 lr 0.000458 wd 0.0500 time 0.6025 (0.4146) data time 0.0007 (0.0024) model time 0.6018 (0.4110) loss 6.3019 (6.9974) grad_norm 2.4185 (3.0295) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:20:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][280/625] eta 0:02:24 lr 0.000458 wd 0.0500 time 0.6106 (0.4175) data time 0.0009 (0.0024) model time 0.6097 (0.4146) loss 7.8176 (6.9996) grad_norm 3.1775 (3.0351) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:21:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][290/625] eta 0:02:20 lr 0.000458 wd 0.0500 time 0.3997 (0.4184) data time 0.0009 (0.0023) model time 0.3988 (0.4158) loss 6.1432 (7.0033) grad_norm 2.4569 (3.0277) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:21:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][300/625] eta 0:02:16 lr 0.000458 wd 0.0500 time 0.4045 (0.4192) data time 0.0006 (0.0023) model time 0.4038 (0.4168) loss 6.4016 (6.9977) grad_norm 3.0486 (3.0109) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:21:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][310/625] eta 0:02:11 lr 0.000458 wd 0.0500 time 0.4074 (0.4186) data time 0.0008 (0.0022) model time 0.4067 (0.4161) loss 5.9072 (6.9887) grad_norm 2.9124 (3.0315) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:21:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][320/625] eta 0:02:07 lr 0.000458 wd 0.0500 time 0.3999 (0.4180) data 
time 0.0008 (0.0022) model time 0.3991 (0.4155) loss 5.9639 (6.9967) grad_norm 3.3045 (3.0262) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:21:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][330/625] eta 0:02:03 lr 0.000457 wd 0.0500 time 0.3996 (0.4174) data time 0.0009 (0.0021) model time 0.3987 (0.4149) loss 7.0756 (7.0098) grad_norm 2.2805 (3.0269) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:21:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][340/625] eta 0:01:58 lr 0.000457 wd 0.0500 time 0.3974 (0.4169) data time 0.0006 (0.0021) model time 0.3967 (0.4143) loss 6.8066 (7.0064) grad_norm 2.6221 (3.0387) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:21:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][350/625] eta 0:01:54 lr 0.000457 wd 0.0500 time 0.3963 (0.4163) data time 0.0008 (0.0021) model time 0.3955 (0.4137) loss 7.5794 (7.0035) grad_norm 3.3712 (3.0473) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:21:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][360/625] eta 0:01:50 lr 0.000457 wd 0.0500 time 0.3970 (0.4159) data time 0.0006 (0.0020) model time 0.3964 (0.4132) loss 7.5746 (6.9996) grad_norm 2.9318 (3.0405) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:21:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][370/625] eta 0:01:45 lr 0.000457 wd 0.0500 time 0.3973 (0.4155) data time 0.0006 (0.0020) model time 0.3967 (0.4128) loss 5.7595 (6.9964) grad_norm 2.3994 (3.0368) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:21:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][380/625] eta 0:01:41 lr 0.000457 wd 0.0500 time 0.4011 (0.4154) data time 0.0007 (0.0020) model time 0.4005 (0.4128) loss 6.7291 (6.9984) grad_norm 2.5602 (3.0366) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
05:21:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][390/625] eta 0:01:37 lr 0.000457 wd 0.0500 time 0.3968 (0.4150) data time 0.0009 (0.0020) model time 0.3960 (0.4124) loss 6.4534 (6.9892) grad_norm 5.9402 (3.0322) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:21:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][400/625] eta 0:01:33 lr 0.000457 wd 0.0500 time 0.3949 (0.4146) data time 0.0007 (0.0019) model time 0.3942 (0.4120) loss 7.7512 (6.9920) grad_norm 6.4445 (3.0387) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:21:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][410/625] eta 0:01:29 lr 0.000457 wd 0.0500 time 0.3971 (0.4142) data time 0.0012 (0.0019) model time 0.3959 (0.4115) loss 7.1969 (6.9969) grad_norm 3.0070 (3.0297) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:21:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][420/625] eta 0:01:24 lr 0.000457 wd 0.0500 time 0.3981 (0.4138) data time 0.0009 (0.0019) model time 0.3973 (0.4112) loss 5.5115 (6.9916) grad_norm 3.6851 (3.0288) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:21:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][430/625] eta 0:01:20 lr 0.000456 wd 0.0500 time 0.4015 (0.4135) data time 0.0006 (0.0019) model time 0.4009 (0.4108) loss 7.6671 (6.9970) grad_norm 1.7343 (3.0187) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:22:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][440/625] eta 0:01:16 lr 0.000456 wd 0.0500 time 0.3993 (0.4132) data time 0.0006 (0.0018) model time 0.3987 (0.4105) loss 5.4619 (6.9954) grad_norm 1.9565 (2.9999) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:22:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][450/625] eta 0:01:12 lr 0.000456 wd 0.0500 time 0.4023 (0.4130) data 
time 0.0008 (0.0018) model time 0.4015 (0.4103) loss 6.0544 (6.9940) grad_norm 2.0597 (2.9793) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:22:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][460/625] eta 0:01:08 lr 0.000456 wd 0.0500 time 0.3986 (0.4127) data time 0.0008 (0.0018) model time 0.3978 (0.4101) loss 5.9415 (7.0011) grad_norm 3.7906 (2.9753) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:22:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][470/625] eta 0:01:03 lr 0.000456 wd 0.0500 time 0.4197 (0.4125) data time 0.0006 (0.0018) model time 0.4190 (0.4099) loss 7.5844 (6.9963) grad_norm 3.1957 (2.9842) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:22:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][480/625] eta 0:00:59 lr 0.000456 wd 0.0500 time 0.3929 (0.4123) data time 0.0009 (0.0018) model time 0.3920 (0.4096) loss 5.9080 (6.9937) grad_norm 2.4856 (2.9722) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:22:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][490/625] eta 0:00:55 lr 0.000456 wd 0.0500 time 0.5882 (0.4144) data time 0.0006 (0.0018) model time 0.5876 (0.4121) loss 5.7672 (6.9902) grad_norm 1.9155 (2.9633) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:22:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][500/625] eta 0:00:51 lr 0.000456 wd 0.0500 time 0.4058 (0.4149) data time 0.0008 (0.0018) model time 0.4050 (0.4127) loss 7.1633 (6.9918) grad_norm 4.3775 (2.9551) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:22:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][510/625] eta 0:00:47 lr 0.000456 wd 0.0500 time 0.6135 (0.4167) data time 0.0007 (0.0017) model time 0.6128 (0.4147) loss 7.1474 (6.9968) grad_norm 2.2202 (2.9542) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
05:22:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][520/625] eta 0:00:43 lr 0.000455 wd 0.0500 time 0.3963 (0.4168) data time 0.0007 (0.0017) model time 0.3956 (0.4148) loss 6.1395 (7.0010) grad_norm 1.8634 (2.9631) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:22:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][530/625] eta 0:00:39 lr 0.000455 wd 0.0500 time 0.3960 (0.4164) data time 0.0006 (0.0017) model time 0.3953 (0.4144) loss 5.9051 (6.9948) grad_norm 2.5775 (2.9536) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:22:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][540/625] eta 0:00:35 lr 0.000455 wd 0.0500 time 0.3970 (0.4161) data time 0.0008 (0.0017) model time 0.3962 (0.4141) loss 6.2696 (7.0029) grad_norm 1.9593 (2.9429) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:22:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][550/625] eta 0:00:31 lr 0.000455 wd 0.0500 time 0.3961 (0.4158) data time 0.0009 (0.0017) model time 0.3952 (0.4138) loss 7.2693 (7.0076) grad_norm 1.8669 (2.9332) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:22:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][560/625] eta 0:00:27 lr 0.000455 wd 0.0500 time 0.4090 (0.4155) data time 0.0009 (0.0017) model time 0.4081 (0.4135) loss 6.3842 (7.0050) grad_norm 1.8974 (2.9287) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:22:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][570/625] eta 0:00:22 lr 0.000455 wd 0.0500 time 0.4039 (0.4152) data time 0.0008 (0.0017) model time 0.4031 (0.4132) loss 5.7394 (7.0093) grad_norm 3.6562 (2.9537) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:23:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][580/625] eta 0:00:18 lr 0.000455 wd 0.0500 time 0.4004 (0.4150) data 
time 0.0010 (0.0016) model time 0.3994 (0.4129) loss 8.2627 (7.0094) grad_norm 2.2941 (2.9510) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:23:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][590/625] eta 0:00:14 lr 0.000455 wd 0.0500 time 0.3957 (0.4147) data time 0.0007 (0.0016) model time 0.3950 (0.4126) loss 9.1572 (7.0160) grad_norm 2.1312 (2.9479) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:23:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][600/625] eta 0:00:10 lr 0.000455 wd 0.0500 time 0.3771 (0.4147) data time 0.0007 (0.0016) model time 0.3764 (0.4127) loss 5.9966 (7.0144) grad_norm 2.2455 (2.9406) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:23:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][610/625] eta 0:00:06 lr 0.000455 wd 0.0500 time 0.3981 (0.4145) data time 0.0006 (0.0016) model time 0.3975 (0.4124) loss 6.7966 (7.0163) grad_norm 1.9740 (2.9347) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:23:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][620/625] eta 0:00:02 lr 0.000454 wd 0.0500 time 0.4002 (0.4142) data time 0.0004 (0.0016) model time 0.3998 (0.4122) loss 5.4915 (7.0138) grad_norm 2.8566 (2.9326) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:23:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 182 training takes 0:04:18 [2024-07-25 05:23:19 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 05:23:20 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 05:23:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.445 (0.445) Loss 0.5498 (0.5498) Acc@1 89.307 (89.307) Acc@5 98.486 (98.486) Mem 14939MB [2024-07-25 05:23:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.8760 (0.6941) Acc@1 81.055 (85.938) Acc@5 96.045 (97.567) Mem 14939MB [2024-07-25 05:23:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9883 (0.8148) Acc@1 76.270 (82.610) Acc@5 94.971 (96.350) Mem 14939MB [2024-07-25 05:23:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.208 Acc@5 96.315 [2024-07-25 05:23:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.2% [2024-07-25 05:23:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.749 (0.749) Loss 0.5518 (0.5518) Acc@1 89.990 (89.990) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 05:23:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.155) Loss 0.8721 (0.6865) Acc@1 81.396 (86.395) Acc@5 96.240 (97.687) Mem 14939MB [2024-07-25 05:23:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 1.0039 (0.8045) Acc@1 76.074 (83.064) Acc@5 95.312 (96.559) Mem 14939MB [2024-07-25 05:23:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.652 Acc@5 96.509 [2024-07-25 05:23:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.7% [2024-07-25 05:23:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.65% [2024-07-25 05:23:26 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 05:23:27 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 05:23:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][0/625] eta 0:08:16 lr 0.000454 wd 0.0500 time 0.7936 (0.7936) data time 0.3854 (0.3854) model time 0.0000 (0.0000) loss 7.3963 (7.3963) grad_norm 2.3332 (2.3332) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:23:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][10/625] eta 0:04:30 lr 0.000454 wd 0.0500 time 0.4017 (0.4390) data time 0.0008 (0.0358) model time 0.0000 (0.0000) loss 7.2634 (7.1659) grad_norm 2.3144 (3.2176) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:23:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][20/625] eta 0:04:15 lr 0.000454 wd 0.0500 time 0.3997 (0.4227) data time 0.0006 (0.0191) model time 0.0000 (0.0000) loss 7.7533 (7.0050) grad_norm 3.6175 (3.1851) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:23:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][30/625] eta 0:04:07 lr 0.000454 wd 0.0500 time 0.3984 (0.4152) data time 0.0006 (0.0132) model time 0.0000 (0.0000) loss 7.3434 (6.8830) grad_norm 3.0153 (3.0663) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:23:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][40/625] eta 0:04:00 lr 0.000454 wd 0.0500 time 0.3976 (0.4109) data time 0.0008 (0.0102) model time 0.0000 (0.0000) loss 7.1488 (6.9475) grad_norm 6.7967 (3.0834) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:23:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][50/625] eta 0:03:54 lr 0.000454 wd 0.0500 time 0.3924 (0.4085) data time 0.0007 (0.0084) model time 0.0000 (0.0000) loss 7.5180 (6.9990) grad_norm 2.5360 (3.1277) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 05:23:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][60/625] eta 0:03:49 lr 0.000454 wd 0.0500 time 0.4008 (0.4069) data time 0.0006 (0.0072) model time 0.4002 (0.3975) loss 7.9437 (6.9942) grad_norm 3.2629 (3.2184) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:23:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][70/625] eta 0:03:45 lr 0.000454 wd 0.0500 time 0.3976 (0.4056) data time 0.0009 (0.0063) model time 0.3967 (0.3972) loss 7.3846 (7.0131) grad_norm 2.6540 (3.1346) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:24:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][80/625] eta 0:03:42 lr 0.000454 wd 0.0500 time 0.5984 (0.4089) data time 0.0006 (0.0056) model time 0.5978 (0.4088) loss 7.0504 (6.9957) grad_norm 1.9969 (3.1258) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:24:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][90/625] eta 0:03:42 lr 0.000453 wd 0.0500 time 0.6119 (0.4156) data time 0.0007 (0.0051) model time 0.6113 (0.4237) loss 6.1450 (6.9794) grad_norm 2.3011 (3.0865) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:24:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][100/625] eta 0:03:42 lr 0.000453 wd 0.0500 time 0.5811 (0.4229) data time 0.0009 (0.0047) model time 0.5803 (0.4367) loss 6.5040 (6.9950) grad_norm 3.1814 (3.0156) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:24:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][110/625] eta 0:03:40 lr 0.000453 wd 0.0500 time 0.3949 (0.4291) data time 0.0008 (0.0043) model time 0.3941 (0.4456) loss 8.4933 (6.9961) grad_norm 4.4540 (3.0593) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:24:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][120/625] eta 0:03:35 lr 0.000453 wd 0.0500 time 
0.3979 (0.4274) data time 0.0008 (0.0041) model time 0.3971 (0.4403) loss 7.3061 (6.9623) grad_norm 2.2459 (3.0692) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:24:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][130/625] eta 0:03:30 lr 0.000453 wd 0.0500 time 0.5398 (0.4262) data time 0.0009 (0.0038) model time 0.5390 (0.4366) loss 8.2498 (6.9934) grad_norm 2.5749 (3.0949) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:24:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][140/625] eta 0:03:25 lr 0.000453 wd 0.0500 time 0.3982 (0.4242) data time 0.0009 (0.0036) model time 0.3974 (0.4322) loss 7.0523 (7.0014) grad_norm 2.6276 (3.1056) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:24:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][150/625] eta 0:03:20 lr 0.000453 wd 0.0500 time 0.3961 (0.4226) data time 0.0006 (0.0034) model time 0.3955 (0.4289) loss 5.5979 (6.9751) grad_norm 2.1556 (3.0877) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:24:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][160/625] eta 0:03:15 lr 0.000453 wd 0.0500 time 0.3947 (0.4211) data time 0.0006 (0.0033) model time 0.3941 (0.4261) loss 6.9094 (6.9707) grad_norm 2.8161 (3.0604) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:24:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][170/625] eta 0:03:11 lr 0.000453 wd 0.0500 time 0.3982 (0.4198) data time 0.0008 (0.0031) model time 0.3973 (0.4237) loss 6.0162 (6.9519) grad_norm 1.8884 (3.0293) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:24:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][180/625] eta 0:03:06 lr 0.000453 wd 0.0500 time 0.4000 (0.4186) data time 0.0008 (0.0030) model time 0.3992 (0.4217) loss 7.1821 (6.9611) grad_norm 3.9596 (3.0267) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 05:24:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][190/625] eta 0:03:01 lr 0.000452 wd 0.0500 time 0.3980 (0.4176) data time 0.0008 (0.0029) model time 0.3972 (0.4200) loss 6.7308 (6.9742) grad_norm 2.5429 (2.9900) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:24:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][200/625] eta 0:02:57 lr 0.000452 wd 0.0500 time 0.3955 (0.4166) data time 0.0008 (0.0028) model time 0.3948 (0.4184) loss 7.0329 (6.9719) grad_norm 1.9215 (3.0303) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:24:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][210/625] eta 0:02:52 lr 0.000452 wd 0.0500 time 0.3948 (0.4156) data time 0.0009 (0.0027) model time 0.3939 (0.4170) loss 6.2020 (6.9476) grad_norm 3.3568 (3.0233) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:24:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][220/625] eta 0:02:48 lr 0.000452 wd 0.0500 time 0.3990 (0.4148) data time 0.0009 (0.0026) model time 0.3982 (0.4158) loss 6.3868 (6.9589) grad_norm 3.3622 (3.0488) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:25:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][230/625] eta 0:02:43 lr 0.000452 wd 0.0500 time 0.4047 (0.4143) data time 0.0008 (0.0026) model time 0.4039 (0.4150) loss 7.6674 (6.9610) grad_norm 2.1511 (3.0612) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:25:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][240/625] eta 0:02:39 lr 0.000452 wd 0.0500 time 0.3995 (0.4137) data time 0.0008 (0.0025) model time 0.3987 (0.4141) loss 6.5427 (6.9617) grad_norm 3.4318 (3.0319) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:25:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][250/625] eta 0:02:34 lr 0.000452 wd 0.0500 time 
0.3990 (0.4131) data time 0.0009 (0.0024) model time 0.3981 (0.4134) loss 7.2528 (6.9581) grad_norm 1.9365 (3.0113) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:25:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][260/625] eta 0:02:30 lr 0.000452 wd 0.0500 time 0.4108 (0.4127) data time 0.0008 (0.0024) model time 0.4099 (0.4128) loss 7.6303 (6.9562) grad_norm 2.5821 (2.9853) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:25:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][270/625] eta 0:02:26 lr 0.000452 wd 0.0500 time 0.4018 (0.4122) data time 0.0006 (0.0023) model time 0.4012 (0.4121) loss 7.9582 (6.9820) grad_norm 3.6178 (2.9759) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:25:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][280/625] eta 0:02:22 lr 0.000452 wd 0.0500 time 0.3997 (0.4117) data time 0.0009 (0.0023) model time 0.3988 (0.4115) loss 6.1845 (6.9805) grad_norm 2.2972 (2.9757) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:25:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][290/625] eta 0:02:17 lr 0.000451 wd 0.0500 time 0.4012 (0.4113) data time 0.0006 (0.0022) model time 0.4005 (0.4110) loss 7.8769 (6.9762) grad_norm 2.2063 (2.9630) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:25:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][300/625] eta 0:02:13 lr 0.000451 wd 0.0500 time 0.5934 (0.4120) data time 0.0007 (0.0022) model time 0.5928 (0.4118) loss 7.9042 (6.9855) grad_norm 2.2172 (2.9546) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:25:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][310/625] eta 0:02:10 lr 0.000451 wd 0.0500 time 0.4057 (0.4140) data time 0.0009 (0.0021) model time 0.4049 (0.4141) loss 6.4137 (6.9712) grad_norm 1.9642 (2.9445) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 05:25:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][320/625] eta 0:02:06 lr 0.000451 wd 0.0500 time 0.6007 (0.4157) data time 0.0006 (0.0021) model time 0.6001 (0.4162) loss 7.1841 (6.9675) grad_norm 2.6170 (2.9365) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:25:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][330/625] eta 0:02:02 lr 0.000451 wd 0.0500 time 0.3980 (0.4169) data time 0.0007 (0.0021) model time 0.3973 (0.4175) loss 7.5758 (6.9624) grad_norm 4.2443 (2.9688) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:25:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][340/625] eta 0:01:58 lr 0.000451 wd 0.0500 time 0.4080 (0.4170) data time 0.0009 (0.0020) model time 0.4071 (0.4175) loss 7.6641 (6.9608) grad_norm 2.0533 (2.9710) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:25:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][350/625] eta 0:01:54 lr 0.000451 wd 0.0500 time 0.3988 (0.4165) data time 0.0007 (0.0020) model time 0.3981 (0.4169) loss 6.2788 (6.9638) grad_norm 2.4638 (2.9732) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:25:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][360/625] eta 0:01:50 lr 0.000451 wd 0.0500 time 0.3993 (0.4164) data time 0.0007 (0.0020) model time 0.3986 (0.4168) loss 7.5374 (6.9671) grad_norm 2.2671 (2.9597) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:26:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][370/625] eta 0:01:46 lr 0.000451 wd 0.0500 time 0.4047 (0.4159) data time 0.0007 (0.0019) model time 0.4040 (0.4162) loss 7.8214 (6.9769) grad_norm 4.8995 (2.9606) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:26:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][380/625] eta 0:01:41 lr 0.000450 wd 0.0500 time 
0.3945 (0.4155) data time 0.0009 (0.0019) model time 0.3936 (0.4156) loss 6.9139 (6.9889) grad_norm 5.0891 (2.9643) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:26:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][390/625] eta 0:01:37 lr 0.000450 wd 0.0500 time 0.3994 (0.4150) data time 0.0009 (0.0019) model time 0.3985 (0.4151) loss 7.4044 (6.9956) grad_norm 2.7678 (2.9547) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:26:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][400/625] eta 0:01:33 lr 0.000450 wd 0.0500 time 0.3976 (0.4146) data time 0.0008 (0.0019) model time 0.3967 (0.4146) loss 6.7979 (6.9977) grad_norm 2.6609 (2.9601) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:26:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][410/625] eta 0:01:29 lr 0.000450 wd 0.0500 time 0.3932 (0.4143) data time 0.0007 (0.0019) model time 0.3925 (0.4142) loss 7.9411 (6.9890) grad_norm 2.3895 (2.9535) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:26:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][420/625] eta 0:01:24 lr 0.000450 wd 0.0500 time 0.4016 (0.4140) data time 0.0009 (0.0018) model time 0.4007 (0.4138) loss 6.7983 (6.9970) grad_norm 2.6439 (2.9557) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:26:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][430/625] eta 0:01:20 lr 0.000450 wd 0.0500 time 0.3996 (0.4137) data time 0.0007 (0.0018) model time 0.3989 (0.4134) loss 6.2714 (6.9984) grad_norm 2.0781 (2.9516) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:26:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][440/625] eta 0:01:16 lr 0.000450 wd 0.0500 time 0.3943 (0.4133) data time 0.0006 (0.0018) model time 0.3936 (0.4130) loss 7.4310 (6.9922) grad_norm 3.8667 (2.9575) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 05:26:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][450/625] eta 0:01:12 lr 0.000450 wd 0.0500 time 0.3998 (0.4131) data time 0.0006 (0.0018) model time 0.3992 (0.4127) loss 5.7217 (6.9944) grad_norm 2.6709 (2.9469) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:26:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][460/625] eta 0:01:08 lr 0.000450 wd 0.0500 time 0.4082 (0.4128) data time 0.0008 (0.0017) model time 0.4073 (0.4124) loss 6.0308 (6.9861) grad_norm 3.0698 (2.9536) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:26:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][470/625] eta 0:01:03 lr 0.000450 wd 0.0500 time 0.3971 (0.4125) data time 0.0007 (0.0017) model time 0.3964 (0.4120) loss 7.9994 (6.9855) grad_norm 5.3259 (2.9544) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:26:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][480/625] eta 0:00:59 lr 0.000449 wd 0.0500 time 0.4018 (0.4122) data time 0.0006 (0.0017) model time 0.4011 (0.4117) loss 7.5140 (6.9882) grad_norm 2.4432 (2.9535) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:26:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][490/625] eta 0:00:55 lr 0.000449 wd 0.0500 time 0.3985 (0.4120) data time 0.0007 (0.0017) model time 0.3978 (0.4115) loss 7.5857 (6.9876) grad_norm 2.8716 (2.9505) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:26:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][500/625] eta 0:00:51 lr 0.000449 wd 0.0500 time 0.3957 (0.4117) data time 0.0008 (0.0017) model time 0.3950 (0.4112) loss 5.9112 (6.9871) grad_norm 2.6300 (2.9548) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:26:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][510/625] eta 0:00:47 lr 0.000449 wd 0.0500 time 
0.3992 (0.4115) data time 0.0006 (0.0017) model time 0.3985 (0.4109) loss 7.5800 (6.9920) grad_norm 2.8420 (2.9550) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:27:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][520/625] eta 0:00:43 lr 0.000449 wd 0.0500 time 0.5955 (0.4120) data time 0.0008 (0.0016) model time 0.5947 (0.4115) loss 7.2511 (6.9872) grad_norm 2.6506 (2.9568) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:27:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][530/625] eta 0:00:39 lr 0.000449 wd 0.0500 time 0.3992 (0.4136) data time 0.0008 (0.0016) model time 0.3984 (0.4132) loss 8.4510 (6.9948) grad_norm 5.7166 (2.9775) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:27:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][540/625] eta 0:00:35 lr 0.000449 wd 0.0500 time 0.5911 (0.4149) data time 0.0009 (0.0016) model time 0.5902 (0.4146) loss 7.2564 (6.9902) grad_norm 2.2338 (2.9859) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:27:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][550/625] eta 0:00:31 lr 0.000449 wd 0.0500 time 0.5802 (0.4154) data time 0.0010 (0.0016) model time 0.5792 (0.4152) loss 6.8567 (6.9916) grad_norm 1.9346 (2.9759) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:27:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][560/625] eta 0:00:26 lr 0.000449 wd 0.0500 time 0.3985 (0.4152) data time 0.0009 (0.0016) model time 0.3976 (0.4149) loss 7.3983 (6.9912) grad_norm 3.9884 (2.9682) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:27:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][570/625] eta 0:00:22 lr 0.000449 wd 0.0500 time 0.4030 (0.4149) data time 0.0008 (0.0016) model time 0.4022 (0.4146) loss 6.6549 (6.9934) grad_norm 2.6465 (2.9680) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 05:27:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][580/625] eta 0:00:18 lr 0.000448 wd 0.0500 time 0.4029 (0.4149) data time 0.0008 (0.0016) model time 0.4022 (0.4146) loss 7.9268 (6.9992) grad_norm 1.9343 (2.9579) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:27:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][590/625] eta 0:00:14 lr 0.000448 wd 0.0500 time 0.3996 (0.4147) data time 0.0006 (0.0016) model time 0.3989 (0.4144) loss 7.2554 (6.9981) grad_norm 2.3542 (2.9441) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:27:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][600/625] eta 0:00:10 lr 0.000448 wd 0.0500 time 0.4125 (0.4146) data time 0.0007 (0.0015) model time 0.4118 (0.4142) loss 5.6236 (6.9886) grad_norm 3.3878 (2.9639) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:27:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][610/625] eta 0:00:06 lr 0.000448 wd 0.0500 time 0.3975 (0.4144) data time 0.0004 (0.0015) model time 0.3971 (0.4140) loss 6.3955 (6.9818) grad_norm 2.7047 (2.9645) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:27:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][620/625] eta 0:00:02 lr 0.000448 wd 0.0500 time 0.3999 (0.4142) data time 0.0006 (0.0015) model time 0.3993 (0.4138) loss 7.1142 (6.9839) grad_norm 1.6906 (2.9541) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:27:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 183 training takes 0:04:18 [2024-07-25 05:27:46 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 05:27:46 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 05:27:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.456 (0.456) Loss 0.5483 (0.5483) Acc@1 89.551 (89.551) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 05:27:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.9121 (0.6947) Acc@1 80.469 (86.084) Acc@5 95.850 (97.621) Mem 14939MB [2024-07-25 05:27:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0264 (0.8173) Acc@1 76.123 (82.773) Acc@5 94.434 (96.338) Mem 14939MB [2024-07-25 05:27:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.392 Acc@5 96.295 [2024-07-25 05:27:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-07-25 05:27:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.39% [2024-07-25 05:27:49 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 05:27:50 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 05:27:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.5518 (0.5518) Acc@1 89.941 (89.941) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 05:27:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8706 (0.6859) Acc@1 81.445 (86.395) Acc@5 96.338 (97.705) Mem 14939MB [2024-07-25 05:27:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0020 (0.8035) Acc@1 76.172 (83.075) Acc@5 95.361 (96.570) Mem 14939MB [2024-07-25 05:27:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.670 Acc@5 96.527 [2024-07-25 05:27:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.7% [2024-07-25 05:27:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.67% [2024-07-25 05:27:52 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 05:27:53 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 05:27:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][0/625] eta 0:08:46 lr 0.000448 wd 0.0500 time 0.8424 (0.8424) data time 0.4634 (0.4634) model time 0.0000 (0.0000) loss 5.7128 (5.7128) grad_norm 2.0281 (2.0281) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:27:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][10/625] eta 0:04:30 lr 0.000448 wd 0.0500 time 0.4014 (0.4402) data time 0.0009 (0.0430) model time 0.0000 (0.0000) loss 8.4774 (7.4556) grad_norm 3.3871 (3.2398) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:28:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][20/625] eta 0:04:14 lr 0.000448 wd 0.0500 time 0.3976 (0.4210) data time 0.0007 (0.0230) model time 0.0000 (0.0000) loss 5.7672 (7.2077) grad_norm 1.8371 (3.0860) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:28:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][30/625] eta 0:04:06 lr 0.000448 wd 0.0500 time 0.4027 (0.4144) data time 0.0009 (0.0158) model time 0.0000 (0.0000) loss 8.1111 (7.2023) grad_norm 2.5097 (2.8534) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:28:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][40/625] eta 0:04:00 lr 0.000448 wd 0.0500 time 0.4003 (0.4110) data time 0.0008 (0.0122) model time 0.0000 (0.0000) loss 7.3253 (7.1740) grad_norm 4.6310 (2.8397) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:28:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][50/625] eta 0:03:55 lr 0.000447 wd 0.0500 time 0.4048 (0.4089) data time 0.0008 (0.0100) model time 0.0000 (0.0000) loss 7.1220 (7.1308) grad_norm 3.1757 (2.8087) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:28:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][60/625] eta 0:03:50 lr 0.000447 wd 0.0500 time 0.4052 (0.4078) 
data time 0.0008 (0.0085) model time 0.4044 (0.4010) loss 6.3180 (7.1006) grad_norm 3.1873 (2.8735) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:28:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][70/625] eta 0:03:45 lr 0.000447 wd 0.0500 time 0.3996 (0.4067) data time 0.0009 (0.0074) model time 0.3987 (0.4001) loss 8.3122 (7.0866) grad_norm 4.1339 (3.3682) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:28:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][80/625] eta 0:03:42 lr 0.000447 wd 0.0500 time 0.4006 (0.4080) data time 0.0008 (0.0066) model time 0.3998 (0.4055) loss 6.4866 (7.0678) grad_norm 1.7396 (3.2280) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:28:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][90/625] eta 0:03:37 lr 0.000447 wd 0.0500 time 0.4002 (0.4072) data time 0.0007 (0.0060) model time 0.3995 (0.4042) loss 8.0529 (7.0582) grad_norm 2.7708 (3.1570) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:28:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][100/625] eta 0:03:33 lr 0.000447 wd 0.0500 time 0.3995 (0.4066) data time 0.0007 (0.0055) model time 0.3988 (0.4033) loss 7.3801 (7.0514) grad_norm 3.0930 (3.1061) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:28:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][110/625] eta 0:03:29 lr 0.000447 wd 0.0500 time 0.4027 (0.4062) data time 0.0008 (0.0051) model time 0.4018 (0.4030) loss 5.9115 (7.0561) grad_norm 1.8005 (3.0707) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:28:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][120/625] eta 0:03:27 lr 0.000447 wd 0.0500 time 0.4037 (0.4118) data time 0.0007 (0.0047) model time 0.4031 (0.4129) loss 6.0051 (7.0493) grad_norm 4.5231 (3.0431) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
05:28:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][130/625] eta 0:03:26 lr 0.000447 wd 0.0500 time 0.3991 (0.4162) data time 0.0007 (0.0044) model time 0.3985 (0.4199) loss 7.8345 (7.0509) grad_norm 2.6170 (3.0182) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:28:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][140/625] eta 0:03:24 lr 0.000447 wd 0.0500 time 0.5146 (0.4210) data time 0.0009 (0.0042) model time 0.5136 (0.4269) loss 5.6277 (7.0328) grad_norm 4.5503 (3.0100) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:28:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][150/625] eta 0:03:20 lr 0.000446 wd 0.0500 time 0.3982 (0.4220) data time 0.0007 (0.0040) model time 0.3975 (0.4278) loss 6.7328 (7.0309) grad_norm 3.1273 (3.0327) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:29:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][160/625] eta 0:03:15 lr 0.000446 wd 0.0500 time 0.4010 (0.4205) data time 0.0006 (0.0038) model time 0.4004 (0.4250) loss 6.2862 (7.0390) grad_norm 3.9475 (3.0647) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:29:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][170/625] eta 0:03:10 lr 0.000446 wd 0.0500 time 0.3969 (0.4195) data time 0.0006 (0.0036) model time 0.3962 (0.4230) loss 6.0014 (7.0214) grad_norm 2.8337 (3.0575) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:29:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][180/625] eta 0:03:06 lr 0.000446 wd 0.0500 time 0.3977 (0.4183) data time 0.0006 (0.0035) model time 0.3971 (0.4210) loss 7.6206 (7.0176) grad_norm 2.7985 (3.0526) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:29:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][190/625] eta 0:03:01 lr 0.000446 wd 0.0500 time 0.4029 (0.4173) data 
time 0.0007 (0.0033) model time 0.4022 (0.4195) loss 6.9134 (7.0215) grad_norm 2.2227 (3.0214) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:29:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][200/625] eta 0:02:56 lr 0.000446 wd 0.0500 time 0.4001 (0.4164) data time 0.0007 (0.0032) model time 0.3994 (0.4180) loss 7.6838 (7.0207) grad_norm 2.7643 (2.9893) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:29:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][210/625] eta 0:02:52 lr 0.000446 wd 0.0500 time 0.4051 (0.4156) data time 0.0008 (0.0031) model time 0.4043 (0.4168) loss 7.6188 (7.0176) grad_norm 2.0445 (2.9771) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:29:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][220/625] eta 0:02:47 lr 0.000446 wd 0.0500 time 0.3958 (0.4148) data time 0.0007 (0.0030) model time 0.3951 (0.4156) loss 7.3081 (7.0009) grad_norm 1.9161 (2.9693) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:29:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][230/625] eta 0:02:43 lr 0.000446 wd 0.0500 time 0.3950 (0.4140) data time 0.0009 (0.0029) model time 0.3941 (0.4145) loss 6.1402 (6.9977) grad_norm 2.2438 (2.9817) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:29:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][240/625] eta 0:02:39 lr 0.000446 wd 0.0500 time 0.3960 (0.4133) data time 0.0007 (0.0028) model time 0.3953 (0.4136) loss 7.8404 (7.0067) grad_norm 2.7270 (2.9741) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:29:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][250/625] eta 0:02:34 lr 0.000445 wd 0.0500 time 0.3990 (0.4128) data time 0.0007 (0.0028) model time 0.3983 (0.4129) loss 6.9336 (7.0221) grad_norm 3.2153 (3.0302) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
05:29:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][260/625] eta 0:02:30 lr 0.000445 wd 0.0500 time 0.3980 (0.4123) data time 0.0006 (0.0027) model time 0.3974 (0.4121) loss 6.9271 (7.0104) grad_norm 2.1193 (3.0638) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:29:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][270/625] eta 0:02:26 lr 0.000445 wd 0.0500 time 0.4032 (0.4117) data time 0.0008 (0.0026) model time 0.4024 (0.4115) loss 7.2038 (7.0004) grad_norm 3.7030 (3.0762) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:29:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][280/625] eta 0:02:21 lr 0.000445 wd 0.0500 time 0.4086 (0.4113) data time 0.0007 (0.0026) model time 0.4079 (0.4109) loss 6.1093 (6.9963) grad_norm 2.0983 (3.0469) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:29:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][290/625] eta 0:02:17 lr 0.000445 wd 0.0500 time 0.3969 (0.4108) data time 0.0009 (0.0025) model time 0.3960 (0.4103) loss 6.6761 (6.9888) grad_norm 3.1561 (3.0282) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:29:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][300/625] eta 0:02:13 lr 0.000445 wd 0.0500 time 0.3994 (0.4111) data time 0.0007 (0.0025) model time 0.3987 (0.4106) loss 5.7521 (6.9762) grad_norm 3.5205 (3.0196) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:30:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][310/625] eta 0:02:09 lr 0.000445 wd 0.0500 time 0.3992 (0.4107) data time 0.0009 (0.0024) model time 0.3984 (0.4101) loss 7.9742 (6.9776) grad_norm 2.3502 (3.0351) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:30:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][320/625] eta 0:02:05 lr 0.000445 wd 0.0500 time 0.3947 (0.4105) data 
time 0.0009 (0.0024) model time 0.3938 (0.4099) loss 7.2137 (6.9733) grad_norm 1.6289 (3.0224) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:30:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][330/625] eta 0:02:01 lr 0.000445 wd 0.0500 time 0.5862 (0.4108) data time 0.0006 (0.0023) model time 0.5856 (0.4102) loss 7.0740 (6.9835) grad_norm 2.4589 (3.0317) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:30:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][340/625] eta 0:01:57 lr 0.000444 wd 0.0500 time 0.5983 (0.4126) data time 0.0009 (0.0023) model time 0.5974 (0.4123) loss 7.1806 (6.9793) grad_norm 2.7354 (3.0149) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:30:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][350/625] eta 0:01:53 lr 0.000444 wd 0.0500 time 0.5182 (0.4141) data time 0.0008 (0.0022) model time 0.5173 (0.4140) loss 6.0325 (6.9649) grad_norm 1.8798 (2.9904) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:30:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][360/625] eta 0:01:50 lr 0.000444 wd 0.0500 time 0.4010 (0.4151) data time 0.0007 (0.0022) model time 0.4003 (0.4152) loss 7.0501 (6.9710) grad_norm 2.6451 (2.9839) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:30:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][370/625] eta 0:01:46 lr 0.000444 wd 0.0500 time 0.3983 (0.4166) data time 0.0009 (0.0022) model time 0.3974 (0.4168) loss 7.3954 (6.9674) grad_norm 2.1705 (2.9873) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:30:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][380/625] eta 0:01:41 lr 0.000444 wd 0.0500 time 0.3976 (0.4161) data time 0.0009 (0.0021) model time 0.3967 (0.4163) loss 5.5668 (6.9545) grad_norm 1.7244 (2.9717) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
05:30:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][390/625] eta 0:01:37 lr 0.000444 wd 0.0500 time 0.4019 (0.4157) data time 0.0009 (0.0021) model time 0.4010 (0.4158) loss 7.5973 (6.9644) grad_norm 3.5090 (2.9732) loss_scale 1024.0000 (514.6189) mem 14939MB [2024-07-25 05:30:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][400/625] eta 0:01:33 lr 0.000444 wd 0.0500 time 0.4066 (0.4154) data time 0.0006 (0.0021) model time 0.4060 (0.4154) loss 5.7531 (6.9492) grad_norm 2.2875 (2.9636) loss_scale 1024.0000 (527.3217) mem 14939MB [2024-07-25 05:30:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][410/625] eta 0:01:29 lr 0.000444 wd 0.0500 time 0.3973 (0.4150) data time 0.0007 (0.0020) model time 0.3966 (0.4149) loss 6.6049 (6.9484) grad_norm 4.1464 (2.9790) loss_scale 1024.0000 (539.4063) mem 14939MB [2024-07-25 05:30:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][420/625] eta 0:01:25 lr 0.000444 wd 0.0500 time 0.3995 (0.4147) data time 0.0008 (0.0020) model time 0.3987 (0.4145) loss 5.0384 (6.9427) grad_norm 3.5904 (2.9771) loss_scale 1024.0000 (550.9169) mem 14939MB [2024-07-25 05:30:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][430/625] eta 0:01:20 lr 0.000444 wd 0.0500 time 0.3994 (0.4143) data time 0.0007 (0.0020) model time 0.3988 (0.4141) loss 6.6485 (6.9509) grad_norm 3.4419 (2.9710) loss_scale 1024.0000 (561.8933) mem 14939MB [2024-07-25 05:30:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][440/625] eta 0:01:16 lr 0.000443 wd 0.0500 time 0.3974 (0.4139) data time 0.0008 (0.0020) model time 0.3965 (0.4137) loss 6.3737 (6.9547) grad_norm 5.3649 (2.9759) loss_scale 1024.0000 (572.3719) mem 14939MB [2024-07-25 05:31:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][450/625] eta 0:01:12 lr 0.000443 wd 0.0500 time 0.3986 (0.4137) 
data time 0.0008 (0.0019) model time 0.3978 (0.4134) loss 6.7291 (6.9554) grad_norm 2.6504 (2.9713) loss_scale 1024.0000 (582.3858) mem 14939MB [2024-07-25 05:31:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][460/625] eta 0:01:08 lr 0.000443 wd 0.0500 time 0.3974 (0.4134) data time 0.0008 (0.0019) model time 0.3965 (0.4130) loss 6.7978 (6.9518) grad_norm 2.4209 (inf) loss_scale 512.0000 (587.5228) mem 14939MB [2024-07-25 05:31:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][470/625] eta 0:01:04 lr 0.000443 wd 0.0500 time 0.3990 (0.4131) data time 0.0009 (0.0019) model time 0.3981 (0.4126) loss 6.7561 (6.9559) grad_norm 3.0223 (inf) loss_scale 512.0000 (585.9193) mem 14939MB [2024-07-25 05:31:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][480/625] eta 0:00:59 lr 0.000443 wd 0.0500 time 0.3994 (0.4128) data time 0.0007 (0.0019) model time 0.3987 (0.4123) loss 8.1938 (6.9610) grad_norm 1.9416 (inf) loss_scale 512.0000 (584.3825) mem 14939MB [2024-07-25 05:31:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][490/625] eta 0:00:55 lr 0.000443 wd 0.0500 time 0.3990 (0.4125) data time 0.0008 (0.0019) model time 0.3982 (0.4120) loss 7.8335 (6.9582) grad_norm 1.8926 (inf) loss_scale 512.0000 (582.9084) mem 14939MB [2024-07-25 05:31:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][500/625] eta 0:00:51 lr 0.000443 wd 0.0500 time 0.3982 (0.4122) data time 0.0007 (0.0018) model time 0.3975 (0.4116) loss 7.4251 (6.9505) grad_norm 2.1302 (inf) loss_scale 512.0000 (581.4930) mem 14939MB [2024-07-25 05:31:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][510/625] eta 0:00:47 lr 0.000443 wd 0.0500 time 0.3983 (0.4119) data time 0.0008 (0.0018) model time 0.3974 (0.4113) loss 7.8434 (6.9558) grad_norm 2.5155 (inf) loss_scale 512.0000 (580.1331) mem 14939MB [2024-07-25 05:31:28 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][520/625] eta 0:00:43 lr 0.000443 wd 0.0500 time 0.3992 (0.4121) data time 0.0009 (0.0018) model time 0.3983 (0.4115) loss 7.2800 (6.9572) grad_norm 3.0641 (inf) loss_scale 512.0000 (578.8253) mem 14939MB [2024-07-25 05:31:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][530/625] eta 0:00:39 lr 0.000443 wd 0.0500 time 0.3965 (0.4118) data time 0.0008 (0.0018) model time 0.3956 (0.4112) loss 7.2009 (6.9553) grad_norm 1.9852 (inf) loss_scale 512.0000 (577.5669) mem 14939MB [2024-07-25 05:31:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][540/625] eta 0:00:34 lr 0.000442 wd 0.0500 time 0.3947 (0.4116) data time 0.0008 (0.0018) model time 0.3938 (0.4110) loss 5.9057 (6.9500) grad_norm 1.7960 (inf) loss_scale 512.0000 (576.3549) mem 14939MB [2024-07-25 05:31:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][550/625] eta 0:00:30 lr 0.000442 wd 0.0500 time 0.4041 (0.4114) data time 0.0009 (0.0018) model time 0.4032 (0.4107) loss 8.2952 (6.9522) grad_norm 2.9868 (inf) loss_scale 512.0000 (575.1869) mem 14939MB [2024-07-25 05:31:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][560/625] eta 0:00:26 lr 0.000442 wd 0.0500 time 0.5773 (0.4128) data time 0.0007 (0.0017) model time 0.5767 (0.4122) loss 5.8182 (6.9523) grad_norm 1.8424 (inf) loss_scale 512.0000 (574.0606) mem 14939MB [2024-07-25 05:31:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][570/625] eta 0:00:22 lr 0.000442 wd 0.0500 time 0.5920 (0.4138) data time 0.0009 (0.0017) model time 0.5911 (0.4134) loss 6.5845 (6.9512) grad_norm 2.0065 (inf) loss_scale 512.0000 (572.9737) mem 14939MB [2024-07-25 05:31:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][580/625] eta 0:00:18 lr 0.000442 wd 0.0500 time 0.3983 (0.4149) data time 0.0006 (0.0017) model 
time 0.3976 (0.4145) loss 7.3260 (6.9531) grad_norm 2.6546 (inf) loss_scale 512.0000 (571.9243) mem 14939MB [2024-07-25 05:31:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][590/625] eta 0:00:14 lr 0.000442 wd 0.0500 time 0.3981 (0.4152) data time 0.0006 (0.0017) model time 0.3975 (0.4148) loss 6.8089 (6.9574) grad_norm 2.2870 (inf) loss_scale 512.0000 (570.9103) mem 14939MB [2024-07-25 05:32:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][600/625] eta 0:00:10 lr 0.000442 wd 0.0500 time 0.3987 (0.4149) data time 0.0009 (0.0017) model time 0.3978 (0.4146) loss 7.8362 (6.9561) grad_norm 2.5676 (inf) loss_scale 512.0000 (569.9301) mem 14939MB [2024-07-25 05:32:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][610/625] eta 0:00:06 lr 0.000442 wd 0.0500 time 0.4003 (0.4147) data time 0.0006 (0.0017) model time 0.3997 (0.4143) loss 7.5006 (6.9548) grad_norm 2.1800 (inf) loss_scale 512.0000 (568.9820) mem 14939MB [2024-07-25 05:32:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][620/625] eta 0:00:02 lr 0.000442 wd 0.0500 time 0.3984 (0.4145) data time 0.0005 (0.0017) model time 0.3978 (0.4140) loss 6.1324 (6.9514) grad_norm 2.1148 (inf) loss_scale 512.0000 (568.0644) mem 14939MB [2024-07-25 05:32:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 184 training takes 0:04:18 [2024-07-25 05:32:12 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 05:32:13 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 05:32:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.5610 (0.5610) Acc@1 89.502 (89.502) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 05:32:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.122) Loss 0.8872 (0.7084) Acc@1 80.420 (86.066) Acc@5 95.947 (97.528) Mem 14939MB [2024-07-25 05:32:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 1.0332 (0.8195) Acc@1 75.537 (82.817) Acc@5 94.385 (96.350) Mem 14939MB [2024-07-25 05:32:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.406 Acc@5 96.311 [2024-07-25 05:32:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-07-25 05:32:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.41% [2024-07-25 05:32:16 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 05:32:17 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 05:32:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.457 (0.457) Loss 0.5503 (0.5503) Acc@1 89.990 (89.990) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 05:32:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.8706 (0.6855) Acc@1 81.396 (86.417) Acc@5 96.289 (97.705) Mem 14939MB [2024-07-25 05:32:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.0029 (0.8029) Acc@1 76.172 (83.108) Acc@5 95.410 (96.573) Mem 14939MB [2024-07-25 05:32:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.692 Acc@5 96.531 [2024-07-25 05:32:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.7% [2024-07-25 05:32:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.69% [2024-07-25 05:32:19 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 05:32:20 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 05:32:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][0/625] eta 0:07:53 lr 0.000442 wd 0.0500 time 0.7583 (0.7583) data time 0.3547 (0.3547) model time 0.0000 (0.0000) loss 6.8132 (6.8132) grad_norm 2.3318 (2.3318) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:32:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][10/625] eta 0:04:37 lr 0.000441 wd 0.0500 time 0.5989 (0.4510) data time 0.0010 (0.0332) model time 0.0000 (0.0000) loss 6.3728 (6.8208) grad_norm 3.2154 (2.7199) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:32:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][20/625] eta 0:04:18 lr 0.000441 wd 0.0500 time 0.4019 (0.4267) data time 0.0008 (0.0178) model time 0.0000 (0.0000) loss 6.0038 (6.8272) grad_norm 2.0722 (2.6028) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:32:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][30/625] eta 0:04:08 lr 0.000441 wd 0.0500 time 0.3993 (0.4182) data time 0.0009 (0.0123) model time 0.0000 (0.0000) loss 5.6414 (6.8096) grad_norm 2.7672 (2.8981) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:32:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][40/625] eta 0:04:02 lr 0.000441 wd 0.0500 time 0.3970 (0.4140) data time 0.0008 (0.0096) model time 0.0000 (0.0000) loss 8.1884 (6.8414) grad_norm 3.8560 (2.8796) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:32:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][50/625] eta 0:03:56 lr 0.000441 wd 0.0500 time 0.3983 (0.4109) data time 0.0006 (0.0079) model time 0.0000 (0.0000) loss 6.1619 (6.7877) grad_norm 2.5342 (2.8328) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:32:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][60/625] eta 0:03:51 lr 0.000441 wd 0.0500 time 0.3979 (0.4091) 
data time 0.0008 (0.0067) model time 0.3971 (0.3988) loss 7.7243 (6.7901) grad_norm 2.3688 (2.8454) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:32:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][70/625] eta 0:03:46 lr 0.000441 wd 0.0500 time 0.4098 (0.4081) data time 0.0007 (0.0059) model time 0.4091 (0.3999) loss 7.0799 (6.7747) grad_norm 1.6250 (2.7555) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:32:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][80/625] eta 0:03:41 lr 0.000441 wd 0.0500 time 0.3999 (0.4072) data time 0.0007 (0.0053) model time 0.3993 (0.3999) loss 7.6612 (6.7581) grad_norm 2.5129 (2.6954) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:32:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][90/625] eta 0:03:37 lr 0.000441 wd 0.0500 time 0.4046 (0.4063) data time 0.0009 (0.0048) model time 0.4038 (0.3996) loss 8.2356 (6.7829) grad_norm 2.9044 (2.7218) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:33:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][100/625] eta 0:03:33 lr 0.000441 wd 0.0500 time 0.3954 (0.4059) data time 0.0006 (0.0044) model time 0.3948 (0.3999) loss 6.3998 (6.7838) grad_norm 3.4777 (2.6874) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:33:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][110/625] eta 0:03:28 lr 0.000440 wd 0.0500 time 0.3974 (0.4054) data time 0.0008 (0.0041) model time 0.3966 (0.3999) loss 6.9123 (6.8066) grad_norm 1.8294 (2.6420) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:33:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][120/625] eta 0:03:24 lr 0.000440 wd 0.0500 time 0.3968 (0.4049) data time 0.0007 (0.0038) model time 0.3961 (0.3996) loss 7.7112 (6.8230) grad_norm 3.0062 (2.6193) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
05:33:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][130/625] eta 0:03:20 lr 0.000440 wd 0.0500 time 0.4099 (0.4045) data time 0.0008 (0.0036) model time 0.4091 (0.3995) loss 5.3734 (6.8192) grad_norm 2.2468 (2.6634) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:33:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][140/625] eta 0:03:16 lr 0.000440 wd 0.0500 time 0.3944 (0.4042) data time 0.0008 (0.0034) model time 0.3936 (0.3994) loss 7.8678 (6.8134) grad_norm 2.8222 (2.6553) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:33:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][150/625] eta 0:03:12 lr 0.000440 wd 0.0500 time 0.5963 (0.4061) data time 0.0009 (0.0033) model time 0.5953 (0.4027) loss 7.6336 (6.8243) grad_norm 2.8097 (2.6517) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:33:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][160/625] eta 0:03:10 lr 0.000440 wd 0.0500 time 0.5782 (0.4097) data time 0.0006 (0.0031) model time 0.5775 (0.4082) loss 6.6355 (6.8301) grad_norm 2.9268 (2.6625) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:33:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][170/625] eta 0:03:08 lr 0.000440 wd 0.0500 time 0.3974 (0.4135) data time 0.0008 (0.0030) model time 0.3966 (0.4137) loss 8.3679 (6.8246) grad_norm 2.4517 (2.7030) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:33:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][180/625] eta 0:03:05 lr 0.000440 wd 0.0500 time 0.5532 (0.4176) data time 0.0006 (0.0029) model time 0.5526 (0.4193) loss 6.8705 (6.8358) grad_norm 4.9671 (2.7700) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:33:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][190/625] eta 0:03:01 lr 0.000440 wd 0.0500 time 0.3982 (0.4178) data 
time 0.0008 (0.0028) model time 0.3974 (0.4194) loss 8.1690 (6.8373) grad_norm 5.1638 (2.8081) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:33:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][200/625] eta 0:02:57 lr 0.000440 wd 0.0500 time 0.4017 (0.4169) data time 0.0008 (0.0027) model time 0.4009 (0.4181) loss 6.7224 (6.8327) grad_norm 2.1354 (2.8430) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:33:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][210/625] eta 0:02:52 lr 0.000439 wd 0.0500 time 0.4055 (0.4162) data time 0.0008 (0.0026) model time 0.4046 (0.4169) loss 6.3519 (6.8375) grad_norm 3.4875 (2.8805) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:33:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][220/625] eta 0:02:48 lr 0.000439 wd 0.0500 time 0.3981 (0.4155) data time 0.0009 (0.0025) model time 0.3972 (0.4160) loss 7.5861 (6.8515) grad_norm 3.9330 (2.9089) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:33:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][230/625] eta 0:02:44 lr 0.000439 wd 0.0500 time 0.4025 (0.4158) data time 0.0008 (0.0024) model time 0.4017 (0.4163) loss 5.9449 (6.8535) grad_norm 1.8038 (2.8875) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:34:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][240/625] eta 0:02:39 lr 0.000439 wd 0.0500 time 0.3965 (0.4152) data time 0.0006 (0.0024) model time 0.3960 (0.4155) loss 6.7159 (6.8580) grad_norm 2.1423 (2.8983) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:34:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][250/625] eta 0:02:35 lr 0.000439 wd 0.0500 time 0.3995 (0.4147) data time 0.0008 (0.0023) model time 0.3988 (0.4147) loss 7.7695 (6.8747) grad_norm 2.6200 (2.9055) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
05:34:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][260/625] eta 0:02:31 lr 0.000439 wd 0.0500 time 0.3976 (0.4141) data time 0.0007 (0.0023) model time 0.3969 (0.4140) loss 6.8480 (6.8733) grad_norm 2.5655 (2.9065) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:34:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][270/625] eta 0:02:26 lr 0.000439 wd 0.0500 time 0.4020 (0.4137) data time 0.0006 (0.0022) model time 0.4014 (0.4134) loss 6.4058 (6.8684) grad_norm 2.5529 (2.8859) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:34:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][280/625] eta 0:02:22 lr 0.000439 wd 0.0500 time 0.3997 (0.4132) data time 0.0007 (0.0022) model time 0.3991 (0.4128) loss 7.2392 (6.8809) grad_norm 1.9429 (2.8705) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:34:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][290/625] eta 0:02:18 lr 0.000439 wd 0.0500 time 0.4044 (0.4128) data time 0.0009 (0.0021) model time 0.4035 (0.4123) loss 6.5526 (6.8847) grad_norm 4.0954 (2.8742) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:34:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][300/625] eta 0:02:14 lr 0.000438 wd 0.0500 time 0.4043 (0.4127) data time 0.0010 (0.0021) model time 0.4034 (0.4122) loss 6.8857 (6.8854) grad_norm 3.7503 (2.8972) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:34:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][310/625] eta 0:02:09 lr 0.000438 wd 0.0500 time 0.3975 (0.4124) data time 0.0008 (0.0021) model time 0.3967 (0.4118) loss 6.6134 (6.8953) grad_norm 3.0724 (2.8893) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:34:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][320/625] eta 0:02:05 lr 0.000438 wd 0.0500 time 0.4026 (0.4121) data 
time 0.0008 (0.0020) model time 0.4018 (0.4114) loss 6.6393 (6.8985) grad_norm 2.7207 (2.8838) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:34:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][330/625] eta 0:02:01 lr 0.000438 wd 0.0500 time 0.4000 (0.4118) data time 0.0009 (0.0020) model time 0.3990 (0.4110) loss 8.5870 (6.9093) grad_norm 2.2688 (2.8755) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:34:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][340/625] eta 0:01:57 lr 0.000438 wd 0.0500 time 0.4002 (0.4115) data time 0.0007 (0.0020) model time 0.3995 (0.4107) loss 6.8980 (6.9068) grad_norm 2.3597 (2.8640) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:34:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][350/625] eta 0:01:53 lr 0.000438 wd 0.0500 time 0.4039 (0.4115) data time 0.0006 (0.0019) model time 0.4033 (0.4107) loss 5.5588 (6.9013) grad_norm 2.1632 (2.8622) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:34:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][360/625] eta 0:01:48 lr 0.000438 wd 0.0500 time 0.3996 (0.4112) data time 0.0006 (0.0019) model time 0.3990 (0.4103) loss 7.6251 (6.8936) grad_norm 1.7561 (2.8427) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:34:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][370/625] eta 0:01:44 lr 0.000438 wd 0.0500 time 0.5929 (0.4115) data time 0.0007 (0.0020) model time 0.5922 (0.4106) loss 6.8848 (6.9006) grad_norm 2.6996 (2.8255) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:34:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][380/625] eta 0:01:41 lr 0.000438 wd 0.0500 time 0.6003 (0.4129) data time 0.0009 (0.0019) model time 0.5994 (0.4122) loss 7.6599 (6.9006) grad_norm 2.4138 (2.8160) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
05:35:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][390/625] eta 0:01:37 lr 0.000438 wd 0.0500 time 0.5833 (0.4145) data time 0.0006 (0.0019) model time 0.5827 (0.4140) loss 6.3506 (6.9046) grad_norm 4.7420 (2.8233) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:35:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][400/625] eta 0:01:33 lr 0.000437 wd 0.0500 time 0.3964 (0.4164) data time 0.0008 (0.0019) model time 0.3956 (0.4162) loss 6.1946 (6.9035) grad_norm 1.6964 (2.8127) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:35:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][410/625] eta 0:01:29 lr 0.000437 wd 0.0500 time 0.3969 (0.4168) data time 0.0009 (0.0019) model time 0.3960 (0.4166) loss 6.9617 (6.9123) grad_norm 3.2098 (2.8045) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:35:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][420/625] eta 0:01:25 lr 0.000437 wd 0.0500 time 0.3953 (0.4164) data time 0.0007 (0.0018) model time 0.3947 (0.4161) loss 6.7579 (6.9089) grad_norm 2.1273 (2.8070) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:35:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][430/625] eta 0:01:21 lr 0.000437 wd 0.0500 time 0.3948 (0.4161) data time 0.0008 (0.0018) model time 0.3941 (0.4157) loss 7.4780 (6.9073) grad_norm 2.7184 (2.8106) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:35:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][440/625] eta 0:01:16 lr 0.000437 wd 0.0500 time 0.3980 (0.4156) data time 0.0006 (0.0018) model time 0.3975 (0.4153) loss 7.8586 (6.9037) grad_norm 2.8919 (2.8136) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:35:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][450/625] eta 0:01:12 lr 0.000437 wd 0.0500 time 0.3972 (0.4156) data 
time 0.0009 (0.0018) model time 0.3963 (0.4152) loss 7.2785 (6.9083) grad_norm 3.1946 (2.8176) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:35:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][460/625] eta 0:01:08 lr 0.000437 wd 0.0500 time 0.3995 (0.4153) data time 0.0007 (0.0018) model time 0.3987 (0.4148) loss 8.0160 (6.9090) grad_norm 2.2987 (2.8374) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:35:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][470/625] eta 0:01:04 lr 0.000437 wd 0.0500 time 0.3971 (0.4149) data time 0.0009 (0.0017) model time 0.3963 (0.4144) loss 6.7174 (6.9124) grad_norm 2.7219 (2.8425) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:35:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][480/625] eta 0:01:00 lr 0.000437 wd 0.0500 time 0.4009 (0.4146) data time 0.0007 (0.0017) model time 0.4003 (0.4141) loss 5.4805 (6.9164) grad_norm 2.5693 (2.8525) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:35:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][490/625] eta 0:00:55 lr 0.000437 wd 0.0500 time 0.3966 (0.4143) data time 0.0006 (0.0017) model time 0.3960 (0.4137) loss 6.9439 (6.9200) grad_norm 2.4144 (2.8642) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:35:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][500/625] eta 0:00:51 lr 0.000436 wd 0.0500 time 0.3989 (0.4140) data time 0.0006 (0.0017) model time 0.3983 (0.4134) loss 6.9265 (6.9241) grad_norm 5.2379 (2.8846) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:35:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][510/625] eta 0:00:47 lr 0.000436 wd 0.0500 time 0.4027 (0.4138) data time 0.0009 (0.0017) model time 0.4018 (0.4131) loss 8.1284 (6.9307) grad_norm 2.5865 (2.8802) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
05:35:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][520/625] eta 0:00:43 lr 0.000436 wd 0.0500 time 0.3958 (0.4135) data time 0.0007 (0.0017) model time 0.3952 (0.4128) loss 7.1626 (6.9356) grad_norm 3.0226 (2.8752) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:35:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][530/625] eta 0:00:39 lr 0.000436 wd 0.0500 time 0.3997 (0.4133) data time 0.0006 (0.0017) model time 0.3991 (0.4125) loss 8.2559 (6.9419) grad_norm 2.9422 (2.8799) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:36:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][540/625] eta 0:00:35 lr 0.000436 wd 0.0500 time 0.4124 (0.4131) data time 0.0008 (0.0016) model time 0.4116 (0.4123) loss 6.2414 (6.9386) grad_norm 2.3409 (2.8966) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:36:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][550/625] eta 0:00:30 lr 0.000436 wd 0.0500 time 0.3957 (0.4128) data time 0.0009 (0.0016) model time 0.3947 (0.4120) loss 8.1995 (6.9385) grad_norm 4.0762 (2.9073) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:36:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][560/625] eta 0:00:26 lr 0.000436 wd 0.0500 time 0.3969 (0.4126) data time 0.0006 (0.0016) model time 0.3963 (0.4118) loss 6.6810 (6.9374) grad_norm 4.3259 (2.9390) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:36:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][570/625] eta 0:00:22 lr 0.000436 wd 0.0500 time 0.4060 (0.4124) data time 0.0009 (0.0016) model time 0.4052 (0.4115) loss 7.5149 (6.9355) grad_norm 2.7729 (2.9728) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:36:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][580/625] eta 0:00:18 lr 0.000436 wd 0.0500 time 0.3969 (0.4122) data 
time 0.0007 (0.0016) model time 0.3963 (0.4113) loss 7.5778 (6.9353) grad_norm 2.5479 (2.9808) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:36:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][590/625] eta 0:00:14 lr 0.000436 wd 0.0500 time 0.3963 (0.4123) data time 0.0007 (0.0016) model time 0.3956 (0.4114) loss 7.3116 (6.9377) grad_norm 3.8998 (2.9723) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:36:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][600/625] eta 0:00:10 lr 0.000435 wd 0.0500 time 0.3964 (0.4136) data time 0.0008 (0.0016) model time 0.3955 (0.4128) loss 7.1543 (6.9385) grad_norm 2.4909 (2.9734) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:36:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][610/625] eta 0:00:06 lr 0.000435 wd 0.0500 time 0.5631 (0.4148) data time 0.0006 (0.0016) model time 0.5625 (0.4142) loss 7.5982 (6.9403) grad_norm 2.4358 (2.9660) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:36:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][620/625] eta 0:00:02 lr 0.000435 wd 0.0500 time 0.4085 (0.4154) data time 0.0004 (0.0016) model time 0.4081 (0.4149) loss 5.8862 (6.9379) grad_norm 2.8340 (2.9571) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:36:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 185 training takes 0:04:19 [2024-07-25 05:36:40 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 05:36:40 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 05:36:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.478 (0.478) Loss 0.5708 (0.5708) Acc@1 88.818 (88.818) Acc@5 98.535 (98.535) Mem 14939MB [2024-07-25 05:36:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.122) Loss 0.9028 (0.6959) Acc@1 80.176 (85.986) Acc@5 95.947 (97.585) Mem 14939MB [2024-07-25 05:36:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 1.0273 (0.8162) Acc@1 76.367 (82.803) Acc@5 94.482 (96.431) Mem 14939MB [2024-07-25 05:36:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.400 Acc@5 96.395 [2024-07-25 05:36:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-07-25 05:36:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.795 (0.795) Loss 0.5498 (0.5498) Acc@1 89.941 (89.941) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 05:36:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.156) Loss 0.8691 (0.6848) Acc@1 81.348 (86.386) Acc@5 96.289 (97.732) Mem 14939MB [2024-07-25 05:36:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 1.0010 (0.8018) Acc@1 76.074 (83.073) Acc@5 95.459 (96.598) Mem 14939MB [2024-07-25 05:36:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.674 Acc@5 96.559 [2024-07-25 05:36:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.7% [2024-07-25 05:36:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][0/625] eta 0:13:17 lr 0.000435 wd 0.0500 time 1.2768 (1.2768) data time 0.4789 (0.4789) model time 0.0000 (0.0000) loss 7.8364 (7.8364) grad_norm 2.7920 (2.7920) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:36:52 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][10/625] eta 0:05:13 lr 0.000435 wd 0.0500 time 0.3776 (0.5101) data time 0.0009 (0.0444) model time 0.0000 (0.0000) loss 7.8073 (7.4916) grad_norm 2.0358 (2.5078) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:36:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][20/625] eta 0:04:36 lr 0.000435 wd 0.0500 time 0.3967 (0.4571) data time 0.0008 (0.0237) model time 0.0000 (0.0000) loss 7.0563 (7.2295) grad_norm 3.2518 (2.5790) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:36:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][30/625] eta 0:04:20 lr 0.000435 wd 0.0500 time 0.4017 (0.4379) data time 0.0007 (0.0164) model time 0.0000 (0.0000) loss 7.6935 (7.1370) grad_norm 2.1824 (2.7275) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:37:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][40/625] eta 0:04:10 lr 0.000435 wd 0.0500 time 0.3949 (0.4279) data time 0.0009 (0.0127) model time 0.0000 (0.0000) loss 7.7172 (7.0424) grad_norm 1.9374 (3.0148) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:37:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][50/625] eta 0:04:02 lr 0.000435 wd 0.0500 time 0.3998 (0.4226) data time 0.0008 (0.0104) model time 0.0000 (0.0000) loss 6.7319 (6.9679) grad_norm 1.7828 (2.9270) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:37:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][60/625] eta 0:03:56 lr 0.000435 wd 0.0500 time 0.3998 (0.4189) data time 0.0008 (0.0089) model time 0.3990 (0.3991) loss 6.0997 (6.9997) grad_norm 3.5977 (3.0555) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:37:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][70/625] eta 0:03:50 lr 0.000434 wd 0.0500 time 0.3969 (0.4159) data time 0.0007 
(0.0077) model time 0.3963 (0.3980) loss 5.8321 (6.9407) grad_norm 2.1570 (2.9692) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:37:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][80/625] eta 0:03:45 lr 0.000434 wd 0.0500 time 0.3978 (0.4137) data time 0.0007 (0.0069) model time 0.3972 (0.3978) loss 5.5737 (6.9715) grad_norm 2.3486 (2.9571) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:37:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][90/625] eta 0:03:40 lr 0.000434 wd 0.0500 time 0.4028 (0.4122) data time 0.0006 (0.0062) model time 0.4022 (0.3980) loss 6.8501 (6.9319) grad_norm 1.7408 (2.8629) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:37:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][100/625] eta 0:03:35 lr 0.000434 wd 0.0500 time 0.3954 (0.4108) data time 0.0008 (0.0057) model time 0.3945 (0.3979) loss 6.6720 (6.8903) grad_norm 2.9560 (2.8450) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:37:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][110/625] eta 0:03:30 lr 0.000434 wd 0.0500 time 0.4007 (0.4097) data time 0.0008 (0.0053) model time 0.3999 (0.3978) loss 6.3988 (6.9046) grad_norm 3.2702 (2.8128) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:37:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][120/625] eta 0:03:26 lr 0.000434 wd 0.0500 time 0.3981 (0.4091) data time 0.0009 (0.0049) model time 0.3972 (0.3984) loss 6.9317 (6.9003) grad_norm 2.3710 (2.8533) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:37:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][130/625] eta 0:03:22 lr 0.000434 wd 0.0500 time 0.4001 (0.4084) data time 0.0006 (0.0046) model time 0.3994 (0.3985) loss 7.6389 (6.9052) grad_norm 1.6415 (2.8321) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:37:43 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][140/625] eta 0:03:17 lr 0.000434 wd 0.0500 time 0.3969 (0.4080) data time 0.0008 (0.0043) model time 0.3961 (0.3988) loss 7.1759 (6.9251) grad_norm 1.8623 (2.7996) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:37:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][150/625] eta 0:03:13 lr 0.000434 wd 0.0500 time 0.4063 (0.4075) data time 0.0007 (0.0041) model time 0.4056 (0.3990) loss 7.7484 (6.9327) grad_norm 3.5705 (2.8543) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:37:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][160/625] eta 0:03:09 lr 0.000434 wd 0.0500 time 0.4000 (0.4073) data time 0.0007 (0.0039) model time 0.3993 (0.3993) loss 5.6826 (6.9138) grad_norm 3.2056 (2.8955) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:37:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][170/625] eta 0:03:05 lr 0.000433 wd 0.0500 time 0.3984 (0.4068) data time 0.0007 (0.0037) model time 0.3978 (0.3993) loss 7.0131 (6.9069) grad_norm 3.5937 (2.9305) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:37:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][180/625] eta 0:03:00 lr 0.000433 wd 0.0500 time 0.4093 (0.4065) data time 0.0008 (0.0036) model time 0.4085 (0.3993) loss 8.3115 (6.9215) grad_norm 2.7287 (2.9383) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:38:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][190/625] eta 0:02:57 lr 0.000433 wd 0.0500 time 0.3948 (0.4090) data time 0.0007 (0.0034) model time 0.3942 (0.4032) loss 6.2187 (6.9137) grad_norm 3.5321 (2.9564) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:38:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][200/625] eta 0:02:55 lr 0.000433 wd 0.0500 time 0.3983 (0.4133) data time 
0.0007 (0.0033) model time 0.3976 (0.4093) loss 6.0442 (6.8958) grad_norm 3.7492 (2.9929) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:38:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][210/625] eta 0:02:53 lr 0.000433 wd 0.0500 time 0.4061 (0.4175) data time 0.0006 (0.0032) model time 0.4054 (0.4150) loss 5.6967 (6.9112) grad_norm 2.4230 (3.0259) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:38:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][220/625] eta 0:02:49 lr 0.000433 wd 0.0500 time 0.5110 (0.4192) data time 0.0008 (0.0031) model time 0.5102 (0.4173) loss 7.7911 (6.9035) grad_norm 1.9407 (3.0313) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:38:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][230/625] eta 0:02:45 lr 0.000433 wd 0.0500 time 0.5942 (0.4192) data time 0.0008 (0.0030) model time 0.5934 (0.4173) loss 6.1578 (6.9017) grad_norm 3.6931 (3.0253) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:38:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][240/625] eta 0:02:41 lr 0.000433 wd 0.0500 time 0.3997 (0.4184) data time 0.0008 (0.0029) model time 0.3989 (0.4163) loss 5.5096 (6.9038) grad_norm 2.0765 (3.0044) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:38:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][250/625] eta 0:02:36 lr 0.000433 wd 0.0500 time 0.4071 (0.4176) data time 0.0006 (0.0028) model time 0.4064 (0.4154) loss 6.3696 (6.9130) grad_norm 2.4227 (3.0265) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:38:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][260/625] eta 0:02:32 lr 0.000433 wd 0.0500 time 0.4009 (0.4169) data time 0.0008 (0.0028) model time 0.4001 (0.4146) loss 6.1790 (6.9161) grad_norm 2.3401 (3.0044) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:38:39 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][270/625] eta 0:02:27 lr 0.000432 wd 0.0500 time 0.3957 (0.4162) data time 0.0009 (0.0027) model time 0.3948 (0.4138) loss 6.8827 (6.9225) grad_norm 2.9043 (2.9896) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:38:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][280/625] eta 0:02:23 lr 0.000432 wd 0.0500 time 0.3973 (0.4157) data time 0.0007 (0.0026) model time 0.3966 (0.4133) loss 6.6831 (6.9234) grad_norm 2.9165 (2.9851) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:38:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][290/625] eta 0:02:19 lr 0.000432 wd 0.0500 time 0.4061 (0.4151) data time 0.0007 (0.0026) model time 0.4054 (0.4126) loss 7.2349 (6.9363) grad_norm 2.2429 (2.9876) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:38:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][300/625] eta 0:02:14 lr 0.000432 wd 0.0500 time 0.3959 (0.4145) data time 0.0008 (0.0025) model time 0.3951 (0.4120) loss 6.2677 (6.9357) grad_norm 2.2485 (2.9842) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:38:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][310/625] eta 0:02:10 lr 0.000432 wd 0.0500 time 0.3945 (0.4140) data time 0.0009 (0.0025) model time 0.3937 (0.4114) loss 6.2627 (6.9398) grad_norm 3.5415 (2.9990) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:38:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][320/625] eta 0:02:06 lr 0.000432 wd 0.0500 time 0.4000 (0.4135) data time 0.0007 (0.0024) model time 0.3993 (0.4109) loss 7.3704 (6.9508) grad_norm 3.3192 (3.0017) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:39:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][330/625] eta 0:02:01 lr 0.000432 wd 0.0500 time 0.3938 (0.4131) data time 
0.0011 (0.0024) model time 0.3927 (0.4105) loss 6.6962 (6.9556) grad_norm 4.3721 (3.0001) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:39:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][340/625] eta 0:01:57 lr 0.000432 wd 0.0500 time 0.3951 (0.4126) data time 0.0007 (0.0023) model time 0.3943 (0.4100) loss 7.3941 (6.9436) grad_norm 2.2802 (3.0017) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:39:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][350/625] eta 0:01:53 lr 0.000432 wd 0.0500 time 0.3970 (0.4122) data time 0.0009 (0.0023) model time 0.3961 (0.4096) loss 7.4732 (6.9480) grad_norm 2.4203 (3.0004) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:39:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][360/625] eta 0:01:49 lr 0.000431 wd 0.0500 time 0.4033 (0.4120) data time 0.0009 (0.0023) model time 0.4024 (0.4093) loss 8.0582 (6.9579) grad_norm 1.9273 (2.9884) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:39:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][370/625] eta 0:01:44 lr 0.000431 wd 0.0500 time 0.3972 (0.4117) data time 0.0008 (0.0023) model time 0.3964 (0.4090) loss 5.7174 (6.9509) grad_norm 1.7792 (3.0121) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:39:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][380/625] eta 0:01:40 lr 0.000431 wd 0.0500 time 0.3981 (0.4117) data time 0.0007 (0.0022) model time 0.3975 (0.4091) loss 6.8847 (6.9498) grad_norm 2.1360 (2.9906) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:39:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][390/625] eta 0:01:36 lr 0.000431 wd 0.0500 time 0.3987 (0.4114) data time 0.0007 (0.0022) model time 0.3981 (0.4087) loss 7.0358 (6.9485) grad_norm 2.1765 (2.9723) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:39:31 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][400/625] eta 0:01:32 lr 0.000431 wd 0.0500 time 0.5283 (0.4116) data time 0.0009 (0.0022) model time 0.5275 (0.4090) loss 7.4398 (6.9505) grad_norm 2.3040 (2.9599) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:39:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][410/625] eta 0:01:28 lr 0.000431 wd 0.0500 time 0.6033 (0.4125) data time 0.0010 (0.0021) model time 0.6023 (0.4101) loss 7.6752 (6.9638) grad_norm 1.9717 (2.9611) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:39:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][420/625] eta 0:01:24 lr 0.000431 wd 0.0500 time 0.4013 (0.4140) data time 0.0008 (0.0021) model time 0.4006 (0.4118) loss 7.0530 (6.9600) grad_norm 4.0545 (2.9710) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:39:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][430/625] eta 0:01:21 lr 0.000431 wd 0.0500 time 0.3961 (0.4159) data time 0.0009 (0.0021) model time 0.3952 (0.4140) loss 7.5836 (6.9569) grad_norm 5.9300 (3.0002) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:39:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][440/625] eta 0:01:17 lr 0.000431 wd 0.0500 time 0.5732 (0.4163) data time 0.0007 (0.0021) model time 0.5725 (0.4146) loss 6.3451 (6.9535) grad_norm 3.1632 (3.0059) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:39:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][450/625] eta 0:01:12 lr 0.000431 wd 0.0500 time 0.4004 (0.4164) data time 0.0006 (0.0020) model time 0.3997 (0.4146) loss 6.3542 (6.9473) grad_norm 1.8973 (2.9905) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:39:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][460/625] eta 0:01:08 lr 0.000430 wd 0.0500 time 0.4008 (0.4161) data time 
0.0007 (0.0020) model time 0.4001 (0.4143) loss 7.8062 (6.9550) grad_norm 1.6897 (2.9725) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:40:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][470/625] eta 0:01:04 lr 0.000430 wd 0.0500 time 0.3986 (0.4157) data time 0.0007 (0.0020) model time 0.3978 (0.4139) loss 5.8662 (6.9492) grad_norm 2.0246 (2.9636) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:40:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][480/625] eta 0:01:00 lr 0.000430 wd 0.0500 time 0.4012 (0.4154) data time 0.0008 (0.0020) model time 0.4004 (0.4136) loss 7.5772 (6.9438) grad_norm 2.0895 (2.9497) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:40:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][490/625] eta 0:00:56 lr 0.000430 wd 0.0500 time 0.3979 (0.4151) data time 0.0009 (0.0019) model time 0.3971 (0.4133) loss 7.4330 (6.9436) grad_norm 2.2346 (2.9331) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:40:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][500/625] eta 0:00:51 lr 0.000430 wd 0.0500 time 0.3982 (0.4148) data time 0.0007 (0.0019) model time 0.3975 (0.4129) loss 6.8287 (6.9499) grad_norm 1.7125 (2.9177) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:40:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][510/625] eta 0:00:47 lr 0.000430 wd 0.0500 time 0.3991 (0.4145) data time 0.0006 (0.0019) model time 0.3985 (0.4127) loss 8.0263 (6.9477) grad_norm 2.4718 (2.9019) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:40:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][520/625] eta 0:00:43 lr 0.000430 wd 0.0500 time 0.3998 (0.4143) data time 0.0006 (0.0019) model time 0.3992 (0.4125) loss 7.7659 (6.9480) grad_norm 2.1591 (2.8976) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:40:26 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][530/625] eta 0:00:39 lr 0.000430 wd 0.0500 time 0.4007 (0.4140) data time 0.0006 (0.0019) model time 0.4001 (0.4122) loss 7.4585 (6.9477) grad_norm 4.1450 (2.9020) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:40:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][540/625] eta 0:00:35 lr 0.000430 wd 0.0500 time 0.3999 (0.4138) data time 0.0008 (0.0018) model time 0.3991 (0.4119) loss 7.4155 (6.9432) grad_norm 3.9447 (2.9055) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:40:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][550/625] eta 0:00:31 lr 0.000430 wd 0.0500 time 0.3966 (0.4135) data time 0.0007 (0.0018) model time 0.3959 (0.4117) loss 7.8575 (6.9455) grad_norm 4.0933 (2.9107) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:40:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][560/625] eta 0:00:26 lr 0.000429 wd 0.0500 time 0.3946 (0.4132) data time 0.0009 (0.0018) model time 0.3937 (0.4114) loss 7.1050 (6.9384) grad_norm 2.1732 (2.9091) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:40:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][570/625] eta 0:00:22 lr 0.000429 wd 0.0500 time 0.3964 (0.4130) data time 0.0006 (0.0018) model time 0.3958 (0.4111) loss 7.0204 (6.9485) grad_norm 1.7878 (2.9025) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:40:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][580/625] eta 0:00:18 lr 0.000429 wd 0.0500 time 0.4044 (0.4128) data time 0.0006 (0.0018) model time 0.4038 (0.4109) loss 6.5770 (6.9399) grad_norm 3.2784 (2.8980) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:40:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][590/625] eta 0:00:14 lr 0.000429 wd 0.0500 time 0.3942 (0.4125) data time 
0.0007 (0.0018) model time 0.3935 (0.4106) loss 7.6629 (6.9394) grad_norm 4.8801 (2.9168) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:40:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][600/625] eta 0:00:10 lr 0.000429 wd 0.0500 time 0.4035 (0.4123) data time 0.0009 (0.0018) model time 0.4026 (0.4104) loss 6.0891 (6.9428) grad_norm 1.8528 (2.9143) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:40:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][610/625] eta 0:00:06 lr 0.000429 wd 0.0500 time 0.3960 (0.4121) data time 0.0006 (0.0017) model time 0.3953 (0.4101) loss 6.8604 (6.9376) grad_norm 2.2780 (2.9130) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:41:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][620/625] eta 0:00:02 lr 0.000429 wd 0.0500 time 0.5643 (0.4121) data time 0.0007 (0.0017) model time 0.5636 (0.4102) loss 6.5495 (6.9429) grad_norm 3.8735 (2.9102) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:41:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 186 training takes 0:04:17 [2024-07-25 05:41:03 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 05:41:04 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 05:41:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.458 (0.458) Loss 0.5908 (0.5908) Acc@1 88.477 (88.477) Acc@5 98.486 (98.486) Mem 14939MB [2024-07-25 05:41:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8833 (0.7117) Acc@1 80.908 (85.951) Acc@5 96.191 (97.559) Mem 14939MB [2024-07-25 05:41:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0137 (0.8345) Acc@1 77.051 (82.740) Acc@5 95.068 (96.373) Mem 14939MB [2024-07-25 05:41:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.386 Acc@5 96.349 [2024-07-25 05:41:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-07-25 05:41:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.777 (0.777) Loss 0.5498 (0.5498) Acc@1 89.990 (89.990) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 05:41:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.155) Loss 0.8682 (0.6845) Acc@1 81.445 (86.408) Acc@5 96.240 (97.727) Mem 14939MB [2024-07-25 05:41:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 1.0010 (0.8012) Acc@1 76.172 (83.133) Acc@5 95.410 (96.601) Mem 14939MB [2024-07-25 05:41:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.734 Acc@5 96.559 [2024-07-25 05:41:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.7% [2024-07-25 05:41:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.73% [2024-07-25 05:41:10 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 05:41:11 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 05:41:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][0/625] eta 0:07:44 lr 0.000429 wd 0.0500 time 0.7427 (0.7427) data time 0.3542 (0.3542) model time 0.0000 (0.0000) loss 6.6344 (6.6344) grad_norm 2.2380 (2.2380) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:41:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][10/625] eta 0:04:54 lr 0.000429 wd 0.0500 time 0.4004 (0.4781) data time 0.0007 (0.0330) model time 0.0000 (0.0000) loss 7.8475 (7.0815) grad_norm 1.8663 (3.0265) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:41:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][20/625] eta 0:04:52 lr 0.000429 wd 0.0500 time 0.5988 (0.4842) data time 0.0006 (0.0177) model time 0.0000 (0.0000) loss 8.1295 (7.0943) grad_norm 2.3168 (3.6955) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:41:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][30/625] eta 0:04:44 lr 0.000428 wd 0.0500 time 0.3998 (0.4786) data time 0.0009 (0.0123) model time 0.0000 (0.0000) loss 6.6385 (7.1185) grad_norm 2.0354 (3.4315) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:41:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][40/625] eta 0:04:35 lr 0.000428 wd 0.0500 time 0.3977 (0.4706) data time 0.0009 (0.0095) model time 0.0000 (0.0000) loss 6.5673 (6.9975) grad_norm 2.3055 (3.1006) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:41:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][50/625] eta 0:04:22 lr 0.000428 wd 0.0500 time 0.4003 (0.4565) data time 0.0006 (0.0078) model time 0.0000 (0.0000) loss 6.8866 (6.9728) grad_norm 2.3846 (3.1520) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 05:41:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][60/625] eta 0:04:12 lr 0.000428 wd 0.0500 time 0.3985 (0.4477) data time 0.0006 (0.0067) model time 0.3979 (0.4018) loss 5.7406 (6.9635) grad_norm 2.1029 (3.2847) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:41:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][70/625] eta 0:04:04 lr 0.000428 wd 0.0500 time 0.3978 (0.4410) data time 0.0008 (0.0058) model time 0.3970 (0.4007) loss 7.5522 (6.9829) grad_norm 2.6116 (3.2179) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:41:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][80/625] eta 0:03:57 lr 0.000428 wd 0.0500 time 0.4000 (0.4360) data time 0.0007 (0.0052) model time 0.3994 (0.4001) loss 6.4147 (6.9953) grad_norm 1.8741 (3.1127) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:41:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][90/625] eta 0:03:51 lr 0.000428 wd 0.0500 time 0.4007 (0.4320) data time 0.0006 (0.0048) model time 0.4001 (0.3999) loss 5.8137 (6.9721) grad_norm 2.0539 (3.0057) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:41:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][100/625] eta 0:03:45 lr 0.000428 wd 0.0500 time 0.4032 (0.4291) data time 0.0006 (0.0044) model time 0.4026 (0.4002) loss 5.9858 (6.9539) grad_norm 6.0052 (2.9686) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:41:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][110/625] eta 0:03:39 lr 0.000428 wd 0.0500 time 0.3958 (0.4263) data time 0.0007 (0.0041) model time 0.3952 (0.3997) loss 7.2508 (6.9576) grad_norm 2.2832 (2.9254) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:42:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][120/625] eta 0:03:34 lr 0.000428 wd 0.0500 time 
0.3976 (0.4242) data time 0.0009 (0.0038) model time 0.3967 (0.3997) loss 7.7285 (6.9689) grad_norm 4.6443 (2.9793) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:42:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][130/625] eta 0:03:29 lr 0.000427 wd 0.0500 time 0.3989 (0.4223) data time 0.0007 (0.0036) model time 0.3982 (0.3996) loss 6.9308 (6.9620) grad_norm 3.0623 (3.0073) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:42:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][140/625] eta 0:03:24 lr 0.000427 wd 0.0500 time 0.3974 (0.4207) data time 0.0008 (0.0034) model time 0.3966 (0.3995) loss 6.3950 (6.9571) grad_norm 3.5635 (2.9988) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:42:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][150/625] eta 0:03:19 lr 0.000427 wd 0.0500 time 0.3982 (0.4192) data time 0.0006 (0.0032) model time 0.3976 (0.3992) loss 7.0913 (6.9623) grad_norm 8.7047 (3.0105) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:42:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][160/625] eta 0:03:14 lr 0.000427 wd 0.0500 time 0.4020 (0.4179) data time 0.0008 (0.0031) model time 0.4012 (0.3992) loss 7.3159 (6.9861) grad_norm 3.0075 (2.9787) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:42:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][170/625] eta 0:03:09 lr 0.000427 wd 0.0500 time 0.3987 (0.4168) data time 0.0007 (0.0030) model time 0.3980 (0.3990) loss 6.5772 (6.9732) grad_norm 2.4811 (2.9573) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:42:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][180/625] eta 0:03:05 lr 0.000427 wd 0.0500 time 0.4014 (0.4159) data time 0.0009 (0.0028) model time 0.4005 (0.3990) loss 8.1103 (6.9553) grad_norm 3.5026 (2.9835) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 05:42:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][190/625] eta 0:03:00 lr 0.000427 wd 0.0500 time 0.3973 (0.4150) data time 0.0007 (0.0027) model time 0.3966 (0.3990) loss 7.2540 (6.9617) grad_norm 2.1688 (2.9886) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:42:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][200/625] eta 0:02:56 lr 0.000427 wd 0.0500 time 0.4024 (0.4142) data time 0.0009 (0.0026) model time 0.4015 (0.3989) loss 7.0256 (6.9578) grad_norm 3.1128 (2.9954) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:42:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][210/625] eta 0:02:51 lr 0.000427 wd 0.0500 time 0.4011 (0.4135) data time 0.0007 (0.0026) model time 0.4004 (0.3990) loss 6.1919 (6.9810) grad_norm 4.1728 (2.9869) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:42:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][220/625] eta 0:02:47 lr 0.000427 wd 0.0500 time 0.4018 (0.4145) data time 0.0008 (0.0025) model time 0.4010 (0.4009) loss 5.6929 (6.9672) grad_norm 4.4710 (2.9820) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:42:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][230/625] eta 0:02:44 lr 0.000426 wd 0.0500 time 0.3971 (0.4174) data time 0.0007 (0.0024) model time 0.3964 (0.4054) loss 7.6605 (6.9881) grad_norm 5.4441 (inf) loss_scale 256.0000 (508.6753) mem 14939MB [2024-07-25 05:42:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][240/625] eta 0:02:41 lr 0.000426 wd 0.0500 time 0.3956 (0.4193) data time 0.0008 (0.0024) model time 0.3948 (0.4084) loss 7.9336 (6.9942) grad_norm 3.8137 (inf) loss_scale 256.0000 (498.1909) mem 14939MB [2024-07-25 05:42:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][250/625] eta 0:02:38 lr 0.000426 wd 0.0500 time 0.3974 
(0.4214) data time 0.0010 (0.0023) model time 0.3964 (0.4115) loss 6.3616 (6.9866) grad_norm 2.6969 (inf) loss_scale 256.0000 (488.5418) mem 14939MB [2024-07-25 05:43:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][260/625] eta 0:02:34 lr 0.000426 wd 0.0500 time 0.3988 (0.4225) data time 0.0009 (0.0023) model time 0.3979 (0.4133) loss 6.9502 (6.9900) grad_norm 4.0934 (inf) loss_scale 256.0000 (479.6322) mem 14939MB [2024-07-25 05:43:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][270/625] eta 0:02:29 lr 0.000426 wd 0.0500 time 0.3952 (0.4216) data time 0.0010 (0.0022) model time 0.3942 (0.4126) loss 6.3852 (6.9881) grad_norm 3.2121 (inf) loss_scale 256.0000 (471.3801) mem 14939MB [2024-07-25 05:43:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][280/625] eta 0:02:25 lr 0.000426 wd 0.0500 time 0.3984 (0.4208) data time 0.0009 (0.0022) model time 0.3974 (0.4120) loss 7.4753 (6.9897) grad_norm 4.9625 (inf) loss_scale 256.0000 (463.7153) mem 14939MB [2024-07-25 05:43:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][290/625] eta 0:02:20 lr 0.000426 wd 0.0500 time 0.4182 (0.4202) data time 0.0006 (0.0022) model time 0.4176 (0.4115) loss 7.4821 (6.9939) grad_norm 2.4687 (inf) loss_scale 256.0000 (456.5773) mem 14939MB [2024-07-25 05:43:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][300/625] eta 0:02:16 lr 0.000426 wd 0.0500 time 0.3936 (0.4195) data time 0.0009 (0.0021) model time 0.3926 (0.4110) loss 6.2901 (6.9957) grad_norm 2.9563 (inf) loss_scale 256.0000 (449.9136) mem 14939MB [2024-07-25 05:43:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][310/625] eta 0:02:11 lr 0.000426 wd 0.0500 time 0.3991 (0.4189) data time 0.0007 (0.0021) model time 0.3985 (0.4106) loss 8.4061 (7.0030) grad_norm 2.9894 (inf) loss_scale 256.0000 (443.6785) mem 14939MB [2024-07-25 05:43:25 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][320/625] eta 0:02:07 lr 0.000426 wd 0.0500 time 0.4006 (0.4183) data time 0.0008 (0.0021) model time 0.3997 (0.4102) loss 6.6864 (7.0025) grad_norm 4.2068 (inf) loss_scale 256.0000 (437.8318) mem 14939MB [2024-07-25 05:43:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][330/625] eta 0:02:03 lr 0.000425 wd 0.0500 time 0.3960 (0.4178) data time 0.0007 (0.0020) model time 0.3954 (0.4097) loss 6.0879 (6.9973) grad_norm 3.3742 (inf) loss_scale 256.0000 (432.3384) mem 14939MB [2024-07-25 05:43:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][340/625] eta 0:01:58 lr 0.000425 wd 0.0500 time 0.3960 (0.4172) data time 0.0008 (0.0020) model time 0.3952 (0.4094) loss 7.1500 (6.9975) grad_norm 1.8306 (inf) loss_scale 256.0000 (427.1672) mem 14939MB [2024-07-25 05:43:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][350/625] eta 0:01:54 lr 0.000425 wd 0.0500 time 0.4003 (0.4168) data time 0.0007 (0.0020) model time 0.3997 (0.4091) loss 6.2742 (6.9952) grad_norm 4.5502 (inf) loss_scale 256.0000 (422.2906) mem 14939MB [2024-07-25 05:43:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][360/625] eta 0:01:50 lr 0.000425 wd 0.0500 time 0.3965 (0.4163) data time 0.0007 (0.0019) model time 0.3959 (0.4087) loss 7.4736 (6.9935) grad_norm 2.6224 (inf) loss_scale 256.0000 (417.6842) mem 14939MB [2024-07-25 05:43:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][370/625] eta 0:01:46 lr 0.000425 wd 0.0500 time 0.3970 (0.4159) data time 0.0009 (0.0019) model time 0.3961 (0.4084) loss 6.4605 (6.9792) grad_norm 4.6760 (inf) loss_scale 256.0000 (413.3261) mem 14939MB [2024-07-25 05:43:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][380/625] eta 0:01:41 lr 0.000425 wd 0.0500 time 0.4698 (0.4156) data time 0.0007 (0.0019) model 
time 0.4691 (0.4083) loss 6.1069 (6.9804) grad_norm 2.7420 (inf) loss_scale 256.0000 (409.1969) mem 14939MB [2024-07-25 05:43:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][390/625] eta 0:01:37 lr 0.000425 wd 0.0500 time 0.3954 (0.4152) data time 0.0009 (0.0019) model time 0.3944 (0.4080) loss 7.8996 (6.9802) grad_norm 2.7234 (inf) loss_scale 256.0000 (405.2788) mem 14939MB [2024-07-25 05:43:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][400/625] eta 0:01:33 lr 0.000425 wd 0.0500 time 0.4013 (0.4149) data time 0.0008 (0.0018) model time 0.4005 (0.4079) loss 6.3566 (6.9736) grad_norm 3.7582 (inf) loss_scale 256.0000 (401.5561) mem 14939MB [2024-07-25 05:44:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][410/625] eta 0:01:29 lr 0.000425 wd 0.0500 time 0.4029 (0.4145) data time 0.0009 (0.0018) model time 0.4021 (0.4076) loss 8.5556 (6.9890) grad_norm 2.6292 (inf) loss_scale 256.0000 (398.0146) mem 14939MB [2024-07-25 05:44:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][420/625] eta 0:01:24 lr 0.000425 wd 0.0500 time 0.3958 (0.4142) data time 0.0009 (0.0018) model time 0.3949 (0.4074) loss 7.6518 (6.9941) grad_norm 3.5802 (inf) loss_scale 256.0000 (394.6413) mem 14939MB [2024-07-25 05:44:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][430/625] eta 0:01:20 lr 0.000424 wd 0.0500 time 0.4028 (0.4138) data time 0.0007 (0.0018) model time 0.4021 (0.4072) loss 7.2238 (7.0033) grad_norm 2.9350 (inf) loss_scale 256.0000 (391.4246) mem 14939MB [2024-07-25 05:44:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][440/625] eta 0:01:16 lr 0.000424 wd 0.0500 time 0.4035 (0.4143) data time 0.0006 (0.0018) model time 0.4029 (0.4078) loss 7.1690 (6.9987) grad_norm 1.8753 (inf) loss_scale 256.0000 (388.3537) mem 14939MB [2024-07-25 05:44:18 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [187/300][450/625] eta 0:01:12 lr 0.000424 wd 0.0500 time 0.5798 (0.4156) data time 0.0006 (0.0017) model time 0.5792 (0.4094) loss 7.6636 (7.0053) grad_norm 3.0782 (inf) loss_scale 256.0000 (385.4191) mem 14939MB [2024-07-25 05:44:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][460/625] eta 0:01:08 lr 0.000424 wd 0.0500 time 0.5874 (0.4172) data time 0.0009 (0.0017) model time 0.5865 (0.4113) loss 5.5770 (7.0060) grad_norm 4.6539 (inf) loss_scale 256.0000 (382.6117) mem 14939MB [2024-07-25 05:44:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][470/625] eta 0:01:04 lr 0.000424 wd 0.0500 time 0.6046 (0.4192) data time 0.0009 (0.0017) model time 0.6037 (0.4137) loss 7.0011 (7.0043) grad_norm 1.8007 (inf) loss_scale 256.0000 (379.9236) mem 14939MB [2024-07-25 05:44:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][480/625] eta 0:01:00 lr 0.000424 wd 0.0500 time 0.3977 (0.4194) data time 0.0006 (0.0017) model time 0.3971 (0.4140) loss 6.4073 (6.9950) grad_norm 3.8484 (inf) loss_scale 256.0000 (377.3472) mem 14939MB [2024-07-25 05:44:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][490/625] eta 0:00:56 lr 0.000424 wd 0.0500 time 0.3974 (0.4190) data time 0.0006 (0.0017) model time 0.3968 (0.4137) loss 5.3346 (6.9840) grad_norm 3.5072 (inf) loss_scale 256.0000 (374.8758) mem 14939MB [2024-07-25 05:44:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][500/625] eta 0:00:52 lr 0.000424 wd 0.0500 time 0.3971 (0.4186) data time 0.0007 (0.0017) model time 0.3964 (0.4133) loss 7.3742 (6.9850) grad_norm 2.6452 (inf) loss_scale 256.0000 (372.5030) mem 14939MB [2024-07-25 05:44:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][510/625] eta 0:00:48 lr 0.000424 wd 0.0500 time 0.3980 (0.4182) data time 0.0009 (0.0016) model time 0.3971 (0.4130) loss 
6.9716 (6.9871) grad_norm 4.1625 (inf) loss_scale 256.0000 (370.2231) mem 14939MB [2024-07-25 05:44:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][520/625] eta 0:00:43 lr 0.000424 wd 0.0500 time 0.3963 (0.4179) data time 0.0006 (0.0016) model time 0.3957 (0.4127) loss 7.2306 (6.9851) grad_norm 2.5736 (inf) loss_scale 256.0000 (368.0307) mem 14939MB [2024-07-25 05:44:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][530/625] eta 0:00:39 lr 0.000423 wd 0.0500 time 0.3962 (0.4175) data time 0.0008 (0.0016) model time 0.3954 (0.4124) loss 7.9658 (6.9834) grad_norm 2.5292 (inf) loss_scale 256.0000 (365.9209) mem 14939MB [2024-07-25 05:44:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][540/625] eta 0:00:35 lr 0.000423 wd 0.0500 time 0.4044 (0.4172) data time 0.0007 (0.0016) model time 0.4037 (0.4122) loss 7.0692 (6.9826) grad_norm 3.4942 (inf) loss_scale 256.0000 (363.8891) mem 14939MB [2024-07-25 05:45:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][550/625] eta 0:00:31 lr 0.000423 wd 0.0500 time 0.3991 (0.4169) data time 0.0009 (0.0016) model time 0.3982 (0.4119) loss 7.7157 (6.9858) grad_norm 4.6891 (inf) loss_scale 256.0000 (361.9310) mem 14939MB [2024-07-25 05:45:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][560/625] eta 0:00:27 lr 0.000423 wd 0.0500 time 0.3989 (0.4166) data time 0.0009 (0.0016) model time 0.3980 (0.4116) loss 7.6901 (6.9862) grad_norm 41.5506 (inf) loss_scale 256.0000 (360.0428) mem 14939MB [2024-07-25 05:45:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][570/625] eta 0:00:22 lr 0.000423 wd 0.0500 time 0.3984 (0.4163) data time 0.0008 (0.0016) model time 0.3976 (0.4113) loss 7.3528 (6.9875) grad_norm 5.0318 (inf) loss_scale 256.0000 (358.2207) mem 14939MB [2024-07-25 05:45:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[187/300][580/625] eta 0:00:18 lr 0.000423 wd 0.0500 time 0.3985 (0.4160) data time 0.0006 (0.0016) model time 0.3978 (0.4111) loss 7.4652 (6.9839) grad_norm 4.7029 (inf) loss_scale 256.0000 (356.4613) mem 14939MB [2024-07-25 05:45:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][590/625] eta 0:00:14 lr 0.000423 wd 0.0500 time 0.4042 (0.4157) data time 0.0007 (0.0015) model time 0.4035 (0.4109) loss 7.2471 (6.9816) grad_norm 3.0961 (inf) loss_scale 256.0000 (354.7614) mem 14939MB [2024-07-25 05:45:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][600/625] eta 0:00:10 lr 0.000423 wd 0.0500 time 0.4031 (0.4154) data time 0.0007 (0.0015) model time 0.4024 (0.4106) loss 7.5908 (6.9859) grad_norm 3.4569 (inf) loss_scale 256.0000 (353.1181) mem 14939MB [2024-07-25 05:45:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][610/625] eta 0:00:06 lr 0.000423 wd 0.0500 time 0.3966 (0.4152) data time 0.0006 (0.0015) model time 0.3960 (0.4104) loss 7.4826 (6.9826) grad_norm 2.7506 (inf) loss_scale 256.0000 (351.5286) mem 14939MB [2024-07-25 05:45:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][620/625] eta 0:00:02 lr 0.000422 wd 0.0500 time 0.3984 (0.4149) data time 0.0006 (0.0015) model time 0.3978 (0.4102) loss 6.5576 (6.9856) grad_norm 3.7069 (inf) loss_scale 256.0000 (349.9903) mem 14939MB [2024-07-25 05:45:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 187 training takes 0:04:19 [2024-07-25 05:45:30 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 05:45:31 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 05:45:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.475 (0.475) Loss 0.5742 (0.5742) Acc@1 89.551 (89.551) Acc@5 98.682 (98.682) Mem 14939MB [2024-07-25 05:45:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.122) Loss 0.9009 (0.7176) Acc@1 79.932 (85.698) Acc@5 95.508 (97.479) Mem 14939MB [2024-07-25 05:45:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0557 (0.8409) Acc@1 75.635 (82.496) Acc@5 94.287 (96.201) Mem 14939MB [2024-07-25 05:45:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.102 Acc@5 96.193 [2024-07-25 05:45:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.1% [2024-07-25 05:45:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.775 (0.775) Loss 0.5493 (0.5493) Acc@1 90.088 (90.088) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 05:45:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.154) Loss 0.8672 (0.6840) Acc@1 81.641 (86.439) Acc@5 96.240 (97.723) Mem 14939MB [2024-07-25 05:45:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 1.0000 (0.8005) Acc@1 76.172 (83.173) Acc@5 95.410 (96.601) Mem 14939MB [2024-07-25 05:45:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.766 Acc@5 96.559 [2024-07-25 05:45:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-07-25 05:45:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.77% [2024-07-25 05:45:36 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 05:45:37 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 05:45:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][0/625] eta 0:07:34 lr 0.000422 wd 0.0500 time 0.7274 (0.7274) data time 0.3445 (0.3445) model time 0.0000 (0.0000) loss 8.5218 (8.5218) grad_norm 16.9616 (16.9616) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:45:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][10/625] eta 0:04:23 lr 0.000422 wd 0.0500 time 0.3969 (0.4291) data time 0.0008 (0.0321) model time 0.0000 (0.0000) loss 7.0738 (7.1371) grad_norm 6.7639 (6.1286) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:45:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][20/625] eta 0:04:10 lr 0.000422 wd 0.0500 time 0.3956 (0.4147) data time 0.0008 (0.0173) model time 0.0000 (0.0000) loss 5.9703 (7.1644) grad_norm 2.5883 (5.1794) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:45:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][30/625] eta 0:04:07 lr 0.000422 wd 0.0500 time 0.3951 (0.4152) data time 0.0007 (0.0120) model time 0.0000 (0.0000) loss 6.9908 (7.1754) grad_norm 2.7712 (5.4914) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:45:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][40/625] eta 0:04:07 lr 0.000422 wd 0.0500 time 0.5973 (0.4239) data time 0.0006 (0.0093) model time 0.0000 (0.0000) loss 8.0849 (7.1260) grad_norm 3.1801 (5.0376) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:45:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][50/625] eta 0:04:05 lr 0.000422 wd 0.0500 time 0.6168 (0.4263) data time 0.0008 (0.0076) model time 0.0000 (0.0000) loss 6.5111 (7.1294) grad_norm 3.0528 (4.6677) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 05:46:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][60/625] eta 0:04:05 lr 0.000422 wd 0.0500 time 0.5246 (0.4346) data time 0.0007 (0.0065) model time 0.5239 (0.4759) loss 6.6770 (7.0716) grad_norm 3.5722 (4.5125) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:46:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][70/625] eta 0:04:06 lr 0.000422 wd 0.0500 time 0.5581 (0.4442) data time 0.0008 (0.0057) model time 0.5573 (0.4890) loss 7.1049 (7.0622) grad_norm 9.3615 (4.4099) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:46:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][80/625] eta 0:04:00 lr 0.000422 wd 0.0500 time 0.3935 (0.4414) data time 0.0007 (0.0051) model time 0.3927 (0.4661) loss 6.4049 (7.0593) grad_norm 3.8451 (4.5118) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:46:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][90/625] eta 0:03:53 lr 0.000422 wd 0.0500 time 0.3961 (0.4367) data time 0.0008 (0.0047) model time 0.3952 (0.4490) loss 5.5541 (7.0684) grad_norm 5.9833 (4.5094) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:46:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][100/625] eta 0:03:47 lr 0.000421 wd 0.0500 time 0.3923 (0.4331) data time 0.0009 (0.0043) model time 0.3915 (0.4392) loss 8.2684 (7.0833) grad_norm 2.2188 (4.3600) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:46:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][110/625] eta 0:03:41 lr 0.000421 wd 0.0500 time 0.4060 (0.4303) data time 0.0006 (0.0040) model time 0.4053 (0.4328) loss 6.2242 (7.0555) grad_norm 2.2070 (4.2853) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:46:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][120/625] eta 0:03:36 lr 0.000421 wd 0.0500 time 
0.4111 (0.4278) data time 0.0006 (0.0038) model time 0.4105 (0.4280) loss 7.4534 (7.0421) grad_norm 1.7823 (4.1587) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:46:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][130/625] eta 0:03:30 lr 0.000421 wd 0.0500 time 0.4060 (0.4258) data time 0.0006 (0.0035) model time 0.4053 (0.4245) loss 7.7191 (7.0601) grad_norm 1.9612 (4.0427) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:46:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][140/625] eta 0:03:25 lr 0.000421 wd 0.0500 time 0.3990 (0.4239) data time 0.0009 (0.0034) model time 0.3981 (0.4216) loss 6.6547 (7.0533) grad_norm 7.2370 (4.0139) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:46:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][150/625] eta 0:03:20 lr 0.000421 wd 0.0500 time 0.4105 (0.4223) data time 0.0009 (0.0032) model time 0.4096 (0.4194) loss 8.2371 (7.0725) grad_norm 2.0969 (3.9436) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:46:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][160/625] eta 0:03:15 lr 0.000421 wd 0.0500 time 0.3976 (0.4208) data time 0.0007 (0.0031) model time 0.3969 (0.4174) loss 5.5513 (7.0663) grad_norm 2.9676 (3.8980) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:46:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][170/625] eta 0:03:10 lr 0.000421 wd 0.0500 time 0.3951 (0.4196) data time 0.0009 (0.0029) model time 0.3942 (0.4159) loss 8.3669 (7.0720) grad_norm 2.9393 (3.8672) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:46:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][180/625] eta 0:03:06 lr 0.000421 wd 0.0500 time 0.4009 (0.4185) data time 0.0008 (0.0028) model time 0.4001 (0.4145) loss 7.7152 (7.0681) grad_norm 2.6073 (3.8039) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 05:46:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][190/625] eta 0:03:01 lr 0.000421 wd 0.0500 time 0.3969 (0.4176) data time 0.0008 (0.0027) model time 0.3960 (0.4135) loss 6.6389 (7.0593) grad_norm 2.4367 (3.7211) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:47:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][200/625] eta 0:02:57 lr 0.000420 wd 0.0500 time 0.3963 (0.4176) data time 0.0007 (0.0026) model time 0.3956 (0.4138) loss 6.5017 (7.0447) grad_norm 3.8422 (3.6608) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:47:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][210/625] eta 0:02:53 lr 0.000420 wd 0.0500 time 0.3956 (0.4169) data time 0.0009 (0.0026) model time 0.3948 (0.4129) loss 8.5491 (7.0585) grad_norm 4.3333 (3.6016) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:47:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][220/625] eta 0:02:48 lr 0.000420 wd 0.0500 time 0.3951 (0.4161) data time 0.0006 (0.0025) model time 0.3945 (0.4120) loss 6.6729 (7.0691) grad_norm 2.1966 (3.5528) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:47:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][230/625] eta 0:02:44 lr 0.000420 wd 0.0500 time 0.3967 (0.4153) data time 0.0008 (0.0025) model time 0.3958 (0.4112) loss 7.2904 (7.0576) grad_norm 3.2417 (3.5180) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:47:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][240/625] eta 0:02:39 lr 0.000420 wd 0.0500 time 0.3972 (0.4146) data time 0.0008 (0.0024) model time 0.3964 (0.4105) loss 7.3304 (7.0443) grad_norm 2.1153 (3.4765) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:47:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][250/625] eta 0:02:35 lr 0.000420 wd 0.0500 time 
0.3919 (0.4146) data time 0.0010 (0.0023) model time 0.3909 (0.4107) loss 7.4732 (7.0572) grad_norm 2.4403 (3.4615) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:47:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][260/625] eta 0:02:31 lr 0.000420 wd 0.0500 time 0.4225 (0.4153) data time 0.0008 (0.0023) model time 0.4216 (0.4117) loss 7.6477 (7.0413) grad_norm 3.7640 (3.4234) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:47:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][270/625] eta 0:02:28 lr 0.000420 wd 0.0500 time 0.6278 (0.4177) data time 0.0007 (0.0022) model time 0.6271 (0.4148) loss 7.2186 (7.0371) grad_norm 3.3197 (3.4055) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:47:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][280/625] eta 0:02:24 lr 0.000420 wd 0.0500 time 0.4001 (0.4192) data time 0.0006 (0.0022) model time 0.3995 (0.4167) loss 7.8382 (7.0466) grad_norm 2.0690 (3.3682) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:47:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][290/625] eta 0:02:21 lr 0.000420 wd 0.0500 time 0.5575 (0.4216) data time 0.0006 (0.0021) model time 0.5569 (0.4196) loss 6.8653 (7.0305) grad_norm 2.7071 (3.3376) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:47:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][300/625] eta 0:02:16 lr 0.000419 wd 0.0500 time 0.4030 (0.4214) data time 0.0008 (0.0021) model time 0.4022 (0.4195) loss 6.5513 (7.0191) grad_norm 3.3927 (3.3196) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:47:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][310/625] eta 0:02:12 lr 0.000419 wd 0.0500 time 0.3954 (0.4207) data time 0.0006 (0.0021) model time 0.3948 (0.4186) loss 6.0941 (7.0008) grad_norm 2.1257 (3.3035) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 05:47:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][320/625] eta 0:02:08 lr 0.000419 wd 0.0500 time 0.4006 (0.4200) data time 0.0008 (0.0020) model time 0.3998 (0.4179) loss 6.6038 (7.0066) grad_norm 3.2184 (3.2824) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:47:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][330/625] eta 0:02:03 lr 0.000419 wd 0.0500 time 0.4017 (0.4194) data time 0.0009 (0.0020) model time 0.4008 (0.4172) loss 6.7543 (7.0022) grad_norm 1.7349 (3.2504) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:48:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][340/625] eta 0:01:59 lr 0.000419 wd 0.0500 time 0.4000 (0.4189) data time 0.0009 (0.0019) model time 0.3991 (0.4167) loss 7.4223 (7.0021) grad_norm 5.6199 (3.2482) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:48:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][350/625] eta 0:01:55 lr 0.000419 wd 0.0500 time 0.4080 (0.4184) data time 0.0008 (0.0019) model time 0.4072 (0.4161) loss 7.0031 (6.9997) grad_norm 4.9623 (3.2701) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:48:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][360/625] eta 0:01:50 lr 0.000419 wd 0.0500 time 0.4030 (0.4180) data time 0.0009 (0.0019) model time 0.4021 (0.4157) loss 7.2332 (6.9984) grad_norm 2.5345 (3.2586) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:48:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][370/625] eta 0:01:46 lr 0.000419 wd 0.0500 time 0.3943 (0.4176) data time 0.0008 (0.0019) model time 0.3935 (0.4152) loss 6.8312 (7.0027) grad_norm 2.7412 (3.2358) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:48:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][380/625] eta 0:01:42 lr 0.000419 wd 0.0500 time 
0.3950 (0.4171) data time 0.0009 (0.0018) model time 0.3941 (0.4147) loss 7.1507 (7.0111) grad_norm 2.0430 (3.2149) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:48:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][390/625] eta 0:01:37 lr 0.000418 wd 0.0500 time 0.3978 (0.4166) data time 0.0007 (0.0018) model time 0.3971 (0.4142) loss 7.1839 (7.0107) grad_norm 2.4644 (3.2011) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:48:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][400/625] eta 0:01:33 lr 0.000418 wd 0.0500 time 0.3952 (0.4162) data time 0.0008 (0.0018) model time 0.3944 (0.4137) loss 7.6075 (7.0097) grad_norm 2.5845 (3.1948) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:48:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][410/625] eta 0:01:29 lr 0.000418 wd 0.0500 time 0.3977 (0.4157) data time 0.0009 (0.0018) model time 0.3968 (0.4133) loss 8.1676 (7.0185) grad_norm 5.1232 (3.2049) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:48:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][420/625] eta 0:01:25 lr 0.000418 wd 0.0500 time 0.3974 (0.4158) data time 0.0009 (0.0018) model time 0.3965 (0.4134) loss 7.3934 (7.0123) grad_norm 1.9760 (3.1888) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:48:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][430/625] eta 0:01:20 lr 0.000418 wd 0.0500 time 0.3968 (0.4154) data time 0.0009 (0.0017) model time 0.3959 (0.4130) loss 7.0786 (7.0113) grad_norm 1.7330 (3.1721) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:48:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][440/625] eta 0:01:16 lr 0.000418 wd 0.0500 time 0.3950 (0.4150) data time 0.0007 (0.0017) model time 0.3943 (0.4126) loss 6.6249 (7.0130) grad_norm 1.9332 (3.1560) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 05:48:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][450/625] eta 0:01:12 lr 0.000418 wd 0.0500 time 0.4004 (0.4146) data time 0.0008 (0.0017) model time 0.3996 (0.4122) loss 6.4247 (7.0127) grad_norm 2.6276 (3.1404) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:48:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][460/625] eta 0:01:08 lr 0.000418 wd 0.0500 time 0.3962 (0.4143) data time 0.0006 (0.0017) model time 0.3956 (0.4118) loss 7.3583 (7.0188) grad_norm 1.9450 (3.1221) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:48:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][470/625] eta 0:01:04 lr 0.000418 wd 0.0500 time 0.3950 (0.4143) data time 0.0008 (0.0017) model time 0.3941 (0.4119) loss 6.5592 (7.0198) grad_norm 6.5554 (3.1734) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:48:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][480/625] eta 0:01:00 lr 0.000418 wd 0.0500 time 0.3988 (0.4147) data time 0.0007 (0.0017) model time 0.3980 (0.4123) loss 6.4271 (7.0209) grad_norm 2.2217 (3.1970) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:49:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][490/625] eta 0:00:56 lr 0.000417 wd 0.0500 time 0.3952 (0.4159) data time 0.0009 (0.0016) model time 0.3943 (0.4138) loss 7.0180 (7.0252) grad_norm 2.8892 (3.1992) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:49:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][500/625] eta 0:00:52 lr 0.000417 wd 0.0500 time 0.5943 (0.4174) data time 0.0009 (0.0016) model time 0.5934 (0.4155) loss 8.1256 (7.0282) grad_norm 2.3731 (3.1925) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:49:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][510/625] eta 0:00:48 lr 0.000417 wd 0.0500 time 
0.4013 (0.4186) data time 0.0008 (0.0016) model time 0.4005 (0.4168) loss 7.1979 (7.0228) grad_norm 1.6113 (3.1776) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:49:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][520/625] eta 0:00:43 lr 0.000417 wd 0.0500 time 0.4070 (0.4188) data time 0.0008 (0.0016) model time 0.4062 (0.4171) loss 7.1654 (7.0246) grad_norm 3.7856 (3.1728) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:49:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][530/625] eta 0:00:39 lr 0.000417 wd 0.0500 time 0.3997 (0.4185) data time 0.0008 (0.0016) model time 0.3989 (0.4167) loss 6.7300 (7.0220) grad_norm 4.3585 (3.1609) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:49:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][540/625] eta 0:00:35 lr 0.000417 wd 0.0500 time 0.3949 (0.4182) data time 0.0009 (0.0016) model time 0.3940 (0.4164) loss 7.4256 (7.0217) grad_norm 2.2021 (3.1898) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:49:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][550/625] eta 0:00:31 lr 0.000417 wd 0.0500 time 0.4092 (0.4179) data time 0.0006 (0.0016) model time 0.4086 (0.4161) loss 5.8627 (7.0140) grad_norm 3.4986 (3.1873) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:49:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][560/625] eta 0:00:27 lr 0.000417 wd 0.0500 time 0.3995 (0.4176) data time 0.0006 (0.0015) model time 0.3989 (0.4158) loss 6.6033 (7.0215) grad_norm 1.9388 (3.1759) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:49:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][570/625] eta 0:00:22 lr 0.000417 wd 0.0500 time 0.4009 (0.4173) data time 0.0009 (0.0015) model time 0.4001 (0.4155) loss 6.9738 (7.0153) grad_norm 2.3958 (3.1843) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 05:49:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][580/625] eta 0:00:18 lr 0.000417 wd 0.0500 time 0.3999 (0.4170) data time 0.0010 (0.0015) model time 0.3989 (0.4152) loss 7.8600 (7.0233) grad_norm 2.2075 (3.1781) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:49:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][590/625] eta 0:00:14 lr 0.000416 wd 0.0500 time 0.4025 (0.4168) data time 0.0006 (0.0015) model time 0.4020 (0.4150) loss 5.8181 (7.0240) grad_norm 1.8622 (3.1803) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:49:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][600/625] eta 0:00:10 lr 0.000416 wd 0.0500 time 0.3983 (0.4165) data time 0.0008 (0.0015) model time 0.3975 (0.4147) loss 7.7623 (7.0227) grad_norm 7.6117 (3.1789) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:49:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][610/625] eta 0:00:06 lr 0.000416 wd 0.0500 time 0.3974 (0.4163) data time 0.0006 (0.0015) model time 0.3969 (0.4145) loss 6.8893 (7.0196) grad_norm 2.8583 (3.1769) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:49:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][620/625] eta 0:00:02 lr 0.000416 wd 0.0500 time 0.4010 (0.4161) data time 0.0004 (0.0015) model time 0.4006 (0.4142) loss 5.9926 (7.0200) grad_norm 1.8868 (3.1670) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:49:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 188 training takes 0:04:19 [2024-07-25 05:49:57 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 05:49:59 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 05:49:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.444 (0.444) Loss 0.5645 (0.5645) Acc@1 89.307 (89.307) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 05:50:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.9121 (0.6981) Acc@1 79.639 (85.733) Acc@5 96.191 (97.541) Mem 14939MB [2024-07-25 05:50:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0127 (0.8174) Acc@1 76.123 (82.699) Acc@5 95.117 (96.429) Mem 14939MB [2024-07-25 05:50:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.336 Acc@5 96.415 [2024-07-25 05:50:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-07-25 05:50:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.778 (0.778) Loss 0.5493 (0.5493) Acc@1 90.039 (90.039) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 05:50:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.155) Loss 0.8667 (0.6837) Acc@1 81.543 (86.444) Acc@5 96.240 (97.723) Mem 14939MB [2024-07-25 05:50:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 0.9985 (0.8000) Acc@1 76.416 (83.215) Acc@5 95.361 (96.580) Mem 14939MB [2024-07-25 05:50:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.806 Acc@5 96.545 [2024-07-25 05:50:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-07-25 05:50:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.81% [2024-07-25 05:50:04 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 05:50:05 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 05:50:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][0/625] eta 0:21:54 lr 0.000416 wd 0.0500 time 2.1034 (2.1034) data time 1.6330 (1.6330) model time 0.0000 (0.0000) loss 6.9579 (6.9579) grad_norm 4.3576 (4.3576) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:50:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][10/625] eta 0:05:41 lr 0.000416 wd 0.0500 time 0.3980 (0.5558) data time 0.0008 (0.1493) model time 0.0000 (0.0000) loss 6.9183 (7.1037) grad_norm 13.5593 (3.9913) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:50:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][20/625] eta 0:04:51 lr 0.000416 wd 0.0500 time 0.3967 (0.4816) data time 0.0008 (0.0787) model time 0.0000 (0.0000) loss 7.0823 (7.1161) grad_norm 1.9192 (3.1112) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:50:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][30/625] eta 0:04:31 lr 0.000416 wd 0.0500 time 0.4010 (0.4557) data time 0.0007 (0.0536) model time 0.0000 (0.0000) loss 5.9857 (6.9764) grad_norm 2.2097 (3.1607) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:50:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][40/625] eta 0:04:18 lr 0.000416 wd 0.0500 time 0.3961 (0.4425) data time 0.0007 (0.0407) model time 0.0000 (0.0000) loss 6.6179 (7.0141) grad_norm 4.2204 (3.1177) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:50:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][50/625] eta 0:04:09 lr 0.000416 wd 0.0500 time 0.3974 (0.4340) data time 0.0008 (0.0329) model time 0.0000 (0.0000) loss 6.5654 (7.0378) grad_norm 5.7681 (3.1824) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 05:50:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][60/625] eta 0:04:02 lr 0.000416 wd 0.0500 time 0.4101 (0.4294) data time 0.0008 (0.0277) model time 0.4092 (0.4054) loss 6.1928 (7.0218) grad_norm 2.1004 (3.1567) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:50:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][70/625] eta 0:03:58 lr 0.000415 wd 0.0500 time 0.5693 (0.4306) data time 0.0007 (0.0239) model time 0.5687 (0.4209) loss 6.0518 (6.9995) grad_norm 2.1216 (3.0081) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:50:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][80/625] eta 0:03:54 lr 0.000415 wd 0.0500 time 0.5431 (0.4307) data time 0.0006 (0.0211) model time 0.5425 (0.4242) loss 7.5493 (6.9929) grad_norm 6.9333 (2.9991) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:50:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][90/625] eta 0:03:52 lr 0.000415 wd 0.0500 time 0.5781 (0.4355) data time 0.0007 (0.0189) model time 0.5775 (0.4365) loss 6.4483 (7.0066) grad_norm 3.0040 (3.0164) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:50:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][100/625] eta 0:03:50 lr 0.000415 wd 0.0500 time 0.5873 (0.4397) data time 0.0009 (0.0171) model time 0.5864 (0.4446) loss 7.5508 (7.0184) grad_norm 4.3384 (3.0979) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:50:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][110/625] eta 0:03:46 lr 0.000415 wd 0.0500 time 0.5252 (0.4398) data time 0.0007 (0.0157) model time 0.5245 (0.4437) loss 6.4892 (6.9974) grad_norm 3.5652 (3.1695) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:50:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][120/625] eta 0:03:40 lr 0.000415 wd 0.0500 time 
0.3952 (0.4365) data time 0.0006 (0.0145) model time 0.3945 (0.4374) loss 5.4239 (6.9634) grad_norm 2.8048 (3.1312) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:51:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][130/625] eta 0:03:34 lr 0.000415 wd 0.0500 time 0.4033 (0.4339) data time 0.0007 (0.0134) model time 0.4026 (0.4328) loss 7.2270 (6.9608) grad_norm 3.4652 (3.1016) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:51:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][140/625] eta 0:03:29 lr 0.000415 wd 0.0500 time 0.3932 (0.4315) data time 0.0009 (0.0126) model time 0.3923 (0.4290) loss 7.4711 (6.9294) grad_norm 2.4166 (3.0678) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:51:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][150/625] eta 0:03:23 lr 0.000415 wd 0.0500 time 0.3981 (0.4294) data time 0.0007 (0.0118) model time 0.3974 (0.4261) loss 7.9338 (6.9285) grad_norm 2.2802 (3.0424) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:51:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][160/625] eta 0:03:18 lr 0.000415 wd 0.0500 time 0.4093 (0.4276) data time 0.0009 (0.0111) model time 0.4085 (0.4237) loss 6.4860 (6.9318) grad_norm 2.7512 (3.0044) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:51:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][170/625] eta 0:03:14 lr 0.000414 wd 0.0500 time 0.3781 (0.4270) data time 0.0008 (0.0105) model time 0.3772 (0.4230) loss 7.2682 (6.9395) grad_norm 2.3237 (3.0002) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:51:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][180/625] eta 0:03:09 lr 0.000414 wd 0.0500 time 0.4009 (0.4255) data time 0.0008 (0.0100) model time 0.4002 (0.4211) loss 8.1166 (6.9466) grad_norm 1.8347 (2.9625) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 05:51:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][190/625] eta 0:03:04 lr 0.000414 wd 0.0500 time 0.4006 (0.4242) data time 0.0008 (0.0095) model time 0.3998 (0.4196) loss 7.5723 (6.9689) grad_norm 1.9113 (2.9496) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:51:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][200/625] eta 0:02:59 lr 0.000414 wd 0.0500 time 0.3938 (0.4230) data time 0.0010 (0.0091) model time 0.3929 (0.4183) loss 7.5302 (6.9435) grad_norm 2.0262 (2.9289) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:51:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][210/625] eta 0:02:55 lr 0.000414 wd 0.0500 time 0.3992 (0.4219) data time 0.0007 (0.0087) model time 0.3985 (0.4171) loss 7.1909 (6.9526) grad_norm 1.9929 (2.9722) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:51:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][220/625] eta 0:02:50 lr 0.000414 wd 0.0500 time 0.3983 (0.4209) data time 0.0007 (0.0083) model time 0.3976 (0.4160) loss 7.8796 (6.9380) grad_norm 2.4316 (2.9539) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:51:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][230/625] eta 0:02:45 lr 0.000414 wd 0.0500 time 0.3949 (0.4199) data time 0.0009 (0.0080) model time 0.3940 (0.4150) loss 6.7830 (6.9574) grad_norm 1.8494 (2.9225) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:51:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][240/625] eta 0:02:41 lr 0.000414 wd 0.0500 time 0.3934 (0.4190) data time 0.0009 (0.0077) model time 0.3925 (0.4140) loss 6.6686 (6.9492) grad_norm 2.5581 (2.9255) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:51:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][250/625] eta 0:02:36 lr 0.000414 wd 0.0500 time 
0.4030 (0.4182) data time 0.0007 (0.0075) model time 0.4023 (0.4133) loss 6.4777 (6.9305) grad_norm 3.1309 (2.9272) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:51:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][260/625] eta 0:02:32 lr 0.000413 wd 0.0500 time 0.3958 (0.4176) data time 0.0006 (0.0072) model time 0.3952 (0.4126) loss 6.4157 (6.9572) grad_norm 2.1403 (2.9274) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:51:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][270/625] eta 0:02:27 lr 0.000413 wd 0.0500 time 0.3961 (0.4169) data time 0.0007 (0.0070) model time 0.3955 (0.4120) loss 5.9389 (6.9590) grad_norm 2.3440 (2.8974) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:52:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][280/625] eta 0:02:23 lr 0.000413 wd 0.0500 time 0.5934 (0.4170) data time 0.0009 (0.0068) model time 0.5925 (0.4122) loss 5.5643 (6.9391) grad_norm 2.2898 (2.9968) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:52:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][290/625] eta 0:02:19 lr 0.000413 wd 0.0500 time 0.3965 (0.4163) data time 0.0009 (0.0066) model time 0.3955 (0.4116) loss 6.0237 (6.9427) grad_norm 4.0261 (3.0054) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:52:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][300/625] eta 0:02:15 lr 0.000413 wd 0.0500 time 0.5942 (0.4174) data time 0.0008 (0.0064) model time 0.5934 (0.4131) loss 7.7203 (6.9357) grad_norm 3.8205 (3.0082) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:52:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][310/625] eta 0:02:11 lr 0.000413 wd 0.0500 time 0.3964 (0.4189) data time 0.0008 (0.0062) model time 0.3956 (0.4150) loss 6.1253 (6.9235) grad_norm 2.5496 (3.0112) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 05:52:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][320/625] eta 0:02:08 lr 0.000413 wd 0.0500 time 0.5854 (0.4209) data time 0.0008 (0.0060) model time 0.5845 (0.4175) loss 6.3885 (6.9286) grad_norm 2.6589 (2.9961) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:52:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][330/625] eta 0:02:04 lr 0.000413 wd 0.0500 time 0.3987 (0.4215) data time 0.0008 (0.0059) model time 0.3978 (0.4183) loss 7.4703 (6.9245) grad_norm 2.0326 (2.9772) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:52:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][340/625] eta 0:02:00 lr 0.000413 wd 0.0500 time 0.4021 (0.4213) data time 0.0006 (0.0057) model time 0.4015 (0.4181) loss 6.6765 (6.9190) grad_norm 3.2205 (2.9868) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:52:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][350/625] eta 0:01:55 lr 0.000413 wd 0.0500 time 0.3980 (0.4206) data time 0.0008 (0.0056) model time 0.3971 (0.4174) loss 6.3838 (6.9230) grad_norm 2.0174 (2.9853) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:52:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][360/625] eta 0:01:51 lr 0.000412 wd 0.0500 time 0.3989 (0.4201) data time 0.0010 (0.0055) model time 0.3979 (0.4168) loss 6.2771 (6.9188) grad_norm 3.2118 (2.9876) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:52:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][370/625] eta 0:01:46 lr 0.000412 wd 0.0500 time 0.3997 (0.4195) data time 0.0009 (0.0053) model time 0.3988 (0.4163) loss 7.9389 (6.9170) grad_norm 10.8912 (3.0005) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:52:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][380/625] eta 0:01:42 lr 0.000412 wd 0.0500 time 
0.4168 (0.4191) data time 0.0009 (0.0052) model time 0.4160 (0.4158) loss 6.8260 (6.9201) grad_norm 1.9167 (2.9827) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:52:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][390/625] eta 0:01:38 lr 0.000412 wd 0.0500 time 0.3989 (0.4189) data time 0.0008 (0.0051) model time 0.3981 (0.4157) loss 5.8148 (6.9307) grad_norm 3.3530 (2.9715) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:52:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][400/625] eta 0:01:34 lr 0.000412 wd 0.0500 time 0.4019 (0.4184) data time 0.0006 (0.0050) model time 0.4012 (0.4152) loss 7.3382 (6.9306) grad_norm 1.9683 (2.9630) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:52:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][410/625] eta 0:01:29 lr 0.000412 wd 0.0500 time 0.3976 (0.4179) data time 0.0007 (0.0049) model time 0.3969 (0.4147) loss 6.6659 (6.9278) grad_norm 2.3057 (2.9517) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:53:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][420/625] eta 0:01:25 lr 0.000412 wd 0.0500 time 0.4001 (0.4175) data time 0.0007 (0.0048) model time 0.3994 (0.4143) loss 7.7625 (6.9348) grad_norm 1.8990 (2.9485) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:53:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][430/625] eta 0:01:21 lr 0.000412 wd 0.0500 time 0.4152 (0.4172) data time 0.0006 (0.0047) model time 0.4146 (0.4140) loss 6.5264 (6.9302) grad_norm 2.4169 (2.9573) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:53:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][440/625] eta 0:01:17 lr 0.000412 wd 0.0500 time 0.3982 (0.4168) data time 0.0006 (0.0046) model time 0.3976 (0.4137) loss 7.0573 (6.9264) grad_norm 4.6888 (2.9613) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 05:53:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][450/625] eta 0:01:12 lr 0.000412 wd 0.0500 time 0.3955 (0.4165) data time 0.0007 (0.0046) model time 0.3948 (0.4133) loss 7.5392 (6.9321) grad_norm 2.7404 (2.9575) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:53:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][460/625] eta 0:01:08 lr 0.000411 wd 0.0500 time 0.3996 (0.4161) data time 0.0009 (0.0045) model time 0.3987 (0.4129) loss 7.4985 (6.9377) grad_norm 2.6635 (2.9560) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:53:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][470/625] eta 0:01:04 lr 0.000411 wd 0.0500 time 0.4013 (0.4157) data time 0.0006 (0.0044) model time 0.4006 (0.4125) loss 6.2900 (6.9351) grad_norm 1.7816 (2.9482) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:53:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][480/625] eta 0:01:00 lr 0.000411 wd 0.0500 time 0.4001 (0.4154) data time 0.0008 (0.0043) model time 0.3993 (0.4122) loss 6.5228 (6.9315) grad_norm 1.9934 (2.9390) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:53:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][490/625] eta 0:00:56 lr 0.000411 wd 0.0500 time 0.3964 (0.4150) data time 0.0009 (0.0043) model time 0.3956 (0.4119) loss 7.0447 (6.9341) grad_norm 4.1094 (2.9410) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:53:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][500/625] eta 0:00:51 lr 0.000411 wd 0.0500 time 0.3995 (0.4147) data time 0.0007 (0.0042) model time 0.3988 (0.4116) loss 5.8523 (6.9361) grad_norm 4.9659 (2.9477) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:53:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][510/625] eta 0:00:47 lr 0.000411 wd 0.0500 time 
0.3916 (0.4148) data time 0.0009 (0.0041) model time 0.3907 (0.4117) loss 5.9234 (6.9322) grad_norm 3.1905 (2.9432) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:53:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][520/625] eta 0:00:43 lr 0.000411 wd 0.0500 time 0.5705 (0.4155) data time 0.0007 (0.0041) model time 0.5699 (0.4126) loss 7.3841 (6.9279) grad_norm 2.5488 (2.9315) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:53:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][530/625] eta 0:00:39 lr 0.000411 wd 0.0500 time 0.3977 (0.4163) data time 0.0007 (0.0040) model time 0.3970 (0.4135) loss 6.6805 (6.9233) grad_norm 2.6530 (2.9325) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:53:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][540/625] eta 0:00:35 lr 0.000411 wd 0.0500 time 0.3989 (0.4174) data time 0.0010 (0.0040) model time 0.3979 (0.4147) loss 5.4428 (6.9265) grad_norm 2.5027 (2.9409) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:53:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][550/625] eta 0:00:31 lr 0.000411 wd 0.0500 time 0.3985 (0.4182) data time 0.0007 (0.0039) model time 0.3978 (0.4157) loss 6.4811 (6.9316) grad_norm 2.8147 (2.9422) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:53:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][560/625] eta 0:00:27 lr 0.000410 wd 0.0500 time 0.3974 (0.4182) data time 0.0009 (0.0039) model time 0.3966 (0.4156) loss 7.0828 (6.9340) grad_norm 2.0402 (2.9305) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:54:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][570/625] eta 0:00:22 lr 0.000410 wd 0.0500 time 0.4081 (0.4178) data time 0.0007 (0.0038) model time 0.4074 (0.4153) loss 7.1521 (6.9399) grad_norm 2.3059 (2.9528) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 05:54:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][580/625] eta 0:00:18 lr 0.000410 wd 0.0500 time 0.3955 (0.4175) data time 0.0007 (0.0038) model time 0.3948 (0.4150) loss 6.4835 (6.9363) grad_norm 1.7455 (2.9504) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:54:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][590/625] eta 0:00:14 lr 0.000410 wd 0.0500 time 0.3983 (0.4172) data time 0.0008 (0.0037) model time 0.3975 (0.4147) loss 7.2315 (6.9305) grad_norm 4.9397 (2.9533) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:54:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][600/625] eta 0:00:10 lr 0.000410 wd 0.0500 time 0.3987 (0.4169) data time 0.0009 (0.0037) model time 0.3979 (0.4143) loss 7.1472 (6.9279) grad_norm 2.0195 (2.9488) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:54:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][610/625] eta 0:00:06 lr 0.000410 wd 0.0500 time 0.3975 (0.4169) data time 0.0004 (0.0037) model time 0.3970 (0.4143) loss 5.4690 (6.9198) grad_norm 5.2638 (2.9586) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:54:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][620/625] eta 0:00:02 lr 0.000410 wd 0.0500 time 0.3976 (0.4166) data time 0.0006 (0.0037) model time 0.3971 (0.4140) loss 7.6944 (6.9190) grad_norm 2.2873 (2.9777) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:54:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 189 training takes 0:04:20 [2024-07-25 05:54:25 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 05:54:26 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 05:54:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.807 (0.807) Loss 0.5596 (0.5596) Acc@1 88.623 (88.623) Acc@5 98.730 (98.730) Mem 14939MB [2024-07-25 05:54:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.153) Loss 0.8994 (0.6967) Acc@1 79.736 (85.898) Acc@5 95.508 (97.590) Mem 14939MB [2024-07-25 05:54:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.121) Loss 0.9736 (0.8147) Acc@1 77.148 (82.764) Acc@5 95.166 (96.394) Mem 14939MB [2024-07-25 05:54:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.410 Acc@5 96.363 [2024-07-25 05:54:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-07-25 05:54:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.41% [2024-07-25 05:54:29 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 05:54:30 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 05:54:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.469 (0.469) Loss 0.5488 (0.5488) Acc@1 89.941 (89.941) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 05:54:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.121) Loss 0.8652 (0.6832) Acc@1 81.543 (86.452) Acc@5 96.289 (97.727) Mem 14939MB [2024-07-25 05:54:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9966 (0.7993) Acc@1 76.514 (83.222) Acc@5 95.508 (96.584) Mem 14939MB [2024-07-25 05:54:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.806 Acc@5 96.549 [2024-07-25 05:54:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-07-25 05:54:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][0/625] eta 0:13:47 lr 0.000410 wd 0.0500 time 1.3242 (1.3242) data time 0.5566 (0.5566) model time 0.0000 (0.0000) loss 7.1982 (7.1982) grad_norm 3.2697 (3.2697) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:54:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][10/625] eta 0:04:57 lr 0.000410 wd 0.0500 time 0.3991 (0.4838) data time 0.0009 (0.0515) model time 0.0000 (0.0000) loss 6.5287 (6.7818) grad_norm 2.2323 (2.5285) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:54:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][20/625] eta 0:04:27 lr 0.000410 wd 0.0500 time 0.3978 (0.4429) data time 0.0008 (0.0274) model time 0.0000 (0.0000) loss 7.6271 (6.9121) grad_norm 2.1184 (3.5337) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:54:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][30/625] eta 0:04:15 lr 0.000410 wd 0.0500 time 0.4011 (0.4293) data time 0.0007 (0.0188) model time 0.0000 (0.0000) loss 6.7311 (6.8673) grad_norm 2.4655 
(3.1364) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:54:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][40/625] eta 0:04:07 lr 0.000409 wd 0.0500 time 0.3952 (0.4228) data time 0.0008 (0.0145) model time 0.0000 (0.0000) loss 7.0104 (6.9461) grad_norm 2.1811 (2.9539) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:54:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][50/625] eta 0:04:00 lr 0.000409 wd 0.0500 time 0.3968 (0.4182) data time 0.0007 (0.0118) model time 0.0000 (0.0000) loss 6.8970 (6.8269) grad_norm 4.4807 (2.9565) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:54:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][60/625] eta 0:03:55 lr 0.000409 wd 0.0500 time 0.3986 (0.4166) data time 0.0007 (0.0100) model time 0.3978 (0.4077) loss 8.0260 (6.9168) grad_norm 3.7939 (2.9341) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:55:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][70/625] eta 0:03:49 lr 0.000409 wd 0.0500 time 0.3989 (0.4143) data time 0.0009 (0.0088) model time 0.3980 (0.4035) loss 8.2937 (6.9705) grad_norm 3.7692 (2.8917) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:55:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][80/625] eta 0:03:44 lr 0.000409 wd 0.0500 time 0.4171 (0.4127) data time 0.0006 (0.0078) model time 0.4165 (0.4026) loss 6.8808 (6.9726) grad_norm 2.2428 (2.8069) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:55:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][90/625] eta 0:03:40 lr 0.000409 wd 0.0500 time 0.4010 (0.4116) data time 0.0008 (0.0070) model time 0.4002 (0.4022) loss 6.7045 (6.9772) grad_norm 1.9834 (2.7889) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:55:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][100/625] 
eta 0:03:36 lr 0.000409 wd 0.0500 time 0.3939 (0.4127) data time 0.0007 (0.0064) model time 0.3932 (0.4062) loss 5.7316 (6.9438) grad_norm 3.8360 (2.8016) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:55:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][110/625] eta 0:03:32 lr 0.000409 wd 0.0500 time 0.3960 (0.4128) data time 0.0009 (0.0059) model time 0.3951 (0.4073) loss 6.3369 (6.9209) grad_norm 3.1357 (2.8414) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:55:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][120/625] eta 0:03:31 lr 0.000409 wd 0.0500 time 0.5952 (0.4183) data time 0.0007 (0.0055) model time 0.5945 (0.4175) loss 5.0855 (6.8995) grad_norm 2.6724 (2.8215) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:55:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][130/625] eta 0:03:28 lr 0.000409 wd 0.0500 time 0.5710 (0.4211) data time 0.0006 (0.0052) model time 0.5704 (0.4221) loss 6.5319 (6.8668) grad_norm 1.6680 (2.8074) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:55:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][140/625] eta 0:03:26 lr 0.000408 wd 0.0500 time 0.4477 (0.4259) data time 0.0009 (0.0049) model time 0.4468 (0.4293) loss 6.6998 (6.8764) grad_norm 4.1258 (2.8237) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:55:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][150/625] eta 0:03:23 lr 0.000408 wd 0.0500 time 0.6127 (0.4290) data time 0.0009 (0.0046) model time 0.6118 (0.4336) loss 6.8360 (6.8918) grad_norm 1.8391 (2.8528) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:55:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][160/625] eta 0:03:18 lr 0.000408 wd 0.0500 time 0.3983 (0.4271) data time 0.0008 (0.0044) model time 0.3975 (0.4303) loss 7.8943 (6.9069) grad_norm 3.4517 (2.8565) 
loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:55:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][170/625] eta 0:03:13 lr 0.000408 wd 0.0500 time 0.3967 (0.4255) data time 0.0006 (0.0042) model time 0.3961 (0.4277) loss 6.8771 (6.9193) grad_norm 1.8128 (2.8741) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:55:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][180/625] eta 0:03:08 lr 0.000408 wd 0.0500 time 0.4121 (0.4242) data time 0.0006 (0.0040) model time 0.4115 (0.4256) loss 8.2038 (6.9024) grad_norm 6.6126 (2.9068) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:55:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][190/625] eta 0:03:03 lr 0.000408 wd 0.0500 time 0.3976 (0.4229) data time 0.0009 (0.0038) model time 0.3967 (0.4238) loss 7.5541 (6.9167) grad_norm 3.0614 (2.9584) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:55:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][200/625] eta 0:02:59 lr 0.000408 wd 0.0500 time 0.3961 (0.4217) data time 0.0006 (0.0037) model time 0.3955 (0.4220) loss 8.0900 (6.9363) grad_norm 2.9220 (2.9660) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:56:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][210/625] eta 0:02:54 lr 0.000408 wd 0.0500 time 0.3958 (0.4206) data time 0.0008 (0.0035) model time 0.3950 (0.4205) loss 7.2502 (6.9350) grad_norm 2.6826 (2.9809) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:56:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][220/625] eta 0:02:49 lr 0.000408 wd 0.0500 time 0.3981 (0.4197) data time 0.0008 (0.0034) model time 0.3974 (0.4193) loss 6.8381 (6.9291) grad_norm 2.1488 (2.9687) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:56:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][230/625] eta 
0:02:45 lr 0.000408 wd 0.0500 time 0.3961 (0.4189) data time 0.0009 (0.0033) model time 0.3951 (0.4182) loss 6.1754 (6.9223) grad_norm 2.1754 (2.9689) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:56:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][240/625] eta 0:02:40 lr 0.000407 wd 0.0500 time 0.3964 (0.4181) data time 0.0008 (0.0032) model time 0.3956 (0.4172) loss 7.8780 (6.9130) grad_norm 3.4699 (2.9482) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:56:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][250/625] eta 0:02:36 lr 0.000407 wd 0.0500 time 0.3946 (0.4174) data time 0.0009 (0.0031) model time 0.3937 (0.4163) loss 6.5673 (6.9186) grad_norm 3.9830 (2.9266) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:56:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][260/625] eta 0:02:32 lr 0.000407 wd 0.0500 time 0.3964 (0.4167) data time 0.0009 (0.0030) model time 0.3955 (0.4155) loss 7.6920 (6.9046) grad_norm 3.4010 (2.9034) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:56:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][270/625] eta 0:02:27 lr 0.000407 wd 0.0500 time 0.3972 (0.4161) data time 0.0006 (0.0030) model time 0.3965 (0.4148) loss 5.8406 (6.8940) grad_norm 2.0450 (2.9034) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:56:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][280/625] eta 0:02:23 lr 0.000407 wd 0.0500 time 0.3984 (0.4156) data time 0.0007 (0.0029) model time 0.3978 (0.4141) loss 5.6716 (6.8898) grad_norm 2.0413 (2.8935) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:56:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][290/625] eta 0:02:19 lr 0.000407 wd 0.0500 time 0.3966 (0.4150) data time 0.0007 (0.0028) model time 0.3959 (0.4135) loss 6.3193 (6.8810) grad_norm 2.9824 (2.9041) 
loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:56:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][300/625] eta 0:02:14 lr 0.000407 wd 0.0500 time 0.3973 (0.4145) data time 0.0008 (0.0028) model time 0.3964 (0.4129) loss 7.0550 (6.8886) grad_norm 3.3561 (2.9075) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:56:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][310/625] eta 0:02:10 lr 0.000407 wd 0.0500 time 0.4004 (0.4141) data time 0.0008 (0.0027) model time 0.3996 (0.4124) loss 7.4217 (6.8928) grad_norm 2.8260 (2.8973) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:56:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][320/625] eta 0:02:06 lr 0.000407 wd 0.0500 time 0.3972 (0.4146) data time 0.0007 (0.0026) model time 0.3965 (0.4130) loss 6.4051 (6.9000) grad_norm 2.0766 (2.8855) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:56:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][330/625] eta 0:02:02 lr 0.000406 wd 0.0500 time 0.3980 (0.4146) data time 0.0007 (0.0026) model time 0.3973 (0.4130) loss 7.6585 (6.9005) grad_norm 2.0487 (2.8960) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:56:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][340/625] eta 0:01:58 lr 0.000406 wd 0.0500 time 0.5585 (0.4165) data time 0.0007 (0.0025) model time 0.5578 (0.4153) loss 6.9645 (6.9054) grad_norm 2.1639 (3.0133) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:56:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][350/625] eta 0:01:54 lr 0.000406 wd 0.0500 time 0.5266 (0.4174) data time 0.0007 (0.0025) model time 0.5259 (0.4163) loss 6.2913 (6.9021) grad_norm 2.1992 (2.9912) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 05:57:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][360/625] eta 
0:01:51 lr 0.000406 wd 0.0500 time 0.5803 (0.4194) data time 0.0010 (0.0025) model time 0.5793 (0.4187) loss 7.7818 (6.9129) grad_norm 2.2985 (2.9677) loss_scale 512.0000 (261.6731) mem 14939MB [2024-07-25 05:57:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][370/625] eta 0:01:47 lr 0.000406 wd 0.0500 time 0.5996 (0.4204) data time 0.0010 (0.0024) model time 0.5986 (0.4198) loss 5.8586 (6.9076) grad_norm 3.3080 (2.9782) loss_scale 512.0000 (268.4205) mem 14939MB [2024-07-25 05:57:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][380/625] eta 0:01:42 lr 0.000406 wd 0.0500 time 0.4034 (0.4199) data time 0.0008 (0.0024) model time 0.4026 (0.4192) loss 7.5349 (6.9139) grad_norm 4.1864 (3.0297) loss_scale 512.0000 (274.8136) mem 14939MB [2024-07-25 05:57:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][390/625] eta 0:01:38 lr 0.000406 wd 0.0500 time 0.3927 (0.4194) data time 0.0008 (0.0023) model time 0.3919 (0.4186) loss 7.4246 (6.9228) grad_norm 2.0149 (3.0595) loss_scale 512.0000 (280.8798) mem 14939MB [2024-07-25 05:57:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][400/625] eta 0:01:34 lr 0.000406 wd 0.0500 time 0.3968 (0.4189) data time 0.0008 (0.0023) model time 0.3960 (0.4181) loss 8.3133 (6.9243) grad_norm 4.4002 (3.0626) loss_scale 512.0000 (286.6434) mem 14939MB [2024-07-25 05:57:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][410/625] eta 0:01:29 lr 0.000406 wd 0.0500 time 0.4089 (0.4185) data time 0.0009 (0.0023) model time 0.4080 (0.4176) loss 7.5754 (6.9234) grad_norm 1.7447 (3.0787) loss_scale 512.0000 (292.1265) mem 14939MB [2024-07-25 05:57:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][420/625] eta 0:01:25 lr 0.000406 wd 0.0500 time 0.3983 (0.4180) data time 0.0008 (0.0022) model time 0.3975 (0.4171) loss 7.0233 (6.9198) grad_norm 3.3394 (3.0749) 
loss_scale 512.0000 (297.3492) mem 14939MB [2024-07-25 05:57:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][430/625] eta 0:01:21 lr 0.000405 wd 0.0500 time 0.3995 (0.4176) data time 0.0006 (0.0022) model time 0.3988 (0.4166) loss 7.5652 (6.9194) grad_norm 1.8075 (3.0808) loss_scale 512.0000 (302.3295) mem 14939MB [2024-07-25 05:57:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][440/625] eta 0:01:17 lr 0.000405 wd 0.0500 time 0.4000 (0.4172) data time 0.0009 (0.0022) model time 0.3990 (0.4161) loss 6.5738 (6.9235) grad_norm 2.1466 (3.0769) loss_scale 512.0000 (307.0839) mem 14939MB [2024-07-25 05:57:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][450/625] eta 0:01:12 lr 0.000405 wd 0.0500 time 0.4001 (0.4168) data time 0.0008 (0.0022) model time 0.3993 (0.4157) loss 7.9668 (6.9216) grad_norm 2.0856 (3.0733) loss_scale 512.0000 (311.6275) mem 14939MB [2024-07-25 05:57:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][460/625] eta 0:01:08 lr 0.000405 wd 0.0500 time 0.3995 (0.4164) data time 0.0008 (0.0021) model time 0.3987 (0.4153) loss 7.3001 (6.9167) grad_norm 3.7800 (3.0609) loss_scale 512.0000 (315.9740) mem 14939MB [2024-07-25 05:57:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][470/625] eta 0:01:04 lr 0.000405 wd 0.0500 time 0.4132 (0.4161) data time 0.0008 (0.0021) model time 0.4124 (0.4150) loss 6.0734 (6.9129) grad_norm 1.8651 (3.0491) loss_scale 512.0000 (320.1359) mem 14939MB [2024-07-25 05:57:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][480/625] eta 0:01:00 lr 0.000405 wd 0.0500 time 0.3991 (0.4158) data time 0.0008 (0.0021) model time 0.3982 (0.4146) loss 8.3594 (6.9189) grad_norm 2.2018 (3.0509) loss_scale 512.0000 (324.1247) mem 14939MB [2024-07-25 05:57:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][490/625] eta 
0:00:56 lr 0.000405 wd 0.0500 time 0.3998 (0.4155) data time 0.0007 (0.0021) model time 0.3991 (0.4142) loss 7.3264 (6.9255) grad_norm 3.5381 (3.0554) loss_scale 512.0000 (327.9511) mem 14939MB [2024-07-25 05:58:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][500/625] eta 0:00:51 lr 0.000405 wd 0.0500 time 0.4039 (0.4152) data time 0.0007 (0.0021) model time 0.4032 (0.4139) loss 6.5309 (6.9205) grad_norm 2.2168 (3.0772) loss_scale 512.0000 (331.6248) mem 14939MB [2024-07-25 05:58:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][510/625] eta 0:00:47 lr 0.000405 wd 0.0500 time 0.3988 (0.4149) data time 0.0006 (0.0020) model time 0.3982 (0.4136) loss 6.4778 (6.9272) grad_norm 2.7852 (3.0786) loss_scale 512.0000 (335.1546) mem 14939MB [2024-07-25 05:58:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][520/625] eta 0:00:43 lr 0.000405 wd 0.0500 time 0.4055 (0.4146) data time 0.0007 (0.0020) model time 0.4048 (0.4133) loss 8.0924 (6.9283) grad_norm 2.4265 (3.0742) loss_scale 512.0000 (338.5489) mem 14939MB [2024-07-25 05:58:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][530/625] eta 0:00:39 lr 0.000404 wd 0.0500 time 0.4053 (0.4143) data time 0.0006 (0.0020) model time 0.4047 (0.4129) loss 7.4016 (6.9330) grad_norm 2.4201 (3.0865) loss_scale 512.0000 (341.8154) mem 14939MB [2024-07-25 05:58:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][540/625] eta 0:00:35 lr 0.000404 wd 0.0500 time 0.3984 (0.4146) data time 0.0009 (0.0020) model time 0.3975 (0.4132) loss 7.0864 (6.9337) grad_norm 2.8152 (3.0863) loss_scale 512.0000 (344.9612) mem 14939MB [2024-07-25 05:58:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][550/625] eta 0:00:31 lr 0.000404 wd 0.0500 time 0.5953 (0.4146) data time 0.0008 (0.0020) model time 0.5945 (0.4133) loss 7.2820 (6.9386) grad_norm 3.2728 (3.0746) 
loss_scale 512.0000 (347.9927) mem 14939MB [2024-07-25 05:58:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][560/625] eta 0:00:26 lr 0.000404 wd 0.0500 time 0.4004 (0.4153) data time 0.0008 (0.0019) model time 0.3996 (0.4140) loss 8.1796 (6.9362) grad_norm 2.9825 (3.0688) loss_scale 512.0000 (350.9162) mem 14939MB [2024-07-25 05:58:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][570/625] eta 0:00:22 lr 0.000404 wd 0.0500 time 0.5800 (0.4156) data time 0.0008 (0.0019) model time 0.5791 (0.4144) loss 7.8419 (6.9414) grad_norm 6.2155 (3.0645) loss_scale 512.0000 (353.7373) mem 14939MB [2024-07-25 05:58:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][580/625] eta 0:00:18 lr 0.000404 wd 0.0500 time 0.3976 (0.4167) data time 0.0008 (0.0019) model time 0.3968 (0.4157) loss 6.3599 (6.9450) grad_norm 2.1258 (3.0545) loss_scale 512.0000 (356.4613) mem 14939MB [2024-07-25 05:58:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][590/625] eta 0:00:14 lr 0.000404 wd 0.0500 time 0.3969 (0.4174) data time 0.0009 (0.0019) model time 0.3960 (0.4163) loss 6.1842 (6.9428) grad_norm 2.0913 (3.0434) loss_scale 512.0000 (359.0931) mem 14939MB [2024-07-25 05:58:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][600/625] eta 0:00:10 lr 0.000404 wd 0.0500 time 0.4044 (0.4171) data time 0.0009 (0.0019) model time 0.4035 (0.4160) loss 6.3639 (6.9384) grad_norm 2.3702 (3.0416) loss_scale 512.0000 (361.6373) mem 14939MB [2024-07-25 05:58:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][610/625] eta 0:00:06 lr 0.000404 wd 0.0500 time 0.4314 (0.4169) data time 0.0004 (0.0019) model time 0.4310 (0.4158) loss 7.3655 (6.9364) grad_norm 2.2192 (3.0347) loss_scale 512.0000 (364.0982) mem 14939MB [2024-07-25 05:58:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][620/625] eta 
0:00:02 lr 0.000404 wd 0.0500 time 0.3960 (0.4165) data time 0.0006 (0.0018) model time 0.3954 (0.4154) loss 7.0820 (6.9370) grad_norm 2.2012 (3.0288) loss_scale 512.0000 (366.4799) mem 14939MB [2024-07-25 05:58:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 190 training takes 0:04:20 [2024-07-25 05:58:52 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 05:58:53 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 05:58:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.468 (0.468) Loss 0.5542 (0.5542) Acc@1 89.062 (89.062) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 05:58:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.8794 (0.6898) Acc@1 80.469 (85.986) Acc@5 96.143 (97.670) Mem 14939MB [2024-07-25 05:58:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9795 (0.8098) Acc@1 76.465 (82.803) Acc@5 94.824 (96.436) Mem 14939MB [2024-07-25 05:58:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.384 Acc@5 96.387 [2024-07-25 05:58:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-07-25 05:58:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.781 (0.781) Loss 0.5493 (0.5493) Acc@1 89.941 (89.941) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 05:58:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.156) Loss 0.8652 (0.6828) Acc@1 81.445 (86.452) Acc@5 96.289 (97.723) Mem 14939MB [2024-07-25 05:58:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 0.9951 (0.7987) Acc@1 
76.758 (83.243) Acc@5 95.508 (96.591) Mem 14939MB [2024-07-25 05:58:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.831 Acc@5 96.551 [2024-07-25 05:58:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-07-25 05:58:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.83% [2024-07-25 05:58:59 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 05:59:00 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 05:59:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][0/625] eta 0:08:44 lr 0.000404 wd 0.0500 time 0.8390 (0.8390) data time 0.4576 (0.4576) model time 0.0000 (0.0000) loss 7.3218 (7.3218) grad_norm 2.1261 (2.1261) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:59:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][10/625] eta 0:04:30 lr 0.000403 wd 0.0500 time 0.4025 (0.4393) data time 0.0009 (0.0425) model time 0.0000 (0.0000) loss 7.1801 (7.1151) grad_norm 3.3565 (2.4761) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:59:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][20/625] eta 0:04:14 lr 0.000403 wd 0.0500 time 0.3968 (0.4214) data time 0.0009 (0.0227) model time 0.0000 (0.0000) loss 7.8690 (7.1478) grad_norm 3.6855 (2.9121) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:59:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][30/625] eta 0:04:07 lr 0.000403 wd 0.0500 time 0.3955 (0.4154) data time 0.0007 (0.0157) model time 0.0000 (0.0000) loss 7.3847 (7.1848) grad_norm 2.3197 (2.7986) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:59:17 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][40/625] eta 0:04:00 lr 0.000403 wd 0.0500 time 0.3973 (0.4114) data time 0.0007 (0.0121) model time 0.0000 (0.0000) loss 6.1761 (7.1988) grad_norm 1.9816 (2.6642) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:59:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][50/625] eta 0:03:54 lr 0.000403 wd 0.0500 time 0.3984 (0.4086) data time 0.0009 (0.0099) model time 0.0000 (0.0000) loss 6.0663 (7.1514) grad_norm 3.3765 (2.7767) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:59:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][60/625] eta 0:03:50 lr 0.000403 wd 0.0500 time 0.4009 (0.4073) data time 0.0009 (0.0084) model time 0.4000 (0.3996) loss 6.6413 (7.1916) grad_norm 2.6964 (2.7772) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:59:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][70/625] eta 0:03:45 lr 0.000403 wd 0.0500 time 0.4053 (0.4063) data time 0.0007 (0.0074) model time 0.4047 (0.3994) loss 6.7848 (7.0988) grad_norm 1.9492 (2.7780) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:59:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][80/625] eta 0:03:41 lr 0.000403 wd 0.0500 time 0.4049 (0.4057) data time 0.0008 (0.0066) model time 0.4042 (0.3997) loss 5.8805 (7.0523) grad_norm 3.1375 (2.8354) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:59:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][90/625] eta 0:03:36 lr 0.000403 wd 0.0500 time 0.4001 (0.4052) data time 0.0008 (0.0059) model time 0.3992 (0.4000) loss 7.4923 (7.0623) grad_norm 2.3416 (2.7855) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:59:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][100/625] eta 0:03:32 lr 0.000403 wd 0.0500 time 0.3984 (0.4048) data time 0.0008 
(0.0054) model time 0.3976 (0.4000) loss 6.8926 (7.0473) grad_norm 3.1988 (2.9274) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:59:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][110/625] eta 0:03:29 lr 0.000402 wd 0.0500 time 0.3955 (0.4061) data time 0.0009 (0.0050) model time 0.3946 (0.4031) loss 7.2251 (7.0674) grad_norm 2.9165 (2.9537) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:59:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][120/625] eta 0:03:24 lr 0.000402 wd 0.0500 time 0.3982 (0.4056) data time 0.0008 (0.0047) model time 0.3974 (0.4025) loss 6.5036 (7.0601) grad_norm 2.1830 (2.9327) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:59:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][130/625] eta 0:03:20 lr 0.000402 wd 0.0500 time 0.3985 (0.4051) data time 0.0008 (0.0044) model time 0.3978 (0.4020) loss 7.5586 (7.0347) grad_norm 2.2559 (2.9626) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 05:59:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][140/625] eta 0:03:17 lr 0.000402 wd 0.0500 time 0.4014 (0.4069) data time 0.0008 (0.0042) model time 0.4005 (0.4051) loss 7.6313 (6.9949) grad_norm 2.9464 (3.0022) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:00:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][150/625] eta 0:03:13 lr 0.000402 wd 0.0500 time 0.5268 (0.4084) data time 0.0009 (0.0039) model time 0.5260 (0.4074) loss 7.5876 (6.9980) grad_norm 2.2189 (2.9646) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:00:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][160/625] eta 0:03:12 lr 0.000402 wd 0.0500 time 0.6036 (0.4140) data time 0.0008 (0.0038) model time 0.6028 (0.4156) loss 6.8682 (7.0045) grad_norm 1.8641 (2.9148) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:00:11 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][170/625] eta 0:03:09 lr 0.000402 wd 0.0500 time 0.3963 (0.4160) data time 0.0010 (0.0036) model time 0.3953 (0.4182) loss 7.3937 (6.9820) grad_norm 1.9871 (2.8705) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:00:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][180/625] eta 0:03:06 lr 0.000402 wd 0.0500 time 0.4009 (0.4193) data time 0.0009 (0.0034) model time 0.4000 (0.4225) loss 7.3861 (6.9841) grad_norm 1.9087 (2.8456) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:00:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][190/625] eta 0:03:02 lr 0.000402 wd 0.0500 time 0.3989 (0.4191) data time 0.0007 (0.0033) model time 0.3983 (0.4220) loss 6.6269 (6.9731) grad_norm 2.3976 (2.8375) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:00:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][200/625] eta 0:02:57 lr 0.000402 wd 0.0500 time 0.4019 (0.4181) data time 0.0008 (0.0032) model time 0.4011 (0.4203) loss 7.0398 (6.9399) grad_norm 2.5256 (2.8048) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:00:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][210/625] eta 0:02:53 lr 0.000401 wd 0.0500 time 0.3950 (0.4171) data time 0.0007 (0.0031) model time 0.3943 (0.4189) loss 8.0397 (6.9542) grad_norm 2.1905 (2.7827) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:00:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][220/625] eta 0:02:48 lr 0.000401 wd 0.0500 time 0.3994 (0.4163) data time 0.0007 (0.0030) model time 0.3988 (0.4177) loss 6.4006 (6.9624) grad_norm 2.7673 (2.7980) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:00:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][230/625] eta 0:02:44 lr 0.000401 wd 0.0500 time 0.3996 (0.4157) data time 
0.0008 (0.0029) model time 0.3988 (0.4168) loss 7.7131 (6.9593) grad_norm 2.6113 (2.7979) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:00:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][240/625] eta 0:02:39 lr 0.000401 wd 0.0500 time 0.3967 (0.4151) data time 0.0008 (0.0028) model time 0.3958 (0.4159) loss 6.6946 (6.9773) grad_norm 2.9857 (2.7968) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:00:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][250/625] eta 0:02:35 lr 0.000401 wd 0.0500 time 0.3975 (0.4144) data time 0.0006 (0.0027) model time 0.3969 (0.4150) loss 7.2797 (6.9863) grad_norm 6.6966 (2.9816) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:00:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][260/625] eta 0:02:31 lr 0.000401 wd 0.0500 time 0.3986 (0.4139) data time 0.0007 (0.0027) model time 0.3980 (0.4142) loss 5.7277 (6.9896) grad_norm 3.5002 (2.9948) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:00:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][270/625] eta 0:02:26 lr 0.000401 wd 0.0500 time 0.4005 (0.4134) data time 0.0007 (0.0026) model time 0.3998 (0.4136) loss 7.0577 (6.9900) grad_norm 3.7860 (3.0000) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:00:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][280/625] eta 0:02:22 lr 0.000401 wd 0.0500 time 0.3973 (0.4129) data time 0.0008 (0.0025) model time 0.3965 (0.4129) loss 6.7741 (6.9934) grad_norm 4.9240 (2.9973) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:01:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][290/625] eta 0:02:18 lr 0.000401 wd 0.0500 time 0.3987 (0.4124) data time 0.0008 (0.0025) model time 0.3980 (0.4123) loss 7.6521 (6.9981) grad_norm 2.3857 (2.9810) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:01:04 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][300/625] eta 0:02:13 lr 0.000401 wd 0.0500 time 0.3968 (0.4119) data time 0.0009 (0.0024) model time 0.3959 (0.4117) loss 7.7501 (7.0095) grad_norm 2.7484 (2.9638) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:01:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][310/625] eta 0:02:09 lr 0.000400 wd 0.0500 time 0.4014 (0.4116) data time 0.0006 (0.0024) model time 0.4008 (0.4113) loss 6.2307 (7.0008) grad_norm 2.3286 (2.9617) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:01:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][320/625] eta 0:02:05 lr 0.000400 wd 0.0500 time 0.4046 (0.4114) data time 0.0006 (0.0023) model time 0.4039 (0.4110) loss 7.5083 (7.0063) grad_norm 2.9757 (2.9580) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:01:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][330/625] eta 0:02:01 lr 0.000400 wd 0.0500 time 0.3990 (0.4116) data time 0.0008 (0.0023) model time 0.3982 (0.4112) loss 6.9381 (7.0076) grad_norm 2.2982 (2.9351) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:01:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][340/625] eta 0:01:57 lr 0.000400 wd 0.0500 time 0.3989 (0.4113) data time 0.0007 (0.0022) model time 0.3982 (0.4109) loss 6.1682 (7.0147) grad_norm 2.0293 (2.9228) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:01:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][350/625] eta 0:01:53 lr 0.000400 wd 0.0500 time 0.3995 (0.4110) data time 0.0009 (0.0022) model time 0.3986 (0.4105) loss 6.3683 (7.0132) grad_norm 1.8731 (2.9095) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:01:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][360/625] eta 0:01:49 lr 0.000400 wd 0.0500 time 0.4019 (0.4116) data time 
0.0009 (0.0022) model time 0.4011 (0.4112) loss 7.9718 (7.0136) grad_norm 2.7974 (2.9261) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:01:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][370/625] eta 0:01:45 lr 0.000400 wd 0.0500 time 0.3978 (0.4118) data time 0.0007 (0.0021) model time 0.3971 (0.4115) loss 7.8649 (7.0277) grad_norm 3.1603 (2.9276) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:01:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][380/625] eta 0:01:41 lr 0.000400 wd 0.0500 time 0.5708 (0.4135) data time 0.0006 (0.0021) model time 0.5702 (0.4134) loss 6.3418 (7.0259) grad_norm 2.4049 (2.9419) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:01:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][390/625] eta 0:01:37 lr 0.000400 wd 0.0500 time 0.5576 (0.4140) data time 0.0009 (0.0021) model time 0.5567 (0.4139) loss 6.8254 (7.0351) grad_norm 2.6372 (2.9333) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:01:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][400/625] eta 0:01:33 lr 0.000400 wd 0.0500 time 0.5871 (0.4161) data time 0.0009 (0.0021) model time 0.5862 (0.4163) loss 7.5601 (7.0379) grad_norm 2.0020 (2.9167) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:01:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][410/625] eta 0:01:29 lr 0.000399 wd 0.0500 time 0.3996 (0.4162) data time 0.0008 (0.0020) model time 0.3988 (0.4163) loss 7.7578 (7.0457) grad_norm 2.6549 (2.9013) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:01:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][420/625] eta 0:01:25 lr 0.000399 wd 0.0500 time 0.3979 (0.4157) data time 0.0007 (0.0020) model time 0.3972 (0.4158) loss 6.8964 (7.0343) grad_norm 2.5798 (2.8849) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:01:59 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][430/625] eta 0:01:20 lr 0.000399 wd 0.0500 time 0.3972 (0.4154) data time 0.0006 (0.0020) model time 0.3966 (0.4153) loss 5.9467 (7.0247) grad_norm 2.9607 (2.8728) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:02:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][440/625] eta 0:01:16 lr 0.000399 wd 0.0500 time 0.3973 (0.4151) data time 0.0007 (0.0019) model time 0.3966 (0.4150) loss 6.9494 (7.0268) grad_norm 2.1720 (2.8785) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:02:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][450/625] eta 0:01:12 lr 0.000399 wd 0.0500 time 0.4252 (0.4147) data time 0.0006 (0.0019) model time 0.4246 (0.4146) loss 6.2386 (7.0183) grad_norm 2.2090 (2.8615) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:02:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][460/625] eta 0:01:08 lr 0.000399 wd 0.0500 time 0.3992 (0.4144) data time 0.0007 (0.0019) model time 0.3986 (0.4142) loss 5.8564 (7.0188) grad_norm 2.3247 (2.9130) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:02:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][470/625] eta 0:01:04 lr 0.000399 wd 0.0500 time 0.3993 (0.4141) data time 0.0007 (0.0019) model time 0.3986 (0.4139) loss 5.8736 (7.0177) grad_norm 2.3732 (2.9583) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:02:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][480/625] eta 0:01:00 lr 0.000399 wd 0.0500 time 0.3972 (0.4138) data time 0.0006 (0.0019) model time 0.3966 (0.4135) loss 6.3713 (7.0153) grad_norm 4.3245 (3.0115) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:02:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][490/625] eta 0:00:55 lr 0.000399 wd 0.0500 time 0.3979 (0.4135) data time 
0.0006 (0.0018) model time 0.3973 (0.4132) loss 5.2321 (7.0058) grad_norm 2.6235 (3.0201) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:02:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][500/625] eta 0:00:51 lr 0.000399 wd 0.0500 time 0.4008 (0.4133) data time 0.0009 (0.0018) model time 0.3999 (0.4129) loss 7.3040 (7.0134) grad_norm 2.2936 (3.0440) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:02:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][510/625] eta 0:00:47 lr 0.000398 wd 0.0500 time 0.4035 (0.4130) data time 0.0006 (0.0018) model time 0.4028 (0.4126) loss 8.2942 (7.0129) grad_norm 3.0733 (3.0397) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:02:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][520/625] eta 0:00:43 lr 0.000398 wd 0.0500 time 0.4063 (0.4128) data time 0.0006 (0.0018) model time 0.4057 (0.4123) loss 6.6979 (7.0067) grad_norm 12.2929 (3.0492) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:02:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][530/625] eta 0:00:39 lr 0.000398 wd 0.0500 time 0.4000 (0.4126) data time 0.0009 (0.0018) model time 0.3991 (0.4121) loss 7.9160 (7.0089) grad_norm 2.5106 (3.0674) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:02:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][540/625] eta 0:00:35 lr 0.000398 wd 0.0500 time 0.4000 (0.4123) data time 0.0008 (0.0018) model time 0.3991 (0.4118) loss 5.5566 (7.0057) grad_norm 3.1075 (3.0699) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:02:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][550/625] eta 0:00:30 lr 0.000398 wd 0.0500 time 0.3998 (0.4124) data time 0.0007 (0.0017) model time 0.3992 (0.4119) loss 6.4463 (6.9995) grad_norm 2.0746 (3.0835) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:02:51 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][560/625] eta 0:00:26 lr 0.000398 wd 0.0500 time 0.3993 (0.4122) data time 0.0007 (0.0017) model time 0.3986 (0.4116) loss 7.7317 (7.0015) grad_norm 2.1192 (3.0759) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:02:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][570/625] eta 0:00:22 lr 0.000398 wd 0.0500 time 0.3977 (0.4120) data time 0.0007 (0.0017) model time 0.3970 (0.4114) loss 6.1166 (6.9913) grad_norm 2.6514 (3.0732) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:02:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][580/625] eta 0:00:18 lr 0.000398 wd 0.0500 time 0.3983 (0.4121) data time 0.0006 (0.0017) model time 0.3977 (0.4115) loss 7.3357 (6.9877) grad_norm 1.8585 (3.0618) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:03:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][590/625] eta 0:00:14 lr 0.000398 wd 0.0500 time 0.3988 (0.4122) data time 0.0009 (0.0017) model time 0.3979 (0.4116) loss 7.8091 (6.9891) grad_norm 2.6429 (3.0588) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:03:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][600/625] eta 0:00:10 lr 0.000398 wd 0.0500 time 0.3992 (0.4130) data time 0.0009 (0.0017) model time 0.3984 (0.4125) loss 8.0982 (6.9935) grad_norm 3.1083 (3.0771) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:03:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][610/625] eta 0:00:06 lr 0.000397 wd 0.0500 time 0.5679 (0.4140) data time 0.0004 (0.0017) model time 0.5675 (0.4136) loss 5.6125 (6.9842) grad_norm 2.1274 (3.0757) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:03:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][620/625] eta 0:00:02 lr 0.000397 wd 0.0500 time 0.5600 (0.4153) data time 
0.0004 (0.0016) model time 0.5596 (0.4150) loss 6.7874 (6.9885) grad_norm 2.0853 (3.0655) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:03:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 191 training takes 0:04:19 [2024-07-25 06:03:19 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 06:03:20 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 06:03:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.458 (0.458) Loss 0.5732 (0.5732) Acc@1 89.258 (89.258) Acc@5 98.682 (98.682) Mem 14939MB [2024-07-25 06:03:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8896 (0.6961) Acc@1 80.176 (86.080) Acc@5 96.240 (97.643) Mem 14939MB [2024-07-25 06:03:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.0029 (0.8114) Acc@1 76.318 (82.882) Acc@5 95.166 (96.440) Mem 14939MB [2024-07-25 06:03:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.506 Acc@5 96.381 [2024-07-25 06:03:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.5% [2024-07-25 06:03:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.51% [2024-07-25 06:03:23 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 06:03:24 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 06:03:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.453 (0.453) Loss 0.5488 (0.5488) Acc@1 89.990 (89.990) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 06:03:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.087 (0.120) Loss 0.8638 (0.6824) Acc@1 81.494 (86.435) Acc@5 96.240 (97.723) Mem 14939MB [2024-07-25 06:03:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9941 (0.7981) Acc@1 76.807 (83.257) Acc@5 95.508 (96.605) Mem 14939MB [2024-07-25 06:03:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.845 Acc@5 96.565 [2024-07-25 06:03:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-07-25 06:03:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.85% [2024-07-25 06:03:26 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 06:03:27 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 06:03:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][0/625] eta 0:07:55 lr 0.000397 wd 0.0500 time 0.7615 (0.7615) data time 0.3849 (0.3849) model time 0.0000 (0.0000) loss 6.4083 (6.4083) grad_norm 1.9525 (1.9525) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:03:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][10/625] eta 0:04:25 lr 0.000397 wd 0.0500 time 0.3931 (0.4319) data time 0.0008 (0.0358) model time 0.0000 (0.0000) loss 6.1821 (6.9559) grad_norm 2.3305 (2.3311) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:03:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][20/625] eta 0:04:11 lr 0.000397 wd 0.0500 time 0.3991 (0.4159) data time 0.0008 (0.0192) model time 0.0000 (0.0000) loss 6.1037 (6.9302) grad_norm 2.2971 (2.1802) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:03:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][30/625] eta 0:04:04 lr 0.000397 wd 0.0500 time 0.4030 (0.4107) data time 0.0009 (0.0133) model time 0.0000 (0.0000) loss 6.2532 (6.7489) grad_norm 2.0391 (2.2620) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:03:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][40/625] eta 0:03:58 lr 0.000397 wd 0.0500 time 0.3958 (0.4079) data time 0.0008 (0.0103) model time 0.0000 (0.0000) loss 5.6940 (6.7120) grad_norm 1.9259 (2.4306) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:03:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][50/625] eta 0:03:55 lr 0.000397 wd 0.0500 time 0.4016 (0.4092) data time 0.0007 (0.0084) model time 0.0000 (0.0000) loss 7.8999 (6.7689) grad_norm 2.0772 (2.4770) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:03:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][60/625] eta 0:03:50 lr 0.000397 wd 0.0500 time 0.4003 (0.4075) 
data time 0.0006 (0.0072) model time 0.3997 (0.3984) loss 7.5219 (6.8131) grad_norm 2.3678 (2.4589) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:03:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][70/625] eta 0:03:45 lr 0.000397 wd 0.0500 time 0.3945 (0.4064) data time 0.0009 (0.0063) model time 0.3936 (0.3984) loss 5.9282 (6.8315) grad_norm 1.9670 (2.4378) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:04:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][80/625] eta 0:03:41 lr 0.000396 wd 0.0500 time 0.4000 (0.4057) data time 0.0006 (0.0056) model time 0.3994 (0.3989) loss 6.6550 (6.8381) grad_norm 1.8772 (2.4837) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:04:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][90/625] eta 0:03:36 lr 0.000396 wd 0.0500 time 0.4035 (0.4054) data time 0.0006 (0.0055) model time 0.4030 (0.3987) loss 7.3875 (6.8693) grad_norm 4.4318 (2.6646) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:04:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][100/625] eta 0:03:32 lr 0.000396 wd 0.0500 time 0.3995 (0.4051) data time 0.0006 (0.0050) model time 0.3989 (0.3993) loss 6.3641 (6.8830) grad_norm 2.6636 (2.6592) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:04:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][110/625] eta 0:03:28 lr 0.000396 wd 0.0500 time 0.4031 (0.4047) data time 0.0007 (0.0047) model time 0.4024 (0.3995) loss 7.8524 (6.9046) grad_norm 2.3730 (2.6391) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:04:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][120/625] eta 0:03:24 lr 0.000396 wd 0.0500 time 0.4024 (0.4043) data time 0.0006 (0.0044) model time 0.4018 (0.3994) loss 7.3478 (6.9316) grad_norm 2.2380 (2.7612) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
06:04:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][130/625] eta 0:03:20 lr 0.000396 wd 0.0500 time 0.4130 (0.4041) data time 0.0009 (0.0041) model time 0.4121 (0.3996) loss 7.7072 (6.9197) grad_norm 2.3891 (2.7600) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:04:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][140/625] eta 0:03:15 lr 0.000396 wd 0.0500 time 0.4027 (0.4038) data time 0.0007 (0.0039) model time 0.4020 (0.3995) loss 5.2529 (6.9251) grad_norm 3.1219 (2.7812) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:04:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][150/625] eta 0:03:11 lr 0.000396 wd 0.0500 time 0.4046 (0.4036) data time 0.0007 (0.0037) model time 0.4039 (0.3995) loss 7.4574 (6.9433) grad_norm 2.0827 (2.7729) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:04:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][160/625] eta 0:03:07 lr 0.000396 wd 0.0500 time 0.3987 (0.4035) data time 0.0009 (0.0036) model time 0.3978 (0.3996) loss 7.1044 (6.9285) grad_norm 2.5851 (2.7710) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:04:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][170/625] eta 0:03:03 lr 0.000396 wd 0.0500 time 0.3993 (0.4033) data time 0.0008 (0.0034) model time 0.3985 (0.3995) loss 5.9012 (6.9078) grad_norm 4.2081 (2.8144) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:04:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][180/625] eta 0:02:59 lr 0.000395 wd 0.0500 time 0.4170 (0.4039) data time 0.0006 (0.0033) model time 0.4163 (0.4006) loss 7.3646 (6.8966) grad_norm 2.3591 (2.8037) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:04:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][190/625] eta 0:02:57 lr 0.000395 wd 0.0500 time 0.3987 (0.4086) data 
time 0.0009 (0.0032) model time 0.3978 (0.4071) loss 6.1313 (6.8701) grad_norm 2.0914 (2.8056) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:04:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][200/625] eta 0:02:54 lr 0.000395 wd 0.0500 time 0.4002 (0.4111) data time 0.0007 (0.0031) model time 0.3995 (0.4105) loss 6.8660 (6.8909) grad_norm 2.1354 (2.8024) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:04:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][210/625] eta 0:02:52 lr 0.000395 wd 0.0500 time 0.4019 (0.4148) data time 0.0009 (0.0030) model time 0.4010 (0.4153) loss 7.8986 (6.8954) grad_norm 1.9090 (2.7870) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:04:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][220/625] eta 0:02:48 lr 0.000395 wd 0.0500 time 0.3988 (0.4170) data time 0.0008 (0.0029) model time 0.3980 (0.4181) loss 6.6023 (6.9058) grad_norm 2.0597 (2.7609) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:05:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][230/625] eta 0:02:44 lr 0.000395 wd 0.0500 time 0.4026 (0.4162) data time 0.0008 (0.0028) model time 0.4018 (0.4170) loss 6.9610 (6.8991) grad_norm 3.1912 (2.7762) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:05:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][240/625] eta 0:02:39 lr 0.000395 wd 0.0500 time 0.3994 (0.4156) data time 0.0007 (0.0027) model time 0.3987 (0.4161) loss 6.4362 (6.9055) grad_norm 3.7629 (2.7750) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:05:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][250/625] eta 0:02:35 lr 0.000395 wd 0.0500 time 0.3960 (0.4150) data time 0.0006 (0.0029) model time 0.3953 (0.4151) loss 7.3586 (6.9123) grad_norm 2.5284 (2.7782) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
06:05:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][260/625] eta 0:02:31 lr 0.000395 wd 0.0500 time 0.3981 (0.4145) data time 0.0010 (0.0028) model time 0.3971 (0.4143) loss 7.2849 (6.9184) grad_norm 1.6847 (2.7806) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:05:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][270/625] eta 0:02:27 lr 0.000395 wd 0.0500 time 0.5994 (0.4148) data time 0.0009 (0.0028) model time 0.5986 (0.4146) loss 7.3238 (6.9237) grad_norm 2.2888 (2.7910) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:05:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][280/625] eta 0:02:22 lr 0.000394 wd 0.0500 time 0.3989 (0.4142) data time 0.0008 (0.0027) model time 0.3981 (0.4138) loss 6.8299 (6.9134) grad_norm 2.8373 (2.7925) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:05:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][290/625] eta 0:02:18 lr 0.000394 wd 0.0500 time 0.3964 (0.4137) data time 0.0007 (0.0027) model time 0.3957 (0.4132) loss 7.8973 (6.9180) grad_norm 2.8577 (2.7886) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:05:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][300/625] eta 0:02:14 lr 0.000394 wd 0.0500 time 0.4022 (0.4132) data time 0.0008 (0.0026) model time 0.4014 (0.4126) loss 6.2849 (6.9171) grad_norm 2.5694 (2.7895) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:05:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][310/625] eta 0:02:10 lr 0.000394 wd 0.0500 time 0.3981 (0.4128) data time 0.0006 (0.0026) model time 0.3975 (0.4122) loss 5.7149 (6.9232) grad_norm 2.2720 (2.7760) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:05:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][320/625] eta 0:02:05 lr 0.000394 wd 0.0500 time 0.4008 (0.4124) data 
time 0.0009 (0.0025) model time 0.4000 (0.4117) loss 5.7696 (6.9293) grad_norm 3.8842 (2.7754) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:05:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][330/625] eta 0:02:01 lr 0.000394 wd 0.0500 time 0.3997 (0.4120) data time 0.0008 (0.0025) model time 0.3989 (0.4112) loss 5.8641 (6.9184) grad_norm 2.0856 (2.7569) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:05:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][340/625] eta 0:01:57 lr 0.000394 wd 0.0500 time 0.3962 (0.4116) data time 0.0009 (0.0024) model time 0.3953 (0.4107) loss 6.3050 (6.9254) grad_norm 1.9202 (2.7571) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:05:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][350/625] eta 0:01:53 lr 0.000394 wd 0.0500 time 0.4048 (0.4113) data time 0.0006 (0.0024) model time 0.4042 (0.4103) loss 6.6614 (6.9276) grad_norm 2.1988 (2.7725) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:05:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][360/625] eta 0:01:48 lr 0.000394 wd 0.0500 time 0.3988 (0.4110) data time 0.0007 (0.0023) model time 0.3981 (0.4100) loss 6.9473 (6.9313) grad_norm 3.1111 (2.7856) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:05:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][370/625] eta 0:01:44 lr 0.000394 wd 0.0500 time 0.4018 (0.4107) data time 0.0008 (0.0023) model time 0.4010 (0.4096) loss 5.5110 (6.9390) grad_norm 2.4157 (2.7862) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:06:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][380/625] eta 0:01:40 lr 0.000393 wd 0.0500 time 0.3976 (0.4105) data time 0.0006 (0.0022) model time 0.3969 (0.4094) loss 6.1804 (6.9437) grad_norm 2.4270 (2.7858) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
06:06:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][390/625] eta 0:01:36 lr 0.000393 wd 0.0500 time 0.4001 (0.4102) data time 0.0006 (0.0022) model time 0.3995 (0.4091) loss 5.5538 (6.9445) grad_norm 1.8403 (2.7894) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:06:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][400/625] eta 0:01:32 lr 0.000393 wd 0.0500 time 0.3962 (0.4102) data time 0.0009 (0.0022) model time 0.3954 (0.4091) loss 6.9804 (6.9371) grad_norm 2.0215 (2.8017) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:06:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][410/625] eta 0:01:28 lr 0.000393 wd 0.0500 time 0.5961 (0.4122) data time 0.0007 (0.0022) model time 0.5953 (0.4114) loss 7.3879 (6.9475) grad_norm 2.2964 (2.8046) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:06:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][420/625] eta 0:01:24 lr 0.000393 wd 0.0500 time 0.3984 (0.4132) data time 0.0006 (0.0021) model time 0.3978 (0.4125) loss 6.7409 (6.9486) grad_norm 4.9220 (2.8206) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:06:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][430/625] eta 0:01:20 lr 0.000393 wd 0.0500 time 0.5564 (0.4148) data time 0.0007 (0.0021) model time 0.5557 (0.4143) loss 5.9002 (6.9378) grad_norm 2.5974 (2.8211) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:06:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][440/625] eta 0:01:17 lr 0.000393 wd 0.0500 time 0.3986 (0.4163) data time 0.0007 (0.0021) model time 0.3979 (0.4160) loss 7.5956 (6.9515) grad_norm 3.1215 (2.8355) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:06:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][450/625] eta 0:01:12 lr 0.000393 wd 0.0500 time 0.3974 (0.4160) data 
time 0.0007 (0.0021) model time 0.3967 (0.4155) loss 7.4806 (6.9554) grad_norm 2.9565 (2.8286) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:06:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][460/625] eta 0:01:08 lr 0.000393 wd 0.0500 time 0.4100 (0.4156) data time 0.0007 (0.0021) model time 0.4094 (0.4151) loss 5.7505 (6.9471) grad_norm 1.8860 (2.8242) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:06:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][470/625] eta 0:01:04 lr 0.000393 wd 0.0500 time 0.3960 (0.4152) data time 0.0009 (0.0021) model time 0.3951 (0.4147) loss 7.5666 (6.9504) grad_norm 2.2365 (2.8185) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:06:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][480/625] eta 0:01:00 lr 0.000392 wd 0.0500 time 0.3977 (0.4149) data time 0.0009 (0.0021) model time 0.3969 (0.4143) loss 6.9368 (6.9466) grad_norm 2.9212 (2.8063) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:06:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][490/625] eta 0:00:56 lr 0.000392 wd 0.0500 time 0.3969 (0.4149) data time 0.0008 (0.0020) model time 0.3961 (0.4142) loss 6.1262 (6.9418) grad_norm 4.3968 (2.8171) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:06:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][500/625] eta 0:00:51 lr 0.000392 wd 0.0500 time 0.3968 (0.4145) data time 0.0008 (0.0020) model time 0.3959 (0.4138) loss 7.8933 (6.9436) grad_norm 1.9285 (2.8196) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:06:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][510/625] eta 0:00:47 lr 0.000392 wd 0.0500 time 0.3969 (0.4142) data time 0.0006 (0.0020) model time 0.3963 (0.4135) loss 7.2171 (6.9437) grad_norm 3.0671 (2.8090) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
06:07:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][520/625] eta 0:00:43 lr 0.000392 wd 0.0500 time 0.4000 (0.4140) data time 0.0008 (0.0020) model time 0.3991 (0.4133) loss 7.2747 (6.9444) grad_norm 3.4981 (2.8014) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:07:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][530/625] eta 0:00:39 lr 0.000392 wd 0.0500 time 0.3961 (0.4137) data time 0.0009 (0.0020) model time 0.3952 (0.4130) loss 6.0908 (6.9335) grad_norm 2.1581 (2.8377) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:07:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][540/625] eta 0:00:35 lr 0.000392 wd 0.0500 time 0.3990 (0.4135) data time 0.0009 (0.0019) model time 0.3981 (0.4127) loss 7.5482 (6.9349) grad_norm 5.7022 (2.8371) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:07:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][550/625] eta 0:00:30 lr 0.000392 wd 0.0500 time 0.3968 (0.4132) data time 0.0009 (0.0019) model time 0.3959 (0.4124) loss 7.8421 (6.9342) grad_norm 1.5187 (2.8320) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:07:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][560/625] eta 0:00:26 lr 0.000392 wd 0.0500 time 0.3965 (0.4130) data time 0.0007 (0.0019) model time 0.3958 (0.4121) loss 5.8899 (6.9397) grad_norm 2.1303 (2.8287) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:07:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][570/625] eta 0:00:22 lr 0.000392 wd 0.0500 time 0.4005 (0.4128) data time 0.0007 (0.0019) model time 0.3998 (0.4119) loss 8.2168 (6.9421) grad_norm 2.5865 (2.8212) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:07:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][580/625] eta 0:00:18 lr 0.000392 wd 0.0500 time 0.4005 (0.4125) data 
time 0.0009 (0.0019) model time 0.3997 (0.4116) loss 6.5794 (6.9497) grad_norm 2.1926 (2.8133) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:07:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][590/625] eta 0:00:14 lr 0.000391 wd 0.0500 time 0.3994 (0.4123) data time 0.0009 (0.0018) model time 0.3985 (0.4114) loss 6.2917 (6.9496) grad_norm 3.8163 (2.8107) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:07:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][600/625] eta 0:00:10 lr 0.000391 wd 0.0500 time 0.3981 (0.4121) data time 0.0009 (0.0018) model time 0.3973 (0.4111) loss 6.7336 (6.9543) grad_norm 2.8270 (2.8186) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:07:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][610/625] eta 0:00:06 lr 0.000391 wd 0.0500 time 0.3996 (0.4119) data time 0.0004 (0.0018) model time 0.3992 (0.4109) loss 7.3508 (6.9564) grad_norm 1.6152 (2.8116) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:07:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][620/625] eta 0:00:02 lr 0.000391 wd 0.0500 time 0.4000 (0.4120) data time 0.0004 (0.0018) model time 0.3996 (0.4110) loss 7.2362 (6.9572) grad_norm 4.7067 (2.8095) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:07:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 192 training takes 0:04:17 [2024-07-25 06:07:45 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 06:07:46 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 06:07:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.454 (0.454) Loss 0.5547 (0.5547) Acc@1 89.404 (89.404) Acc@5 98.730 (98.730) Mem 14939MB [2024-07-25 06:07:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.8984 (0.6885) Acc@1 80.078 (85.938) Acc@5 95.850 (97.505) Mem 14939MB [2024-07-25 06:07:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9839 (0.8024) Acc@1 77.100 (82.868) Acc@5 95.215 (96.361) Mem 14939MB [2024-07-25 06:07:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.506 Acc@5 96.373 [2024-07-25 06:07:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.5% [2024-07-25 06:07:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.795 (0.795) Loss 0.5488 (0.5488) Acc@1 89.941 (89.941) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 06:07:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.156) Loss 0.8618 (0.6814) Acc@1 81.445 (86.426) Acc@5 96.191 (97.723) Mem 14939MB [2024-07-25 06:07:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 0.9927 (0.7970) Acc@1 76.904 (83.275) Acc@5 95.459 (96.610) Mem 14939MB [2024-07-25 06:07:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.879 Acc@5 96.575 [2024-07-25 06:07:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.9% [2024-07-25 06:07:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.88% [2024-07-25 06:07:51 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 06:07:52 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 06:07:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][0/625] eta 0:08:00 lr 0.000391 wd 0.0500 time 0.7691 (0.7691) data time 0.3916 (0.3916) model time 0.0000 (0.0000) loss 6.5690 (6.5690) grad_norm 2.7933 (2.7933) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:07:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][10/625] eta 0:05:05 lr 0.000391 wd 0.0500 time 0.3985 (0.4973) data time 0.0008 (0.0364) model time 0.0000 (0.0000) loss 7.9472 (6.6936) grad_norm 2.2260 (2.7272) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:08:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][20/625] eta 0:04:52 lr 0.000391 wd 0.0500 time 0.3979 (0.4831) data time 0.0006 (0.0195) model time 0.0000 (0.0000) loss 5.5639 (6.8361) grad_norm 3.4740 (2.8686) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:08:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][30/625] eta 0:04:41 lr 0.000391 wd 0.0500 time 0.3978 (0.4730) data time 0.0007 (0.0136) model time 0.0000 (0.0000) loss 6.9365 (6.7786) grad_norm 2.3942 (2.7636) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:08:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][40/625] eta 0:04:33 lr 0.000391 wd 0.0500 time 0.4022 (0.4673) data time 0.0009 (0.0105) model time 0.0000 (0.0000) loss 7.5712 (6.7896) grad_norm 4.2057 (2.7705) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:08:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][50/625] eta 0:04:21 lr 0.000391 wd 0.0500 time 0.3966 (0.4539) data time 0.0007 (0.0086) model time 0.0000 (0.0000) loss 7.2083 (6.8459) grad_norm 3.3752 (3.4552) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 06:08:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][60/625] eta 0:04:11 lr 0.000390 wd 0.0500 time 0.3974 (0.4449) data time 0.0009 (0.0073) model time 0.3965 (0.3977) loss 6.9119 (6.8395) grad_norm 2.8737 (3.3636) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:08:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][70/625] eta 0:04:03 lr 0.000390 wd 0.0500 time 0.4059 (0.4386) data time 0.0009 (0.0064) model time 0.4050 (0.3985) loss 6.1751 (6.8709) grad_norm 3.4183 (3.5052) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:08:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][80/625] eta 0:03:56 lr 0.000390 wd 0.0500 time 0.3976 (0.4336) data time 0.0009 (0.0057) model time 0.3968 (0.3982) loss 7.8211 (6.8551) grad_norm 1.8480 (3.4124) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:08:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][90/625] eta 0:03:49 lr 0.000390 wd 0.0500 time 0.4081 (0.4299) data time 0.0008 (0.0052) model time 0.4072 (0.3983) loss 6.2226 (6.8537) grad_norm 2.8262 (inf) loss_scale 256.0000 (486.6813) mem 14939MB [2024-07-25 06:08:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][100/625] eta 0:03:44 lr 0.000390 wd 0.0500 time 0.3932 (0.4268) data time 0.0008 (0.0048) model time 0.3924 (0.3982) loss 7.6824 (6.8812) grad_norm 2.0984 (inf) loss_scale 256.0000 (463.8416) mem 14939MB [2024-07-25 06:08:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][110/625] eta 0:03:38 lr 0.000390 wd 0.0500 time 0.3999 (0.4242) data time 0.0007 (0.0044) model time 0.3992 (0.3981) loss 6.0247 (6.9001) grad_norm 2.6962 (inf) loss_scale 256.0000 (445.1171) mem 14939MB [2024-07-25 06:08:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][120/625] eta 0:03:33 lr 0.000390 wd 0.0500 time 0.3951 
(0.4223) data time 0.0008 (0.0041) model time 0.3943 (0.3983) loss 7.0458 (6.8784) grad_norm 2.8787 (inf) loss_scale 256.0000 (429.4876) mem 14939MB [2024-07-25 06:08:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][130/625] eta 0:03:28 lr 0.000390 wd 0.0500 time 0.3975 (0.4206) data time 0.0008 (0.0039) model time 0.3967 (0.3985) loss 5.6491 (6.8676) grad_norm 2.3045 (inf) loss_scale 256.0000 (416.2443) mem 14939MB [2024-07-25 06:08:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][140/625] eta 0:03:23 lr 0.000390 wd 0.0500 time 0.4033 (0.4191) data time 0.0009 (0.0037) model time 0.4024 (0.3985) loss 7.8188 (6.8919) grad_norm 3.0864 (inf) loss_scale 256.0000 (404.8794) mem 14939MB [2024-07-25 06:08:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][150/625] eta 0:03:18 lr 0.000390 wd 0.0500 time 0.3974 (0.4177) data time 0.0008 (0.0035) model time 0.3965 (0.3983) loss 6.0747 (6.8734) grad_norm 1.7556 (inf) loss_scale 256.0000 (395.0199) mem 14939MB [2024-07-25 06:08:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][160/625] eta 0:03:13 lr 0.000389 wd 0.0500 time 0.3957 (0.4165) data time 0.0006 (0.0033) model time 0.3951 (0.3982) loss 8.3163 (6.8945) grad_norm 6.5032 (inf) loss_scale 256.0000 (386.3851) mem 14939MB [2024-07-25 06:09:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][170/625] eta 0:03:09 lr 0.000389 wd 0.0500 time 0.4037 (0.4155) data time 0.0008 (0.0032) model time 0.4029 (0.3983) loss 7.7084 (6.9047) grad_norm 4.4825 (inf) loss_scale 256.0000 (378.7602) mem 14939MB [2024-07-25 06:09:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][180/625] eta 0:03:04 lr 0.000389 wd 0.0500 time 0.4001 (0.4147) data time 0.0008 (0.0031) model time 0.3992 (0.3985) loss 7.8603 (6.8924) grad_norm 5.1560 (inf) loss_scale 256.0000 (371.9779) mem 14939MB [2024-07-25 06:09:11 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][190/625] eta 0:03:00 lr 0.000389 wd 0.0500 time 0.3983 (0.4140) data time 0.0009 (0.0030) model time 0.3974 (0.3986) loss 5.8656 (6.8803) grad_norm 2.1464 (inf) loss_scale 256.0000 (365.9058) mem 14939MB [2024-07-25 06:09:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][200/625] eta 0:02:55 lr 0.000389 wd 0.0500 time 0.3996 (0.4132) data time 0.0008 (0.0029) model time 0.3988 (0.3984) loss 6.7189 (6.8814) grad_norm 1.8249 (inf) loss_scale 256.0000 (360.4378) mem 14939MB [2024-07-25 06:09:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][210/625] eta 0:02:51 lr 0.000389 wd 0.0500 time 0.5852 (0.4135) data time 0.0007 (0.0028) model time 0.5845 (0.3996) loss 6.9449 (6.8784) grad_norm 4.0061 (inf) loss_scale 256.0000 (355.4882) mem 14939MB [2024-07-25 06:09:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][220/625] eta 0:02:47 lr 0.000389 wd 0.0500 time 0.5339 (0.4134) data time 0.0006 (0.0027) model time 0.5333 (0.4003) loss 5.7987 (6.8803) grad_norm 4.1573 (inf) loss_scale 256.0000 (350.9864) mem 14939MB [2024-07-25 06:09:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][230/625] eta 0:02:44 lr 0.000389 wd 0.0500 time 0.3959 (0.4168) data time 0.0009 (0.0026) model time 0.3950 (0.4054) loss 5.7970 (6.8835) grad_norm 4.1592 (inf) loss_scale 256.0000 (346.8745) mem 14939MB [2024-07-25 06:09:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][240/625] eta 0:02:40 lr 0.000389 wd 0.0500 time 0.3986 (0.4181) data time 0.0008 (0.0025) model time 0.3979 (0.4075) loss 5.7340 (6.8980) grad_norm 3.3854 (inf) loss_scale 256.0000 (343.1037) mem 14939MB [2024-07-25 06:09:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][250/625] eta 0:02:38 lr 0.000389 wd 0.0500 time 0.5986 (0.4218) data time 0.0009 (0.0025) model 
time 0.5977 (0.4127) loss 6.9668 (6.8941) grad_norm 2.7358 (inf) loss_scale 256.0000 (339.6335) mem 14939MB [2024-07-25 06:09:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][260/625] eta 0:02:34 lr 0.000388 wd 0.0500 time 0.3958 (0.4223) data time 0.0008 (0.0024) model time 0.3950 (0.4137) loss 5.6611 (6.8954) grad_norm 4.1015 (inf) loss_scale 256.0000 (336.4291) mem 14939MB [2024-07-25 06:09:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][270/625] eta 0:02:29 lr 0.000388 wd 0.0500 time 0.4013 (0.4215) data time 0.0007 (0.0024) model time 0.4006 (0.4131) loss 5.9732 (6.9094) grad_norm 3.3182 (inf) loss_scale 256.0000 (333.4613) mem 14939MB [2024-07-25 06:09:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][280/625] eta 0:02:25 lr 0.000388 wd 0.0500 time 0.3972 (0.4207) data time 0.0008 (0.0023) model time 0.3964 (0.4124) loss 7.2279 (6.8988) grad_norm 3.4562 (inf) loss_scale 256.0000 (330.7046) mem 14939MB [2024-07-25 06:09:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][290/625] eta 0:02:20 lr 0.000388 wd 0.0500 time 0.3987 (0.4199) data time 0.0008 (0.0023) model time 0.3980 (0.4118) loss 6.7817 (6.9077) grad_norm 2.2539 (inf) loss_scale 256.0000 (328.1375) mem 14939MB [2024-07-25 06:09:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][300/625] eta 0:02:16 lr 0.000388 wd 0.0500 time 0.3955 (0.4192) data time 0.0006 (0.0022) model time 0.3949 (0.4112) loss 6.2056 (6.9174) grad_norm 4.2867 (inf) loss_scale 256.0000 (325.7409) mem 14939MB [2024-07-25 06:10:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][310/625] eta 0:02:11 lr 0.000388 wd 0.0500 time 0.4019 (0.4185) data time 0.0010 (0.0022) model time 0.4010 (0.4107) loss 6.5164 (6.9266) grad_norm 5.3391 (inf) loss_scale 256.0000 (323.4984) mem 14939MB [2024-07-25 06:10:06 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [193/300][320/625] eta 0:02:07 lr 0.000388 wd 0.0500 time 0.3976 (0.4179) data time 0.0008 (0.0021) model time 0.3968 (0.4102) loss 5.0874 (6.9117) grad_norm 4.6981 (inf) loss_scale 256.0000 (321.3956) mem 14939MB [2024-07-25 06:10:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][330/625] eta 0:02:03 lr 0.000388 wd 0.0500 time 0.3984 (0.4173) data time 0.0006 (0.0021) model time 0.3978 (0.4097) loss 6.7911 (6.9086) grad_norm 2.8006 (inf) loss_scale 256.0000 (319.4199) mem 14939MB [2024-07-25 06:10:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][340/625] eta 0:01:58 lr 0.000388 wd 0.0500 time 0.4076 (0.4169) data time 0.0009 (0.0021) model time 0.4068 (0.4094) loss 7.7427 (6.9142) grad_norm 3.5675 (inf) loss_scale 256.0000 (317.5601) mem 14939MB [2024-07-25 06:10:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][350/625] eta 0:01:54 lr 0.000388 wd 0.0500 time 0.3973 (0.4163) data time 0.0009 (0.0020) model time 0.3964 (0.4090) loss 7.5044 (6.9215) grad_norm 2.5744 (inf) loss_scale 256.0000 (315.8063) mem 14939MB [2024-07-25 06:10:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][360/625] eta 0:01:50 lr 0.000387 wd 0.0500 time 0.3984 (0.4158) data time 0.0008 (0.0020) model time 0.3976 (0.4087) loss 6.6106 (6.9267) grad_norm 2.5740 (inf) loss_scale 256.0000 (314.1496) mem 14939MB [2024-07-25 06:10:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][370/625] eta 0:01:45 lr 0.000387 wd 0.0500 time 0.3941 (0.4154) data time 0.0007 (0.0020) model time 0.3934 (0.4084) loss 7.5035 (6.9319) grad_norm 1.8492 (inf) loss_scale 256.0000 (312.5822) mem 14939MB [2024-07-25 06:10:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][380/625] eta 0:01:41 lr 0.000387 wd 0.0500 time 0.4026 (0.4150) data time 0.0008 (0.0019) model time 0.4018 (0.4080) loss 
8.1146 (6.9314) grad_norm 2.0657 (inf) loss_scale 256.0000 (311.0971) mem 14939MB [2024-07-25 06:10:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][390/625] eta 0:01:37 lr 0.000387 wd 0.0500 time 0.3973 (0.4147) data time 0.0009 (0.0019) model time 0.3964 (0.4078) loss 7.5529 (6.9350) grad_norm 2.8874 (inf) loss_scale 256.0000 (309.6880) mem 14939MB [2024-07-25 06:10:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][400/625] eta 0:01:33 lr 0.000387 wd 0.0500 time 0.3958 (0.4143) data time 0.0007 (0.0019) model time 0.3951 (0.4076) loss 5.6116 (6.9350) grad_norm 5.9890 (inf) loss_scale 256.0000 (308.3491) mem 14939MB [2024-07-25 06:10:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][410/625] eta 0:01:28 lr 0.000387 wd 0.0500 time 0.3973 (0.4139) data time 0.0007 (0.0019) model time 0.3966 (0.4073) loss 8.5770 (6.9483) grad_norm 4.3564 (inf) loss_scale 256.0000 (307.0754) mem 14939MB [2024-07-25 06:10:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][420/625] eta 0:01:24 lr 0.000387 wd 0.0500 time 0.3961 (0.4135) data time 0.0008 (0.0018) model time 0.3953 (0.4070) loss 6.2496 (6.9574) grad_norm 3.4887 (inf) loss_scale 256.0000 (305.8622) mem 14939MB [2024-07-25 06:10:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][430/625] eta 0:01:20 lr 0.000387 wd 0.0500 time 0.3957 (0.4136) data time 0.0007 (0.0018) model time 0.3950 (0.4073) loss 7.5904 (6.9609) grad_norm 2.3562 (inf) loss_scale 256.0000 (304.7053) mem 14939MB [2024-07-25 06:10:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][440/625] eta 0:01:16 lr 0.000387 wd 0.0500 time 0.5712 (0.4137) data time 0.0006 (0.0018) model time 0.5706 (0.4075) loss 6.5589 (6.9673) grad_norm 2.1982 (inf) loss_scale 256.0000 (303.6009) mem 14939MB [2024-07-25 06:10:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[193/300][450/625] eta 0:01:12 lr 0.000387 wd 0.0500 time 0.3984 (0.4146) data time 0.0009 (0.0018) model time 0.3975 (0.4087) loss 7.3288 (6.9630) grad_norm 2.6611 (inf) loss_scale 256.0000 (302.5455) mem 14939MB [2024-07-25 06:11:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][460/625] eta 0:01:08 lr 0.000386 wd 0.0500 time 0.5873 (0.4154) data time 0.0007 (0.0018) model time 0.5866 (0.4097) loss 7.2633 (6.9632) grad_norm 4.7620 (inf) loss_scale 256.0000 (301.5358) mem 14939MB [2024-07-25 06:11:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][470/625] eta 0:01:04 lr 0.000386 wd 0.0500 time 0.3996 (0.4170) data time 0.0008 (0.0017) model time 0.3987 (0.4116) loss 7.1794 (6.9617) grad_norm 3.7697 (inf) loss_scale 256.0000 (300.5690) mem 14939MB [2024-07-25 06:11:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][480/625] eta 0:01:00 lr 0.000386 wd 0.0500 time 0.4022 (0.4184) data time 0.0009 (0.0017) model time 0.4013 (0.4132) loss 6.5968 (6.9562) grad_norm 3.4477 (inf) loss_scale 256.0000 (299.6424) mem 14939MB [2024-07-25 06:11:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][490/625] eta 0:00:56 lr 0.000386 wd 0.0500 time 0.3944 (0.4180) data time 0.0009 (0.0017) model time 0.3935 (0.4129) loss 6.7525 (6.9468) grad_norm 2.4781 (inf) loss_scale 256.0000 (298.7536) mem 14939MB [2024-07-25 06:11:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][500/625] eta 0:00:52 lr 0.000386 wd 0.0500 time 0.3986 (0.4176) data time 0.0007 (0.0017) model time 0.3979 (0.4126) loss 7.9879 (6.9525) grad_norm 3.2328 (inf) loss_scale 256.0000 (297.9002) mem 14939MB [2024-07-25 06:11:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][510/625] eta 0:00:47 lr 0.000386 wd 0.0500 time 0.4004 (0.4172) data time 0.0006 (0.0017) model time 0.3998 (0.4123) loss 7.6715 (6.9552) grad_norm 2.4216 (inf) 
loss_scale 256.0000 (297.0802) mem 14939MB [2024-07-25 06:11:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][520/625] eta 0:00:43 lr 0.000386 wd 0.0500 time 0.4002 (0.4169) data time 0.0009 (0.0017) model time 0.3993 (0.4120) loss 7.7408 (6.9529) grad_norm 1.9442 (inf) loss_scale 256.0000 (296.2917) mem 14939MB [2024-07-25 06:11:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][530/625] eta 0:00:39 lr 0.000386 wd 0.0500 time 0.3993 (0.4166) data time 0.0006 (0.0016) model time 0.3987 (0.4117) loss 6.9892 (6.9548) grad_norm 1.6548 (inf) loss_scale 256.0000 (295.5330) mem 14939MB [2024-07-25 06:11:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][540/625] eta 0:00:35 lr 0.000386 wd 0.0500 time 0.3977 (0.4163) data time 0.0007 (0.0016) model time 0.3970 (0.4115) loss 5.7452 (6.9530) grad_norm 2.1160 (inf) loss_scale 256.0000 (294.8022) mem 14939MB [2024-07-25 06:11:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][550/625] eta 0:00:31 lr 0.000386 wd 0.0500 time 0.4000 (0.4160) data time 0.0007 (0.0016) model time 0.3993 (0.4112) loss 7.7029 (6.9562) grad_norm 3.1224 (inf) loss_scale 256.0000 (294.0980) mem 14939MB [2024-07-25 06:11:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][560/625] eta 0:00:27 lr 0.000386 wd 0.0500 time 0.3982 (0.4156) data time 0.0008 (0.0016) model time 0.3973 (0.4109) loss 7.0875 (6.9545) grad_norm 2.5920 (inf) loss_scale 256.0000 (293.4189) mem 14939MB [2024-07-25 06:11:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][570/625] eta 0:00:22 lr 0.000385 wd 0.0500 time 0.3943 (0.4153) data time 0.0006 (0.0016) model time 0.3936 (0.4106) loss 7.0474 (6.9567) grad_norm 4.3858 (inf) loss_scale 256.0000 (292.7636) mem 14939MB [2024-07-25 06:11:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][580/625] eta 0:00:18 lr 0.000385 
wd 0.0500 time 0.3976 (0.4151) data time 0.0007 (0.0016) model time 0.3969 (0.4104) loss 7.4547 (6.9540) grad_norm 2.5810 (inf) loss_scale 256.0000 (292.1308) mem 14939MB [2024-07-25 06:11:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][590/625] eta 0:00:14 lr 0.000385 wd 0.0500 time 0.4019 (0.4148) data time 0.0008 (0.0016) model time 0.4011 (0.4102) loss 6.4683 (6.9502) grad_norm 4.1137 (inf) loss_scale 256.0000 (291.5195) mem 14939MB [2024-07-25 06:12:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][600/625] eta 0:00:10 lr 0.000385 wd 0.0500 time 0.4045 (0.4146) data time 0.0008 (0.0016) model time 0.4037 (0.4100) loss 8.1816 (6.9534) grad_norm 2.7223 (inf) loss_scale 256.0000 (290.9285) mem 14939MB [2024-07-25 06:12:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][610/625] eta 0:00:06 lr 0.000385 wd 0.0500 time 0.3961 (0.4143) data time 0.0004 (0.0016) model time 0.3957 (0.4098) loss 6.3338 (6.9440) grad_norm 2.6778 (inf) loss_scale 256.0000 (290.3568) mem 14939MB [2024-07-25 06:12:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][620/625] eta 0:00:02 lr 0.000385 wd 0.0500 time 0.3967 (0.4141) data time 0.0004 (0.0015) model time 0.3963 (0.4096) loss 6.1725 (6.9438) grad_norm 3.4344 (inf) loss_scale 256.0000 (289.8035) mem 14939MB [2024-07-25 06:12:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 193 training takes 0:04:18 [2024-07-25 06:12:11 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 06:12:11 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 06:12:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.450 (0.450) Loss 0.5610 (0.5610) Acc@1 89.307 (89.307) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 06:12:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.8945 (0.6994) Acc@1 80.957 (86.084) Acc@5 95.996 (97.590) Mem 14939MB [2024-07-25 06:12:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.103) Loss 1.0205 (0.8176) Acc@1 76.221 (82.924) Acc@5 95.020 (96.366) Mem 14939MB [2024-07-25 06:12:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.494 Acc@5 96.337 [2024-07-25 06:12:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.5% [2024-07-25 06:12:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.779 (0.779) Loss 0.5483 (0.5483) Acc@1 89.941 (89.941) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 06:12:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.153) Loss 0.8608 (0.6811) Acc@1 81.494 (86.421) Acc@5 96.191 (97.745) Mem 14939MB [2024-07-25 06:12:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.121) Loss 0.9907 (0.7962) Acc@1 76.904 (83.273) Acc@5 95.410 (96.622) Mem 14939MB [2024-07-25 06:12:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.873 Acc@5 96.585 [2024-07-25 06:12:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.9% [2024-07-25 06:12:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][0/625] eta 0:13:24 lr 0.000385 wd 0.0500 time 1.2877 (1.2877) data time 0.7163 (0.7163) model time 0.0000 (0.0000) loss 7.4377 (7.4377) grad_norm 3.8841 (3.8841) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:12:22 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][10/625] eta 0:04:55 lr 0.000385 wd 0.0500 time 0.3961 (0.4797) data time 0.0006 (0.0659) model time 0.0000 (0.0000) loss 5.1149 (6.6845) grad_norm 3.1530 (3.2600) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:12:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][20/625] eta 0:04:32 lr 0.000385 wd 0.0500 time 0.3773 (0.4500) data time 0.0007 (0.0349) model time 0.0000 (0.0000) loss 6.5872 (6.7914) grad_norm 1.8035 (2.8440) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:12:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][30/625] eta 0:04:24 lr 0.000385 wd 0.0500 time 0.4042 (0.4441) data time 0.0006 (0.0239) model time 0.0000 (0.0000) loss 7.8853 (7.0175) grad_norm 1.4865 (2.6287) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:12:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][40/625] eta 0:04:23 lr 0.000384 wd 0.0500 time 0.6152 (0.4503) data time 0.0007 (0.0183) model time 0.0000 (0.0000) loss 5.8432 (6.9225) grad_norm 3.0123 (2.6087) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:12:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][50/625] eta 0:04:17 lr 0.000384 wd 0.0500 time 0.6195 (0.4486) data time 0.0006 (0.0149) model time 0.0000 (0.0000) loss 6.9571 (6.9144) grad_norm 4.8439 (2.8644) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:12:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][60/625] eta 0:04:13 lr 0.000384 wd 0.0500 time 0.3978 (0.4484) data time 0.0008 (0.0126) model time 0.3970 (0.4468) loss 5.8208 (6.8905) grad_norm 2.2156 (2.8518) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:12:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][70/625] eta 0:04:14 lr 0.000384 wd 0.0500 time 0.4065 (0.4586) data time 0.0007 
(0.0110) model time 0.4058 (0.4831) loss 8.0440 (6.8519) grad_norm 2.5286 (2.8487) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:12:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][80/625] eta 0:04:06 lr 0.000384 wd 0.0500 time 0.3985 (0.4526) data time 0.0006 (0.0098) model time 0.3979 (0.4584) loss 6.6325 (6.8206) grad_norm 2.0574 (2.8445) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:12:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][90/625] eta 0:03:59 lr 0.000384 wd 0.0500 time 0.3945 (0.4468) data time 0.0009 (0.0088) model time 0.3937 (0.4434) loss 7.0568 (6.8378) grad_norm 1.7957 (2.8620) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:13:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][100/625] eta 0:03:52 lr 0.000384 wd 0.0500 time 0.3948 (0.4421) data time 0.0006 (0.0080) model time 0.3942 (0.4344) loss 7.6704 (6.8043) grad_norm 2.4568 (2.8205) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:13:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][110/625] eta 0:03:45 lr 0.000384 wd 0.0500 time 0.3977 (0.4381) data time 0.0006 (0.0074) model time 0.3970 (0.4282) loss 6.1324 (6.7934) grad_norm 2.0851 (2.7665) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:13:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][120/625] eta 0:03:39 lr 0.000384 wd 0.0500 time 0.3962 (0.4349) data time 0.0007 (0.0068) model time 0.3955 (0.4240) loss 7.8574 (6.8129) grad_norm 2.4748 (2.7451) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:13:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][130/625] eta 0:03:34 lr 0.000384 wd 0.0500 time 0.3968 (0.4326) data time 0.0008 (0.0064) model time 0.3960 (0.4215) loss 8.0329 (6.8348) grad_norm 16.9589 (2.8299) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:13:18 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][140/625] eta 0:03:28 lr 0.000383 wd 0.0500 time 0.3984 (0.4303) data time 0.0008 (0.0060) model time 0.3976 (0.4190) loss 6.8884 (6.8637) grad_norm 2.4746 (2.8171) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:13:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][150/625] eta 0:03:23 lr 0.000383 wd 0.0500 time 0.3921 (0.4285) data time 0.0008 (0.0057) model time 0.3914 (0.4173) loss 5.6605 (6.8755) grad_norm 2.7279 (2.8389) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:13:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][160/625] eta 0:03:18 lr 0.000383 wd 0.0500 time 0.4013 (0.4270) data time 0.0007 (0.0054) model time 0.4006 (0.4161) loss 7.0582 (6.8599) grad_norm 2.9647 (2.8995) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:13:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][170/625] eta 0:03:13 lr 0.000383 wd 0.0500 time 0.3965 (0.4254) data time 0.0012 (0.0051) model time 0.3953 (0.4146) loss 7.5021 (6.8652) grad_norm 2.3129 (2.9003) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:13:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][180/625] eta 0:03:08 lr 0.000383 wd 0.0500 time 0.4091 (0.4241) data time 0.0007 (0.0049) model time 0.4084 (0.4135) loss 6.6608 (6.8779) grad_norm 2.3884 (2.8980) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:13:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][190/625] eta 0:03:03 lr 0.000383 wd 0.0500 time 0.3939 (0.4230) data time 0.0006 (0.0049) model time 0.3933 (0.4123) loss 7.5911 (6.8778) grad_norm 2.8744 (2.9197) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:13:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][200/625] eta 0:02:59 lr 0.000383 wd 0.0500 time 0.3974 (0.4217) data time 
0.0006 (0.0047) model time 0.3968 (0.4114) loss 6.5044 (6.8880) grad_norm 3.1209 (2.9322) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:13:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][210/625] eta 0:02:54 lr 0.000383 wd 0.0500 time 0.3949 (0.4206) data time 0.0006 (0.0046) model time 0.3943 (0.4105) loss 6.6977 (6.8839) grad_norm 1.9747 (2.9410) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:13:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][220/625] eta 0:02:49 lr 0.000383 wd 0.0500 time 0.3946 (0.4196) data time 0.0008 (0.0044) model time 0.3938 (0.4097) loss 5.9878 (6.8831) grad_norm 3.5308 (2.9654) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:13:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][230/625] eta 0:02:45 lr 0.000383 wd 0.0500 time 0.4038 (0.4188) data time 0.0008 (0.0042) model time 0.4030 (0.4092) loss 6.9409 (6.8857) grad_norm 2.3163 (2.9892) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:13:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][240/625] eta 0:02:41 lr 0.000382 wd 0.0500 time 0.3980 (0.4186) data time 0.0006 (0.0041) model time 0.3974 (0.4094) loss 6.1429 (6.8879) grad_norm 2.5771 (3.0109) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:14:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][250/625] eta 0:02:37 lr 0.000382 wd 0.0500 time 0.3976 (0.4191) data time 0.0007 (0.0040) model time 0.3969 (0.4104) loss 8.1776 (6.8998) grad_norm 1.9083 (3.0198) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:14:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][260/625] eta 0:02:33 lr 0.000382 wd 0.0500 time 0.3969 (0.4197) data time 0.0007 (0.0038) model time 0.3962 (0.4115) loss 6.3739 (6.8907) grad_norm 2.2523 (3.0396) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:14:11 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][270/625] eta 0:02:29 lr 0.000382 wd 0.0500 time 0.6229 (0.4225) data time 0.0007 (0.0037) model time 0.6222 (0.4153) loss 6.5173 (6.8861) grad_norm 2.2369 (3.0308) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:14:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][280/625] eta 0:02:26 lr 0.000382 wd 0.0500 time 0.5767 (0.4233) data time 0.0008 (0.0037) model time 0.5759 (0.4165) loss 5.8124 (6.8655) grad_norm 1.8818 (3.0187) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:14:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][290/625] eta 0:02:22 lr 0.000382 wd 0.0500 time 0.5994 (0.4267) data time 0.0009 (0.0036) model time 0.5984 (0.4209) loss 7.0236 (6.8597) grad_norm 3.6201 (3.0447) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:14:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][300/625] eta 0:02:18 lr 0.000382 wd 0.0500 time 0.3967 (0.4265) data time 0.0006 (0.0035) model time 0.3960 (0.4209) loss 6.3098 (6.8623) grad_norm 5.1300 (3.0546) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:14:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][310/625] eta 0:02:14 lr 0.000382 wd 0.0500 time 0.3986 (0.4257) data time 0.0006 (0.0034) model time 0.3980 (0.4201) loss 6.3678 (6.8629) grad_norm 3.1558 (3.0767) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:14:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][320/625] eta 0:02:09 lr 0.000382 wd 0.0500 time 0.4038 (0.4249) data time 0.0009 (0.0033) model time 0.4029 (0.4193) loss 7.4129 (6.8741) grad_norm 2.6600 (3.0808) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:14:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][330/625] eta 0:02:05 lr 0.000382 wd 0.0500 time 0.3970 (0.4241) data time 
0.0007 (0.0033) model time 0.3964 (0.4186) loss 5.8883 (6.8731) grad_norm 2.8470 (3.0664) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:14:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][340/625] eta 0:02:00 lr 0.000381 wd 0.0500 time 0.3999 (0.4234) data time 0.0007 (0.0032) model time 0.3993 (0.4179) loss 7.3402 (6.8627) grad_norm 2.4171 (3.0726) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:14:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][350/625] eta 0:01:56 lr 0.000381 wd 0.0500 time 0.4015 (0.4227) data time 0.0007 (0.0031) model time 0.4008 (0.4172) loss 7.9537 (6.8640) grad_norm 3.0884 (3.0612) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:14:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][360/625] eta 0:01:51 lr 0.000381 wd 0.0500 time 0.3945 (0.4221) data time 0.0007 (0.0031) model time 0.3938 (0.4166) loss 7.1651 (6.8582) grad_norm 2.0817 (3.0498) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:14:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][370/625] eta 0:01:47 lr 0.000381 wd 0.0500 time 0.3999 (0.4215) data time 0.0007 (0.0030) model time 0.3992 (0.4161) loss 6.7521 (6.8584) grad_norm 2.5616 (3.0386) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:14:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][380/625] eta 0:01:43 lr 0.000381 wd 0.0500 time 0.4008 (0.4210) data time 0.0006 (0.0029) model time 0.4001 (0.4156) loss 5.7204 (6.8579) grad_norm 2.5989 (3.0276) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:15:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][390/625] eta 0:01:38 lr 0.000381 wd 0.0500 time 0.3973 (0.4204) data time 0.0008 (0.0029) model time 0.3964 (0.4151) loss 6.1803 (6.8533) grad_norm 2.3890 (3.0184) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:15:05 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][400/625] eta 0:01:34 lr 0.000381 wd 0.0500 time 0.4007 (0.4199) data time 0.0008 (0.0028) model time 0.3999 (0.4146) loss 8.2005 (6.8640) grad_norm 2.6302 (3.0053) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:15:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][410/625] eta 0:01:30 lr 0.000381 wd 0.0500 time 0.4220 (0.4194) data time 0.0007 (0.0028) model time 0.4213 (0.4142) loss 6.5740 (6.8616) grad_norm 2.9771 (3.0064) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:15:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][420/625] eta 0:01:25 lr 0.000381 wd 0.0500 time 0.3969 (0.4189) data time 0.0008 (0.0028) model time 0.3961 (0.4138) loss 5.6197 (6.8571) grad_norm 2.1690 (2.9911) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:15:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][430/625] eta 0:01:21 lr 0.000381 wd 0.0500 time 0.3965 (0.4185) data time 0.0009 (0.0027) model time 0.3956 (0.4134) loss 7.4123 (6.8607) grad_norm 2.7160 (2.9803) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:15:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][440/625] eta 0:01:17 lr 0.000381 wd 0.0500 time 0.4013 (0.4181) data time 0.0007 (0.0027) model time 0.4006 (0.4131) loss 7.4956 (6.8546) grad_norm 2.6470 (2.9778) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:15:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][450/625] eta 0:01:13 lr 0.000380 wd 0.0500 time 0.3963 (0.4177) data time 0.0008 (0.0026) model time 0.3955 (0.4127) loss 7.0019 (6.8543) grad_norm 4.5929 (2.9650) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:15:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][460/625] eta 0:01:08 lr 0.000380 wd 0.0500 time 0.3963 (0.4177) data time 
0.0006 (0.0026) model time 0.3958 (0.4128) loss 7.4632 (6.8659) grad_norm 2.5231 (2.9624) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:15:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][470/625] eta 0:01:04 lr 0.000380 wd 0.0500 time 0.3978 (0.4177) data time 0.0008 (0.0026) model time 0.3969 (0.4129) loss 7.1342 (6.8639) grad_norm 3.3150 (2.9507) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:15:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][480/625] eta 0:01:00 lr 0.000380 wd 0.0500 time 0.4006 (0.4180) data time 0.0009 (0.0025) model time 0.3998 (0.4134) loss 6.7753 (6.8613) grad_norm 1.9423 (2.9382) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:15:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][490/625] eta 0:00:56 lr 0.000380 wd 0.0500 time 0.3967 (0.4187) data time 0.0006 (0.0025) model time 0.3960 (0.4142) loss 5.8306 (6.8490) grad_norm 2.4499 (2.9379) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:15:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][500/625] eta 0:00:52 lr 0.000380 wd 0.0500 time 0.3976 (0.4191) data time 0.0008 (0.0024) model time 0.3968 (0.4147) loss 7.3031 (6.8505) grad_norm 2.2575 (2.9426) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:15:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][510/625] eta 0:00:48 lr 0.000380 wd 0.0500 time 0.3998 (0.4204) data time 0.0006 (0.0024) model time 0.3991 (0.4163) loss 5.8401 (6.8436) grad_norm 3.5014 (2.9447) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:15:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][520/625] eta 0:00:44 lr 0.000380 wd 0.0500 time 0.3988 (0.4204) data time 0.0008 (0.0024) model time 0.3980 (0.4163) loss 6.9609 (6.8451) grad_norm 2.1540 (2.9502) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:16:00 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][530/625] eta 0:00:39 lr 0.000380 wd 0.0500 time 0.3999 (0.4200) data time 0.0009 (0.0024) model time 0.3990 (0.4159) loss 5.9207 (6.8456) grad_norm 2.0145 (2.9553) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:16:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][540/625] eta 0:00:35 lr 0.000380 wd 0.0500 time 0.3982 (0.4196) data time 0.0009 (0.0023) model time 0.3973 (0.4156) loss 6.6358 (6.8444) grad_norm 2.4818 (2.9558) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:16:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][550/625] eta 0:00:31 lr 0.000379 wd 0.0500 time 0.3983 (0.4193) data time 0.0008 (0.0023) model time 0.3976 (0.4153) loss 5.7261 (6.8531) grad_norm 2.1568 (2.9459) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:16:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][560/625] eta 0:00:27 lr 0.000379 wd 0.0500 time 0.3994 (0.4189) data time 0.0007 (0.0023) model time 0.3986 (0.4149) loss 7.9311 (6.8522) grad_norm 2.7270 (2.9491) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:16:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][570/625] eta 0:00:23 lr 0.000379 wd 0.0500 time 0.4017 (0.4186) data time 0.0008 (0.0023) model time 0.4009 (0.4146) loss 5.9164 (6.8533) grad_norm 3.1857 (2.9444) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:16:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][580/625] eta 0:00:18 lr 0.000379 wd 0.0500 time 0.4006 (0.4183) data time 0.0009 (0.0022) model time 0.3997 (0.4144) loss 7.5730 (6.8611) grad_norm 2.6572 (2.9530) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:16:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][590/625] eta 0:00:14 lr 0.000379 wd 0.0500 time 0.4049 (0.4180) data time 
0.0008 (0.0022) model time 0.4041 (0.4141) loss 7.8723 (6.8617) grad_norm 2.2609 (2.9468) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:16:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][600/625] eta 0:00:10 lr 0.000379 wd 0.0500 time 0.4011 (0.4177) data time 0.0007 (0.0022) model time 0.4004 (0.4138) loss 7.6036 (6.8637) grad_norm 3.1443 (2.9510) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:16:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][610/625] eta 0:00:06 lr 0.000379 wd 0.0500 time 0.3979 (0.4175) data time 0.0006 (0.0022) model time 0.3973 (0.4136) loss 7.4381 (6.8665) grad_norm 1.8208 (2.9498) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:16:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][620/625] eta 0:00:02 lr 0.000379 wd 0.0500 time 0.3968 (0.4172) data time 0.0006 (0.0022) model time 0.3961 (0.4133) loss 5.5252 (6.8710) grad_norm 3.0074 (2.9633) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:16:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 194 training takes 0:04:20 [2024-07-25 06:16:38 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 06:16:38 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 06:16:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.481 (0.481) Loss 0.5508 (0.5508) Acc@1 89.795 (89.795) Acc@5 98.730 (98.730) Mem 14939MB [2024-07-25 06:16:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.9043 (0.6967) Acc@1 79.639 (85.973) Acc@5 96.143 (97.625) Mem 14939MB [2024-07-25 06:16:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9775 (0.8149) Acc@1 77.686 (82.792) Acc@5 95.312 (96.477) Mem 14939MB [2024-07-25 06:16:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.398 Acc@5 96.447 [2024-07-25 06:16:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-07-25 06:16:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.836 (0.836) Loss 0.5479 (0.5479) Acc@1 89.990 (89.990) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 06:16:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.157) Loss 0.8599 (0.6801) Acc@1 81.494 (86.488) Acc@5 96.240 (97.745) Mem 14939MB [2024-07-25 06:16:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 0.9897 (0.7950) Acc@1 77.051 (83.347) Acc@5 95.410 (96.615) Mem 14939MB [2024-07-25 06:16:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.951 Acc@5 96.577 [2024-07-25 06:16:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.0% [2024-07-25 06:16:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.95% [2024-07-25 06:16:44 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 06:16:45 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 06:16:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][0/625] eta 0:08:10 lr 0.000379 wd 0.0500 time 0.7841 (0.7841) data time 0.4073 (0.4073) model time 0.0000 (0.0000) loss 6.4466 (6.4466) grad_norm 2.3458 (2.3458) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:16:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][10/625] eta 0:04:27 lr 0.000379 wd 0.0500 time 0.3984 (0.4350) data time 0.0010 (0.0378) model time 0.0000 (0.0000) loss 7.9942 (6.9854) grad_norm 5.1667 (2.8018) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:16:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][20/625] eta 0:04:12 lr 0.000378 wd 0.0500 time 0.3986 (0.4181) data time 0.0007 (0.0203) model time 0.0000 (0.0000) loss 7.0959 (6.9202) grad_norm 4.1116 (3.2493) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:16:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][30/625] eta 0:04:06 lr 0.000378 wd 0.0500 time 0.4002 (0.4137) data time 0.0008 (0.0140) model time 0.0000 (0.0000) loss 7.5466 (6.8780) grad_norm 8.2579 (3.8616) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:17:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][40/625] eta 0:04:00 lr 0.000378 wd 0.0500 time 0.4136 (0.4111) data time 0.0007 (0.0108) model time 0.0000 (0.0000) loss 7.3371 (6.9262) grad_norm 3.8876 (3.8365) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:17:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][50/625] eta 0:03:55 lr 0.000378 wd 0.0500 time 0.3947 (0.4090) data time 0.0007 (0.0089) model time 0.0000 (0.0000) loss 7.6592 (6.8965) grad_norm 2.1963 (3.6035) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 06:17:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][60/625] eta 0:03:50 lr 0.000378 wd 0.0500 time 0.3986 (0.4074) data time 0.0007 (0.0077) model time 0.3979 (0.3973) loss 6.5653 (6.8872) grad_norm 2.9831 (3.5056) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:17:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][70/625] eta 0:03:49 lr 0.000378 wd 0.0500 time 0.5619 (0.4133) data time 0.0009 (0.0068) model time 0.5610 (0.4227) loss 6.2908 (6.9252) grad_norm 1.9577 (3.4004) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:17:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][80/625] eta 0:03:47 lr 0.000378 wd 0.0500 time 0.6024 (0.4181) data time 0.0007 (0.0061) model time 0.6017 (0.4323) loss 6.1189 (6.8993) grad_norm 1.9444 (3.4177) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:17:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][90/625] eta 0:03:45 lr 0.000378 wd 0.0500 time 0.3940 (0.4215) data time 0.0008 (0.0055) model time 0.3931 (0.4363) loss 6.9573 (6.8993) grad_norm 2.3373 (3.3055) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:17:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][100/625] eta 0:03:44 lr 0.000378 wd 0.0500 time 0.5706 (0.4273) data time 0.0009 (0.0050) model time 0.5697 (0.4449) loss 7.1729 (6.8851) grad_norm 2.4776 (3.2567) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:17:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][110/625] eta 0:03:41 lr 0.000378 wd 0.0500 time 0.3970 (0.4310) data time 0.0007 (0.0047) model time 0.3963 (0.4485) loss 5.8955 (6.8642) grad_norm 4.0180 (3.1910) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:17:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][120/625] eta 0:03:36 lr 0.000378 wd 0.0500 time 
0.3987 (0.4297) data time 0.0009 (0.0044) model time 0.3979 (0.4436) loss 7.9386 (6.8516) grad_norm 2.3489 (3.1309) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:17:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][130/625] eta 0:03:31 lr 0.000377 wd 0.0500 time 0.3982 (0.4272) data time 0.0009 (0.0041) model time 0.3973 (0.4378) loss 7.8739 (6.8849) grad_norm 2.2898 (3.1148) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:17:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][140/625] eta 0:03:26 lr 0.000377 wd 0.0500 time 0.4035 (0.4252) data time 0.0009 (0.0039) model time 0.4026 (0.4333) loss 7.4289 (6.9122) grad_norm 1.7479 (3.0860) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:17:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][150/625] eta 0:03:21 lr 0.000377 wd 0.0500 time 0.3987 (0.4234) data time 0.0007 (0.0037) model time 0.3980 (0.4297) loss 6.3112 (6.9116) grad_norm 2.2720 (3.0634) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:17:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][160/625] eta 0:03:16 lr 0.000377 wd 0.0500 time 0.3995 (0.4218) data time 0.0007 (0.0035) model time 0.3987 (0.4268) loss 7.3342 (6.9045) grad_norm 1.9182 (3.0262) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:17:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][170/625] eta 0:03:11 lr 0.000377 wd 0.0500 time 0.3982 (0.4205) data time 0.0009 (0.0034) model time 0.3973 (0.4243) loss 6.0847 (6.8868) grad_norm 1.7451 (2.9953) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:18:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][180/625] eta 0:03:06 lr 0.000377 wd 0.0500 time 0.3982 (0.4192) data time 0.0007 (0.0032) model time 0.3975 (0.4222) loss 5.4870 (6.8767) grad_norm 2.8088 (2.9585) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 06:18:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][190/625] eta 0:03:01 lr 0.000377 wd 0.0500 time 0.3993 (0.4183) data time 0.0007 (0.0031) model time 0.3987 (0.4206) loss 6.1494 (6.8553) grad_norm 3.2406 (2.9693) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:18:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][200/625] eta 0:02:57 lr 0.000377 wd 0.0500 time 0.3970 (0.4174) data time 0.0009 (0.0030) model time 0.3960 (0.4193) loss 7.4987 (6.8618) grad_norm 2.4275 (2.9465) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:18:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][210/625] eta 0:02:52 lr 0.000377 wd 0.0500 time 0.4079 (0.4166) data time 0.0008 (0.0029) model time 0.4071 (0.4180) loss 7.4811 (6.8723) grad_norm 2.6440 (2.9223) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:18:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][220/625] eta 0:02:48 lr 0.000377 wd 0.0500 time 0.3982 (0.4168) data time 0.0007 (0.0028) model time 0.3975 (0.4181) loss 6.6686 (6.8826) grad_norm 2.1688 (2.8860) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:18:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][230/625] eta 0:02:44 lr 0.000376 wd 0.0500 time 0.4034 (0.4161) data time 0.0007 (0.0027) model time 0.4027 (0.4171) loss 6.1794 (6.8818) grad_norm 2.2393 (2.8534) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:18:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][240/625] eta 0:02:39 lr 0.000376 wd 0.0500 time 0.3993 (0.4154) data time 0.0007 (0.0027) model time 0.3986 (0.4161) loss 6.6385 (6.8894) grad_norm 1.9552 (2.8250) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:18:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][250/625] eta 0:02:35 lr 0.000376 wd 0.0500 time 
0.3994 (0.4148) data time 0.0007 (0.0026) model time 0.3987 (0.4153) loss 7.8411 (6.8908) grad_norm 3.7110 (2.8304) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:18:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][260/625] eta 0:02:31 lr 0.000376 wd 0.0500 time 0.3962 (0.4142) data time 0.0007 (0.0025) model time 0.3955 (0.4144) loss 5.9523 (6.8850) grad_norm 2.6464 (2.8122) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:18:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][270/625] eta 0:02:26 lr 0.000376 wd 0.0500 time 0.4009 (0.4136) data time 0.0007 (0.0025) model time 0.4002 (0.4137) loss 5.6363 (6.8710) grad_norm 3.5410 (2.8153) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:18:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][280/625] eta 0:02:22 lr 0.000376 wd 0.0500 time 0.3978 (0.4130) data time 0.0008 (0.0024) model time 0.3970 (0.4129) loss 6.5677 (6.8664) grad_norm 2.9010 (2.8114) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:18:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][290/625] eta 0:02:18 lr 0.000376 wd 0.0500 time 0.4038 (0.4138) data time 0.0009 (0.0024) model time 0.4029 (0.4138) loss 5.7823 (6.8621) grad_norm 2.3129 (2.7989) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:18:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][300/625] eta 0:02:15 lr 0.000376 wd 0.0500 time 0.5877 (0.4157) data time 0.0007 (0.0023) model time 0.5870 (0.4161) loss 5.3465 (6.8603) grad_norm 4.1805 (2.7911) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:18:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][310/625] eta 0:02:11 lr 0.000376 wd 0.0500 time 0.3992 (0.4166) data time 0.0008 (0.0023) model time 0.3984 (0.4171) loss 5.9481 (6.8548) grad_norm 2.0120 (2.7826) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 06:18:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][320/625] eta 0:02:07 lr 0.000376 wd 0.0500 time 0.5758 (0.4183) data time 0.0009 (0.0022) model time 0.5750 (0.4191) loss 6.5199 (6.8580) grad_norm 4.0802 (2.7944) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:19:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][330/625] eta 0:02:03 lr 0.000375 wd 0.0500 time 0.4099 (0.4196) data time 0.0007 (0.0022) model time 0.4092 (0.4206) loss 6.6999 (6.8526) grad_norm 5.2715 (2.8490) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:19:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][340/625] eta 0:01:59 lr 0.000375 wd 0.0500 time 0.4037 (0.4195) data time 0.0007 (0.0022) model time 0.4030 (0.4204) loss 5.9439 (6.8605) grad_norm 2.8681 (2.8723) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:19:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][350/625] eta 0:01:55 lr 0.000375 wd 0.0500 time 0.3965 (0.4190) data time 0.0008 (0.0021) model time 0.3957 (0.4197) loss 7.4352 (6.8546) grad_norm 3.2474 (2.8722) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:19:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][360/625] eta 0:01:50 lr 0.000375 wd 0.0500 time 0.3970 (0.4184) data time 0.0009 (0.0021) model time 0.3962 (0.4190) loss 6.7260 (6.8518) grad_norm 2.5265 (2.8748) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:19:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][370/625] eta 0:01:46 lr 0.000375 wd 0.0500 time 0.3979 (0.4179) data time 0.0007 (0.0021) model time 0.3973 (0.4183) loss 7.5479 (6.8650) grad_norm 2.2644 (2.9812) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:19:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][380/625] eta 0:01:42 lr 0.000375 wd 0.0500 time 
0.3966 (0.4174) data time 0.0007 (0.0020) model time 0.3959 (0.4177) loss 7.5361 (6.8565) grad_norm 2.7641 (2.9790) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:19:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][390/625] eta 0:01:37 lr 0.000375 wd 0.0500 time 0.3988 (0.4170) data time 0.0007 (0.0020) model time 0.3981 (0.4172) loss 6.8812 (6.8533) grad_norm 2.0319 (2.9765) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:19:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][400/625] eta 0:01:33 lr 0.000375 wd 0.0500 time 0.4006 (0.4165) data time 0.0009 (0.0020) model time 0.3997 (0.4167) loss 7.7700 (6.8553) grad_norm 2.5929 (2.9704) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:19:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][410/625] eta 0:01:29 lr 0.000375 wd 0.0500 time 0.3993 (0.4161) data time 0.0008 (0.0019) model time 0.3985 (0.4162) loss 6.4746 (6.8495) grad_norm 2.1909 (2.9588) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:19:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][420/625] eta 0:01:25 lr 0.000375 wd 0.0500 time 0.4001 (0.4157) data time 0.0006 (0.0019) model time 0.3994 (0.4157) loss 6.3715 (6.8531) grad_norm 2.0755 (2.9469) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:19:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][430/625] eta 0:01:21 lr 0.000374 wd 0.0500 time 0.4298 (0.4154) data time 0.0009 (0.0019) model time 0.4289 (0.4153) loss 7.3839 (6.8566) grad_norm 2.7782 (2.9581) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:19:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][440/625] eta 0:01:16 lr 0.000374 wd 0.0500 time 0.3939 (0.4154) data time 0.0007 (0.0019) model time 0.3932 (0.4153) loss 6.5522 (6.8555) grad_norm 2.0152 (2.9583) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 06:19:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][450/625] eta 0:01:12 lr 0.000374 wd 0.0500 time 0.3962 (0.4150) data time 0.0006 (0.0019) model time 0.3955 (0.4148) loss 6.8068 (6.8599) grad_norm 2.4054 (2.9540) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:19:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][460/625] eta 0:01:08 lr 0.000374 wd 0.0500 time 0.4026 (0.4147) data time 0.0007 (0.0018) model time 0.4019 (0.4144) loss 7.4740 (6.8558) grad_norm 2.1512 (2.9830) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:20:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][470/625] eta 0:01:04 lr 0.000374 wd 0.0500 time 0.3986 (0.4144) data time 0.0009 (0.0018) model time 0.3978 (0.4141) loss 6.6189 (6.8586) grad_norm 2.6136 (2.9860) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:20:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][480/625] eta 0:01:00 lr 0.000374 wd 0.0500 time 0.4005 (0.4141) data time 0.0007 (0.0018) model time 0.3998 (0.4137) loss 6.2592 (6.8496) grad_norm 1.8488 (3.0316) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:20:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][490/625] eta 0:00:55 lr 0.000374 wd 0.0500 time 0.3960 (0.4138) data time 0.0011 (0.0018) model time 0.3948 (0.4133) loss 6.5637 (6.8442) grad_norm 3.9190 (3.0306) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:20:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][500/625] eta 0:00:51 lr 0.000374 wd 0.0500 time 0.3990 (0.4135) data time 0.0007 (0.0018) model time 0.3983 (0.4130) loss 7.9242 (6.8477) grad_norm 2.2627 (3.0223) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:20:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][510/625] eta 0:00:47 lr 0.000374 wd 0.0500 time 
0.3998 (0.4138) data time 0.0007 (0.0018) model time 0.3991 (0.4133) loss 7.5655 (6.8541) grad_norm 2.5234 (3.0128) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:20:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][520/625] eta 0:00:43 lr 0.000374 wd 0.0500 time 0.6055 (0.4149) data time 0.0008 (0.0017) model time 0.6047 (0.4146) loss 8.1765 (6.8584) grad_norm 4.5829 (3.0651) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:20:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][530/625] eta 0:00:39 lr 0.000373 wd 0.0500 time 0.3970 (0.4154) data time 0.0008 (0.0017) model time 0.3962 (0.4151) loss 7.1878 (6.8642) grad_norm 1.9496 (3.0557) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:20:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][540/625] eta 0:00:35 lr 0.000373 wd 0.0500 time 0.6001 (0.4168) data time 0.0007 (0.0017) model time 0.5994 (0.4167) loss 7.2512 (6.8689) grad_norm 2.7787 (3.0449) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:20:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][550/625] eta 0:00:31 lr 0.000373 wd 0.0500 time 0.3922 (0.4174) data time 0.0009 (0.0017) model time 0.3913 (0.4173) loss 7.4623 (6.8693) grad_norm 2.1240 (3.0382) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:20:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][560/625] eta 0:00:27 lr 0.000373 wd 0.0500 time 0.3957 (0.4173) data time 0.0008 (0.0017) model time 0.3949 (0.4172) loss 7.4876 (6.8695) grad_norm 4.7205 (3.0341) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:20:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][570/625] eta 0:00:22 lr 0.000373 wd 0.0500 time 0.4002 (0.4170) data time 0.0008 (0.0017) model time 0.3994 (0.4168) loss 6.5889 (6.8726) grad_norm 1.7034 (3.0206) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 06:20:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][580/625] eta 0:00:18 lr 0.000373 wd 0.0500 time 0.3943 (0.4166) data time 0.0007 (0.0017) model time 0.3936 (0.4164) loss 6.1373 (6.8777) grad_norm 2.9131 (3.0157) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:20:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][590/625] eta 0:00:14 lr 0.000373 wd 0.0500 time 0.3974 (0.4163) data time 0.0007 (0.0016) model time 0.3968 (0.4161) loss 6.6375 (6.8787) grad_norm 2.3730 (3.0126) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:20:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][600/625] eta 0:00:10 lr 0.000373 wd 0.0500 time 0.3998 (0.4161) data time 0.0006 (0.0016) model time 0.3992 (0.4158) loss 7.4891 (6.8859) grad_norm 2.3571 (3.0059) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:20:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][610/625] eta 0:00:06 lr 0.000373 wd 0.0500 time 0.3938 (0.4159) data time 0.0006 (0.0016) model time 0.3932 (0.4156) loss 6.2013 (6.8861) grad_norm 2.0035 (3.0044) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:21:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][620/625] eta 0:00:02 lr 0.000373 wd 0.0500 time 0.3980 (0.4156) data time 0.0004 (0.0016) model time 0.3976 (0.4153) loss 5.8130 (6.8787) grad_norm 2.5106 (3.0028) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:21:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 195 training takes 0:04:19 [2024-07-25 06:21:05 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 06:21:05 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 06:21:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.561 (0.561) Loss 0.5381 (0.5381) Acc@1 89.551 (89.551) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 06:21:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.130) Loss 0.8950 (0.6859) Acc@1 80.908 (86.257) Acc@5 95.947 (97.576) Mem 14939MB [2024-07-25 06:21:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.109) Loss 1.0020 (0.8060) Acc@1 76.807 (82.961) Acc@5 94.922 (96.505) Mem 14939MB [2024-07-25 06:21:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.606 Acc@5 96.483 [2024-07-25 06:21:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.6% [2024-07-25 06:21:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.61% [2024-07-25 06:21:08 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 06:21:09 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 06:21:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.509 (0.509) Loss 0.5479 (0.5479) Acc@1 90.088 (90.088) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 06:21:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.124) Loss 0.8589 (0.6797) Acc@1 81.445 (86.506) Acc@5 96.387 (97.767) Mem 14939MB [2024-07-25 06:21:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.106) Loss 0.9883 (0.7945) Acc@1 77.002 (83.350) Acc@5 95.312 (96.626) Mem 14939MB [2024-07-25 06:21:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.951 Acc@5 96.587 [2024-07-25 06:21:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.0% [2024-07-25 06:21:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.95% [2024-07-25 06:21:12 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 06:21:13 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 06:21:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][0/625] eta 0:08:07 lr 0.000373 wd 0.0500 time 0.7792 (0.7792) data time 0.3992 (0.3992) model time 0.0000 (0.0000) loss 6.5516 (6.5516) grad_norm 2.1716 (2.1716) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:21:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][10/625] eta 0:04:25 lr 0.000372 wd 0.0500 time 0.3944 (0.4323) data time 0.0006 (0.0370) model time 0.0000 (0.0000) loss 7.6484 (6.9953) grad_norm 2.8780 (2.2541) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:21:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][20/625] eta 0:04:12 lr 0.000372 wd 0.0500 time 0.3975 (0.4167) data time 0.0006 (0.0198) model time 0.0000 (0.0000) loss 8.1556 (6.9019) grad_norm 1.9937 (2.7903) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:21:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][30/625] eta 0:04:04 lr 0.000372 wd 0.0500 time 0.3988 (0.4109) data time 0.0008 (0.0137) model time 0.0000 (0.0000) loss 6.3589 (6.9830) grad_norm 3.1009 (3.0433) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:21:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][40/625] eta 0:03:58 lr 0.000372 wd 0.0500 time 0.3963 (0.4076) data time 0.0008 (0.0106) model time 0.0000 (0.0000) loss 8.0321 (7.0471) grad_norm 2.4918 (3.0464) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:21:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][50/625] eta 0:03:53 lr 0.000372 wd 0.0500 time 0.3997 (0.4054) data time 0.0006 (0.0087) model time 0.0000 (0.0000) loss 6.7741 (7.0517) grad_norm 7.2170 (3.2413) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:21:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][60/625] eta 0:03:48 lr 0.000372 wd 0.0500 time 0.3996 (0.4043) 
data time 0.0008 (0.0074) model time 0.3988 (0.3975) loss 7.7670 (7.0366) grad_norm 2.7858 (3.1348) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:21:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][70/625] eta 0:03:43 lr 0.000372 wd 0.0500 time 0.3985 (0.4034) data time 0.0009 (0.0065) model time 0.3977 (0.3971) loss 7.7685 (7.0791) grad_norm 3.9611 (3.0243) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:21:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][80/625] eta 0:03:39 lr 0.000372 wd 0.0500 time 0.3971 (0.4026) data time 0.0008 (0.0059) model time 0.3962 (0.3965) loss 7.0822 (7.1283) grad_norm 2.1594 (3.2643) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:21:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][90/625] eta 0:03:35 lr 0.000372 wd 0.0500 time 0.4016 (0.4023) data time 0.0008 (0.0054) model time 0.4008 (0.3971) loss 6.4284 (7.1043) grad_norm 2.7934 (3.2448) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:21:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][100/625] eta 0:03:32 lr 0.000372 wd 0.0500 time 0.5976 (0.4055) data time 0.0008 (0.0049) model time 0.5968 (0.4044) loss 7.0563 (7.0942) grad_norm 3.3061 (3.2608) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:21:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][110/625] eta 0:03:29 lr 0.000371 wd 0.0500 time 0.4114 (0.4061) data time 0.0006 (0.0046) model time 0.4109 (0.4056) loss 6.4850 (7.0902) grad_norm 1.9127 (3.2120) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:22:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][120/625] eta 0:03:29 lr 0.000371 wd 0.0500 time 0.6021 (0.4146) data time 0.0007 (0.0043) model time 0.6014 (0.4202) loss 5.5975 (7.0695) grad_norm 3.0773 (3.1903) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
06:22:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][130/625] eta 0:03:26 lr 0.000371 wd 0.0500 time 0.5970 (0.4165) data time 0.0008 (0.0040) model time 0.5961 (0.4225) loss 6.1613 (7.0440) grad_norm 3.0245 (3.2323) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:22:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][140/625] eta 0:03:24 lr 0.000371 wd 0.0500 time 0.5769 (0.4215) data time 0.0007 (0.0038) model time 0.5762 (0.4295) loss 6.8929 (7.0618) grad_norm 2.8869 (3.2284) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:22:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][150/625] eta 0:03:20 lr 0.000371 wd 0.0500 time 0.5956 (0.4224) data time 0.0009 (0.0036) model time 0.5948 (0.4301) loss 7.4005 (7.0830) grad_norm 2.1679 (3.2073) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:22:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][160/625] eta 0:03:15 lr 0.000371 wd 0.0500 time 0.3976 (0.4210) data time 0.0009 (0.0034) model time 0.3967 (0.4272) loss 5.8301 (7.0697) grad_norm 3.2169 (3.1648) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:22:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][170/625] eta 0:03:11 lr 0.000371 wd 0.0500 time 0.4027 (0.4205) data time 0.0006 (0.0033) model time 0.4021 (0.4259) loss 7.6867 (7.0812) grad_norm 2.8866 (3.1608) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:22:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][180/625] eta 0:03:06 lr 0.000371 wd 0.0500 time 0.3965 (0.4192) data time 0.0008 (0.0032) model time 0.3957 (0.4237) loss 6.6499 (7.0756) grad_norm 2.0227 (3.1053) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:22:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][190/625] eta 0:03:01 lr 0.000371 wd 0.0500 time 0.3969 (0.4182) data 
time 0.0006 (0.0030) model time 0.3962 (0.4219) loss 4.7862 (7.0468) grad_norm 1.9658 (3.0713) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:22:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][200/625] eta 0:02:57 lr 0.000371 wd 0.0500 time 0.3965 (0.4172) data time 0.0009 (0.0029) model time 0.3956 (0.4202) loss 7.0769 (7.0563) grad_norm 2.2378 (3.0562) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:22:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][210/625] eta 0:02:52 lr 0.000370 wd 0.0500 time 0.3963 (0.4164) data time 0.0008 (0.0028) model time 0.3955 (0.4190) loss 6.0154 (7.0306) grad_norm 2.4463 (3.0595) loss_scale 512.0000 (260.8531) mem 14939MB [2024-07-25 06:22:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][220/625] eta 0:02:48 lr 0.000370 wd 0.0500 time 0.3985 (0.4159) data time 0.0007 (0.0027) model time 0.3979 (0.4180) loss 6.6854 (7.0198) grad_norm 3.2745 (3.0698) loss_scale 512.0000 (272.2172) mem 14939MB [2024-07-25 06:22:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][230/625] eta 0:02:44 lr 0.000370 wd 0.0500 time 0.3976 (0.4152) data time 0.0006 (0.0027) model time 0.3970 (0.4171) loss 6.3501 (7.0025) grad_norm 3.1450 (3.0575) loss_scale 512.0000 (282.5974) mem 14939MB [2024-07-25 06:22:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][240/625] eta 0:02:39 lr 0.000370 wd 0.0500 time 0.4005 (0.4146) data time 0.0006 (0.0026) model time 0.3999 (0.4161) loss 6.4819 (7.0060) grad_norm 3.1030 (3.0872) loss_scale 512.0000 (292.1162) mem 14939MB [2024-07-25 06:22:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][250/625] eta 0:02:35 lr 0.000370 wd 0.0500 time 0.3971 (0.4141) data time 0.0006 (0.0025) model time 0.3965 (0.4153) loss 7.0082 (7.0054) grad_norm 3.1064 (3.1101) loss_scale 512.0000 (300.8765) mem 14939MB [2024-07-25 
06:23:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][260/625] eta 0:02:30 lr 0.000370 wd 0.0500 time 0.3954 (0.4135) data time 0.0006 (0.0024) model time 0.3948 (0.4145) loss 6.9505 (7.0114) grad_norm 1.8535 (3.0855) loss_scale 512.0000 (308.9655) mem 14939MB [2024-07-25 06:23:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][270/625] eta 0:02:26 lr 0.000370 wd 0.0500 time 0.3985 (0.4130) data time 0.0006 (0.0024) model time 0.3979 (0.4138) loss 7.6249 (7.0201) grad_norm 2.6120 (3.0505) loss_scale 512.0000 (316.4576) mem 14939MB [2024-07-25 06:23:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][280/625] eta 0:02:22 lr 0.000370 wd 0.0500 time 0.3958 (0.4124) data time 0.0008 (0.0023) model time 0.3950 (0.4131) loss 6.6510 (7.0232) grad_norm 1.9820 (3.0814) loss_scale 512.0000 (323.4164) mem 14939MB [2024-07-25 06:23:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][290/625] eta 0:02:18 lr 0.000370 wd 0.0500 time 0.3948 (0.4120) data time 0.0006 (0.0023) model time 0.3942 (0.4125) loss 8.1612 (7.0197) grad_norm 2.8555 (3.0691) loss_scale 512.0000 (329.8969) mem 14939MB [2024-07-25 06:23:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][300/625] eta 0:02:13 lr 0.000370 wd 0.0500 time 0.3956 (0.4115) data time 0.0009 (0.0022) model time 0.3947 (0.4118) loss 7.0654 (7.0221) grad_norm 3.1602 (3.0494) loss_scale 512.0000 (335.9468) mem 14939MB [2024-07-25 06:23:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][310/625] eta 0:02:09 lr 0.000370 wd 0.0500 time 0.3981 (0.4112) data time 0.0008 (0.0022) model time 0.3972 (0.4114) loss 7.0816 (7.0060) grad_norm 2.7226 (3.0990) loss_scale 512.0000 (341.6077) mem 14939MB [2024-07-25 06:23:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][320/625] eta 0:02:05 lr 0.000369 wd 0.0500 time 0.3971 (0.4109) data 
time 0.0008 (0.0022) model time 0.3963 (0.4110) loss 7.0947 (7.0059) grad_norm 2.4312 (3.1240) loss_scale 512.0000 (346.9159) mem 14939MB [2024-07-25 06:23:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][330/625] eta 0:02:01 lr 0.000369 wd 0.0500 time 0.5807 (0.4122) data time 0.0007 (0.0021) model time 0.5799 (0.4125) loss 6.3577 (7.0050) grad_norm 2.8585 (3.0955) loss_scale 512.0000 (351.9033) mem 14939MB [2024-07-25 06:23:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][340/625] eta 0:01:58 lr 0.000369 wd 0.0500 time 0.5889 (0.4145) data time 0.0009 (0.0021) model time 0.5881 (0.4152) loss 6.2650 (7.0031) grad_norm 3.1435 (3.0990) loss_scale 512.0000 (356.5982) mem 14939MB [2024-07-25 06:23:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][350/625] eta 0:01:54 lr 0.000369 wd 0.0500 time 0.5763 (0.4152) data time 0.0007 (0.0021) model time 0.5756 (0.4159) loss 7.7694 (6.9968) grad_norm 2.6922 (3.0937) loss_scale 512.0000 (361.0256) mem 14939MB [2024-07-25 06:23:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][360/625] eta 0:01:50 lr 0.000369 wd 0.0500 time 0.5902 (0.4177) data time 0.0008 (0.0020) model time 0.5894 (0.4188) loss 7.1500 (6.9920) grad_norm 4.4238 (3.1039) loss_scale 512.0000 (365.2078) mem 14939MB [2024-07-25 06:23:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][370/625] eta 0:01:46 lr 0.000369 wd 0.0500 time 0.5936 (0.4188) data time 0.0008 (0.0020) model time 0.5927 (0.4200) loss 6.8998 (7.0003) grad_norm 2.1859 (3.0952) loss_scale 512.0000 (369.1644) mem 14939MB [2024-07-25 06:23:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][380/625] eta 0:01:42 lr 0.000369 wd 0.0500 time 0.3986 (0.4183) data time 0.0008 (0.0020) model time 0.3978 (0.4193) loss 7.9040 (6.9980) grad_norm 3.3828 (3.0816) loss_scale 512.0000 (372.9134) mem 14939MB [2024-07-25 
06:23:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][390/625] eta 0:01:38 lr 0.000369 wd 0.0500 time 0.3978 (0.4181) data time 0.0008 (0.0019) model time 0.3970 (0.4191) loss 6.9614 (6.9934) grad_norm 10.2511 (3.0876) loss_scale 512.0000 (376.4706) mem 14939MB [2024-07-25 06:24:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][400/625] eta 0:01:33 lr 0.000369 wd 0.0500 time 0.3991 (0.4177) data time 0.0007 (0.0019) model time 0.3984 (0.4185) loss 5.4283 (6.9921) grad_norm 3.4360 (3.1509) loss_scale 512.0000 (379.8504) mem 14939MB [2024-07-25 06:24:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][410/625] eta 0:01:29 lr 0.000369 wd 0.0500 time 0.4028 (0.4172) data time 0.0009 (0.0019) model time 0.4018 (0.4180) loss 6.1991 (6.9880) grad_norm 3.2262 (3.1572) loss_scale 512.0000 (383.0657) mem 14939MB [2024-07-25 06:24:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][420/625] eta 0:01:25 lr 0.000368 wd 0.0500 time 0.4026 (0.4168) data time 0.0009 (0.0019) model time 0.4017 (0.4174) loss 8.1226 (6.9864) grad_norm 2.7809 (3.1409) loss_scale 512.0000 (386.1283) mem 14939MB [2024-07-25 06:24:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][430/625] eta 0:01:21 lr 0.000368 wd 0.0500 time 0.4034 (0.4164) data time 0.0008 (0.0019) model time 0.4026 (0.4169) loss 7.2392 (6.9967) grad_norm 2.3834 (3.1275) loss_scale 512.0000 (389.0487) mem 14939MB [2024-07-25 06:24:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][440/625] eta 0:01:16 lr 0.000368 wd 0.0500 time 0.3982 (0.4160) data time 0.0006 (0.0018) model time 0.3976 (0.4164) loss 6.5248 (6.9952) grad_norm 2.2156 (3.1111) loss_scale 512.0000 (391.8367) mem 14939MB [2024-07-25 06:24:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][450/625] eta 0:01:12 lr 0.000368 wd 0.0500 time 0.4067 (0.4157) data 
time 0.0009 (0.0018) model time 0.4057 (0.4160) loss 7.7326 (6.9906) grad_norm 3.8997 (3.0949) loss_scale 512.0000 (394.5011) mem 14939MB [2024-07-25 06:24:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][460/625] eta 0:01:08 lr 0.000368 wd 0.0500 time 0.3979 (0.4153) data time 0.0010 (0.0018) model time 0.3969 (0.4156) loss 6.9945 (6.9874) grad_norm 3.0987 (3.0890) loss_scale 512.0000 (397.0499) mem 14939MB [2024-07-25 06:24:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][470/625] eta 0:01:04 lr 0.000368 wd 0.0500 time 0.3988 (0.4150) data time 0.0006 (0.0018) model time 0.3982 (0.4152) loss 7.1141 (6.9779) grad_norm 2.2090 (3.0781) loss_scale 512.0000 (399.4904) mem 14939MB [2024-07-25 06:24:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][480/625] eta 0:01:00 lr 0.000368 wd 0.0500 time 0.3972 (0.4147) data time 0.0008 (0.0017) model time 0.3964 (0.4149) loss 6.0781 (6.9664) grad_norm 2.7555 (3.0711) loss_scale 512.0000 (401.8295) mem 14939MB [2024-07-25 06:24:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][490/625] eta 0:00:55 lr 0.000368 wd 0.0500 time 0.4013 (0.4144) data time 0.0008 (0.0017) model time 0.4005 (0.4145) loss 7.3072 (6.9660) grad_norm 2.7185 (3.0758) loss_scale 512.0000 (404.0733) mem 14939MB [2024-07-25 06:24:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][500/625] eta 0:00:51 lr 0.000368 wd 0.0500 time 0.3990 (0.4141) data time 0.0006 (0.0017) model time 0.3984 (0.4142) loss 7.3552 (6.9637) grad_norm 2.9960 (3.0846) loss_scale 512.0000 (406.2275) mem 14939MB [2024-07-25 06:24:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][510/625] eta 0:00:47 lr 0.000368 wd 0.0500 time 0.4000 (0.4139) data time 0.0008 (0.0017) model time 0.3992 (0.4139) loss 6.7058 (6.9578) grad_norm 4.0274 (3.0852) loss_scale 512.0000 (408.2975) mem 14939MB [2024-07-25 
06:24:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][520/625] eta 0:00:43 lr 0.000367 wd 0.0500 time 0.4003 (0.4136) data time 0.0006 (0.0017) model time 0.3996 (0.4136) loss 7.4748 (6.9527) grad_norm 4.6818 (3.0814) loss_scale 512.0000 (410.2879) mem 14939MB [2024-07-25 06:24:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][530/625] eta 0:00:39 lr 0.000367 wd 0.0500 time 0.3994 (0.4134) data time 0.0008 (0.0017) model time 0.3985 (0.4133) loss 6.3969 (6.9543) grad_norm 7.5571 (3.0803) loss_scale 512.0000 (412.2034) mem 14939MB [2024-07-25 06:24:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][540/625] eta 0:00:35 lr 0.000367 wd 0.0500 time 0.3993 (0.4131) data time 0.0007 (0.0017) model time 0.3986 (0.4130) loss 6.9378 (6.9560) grad_norm 1.9198 (3.0792) loss_scale 512.0000 (414.0481) mem 14939MB [2024-07-25 06:25:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][550/625] eta 0:00:31 lr 0.000367 wd 0.0500 time 0.5624 (0.4135) data time 0.0007 (0.0016) model time 0.5617 (0.4134) loss 5.5556 (6.9549) grad_norm 3.0205 (3.1055) loss_scale 512.0000 (415.8258) mem 14939MB [2024-07-25 06:25:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][560/625] eta 0:00:26 lr 0.000367 wd 0.0500 time 0.3986 (0.4146) data time 0.0007 (0.0016) model time 0.3979 (0.4145) loss 5.3034 (6.9506) grad_norm 4.9917 (3.1092) loss_scale 512.0000 (417.5401) mem 14939MB [2024-07-25 06:25:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][570/625] eta 0:00:22 lr 0.000367 wd 0.0500 time 0.5860 (0.4153) data time 0.0008 (0.0016) model time 0.5852 (0.4153) loss 5.9638 (6.9487) grad_norm 1.9040 (3.1023) loss_scale 512.0000 (419.1944) mem 14939MB [2024-07-25 06:25:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][580/625] eta 0:00:18 lr 0.000367 wd 0.0500 time 0.5651 (0.4168) data 
time 0.0006 (0.0016) model time 0.5644 (0.4169) loss 5.9604 (6.9456) grad_norm 2.1778 (3.0954) loss_scale 512.0000 (420.7917) mem 14939MB
[2024-07-25 06:25:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][590/625] eta 0:00:14 lr 0.000367 wd 0.0500 time 0.3962 (0.4174) data time 0.0009 (0.0016) model time 0.3953 (0.4176) loss 7.7035 (6.9472) grad_norm 2.3491 (3.0936) loss_scale 512.0000 (422.3350) mem 14939MB
[2024-07-25 06:25:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][600/625] eta 0:00:10 lr 0.000367 wd 0.0500 time 0.3999 (0.4171) data time 0.0006 (0.0016) model time 0.3993 (0.4173) loss 6.8923 (6.9491) grad_norm 3.7089 (3.0922) loss_scale 512.0000 (423.8270) mem 14939MB
[2024-07-25 06:25:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][610/625] eta 0:00:06 lr 0.000367 wd 0.0500 time 0.3985 (0.4171) data time 0.0005 (0.0016) model time 0.3981 (0.4172) loss 6.2823 (6.9444) grad_norm 2.3655 (3.0958) loss_scale 512.0000 (425.2700) mem 14939MB
[2024-07-25 06:25:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][620/625] eta 0:00:02 lr 0.000366 wd 0.0500 time 0.3967 (0.4168) data time 0.0004 (0.0016) model time 0.3964 (0.4169) loss 6.5434 (6.9392) grad_norm 3.9018 (3.0937) loss_scale 512.0000 (426.6667) mem 14939MB
[2024-07-25 06:25:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 196 training takes 0:04:20
[2024-07-25 06:25:33 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving......
[2024-07-25 06:25:34 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!!
[2024-07-25 06:25:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.451 (0.451) Loss 0.5879 (0.5879) Acc@1 89.307 (89.307) Acc@5 98.584 (98.584) Mem 14939MB
[2024-07-25 06:25:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.8916 (0.7146) Acc@1 80.957 (86.115) Acc@5 96.094 (97.590) Mem 14939MB
[2024-07-25 06:25:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0010 (0.8284) Acc@1 77.148 (82.954) Acc@5 95.166 (96.452) Mem 14939MB
[2024-07-25 06:25:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.548 Acc@5 96.421
[2024-07-25 06:25:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.5%
[2024-07-25 06:25:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.916 (0.916) Loss 0.5469 (0.5469) Acc@1 90.088 (90.088) Acc@5 98.828 (98.828) Mem 14939MB
[2024-07-25 06:25:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.162) Loss 0.8574 (0.6792) Acc@1 81.543 (86.475) Acc@5 96.338 (97.763) Mem 14939MB
[2024-07-25 06:25:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.126) Loss 0.9868 (0.7938) Acc@1 76.953 (83.329) Acc@5 95.312 (96.652) Mem 14939MB
[2024-07-25 06:25:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.921 Acc@5 96.613
[2024-07-25 06:25:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.9%
[2024-07-25 06:25:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][0/625] eta 0:13:28 lr 0.000366 wd 0.0500 time 1.2943 (1.2943) data time 0.4937 (0.4937) model time 0.0000 (0.0000) loss 7.4922 (7.4922) grad_norm 2.3391 (2.3391) loss_scale 512.0000 (512.0000) mem 14939MB
[2024-07-25 06:25:45 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][10/625] eta 0:04:56 lr 0.000366 wd 0.0500 time 0.4001 (0.4823) data time 0.0008 (0.0457) model time 0.0000 (0.0000) loss 6.3442 (6.9896) grad_norm 2.7218 (2.3989) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:25:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][20/625] eta 0:04:27 lr 0.000366 wd 0.0500 time 0.3972 (0.4418) data time 0.0008 (0.0244) model time 0.0000 (0.0000) loss 8.4240 (7.1183) grad_norm 2.9114 (2.6443) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:25:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][30/625] eta 0:04:14 lr 0.000366 wd 0.0500 time 0.3973 (0.4278) data time 0.0006 (0.0168) model time 0.0000 (0.0000) loss 6.8720 (7.1070) grad_norm 3.2232 (3.1679) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:25:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][40/625] eta 0:04:06 lr 0.000366 wd 0.0500 time 0.3993 (0.4208) data time 0.0008 (0.0129) model time 0.0000 (0.0000) loss 7.4920 (7.0373) grad_norm 2.6410 (3.0446) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:26:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][50/625] eta 0:03:59 lr 0.000366 wd 0.0500 time 0.3979 (0.4168) data time 0.0008 (0.0106) model time 0.0000 (0.0000) loss 6.8175 (7.0628) grad_norm 2.4980 (3.1408) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:26:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][60/625] eta 0:03:54 lr 0.000366 wd 0.0500 time 0.3993 (0.4142) data time 0.0008 (0.0090) model time 0.3985 (0.4001) loss 7.9464 (7.0892) grad_norm 2.5198 (3.2676) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:26:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][70/625] eta 0:03:48 lr 0.000366 wd 0.0500 time 0.3994 (0.4121) data time 0.0006 
(0.0078) model time 0.3988 (0.3995) loss 6.5871 (7.0254) grad_norm 2.1039 (3.1628) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:26:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][80/625] eta 0:03:43 lr 0.000366 wd 0.0500 time 0.3979 (0.4107) data time 0.0008 (0.0069) model time 0.3971 (0.3996) loss 7.4483 (7.0369) grad_norm 1.8589 (3.0812) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:26:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][90/625] eta 0:03:39 lr 0.000366 wd 0.0500 time 0.4006 (0.4095) data time 0.0006 (0.0063) model time 0.4000 (0.3995) loss 6.5585 (6.9980) grad_norm 2.6951 (3.0738) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:26:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][100/625] eta 0:03:34 lr 0.000365 wd 0.0500 time 0.3972 (0.4085) data time 0.0006 (0.0057) model time 0.3966 (0.3992) loss 5.8715 (6.9836) grad_norm 4.2028 (3.0431) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:26:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][110/625] eta 0:03:29 lr 0.000365 wd 0.0500 time 0.3967 (0.4076) data time 0.0008 (0.0053) model time 0.3959 (0.3990) loss 7.2309 (7.0000) grad_norm 2.2106 (3.0084) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:26:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][120/625] eta 0:03:25 lr 0.000365 wd 0.0500 time 0.3990 (0.4069) data time 0.0008 (0.0049) model time 0.3982 (0.3989) loss 5.9738 (6.9502) grad_norm 2.5038 (2.9940) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:26:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][130/625] eta 0:03:21 lr 0.000365 wd 0.0500 time 0.4046 (0.4063) data time 0.0008 (0.0046) model time 0.4038 (0.3988) loss 7.5513 (6.9386) grad_norm 3.3094 (2.9584) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:26:37 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][140/625] eta 0:03:17 lr 0.000365 wd 0.0500 time 0.3965 (0.4081) data time 0.0007 (0.0043) model time 0.3958 (0.4024) loss 6.4715 (6.9507) grad_norm 5.6443 (2.9838) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:26:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][150/625] eta 0:03:15 lr 0.000365 wd 0.0500 time 0.5943 (0.4112) data time 0.0006 (0.0041) model time 0.5937 (0.4076) loss 6.2600 (6.9538) grad_norm 3.5308 (2.9570) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:26:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][160/625] eta 0:03:12 lr 0.000365 wd 0.0500 time 0.3955 (0.4146) data time 0.0006 (0.0039) model time 0.3949 (0.4128) loss 6.5328 (6.9361) grad_norm 2.0215 (3.0361) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:26:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][170/625] eta 0:03:09 lr 0.000365 wd 0.0500 time 0.5328 (0.4166) data time 0.0006 (0.0037) model time 0.5323 (0.4157) loss 6.7260 (6.9163) grad_norm 1.9097 (3.0393) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:26:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][180/625] eta 0:03:07 lr 0.000365 wd 0.0500 time 0.6018 (0.4209) data time 0.0006 (0.0036) model time 0.6012 (0.4216) loss 6.2808 (6.9051) grad_norm 2.4766 (3.0144) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:27:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][190/625] eta 0:03:02 lr 0.000365 wd 0.0500 time 0.3966 (0.4206) data time 0.0006 (0.0034) model time 0.3960 (0.4211) loss 7.2663 (6.9003) grad_norm 2.8580 (2.9957) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:27:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][200/625] eta 0:02:58 lr 0.000364 wd 0.0500 time 0.3970 (0.4196) data time 
0.0007 (0.0033) model time 0.3964 (0.4197) loss 6.4477 (6.8994) grad_norm 3.6188 (2.9859) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:27:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][210/625] eta 0:02:53 lr 0.000364 wd 0.0500 time 0.4001 (0.4186) data time 0.0011 (0.0032) model time 0.3990 (0.4183) loss 7.8458 (6.9023) grad_norm 3.5321 (2.9815) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:27:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][220/625] eta 0:02:49 lr 0.000364 wd 0.0500 time 0.3997 (0.4178) data time 0.0007 (0.0031) model time 0.3990 (0.4172) loss 6.6596 (6.9034) grad_norm 4.3963 (2.9771) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:27:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][230/625] eta 0:02:44 lr 0.000364 wd 0.0500 time 0.3985 (0.4172) data time 0.0007 (0.0030) model time 0.3978 (0.4164) loss 6.4025 (6.8988) grad_norm 3.1167 (2.9750) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:27:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][240/625] eta 0:02:40 lr 0.000364 wd 0.0500 time 0.4090 (0.4165) data time 0.0006 (0.0029) model time 0.4084 (0.4156) loss 7.2864 (6.9077) grad_norm 2.7857 (2.9747) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:27:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][250/625] eta 0:02:35 lr 0.000364 wd 0.0500 time 0.3965 (0.4158) data time 0.0008 (0.0028) model time 0.3957 (0.4147) loss 5.6721 (6.9061) grad_norm 2.6970 (3.0088) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:27:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][260/625] eta 0:02:31 lr 0.000364 wd 0.0500 time 0.3981 (0.4153) data time 0.0007 (0.0028) model time 0.3974 (0.4140) loss 6.1330 (6.9033) grad_norm 2.4800 (3.0037) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:27:32 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][270/625] eta 0:02:27 lr 0.000364 wd 0.0500 time 0.4044 (0.4147) data time 0.0009 (0.0027) model time 0.4035 (0.4133) loss 8.1935 (6.8969) grad_norm 1.9662 (2.9945) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:27:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][280/625] eta 0:02:22 lr 0.000364 wd 0.0500 time 0.3932 (0.4143) data time 0.0007 (0.0026) model time 0.3925 (0.4128) loss 7.1201 (6.8943) grad_norm 2.0637 (2.9959) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:27:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][290/625] eta 0:02:18 lr 0.000364 wd 0.0500 time 0.3969 (0.4138) data time 0.0007 (0.0026) model time 0.3962 (0.4122) loss 7.0193 (6.8831) grad_norm 2.1997 (2.9976) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:27:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][300/625] eta 0:02:14 lr 0.000364 wd 0.0500 time 0.3981 (0.4133) data time 0.0009 (0.0025) model time 0.3972 (0.4117) loss 6.7883 (6.8867) grad_norm 1.8108 (2.9884) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:27:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][310/625] eta 0:02:10 lr 0.000363 wd 0.0500 time 0.3998 (0.4129) data time 0.0008 (0.0025) model time 0.3990 (0.4112) loss 6.3414 (6.8993) grad_norm 4.0151 (2.9888) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:27:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][320/625] eta 0:02:05 lr 0.000363 wd 0.0500 time 0.3948 (0.4125) data time 0.0010 (0.0024) model time 0.3937 (0.4108) loss 6.0799 (6.8931) grad_norm 2.8223 (2.9920) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:27:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][330/625] eta 0:02:01 lr 0.000363 wd 0.0500 time 0.4004 (0.4122) data time 
0.0006 (0.0024) model time 0.3998 (0.4105) loss 6.0351 (6.8864) grad_norm 2.0671 (2.9863) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:28:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][340/625] eta 0:01:57 lr 0.000363 wd 0.0500 time 0.4005 (0.4119) data time 0.0006 (0.0023) model time 0.3999 (0.4101) loss 5.9068 (6.8842) grad_norm 2.2971 (2.9713) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:28:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][350/625] eta 0:01:53 lr 0.000363 wd 0.0500 time 0.4005 (0.4115) data time 0.0008 (0.0023) model time 0.3997 (0.4097) loss 7.6498 (6.8892) grad_norm 2.3222 (2.9723) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:28:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][360/625] eta 0:01:49 lr 0.000363 wd 0.0500 time 0.3902 (0.4120) data time 0.0010 (0.0023) model time 0.3891 (0.4103) loss 6.2092 (6.8870) grad_norm 2.9461 (2.9616) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:28:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][370/625] eta 0:01:45 lr 0.000363 wd 0.0500 time 0.5863 (0.4135) data time 0.0008 (0.0022) model time 0.5855 (0.4121) loss 7.7911 (6.8928) grad_norm 3.2035 (2.9647) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:28:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][380/625] eta 0:01:41 lr 0.000363 wd 0.0500 time 0.3997 (0.4151) data time 0.0008 (0.0022) model time 0.3989 (0.4139) loss 7.4883 (6.8995) grad_norm 1.7251 (2.9533) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:28:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][390/625] eta 0:01:37 lr 0.000363 wd 0.0500 time 0.5792 (0.4166) data time 0.0006 (0.0022) model time 0.5785 (0.4156) loss 7.2222 (6.8997) grad_norm 1.9825 (2.9504) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:28:27 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][400/625] eta 0:01:34 lr 0.000363 wd 0.0500 time 0.4132 (0.4181) data time 0.0009 (0.0021) model time 0.4124 (0.4174) loss 7.5381 (6.9000) grad_norm 2.3056 (2.9346) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:28:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][410/625] eta 0:01:29 lr 0.000362 wd 0.0500 time 0.3960 (0.4176) data time 0.0008 (0.0021) model time 0.3952 (0.4168) loss 7.5255 (6.9053) grad_norm 1.7470 (2.9225) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:28:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][420/625] eta 0:01:25 lr 0.000362 wd 0.0500 time 0.3974 (0.4171) data time 0.0006 (0.0021) model time 0.3968 (0.4163) loss 7.9257 (6.9114) grad_norm 2.7538 (2.9167) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:28:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][430/625] eta 0:01:21 lr 0.000362 wd 0.0500 time 0.3994 (0.4167) data time 0.0010 (0.0020) model time 0.3984 (0.4158) loss 8.3913 (6.9159) grad_norm 1.9740 (2.9108) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:28:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][440/625] eta 0:01:17 lr 0.000362 wd 0.0500 time 0.3969 (0.4162) data time 0.0009 (0.0020) model time 0.3960 (0.4152) loss 7.4870 (6.9180) grad_norm 2.4938 (2.9155) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:28:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][450/625] eta 0:01:12 lr 0.000362 wd 0.0500 time 0.3989 (0.4159) data time 0.0009 (0.0020) model time 0.3980 (0.4149) loss 8.3806 (6.9258) grad_norm 2.8285 (2.9372) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:28:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][460/625] eta 0:01:08 lr 0.000362 wd 0.0500 time 0.4209 (0.4157) data time 
0.0006 (0.0020) model time 0.4203 (0.4146) loss 6.0332 (6.9175) grad_norm 4.1170 (2.9386) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:28:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][470/625] eta 0:01:04 lr 0.000362 wd 0.0500 time 0.3999 (0.4153) data time 0.0009 (0.0020) model time 0.3989 (0.4142) loss 6.4248 (6.9225) grad_norm 2.8750 (2.9554) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:28:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][480/625] eta 0:01:00 lr 0.000362 wd 0.0500 time 0.3995 (0.4151) data time 0.0007 (0.0019) model time 0.3988 (0.4139) loss 7.7806 (6.9198) grad_norm 5.4466 (3.0367) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:29:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][490/625] eta 0:00:55 lr 0.000362 wd 0.0500 time 0.3982 (0.4147) data time 0.0009 (0.0019) model time 0.3973 (0.4136) loss 7.2319 (6.9132) grad_norm 2.1578 (3.0238) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:29:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][500/625] eta 0:00:51 lr 0.000362 wd 0.0500 time 0.3997 (0.4144) data time 0.0009 (0.0019) model time 0.3989 (0.4133) loss 6.7718 (6.9129) grad_norm 2.3359 (3.0105) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:29:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][510/625] eta 0:00:47 lr 0.000361 wd 0.0500 time 0.3961 (0.4142) data time 0.0009 (0.0019) model time 0.3952 (0.4130) loss 5.9199 (6.9150) grad_norm 2.8741 (3.0021) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:29:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][520/625] eta 0:00:43 lr 0.000361 wd 0.0500 time 0.4682 (0.4141) data time 0.0006 (0.0019) model time 0.4676 (0.4129) loss 6.7156 (6.9208) grad_norm 2.9911 (2.9997) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:29:19 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][530/625] eta 0:00:39 lr 0.000361 wd 0.0500 time 0.3932 (0.4139) data time 0.0009 (0.0018) model time 0.3923 (0.4127) loss 7.4784 (6.9239) grad_norm 1.8362 (2.9937) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:29:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][540/625] eta 0:00:35 lr 0.000361 wd 0.0500 time 0.3987 (0.4137) data time 0.0006 (0.0018) model time 0.3981 (0.4125) loss 7.6654 (6.9211) grad_norm 1.8681 (2.9861) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:29:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][550/625] eta 0:00:31 lr 0.000361 wd 0.0500 time 0.4014 (0.4134) data time 0.0006 (0.0018) model time 0.4007 (0.4122) loss 5.8335 (6.9184) grad_norm 1.8885 (2.9763) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:29:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][560/625] eta 0:00:26 lr 0.000361 wd 0.0500 time 0.4007 (0.4132) data time 0.0008 (0.0018) model time 0.3999 (0.4120) loss 6.5956 (6.9135) grad_norm 3.8920 (2.9879) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:29:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][570/625] eta 0:00:22 lr 0.000361 wd 0.0500 time 0.3971 (0.4130) data time 0.0007 (0.0018) model time 0.3964 (0.4117) loss 6.0934 (6.9136) grad_norm 1.9703 (2.9800) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:29:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][580/625] eta 0:00:18 lr 0.000361 wd 0.0500 time 0.5901 (0.4134) data time 0.0007 (0.0018) model time 0.5894 (0.4121) loss 5.6166 (6.9064) grad_norm 33.6851 (3.0248) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:29:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][590/625] eta 0:00:14 lr 0.000361 wd 0.0500 time 0.5990 (0.4145) data time 
0.0007 (0.0018) model time 0.5983 (0.4134) loss 7.9793 (6.9070) grad_norm 3.1326 (3.0242) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:29:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][600/625] eta 0:00:10 lr 0.000361 wd 0.0500 time 0.5635 (0.4152) data time 0.0009 (0.0018) model time 0.5627 (0.4141) loss 6.0325 (6.9007) grad_norm 2.6053 (3.0189) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:29:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][610/625] eta 0:00:06 lr 0.000360 wd 0.0500 time 0.5741 (0.4160) data time 0.0004 (0.0017) model time 0.5736 (0.4150) loss 6.6830 (6.8962) grad_norm 2.3304 (3.0164) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:29:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][620/625] eta 0:00:02 lr 0.000360 wd 0.0500 time 0.5596 (0.4172) data time 0.0005 (0.0017) model time 0.5592 (0.4163) loss 7.4061 (6.9026) grad_norm 2.3250 (3.0093) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:30:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 197 training takes 0:04:20 [2024-07-25 06:30:00 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 06:30:01 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 06:30:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.5752 (0.5752) Acc@1 88.525 (88.525) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 06:30:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.8701 (0.6853) Acc@1 80.957 (86.217) Acc@5 96.777 (97.705) Mem 14939MB [2024-07-25 06:30:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 1.0039 (0.8042) Acc@1 76.416 (83.078) Acc@5 94.189 (96.482) Mem 14939MB [2024-07-25 06:30:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.668 Acc@5 96.461 [2024-07-25 06:30:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.7% [2024-07-25 06:30:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.67% [2024-07-25 06:30:04 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 06:30:05 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 06:30:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.450 (0.450) Loss 0.5469 (0.5469) Acc@1 90.088 (90.088) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 06:30:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.8564 (0.6785) Acc@1 81.494 (86.497) Acc@5 96.289 (97.767) Mem 14939MB [2024-07-25 06:30:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9849 (0.7929) Acc@1 76.953 (83.354) Acc@5 95.361 (96.684) Mem 14939MB [2024-07-25 06:30:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.941 Acc@5 96.643 [2024-07-25 06:30:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.9% [2024-07-25 06:30:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][0/625] eta 0:13:09 lr 0.000360 wd 0.0500 time 1.2632 (1.2632) data time 0.7117 (0.7117) model time 0.0000 (0.0000) loss 5.8410 (5.8410) grad_norm 2.4281 (2.4281) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:30:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][10/625] eta 0:04:55 lr 0.000360 wd 0.0500 time 0.3992 (0.4804) data time 0.0008 (0.0654) model time 0.0000 (0.0000) loss 7.0062 (6.9901) grad_norm 3.1090 (3.6792) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:30:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][20/625] eta 0:04:27 lr 0.000360 wd 0.0500 time 0.3986 (0.4420) data time 0.0007 (0.0347) model time 0.0000 (0.0000) loss 6.3756 (6.7925) grad_norm 1.9449 (3.3937) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:30:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][30/625] eta 0:04:14 lr 0.000360 wd 0.0500 time 0.3973 (0.4279) data time 0.0006 (0.0238) model time 0.0000 (0.0000) loss 6.4666 (6.7364) grad_norm 3.5436 
(3.3296) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:30:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][40/625] eta 0:04:06 lr 0.000360 wd 0.0500 time 0.4008 (0.4208) data time 0.0007 (0.0182) model time 0.0000 (0.0000) loss 8.0683 (6.8239) grad_norm 4.1607 (3.2474) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:30:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][50/625] eta 0:03:59 lr 0.000360 wd 0.0500 time 0.3949 (0.4166) data time 0.0006 (0.0148) model time 0.0000 (0.0000) loss 6.4619 (6.8075) grad_norm 3.1329 (3.2116) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:30:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][60/625] eta 0:03:53 lr 0.000360 wd 0.0500 time 0.3989 (0.4137) data time 0.0008 (0.0125) model time 0.3981 (0.3975) loss 6.5301 (6.8099) grad_norm 2.0339 (3.1140) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:30:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][70/625] eta 0:03:48 lr 0.000360 wd 0.0500 time 0.3973 (0.4116) data time 0.0010 (0.0109) model time 0.3963 (0.3977) loss 7.1534 (6.8066) grad_norm 2.9260 (3.0642) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:30:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][80/625] eta 0:03:43 lr 0.000360 wd 0.0500 time 0.3969 (0.4100) data time 0.0008 (0.0096) model time 0.3961 (0.3978) loss 5.5523 (6.7937) grad_norm 4.4808 (3.0464) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:30:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][90/625] eta 0:03:38 lr 0.000359 wd 0.0500 time 0.4089 (0.4088) data time 0.0006 (0.0087) model time 0.4083 (0.3980) loss 6.4518 (6.7958) grad_norm 2.8535 (3.0262) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:30:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][100/625] 
eta 0:03:34 lr 0.000359 wd 0.0500 time 0.3992 (0.4078) data time 0.0010 (0.0079) model time 0.3982 (0.3979) loss 5.3869 (6.8063) grad_norm 1.7583 (3.0225) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:30:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][110/625] eta 0:03:29 lr 0.000359 wd 0.0500 time 0.3979 (0.4069) data time 0.0008 (0.0073) model time 0.3971 (0.3978) loss 7.5574 (6.7802) grad_norm 1.9728 (3.0046) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:30:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][120/625] eta 0:03:25 lr 0.000359 wd 0.0500 time 0.3989 (0.4063) data time 0.0006 (0.0068) model time 0.3983 (0.3979) loss 5.8970 (6.7772) grad_norm 2.1742 (2.9518) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:31:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][130/625] eta 0:03:20 lr 0.000359 wd 0.0500 time 0.3970 (0.4058) data time 0.0007 (0.0063) model time 0.3963 (0.3980) loss 7.0133 (6.7760) grad_norm 2.8236 (2.9134) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:31:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][140/625] eta 0:03:16 lr 0.000359 wd 0.0500 time 0.4051 (0.4056) data time 0.0009 (0.0060) model time 0.4042 (0.3985) loss 6.5251 (6.7934) grad_norm 1.9358 (2.8897) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:31:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][150/625] eta 0:03:12 lr 0.000359 wd 0.0500 time 0.4016 (0.4061) data time 0.0008 (0.0056) model time 0.4008 (0.3997) loss 7.6494 (6.8321) grad_norm 4.3848 (2.9425) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:31:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][160/625] eta 0:03:08 lr 0.000359 wd 0.0500 time 0.4016 (0.4056) data time 0.0009 (0.0054) model time 0.4007 (0.3995) loss 6.7397 (6.8262) grad_norm 2.0370 (2.9399) 
loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:31:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][170/625] eta 0:03:04 lr 0.000359 wd 0.0500 time 0.3986 (0.4052) data time 0.0006 (0.0051) model time 0.3979 (0.3993) loss 6.6238 (6.8462) grad_norm 3.3582 (2.9709) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:31:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][180/625] eta 0:03:01 lr 0.000359 wd 0.0500 time 0.6119 (0.4079) data time 0.0009 (0.0049) model time 0.6110 (0.4035) loss 7.2402 (6.8442) grad_norm 3.1091 (3.1149) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:31:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][190/625] eta 0:02:59 lr 0.000359 wd 0.0500 time 0.4001 (0.4124) data time 0.0006 (0.0047) model time 0.3995 (0.4099) loss 7.5981 (6.8477) grad_norm 6.0244 (3.1564) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:31:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][200/625] eta 0:02:55 lr 0.000358 wd 0.0500 time 0.3964 (0.4137) data time 0.0006 (0.0045) model time 0.3958 (0.4117) loss 7.4668 (6.8799) grad_norm 3.4432 (3.1663) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:31:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][210/625] eta 0:02:53 lr 0.000358 wd 0.0500 time 0.5928 (0.4172) data time 0.0009 (0.0043) model time 0.5919 (0.4165) loss 6.6233 (6.8849) grad_norm 2.0034 (3.1256) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:31:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][220/625] eta 0:02:49 lr 0.000358 wd 0.0500 time 0.4126 (0.4188) data time 0.0008 (0.0041) model time 0.4118 (0.4185) loss 7.1885 (6.8908) grad_norm 2.3361 (3.1165) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:31:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][230/625] eta 
0:02:45 lr 0.000358 wd 0.0500 time 0.4045 (0.4180) data time 0.0009 (0.0040) model time 0.4037 (0.4174) loss 6.8245 (6.8839) grad_norm 6.9770 (3.1443) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:31:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][240/625] eta 0:02:40 lr 0.000358 wd 0.0500 time 0.3971 (0.4171) data time 0.0007 (0.0039) model time 0.3964 (0.4163) loss 7.8154 (6.8763) grad_norm 2.5508 (3.2171) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:31:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][250/625] eta 0:02:36 lr 0.000358 wd 0.0500 time 0.4065 (0.4165) data time 0.0007 (0.0037) model time 0.4058 (0.4155) loss 7.1511 (6.8847) grad_norm 2.4577 (3.1988) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:31:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][260/625] eta 0:02:31 lr 0.000358 wd 0.0500 time 0.4003 (0.4159) data time 0.0007 (0.0036) model time 0.3995 (0.4148) loss 6.7317 (6.8928) grad_norm 3.3159 (3.1909) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:32:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][270/625] eta 0:02:27 lr 0.000358 wd 0.0500 time 0.4022 (0.4154) data time 0.0006 (0.0035) model time 0.4017 (0.4141) loss 6.8243 (6.8812) grad_norm 2.0699 (3.1659) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:32:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][280/625] eta 0:02:23 lr 0.000358 wd 0.0500 time 0.4011 (0.4149) data time 0.0008 (0.0034) model time 0.4003 (0.4135) loss 6.5003 (6.8684) grad_norm 2.2455 (3.1323) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:32:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][290/625] eta 0:02:18 lr 0.000358 wd 0.0500 time 0.3980 (0.4144) data time 0.0007 (0.0034) model time 0.3973 (0.4130) loss 7.8304 (6.8763) grad_norm 2.3671 (3.0962) 
loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:32:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][300/625] eta 0:02:14 lr 0.000357 wd 0.0500 time 0.4015 (0.4140) data time 0.0006 (0.0033) model time 0.4010 (0.4125) loss 6.8580 (6.8885) grad_norm 1.9217 (3.0879) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:32:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][310/625] eta 0:02:10 lr 0.000357 wd 0.0500 time 0.3989 (0.4135) data time 0.0008 (0.0032) model time 0.3981 (0.4120) loss 5.5593 (6.8891) grad_norm 2.3544 (3.1008) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:32:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][320/625] eta 0:02:05 lr 0.000357 wd 0.0500 time 0.3983 (0.4131) data time 0.0008 (0.0031) model time 0.3974 (0.4115) loss 7.9659 (6.8939) grad_norm 4.0553 (3.0966) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:32:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][330/625] eta 0:02:01 lr 0.000357 wd 0.0500 time 0.4019 (0.4127) data time 0.0008 (0.0031) model time 0.4011 (0.4111) loss 5.2703 (6.8685) grad_norm 4.6779 (3.1056) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:32:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][340/625] eta 0:01:57 lr 0.000357 wd 0.0500 time 0.4015 (0.4124) data time 0.0008 (0.0030) model time 0.4007 (0.4107) loss 7.0274 (6.8645) grad_norm 6.1954 (3.1105) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:32:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][350/625] eta 0:01:53 lr 0.000357 wd 0.0500 time 0.4014 (0.4120) data time 0.0008 (0.0029) model time 0.4005 (0.4103) loss 6.2470 (6.8632) grad_norm 3.3425 (3.1132) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:32:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][360/625] eta 
0:01:49 lr 0.000357 wd 0.0500 time 0.3961 (0.4117) data time 0.0008 (0.0029) model time 0.3952 (0.4099) loss 6.7398 (6.8574) grad_norm 3.1671 (3.1103) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:32:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][370/625] eta 0:01:44 lr 0.000357 wd 0.0500 time 0.4008 (0.4117) data time 0.0009 (0.0028) model time 0.3999 (0.4100) loss 7.5739 (6.8671) grad_norm 3.5867 (3.0935) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:32:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][380/625] eta 0:01:40 lr 0.000357 wd 0.0500 time 0.3996 (0.4114) data time 0.0007 (0.0028) model time 0.3989 (0.4097) loss 8.3742 (6.8787) grad_norm 2.1197 (3.0766) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:32:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][390/625] eta 0:01:36 lr 0.000357 wd 0.0500 time 0.3988 (0.4111) data time 0.0008 (0.0027) model time 0.3980 (0.4094) loss 8.4783 (6.8855) grad_norm 3.2892 (3.0718) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:32:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][400/625] eta 0:01:32 lr 0.000356 wd 0.0500 time 0.4020 (0.4118) data time 0.0008 (0.0027) model time 0.4012 (0.4102) loss 7.7984 (6.8947) grad_norm 2.0010 (3.0670) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:32:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][410/625] eta 0:01:28 lr 0.000356 wd 0.0500 time 0.3998 (0.4136) data time 0.0009 (0.0026) model time 0.3989 (0.4122) loss 6.4405 (6.8863) grad_norm 2.5678 (3.0535) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:33:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][420/625] eta 0:01:24 lr 0.000356 wd 0.0500 time 0.3992 (0.4143) data time 0.0006 (0.0026) model time 0.3986 (0.4130) loss 8.2001 (6.8845) grad_norm 2.4946 (3.0438) 
loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:33:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][430/625] eta 0:01:21 lr 0.000356 wd 0.0500 time 0.4150 (0.4165) data time 0.0008 (0.0026) model time 0.4141 (0.4156) loss 7.2533 (6.8847) grad_norm 2.6047 (3.0428) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:33:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][440/625] eta 0:01:17 lr 0.000356 wd 0.0500 time 0.3944 (0.4169) data time 0.0007 (0.0025) model time 0.3938 (0.4160) loss 8.0285 (6.8795) grad_norm 2.5997 (3.0281) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:33:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][450/625] eta 0:01:12 lr 0.000356 wd 0.0500 time 0.3971 (0.4165) data time 0.0009 (0.0025) model time 0.3961 (0.4156) loss 7.7881 (6.8726) grad_norm 3.2549 (3.0277) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:33:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][460/625] eta 0:01:08 lr 0.000356 wd 0.0500 time 0.4047 (0.4162) data time 0.0006 (0.0025) model time 0.4042 (0.4152) loss 6.0931 (6.8735) grad_norm 2.8904 (3.0418) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:33:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][470/625] eta 0:01:04 lr 0.000356 wd 0.0500 time 0.3989 (0.4158) data time 0.0007 (0.0024) model time 0.3982 (0.4148) loss 6.4963 (6.8740) grad_norm 2.3875 (3.0493) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:33:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][480/625] eta 0:01:00 lr 0.000356 wd 0.0500 time 0.4010 (0.4155) data time 0.0007 (0.0024) model time 0.4003 (0.4144) loss 5.5619 (6.8693) grad_norm 3.1368 (3.0519) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:33:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][490/625] eta 
0:00:56 lr 0.000356 wd 0.0500 time 0.4042 (0.4152) data time 0.0006 (0.0024) model time 0.4036 (0.4142) loss 6.0705 (6.8674) grad_norm 4.3995 (3.0680) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:33:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][500/625] eta 0:00:51 lr 0.000356 wd 0.0500 time 0.3962 (0.4149) data time 0.0008 (0.0023) model time 0.3955 (0.4138) loss 7.4717 (6.8651) grad_norm 2.4521 (3.0758) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:33:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][510/625] eta 0:00:47 lr 0.000355 wd 0.0500 time 0.4032 (0.4146) data time 0.0008 (0.0023) model time 0.4024 (0.4135) loss 5.6384 (6.8617) grad_norm 2.7709 (3.0679) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:33:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][520/625] eta 0:00:43 lr 0.000355 wd 0.0500 time 0.4088 (0.4144) data time 0.0009 (0.0023) model time 0.4079 (0.4132) loss 6.0892 (6.8549) grad_norm 3.2615 (3.0566) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:33:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][530/625] eta 0:00:39 lr 0.000355 wd 0.0500 time 0.4166 (0.4142) data time 0.0008 (0.0023) model time 0.4159 (0.4130) loss 7.3311 (6.8563) grad_norm 2.2102 (3.0665) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:33:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][540/625] eta 0:00:35 lr 0.000355 wd 0.0500 time 0.3990 (0.4139) data time 0.0008 (0.0022) model time 0.3982 (0.4127) loss 4.8900 (6.8574) grad_norm 2.4888 (3.0814) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:33:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][550/625] eta 0:00:31 lr 0.000355 wd 0.0500 time 0.3997 (0.4137) data time 0.0007 (0.0022) model time 0.3990 (0.4125) loss 5.9194 (6.8539) grad_norm 5.1151 (3.0799) 
loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:33:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][560/625] eta 0:00:26 lr 0.000355 wd 0.0500 time 0.3929 (0.4135) data time 0.0006 (0.0022) model time 0.3923 (0.4122) loss 5.9957 (6.8424) grad_norm 3.1252 (3.0678) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:34:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][570/625] eta 0:00:22 lr 0.000355 wd 0.0500 time 0.3968 (0.4132) data time 0.0009 (0.0022) model time 0.3959 (0.4120) loss 6.7881 (6.8444) grad_norm 3.5330 (3.0619) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:34:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][580/625] eta 0:00:18 lr 0.000355 wd 0.0500 time 0.4024 (0.4130) data time 0.0008 (0.0021) model time 0.4016 (0.4118) loss 7.1543 (6.8435) grad_norm 1.8948 (3.0615) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:34:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][590/625] eta 0:00:14 lr 0.000355 wd 0.0500 time 0.3942 (0.4131) data time 0.0009 (0.0021) model time 0.3933 (0.4119) loss 6.3289 (6.8406) grad_norm 2.9167 (3.0485) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:34:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][600/625] eta 0:00:10 lr 0.000355 wd 0.0500 time 0.3984 (0.4129) data time 0.0007 (0.0021) model time 0.3978 (0.4117) loss 4.9242 (6.8380) grad_norm 2.3156 (3.0572) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:34:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][610/625] eta 0:00:06 lr 0.000354 wd 0.0500 time 0.3969 (0.4127) data time 0.0006 (0.0021) model time 0.3963 (0.4115) loss 6.0160 (6.8391) grad_norm 2.6996 (3.0488) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:34:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][620/625] eta 
0:00:02 lr 0.000354 wd 0.0500 time 0.3925 (0.4132) data time 0.0004 (0.0021) model time 0.3921 (0.4120) loss 6.3393 (6.8408) grad_norm 2.5087 (3.0421) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:34:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 198 training takes 0:04:18 [2024-07-25 06:34:25 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 06:34:26 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 06:34:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.489 (0.489) Loss 0.5537 (0.5537) Acc@1 89.697 (89.697) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 06:34:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.124) Loss 0.8613 (0.6763) Acc@1 80.615 (86.297) Acc@5 96.484 (97.674) Mem 14939MB [2024-07-25 06:34:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.106) Loss 0.9829 (0.7968) Acc@1 76.562 (83.073) Acc@5 95.264 (96.473) Mem 14939MB [2024-07-25 06:34:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.636 Acc@5 96.433 [2024-07-25 06:34:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.6% [2024-07-25 06:34:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.888 (0.888) Loss 0.5474 (0.5474) Acc@1 90.039 (90.039) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 06:34:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.160) Loss 0.8555 (0.6781) Acc@1 81.543 (86.528) Acc@5 96.240 (97.758) Mem 14939MB [2024-07-25 06:34:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.125) Loss 0.9839 (0.7924) Acc@1 
76.953 (83.384) Acc@5 95.410 (96.687) Mem 14939MB [2024-07-25 06:34:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.971 Acc@5 96.645 [2024-07-25 06:34:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.0% [2024-07-25 06:34:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.97% [2024-07-25 06:34:32 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 06:34:33 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 06:34:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][0/625] eta 0:08:26 lr 0.000354 wd 0.0500 time 0.8101 (0.8101) data time 0.4045 (0.4045) model time 0.0000 (0.0000) loss 5.9036 (5.9036) grad_norm 1.8602 (1.8602) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:34:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][10/625] eta 0:04:58 lr 0.000354 wd 0.0500 time 0.5183 (0.4859) data time 0.0007 (0.0378) model time 0.0000 (0.0000) loss 6.1707 (6.5348) grad_norm 3.0859 (2.6186) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:34:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][20/625] eta 0:04:48 lr 0.000354 wd 0.0500 time 0.3949 (0.4763) data time 0.0006 (0.0202) model time 0.0000 (0.0000) loss 6.7987 (6.7407) grad_norm 2.1497 (2.6297) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:34:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][30/625] eta 0:04:44 lr 0.000354 wd 0.0500 time 0.5725 (0.4782) data time 0.0008 (0.0140) model time 0.0000 (0.0000) loss 7.4164 (6.8870) grad_norm 2.3036 (2.7850) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:34:52 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][40/625] eta 0:04:31 lr 0.000354 wd 0.0500 time 0.3971 (0.4634) data time 0.0006 (0.0108) model time 0.0000 (0.0000) loss 6.0778 (6.9502) grad_norm 3.0302 (2.7773) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:34:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][50/625] eta 0:04:19 lr 0.000354 wd 0.0500 time 0.4002 (0.4510) data time 0.0006 (0.0089) model time 0.0000 (0.0000) loss 6.3373 (6.8856) grad_norm 3.0008 (2.8634) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:35:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][60/625] eta 0:04:10 lr 0.000354 wd 0.0500 time 0.3993 (0.4425) data time 0.0006 (0.0076) model time 0.3987 (0.3984) loss 5.5421 (6.8453) grad_norm 3.4274 (2.8228) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:35:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][70/625] eta 0:04:02 lr 0.000354 wd 0.0500 time 0.3969 (0.4364) data time 0.0008 (0.0066) model time 0.3960 (0.3981) loss 6.3690 (6.8455) grad_norm 3.0155 (2.7766) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:35:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][80/625] eta 0:03:55 lr 0.000354 wd 0.0500 time 0.4096 (0.4321) data time 0.0008 (0.0059) model time 0.4088 (0.3990) loss 7.4891 (6.8824) grad_norm 2.0063 (2.7898) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:35:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][90/625] eta 0:03:49 lr 0.000353 wd 0.0500 time 0.3967 (0.4285) data time 0.0007 (0.0054) model time 0.3960 (0.3988) loss 7.3802 (6.8250) grad_norm 2.1282 (2.8190) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:35:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][100/625] eta 0:03:43 lr 0.000353 wd 0.0500 time 0.4007 (0.4258) data time 0.0008 
(0.0049) model time 0.3999 (0.3991) loss 7.1929 (6.8317) grad_norm 4.3395 (2.8643) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:35:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][110/625] eta 0:03:38 lr 0.000353 wd 0.0500 time 0.4106 (0.4235) data time 0.0007 (0.0046) model time 0.4099 (0.3993) loss 5.7872 (6.8438) grad_norm 2.1826 (2.8816) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:35:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][120/625] eta 0:03:33 lr 0.000353 wd 0.0500 time 0.3969 (0.4227) data time 0.0007 (0.0043) model time 0.3962 (0.4011) loss 6.9824 (6.8307) grad_norm 2.1323 (2.8886) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:35:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][130/625] eta 0:03:28 lr 0.000353 wd 0.0500 time 0.4082 (0.4210) data time 0.0007 (0.0040) model time 0.4075 (0.4009) loss 6.8960 (6.8531) grad_norm 13.5051 (2.9591) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:35:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][140/625] eta 0:03:23 lr 0.000353 wd 0.0500 time 0.4011 (0.4195) data time 0.0006 (0.0038) model time 0.4004 (0.4008) loss 7.6162 (6.8446) grad_norm 3.4218 (2.9534) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:35:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][150/625] eta 0:03:18 lr 0.000353 wd 0.0500 time 0.3966 (0.4182) data time 0.0007 (0.0036) model time 0.3959 (0.4006) loss 7.0219 (6.8338) grad_norm 20.7786 (3.0614) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:35:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][160/625] eta 0:03:13 lr 0.000353 wd 0.0500 time 0.3979 (0.4171) data time 0.0008 (0.0034) model time 0.3972 (0.4004) loss 6.7952 (6.8273) grad_norm 2.9206 (3.0420) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:35:44 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][170/625] eta 0:03:09 lr 0.000353 wd 0.0500 time 0.3994 (0.4160) data time 0.0006 (0.0034) model time 0.3988 (0.4001) loss 7.6415 (6.8594) grad_norm 2.6143 (3.0780) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:35:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][180/625] eta 0:03:04 lr 0.000353 wd 0.0500 time 0.4012 (0.4152) data time 0.0008 (0.0032) model time 0.4003 (0.4001) loss 7.8005 (6.8802) grad_norm 1.9420 (3.0461) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:35:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][190/625] eta 0:03:00 lr 0.000352 wd 0.0500 time 0.3962 (0.4143) data time 0.0009 (0.0031) model time 0.3953 (0.4000) loss 6.0536 (6.8697) grad_norm 2.4767 (3.0115) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:35:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][200/625] eta 0:02:55 lr 0.000352 wd 0.0500 time 0.4012 (0.4136) data time 0.0006 (0.0030) model time 0.4005 (0.3999) loss 6.6745 (6.8626) grad_norm 2.7470 (2.9862) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:36:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][210/625] eta 0:02:51 lr 0.000352 wd 0.0500 time 0.3963 (0.4128) data time 0.0006 (0.0029) model time 0.3956 (0.3997) loss 7.1483 (6.8759) grad_norm 2.3830 (2.9746) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:36:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][220/625] eta 0:02:47 lr 0.000352 wd 0.0500 time 0.5796 (0.4147) data time 0.0006 (0.0028) model time 0.5790 (0.4029) loss 6.1079 (6.8820) grad_norm 4.0454 (2.9591) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:36:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][230/625] eta 0:02:44 lr 0.000352 wd 0.0500 time 0.5683 (0.4162) data time 
0.0008 (0.0027) model time 0.5675 (0.4053) loss 6.3337 (6.8952) grad_norm 3.0021 (2.9649) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:36:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][240/625] eta 0:02:40 lr 0.000352 wd 0.0500 time 0.3956 (0.4176) data time 0.0008 (0.0027) model time 0.3947 (0.4077) loss 6.9104 (6.9004) grad_norm 2.7276 (2.9564) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:36:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][250/625] eta 0:02:37 lr 0.000352 wd 0.0500 time 0.4236 (0.4203) data time 0.0006 (0.0026) model time 0.4230 (0.4115) loss 7.4579 (6.9066) grad_norm 2.2048 (2.9689) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:36:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][260/625] eta 0:02:33 lr 0.000352 wd 0.0500 time 0.3958 (0.4210) data time 0.0008 (0.0025) model time 0.3950 (0.4127) loss 6.8167 (6.9055) grad_norm 3.4119 (2.9619) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:36:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][270/625] eta 0:02:29 lr 0.000352 wd 0.0500 time 0.3966 (0.4202) data time 0.0006 (0.0025) model time 0.3960 (0.4121) loss 6.8337 (6.8923) grad_norm 3.1071 (2.9589) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:36:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][280/625] eta 0:02:24 lr 0.000352 wd 0.0500 time 0.3983 (0.4195) data time 0.0006 (0.0024) model time 0.3977 (0.4115) loss 5.9215 (6.8843) grad_norm 2.2286 (2.9444) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:36:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][290/625] eta 0:02:20 lr 0.000351 wd 0.0500 time 0.3940 (0.4188) data time 0.0007 (0.0024) model time 0.3934 (0.4109) loss 5.0949 (6.8846) grad_norm 2.4858 (3.0034) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:36:38 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][300/625] eta 0:02:15 lr 0.000351 wd 0.0500 time 0.3970 (0.4181) data time 0.0006 (0.0023) model time 0.3965 (0.4104) loss 6.0351 (6.8804) grad_norm 3.9798 (3.0153) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:36:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][310/625] eta 0:02:11 lr 0.000351 wd 0.0500 time 0.4030 (0.4175) data time 0.0008 (0.0023) model time 0.4022 (0.4100) loss 8.0387 (6.8752) grad_norm 3.2137 (3.0336) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:36:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][320/625] eta 0:02:07 lr 0.000351 wd 0.0500 time 0.3966 (0.4170) data time 0.0006 (0.0022) model time 0.3960 (0.4096) loss 5.9999 (6.8750) grad_norm 2.5554 (3.0389) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:36:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][330/625] eta 0:02:02 lr 0.000351 wd 0.0500 time 0.3967 (0.4165) data time 0.0010 (0.0022) model time 0.3957 (0.4093) loss 6.9958 (6.8662) grad_norm 1.9918 (3.0210) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:36:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][340/625] eta 0:01:58 lr 0.000351 wd 0.0500 time 0.4028 (0.4165) data time 0.0009 (0.0021) model time 0.4019 (0.4094) loss 6.7487 (6.8756) grad_norm 2.6936 (3.0246) loss_scale 1024.0000 (525.5132) mem 14939MB [2024-07-25 06:36:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][350/625] eta 0:01:54 lr 0.000351 wd 0.0500 time 0.3951 (0.4160) data time 0.0009 (0.0021) model time 0.3942 (0.4090) loss 6.5619 (6.8755) grad_norm 2.4137 (3.0264) loss_scale 1024.0000 (539.7151) mem 14939MB [2024-07-25 06:37:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][360/625] eta 0:01:50 lr 0.000351 wd 0.0500 time 0.3976 (0.4155) data time 
0.0006 (0.0021) model time 0.3970 (0.4087) loss 7.2116 (6.8737) grad_norm 2.1766 (3.0352) loss_scale 1024.0000 (553.1302) mem 14939MB [2024-07-25 06:37:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][370/625] eta 0:01:45 lr 0.000351 wd 0.0500 time 0.4014 (0.4151) data time 0.0006 (0.0020) model time 0.4008 (0.4084) loss 8.0888 (6.8858) grad_norm 2.1012 (3.0393) loss_scale 1024.0000 (565.8221) mem 14939MB [2024-07-25 06:37:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][380/625] eta 0:01:41 lr 0.000351 wd 0.0500 time 0.3982 (0.4150) data time 0.0009 (0.0020) model time 0.3973 (0.4084) loss 5.7282 (6.8783) grad_norm 1.9665 (3.0341) loss_scale 1024.0000 (577.8478) mem 14939MB [2024-07-25 06:37:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][390/625] eta 0:01:37 lr 0.000351 wd 0.0500 time 0.4041 (0.4146) data time 0.0007 (0.0020) model time 0.4034 (0.4082) loss 6.4811 (6.8857) grad_norm 5.2414 (3.0346) loss_scale 1024.0000 (589.2583) mem 14939MB [2024-07-25 06:37:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][400/625] eta 0:01:33 lr 0.000350 wd 0.0500 time 0.4041 (0.4142) data time 0.0010 (0.0020) model time 0.4032 (0.4079) loss 7.4083 (6.8851) grad_norm 6.6676 (3.0433) loss_scale 1024.0000 (600.0998) mem 14939MB [2024-07-25 06:37:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][410/625] eta 0:01:28 lr 0.000350 wd 0.0500 time 0.4009 (0.4139) data time 0.0009 (0.0019) model time 0.4000 (0.4076) loss 7.3533 (6.8820) grad_norm 1.9825 (3.0327) loss_scale 1024.0000 (610.4136) mem 14939MB [2024-07-25 06:37:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][420/625] eta 0:01:24 lr 0.000350 wd 0.0500 time 0.3944 (0.4137) data time 0.0008 (0.0019) model time 0.3936 (0.4076) loss 7.0621 (6.8799) grad_norm 5.1796 (3.0352) loss_scale 1024.0000 (620.2375) mem 14939MB [2024-07-25 
06:37:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][430/625] eta 0:01:20 lr 0.000350 wd 0.0500 time 0.5602 (0.4138) data time 0.0008 (0.0019) model time 0.5595 (0.4078) loss 6.5454 (6.8880) grad_norm 2.8071 (3.0323) loss_scale 1024.0000 (629.6056) mem 14939MB [2024-07-25 06:37:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][440/625] eta 0:01:16 lr 0.000350 wd 0.0500 time 0.5714 (0.4147) data time 0.0006 (0.0019) model time 0.5708 (0.4090) loss 7.8334 (6.8959) grad_norm 5.9855 (3.0376) loss_scale 1024.0000 (638.5488) mem 14939MB [2024-07-25 06:37:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][450/625] eta 0:01:12 lr 0.000350 wd 0.0500 time 0.4026 (0.4159) data time 0.0009 (0.0018) model time 0.4018 (0.4104) loss 8.1437 (6.8968) grad_norm 2.7382 (3.0633) loss_scale 1024.0000 (647.0953) mem 14939MB [2024-07-25 06:37:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][460/625] eta 0:01:08 lr 0.000350 wd 0.0500 time 0.5744 (0.4170) data time 0.0008 (0.0018) model time 0.5736 (0.4118) loss 6.2937 (6.8892) grad_norm 3.3772 (3.0640) loss_scale 1024.0000 (655.2711) mem 14939MB [2024-07-25 06:37:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][470/625] eta 0:01:04 lr 0.000350 wd 0.0500 time 0.3952 (0.4184) data time 0.0007 (0.0018) model time 0.3945 (0.4135) loss 6.4437 (6.8976) grad_norm 3.5948 (3.0731) loss_scale 1024.0000 (663.0998) mem 14939MB [2024-07-25 06:37:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][480/625] eta 0:01:00 lr 0.000350 wd 0.0500 time 0.4010 (0.4190) data time 0.0007 (0.0018) model time 0.4003 (0.4143) loss 6.4352 (6.8971) grad_norm 4.2250 (3.0849) loss_scale 1024.0000 (670.6029) mem 14939MB [2024-07-25 06:37:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][490/625] eta 0:00:56 lr 0.000350 wd 0.0500 time 0.3960 (0.4186) 
data time 0.0006 (0.0018) model time 0.3954 (0.4139) loss 6.5299 (6.8982) grad_norm 1.9455 (3.0830) loss_scale 1024.0000 (677.8004) mem 14939MB [2024-07-25 06:38:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][500/625] eta 0:00:52 lr 0.000349 wd 0.0500 time 0.4007 (0.4182) data time 0.0006 (0.0018) model time 0.4001 (0.4135) loss 6.6380 (6.9053) grad_norm 3.2074 (3.0805) loss_scale 1024.0000 (684.7106) mem 14939MB [2024-07-25 06:38:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][510/625] eta 0:00:48 lr 0.000349 wd 0.0500 time 0.4017 (0.4178) data time 0.0008 (0.0017) model time 0.4009 (0.4132) loss 7.3942 (6.9096) grad_norm 3.1082 (3.0961) loss_scale 1024.0000 (691.3503) mem 14939MB [2024-07-25 06:38:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][520/625] eta 0:00:43 lr 0.000349 wd 0.0500 time 0.3981 (0.4175) data time 0.0006 (0.0017) model time 0.3975 (0.4129) loss 6.4711 (6.9134) grad_norm 2.8169 (3.0888) loss_scale 1024.0000 (697.7351) mem 14939MB [2024-07-25 06:38:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][530/625] eta 0:00:39 lr 0.000349 wd 0.0500 time 0.3976 (0.4172) data time 0.0006 (0.0017) model time 0.3970 (0.4127) loss 6.0070 (6.9094) grad_norm 2.1112 (3.0757) loss_scale 1024.0000 (703.8795) mem 14939MB [2024-07-25 06:38:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][540/625] eta 0:00:35 lr 0.000349 wd 0.0500 time 0.4035 (0.4169) data time 0.0008 (0.0017) model time 0.4027 (0.4124) loss 7.2235 (6.9168) grad_norm 2.1180 (3.0634) loss_scale 1024.0000 (709.7967) mem 14939MB [2024-07-25 06:38:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][550/625] eta 0:00:31 lr 0.000349 wd 0.0500 time 0.3974 (0.4167) data time 0.0009 (0.0017) model time 0.3965 (0.4123) loss 6.8131 (6.9122) grad_norm 7.0843 (3.0657) loss_scale 1024.0000 (715.4991) mem 14939MB 
[2024-07-25 06:38:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][560/625] eta 0:00:27 lr 0.000349 wd 0.0500 time 0.5385 (0.4168) data time 0.0008 (0.0017) model time 0.5377 (0.4124) loss 6.3806 (6.9116) grad_norm 3.0839 (3.1047) loss_scale 1024.0000 (720.9982) mem 14939MB [2024-07-25 06:38:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][570/625] eta 0:00:22 lr 0.000349 wd 0.0500 time 0.4001 (0.4165) data time 0.0009 (0.0017) model time 0.3992 (0.4122) loss 7.6123 (6.9158) grad_norm 2.4603 (3.1084) loss_scale 1024.0000 (726.3047) mem 14939MB [2024-07-25 06:38:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][580/625] eta 0:00:18 lr 0.000349 wd 0.0500 time 0.3980 (0.4163) data time 0.0007 (0.0016) model time 0.3973 (0.4120) loss 8.0187 (6.9138) grad_norm 1.9119 (3.1045) loss_scale 1024.0000 (731.4286) mem 14939MB [2024-07-25 06:38:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][590/625] eta 0:00:14 lr 0.000349 wd 0.0500 time 0.3957 (0.4160) data time 0.0007 (0.0016) model time 0.3950 (0.4118) loss 6.6283 (6.9079) grad_norm 1.7851 (3.1045) loss_scale 1024.0000 (736.3790) mem 14939MB [2024-07-25 06:38:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][600/625] eta 0:00:10 lr 0.000349 wd 0.0500 time 0.4041 (0.4158) data time 0.0008 (0.0016) model time 0.4032 (0.4116) loss 6.5211 (6.9084) grad_norm 2.2932 (3.0934) loss_scale 1024.0000 (741.1647) mem 14939MB [2024-07-25 06:38:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][610/625] eta 0:00:06 lr 0.000348 wd 0.0500 time 0.3942 (0.4155) data time 0.0006 (0.0016) model time 0.3936 (0.4113) loss 5.9283 (6.9036) grad_norm 1.9380 (3.0803) loss_scale 1024.0000 (745.7938) mem 14939MB [2024-07-25 06:38:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][620/625] eta 0:00:02 lr 0.000348 wd 0.0500 time 
0.4013 (0.4153) data time 0.0004 (0.0016) model time 0.4010 (0.4111) loss 5.6665 (6.9039) grad_norm 5.8684 (3.0771) loss_scale 1024.0000 (750.2738) mem 14939MB [2024-07-25 06:38:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 199 training takes 0:04:19 [2024-07-25 06:38:52 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 06:38:53 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 06:38:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.471 (0.471) Loss 0.5566 (0.5566) Acc@1 89.404 (89.404) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 06:38:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.084 (0.122) Loss 0.8696 (0.6875) Acc@1 81.592 (86.288) Acc@5 96.338 (97.692) Mem 14939MB [2024-07-25 06:38:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 0.9897 (0.8101) Acc@1 77.148 (83.105) Acc@5 94.971 (96.496) Mem 14939MB [2024-07-25 06:38:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.666 Acc@5 96.477 [2024-07-25 06:38:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.7% [2024-07-25 06:38:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.808 (0.808) Loss 0.5474 (0.5474) Acc@1 90.039 (90.039) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 06:38:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.156) Loss 0.8540 (0.6776) Acc@1 81.445 (86.541) Acc@5 96.289 (97.781) Mem 14939MB [2024-07-25 06:38:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 0.9834 (0.7915) Acc@1 76.953 (83.403) Acc@5 95.410 
(96.691) Mem 14939MB [2024-07-25 06:38:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.991 Acc@5 96.645 [2024-07-25 06:38:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.0% [2024-07-25 06:38:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.99% [2024-07-25 06:38:59 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 06:38:59 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 06:39:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][0/625] eta 0:07:52 lr 0.000348 wd 0.0500 time 0.7561 (0.7561) data time 0.3719 (0.3719) model time 0.0000 (0.0000) loss 6.1001 (6.1001) grad_norm 2.3293 (2.3293) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:39:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][10/625] eta 0:04:28 lr 0.000348 wd 0.0500 time 0.4005 (0.4362) data time 0.0007 (0.0347) model time 0.0000 (0.0000) loss 6.6163 (6.7833) grad_norm 4.5577 (2.7561) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:39:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][20/625] eta 0:04:14 lr 0.000348 wd 0.0500 time 0.3997 (0.4210) data time 0.0009 (0.0205) model time 0.0000 (0.0000) loss 6.4191 (6.9410) grad_norm 2.0773 (3.0188) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:39:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][30/625] eta 0:04:13 lr 0.000348 wd 0.0500 time 0.4089 (0.4268) data time 0.0009 (0.0142) model time 0.0000 (0.0000) loss 7.5921 (6.8751) grad_norm 5.8992 (3.4280) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:39:17 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [200/300][40/625] eta 0:04:15 lr 0.000348 wd 0.0500 time 0.6066 (0.4373) data time 0.0008 (0.0109) model time 0.0000 (0.0000) loss 7.4546 (6.9277) grad_norm 3.5048 (3.4501) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:39:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][50/625] eta 0:04:13 lr 0.000348 wd 0.0500 time 0.6084 (0.4406) data time 0.0008 (0.0090) model time 0.0000 (0.0000) loss 8.0006 (6.9100) grad_norm 5.3574 (3.7082) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:39:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][60/625] eta 0:04:14 lr 0.000348 wd 0.0500 time 0.5314 (0.4507) data time 0.0008 (0.0076) model time 0.5305 (0.5011) loss 7.1745 (6.9191) grad_norm 2.0266 (3.7344) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:39:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][70/625] eta 0:04:12 lr 0.000348 wd 0.0500 time 0.5629 (0.4557) data time 0.0007 (0.0067) model time 0.5623 (0.4934) loss 7.6442 (6.9484) grad_norm 5.8695 (3.7965) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:39:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][80/625] eta 0:04:04 lr 0.000348 wd 0.0500 time 0.3955 (0.4486) data time 0.0009 (0.0060) model time 0.3947 (0.4613) loss 7.8682 (6.9386) grad_norm 14.7783 (3.9812) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:39:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][90/625] eta 0:03:57 lr 0.000347 wd 0.0500 time 0.4018 (0.4431) data time 0.0006 (0.0054) model time 0.4011 (0.4454) loss 7.6463 (6.8677) grad_norm 2.4574 (4.0876) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:39:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][100/625] eta 0:03:51 lr 0.000347 wd 0.0500 time 0.3942 (0.4401) data time 0.0007 (0.0050) model time 
0.3935 (0.4387) loss 5.7327 (6.8687) grad_norm 3.5335 (3.9689) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:39:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][110/625] eta 0:03:45 lr 0.000347 wd 0.0500 time 0.4004 (0.4374) data time 0.0009 (0.0046) model time 0.3994 (0.4339) loss 6.2687 (6.8740) grad_norm 2.5246 (3.8122) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:39:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][120/625] eta 0:03:39 lr 0.000347 wd 0.0500 time 0.4219 (0.4350) data time 0.0007 (0.0043) model time 0.4212 (0.4300) loss 6.9548 (6.8772) grad_norm 2.1113 (3.7876) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:39:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][130/625] eta 0:03:33 lr 0.000347 wd 0.0500 time 0.3959 (0.4322) data time 0.0009 (0.0041) model time 0.3950 (0.4259) loss 6.5555 (6.8753) grad_norm 2.1605 (3.7048) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:40:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][140/625] eta 0:03:28 lr 0.000347 wd 0.0500 time 0.3970 (0.4301) data time 0.0007 (0.0040) model time 0.3963 (0.4229) loss 6.2834 (6.8502) grad_norm 4.0413 (3.6812) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:40:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][150/625] eta 0:03:23 lr 0.000347 wd 0.0500 time 0.3972 (0.4281) data time 0.0008 (0.0038) model time 0.3964 (0.4205) loss 7.3263 (6.8617) grad_norm 2.3719 (3.6451) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:40:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][160/625] eta 0:03:18 lr 0.000347 wd 0.0500 time 0.4132 (0.4264) data time 0.0007 (0.0036) model time 0.4126 (0.4187) loss 7.1288 (6.8846) grad_norm 2.6502 (3.6104) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:40:12 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][170/625] eta 0:03:13 lr 0.000347 wd 0.0500 time 0.4009 (0.4249) data time 0.0009 (0.0034) model time 0.4001 (0.4172) loss 7.7713 (6.9052) grad_norm 2.3711 (3.6463) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:40:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][180/625] eta 0:03:08 lr 0.000347 wd 0.0500 time 0.4570 (0.4241) data time 0.0009 (0.0033) model time 0.4561 (0.4165) loss 7.0482 (6.9003) grad_norm 6.4270 (3.6172) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:40:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][190/625] eta 0:03:03 lr 0.000346 wd 0.0500 time 0.3953 (0.4229) data time 0.0009 (0.0032) model time 0.3944 (0.4153) loss 7.2670 (6.8966) grad_norm 3.4995 (3.6160) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:40:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][200/625] eta 0:02:59 lr 0.000346 wd 0.0500 time 0.3966 (0.4219) data time 0.0007 (0.0031) model time 0.3959 (0.4145) loss 6.2362 (6.8790) grad_norm 1.5458 (3.5791) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:40:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][210/625] eta 0:02:54 lr 0.000346 wd 0.0500 time 0.4192 (0.4213) data time 0.0008 (0.0030) model time 0.4183 (0.4140) loss 6.2450 (6.8740) grad_norm 2.4773 (3.5574) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:40:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][220/625] eta 0:02:50 lr 0.000346 wd 0.0500 time 0.4031 (0.4204) data time 0.0008 (0.0029) model time 0.4023 (0.4133) loss 6.8304 (6.8784) grad_norm 3.4872 (3.5274) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:40:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][230/625] eta 0:02:45 lr 0.000346 wd 0.0500 time 0.4009 (0.4197) 
data time 0.0008 (0.0028) model time 0.4001 (0.4127) loss 6.4874 (6.8787) grad_norm 4.9373 (3.5393) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:40:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][240/625] eta 0:02:41 lr 0.000346 wd 0.0500 time 0.4049 (0.4190) data time 0.0006 (0.0027) model time 0.4043 (0.4122) loss 5.6387 (6.8717) grad_norm 2.9210 (3.5125) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:40:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][250/625] eta 0:02:37 lr 0.000346 wd 0.0500 time 0.3981 (0.4195) data time 0.0007 (0.0027) model time 0.3973 (0.4131) loss 7.1522 (6.8901) grad_norm 2.4417 (3.5000) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:40:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][260/625] eta 0:02:33 lr 0.000346 wd 0.0500 time 0.5994 (0.4213) data time 0.0009 (0.0026) model time 0.5985 (0.4156) loss 7.9819 (6.9017) grad_norm 2.1700 (3.4610) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:40:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][270/625] eta 0:02:30 lr 0.000346 wd 0.0500 time 0.6018 (0.4230) data time 0.0008 (0.0025) model time 0.6009 (0.4179) loss 7.2916 (6.8907) grad_norm 2.5298 (3.4120) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:40:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][280/625] eta 0:02:26 lr 0.000346 wd 0.0500 time 0.6004 (0.4244) data time 0.0009 (0.0025) model time 0.5995 (0.4198) loss 6.0195 (6.8874) grad_norm 2.4169 (3.3803) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:41:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][290/625] eta 0:02:22 lr 0.000345 wd 0.0500 time 0.5688 (0.4261) data time 0.0007 (0.0024) model time 0.5681 (0.4220) loss 6.1259 (6.8778) grad_norm 2.8437 (3.3615) loss_scale 1024.0000 (1024.0000) mem 14939MB 
[2024-07-25 06:41:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][300/625] eta 0:02:18 lr 0.000345 wd 0.0500 time 0.3979 (0.4253) data time 0.0008 (0.0024) model time 0.3972 (0.4211) loss 8.2920 (6.8636) grad_norm 3.7447 (3.3516) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:41:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][310/625] eta 0:02:13 lr 0.000345 wd 0.0500 time 0.3970 (0.4245) data time 0.0007 (0.0023) model time 0.3963 (0.4203) loss 7.4689 (6.8600) grad_norm 7.0887 (3.3488) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:41:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][320/625] eta 0:02:09 lr 0.000345 wd 0.0500 time 0.3968 (0.4243) data time 0.0007 (0.0023) model time 0.3961 (0.4202) loss 7.3980 (6.8558) grad_norm 5.3376 (3.3276) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:41:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][330/625] eta 0:02:05 lr 0.000345 wd 0.0500 time 0.4682 (0.4239) data time 0.0007 (0.0022) model time 0.4675 (0.4198) loss 6.9976 (6.8649) grad_norm 2.5565 (3.3176) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:41:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][340/625] eta 0:02:00 lr 0.000345 wd 0.0500 time 0.3932 (0.4233) data time 0.0006 (0.0022) model time 0.3926 (0.4193) loss 6.7782 (6.8625) grad_norm 6.8971 (3.3255) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:41:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][350/625] eta 0:01:56 lr 0.000345 wd 0.0500 time 0.3965 (0.4227) data time 0.0008 (0.0022) model time 0.3957 (0.4187) loss 6.9125 (6.8558) grad_norm 3.0149 (3.3159) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:41:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][360/625] eta 0:01:51 lr 0.000345 wd 0.0500 
time 0.4201 (0.4222) data time 0.0009 (0.0021) model time 0.4192 (0.4181) loss 8.2257 (6.8697) grad_norm 3.6052 (3.3499) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:41:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][370/625] eta 0:01:47 lr 0.000345 wd 0.0500 time 0.3950 (0.4217) data time 0.0007 (0.0021) model time 0.3943 (0.4177) loss 7.3463 (6.8632) grad_norm 4.9545 (3.3740) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:41:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][380/625] eta 0:01:43 lr 0.000345 wd 0.0500 time 0.4057 (0.4212) data time 0.0009 (0.0021) model time 0.4048 (0.4172) loss 7.8036 (6.8663) grad_norm 3.7149 (3.3705) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:41:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][390/625] eta 0:01:38 lr 0.000345 wd 0.0500 time 0.3987 (0.4207) data time 0.0007 (0.0020) model time 0.3979 (0.4167) loss 7.5360 (6.8695) grad_norm 2.7290 (3.3792) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:41:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][400/625] eta 0:01:34 lr 0.000344 wd 0.0500 time 0.3962 (0.4202) data time 0.0008 (0.0020) model time 0.3954 (0.4162) loss 7.1032 (6.8675) grad_norm 1.8497 (3.3556) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:41:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][410/625] eta 0:01:30 lr 0.000344 wd 0.0500 time 0.3993 (0.4197) data time 0.0009 (0.0020) model time 0.3984 (0.4157) loss 7.3329 (6.8592) grad_norm 3.3850 (3.3578) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:41:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][420/625] eta 0:01:25 lr 0.000344 wd 0.0500 time 0.4761 (0.4195) data time 0.0007 (0.0020) model time 0.4753 (0.4156) loss 5.7509 (6.8600) grad_norm 2.2884 (3.3525) loss_scale 1024.0000 
(1024.0000) mem 14939MB [2024-07-25 06:42:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][430/625] eta 0:01:21 lr 0.000344 wd 0.0500 time 0.3963 (0.4192) data time 0.0006 (0.0019) model time 0.3957 (0.4153) loss 7.9962 (6.8668) grad_norm 2.0316 (3.3602) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:42:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][440/625] eta 0:01:17 lr 0.000344 wd 0.0500 time 0.3947 (0.4188) data time 0.0007 (0.0019) model time 0.3941 (0.4149) loss 6.2229 (6.8623) grad_norm 1.7964 (3.3432) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:42:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][450/625] eta 0:01:13 lr 0.000344 wd 0.0500 time 0.4054 (0.4185) data time 0.0006 (0.0019) model time 0.4048 (0.4147) loss 7.4791 (6.8634) grad_norm 2.4934 (3.3522) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 06:42:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][460/625] eta 0:01:09 lr 0.000344 wd 0.0500 time 0.3954 (0.4182) data time 0.0008 (0.0019) model time 0.3946 (0.4145) loss 7.0588 (6.8650) grad_norm 1.9349 (inf) loss_scale 512.0000 (1014.0043) mem 14939MB [2024-07-25 06:42:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][470/625] eta 0:01:04 lr 0.000344 wd 0.0500 time 0.4016 (0.4187) data time 0.0008 (0.0019) model time 0.4008 (0.4150) loss 7.7738 (6.8606) grad_norm 3.0113 (inf) loss_scale 512.0000 (1003.3461) mem 14939MB [2024-07-25 06:42:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][480/625] eta 0:01:00 lr 0.000344 wd 0.0500 time 0.3997 (0.4198) data time 0.0007 (0.0018) model time 0.3990 (0.4164) loss 6.7591 (6.8694) grad_norm 3.3820 (inf) loss_scale 512.0000 (993.1310) mem 14939MB [2024-07-25 06:42:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][490/625] eta 0:00:56 lr 0.000344 
wd 0.0500 time 0.4171 (0.4210) data time 0.0007 (0.0018) model time 0.4165 (0.4177) loss 6.1604 (6.8663) grad_norm 2.1050 (inf) loss_scale 512.0000 (983.3320) mem 14939MB [2024-07-25 06:42:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][500/625] eta 0:00:52 lr 0.000343 wd 0.0500 time 0.4001 (0.4217) data time 0.0006 (0.0018) model time 0.3995 (0.4186) loss 6.3244 (6.8567) grad_norm 3.2369 (inf) loss_scale 512.0000 (973.9242) mem 14939MB [2024-07-25 06:42:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][510/625] eta 0:00:48 lr 0.000343 wd 0.0500 time 0.5925 (0.4227) data time 0.0010 (0.0018) model time 0.5916 (0.4197) loss 5.4924 (6.8528) grad_norm 2.9612 (inf) loss_scale 512.0000 (964.8845) mem 14939MB [2024-07-25 06:42:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][520/625] eta 0:00:44 lr 0.000343 wd 0.0500 time 0.4147 (0.4224) data time 0.0007 (0.0019) model time 0.4140 (0.4193) loss 7.2853 (6.8582) grad_norm 2.5545 (inf) loss_scale 512.0000 (956.1919) mem 14939MB [2024-07-25 06:42:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][530/625] eta 0:00:40 lr 0.000343 wd 0.0500 time 0.3990 (0.4220) data time 0.0008 (0.0019) model time 0.3982 (0.4189) loss 5.3946 (6.8560) grad_norm 2.9468 (inf) loss_scale 512.0000 (947.8267) mem 14939MB [2024-07-25 06:42:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][540/625] eta 0:00:35 lr 0.000343 wd 0.0500 time 0.4126 (0.4219) data time 0.0006 (0.0018) model time 0.4120 (0.4189) loss 6.2257 (6.8522) grad_norm 3.2035 (inf) loss_scale 512.0000 (939.7708) mem 14939MB [2024-07-25 06:42:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][550/625] eta 0:00:31 lr 0.000343 wd 0.0500 time 0.4042 (0.4216) data time 0.0006 (0.0018) model time 0.4036 (0.4186) loss 7.4736 (6.8527) grad_norm 2.5106 (inf) loss_scale 512.0000 (932.0073) mem 14939MB 
[2024-07-25 06:42:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][560/625] eta 0:00:27 lr 0.000343 wd 0.0500 time 0.3980 (0.4213) data time 0.0007 (0.0018) model time 0.3974 (0.4182) loss 5.6002 (6.8474) grad_norm 3.8875 (inf) loss_scale 512.0000 (924.5205) mem 14939MB [2024-07-25 06:43:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][570/625] eta 0:00:23 lr 0.000343 wd 0.0500 time 0.3963 (0.4210) data time 0.0008 (0.0018) model time 0.3955 (0.4179) loss 6.8321 (6.8517) grad_norm 2.0415 (inf) loss_scale 512.0000 (917.2960) mem 14939MB [2024-07-25 06:43:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][580/625] eta 0:00:18 lr 0.000343 wd 0.0500 time 0.4032 (0.4206) data time 0.0008 (0.0018) model time 0.4024 (0.4176) loss 6.0493 (6.8438) grad_norm 1.6858 (inf) loss_scale 512.0000 (910.3201) mem 14939MB [2024-07-25 06:43:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][590/625] eta 0:00:14 lr 0.000343 wd 0.0500 time 0.3962 (0.4204) data time 0.0007 (0.0018) model time 0.3955 (0.4174) loss 7.3787 (6.8476) grad_norm 2.2767 (inf) loss_scale 512.0000 (903.5804) mem 14939MB [2024-07-25 06:43:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][600/625] eta 0:00:10 lr 0.000343 wd 0.0500 time 0.4000 (0.4201) data time 0.0007 (0.0018) model time 0.3993 (0.4171) loss 7.4710 (6.8476) grad_norm 2.2348 (inf) loss_scale 512.0000 (897.0649) mem 14939MB [2024-07-25 06:43:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][610/625] eta 0:00:06 lr 0.000342 wd 0.0500 time 0.3976 (0.4198) data time 0.0004 (0.0019) model time 0.3972 (0.4167) loss 6.7411 (6.8522) grad_norm 3.3687 (inf) loss_scale 512.0000 (890.7627) mem 14939MB [2024-07-25 06:43:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][620/625] eta 0:00:02 lr 0.000342 wd 0.0500 time 0.3961 (0.4195) data time 
0.0006 (0.0018) model time 0.3955 (0.4164) loss 6.9195 (6.8509) grad_norm 3.0720 (inf) loss_scale 512.0000 (884.6634) mem 14939MB [2024-07-25 06:43:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 200 training takes 0:04:22 [2024-07-25 06:43:22 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 06:43:23 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 06:43:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.459 (0.459) Loss 0.5723 (0.5723) Acc@1 89.551 (89.551) Acc@5 98.389 (98.389) Mem 14939MB [2024-07-25 06:43:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.122) Loss 0.8735 (0.6952) Acc@1 80.371 (86.208) Acc@5 96.533 (97.678) Mem 14939MB [2024-07-25 06:43:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.105) Loss 0.9946 (0.8112) Acc@1 76.318 (82.978) Acc@5 95.215 (96.491) Mem 14939MB [2024-07-25 06:43:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.544 Acc@5 96.455 [2024-07-25 06:43:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.5% [2024-07-25 06:43:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.824 (0.824) Loss 0.5469 (0.5469) Acc@1 90.039 (90.039) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 06:43:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.155) Loss 0.8521 (0.6770) Acc@1 81.348 (86.555) Acc@5 96.338 (97.763) Mem 14939MB [2024-07-25 06:43:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 0.9819 (0.7910) Acc@1 76.904 (83.429) Acc@5 95.410 (96.696) Mem 14939MB [2024-07-25 
06:43:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.013 Acc@5 96.645 [2024-07-25 06:43:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.0% [2024-07-25 06:43:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.01% [2024-07-25 06:43:28 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 06:43:29 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 06:43:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][0/625] eta 0:08:44 lr 0.000342 wd 0.0500 time 0.8392 (0.8392) data time 0.4437 (0.4437) model time 0.0000 (0.0000) loss 5.9706 (5.9706) grad_norm 3.5962 (3.5962) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:43:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][10/625] eta 0:04:35 lr 0.000342 wd 0.0500 time 0.3909 (0.4472) data time 0.0009 (0.0434) model time 0.0000 (0.0000) loss 7.0694 (6.5111) grad_norm 4.3489 (3.0944) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:43:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][20/625] eta 0:04:18 lr 0.000342 wd 0.0500 time 0.4002 (0.4266) data time 0.0008 (0.0232) model time 0.0000 (0.0000) loss 7.7272 (6.6879) grad_norm 2.6169 (3.6494) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:43:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][30/625] eta 0:04:10 lr 0.000342 wd 0.0500 time 0.4193 (0.4215) data time 0.0006 (0.0160) model time 0.0000 (0.0000) loss 6.4033 (6.8475) grad_norm 2.6072 (3.8846) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:43:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[201/300][40/625] eta 0:04:05 lr 0.000342 wd 0.0500 time 0.3962 (0.4198) data time 0.0006 (0.0129) model time 0.0000 (0.0000) loss 7.3710 (6.8911) grad_norm 2.0310 (3.6664) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:43:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][50/625] eta 0:03:59 lr 0.000342 wd 0.0500 time 0.3986 (0.4162) data time 0.0008 (0.0105) model time 0.0000 (0.0000) loss 7.9420 (6.9427) grad_norm 5.2285 (3.5039) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:43:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][60/625] eta 0:03:54 lr 0.000342 wd 0.0500 time 0.4076 (0.4151) data time 0.0006 (0.0090) model time 0.4069 (0.4090) loss 7.2949 (6.9373) grad_norm 2.9658 (3.6992) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:43:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][70/625] eta 0:03:52 lr 0.000342 wd 0.0500 time 0.4068 (0.4188) data time 0.0008 (0.0078) model time 0.4060 (0.4248) loss 5.7089 (6.9712) grad_norm 1.8476 (3.6150) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:44:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][80/625] eta 0:03:52 lr 0.000342 wd 0.0500 time 0.6035 (0.4263) data time 0.0008 (0.0071) model time 0.6027 (0.4422) loss 5.6223 (6.9632) grad_norm 2.5044 (3.5499) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:44:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][90/625] eta 0:03:49 lr 0.000341 wd 0.0500 time 0.4002 (0.4293) data time 0.0006 (0.0064) model time 0.3996 (0.4449) loss 6.8225 (6.9597) grad_norm 2.0790 (3.4395) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:44:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][100/625] eta 0:03:48 lr 0.000341 wd 0.0500 time 0.5560 (0.4346) data time 0.0006 (0.0059) model time 0.5554 (0.4522) loss 8.1273 (6.9328) grad_norm 
1.6061 (3.3700) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:44:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][110/625] eta 0:03:44 lr 0.000341 wd 0.0500 time 0.4172 (0.4358) data time 0.0008 (0.0054) model time 0.4164 (0.4514) loss 5.9658 (6.9145) grad_norm 2.1052 (3.2951) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:44:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][120/625] eta 0:03:38 lr 0.000341 wd 0.0500 time 0.3953 (0.4331) data time 0.0009 (0.0050) model time 0.3945 (0.4444) loss 7.7267 (6.8955) grad_norm 2.7346 (3.2399) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:44:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][130/625] eta 0:03:33 lr 0.000341 wd 0.0500 time 0.3960 (0.4311) data time 0.0006 (0.0050) model time 0.3954 (0.4391) loss 7.4529 (6.8719) grad_norm 2.7600 (3.2116) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:44:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][140/625] eta 0:03:28 lr 0.000341 wd 0.0500 time 0.4175 (0.4301) data time 0.0006 (0.0047) model time 0.4169 (0.4365) loss 6.8972 (6.8658) grad_norm 2.9788 (3.2267) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:44:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][150/625] eta 0:03:23 lr 0.000341 wd 0.0500 time 0.4032 (0.4288) data time 0.0008 (0.0045) model time 0.4024 (0.4339) loss 7.7711 (6.8719) grad_norm 2.4651 (3.1787) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:44:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][160/625] eta 0:03:18 lr 0.000341 wd 0.0500 time 0.3970 (0.4277) data time 0.0006 (0.0044) model time 0.3963 (0.4315) loss 7.0875 (6.8572) grad_norm 4.4874 (3.1877) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 06:44:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[201/300][170/625] eta 0:03:14 lr 0.000341 wd 0.0500 time 0.4081 (0.4265) data time 0.0008 (0.0042) model time 0.4073 (0.4295) loss 7.4395 (6.8529) grad_norm 2.6430 (inf) loss_scale 256.0000 (507.5088) mem 14939MB [2024-07-25 06:44:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][180/625] eta 0:03:09 lr 0.000341 wd 0.0500 time 0.3964 (0.4253) data time 0.0007 (0.0041) model time 0.3957 (0.4272) loss 5.6771 (6.8550) grad_norm 2.2556 (inf) loss_scale 256.0000 (493.6133) mem 14939MB [2024-07-25 06:44:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][190/625] eta 0:03:04 lr 0.000340 wd 0.0500 time 0.3986 (0.4241) data time 0.0009 (0.0041) model time 0.3977 (0.4253) loss 5.6006 (6.8551) grad_norm 2.5019 (inf) loss_scale 256.0000 (481.1728) mem 14939MB [2024-07-25 06:44:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][200/625] eta 0:02:59 lr 0.000340 wd 0.0500 time 0.4039 (0.4229) data time 0.0006 (0.0040) model time 0.4033 (0.4235) loss 7.7137 (6.8639) grad_norm 2.4218 (inf) loss_scale 256.0000 (469.9701) mem 14939MB [2024-07-25 06:44:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][210/625] eta 0:02:55 lr 0.000340 wd 0.0500 time 0.3911 (0.4220) data time 0.0007 (0.0039) model time 0.3904 (0.4221) loss 7.5103 (6.8771) grad_norm 4.4797 (inf) loss_scale 256.0000 (459.8294) mem 14939MB [2024-07-25 06:45:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][220/625] eta 0:02:50 lr 0.000340 wd 0.0500 time 0.3971 (0.4212) data time 0.0007 (0.0037) model time 0.3964 (0.4210) loss 7.7696 (6.8832) grad_norm 5.5559 (inf) loss_scale 256.0000 (450.6063) mem 14939MB [2024-07-25 06:45:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][230/625] eta 0:02:46 lr 0.000340 wd 0.0500 time 0.4268 (0.4206) data time 0.0008 (0.0038) model time 0.4260 (0.4199) loss 6.0468 (6.8734) grad_norm 1.8266 (inf) 
loss_scale 256.0000 (442.1818) mem 14939MB [2024-07-25 06:45:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][240/625] eta 0:02:41 lr 0.000340 wd 0.0500 time 0.4082 (0.4202) data time 0.0005 (0.0040) model time 0.4076 (0.4191) loss 7.7986 (6.8779) grad_norm 2.5591 (inf) loss_scale 256.0000 (434.4564) mem 14939MB [2024-07-25 06:45:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][250/625] eta 0:02:37 lr 0.000340 wd 0.0500 time 0.3998 (0.4195) data time 0.0007 (0.0039) model time 0.3991 (0.4182) loss 7.6353 (6.8785) grad_norm 2.8769 (inf) loss_scale 256.0000 (427.3466) mem 14939MB [2024-07-25 06:45:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][260/625] eta 0:02:32 lr 0.000340 wd 0.0500 time 0.4049 (0.4190) data time 0.0008 (0.0039) model time 0.4041 (0.4174) loss 5.6161 (6.8902) grad_norm 4.0353 (inf) loss_scale 256.0000 (420.7816) mem 14939MB [2024-07-25 06:45:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][270/625] eta 0:02:28 lr 0.000340 wd 0.0500 time 0.3950 (0.4184) data time 0.0008 (0.0038) model time 0.3941 (0.4167) loss 7.7934 (6.8943) grad_norm 3.8452 (inf) loss_scale 256.0000 (414.7011) mem 14939MB [2024-07-25 06:45:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][280/625] eta 0:02:24 lr 0.000340 wd 0.0500 time 0.5786 (0.4185) data time 0.0007 (0.0037) model time 0.5779 (0.4169) loss 7.0619 (6.9008) grad_norm 2.4136 (inf) loss_scale 256.0000 (409.0534) mem 14939MB [2024-07-25 06:45:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][290/625] eta 0:02:20 lr 0.000340 wd 0.0500 time 0.6045 (0.4197) data time 0.0007 (0.0036) model time 0.6038 (0.4184) loss 7.2732 (6.9084) grad_norm 4.2568 (inf) loss_scale 256.0000 (403.7938) mem 14939MB [2024-07-25 06:45:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][300/625] eta 0:02:17 lr 0.000339 
wd 0.0500 time 0.5881 (0.4218) data time 0.0008 (0.0035) model time 0.5873 (0.4208) loss 7.2040 (6.8963) grad_norm 3.5476 (inf) loss_scale 256.0000 (398.8837) mem 14939MB [2024-07-25 06:45:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][310/625] eta 0:02:13 lr 0.000339 wd 0.0500 time 0.5979 (0.4231) data time 0.0007 (0.0035) model time 0.5972 (0.4223) loss 7.5829 (6.8913) grad_norm 2.7885 (inf) loss_scale 256.0000 (394.2894) mem 14939MB [2024-07-25 06:45:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][320/625] eta 0:02:09 lr 0.000339 wd 0.0500 time 0.4004 (0.4244) data time 0.0007 (0.0034) model time 0.3997 (0.4239) loss 5.3417 (6.8896) grad_norm 2.2854 (inf) loss_scale 256.0000 (389.9813) mem 14939MB [2024-07-25 06:45:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][330/625] eta 0:02:05 lr 0.000339 wd 0.0500 time 0.4064 (0.4250) data time 0.0010 (0.0034) model time 0.4055 (0.4245) loss 7.3601 (6.8981) grad_norm 2.4462 (inf) loss_scale 256.0000 (385.9335) mem 14939MB [2024-07-25 06:45:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][340/625] eta 0:02:01 lr 0.000339 wd 0.0500 time 0.4145 (0.4246) data time 0.0010 (0.0033) model time 0.4135 (0.4241) loss 7.1506 (6.9100) grad_norm 2.1091 (inf) loss_scale 256.0000 (382.1232) mem 14939MB [2024-07-25 06:45:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][350/625] eta 0:01:56 lr 0.000339 wd 0.0500 time 0.3926 (0.4243) data time 0.0011 (0.0033) model time 0.3915 (0.4236) loss 7.6888 (6.9094) grad_norm 2.0125 (inf) loss_scale 256.0000 (378.5299) mem 14939MB [2024-07-25 06:46:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][360/625] eta 0:01:52 lr 0.000339 wd 0.0500 time 0.3982 (0.4239) data time 0.0010 (0.0034) model time 0.3972 (0.4230) loss 7.2391 (6.9028) grad_norm 3.2000 (inf) loss_scale 256.0000 (375.1357) mem 14939MB 
[2024-07-25 06:46:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][370/625] eta 0:01:48 lr 0.000339 wd 0.0500 time 0.4131 (0.4236) data time 0.0007 (0.0033) model time 0.4123 (0.4226) loss 6.3980 (6.8934) grad_norm 2.8306 (inf) loss_scale 256.0000 (371.9245) mem 14939MB [2024-07-25 06:46:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][380/625] eta 0:01:43 lr 0.000339 wd 0.0500 time 0.3939 (0.4230) data time 0.0007 (0.0032) model time 0.3932 (0.4220) loss 7.4899 (6.8899) grad_norm 2.8393 (inf) loss_scale 256.0000 (368.8819) mem 14939MB [2024-07-25 06:46:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][390/625] eta 0:01:39 lr 0.000339 wd 0.0500 time 0.4189 (0.4228) data time 0.0008 (0.0032) model time 0.4181 (0.4216) loss 7.6750 (6.8877) grad_norm 2.1061 (inf) loss_scale 256.0000 (365.9949) mem 14939MB [2024-07-25 06:46:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][400/625] eta 0:01:35 lr 0.000338 wd 0.0500 time 0.4237 (0.4225) data time 0.0009 (0.0032) model time 0.4227 (0.4213) loss 5.4643 (6.8911) grad_norm 3.0662 (inf) loss_scale 256.0000 (363.2519) mem 14939MB [2024-07-25 06:46:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][410/625] eta 0:01:30 lr 0.000338 wd 0.0500 time 0.3935 (0.4222) data time 0.0010 (0.0031) model time 0.3925 (0.4210) loss 6.2716 (6.8909) grad_norm 2.6762 (inf) loss_scale 256.0000 (360.6423) mem 14939MB [2024-07-25 06:46:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][420/625] eta 0:01:26 lr 0.000338 wd 0.0500 time 0.4121 (0.4219) data time 0.0009 (0.0031) model time 0.4112 (0.4206) loss 7.4346 (6.8953) grad_norm 3.6109 (inf) loss_scale 256.0000 (358.1568) mem 14939MB [2024-07-25 06:46:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][430/625] eta 0:01:22 lr 0.000338 wd 0.0500 time 0.4150 (0.4216) data time 
0.0007 (0.0031) model time 0.4143 (0.4202) loss 7.7756 (6.9010) grad_norm 4.7185 (inf) loss_scale 256.0000 (355.7865) mem 14939MB [2024-07-25 06:46:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][440/625] eta 0:01:17 lr 0.000338 wd 0.0500 time 0.3962 (0.4214) data time 0.0009 (0.0031) model time 0.3953 (0.4200) loss 7.8053 (6.9024) grad_norm 3.8551 (inf) loss_scale 256.0000 (353.5238) mem 14939MB [2024-07-25 06:46:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][450/625] eta 0:01:13 lr 0.000338 wd 0.0500 time 0.3985 (0.4212) data time 0.0007 (0.0031) model time 0.3978 (0.4197) loss 5.9363 (6.9024) grad_norm 2.7013 (inf) loss_scale 256.0000 (351.3614) mem 14939MB [2024-07-25 06:46:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][460/625] eta 0:01:09 lr 0.000338 wd 0.0500 time 0.4066 (0.4209) data time 0.0007 (0.0030) model time 0.4060 (0.4194) loss 8.9549 (6.9077) grad_norm 2.3914 (inf) loss_scale 256.0000 (349.2928) mem 14939MB [2024-07-25 06:46:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][470/625] eta 0:01:05 lr 0.000338 wd 0.0500 time 0.3929 (0.4205) data time 0.0009 (0.0030) model time 0.3920 (0.4190) loss 6.8172 (6.8968) grad_norm 1.8770 (inf) loss_scale 256.0000 (347.3121) mem 14939MB [2024-07-25 06:46:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][480/625] eta 0:01:00 lr 0.000338 wd 0.0500 time 0.4118 (0.4202) data time 0.0008 (0.0029) model time 0.4110 (0.4187) loss 6.7575 (6.8954) grad_norm 2.5809 (inf) loss_scale 256.0000 (345.4137) mem 14939MB [2024-07-25 06:46:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][490/625] eta 0:00:56 lr 0.000338 wd 0.0500 time 0.4125 (0.4201) data time 0.0007 (0.0029) model time 0.4119 (0.4185) loss 5.5909 (6.8866) grad_norm 2.4597 (inf) loss_scale 256.0000 (343.5927) mem 14939MB [2024-07-25 06:46:59 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][500/625] eta 0:00:52 lr 0.000338 wd 0.0500 time 0.3935 (0.4199) data time 0.0008 (0.0030) model time 0.3927 (0.4182) loss 5.3265 (6.8813) grad_norm 2.9663 (inf) loss_scale 256.0000 (341.8443) mem 14939MB [2024-07-25 06:47:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][510/625] eta 0:00:48 lr 0.000337 wd 0.0500 time 0.6189 (0.4214) data time 0.0008 (0.0030) model time 0.6180 (0.4198) loss 6.0826 (6.8714) grad_norm 2.3930 (inf) loss_scale 256.0000 (340.1644) mem 14939MB [2024-07-25 06:47:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][520/625] eta 0:00:44 lr 0.000337 wd 0.0500 time 0.3981 (0.4227) data time 0.0007 (0.0030) model time 0.3974 (0.4212) loss 5.9225 (6.8752) grad_norm 2.1825 (inf) loss_scale 256.0000 (338.5489) mem 14939MB [2024-07-25 06:47:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][530/625] eta 0:00:40 lr 0.000337 wd 0.0500 time 0.4294 (0.4236) data time 0.0008 (0.0030) model time 0.4286 (0.4222) loss 7.9833 (6.8758) grad_norm 4.2438 (inf) loss_scale 256.0000 (336.9944) mem 14939MB [2024-07-25 06:47:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][540/625] eta 0:00:36 lr 0.000337 wd 0.0500 time 0.5848 (0.4250) data time 0.0009 (0.0030) model time 0.5838 (0.4237) loss 6.0507 (6.8790) grad_norm 3.2966 (inf) loss_scale 256.0000 (335.4972) mem 14939MB [2024-07-25 06:47:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][550/625] eta 0:00:31 lr 0.000337 wd 0.0500 time 0.4436 (0.4258) data time 0.0007 (0.0029) model time 0.4429 (0.4247) loss 6.9919 (6.8804) grad_norm 2.9065 (inf) loss_scale 256.0000 (334.0544) mem 14939MB [2024-07-25 06:47:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][560/625] eta 0:00:27 lr 0.000337 wd 0.0500 time 0.3952 (0.4255) data time 0.0009 (0.0029) model 
time 0.3943 (0.4243) loss 7.4558 (6.8808) grad_norm 3.0187 (inf) loss_scale 256.0000 (332.6631) mem 14939MB [2024-07-25 06:47:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][570/625] eta 0:00:23 lr 0.000337 wd 0.0500 time 0.4027 (0.4251) data time 0.0007 (0.0029) model time 0.4020 (0.4239) loss 6.1768 (6.8873) grad_norm 1.8816 (inf) loss_scale 256.0000 (331.3205) mem 14939MB [2024-07-25 06:47:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][580/625] eta 0:00:19 lr 0.000337 wd 0.0500 time 0.4424 (0.4249) data time 0.0008 (0.0028) model time 0.4416 (0.4237) loss 7.7043 (6.8881) grad_norm 2.1357 (inf) loss_scale 256.0000 (330.0241) mem 14939MB [2024-07-25 06:47:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][590/625] eta 0:00:14 lr 0.000337 wd 0.0500 time 0.3981 (0.4246) data time 0.0006 (0.0028) model time 0.3974 (0.4233) loss 7.2219 (6.8876) grad_norm 2.1029 (inf) loss_scale 256.0000 (328.7716) mem 14939MB [2024-07-25 06:47:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][600/625] eta 0:00:10 lr 0.000337 wd 0.0500 time 0.3969 (0.4243) data time 0.0007 (0.0028) model time 0.3962 (0.4230) loss 6.1708 (6.8916) grad_norm 1.7034 (inf) loss_scale 256.0000 (327.5607) mem 14939MB [2024-07-25 06:47:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][610/625] eta 0:00:06 lr 0.000336 wd 0.0500 time 0.4130 (0.4240) data time 0.0004 (0.0028) model time 0.4126 (0.4226) loss 5.8173 (6.8885) grad_norm 2.8945 (inf) loss_scale 256.0000 (326.3895) mem 14939MB [2024-07-25 06:47:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][620/625] eta 0:00:02 lr 0.000336 wd 0.0500 time 0.4035 (0.4236) data time 0.0004 (0.0027) model time 0.4031 (0.4223) loss 7.0286 (6.8885) grad_norm 1.9479 (inf) loss_scale 256.0000 (325.2560) mem 14939MB [2024-07-25 06:47:54 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 394): INFO EPOCH 201 training takes 0:04:24 [2024-07-25 06:47:54 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 06:47:54 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 06:47:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.455 (0.455) Loss 0.5688 (0.5688) Acc@1 89.551 (89.551) Acc@5 98.389 (98.389) Mem 14939MB [2024-07-25 06:47:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.8740 (0.6902) Acc@1 81.299 (86.479) Acc@5 96.436 (97.718) Mem 14939MB [2024-07-25 06:47:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9897 (0.8070) Acc@1 76.758 (83.236) Acc@5 94.922 (96.519) Mem 14939MB [2024-07-25 06:47:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.802 Acc@5 96.499 [2024-07-25 06:47:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-07-25 06:47:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.80% [2024-07-25 06:47:57 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 06:47:58 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 06:47:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.464 (0.464) Loss 0.5469 (0.5469) Acc@1 89.990 (89.990) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 06:47:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8516 (0.6769) Acc@1 81.348 (86.572) Acc@5 96.338 (97.772) Mem 14939MB [2024-07-25 06:48:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9814 (0.7905) Acc@1 76.855 (83.440) Acc@5 95.459 (96.705) Mem 14939MB [2024-07-25 06:48:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.031 Acc@5 96.653 [2024-07-25 06:48:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.0% [2024-07-25 06:48:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.03% [2024-07-25 06:48:00 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 06:48:01 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 06:48:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][0/625] eta 0:08:27 lr 0.000336 wd 0.0500 time 0.8116 (0.8116) data time 0.4267 (0.4267) model time 0.0000 (0.0000) loss 6.7246 (6.7246) grad_norm 2.8083 (2.8083) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:48:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][10/625] eta 0:04:39 lr 0.000336 wd 0.0500 time 0.3962 (0.4541) data time 0.0008 (0.0400) model time 0.0000 (0.0000) loss 5.8951 (6.5917) grad_norm 13.8373 (3.9762) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:48:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][20/625] eta 0:04:20 lr 0.000336 wd 0.0500 time 0.3974 (0.4298) data time 0.0008 (0.0214) model time 0.0000 (0.0000) loss 5.6236 (6.5629) grad_norm 2.8455 (3.5399) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:48:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][30/625] eta 0:04:10 lr 0.000336 wd 0.0500 time 0.4050 (0.4207) data time 0.0008 (0.0148) model time 0.0000 (0.0000) loss 7.1922 (6.6784) grad_norm 2.9110 (3.4632) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:48:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][40/625] eta 0:04:03 lr 0.000336 wd 0.0500 time 0.3943 (0.4161) data time 0.0009 (0.0114) model time 0.0000 (0.0000) loss 6.9938 (6.7682) grad_norm 1.7614 (3.3463) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:48:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][50/625] eta 0:03:57 lr 0.000336 wd 0.0500 time 0.3926 (0.4128) data time 0.0008 (0.0094) model time 0.0000 (0.0000) loss 6.3363 (6.8036) grad_norm 2.9313 (3.2778) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:48:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][60/625] eta 0:03:52 lr 0.000336 wd 0.0500 time 0.3990 (0.4108) 
data time 0.0009 (0.0080) model time 0.3981 (0.3996) loss 7.3893 (6.7770) grad_norm 2.0492 (3.2353) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:48:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][70/625] eta 0:03:47 lr 0.000336 wd 0.0500 time 0.3944 (0.4093) data time 0.0008 (0.0070) model time 0.3936 (0.3993) loss 8.4077 (6.8311) grad_norm 2.8799 (3.1624) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:48:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][80/625] eta 0:03:42 lr 0.000336 wd 0.0500 time 0.4012 (0.4082) data time 0.0008 (0.0063) model time 0.4004 (0.3992) loss 7.0287 (6.8262) grad_norm 2.7107 (3.0835) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:48:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][90/625] eta 0:03:37 lr 0.000335 wd 0.0500 time 0.4002 (0.4074) data time 0.0007 (0.0057) model time 0.3995 (0.3994) loss 7.3078 (6.8014) grad_norm 1.9694 (2.9906) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:48:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][100/625] eta 0:03:34 lr 0.000335 wd 0.0500 time 0.3947 (0.4080) data time 0.0008 (0.0052) model time 0.3939 (0.4021) loss 7.0267 (6.7824) grad_norm 2.3413 (3.0221) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:48:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][110/625] eta 0:03:33 lr 0.000335 wd 0.0500 time 0.4036 (0.4144) data time 0.0009 (0.0049) model time 0.4028 (0.4147) loss 5.6935 (6.7725) grad_norm 1.5751 (2.9536) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:48:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][120/625] eta 0:03:33 lr 0.000335 wd 0.0500 time 0.5819 (0.4224) data time 0.0007 (0.0045) model time 0.5812 (0.4284) loss 7.2762 (6.7825) grad_norm 2.1076 (2.8995) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
06:48:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][130/625] eta 0:03:29 lr 0.000335 wd 0.0500 time 0.5710 (0.4235) data time 0.0009 (0.0043) model time 0.5701 (0.4292) loss 6.3526 (6.7676) grad_norm 2.5591 (2.9430) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:49:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][140/625] eta 0:03:27 lr 0.000335 wd 0.0500 time 0.5291 (0.4279) data time 0.0008 (0.0040) model time 0.5283 (0.4354) loss 7.0711 (6.7737) grad_norm 2.3081 (2.9578) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:49:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][150/625] eta 0:03:23 lr 0.000335 wd 0.0500 time 0.4063 (0.4281) data time 0.0008 (0.0038) model time 0.4055 (0.4349) loss 6.6570 (6.7566) grad_norm 1.7787 (2.9490) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:49:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][160/625] eta 0:03:18 lr 0.000335 wd 0.0500 time 0.3949 (0.4263) data time 0.0007 (0.0036) model time 0.3942 (0.4315) loss 6.5815 (6.7524) grad_norm 6.9086 (3.0121) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:49:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][170/625] eta 0:03:13 lr 0.000335 wd 0.0500 time 0.3992 (0.4248) data time 0.0008 (0.0035) model time 0.3984 (0.4290) loss 7.4334 (6.7862) grad_norm 5.3681 (3.0498) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:49:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][180/625] eta 0:03:08 lr 0.000335 wd 0.0500 time 0.3984 (0.4235) data time 0.0009 (0.0033) model time 0.3975 (0.4267) loss 6.8794 (6.7876) grad_norm 1.7770 (3.0677) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:49:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][190/625] eta 0:03:03 lr 0.000335 wd 0.0500 time 0.3975 (0.4224) data 
time 0.0009 (0.0032) model time 0.3966 (0.4249) loss 7.5342 (6.8047) grad_norm 3.2952 (3.0607) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:49:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][200/625] eta 0:02:59 lr 0.000334 wd 0.0500 time 0.3979 (0.4213) data time 0.0007 (0.0031) model time 0.3973 (0.4232) loss 5.7784 (6.8019) grad_norm 3.3580 (3.0580) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:49:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][210/625] eta 0:02:54 lr 0.000334 wd 0.0500 time 0.4012 (0.4204) data time 0.0007 (0.0030) model time 0.4005 (0.4218) loss 6.8258 (6.7939) grad_norm 2.1699 (3.0141) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:49:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][220/625] eta 0:02:49 lr 0.000334 wd 0.0500 time 0.3951 (0.4195) data time 0.0007 (0.0029) model time 0.3944 (0.4205) loss 8.0852 (6.8219) grad_norm 3.5059 (3.0976) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:49:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][230/625] eta 0:02:45 lr 0.000334 wd 0.0500 time 0.4006 (0.4194) data time 0.0009 (0.0028) model time 0.3997 (0.4203) loss 6.3742 (6.8197) grad_norm 2.9857 (3.1224) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:49:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][240/625] eta 0:02:41 lr 0.000334 wd 0.0500 time 0.3958 (0.4186) data time 0.0008 (0.0027) model time 0.3950 (0.4191) loss 7.2928 (6.8322) grad_norm 3.1235 (3.1198) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:49:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][250/625] eta 0:02:36 lr 0.000334 wd 0.0500 time 0.3955 (0.4179) data time 0.0007 (0.0027) model time 0.3948 (0.4182) loss 8.0649 (6.8410) grad_norm 1.8346 (3.1236) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
06:49:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][260/625] eta 0:02:32 lr 0.000334 wd 0.0500 time 0.3981 (0.4172) data time 0.0009 (0.0026) model time 0.3972 (0.4174) loss 7.8994 (6.8510) grad_norm 2.3264 (3.1328) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:49:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][270/625] eta 0:02:27 lr 0.000334 wd 0.0500 time 0.4340 (0.4168) data time 0.0009 (0.0025) model time 0.4331 (0.4168) loss 7.9270 (6.8584) grad_norm 3.5766 (3.1497) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:49:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][280/625] eta 0:02:23 lr 0.000334 wd 0.0500 time 0.3902 (0.4162) data time 0.0009 (0.0025) model time 0.3894 (0.4160) loss 8.3550 (6.8618) grad_norm 3.0297 (3.2064) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:50:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][290/625] eta 0:02:19 lr 0.000334 wd 0.0500 time 0.3977 (0.4160) data time 0.0007 (0.0024) model time 0.3970 (0.4157) loss 7.0130 (6.8526) grad_norm 2.8328 (3.2217) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:50:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][300/625] eta 0:02:15 lr 0.000333 wd 0.0500 time 0.4043 (0.4156) data time 0.0008 (0.0024) model time 0.4034 (0.4152) loss 5.9442 (6.8498) grad_norm 2.8063 (3.2193) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:50:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][310/625] eta 0:02:10 lr 0.000333 wd 0.0500 time 0.4007 (0.4152) data time 0.0010 (0.0024) model time 0.3998 (0.4146) loss 6.8775 (6.8554) grad_norm 2.4718 (3.2130) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:50:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][320/625] eta 0:02:06 lr 0.000333 wd 0.0500 time 0.5629 (0.4156) data 
time 0.0007 (0.0023) model time 0.5622 (0.4152) loss 5.9887 (6.8490) grad_norm 7.3889 (3.2095) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:50:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][330/625] eta 0:02:02 lr 0.000333 wd 0.0500 time 0.5648 (0.4163) data time 0.0007 (0.0023) model time 0.5641 (0.4158) loss 7.5969 (6.8454) grad_norm 2.6974 (3.1856) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:50:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][340/625] eta 0:01:59 lr 0.000333 wd 0.0500 time 0.6027 (0.4179) data time 0.0006 (0.0023) model time 0.6021 (0.4178) loss 7.8409 (6.8416) grad_norm 2.5135 (3.1814) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:50:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][350/625] eta 0:01:55 lr 0.000333 wd 0.0500 time 0.5712 (0.4189) data time 0.0009 (0.0023) model time 0.5704 (0.4189) loss 7.2893 (6.8490) grad_norm 5.5037 (3.1784) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:50:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][360/625] eta 0:01:51 lr 0.000333 wd 0.0500 time 0.5841 (0.4210) data time 0.0007 (0.0022) model time 0.5834 (0.4213) loss 6.0073 (6.8327) grad_norm 2.6745 (3.1789) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:50:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][370/625] eta 0:01:47 lr 0.000333 wd 0.0500 time 0.4023 (0.4214) data time 0.0007 (0.0022) model time 0.4016 (0.4217) loss 6.3738 (6.8331) grad_norm 3.4543 (3.1913) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:50:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][380/625] eta 0:01:43 lr 0.000333 wd 0.0500 time 0.3976 (0.4209) data time 0.0009 (0.0022) model time 0.3967 (0.4211) loss 5.8407 (6.8334) grad_norm 2.7334 (3.1946) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
06:50:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][390/625] eta 0:01:38 lr 0.000333 wd 0.0500 time 0.3961 (0.4203) data time 0.0006 (0.0021) model time 0.3955 (0.4203) loss 7.1779 (6.8301) grad_norm 2.5798 (3.1788) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:50:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][400/625] eta 0:01:34 lr 0.000333 wd 0.0500 time 0.3983 (0.4198) data time 0.0007 (0.0021) model time 0.3976 (0.4198) loss 5.9108 (6.8278) grad_norm 2.7556 (3.1700) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:50:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][410/625] eta 0:01:30 lr 0.000332 wd 0.0500 time 0.3993 (0.4193) data time 0.0007 (0.0021) model time 0.3986 (0.4192) loss 7.1430 (6.8337) grad_norm 4.2883 (3.1566) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:50:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][420/625] eta 0:01:25 lr 0.000332 wd 0.0500 time 0.3938 (0.4189) data time 0.0008 (0.0020) model time 0.3930 (0.4187) loss 6.0581 (6.8387) grad_norm 2.4659 (3.1519) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:51:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][430/625] eta 0:01:21 lr 0.000332 wd 0.0500 time 0.3991 (0.4184) data time 0.0009 (0.0020) model time 0.3981 (0.4182) loss 7.1421 (6.8361) grad_norm 1.9331 (3.1378) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:51:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][440/625] eta 0:01:17 lr 0.000332 wd 0.0500 time 0.4026 (0.4180) data time 0.0007 (0.0020) model time 0.4019 (0.4176) loss 7.6675 (6.8296) grad_norm 2.6464 (3.1247) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:51:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][450/625] eta 0:01:13 lr 0.000332 wd 0.0500 time 0.4000 (0.4179) data 
time 0.0009 (0.0020) model time 0.3991 (0.4175) loss 7.5210 (6.8286) grad_norm 2.7840 (3.1245) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:51:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][460/625] eta 0:01:08 lr 0.000332 wd 0.0500 time 0.4011 (0.4175) data time 0.0008 (0.0019) model time 0.4002 (0.4171) loss 6.0089 (6.8237) grad_norm 3.8525 (3.1166) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:51:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][470/625] eta 0:01:04 lr 0.000332 wd 0.0500 time 0.3954 (0.4172) data time 0.0008 (0.0019) model time 0.3947 (0.4167) loss 5.5639 (6.8196) grad_norm 2.4819 (3.1145) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:51:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][480/625] eta 0:01:00 lr 0.000332 wd 0.0500 time 0.3949 (0.4168) data time 0.0008 (0.0019) model time 0.3941 (0.4163) loss 6.1912 (6.8144) grad_norm 2.8330 (3.1029) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:51:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][490/625] eta 0:00:56 lr 0.000332 wd 0.0500 time 0.3981 (0.4164) data time 0.0009 (0.0019) model time 0.3972 (0.4158) loss 5.6691 (6.8073) grad_norm 1.6763 (3.0964) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:51:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][500/625] eta 0:00:52 lr 0.000332 wd 0.0500 time 0.3982 (0.4161) data time 0.0007 (0.0019) model time 0.3976 (0.4155) loss 6.6920 (6.8083) grad_norm 2.3186 (3.0944) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:51:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][510/625] eta 0:00:47 lr 0.000331 wd 0.0500 time 0.3995 (0.4158) data time 0.0009 (0.0018) model time 0.3987 (0.4152) loss 6.1640 (6.8103) grad_norm 7.7903 (3.0988) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
06:51:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][520/625] eta 0:00:43 lr 0.000331 wd 0.0500 time 0.4042 (0.4156) data time 0.0009 (0.0018) model time 0.4034 (0.4149) loss 6.5413 (6.8017) grad_norm 2.3627 (3.1105) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:51:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][530/625] eta 0:00:39 lr 0.000331 wd 0.0500 time 0.4194 (0.4154) data time 0.0008 (0.0018) model time 0.4186 (0.4147) loss 7.4779 (6.7948) grad_norm 3.2727 (3.1165) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:51:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][540/625] eta 0:00:35 lr 0.000331 wd 0.0500 time 0.5910 (0.4157) data time 0.0008 (0.0018) model time 0.5902 (0.4150) loss 8.1155 (6.7972) grad_norm 2.7773 (3.2057) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:51:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][550/625] eta 0:00:31 lr 0.000331 wd 0.0500 time 0.4068 (0.4160) data time 0.0010 (0.0018) model time 0.4058 (0.4153) loss 7.1219 (6.8015) grad_norm 3.2057 (3.1897) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:51:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][560/625] eta 0:00:27 lr 0.000331 wd 0.0500 time 0.3925 (0.4178) data time 0.0007 (0.0018) model time 0.3918 (0.4173) loss 8.1023 (6.8013) grad_norm 3.7244 (3.1867) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:52:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][570/625] eta 0:00:23 lr 0.000331 wd 0.0500 time 0.4036 (0.4184) data time 0.0007 (0.0018) model time 0.4030 (0.4179) loss 7.4773 (6.7992) grad_norm 3.1533 (3.1903) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:52:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][580/625] eta 0:00:18 lr 0.000331 wd 0.0500 time 0.3955 (0.4198) data 
time 0.0007 (0.0018) model time 0.3948 (0.4195) loss 5.6182 (6.7981) grad_norm 1.8854 (3.1754) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:52:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][590/625] eta 0:00:14 lr 0.000331 wd 0.0500 time 0.4068 (0.4198) data time 0.0008 (0.0018) model time 0.4060 (0.4194) loss 7.6486 (6.8011) grad_norm 3.2947 (3.1775) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:52:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][600/625] eta 0:00:10 lr 0.000331 wd 0.0500 time 0.4098 (0.4195) data time 0.0007 (0.0018) model time 0.4091 (0.4191) loss 5.6752 (6.8072) grad_norm 14.7405 (3.1953) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:52:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][610/625] eta 0:00:06 lr 0.000331 wd 0.0500 time 0.3965 (0.4192) data time 0.0006 (0.0018) model time 0.3960 (0.4187) loss 5.3876 (6.8089) grad_norm 2.5081 (3.1904) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:52:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][620/625] eta 0:00:02 lr 0.000330 wd 0.0500 time 0.4011 (0.4189) data time 0.0006 (0.0017) model time 0.4005 (0.4184) loss 7.4522 (6.8135) grad_norm 5.1588 (3.2016) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:52:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 202 training takes 0:04:21 [2024-07-25 06:52:23 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 06:52:24 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 06:52:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.457 (0.457) Loss 0.5693 (0.5693) Acc@1 89.648 (89.648) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 06:52:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.8857 (0.6995) Acc@1 80.908 (86.341) Acc@5 96.240 (97.692) Mem 14939MB [2024-07-25 06:52:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 1.0059 (0.8173) Acc@1 76.709 (83.215) Acc@5 94.824 (96.482) Mem 14939MB [2024-07-25 06:52:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.784 Acc@5 96.453 [2024-07-25 06:52:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-07-25 06:52:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.857 (0.857) Loss 0.5469 (0.5469) Acc@1 90.039 (90.039) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 06:52:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.158) Loss 0.8511 (0.6762) Acc@1 81.299 (86.594) Acc@5 96.338 (97.781) Mem 14939MB [2024-07-25 06:52:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 0.9800 (0.7897) Acc@1 76.904 (83.482) Acc@5 95.459 (96.722) Mem 14939MB [2024-07-25 06:52:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.063 Acc@5 96.665 [2024-07-25 06:52:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-07-25 06:52:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.06% [2024-07-25 06:52:30 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 06:52:31 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 06:52:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][0/625] eta 0:07:52 lr 0.000330 wd 0.0500 time 0.7560 (0.7560) data time 0.3786 (0.3786) model time 0.0000 (0.0000) loss 4.9357 (4.9357) grad_norm 2.0200 (2.0200) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:52:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][10/625] eta 0:04:25 lr 0.000330 wd 0.0500 time 0.4004 (0.4311) data time 0.0009 (0.0353) model time 0.0000 (0.0000) loss 7.3509 (6.6054) grad_norm 2.1604 (2.6334) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:52:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][20/625] eta 0:04:12 lr 0.000330 wd 0.0500 time 0.3945 (0.4171) data time 0.0008 (0.0190) model time 0.0000 (0.0000) loss 6.0975 (6.7389) grad_norm 2.2182 (2.7348) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:52:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][30/625] eta 0:04:04 lr 0.000330 wd 0.0500 time 0.3937 (0.4110) data time 0.0009 (0.0131) model time 0.0000 (0.0000) loss 7.0611 (6.8091) grad_norm 3.1185 (2.9239) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:52:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][40/625] eta 0:03:59 lr 0.000330 wd 0.0500 time 0.4218 (0.4093) data time 0.0008 (0.0102) model time 0.0000 (0.0000) loss 6.5764 (6.8428) grad_norm 2.7933 (3.0251) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:52:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][50/625] eta 0:03:54 lr 0.000330 wd 0.0500 time 0.3939 (0.4076) data time 0.0008 (0.0084) model time 0.0000 (0.0000) loss 5.5574 (6.8628) grad_norm 3.0988 (2.9746) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 06:52:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][60/625] eta 0:03:49 lr 0.000330 wd 0.0500 time 0.3969 (0.4060) data time 0.0006 (0.0072) model time 0.3962 (0.3968) loss 7.5525 (6.8864) grad_norm 3.5274 (3.0732) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:53:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][70/625] eta 0:03:45 lr 0.000330 wd 0.0500 time 0.4121 (0.4058) data time 0.0008 (0.0063) model time 0.4113 (0.4001) loss 8.0921 (6.8946) grad_norm 2.0258 (3.0728) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:53:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][80/625] eta 0:03:40 lr 0.000330 wd 0.0500 time 0.3942 (0.4048) data time 0.0009 (0.0056) model time 0.3933 (0.3990) loss 7.3420 (6.8658) grad_norm 4.1969 (3.0436) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:53:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][90/625] eta 0:03:36 lr 0.000330 wd 0.0500 time 0.3986 (0.4044) data time 0.0009 (0.0051) model time 0.3977 (0.3992) loss 6.9472 (6.8805) grad_norm 2.2279 (3.0299) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:53:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][100/625] eta 0:03:32 lr 0.000329 wd 0.0500 time 0.4089 (0.4040) data time 0.0007 (0.0047) model time 0.4082 (0.3994) loss 6.7231 (6.8917) grad_norm 4.0415 (3.5238) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:53:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][110/625] eta 0:03:27 lr 0.000329 wd 0.0500 time 0.4004 (0.4037) data time 0.0006 (0.0044) model time 0.3997 (0.3994) loss 7.7689 (6.8787) grad_norm 2.3155 (3.4689) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:53:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][120/625] eta 0:03:23 lr 0.000329 wd 0.0500 time 
0.3988 (0.4034) data time 0.0009 (0.0041) model time 0.3979 (0.3993) loss 7.3151 (6.8597) grad_norm 3.1009 (3.4376) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:53:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][130/625] eta 0:03:20 lr 0.000329 wd 0.0500 time 0.4215 (0.4042) data time 0.0007 (0.0040) model time 0.4208 (0.4008) loss 7.4316 (6.8706) grad_norm 2.4132 (3.3829) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:53:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][140/625] eta 0:03:17 lr 0.000329 wd 0.0500 time 0.4037 (0.4064) data time 0.0007 (0.0038) model time 0.4031 (0.4045) loss 5.8458 (6.8451) grad_norm 2.1867 (3.3775) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:53:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][150/625] eta 0:03:15 lr 0.000329 wd 0.0500 time 0.6152 (0.4118) data time 0.0009 (0.0037) model time 0.6143 (0.4126) loss 7.6344 (6.8611) grad_norm 1.9121 (3.3416) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:53:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][160/625] eta 0:03:13 lr 0.000329 wd 0.0500 time 0.4003 (0.4152) data time 0.0007 (0.0035) model time 0.3996 (0.4175) loss 7.2369 (6.8758) grad_norm 3.4263 (3.3438) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:53:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][170/625] eta 0:03:10 lr 0.000329 wd 0.0500 time 0.5555 (0.4182) data time 0.0009 (0.0034) model time 0.5546 (0.4214) loss 6.9308 (6.8662) grad_norm 2.9295 (3.3222) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:53:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][180/625] eta 0:03:07 lr 0.000329 wd 0.0500 time 0.4017 (0.4219) data time 0.0008 (0.0033) model time 0.4009 (0.4263) loss 6.9352 (6.8666) grad_norm 2.0923 (3.2903) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 06:53:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][190/625] eta 0:03:03 lr 0.000329 wd 0.0500 time 0.3946 (0.4213) data time 0.0007 (0.0031) model time 0.3939 (0.4250) loss 7.1217 (6.8543) grad_norm 2.6975 (3.2730) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:53:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][200/625] eta 0:02:58 lr 0.000329 wd 0.0500 time 0.4008 (0.4202) data time 0.0008 (0.0030) model time 0.4000 (0.4232) loss 7.6465 (6.8783) grad_norm 2.4072 (3.2491) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:53:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][210/625] eta 0:02:53 lr 0.000328 wd 0.0500 time 0.3956 (0.4192) data time 0.0009 (0.0029) model time 0.3947 (0.4217) loss 7.1634 (6.8892) grad_norm 2.3733 (3.2183) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:54:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][220/625] eta 0:02:49 lr 0.000328 wd 0.0500 time 0.4004 (0.4194) data time 0.0008 (0.0028) model time 0.3996 (0.4218) loss 6.9996 (6.8950) grad_norm 1.9584 (3.1947) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:54:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][230/625] eta 0:02:45 lr 0.000328 wd 0.0500 time 0.3964 (0.4186) data time 0.0006 (0.0028) model time 0.3959 (0.4205) loss 5.9695 (6.9029) grad_norm 2.2644 (3.1571) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:54:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][240/625] eta 0:02:40 lr 0.000328 wd 0.0500 time 0.3981 (0.4177) data time 0.0006 (0.0027) model time 0.3975 (0.4193) loss 7.4764 (6.9011) grad_norm 2.2616 (3.1278) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:54:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][250/625] eta 0:02:36 lr 0.000328 wd 0.0500 time 
0.3963 (0.4170) data time 0.0009 (0.0026) model time 0.3954 (0.4182) loss 6.5799 (6.9098) grad_norm 3.8409 (3.1064) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:54:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][260/625] eta 0:02:31 lr 0.000328 wd 0.0500 time 0.3982 (0.4163) data time 0.0009 (0.0026) model time 0.3974 (0.4172) loss 6.3915 (6.9103) grad_norm 2.4205 (3.0960) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:54:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][270/625] eta 0:02:27 lr 0.000328 wd 0.0500 time 0.3990 (0.4156) data time 0.0008 (0.0025) model time 0.3981 (0.4164) loss 6.9950 (6.9084) grad_norm 4.1157 (3.0839) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:54:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][280/625] eta 0:02:23 lr 0.000328 wd 0.0500 time 0.3982 (0.4150) data time 0.0006 (0.0024) model time 0.3976 (0.4155) loss 5.7511 (6.9059) grad_norm 2.4295 (3.0907) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:54:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][290/625] eta 0:02:18 lr 0.000328 wd 0.0500 time 0.3982 (0.4145) data time 0.0006 (0.0024) model time 0.3976 (0.4148) loss 7.8034 (6.9109) grad_norm 3.2192 (3.0808) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:54:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][300/625] eta 0:02:14 lr 0.000328 wd 0.0500 time 0.3981 (0.4140) data time 0.0007 (0.0023) model time 0.3974 (0.4142) loss 6.8982 (6.9037) grad_norm 8.2707 (3.0814) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:54:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][310/625] eta 0:02:10 lr 0.000327 wd 0.0500 time 0.3983 (0.4136) data time 0.0007 (0.0023) model time 0.3977 (0.4136) loss 7.2383 (6.9043) grad_norm 2.2879 (3.0947) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 06:54:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][320/625] eta 0:02:06 lr 0.000327 wd 0.0500 time 0.3996 (0.4132) data time 0.0006 (0.0022) model time 0.3990 (0.4132) loss 6.4864 (6.9111) grad_norm 1.9204 (3.0748) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:54:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][330/625] eta 0:02:01 lr 0.000327 wd 0.0500 time 0.3990 (0.4128) data time 0.0008 (0.0022) model time 0.3982 (0.4127) loss 7.1095 (6.9105) grad_norm 2.0563 (3.0763) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:54:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][340/625] eta 0:01:57 lr 0.000327 wd 0.0500 time 0.3957 (0.4125) data time 0.0007 (0.0022) model time 0.3951 (0.4122) loss 6.7660 (6.9189) grad_norm 2.5941 (3.0742) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:54:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][350/625] eta 0:01:53 lr 0.000327 wd 0.0500 time 0.3949 (0.4121) data time 0.0009 (0.0021) model time 0.3940 (0.4118) loss 7.4271 (6.9038) grad_norm 2.3448 (3.0521) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:55:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][360/625] eta 0:01:49 lr 0.000327 wd 0.0500 time 0.3988 (0.4126) data time 0.0009 (0.0021) model time 0.3979 (0.4124) loss 7.7574 (6.9077) grad_norm 2.2950 (3.0486) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:55:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][370/625] eta 0:01:45 lr 0.000327 wd 0.0500 time 0.5827 (0.4143) data time 0.0009 (0.0021) model time 0.5818 (0.4143) loss 6.5582 (6.9115) grad_norm 2.6707 (3.0634) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:55:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][380/625] eta 0:01:42 lr 0.000327 wd 0.0500 time 
0.5907 (0.4164) data time 0.0008 (0.0020) model time 0.5898 (0.4167) loss 7.5695 (6.9108) grad_norm 2.0907 (3.0627) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:55:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][390/625] eta 0:01:38 lr 0.000327 wd 0.0500 time 0.5773 (0.4177) data time 0.0006 (0.0021) model time 0.5767 (0.4180) loss 5.5228 (6.9063) grad_norm 2.1318 (3.0515) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:55:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][400/625] eta 0:01:34 lr 0.000327 wd 0.0500 time 0.3960 (0.4193) data time 0.0007 (0.0021) model time 0.3953 (0.4198) loss 7.5047 (6.9150) grad_norm 4.7659 (3.0586) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:55:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][410/625] eta 0:01:30 lr 0.000327 wd 0.0500 time 0.4317 (0.4193) data time 0.0006 (0.0020) model time 0.4310 (0.4198) loss 6.9321 (6.9081) grad_norm 2.4123 (3.0593) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:55:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][420/625] eta 0:01:25 lr 0.000326 wd 0.0500 time 0.3998 (0.4189) data time 0.0006 (0.0020) model time 0.3991 (0.4193) loss 5.8696 (6.9086) grad_norm 2.9275 (3.0980) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:55:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][430/625] eta 0:01:21 lr 0.000326 wd 0.0500 time 0.3979 (0.4184) data time 0.0007 (0.0020) model time 0.3972 (0.4187) loss 7.8032 (6.9081) grad_norm 4.9572 (3.1082) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:55:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][440/625] eta 0:01:17 lr 0.000326 wd 0.0500 time 0.3989 (0.4184) data time 0.0009 (0.0020) model time 0.3980 (0.4186) loss 6.8657 (6.9012) grad_norm 1.8322 (3.1179) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 06:55:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][450/625] eta 0:01:13 lr 0.000326 wd 0.0500 time 0.4042 (0.4180) data time 0.0008 (0.0019) model time 0.4034 (0.4182) loss 7.3975 (6.9039) grad_norm 3.3503 (3.1164) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:55:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][460/625] eta 0:01:08 lr 0.000326 wd 0.0500 time 0.4033 (0.4176) data time 0.0007 (0.0019) model time 0.4026 (0.4177) loss 5.6753 (6.9084) grad_norm 2.2317 (3.1159) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:55:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][470/625] eta 0:01:04 lr 0.000326 wd 0.0500 time 0.3948 (0.4172) data time 0.0007 (0.0019) model time 0.3941 (0.4172) loss 6.6405 (6.9157) grad_norm 3.0233 (3.1259) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:55:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][480/625] eta 0:01:00 lr 0.000326 wd 0.0500 time 0.4071 (0.4168) data time 0.0007 (0.0019) model time 0.4063 (0.4167) loss 7.6217 (6.9203) grad_norm 1.9725 (3.1133) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:55:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][490/625] eta 0:00:56 lr 0.000326 wd 0.0500 time 0.3967 (0.4164) data time 0.0009 (0.0019) model time 0.3959 (0.4163) loss 7.0816 (6.9138) grad_norm 3.0378 (3.1009) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:55:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][500/625] eta 0:00:51 lr 0.000326 wd 0.0500 time 0.3995 (0.4160) data time 0.0008 (0.0018) model time 0.3987 (0.4159) loss 7.6814 (6.9115) grad_norm 3.0705 (3.0913) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:56:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][510/625] eta 0:00:47 lr 0.000326 wd 0.0500 time 
0.3985 (0.4157) data time 0.0006 (0.0018) model time 0.3979 (0.4155) loss 6.3575 (6.9124) grad_norm 2.3275 (3.0921) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:56:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][520/625] eta 0:00:43 lr 0.000326 wd 0.0500 time 0.3945 (0.4154) data time 0.0009 (0.0018) model time 0.3936 (0.4152) loss 6.3117 (6.9087) grad_norm 3.0237 (3.0895) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:56:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][530/625] eta 0:00:39 lr 0.000325 wd 0.0500 time 0.3966 (0.4151) data time 0.0009 (0.0018) model time 0.3957 (0.4148) loss 6.9798 (6.9131) grad_norm 2.5592 (3.1084) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:56:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][540/625] eta 0:00:35 lr 0.000325 wd 0.0500 time 0.3975 (0.4148) data time 0.0008 (0.0018) model time 0.3967 (0.4145) loss 7.9225 (6.9201) grad_norm 2.8232 (3.1251) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:56:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][550/625] eta 0:00:31 lr 0.000325 wd 0.0500 time 0.3946 (0.4145) data time 0.0006 (0.0017) model time 0.3939 (0.4141) loss 6.8938 (6.9213) grad_norm 3.1233 (3.1360) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:56:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][560/625] eta 0:00:26 lr 0.000325 wd 0.0500 time 0.3980 (0.4142) data time 0.0008 (0.0017) model time 0.3972 (0.4138) loss 5.9750 (6.9247) grad_norm 2.7816 (3.1525) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:56:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][570/625] eta 0:00:22 lr 0.000325 wd 0.0500 time 0.3988 (0.4139) data time 0.0008 (0.0017) model time 0.3980 (0.4135) loss 7.7333 (6.9210) grad_norm 3.7063 (3.1415) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 06:56:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][580/625] eta 0:00:18 lr 0.000325 wd 0.0500 time 0.5624 (0.4143) data time 0.0009 (0.0017) model time 0.5615 (0.4138) loss 7.0168 (6.9151) grad_norm 2.1199 (3.1334) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:56:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][590/625] eta 0:00:14 lr 0.000325 wd 0.0500 time 0.5869 (0.4152) data time 0.0007 (0.0017) model time 0.5863 (0.4148) loss 6.7477 (6.9111) grad_norm 3.2098 (3.1270) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:56:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][600/625] eta 0:00:10 lr 0.000325 wd 0.0500 time 0.5618 (0.4164) data time 0.0009 (0.0017) model time 0.5610 (0.4161) loss 6.4184 (6.9113) grad_norm 3.5106 (3.1356) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:56:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][610/625] eta 0:00:06 lr 0.000325 wd 0.0500 time 0.5654 (0.4174) data time 0.0006 (0.0017) model time 0.5648 (0.4173) loss 7.0778 (6.9130) grad_norm 2.2951 (3.1460) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:56:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][620/625] eta 0:00:02 lr 0.000325 wd 0.0500 time 0.3945 (0.4177) data time 0.0004 (0.0017) model time 0.3940 (0.4175) loss 6.6562 (6.9130) grad_norm 4.0305 (3.1450) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:56:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 203 training takes 0:04:21 [2024-07-25 06:56:52 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 06:56:53 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 06:56:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.472 (0.472) Loss 0.5396 (0.5396) Acc@1 90.283 (90.283) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 06:56:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.8647 (0.6788) Acc@1 80.420 (86.386) Acc@5 96.582 (97.758) Mem 14939MB [2024-07-25 06:56:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9810 (0.7982) Acc@1 77.051 (83.175) Acc@5 94.727 (96.545) Mem 14939MB [2024-07-25 06:56:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.817 Acc@5 96.529 [2024-07-25 06:56:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-07-25 06:56:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.82% [2024-07-25 06:56:56 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 06:56:56 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 06:56:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.456 (0.456) Loss 0.5464 (0.5464) Acc@1 89.990 (89.990) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 06:56:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8491 (0.6758) Acc@1 81.494 (86.617) Acc@5 96.289 (97.785) Mem 14939MB [2024-07-25 06:56:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9785 (0.7890) Acc@1 76.855 (83.503) Acc@5 95.410 (96.729) Mem 14939MB [2024-07-25 06:56:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.077 Acc@5 96.677 [2024-07-25 06:56:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-07-25 06:56:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.08% [2024-07-25 06:56:59 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 06:57:00 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 06:57:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][0/625] eta 0:08:11 lr 0.000325 wd 0.0500 time 0.7858 (0.7858) data time 0.3970 (0.3970) model time 0.0000 (0.0000) loss 7.8040 (7.8040) grad_norm 4.3038 (4.3038) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:57:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][10/625] eta 0:04:26 lr 0.000324 wd 0.0500 time 0.3933 (0.4337) data time 0.0008 (0.0370) model time 0.0000 (0.0000) loss 8.3873 (7.1026) grad_norm 4.0675 (3.8951) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:57:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][20/625] eta 0:04:13 lr 0.000324 wd 0.0500 time 0.4094 (0.4185) data time 0.0011 (0.0199) model time 0.0000 (0.0000) loss 5.7326 (6.8531) grad_norm 1.8382 (3.5104) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:57:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][30/625] eta 0:04:05 lr 0.000324 wd 0.0500 time 0.4142 (0.4120) data time 0.0006 (0.0138) model time 0.0000 (0.0000) loss 5.3980 (6.7438) grad_norm 4.2305 (3.3328) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:57:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][40/625] eta 0:03:58 lr 0.000324 wd 0.0500 time 0.3952 (0.4084) data time 0.0008 (0.0107) model time 0.0000 (0.0000) loss 5.7205 (6.7943) grad_norm 1.9661 (3.1651) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:57:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][50/625] eta 0:03:53 lr 0.000324 wd 0.0500 time 0.3967 (0.4061) data time 0.0007 (0.0088) model time 0.0000 (0.0000) loss 6.8365 (6.7730) grad_norm 3.7380 (3.1835) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:57:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][60/625] eta 0:03:48 lr 0.000324 wd 0.0500 time 0.3964 (0.4048) 
data time 0.0008 (0.0075) model time 0.3957 (0.3969) loss 7.2937 (6.8370) grad_norm 3.1084 (3.2539) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:57:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][70/625] eta 0:03:44 lr 0.000324 wd 0.0500 time 0.3939 (0.4038) data time 0.0008 (0.0065) model time 0.3931 (0.3969) loss 7.4721 (6.8405) grad_norm 4.8689 (3.3137) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:57:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][80/625] eta 0:03:39 lr 0.000324 wd 0.0500 time 0.4001 (0.4031) data time 0.0006 (0.0058) model time 0.3994 (0.3971) loss 7.6527 (6.8337) grad_norm 3.4554 (3.3048) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:57:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][90/625] eta 0:03:35 lr 0.000324 wd 0.0500 time 0.3981 (0.4026) data time 0.0010 (0.0053) model time 0.3971 (0.3971) loss 6.5772 (6.8115) grad_norm 1.8909 (3.2407) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:57:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][100/625] eta 0:03:31 lr 0.000324 wd 0.0500 time 0.3951 (0.4020) data time 0.0008 (0.0049) model time 0.3943 (0.3969) loss 6.9848 (6.7895) grad_norm 2.6611 (3.2002) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:57:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][110/625] eta 0:03:26 lr 0.000323 wd 0.0500 time 0.3994 (0.4018) data time 0.0006 (0.0045) model time 0.3988 (0.3973) loss 5.8338 (6.7722) grad_norm 3.0768 (3.2319) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:57:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][120/625] eta 0:03:22 lr 0.000323 wd 0.0500 time 0.3995 (0.4014) data time 0.0006 (0.0042) model time 0.3989 (0.3971) loss 7.8124 (6.7730) grad_norm 2.3841 (3.2008) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
06:57:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][130/625] eta 0:03:18 lr 0.000323 wd 0.0500 time 0.3965 (0.4013) data time 0.0006 (0.0040) model time 0.3959 (0.3972) loss 6.2526 (6.7905) grad_norm 2.3764 (3.1499) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:57:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][140/625] eta 0:03:14 lr 0.000323 wd 0.0500 time 0.4018 (0.4010) data time 0.0007 (0.0037) model time 0.4011 (0.3973) loss 5.9903 (6.7811) grad_norm 2.2623 (3.0691) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:58:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][150/625] eta 0:03:10 lr 0.000323 wd 0.0500 time 0.4008 (0.4010) data time 0.0006 (0.0036) model time 0.4002 (0.3974) loss 5.1191 (6.7840) grad_norm 3.3971 (3.0448) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:58:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][160/625] eta 0:03:06 lr 0.000323 wd 0.0500 time 0.3969 (0.4009) data time 0.0008 (0.0034) model time 0.3961 (0.3976) loss 7.3235 (6.7808) grad_norm 2.0602 (3.0710) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:58:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][170/625] eta 0:03:02 lr 0.000323 wd 0.0500 time 0.3991 (0.4019) data time 0.0008 (0.0032) model time 0.3984 (0.3991) loss 6.3185 (6.7815) grad_norm 2.8940 (3.0576) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:58:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][180/625] eta 0:03:00 lr 0.000323 wd 0.0500 time 0.3965 (0.4056) data time 0.0006 (0.0031) model time 0.3959 (0.4046) loss 5.2150 (6.7757) grad_norm 2.9603 (3.0256) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:58:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][190/625] eta 0:02:58 lr 0.000323 wd 0.0500 time 0.4030 (0.4112) data 
time 0.0006 (0.0030) model time 0.4023 (0.4121) loss 6.8057 (6.7857) grad_norm 2.1853 (3.0115) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:58:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][200/625] eta 0:02:55 lr 0.000323 wd 0.0500 time 0.3992 (0.4141) data time 0.0006 (0.0029) model time 0.3986 (0.4159) loss 5.3858 (6.7987) grad_norm 4.2785 (3.0104) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:58:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][210/625] eta 0:02:53 lr 0.000323 wd 0.0500 time 0.5902 (0.4181) data time 0.0009 (0.0028) model time 0.5893 (0.4210) loss 8.2268 (6.8030) grad_norm 3.5603 (2.9880) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:58:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][220/625] eta 0:02:49 lr 0.000322 wd 0.0500 time 0.3970 (0.4193) data time 0.0007 (0.0027) model time 0.3962 (0.4224) loss 5.4162 (6.7932) grad_norm 2.3081 (2.9673) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:58:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][230/625] eta 0:02:45 lr 0.000322 wd 0.0500 time 0.3978 (0.4185) data time 0.0006 (0.0026) model time 0.3971 (0.4212) loss 6.3400 (6.7914) grad_norm 3.4907 (2.9549) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:58:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][240/625] eta 0:02:40 lr 0.000322 wd 0.0500 time 0.3998 (0.4178) data time 0.0006 (0.0026) model time 0.3992 (0.4200) loss 6.8107 (6.7985) grad_norm 2.5490 (2.9341) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:58:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][250/625] eta 0:02:36 lr 0.000322 wd 0.0500 time 0.3982 (0.4170) data time 0.0009 (0.0025) model time 0.3973 (0.4189) loss 5.3710 (6.7937) grad_norm 1.8725 (2.9336) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
06:58:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][260/625] eta 0:02:31 lr 0.000322 wd 0.0500 time 0.3968 (0.4163) data time 0.0006 (0.0024) model time 0.3962 (0.4179) loss 5.8378 (6.7882) grad_norm 2.4618 (2.9557) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:58:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][270/625] eta 0:02:27 lr 0.000322 wd 0.0500 time 0.3974 (0.4157) data time 0.0008 (0.0024) model time 0.3965 (0.4170) loss 6.0483 (6.7829) grad_norm 2.9484 (2.9642) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:58:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][280/625] eta 0:02:23 lr 0.000322 wd 0.0500 time 0.4044 (0.4151) data time 0.0006 (0.0023) model time 0.4038 (0.4162) loss 6.7613 (6.7893) grad_norm 3.0473 (2.9609) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:59:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][290/625] eta 0:02:18 lr 0.000322 wd 0.0500 time 0.3995 (0.4145) data time 0.0006 (0.0023) model time 0.3988 (0.4154) loss 7.8601 (6.7976) grad_norm 2.7165 (2.9750) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 06:59:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][300/625] eta 0:02:14 lr 0.000322 wd 0.0500 time 0.3968 (0.4141) data time 0.0007 (0.0022) model time 0.3961 (0.4148) loss 6.0466 (6.7961) grad_norm 2.9644 (2.9645) loss_scale 512.0000 (262.8040) mem 14939MB [2024-07-25 06:59:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][310/625] eta 0:02:10 lr 0.000322 wd 0.0500 time 0.4006 (0.4136) data time 0.0008 (0.0022) model time 0.3998 (0.4142) loss 6.0145 (6.7992) grad_norm 2.0494 (2.9650) loss_scale 512.0000 (270.8167) mem 14939MB [2024-07-25 06:59:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][320/625] eta 0:02:06 lr 0.000322 wd 0.0500 time 0.4050 (0.4132) data 
time 0.0008 (0.0021) model time 0.4042 (0.4137) loss 7.2123 (6.7989) grad_norm 1.7146 (2.9418) loss_scale 512.0000 (278.3302) mem 14939MB [2024-07-25 06:59:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][330/625] eta 0:02:01 lr 0.000321 wd 0.0500 time 0.4167 (0.4128) data time 0.0006 (0.0021) model time 0.4160 (0.4131) loss 7.5612 (6.8105) grad_norm 2.0180 (2.9440) loss_scale 512.0000 (285.3897) mem 14939MB [2024-07-25 06:59:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][340/625] eta 0:01:57 lr 0.000321 wd 0.0500 time 0.3992 (0.4124) data time 0.0006 (0.0021) model time 0.3986 (0.4127) loss 7.0037 (6.8182) grad_norm 3.5102 (2.9435) loss_scale 512.0000 (292.0352) mem 14939MB [2024-07-25 06:59:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][350/625] eta 0:01:53 lr 0.000321 wd 0.0500 time 0.3995 (0.4121) data time 0.0009 (0.0020) model time 0.3986 (0.4122) loss 6.4755 (6.8154) grad_norm 1.9433 (2.9485) loss_scale 512.0000 (298.3020) mem 14939MB [2024-07-25 06:59:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][360/625] eta 0:01:49 lr 0.000321 wd 0.0500 time 0.3984 (0.4117) data time 0.0009 (0.0020) model time 0.3975 (0.4118) loss 6.1065 (6.8206) grad_norm 2.4875 (2.9358) loss_scale 512.0000 (304.2216) mem 14939MB [2024-07-25 06:59:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][370/625] eta 0:01:44 lr 0.000321 wd 0.0500 time 0.4078 (0.4114) data time 0.0008 (0.0020) model time 0.4070 (0.4114) loss 7.3429 (6.8102) grad_norm 2.4634 (inf) loss_scale 256.0000 (308.4420) mem 14939MB [2024-07-25 06:59:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][380/625] eta 0:01:40 lr 0.000321 wd 0.0500 time 0.3962 (0.4111) data time 0.0008 (0.0019) model time 0.3953 (0.4109) loss 6.0751 (6.8189) grad_norm 2.4184 (inf) loss_scale 256.0000 (307.0656) mem 14939MB [2024-07-25 06:59:41 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][390/625] eta 0:01:36 lr 0.000321 wd 0.0500 time 0.4022 (0.4113) data time 0.0008 (0.0019) model time 0.4013 (0.4111) loss 6.6512 (6.8124) grad_norm 2.9439 (inf) loss_scale 256.0000 (305.7596) mem 14939MB [2024-07-25 06:59:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][400/625] eta 0:01:32 lr 0.000321 wd 0.0500 time 0.5086 (0.4122) data time 0.0006 (0.0019) model time 0.5079 (0.4122) loss 6.9101 (6.8143) grad_norm 4.0566 (inf) loss_scale 256.0000 (304.5187) mem 14939MB [2024-07-25 06:59:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][410/625] eta 0:01:28 lr 0.000321 wd 0.0500 time 0.3988 (0.4131) data time 0.0009 (0.0019) model time 0.3979 (0.4132) loss 5.8122 (6.8047) grad_norm 2.3340 (inf) loss_scale 256.0000 (303.3382) mem 14939MB [2024-07-25 06:59:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][420/625] eta 0:01:24 lr 0.000321 wd 0.0500 time 0.3956 (0.4145) data time 0.0009 (0.0018) model time 0.3947 (0.4148) loss 5.8345 (6.7967) grad_norm 2.0021 (inf) loss_scale 256.0000 (302.2138) mem 14939MB [2024-07-25 06:59:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][430/625] eta 0:01:21 lr 0.000320 wd 0.0500 time 0.3974 (0.4162) data time 0.0008 (0.0018) model time 0.3965 (0.4166) loss 6.3531 (6.7902) grad_norm 2.6535 (inf) loss_scale 256.0000 (301.1415) mem 14939MB [2024-07-25 07:00:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][440/625] eta 0:01:17 lr 0.000320 wd 0.0500 time 0.3989 (0.4164) data time 0.0009 (0.0018) model time 0.3981 (0.4168) loss 7.4746 (6.7923) grad_norm 2.2849 (inf) loss_scale 256.0000 (300.1179) mem 14939MB [2024-07-25 07:00:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][450/625] eta 0:01:12 lr 0.000320 wd 0.0500 time 0.4033 (0.4161) data time 0.0007 (0.0018) model 
time 0.4026 (0.4165) loss 6.6927 (6.7924) grad_norm 3.6253 (inf) loss_scale 256.0000 (299.1397) mem 14939MB [2024-07-25 07:00:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][460/625] eta 0:01:08 lr 0.000320 wd 0.0500 time 0.3994 (0.4158) data time 0.0006 (0.0018) model time 0.3987 (0.4161) loss 6.4051 (6.7934) grad_norm 4.8363 (inf) loss_scale 256.0000 (298.2039) mem 14939MB [2024-07-25 07:00:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][470/625] eta 0:01:04 lr 0.000320 wd 0.0500 time 0.3999 (0.4154) data time 0.0006 (0.0018) model time 0.3992 (0.4156) loss 6.8774 (6.7972) grad_norm 2.1425 (inf) loss_scale 256.0000 (297.3079) mem 14939MB [2024-07-25 07:00:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][480/625] eta 0:01:00 lr 0.000320 wd 0.0500 time 0.3998 (0.4151) data time 0.0010 (0.0017) model time 0.3988 (0.4152) loss 5.7627 (6.7949) grad_norm 2.4342 (inf) loss_scale 256.0000 (296.4491) mem 14939MB [2024-07-25 07:00:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][490/625] eta 0:00:56 lr 0.000320 wd 0.0500 time 0.4041 (0.4149) data time 0.0007 (0.0017) model time 0.4034 (0.4150) loss 6.8569 (6.8000) grad_norm 3.8565 (inf) loss_scale 256.0000 (295.6253) mem 14939MB [2024-07-25 07:00:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][500/625] eta 0:00:51 lr 0.000320 wd 0.0500 time 0.4099 (0.4146) data time 0.0007 (0.0017) model time 0.4093 (0.4146) loss 6.1691 (6.8005) grad_norm 2.0795 (inf) loss_scale 256.0000 (294.8343) mem 14939MB [2024-07-25 07:00:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][510/625] eta 0:00:47 lr 0.000320 wd 0.0500 time 0.3979 (0.4144) data time 0.0011 (0.0017) model time 0.3967 (0.4144) loss 5.9175 (6.8086) grad_norm 2.4798 (inf) loss_scale 256.0000 (294.0744) mem 14939MB [2024-07-25 07:00:36 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [204/300][520/625] eta 0:00:43 lr 0.000320 wd 0.0500 time 0.4018 (0.4143) data time 0.0011 (0.0017) model time 0.4007 (0.4142) loss 6.0823 (6.8074) grad_norm 1.7638 (inf) loss_scale 256.0000 (293.3436) mem 14939MB [2024-07-25 07:00:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][530/625] eta 0:00:39 lr 0.000320 wd 0.0500 time 0.4063 (0.4142) data time 0.0009 (0.0017) model time 0.4054 (0.4141) loss 6.6015 (6.8085) grad_norm 2.1036 (inf) loss_scale 256.0000 (292.6403) mem 14939MB [2024-07-25 07:00:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][540/625] eta 0:00:35 lr 0.000319 wd 0.0500 time 0.3949 (0.4140) data time 0.0011 (0.0017) model time 0.3938 (0.4140) loss 6.7017 (6.8077) grad_norm 2.5336 (inf) loss_scale 256.0000 (291.9630) mem 14939MB [2024-07-25 07:00:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][550/625] eta 0:00:32 lr 0.000319 wd 0.0500 time 9.1400 (0.4298) data time 8.7633 (0.0175) model time 0.3768 (0.4138) loss 6.5001 (6.8159) grad_norm 3.8323 (inf) loss_scale 256.0000 (291.3103) mem 14939MB [2024-07-25 07:01:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][560/625] eta 0:00:28 lr 0.000319 wd 0.0500 time 0.9372 (0.4421) data time 0.0008 (0.0172) model time 0.9364 (0.4276) loss 5.9286 (6.8151) grad_norm 2.5718 (inf) loss_scale 256.0000 (290.6809) mem 14939MB [2024-07-25 07:01:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][570/625] eta 0:00:24 lr 0.000319 wd 0.0500 time 1.1308 (0.4539) data time 0.0009 (0.0170) model time 1.1300 (0.4409) loss 7.0220 (6.8154) grad_norm 2.6699 (inf) loss_scale 256.0000 (290.0736) mem 14939MB [2024-07-25 07:01:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][580/625] eta 0:00:20 lr 0.000319 wd 0.0500 time 0.4140 (0.4656) data time 0.0009 (0.0167) model time 0.4131 (0.4539) loss 
6.2812 (6.8089) grad_norm 2.4667 (inf) loss_scale 256.0000 (289.4871) mem 14939MB [2024-07-25 07:01:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][590/625] eta 0:00:16 lr 0.000319 wd 0.0500 time 1.1546 (0.4749) data time 0.0007 (0.0164) model time 1.1539 (0.4642) loss 6.7463 (6.8114) grad_norm 2.7121 (inf) loss_scale 256.0000 (288.9205) mem 14939MB [2024-07-25 07:01:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][600/625] eta 0:00:12 lr 0.000319 wd 0.0500 time 0.7821 (0.4894) data time 0.0009 (0.0162) model time 0.7812 (0.4803) loss 7.9960 (6.8125) grad_norm 2.2078 (inf) loss_scale 256.0000 (288.3727) mem 14939MB [2024-07-25 07:01:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][610/625] eta 0:00:07 lr 0.000319 wd 0.0500 time 0.4023 (0.4891) data time 0.0004 (0.0159) model time 0.4019 (0.4801) loss 6.9613 (6.8141) grad_norm 2.1824 (inf) loss_scale 256.0000 (287.8429) mem 14939MB [2024-07-25 07:02:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][620/625] eta 0:00:02 lr 0.000319 wd 0.0500 time 0.3949 (0.4884) data time 0.0004 (0.0157) model time 0.3944 (0.4794) loss 7.5576 (6.8189) grad_norm 2.4992 (inf) loss_scale 256.0000 (287.3301) mem 14939MB [2024-07-25 07:02:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 204 training takes 0:05:05 [2024-07-25 07:02:05 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 07:02:06 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 07:02:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 22.372 (22.372) Loss 0.5537 (0.5537) Acc@1 89.453 (89.453) Acc@5 98.682 (98.682) Mem 14939MB [2024-07-25 07:02:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.083 (2.254) Loss 0.8770 (0.6794) Acc@1 80.664 (86.337) Acc@5 96.484 (97.789) Mem 14939MB [2024-07-25 07:02:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.084 (1.221) Loss 1.0039 (0.7989) Acc@1 76.562 (83.233) Acc@5 94.482 (96.575) Mem 14939MB [2024-07-25 07:02:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.835 Acc@5 96.559 [2024-07-25 07:02:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-07-25 07:02:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.84% [2024-07-25 07:02:32 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 07:02:32 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 07:02:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 24.718 (24.718) Loss 0.5459 (0.5459) Acc@1 89.990 (89.990) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 07:03:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.084 (2.461) Loss 0.8481 (0.6751) Acc@1 81.592 (86.648) Acc@5 96.338 (97.794) Mem 14939MB [2024-07-25 07:03:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.084 (1.329) Loss 0.9780 (0.7885) Acc@1 76.855 (83.524) Acc@5 95.410 (96.733) Mem 14939MB [2024-07-25 07:03:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.097 Acc@5 96.681 [2024-07-25 07:03:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-07-25 07:03:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.10% [2024-07-25 07:03:01 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 07:03:01 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 07:03:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][0/625] eta 3:02:16 lr 0.000319 wd 0.0500 time 17.4984 (17.4984) data time 15.2971 (15.2971) model time 0.0000 (0.0000) loss 6.9023 (6.9023) grad_norm 2.5770 (2.5770) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:03:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][10/625] eta 0:21:18 lr 0.000319 wd 0.0500 time 0.3970 (2.0791) data time 0.0009 (1.4380) model time 0.0000 (0.0000) loss 6.0883 (6.7471) grad_norm 4.7610 (3.1893) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:03:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][20/625] eta 0:15:41 lr 0.000318 wd 0.0500 time 5.9274 (1.5570) data time 0.0008 (0.7537) model time 0.0000 (0.0000) loss 5.9217 (6.6231) grad_norm 2.5177 (3.6791) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:03:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][30/625] eta 0:15:45 lr 0.000318 wd 0.0500 time 0.3967 (1.5889) data time 0.0009 (0.5109) model time 0.0000 (0.0000) loss 6.8201 (6.6822) grad_norm 2.0148 (3.3602) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:03:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][40/625] eta 0:13:10 lr 0.000318 wd 0.0500 time 0.3929 (1.3516) data time 0.0009 (0.3866) model time 0.0000 (0.0000) loss 6.2805 (6.6369) grad_norm 3.2031 (3.2126) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:04:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][50/625] eta 0:13:46 lr 0.000318 wd 0.0500 time 1.3339 (1.4373) data time 0.0009 (0.3110) model time 0.0000 (0.0000) loss 8.0505 (6.7499) grad_norm 2.9360 (3.2205) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:04:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][60/625] eta 0:12:00 lr 0.000318 wd 0.0500 time 0.3960 
(1.2757) data time 0.0007 (0.2601) model time 0.3953 (0.4510) loss 7.8127 (6.7528) grad_norm 6.1884 (3.2909) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:04:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][70/625] eta 0:11:48 lr 0.000318 wd 0.0500 time 0.7969 (1.2761) data time 0.0007 (0.2236) model time 0.7962 (0.8643) loss 7.0646 (6.7604) grad_norm 1.9949 (3.1984) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:04:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][80/625] eta 0:11:25 lr 0.000318 wd 0.0500 time 0.6803 (1.2576) data time 0.0009 (0.1961) model time 0.6794 (0.9512) loss 7.4623 (6.8149) grad_norm 2.0097 (3.0781) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:04:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][90/625] eta 0:11:00 lr 0.000318 wd 0.0500 time 1.0941 (1.2346) data time 0.0010 (0.1747) model time 1.0931 (0.9752) loss 6.4763 (6.8059) grad_norm 2.9864 (3.0383) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:05:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][100/625] eta 0:10:33 lr 0.000318 wd 0.0500 time 0.7231 (1.2060) data time 0.0009 (0.1575) model time 0.7221 (0.9692) loss 7.5784 (6.8341) grad_norm 2.8133 (2.9752) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:05:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][110/625] eta 0:10:51 lr 0.000318 wd 0.0500 time 0.9685 (1.2645) data time 0.0007 (0.1434) model time 0.9678 (1.1166) loss 7.6872 (6.8529) grad_norm 2.6844 (3.0414) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:05:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][120/625] eta 0:10:36 lr 0.000318 wd 0.0500 time 1.2649 (1.2605) data time 0.0008 (0.1316) model time 1.2641 (1.1307) loss 7.6130 (6.8541) grad_norm 3.0949 (3.0069) loss_scale 256.0000 (256.0000) mem 14939MB 
[2024-07-25 07:05:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][130/625] eta 0:10:17 lr 0.000317 wd 0.0500 time 0.4257 (1.2471) data time 0.0009 (0.1217) model time 0.4248 (1.1249) loss 6.1938 (6.8294) grad_norm 1.7985 (2.9737) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:05:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][140/625] eta 0:10:00 lr 0.000317 wd 0.0500 time 0.4077 (1.2391) data time 0.0008 (0.1131) model time 0.4070 (1.1258) loss 6.9703 (6.8266) grad_norm 5.6837 (2.9424) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:06:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][150/625] eta 0:09:22 lr 0.000317 wd 0.0500 time 0.3967 (1.1837) data time 0.0009 (0.1057) model time 0.3958 (1.0534) loss 7.3214 (6.8348) grad_norm 2.5912 (2.9142) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:06:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][160/625] eta 0:08:47 lr 0.000317 wd 0.0500 time 0.4055 (1.1352) data time 0.0011 (0.0992) model time 0.4045 (0.9942) loss 6.6216 (6.8430) grad_norm 6.5758 (2.8980) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:06:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][170/625] eta 0:08:17 lr 0.000317 wd 0.0500 time 0.4028 (1.0925) data time 0.0007 (0.0934) model time 0.4022 (0.9450) loss 6.4691 (6.8341) grad_norm 2.1126 (2.8685) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:06:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][180/625] eta 0:07:49 lr 0.000317 wd 0.0500 time 0.4009 (1.0543) data time 0.0006 (0.0883) model time 0.4003 (0.9030) loss 6.6221 (6.8416) grad_norm 2.5054 (2.8398) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:06:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][190/625] eta 0:07:23 lr 0.000317 wd 0.0500 time 0.4064 
(1.0200) data time 0.0008 (0.0837) model time 0.4056 (0.8670) loss 7.2501 (6.8517) grad_norm 1.9404 (2.8364) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:06:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][200/625] eta 0:07:00 lr 0.000317 wd 0.0500 time 0.3955 (0.9891) data time 0.0007 (0.0796) model time 0.3949 (0.8358) loss 7.1315 (6.8641) grad_norm 3.1692 (2.8221) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:06:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][210/625] eta 0:06:46 lr 0.000317 wd 0.0500 time 4.0713 (0.9793) data time 2.4540 (0.0875) model time 1.6173 (0.8170) loss 7.4218 (6.8844) grad_norm 5.2106 (2.8546) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:06:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][220/625] eta 0:06:34 lr 0.000317 wd 0.0500 time 0.8859 (0.9729) data time 0.0009 (0.0836) model time 0.8850 (0.8182) loss 6.0675 (6.8667) grad_norm 2.6169 (2.8589) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:06:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][230/625] eta 0:06:36 lr 0.000317 wd 0.0500 time 2.0049 (1.0046) data time 0.0007 (0.0800) model time 2.0042 (0.8674) loss 7.0714 (6.8709) grad_norm 1.9675 (2.8271) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:07:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][240/625] eta 0:06:37 lr 0.000316 wd 0.0500 time 1.1361 (1.0320) data time 0.0007 (0.0767) model time 1.1354 (0.9094) loss 6.9704 (6.8721) grad_norm 3.0949 (2.8242) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:07:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][250/625] eta 0:06:32 lr 0.000316 wd 0.0500 time 0.7145 (1.0457) data time 0.0007 (0.0737) model time 0.7138 (0.9326) loss 7.8644 (6.8734) grad_norm 2.6908 (2.8097) loss_scale 256.0000 (256.0000) mem 14939MB 
[2024-07-25 07:07:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][260/625] eta 0:06:21 lr 0.000316 wd 0.0500 time 0.6616 (1.0439) data time 0.0008 (0.0709) model time 0.6608 (0.9358) loss 7.0297 (6.8642) grad_norm 2.8374 (2.7913) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:07:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][270/625] eta 0:06:05 lr 0.000316 wd 0.0500 time 0.3896 (1.0285) data time 0.0009 (0.0684) model time 0.3888 (0.9216) loss 6.4387 (6.8655) grad_norm 3.1148 (2.7845) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:07:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][280/625] eta 0:05:59 lr 0.000316 wd 0.0500 time 2.4310 (1.0407) data time 0.0009 (0.0660) model time 2.4302 (0.9411) loss 7.3821 (6.8651) grad_norm 3.1428 (2.7764) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:08:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][290/625] eta 0:05:47 lr 0.000316 wd 0.0500 time 1.0485 (1.0374) data time 0.0007 (0.0637) model time 1.0479 (0.9412) loss 5.6051 (6.8545) grad_norm 2.6720 (2.7741) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:08:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][300/625] eta 0:05:37 lr 0.000316 wd 0.0500 time 1.3855 (1.0388) data time 0.0007 (0.0616) model time 1.3848 (0.9467) loss 5.3662 (6.8374) grad_norm 2.5632 (2.7735) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:08:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][310/625] eta 0:05:27 lr 0.000316 wd 0.0500 time 1.1800 (1.0411) data time 0.0009 (0.0597) model time 1.1791 (0.9530) loss 7.3556 (6.8406) grad_norm 2.4299 (2.7720) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:08:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][320/625] eta 0:05:20 lr 0.000316 wd 0.0500 time 0.3934 
(1.0507) data time 0.0007 (0.0579) model time 0.3927 (0.9676) loss 5.7071 (6.8352) grad_norm 2.8047 (2.8178) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:08:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][330/625] eta 0:05:04 lr 0.000316 wd 0.0500 time 0.3966 (1.0309) data time 0.0008 (0.0561) model time 0.3958 (0.9472) loss 7.2357 (6.8312) grad_norm 4.3622 (2.8325) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:08:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][340/625] eta 0:04:54 lr 0.000316 wd 0.0500 time 0.9429 (1.0336) data time 0.0007 (0.0545) model time 0.9421 (0.9532) loss 5.9691 (6.8283) grad_norm 2.3933 (2.8314) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:09:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][350/625] eta 0:04:41 lr 0.000315 wd 0.0500 time 1.7755 (1.0241) data time 0.0008 (0.0530) model time 1.7747 (0.9448) loss 6.5801 (6.8314) grad_norm 3.8042 (2.8400) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:09:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][360/625] eta 0:04:33 lr 0.000315 wd 0.0500 time 1.1372 (1.0308) data time 0.0009 (0.0515) model time 1.1363 (0.9550) loss 7.6373 (6.8389) grad_norm 3.0929 (2.8414) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:09:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][370/625] eta 0:04:23 lr 0.000315 wd 0.0500 time 0.4047 (1.0314) data time 0.0009 (0.0502) model time 0.4038 (0.9581) loss 5.2044 (6.8325) grad_norm 2.2396 (2.8427) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:09:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][380/625] eta 0:04:14 lr 0.000315 wd 0.0500 time 0.9311 (1.0371) data time 0.0008 (0.0489) model time 0.9303 (0.9669) loss 6.4585 (6.8330) grad_norm 2.7034 (2.8468) loss_scale 256.0000 (256.0000) mem 14939MB 
[2024-07-25 07:09:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][390/625] eta 0:04:08 lr 0.000315 wd 0.0500 time 3.2176 (1.0595) data time 0.0009 (0.0477) model time 3.2168 (0.9946) loss 7.4723 (6.8374) grad_norm 2.7746 (2.8377) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:10:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][400/625] eta 0:03:56 lr 0.000315 wd 0.0500 time 0.3984 (1.0511) data time 0.0008 (0.0465) model time 0.3976 (0.9869) loss 7.1712 (6.8429) grad_norm 2.1767 (2.8462) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:10:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][410/625] eta 0:03:49 lr 0.000315 wd 0.0500 time 13.6615 (1.0675) data time 8.6873 (0.0665) model time 4.9742 (0.9832) loss 6.9611 (6.8456) grad_norm 2.7558 (2.8558) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:10:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][420/625] eta 0:03:39 lr 0.000315 wd 0.0500 time 1.5157 (1.0703) data time 0.0008 (0.0650) model time 1.5149 (0.9887) loss 6.2208 (6.8449) grad_norm 2.8495 (2.8509) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:10:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][430/625] eta 0:03:26 lr 0.000315 wd 0.0500 time 0.3959 (1.0602) data time 0.0009 (0.0635) model time 0.3949 (0.9793) loss 6.3972 (6.8439) grad_norm 2.9097 (2.8476) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:10:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][440/625] eta 0:03:13 lr 0.000315 wd 0.0500 time 0.3969 (1.0460) data time 0.0007 (0.0621) model time 0.3962 (0.9653) loss 6.2125 (6.8544) grad_norm 3.1357 (2.8448) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:10:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][450/625] eta 0:03:00 lr 0.000314 wd 0.0500 time 0.3958 
(1.0341) data time 0.0006 (0.0607) model time 0.3952 (0.9539) loss 7.2966 (6.8555) grad_norm 1.8808 (2.8310) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:10:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][460/625] eta 0:02:48 lr 0.000314 wd 0.0500 time 0.4068 (1.0222) data time 0.0008 (0.0594) model time 0.4060 (0.9425) loss 6.3309 (6.8530) grad_norm 3.3400 (2.8476) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:10:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][470/625] eta 0:02:36 lr 0.000314 wd 0.0500 time 0.6099 (1.0111) data time 0.0009 (0.0582) model time 0.6089 (0.9319) loss 6.3130 (6.8553) grad_norm 3.0939 (2.8409) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:11:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][480/625] eta 0:02:24 lr 0.000314 wd 0.0500 time 0.3939 (0.9991) data time 0.0009 (0.0570) model time 0.3930 (0.9203) loss 5.9887 (6.8552) grad_norm 3.1794 (2.8451) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:11:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][490/625] eta 0:02:13 lr 0.000314 wd 0.0500 time 0.3964 (0.9868) data time 0.0011 (0.0558) model time 0.3953 (0.9083) loss 8.4281 (6.8537) grad_norm 2.1822 (2.8475) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:11:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][500/625] eta 0:02:01 lr 0.000314 wd 0.0500 time 0.3996 (0.9751) data time 0.0007 (0.0547) model time 0.3989 (0.8970) loss 5.1763 (6.8504) grad_norm 3.0783 (2.8565) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:11:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][510/625] eta 0:01:50 lr 0.000314 wd 0.0500 time 0.3986 (0.9638) data time 0.0010 (0.0537) model time 0.3976 (0.8862) loss 7.1770 (6.8594) grad_norm 2.2341 (2.8529) loss_scale 256.0000 (256.0000) mem 14939MB 
[2024-07-25 07:11:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][520/625] eta 0:01:40 lr 0.000314 wd 0.0500 time 0.4159 (0.9531) data time 0.0007 (0.0527) model time 0.4152 (0.8759) loss 7.3285 (6.8571) grad_norm 2.3369 (2.8443) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:11:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][530/625] eta 0:01:29 lr 0.000314 wd 0.0500 time 0.3980 (0.9426) data time 0.0007 (0.0517) model time 0.3974 (0.8659) loss 7.2068 (6.8557) grad_norm 2.1404 (2.8493) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:11:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][540/625] eta 0:01:19 lr 0.000314 wd 0.0500 time 0.3981 (0.9328) data time 0.0006 (0.0508) model time 0.3975 (0.8566) loss 8.1015 (6.8547) grad_norm 1.9780 (2.8480) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:11:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][550/625] eta 0:01:09 lr 0.000314 wd 0.0500 time 0.3983 (0.9231) data time 0.0007 (0.0499) model time 0.3977 (0.8474) loss 6.8469 (6.8564) grad_norm 2.1271 (2.8381) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:11:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][560/625] eta 0:00:59 lr 0.000313 wd 0.0500 time 0.3967 (0.9137) data time 0.0008 (0.0490) model time 0.3958 (0.8386) loss 7.8978 (6.8557) grad_norm 6.8762 (2.8532) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:11:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][570/625] eta 0:00:49 lr 0.000313 wd 0.0500 time 0.4033 (0.9047) data time 0.0009 (0.0481) model time 0.4024 (0.8301) loss 6.1675 (6.8566) grad_norm 2.1679 (2.8477) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:11:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][580/625] eta 0:00:40 lr 0.000313 wd 0.0500 time 0.3983 
(0.8960) data time 0.0006 (0.0473) model time 0.3977 (0.8220) loss 6.3235 (6.8582) grad_norm 2.5484 (2.8445) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:11:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][590/625] eta 0:00:31 lr 0.000313 wd 0.0500 time 0.4015 (0.8876) data time 0.0007 (0.0465) model time 0.4008 (0.8141) loss 7.1063 (6.8545) grad_norm 3.6605 (2.8532) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:11:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][600/625] eta 0:00:21 lr 0.000313 wd 0.0500 time 0.3990 (0.8795) data time 0.0006 (0.0458) model time 0.3985 (0.8066) loss 6.8826 (6.8535) grad_norm 3.7335 (2.9051) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:11:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][610/625] eta 0:00:13 lr 0.000313 wd 0.0500 time 0.3961 (0.8716) data time 0.0004 (0.0450) model time 0.3957 (0.7993) loss 6.2796 (6.8538) grad_norm 4.1281 (2.9117) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:11:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][620/625] eta 0:00:04 lr 0.000313 wd 0.0500 time 0.3948 (0.8640) data time 0.0006 (0.0443) model time 0.3942 (0.7922) loss 7.5563 (6.8570) grad_norm 5.8605 (2.9312) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:12:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 205 training takes 0:08:58 [2024-07-25 07:12:00 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 07:12:00 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 07:12:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.455 (0.455) Loss 0.5615 (0.5615) Acc@1 89.844 (89.844) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 07:12:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.122) Loss 0.8584 (0.6894) Acc@1 81.982 (86.359) Acc@5 96.484 (97.723) Mem 14939MB [2024-07-25 07:12:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9814 (0.8131) Acc@1 76.904 (83.140) Acc@5 95.068 (96.503) Mem 14939MB [2024-07-25 07:12:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.744 Acc@5 96.481 [2024-07-25 07:12:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.7% [2024-07-25 07:12:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.773 (0.773) Loss 0.5459 (0.5459) Acc@1 89.941 (89.941) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 07:12:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.156) Loss 0.8472 (0.6748) Acc@1 81.641 (86.603) Acc@5 96.338 (97.794) Mem 14939MB [2024-07-25 07:12:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 0.9766 (0.7878) Acc@1 76.953 (83.512) Acc@5 95.410 (96.745) Mem 14939MB [2024-07-25 07:12:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.093 Acc@5 96.697 [2024-07-25 07:12:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-07-25 07:12:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][0/625] eta 0:13:19 lr 0.000313 wd 0.0500 time 1.2790 (1.2790) data time 0.6240 (0.6240) model time 0.0000 (0.0000) loss 6.4210 (6.4210) grad_norm 6.2016 (6.2016) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:12:11 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][10/625] eta 0:04:55 lr 0.000313 wd 0.0500 time 0.3994 (0.4799) data time 0.0007 (0.0575) model time 0.0000 (0.0000) loss 6.6017 (6.6084) grad_norm 5.0831 (3.8818) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:12:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][20/625] eta 0:04:27 lr 0.000313 wd 0.0500 time 0.4012 (0.4420) data time 0.0008 (0.0305) model time 0.0000 (0.0000) loss 6.3355 (6.8277) grad_norm 2.3102 (3.2543) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:12:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][30/625] eta 0:04:21 lr 0.000313 wd 0.0500 time 0.3978 (0.4402) data time 0.0006 (0.0209) model time 0.0000 (0.0000) loss 6.2731 (6.9428) grad_norm 2.9211 (3.4972) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:12:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][40/625] eta 0:04:19 lr 0.000312 wd 0.0500 time 0.5720 (0.4436) data time 0.0006 (0.0160) model time 0.0000 (0.0000) loss 7.2883 (6.8936) grad_norm 3.1801 (3.7308) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:12:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][50/625] eta 0:04:21 lr 0.000312 wd 0.0500 time 0.5978 (0.4543) data time 0.0008 (0.0130) model time 0.0000 (0.0000) loss 6.6683 (6.8446) grad_norm 7.8160 (4.5374) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:12:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][60/625] eta 0:04:20 lr 0.000312 wd 0.0500 time 0.6022 (0.4605) data time 0.0006 (0.0110) model time 0.6016 (0.4910) loss 7.8884 (6.8944) grad_norm 5.0515 (4.5207) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:12:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][70/625] eta 0:04:16 lr 0.000312 wd 0.0500 time 0.5327 (0.4615) data time 0.0006 
(0.0096) model time 0.5321 (0.4790) loss 7.5299 (6.8958) grad_norm 1.9390 (4.4595) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:12:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][80/625] eta 0:04:07 lr 0.000312 wd 0.0500 time 0.4013 (0.4539) data time 0.0006 (0.0085) model time 0.4007 (0.4523) loss 6.9623 (6.8738) grad_norm 1.9318 (4.2338) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:12:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][90/625] eta 0:03:59 lr 0.000312 wd 0.0500 time 0.4011 (0.4477) data time 0.0010 (0.0077) model time 0.4001 (0.4385) loss 7.5707 (6.8734) grad_norm 2.2094 (4.0467) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:12:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][100/625] eta 0:03:52 lr 0.000312 wd 0.0500 time 0.3991 (0.4428) data time 0.0008 (0.0070) model time 0.3982 (0.4302) loss 7.3041 (6.9111) grad_norm 2.3124 (3.9608) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:12:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][110/625] eta 0:03:46 lr 0.000312 wd 0.0500 time 0.3995 (0.4407) data time 0.0006 (0.0064) model time 0.3989 (0.4283) loss 7.7855 (6.9226) grad_norm 3.3053 (3.8536) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:12:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][120/625] eta 0:03:40 lr 0.000312 wd 0.0500 time 0.3975 (0.4373) data time 0.0007 (0.0060) model time 0.3969 (0.4240) loss 6.5675 (6.9406) grad_norm 2.3804 (3.7669) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:13:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][130/625] eta 0:03:35 lr 0.000312 wd 0.0500 time 0.3998 (0.4344) data time 0.0007 (0.0056) model time 0.3992 (0.4209) loss 6.9984 (6.9377) grad_norm 2.9351 (3.7993) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:13:07 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][140/625] eta 0:03:29 lr 0.000312 wd 0.0500 time 0.4065 (0.4320) data time 0.0007 (0.0052) model time 0.4057 (0.4185) loss 6.4474 (6.9396) grad_norm 4.0228 (3.8787) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:13:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][150/625] eta 0:03:24 lr 0.000311 wd 0.0500 time 0.3964 (0.4298) data time 0.0008 (0.0049) model time 0.3956 (0.4165) loss 7.1238 (6.9524) grad_norm 3.0628 (3.8269) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:13:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][160/625] eta 0:03:19 lr 0.000311 wd 0.0500 time 0.3966 (0.4280) data time 0.0006 (0.0047) model time 0.3960 (0.4150) loss 6.9647 (6.9320) grad_norm 3.6806 (3.7908) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:13:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][170/625] eta 0:03:13 lr 0.000311 wd 0.0500 time 0.4022 (0.4263) data time 0.0006 (0.0045) model time 0.4016 (0.4136) loss 6.1995 (6.9136) grad_norm 2.4227 (3.7493) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:13:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][180/625] eta 0:03:09 lr 0.000311 wd 0.0500 time 0.3974 (0.4249) data time 0.0009 (0.0043) model time 0.3965 (0.4125) loss 7.9794 (6.9318) grad_norm 3.2035 (3.6883) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:13:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][190/625] eta 0:03:04 lr 0.000311 wd 0.0500 time 0.3955 (0.4235) data time 0.0009 (0.0041) model time 0.3947 (0.4115) loss 6.7946 (6.9324) grad_norm 3.4559 (3.6384) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:13:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][200/625] eta 0:02:59 lr 0.000311 wd 0.0500 time 0.3999 (0.4223) data time 
0.0006 (0.0039) model time 0.3993 (0.4106) loss 7.3773 (6.9480) grad_norm 5.0629 (3.6455) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:13:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][210/625] eta 0:02:54 lr 0.000311 wd 0.0500 time 0.3945 (0.4211) data time 0.0010 (0.0038) model time 0.3935 (0.4098) loss 7.7006 (6.9327) grad_norm 2.2769 (3.6786) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:13:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][220/625] eta 0:02:50 lr 0.000311 wd 0.0500 time 0.3974 (0.4202) data time 0.0008 (0.0036) model time 0.3966 (0.4091) loss 7.5810 (6.9434) grad_norm 2.1221 (3.6241) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:13:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][230/625] eta 0:02:45 lr 0.000311 wd 0.0500 time 0.3958 (0.4192) data time 0.0007 (0.0035) model time 0.3951 (0.4084) loss 6.8483 (6.9348) grad_norm 2.5665 (3.5902) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:13:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][240/625] eta 0:02:41 lr 0.000311 wd 0.0500 time 0.3956 (0.4184) data time 0.0007 (0.0034) model time 0.3949 (0.4079) loss 6.7764 (6.9275) grad_norm 2.1138 (3.5589) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:13:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][250/625] eta 0:02:36 lr 0.000311 wd 0.0500 time 0.3979 (0.4186) data time 0.0007 (0.0033) model time 0.3973 (0.4087) loss 6.0702 (6.9321) grad_norm 1.9781 (3.5189) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:13:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][260/625] eta 0:02:33 lr 0.000310 wd 0.0500 time 0.5859 (0.4203) data time 0.0009 (0.0032) model time 0.5850 (0.4113) loss 7.4117 (6.9348) grad_norm 2.5171 (3.5237) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:14:00 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][270/625] eta 0:02:30 lr 0.000310 wd 0.0500 time 0.5905 (0.4228) data time 0.0006 (0.0031) model time 0.5899 (0.4146) loss 6.9356 (6.9333) grad_norm 2.8375 (3.5458) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:14:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][280/625] eta 0:02:26 lr 0.000310 wd 0.0500 time 0.6000 (0.4246) data time 0.0007 (0.0030) model time 0.5994 (0.4172) loss 6.0205 (6.9214) grad_norm 3.3570 (3.5062) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:14:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][290/625] eta 0:02:22 lr 0.000310 wd 0.0500 time 0.5628 (0.4255) data time 0.0008 (0.0030) model time 0.5621 (0.4185) loss 7.6751 (6.9124) grad_norm 2.8137 (3.5292) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:14:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][300/625] eta 0:02:17 lr 0.000310 wd 0.0500 time 0.3948 (0.4245) data time 0.0009 (0.0029) model time 0.3939 (0.4176) loss 6.5255 (6.9113) grad_norm 3.9977 (3.5173) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:14:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][310/625] eta 0:02:13 lr 0.000310 wd 0.0500 time 0.3989 (0.4236) data time 0.0006 (0.0028) model time 0.3982 (0.4168) loss 6.6786 (6.9132) grad_norm 2.2847 (3.5007) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:14:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][320/625] eta 0:02:08 lr 0.000310 wd 0.0500 time 0.4010 (0.4229) data time 0.0009 (0.0028) model time 0.4001 (0.4161) loss 7.2400 (6.9101) grad_norm 2.3831 (3.4694) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:14:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][330/625] eta 0:02:04 lr 0.000310 wd 0.0500 time 0.3967 (0.4227) data time 
0.0009 (0.0027) model time 0.3958 (0.4162) loss 7.0995 (6.9102) grad_norm 3.9932 (3.4921) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:14:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][340/625] eta 0:02:00 lr 0.000310 wd 0.0500 time 0.3986 (0.4220) data time 0.0007 (0.0026) model time 0.3980 (0.4156) loss 7.2087 (6.9028) grad_norm 3.1725 (3.4808) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:14:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][350/625] eta 0:01:55 lr 0.000310 wd 0.0500 time 0.3988 (0.4214) data time 0.0007 (0.0026) model time 0.3981 (0.4149) loss 7.8631 (6.9072) grad_norm 4.4557 (3.4707) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:14:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][360/625] eta 0:01:51 lr 0.000310 wd 0.0500 time 0.3979 (0.4208) data time 0.0008 (0.0025) model time 0.3971 (0.4144) loss 7.0065 (6.9088) grad_norm 4.6302 (3.4874) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:14:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][370/625] eta 0:01:47 lr 0.000309 wd 0.0500 time 0.3984 (0.4202) data time 0.0009 (0.0025) model time 0.3976 (0.4139) loss 7.6954 (6.9064) grad_norm 2.2724 (3.4589) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:14:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][380/625] eta 0:01:42 lr 0.000309 wd 0.0500 time 0.3968 (0.4196) data time 0.0006 (0.0024) model time 0.3962 (0.4135) loss 6.8046 (6.9000) grad_norm 2.8813 (3.4387) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:14:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][390/625] eta 0:01:38 lr 0.000309 wd 0.0500 time 0.4012 (0.4191) data time 0.0009 (0.0024) model time 0.4002 (0.4130) loss 7.7369 (6.9026) grad_norm 3.8087 (3.4316) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:14:54 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][400/625] eta 0:01:34 lr 0.000309 wd 0.0500 time 0.4007 (0.4186) data time 0.0008 (0.0024) model time 0.4000 (0.4126) loss 7.3484 (6.9037) grad_norm 9.8470 (3.4485) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:14:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][410/625] eta 0:01:29 lr 0.000309 wd 0.0500 time 0.3948 (0.4182) data time 0.0009 (0.0023) model time 0.3940 (0.4122) loss 6.7803 (6.9024) grad_norm 2.6719 (3.4386) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:15:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][420/625] eta 0:01:25 lr 0.000309 wd 0.0500 time 0.3949 (0.4177) data time 0.0011 (0.0023) model time 0.3938 (0.4119) loss 6.2811 (6.9012) grad_norm 2.0668 (3.4184) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:15:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][430/625] eta 0:01:21 lr 0.000309 wd 0.0500 time 0.4066 (0.4173) data time 0.0008 (0.0023) model time 0.4058 (0.4116) loss 6.5134 (6.8959) grad_norm 1.9776 (3.4232) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:15:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][440/625] eta 0:01:17 lr 0.000309 wd 0.0500 time 0.4089 (0.4170) data time 0.0008 (0.0022) model time 0.4081 (0.4113) loss 7.7359 (6.8919) grad_norm 3.6460 (3.4114) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:15:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][450/625] eta 0:01:12 lr 0.000309 wd 0.0500 time 0.3980 (0.4166) data time 0.0006 (0.0022) model time 0.3974 (0.4110) loss 7.1233 (6.8892) grad_norm 2.6438 (3.3941) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:15:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][460/625] eta 0:01:08 lr 0.000309 wd 0.0500 time 0.3986 (0.4163) data time 
0.0008 (0.0022) model time 0.3977 (0.4107) loss 6.5417 (6.8839) grad_norm 2.8824 (3.3778) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:15:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][470/625] eta 0:01:04 lr 0.000309 wd 0.0500 time 0.3968 (0.4165) data time 0.0009 (0.0021) model time 0.3959 (0.4111) loss 5.5785 (6.8762) grad_norm 3.7188 (3.3627) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:15:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][480/625] eta 0:01:00 lr 0.000308 wd 0.0500 time 0.4203 (0.4174) data time 0.0008 (0.0021) model time 0.4194 (0.4122) loss 7.2614 (6.8779) grad_norm 1.8651 (3.3494) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:15:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][490/625] eta 0:00:56 lr 0.000308 wd 0.0500 time 0.3979 (0.4188) data time 0.0009 (0.0021) model time 0.3970 (0.4139) loss 7.2501 (6.8776) grad_norm 9.9207 (3.3442) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:15:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][500/625] eta 0:00:52 lr 0.000308 wd 0.0500 time 0.3984 (0.4198) data time 0.0006 (0.0021) model time 0.3978 (0.4151) loss 8.0306 (6.8783) grad_norm 2.6557 (3.3333) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:15:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][510/625] eta 0:00:48 lr 0.000308 wd 0.0500 time 0.3982 (0.4215) data time 0.0007 (0.0020) model time 0.3974 (0.4170) loss 5.5344 (6.8790) grad_norm 1.7619 (3.3169) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:15:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][520/625] eta 0:00:44 lr 0.000308 wd 0.0500 time 0.3999 (0.4211) data time 0.0007 (0.0020) model time 0.3993 (0.4167) loss 7.2012 (6.8816) grad_norm 2.4623 (3.3077) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:15:49 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][530/625] eta 0:00:39 lr 0.000308 wd 0.0500 time 0.4015 (0.4207) data time 0.0009 (0.0020) model time 0.4007 (0.4163) loss 6.2676 (6.8801) grad_norm 4.0331 (3.3066) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:15:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][540/625] eta 0:00:35 lr 0.000308 wd 0.0500 time 0.3978 (0.4203) data time 0.0007 (0.0020) model time 0.3971 (0.4159) loss 7.6797 (6.8798) grad_norm 2.2521 (3.2958) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:15:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][550/625] eta 0:00:31 lr 0.000308 wd 0.0500 time 0.4010 (0.4203) data time 0.0007 (0.0019) model time 0.4002 (0.4160) loss 6.4367 (6.8833) grad_norm 1.7673 (3.2900) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:16:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][560/625] eta 0:00:27 lr 0.000308 wd 0.0500 time 0.4039 (0.4199) data time 0.0007 (0.0019) model time 0.4032 (0.4157) loss 7.2764 (6.8798) grad_norm 2.1677 (3.2870) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:16:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][570/625] eta 0:00:23 lr 0.000308 wd 0.0500 time 0.3982 (0.4196) data time 0.0008 (0.0019) model time 0.3974 (0.4154) loss 6.6095 (6.8810) grad_norm 2.3711 (3.2988) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:16:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][580/625] eta 0:00:18 lr 0.000307 wd 0.0500 time 0.4002 (0.4193) data time 0.0007 (0.0019) model time 0.3995 (0.4151) loss 5.8462 (6.8830) grad_norm 3.2028 (3.3144) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:16:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][590/625] eta 0:00:14 lr 0.000307 wd 0.0500 time 0.3962 (0.4189) data time 
0.0009 (0.0019) model time 0.3953 (0.4148) loss 6.7267 (6.8811) grad_norm 11.5767 (3.3217) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:16:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][600/625] eta 0:00:10 lr 0.000307 wd 0.0500 time 0.3944 (0.4186) data time 0.0007 (0.0019) model time 0.3938 (0.4145) loss 5.9230 (6.8809) grad_norm 2.4441 (3.3144) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:16:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][610/625] eta 0:00:06 lr 0.000307 wd 0.0500 time 0.3986 (0.4183) data time 0.0006 (0.0018) model time 0.3980 (0.4142) loss 6.8938 (6.8797) grad_norm 2.4234 (3.3029) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:16:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][620/625] eta 0:00:02 lr 0.000307 wd 0.0500 time 0.3943 (0.4180) data time 0.0006 (0.0018) model time 0.3938 (0.4139) loss 5.5328 (6.8754) grad_norm 2.0536 (3.2954) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:16:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 206 training takes 0:04:21 [2024-07-25 07:16:27 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 07:16:28 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 07:16:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.460 (0.460) Loss 0.5620 (0.5620) Acc@1 90.039 (90.039) Acc@5 98.633 (98.633) Mem 14939MB [2024-07-25 07:16:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.8608 (0.6802) Acc@1 80.957 (86.523) Acc@5 96.582 (97.767) Mem 14939MB [2024-07-25 07:16:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9644 (0.7989) Acc@1 77.930 (83.343) Acc@5 95.215 (96.582) Mem 14939MB [2024-07-25 07:16:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.957 Acc@5 96.553 [2024-07-25 07:16:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.0% [2024-07-25 07:16:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.96% [2024-07-25 07:16:31 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 07:16:31 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 07:16:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.449 (0.449) Loss 0.5459 (0.5459) Acc@1 89.990 (89.990) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 07:16:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.8467 (0.6743) Acc@1 81.689 (86.617) Acc@5 96.338 (97.798) Mem 14939MB [2024-07-25 07:16:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9751 (0.7872) Acc@1 77.100 (83.529) Acc@5 95.459 (96.740) Mem 14939MB [2024-07-25 07:16:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.103 Acc@5 96.697 [2024-07-25 07:16:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-07-25 07:16:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.10% [2024-07-25 07:16:34 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 07:16:35 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 07:16:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][0/625] eta 0:08:31 lr 0.000307 wd 0.0500 time 0.8186 (0.8186) data time 0.4375 (0.4375) model time 0.0000 (0.0000) loss 7.5215 (7.5215) grad_norm 3.1191 (3.1191) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:16:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][10/625] eta 0:04:27 lr 0.000307 wd 0.0500 time 0.3938 (0.4355) data time 0.0008 (0.0405) model time 0.0000 (0.0000) loss 5.9666 (7.0354) grad_norm 2.0469 (2.5932) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:16:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][20/625] eta 0:04:13 lr 0.000307 wd 0.0500 time 0.3977 (0.4182) data time 0.0008 (0.0216) model time 0.0000 (0.0000) loss 7.7314 (6.9462) grad_norm 2.0594 (2.5545) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:16:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][30/625] eta 0:04:04 lr 0.000307 wd 0.0500 time 0.3962 (0.4115) data time 0.0010 (0.0149) model time 0.0000 (0.0000) loss 7.9384 (6.9464) grad_norm 2.2021 (2.5824) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:16:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][40/625] eta 0:03:58 lr 0.000307 wd 0.0500 time 0.3975 (0.4084) data time 0.0009 (0.0115) model time 0.0000 (0.0000) loss 6.1117 (6.9564) grad_norm 2.6394 (2.6820) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:16:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][50/625] eta 0:03:55 lr 0.000307 wd 0.0500 time 0.3959 (0.4095) data time 0.0007 (0.0094) model time 0.0000 (0.0000) loss 7.0880 (6.9283) grad_norm 2.6180 (2.7345) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:17:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][60/625] eta 0:03:51 lr 0.000307 wd 0.0500 time 0.5596 (0.4103) 
data time 0.0008 (0.0080) model time 0.5589 (0.4133) loss 6.6838 (6.9009) grad_norm 2.9604 (2.7736) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:17:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][70/625] eta 0:03:49 lr 0.000306 wd 0.0500 time 0.5213 (0.4126) data time 0.0007 (0.0070) model time 0.5206 (0.4197) loss 5.9246 (6.8848) grad_norm 2.1778 (2.8055) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:17:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][80/625] eta 0:03:48 lr 0.000306 wd 0.0500 time 0.6298 (0.4193) data time 0.0009 (0.0062) model time 0.6290 (0.4351) loss 7.4531 (6.8721) grad_norm 2.2902 (2.9145) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:17:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][90/625] eta 0:03:48 lr 0.000306 wd 0.0500 time 0.5867 (0.4274) data time 0.0008 (0.0056) model time 0.5859 (0.4494) loss 6.3932 (6.8638) grad_norm 3.0127 (2.9220) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:17:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][100/625] eta 0:03:45 lr 0.000306 wd 0.0500 time 0.3936 (0.4301) data time 0.0008 (0.0051) model time 0.3928 (0.4502) loss 7.7116 (6.8501) grad_norm 2.1601 (2.8979) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:17:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][110/625] eta 0:03:42 lr 0.000306 wd 0.0500 time 0.4074 (0.4315) data time 0.0007 (0.0047) model time 0.4068 (0.4494) loss 7.0196 (6.8539) grad_norm 2.0277 (2.8517) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:17:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][120/625] eta 0:03:36 lr 0.000306 wd 0.0500 time 0.3971 (0.4289) data time 0.0007 (0.0044) model time 0.3965 (0.4422) loss 6.9183 (6.8594) grad_norm 4.4657 (2.9261) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
07:17:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][130/625] eta 0:03:31 lr 0.000306 wd 0.0500 time 0.3993 (0.4266) data time 0.0007 (0.0041) model time 0.3986 (0.4367) loss 6.0759 (6.8187) grad_norm 1.9113 (2.8881) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:17:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][140/625] eta 0:03:25 lr 0.000306 wd 0.0500 time 0.3972 (0.4247) data time 0.0008 (0.0039) model time 0.3964 (0.4324) loss 7.7231 (6.8589) grad_norm 4.5270 (2.8955) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:17:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][150/625] eta 0:03:20 lr 0.000306 wd 0.0500 time 0.4007 (0.4229) data time 0.0009 (0.0037) model time 0.3999 (0.4290) loss 7.2089 (6.8612) grad_norm 2.4701 (2.8883) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:17:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][160/625] eta 0:03:15 lr 0.000306 wd 0.0500 time 0.3940 (0.4214) data time 0.0007 (0.0035) model time 0.3933 (0.4261) loss 7.3346 (6.8636) grad_norm 3.4758 (2.8945) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:17:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][170/625] eta 0:03:11 lr 0.000306 wd 0.0500 time 0.3993 (0.4201) data time 0.0007 (0.0034) model time 0.3986 (0.4238) loss 7.1533 (6.8844) grad_norm 2.1063 (2.9091) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:17:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][180/625] eta 0:03:06 lr 0.000305 wd 0.0500 time 0.3990 (0.4191) data time 0.0009 (0.0032) model time 0.3981 (0.4221) loss 7.9450 (6.8596) grad_norm 1.9515 (2.8935) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:17:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][190/625] eta 0:03:01 lr 0.000305 wd 0.0500 time 0.3973 (0.4180) data 
time 0.0006 (0.0031) model time 0.3967 (0.4203) loss 6.3407 (6.8437) grad_norm 5.2853 (2.9162) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:17:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][200/625] eta 0:02:57 lr 0.000305 wd 0.0500 time 0.4011 (0.4171) data time 0.0008 (0.0030) model time 0.4003 (0.4189) loss 6.8089 (6.8582) grad_norm 4.5245 (2.9376) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:18:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][210/625] eta 0:02:52 lr 0.000305 wd 0.0500 time 0.3973 (0.4163) data time 0.0007 (0.0029) model time 0.3966 (0.4176) loss 6.9776 (6.8570) grad_norm 4.6828 (2.9505) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:18:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][220/625] eta 0:02:48 lr 0.000305 wd 0.0500 time 0.3958 (0.4155) data time 0.0009 (0.0028) model time 0.3948 (0.4165) loss 7.0408 (6.8428) grad_norm 2.7642 (2.9654) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:18:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][230/625] eta 0:02:43 lr 0.000305 wd 0.0500 time 0.4101 (0.4148) data time 0.0009 (0.0027) model time 0.4092 (0.4155) loss 7.4123 (6.8487) grad_norm 3.7915 (2.9599) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:18:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][240/625] eta 0:02:39 lr 0.000305 wd 0.0500 time 0.3985 (0.4141) data time 0.0009 (0.0026) model time 0.3976 (0.4146) loss 6.3355 (6.8573) grad_norm 1.9128 (2.9340) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:18:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][250/625] eta 0:02:35 lr 0.000305 wd 0.0500 time 0.3985 (0.4135) data time 0.0008 (0.0026) model time 0.3977 (0.4137) loss 6.6428 (6.8578) grad_norm 7.0923 (2.9459) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
07:18:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][260/625] eta 0:02:30 lr 0.000305 wd 0.0500 time 0.3962 (0.4130) data time 0.0009 (0.0025) model time 0.3954 (0.4130) loss 6.2536 (6.8601) grad_norm 2.1249 (2.9453) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:18:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][270/625] eta 0:02:26 lr 0.000305 wd 0.0500 time 0.3937 (0.4130) data time 0.0008 (0.0024) model time 0.3930 (0.4129) loss 7.3382 (6.8711) grad_norm 2.5803 (2.9396) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:18:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][280/625] eta 0:02:22 lr 0.000305 wd 0.0500 time 0.3950 (0.4129) data time 0.0009 (0.0024) model time 0.3942 (0.4128) loss 6.7176 (6.8692) grad_norm 2.2344 (2.9464) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:18:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][290/625] eta 0:02:18 lr 0.000304 wd 0.0500 time 0.5816 (0.4135) data time 0.0006 (0.0023) model time 0.5810 (0.4135) loss 5.7689 (6.8718) grad_norm 2.3370 (2.9648) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:18:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][300/625] eta 0:02:14 lr 0.000304 wd 0.0500 time 0.5716 (0.4152) data time 0.0008 (0.0023) model time 0.5708 (0.4155) loss 6.5104 (6.8842) grad_norm 2.7240 (2.9650) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:18:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][310/625] eta 0:02:11 lr 0.000304 wd 0.0500 time 0.5632 (0.4174) data time 0.0007 (0.0022) model time 0.5625 (0.4181) loss 6.5928 (6.8743) grad_norm 3.4252 (2.9635) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:18:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][320/625] eta 0:02:07 lr 0.000304 wd 0.0500 time 0.5598 (0.4195) data 
time 0.0007 (0.0022) model time 0.5591 (0.4205) loss 7.3796 (6.8692) grad_norm 3.1043 (2.9597) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:18:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][330/625] eta 0:02:03 lr 0.000304 wd 0.0500 time 0.4135 (0.4203) data time 0.0007 (0.0021) model time 0.4128 (0.4214) loss 7.6274 (6.8692) grad_norm 2.7810 (2.9503) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:18:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][340/625] eta 0:01:59 lr 0.000304 wd 0.0500 time 0.3981 (0.4197) data time 0.0010 (0.0021) model time 0.3971 (0.4206) loss 7.8760 (6.8692) grad_norm 2.0852 (2.9315) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:19:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][350/625] eta 0:01:55 lr 0.000304 wd 0.0500 time 0.3970 (0.4191) data time 0.0008 (0.0021) model time 0.3962 (0.4199) loss 7.1387 (6.8752) grad_norm 2.8850 (2.9898) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:19:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][360/625] eta 0:01:50 lr 0.000304 wd 0.0500 time 0.4009 (0.4186) data time 0.0009 (0.0020) model time 0.4000 (0.4193) loss 7.3312 (6.8923) grad_norm 2.6352 (2.9898) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:19:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][370/625] eta 0:01:46 lr 0.000304 wd 0.0500 time 0.3994 (0.4181) data time 0.0007 (0.0020) model time 0.3987 (0.4186) loss 5.8403 (6.8877) grad_norm 2.4934 (2.9808) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:19:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][380/625] eta 0:01:42 lr 0.000304 wd 0.0500 time 0.3975 (0.4175) data time 0.0007 (0.0020) model time 0.3968 (0.4180) loss 6.8466 (6.8794) grad_norm 2.2402 (2.9670) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
07:19:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][390/625] eta 0:01:38 lr 0.000303 wd 0.0500 time 0.3978 (0.4170) data time 0.0009 (0.0019) model time 0.3969 (0.4174) loss 6.3491 (6.8717) grad_norm 3.1094 (2.9613) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:19:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][400/625] eta 0:01:33 lr 0.000303 wd 0.0500 time 0.3952 (0.4166) data time 0.0009 (0.0019) model time 0.3944 (0.4168) loss 7.6006 (6.8748) grad_norm 3.1575 (2.9950) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:19:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][410/625] eta 0:01:29 lr 0.000303 wd 0.0500 time 0.3993 (0.4161) data time 0.0010 (0.0019) model time 0.3984 (0.4162) loss 7.0326 (6.8772) grad_norm 1.7589 (2.9938) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:19:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][420/625] eta 0:01:25 lr 0.000303 wd 0.0500 time 0.4020 (0.4157) data time 0.0009 (0.0019) model time 0.4011 (0.4158) loss 7.1639 (6.8847) grad_norm 2.7520 (2.9872) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:19:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][430/625] eta 0:01:20 lr 0.000303 wd 0.0500 time 0.4100 (0.4154) data time 0.0006 (0.0018) model time 0.4094 (0.4153) loss 6.6727 (6.8896) grad_norm 3.3392 (2.9815) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:19:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][440/625] eta 0:01:16 lr 0.000303 wd 0.0500 time 0.4025 (0.4150) data time 0.0008 (0.0018) model time 0.4017 (0.4149) loss 7.3299 (6.8849) grad_norm 3.3168 (2.9725) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:19:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][450/625] eta 0:01:12 lr 0.000303 wd 0.0500 time 0.4060 (0.4147) data 
time 0.0006 (0.0018) model time 0.4054 (0.4146) loss 6.4731 (6.8862) grad_norm 10.5743 (3.0090) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:19:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][460/625] eta 0:01:08 lr 0.000303 wd 0.0500 time 0.3999 (0.4144) data time 0.0007 (0.0018) model time 0.3992 (0.4142) loss 6.6261 (6.8889) grad_norm 2.4179 (3.0049) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:19:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][470/625] eta 0:01:04 lr 0.000303 wd 0.0500 time 0.4016 (0.4141) data time 0.0009 (0.0018) model time 0.4007 (0.4138) loss 6.5562 (6.8890) grad_norm 3.4255 (3.0081) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:19:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][480/625] eta 0:00:59 lr 0.000303 wd 0.0500 time 0.3990 (0.4138) data time 0.0007 (0.0017) model time 0.3984 (0.4135) loss 5.9337 (6.8917) grad_norm 2.4333 (3.0086) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:19:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][490/625] eta 0:00:55 lr 0.000303 wd 0.0500 time 0.3983 (0.4139) data time 0.0007 (0.0017) model time 0.3977 (0.4135) loss 6.5351 (6.8952) grad_norm 2.5900 (3.0098) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:20:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][500/625] eta 0:00:51 lr 0.000302 wd 0.0500 time 0.3991 (0.4139) data time 0.0007 (0.0017) model time 0.3984 (0.4136) loss 6.7466 (6.8960) grad_norm 3.5056 (3.0054) loss_scale 512.0000 (259.5768) mem 14939MB [2024-07-25 07:20:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][510/625] eta 0:00:47 lr 0.000302 wd 0.0500 time 0.3982 (0.4144) data time 0.0007 (0.0017) model time 0.3975 (0.4141) loss 6.9793 (6.8999) grad_norm 2.4431 (3.0125) loss_scale 512.0000 (264.5166) mem 14939MB [2024-07-25 
07:20:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][520/625] eta 0:00:43 lr 0.000302 wd 0.0500 time 0.5795 (0.4152) data time 0.0009 (0.0017) model time 0.5787 (0.4150) loss 7.3184 (6.8986) grad_norm 2.3156 (3.0036) loss_scale 512.0000 (269.2668) mem 14939MB [2024-07-25 07:20:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][530/625] eta 0:00:39 lr 0.000302 wd 0.0500 time 0.3976 (0.4165) data time 0.0008 (0.0016) model time 0.3968 (0.4164) loss 6.9018 (6.8985) grad_norm 22.6365 (3.0424) loss_scale 512.0000 (273.8380) mem 14939MB [2024-07-25 07:20:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][540/625] eta 0:00:35 lr 0.000302 wd 0.0500 time 0.6045 (0.4175) data time 0.0007 (0.0016) model time 0.6038 (0.4175) loss 8.0579 (6.9028) grad_norm 10.3002 (3.0548) loss_scale 512.0000 (278.2403) mem 14939MB [2024-07-25 07:20:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][550/625] eta 0:00:31 lr 0.000302 wd 0.0500 time 0.3980 (0.4177) data time 0.0009 (0.0016) model time 0.3970 (0.4178) loss 8.3135 (6.9054) grad_norm 2.3333 (3.0873) loss_scale 512.0000 (282.4828) mem 14939MB [2024-07-25 07:20:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][560/625] eta 0:00:27 lr 0.000302 wd 0.0500 time 0.3993 (0.4174) data time 0.0008 (0.0016) model time 0.3984 (0.4174) loss 7.7468 (6.9066) grad_norm 2.0420 (3.0992) loss_scale 512.0000 (286.5740) mem 14939MB [2024-07-25 07:20:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][570/625] eta 0:00:22 lr 0.000302 wd 0.0500 time 0.4619 (0.4172) data time 0.0007 (0.0016) model time 0.4612 (0.4171) loss 8.3615 (6.9102) grad_norm 3.4440 (3.0916) loss_scale 512.0000 (290.5219) mem 14939MB [2024-07-25 07:20:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][580/625] eta 0:00:18 lr 0.000302 wd 0.0500 time 0.4077 (0.4169) 
data time 0.0007 (0.0016) model time 0.4070 (0.4168) loss 6.0855 (6.9157) grad_norm 5.2571 (3.0900) loss_scale 512.0000 (294.3339) mem 14939MB [2024-07-25 07:20:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][590/625] eta 0:00:14 lr 0.000302 wd 0.0500 time 0.3967 (0.4166) data time 0.0007 (0.0016) model time 0.3960 (0.4164) loss 6.4092 (6.9172) grad_norm 8.1042 (3.1039) loss_scale 512.0000 (298.0169) mem 14939MB [2024-07-25 07:20:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][600/625] eta 0:00:10 lr 0.000302 wd 0.0500 time 0.3939 (0.4163) data time 0.0009 (0.0016) model time 0.3931 (0.4161) loss 7.8631 (6.9242) grad_norm 3.6464 (3.1137) loss_scale 512.0000 (301.5774) mem 14939MB [2024-07-25 07:20:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][610/625] eta 0:00:06 lr 0.000301 wd 0.0500 time 0.4049 (0.4160) data time 0.0006 (0.0015) model time 0.4043 (0.4158) loss 6.8789 (6.9170) grad_norm 2.3808 (3.1049) loss_scale 512.0000 (305.0213) mem 14939MB [2024-07-25 07:20:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][620/625] eta 0:00:02 lr 0.000301 wd 0.0500 time 0.3973 (0.4157) data time 0.0006 (0.0015) model time 0.3967 (0.4154) loss 7.4787 (6.9173) grad_norm 1.9993 (3.1034) loss_scale 512.0000 (308.3543) mem 14939MB [2024-07-25 07:20:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 207 training takes 0:04:19 [2024-07-25 07:20:55 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 07:20:55 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 07:20:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.5654 (0.5654) Acc@1 89.258 (89.258) Acc@5 98.682 (98.682) Mem 14939MB [2024-07-25 07:20:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.8721 (0.6847) Acc@1 81.396 (86.519) Acc@5 96.143 (97.785) Mem 14939MB [2024-07-25 07:20:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9722 (0.8014) Acc@1 77.783 (83.415) Acc@5 95.508 (96.654) Mem 14939MB [2024-07-25 07:20:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.953 Acc@5 96.613 [2024-07-25 07:20:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.0% [2024-07-25 07:20:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.922 (0.922) Loss 0.5454 (0.5454) Acc@1 89.990 (89.990) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 07:21:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.168) Loss 0.8462 (0.6737) Acc@1 81.689 (86.630) Acc@5 96.289 (97.794) Mem 14939MB [2024-07-25 07:21:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.129) Loss 0.9736 (0.7866) Acc@1 77.100 (83.540) Acc@5 95.508 (96.731) Mem 14939MB [2024-07-25 07:21:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.113 Acc@5 96.699 [2024-07-25 07:21:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-07-25 07:21:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.11% [2024-07-25 07:21:01 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 07:21:02 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 07:21:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][0/625] eta 0:07:58 lr 0.000301 wd 0.0500 time 0.7661 (0.7661) data time 0.3739 (0.3739) model time 0.0000 (0.0000) loss 6.0277 (6.0277) grad_norm 2.0518 (2.0518) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:21:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][10/625] eta 0:04:26 lr 0.000301 wd 0.0500 time 0.3963 (0.4328) data time 0.0007 (0.0347) model time 0.0000 (0.0000) loss 5.9216 (7.0682) grad_norm 2.0337 (2.5556) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:21:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][20/625] eta 0:04:17 lr 0.000301 wd 0.0500 time 0.3995 (0.4264) data time 0.0006 (0.0186) model time 0.0000 (0.0000) loss 6.9709 (6.8699) grad_norm 2.4318 (2.8657) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:21:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][30/625] eta 0:04:08 lr 0.000301 wd 0.0500 time 0.3955 (0.4175) data time 0.0008 (0.0128) model time 0.0000 (0.0000) loss 5.6168 (6.7727) grad_norm 2.4041 (2.7441) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:21:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][40/625] eta 0:04:01 lr 0.000301 wd 0.0500 time 0.3968 (0.4131) data time 0.0006 (0.0099) model time 0.0000 (0.0000) loss 6.0742 (6.7957) grad_norm 5.6302 (3.0525) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:21:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][50/625] eta 0:03:55 lr 0.000301 wd 0.0500 time 0.3951 (0.4100) data time 0.0009 (0.0081) model time 0.0000 (0.0000) loss 7.6714 (6.8481) grad_norm 4.1185 (3.2924) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 07:21:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][60/625] eta 0:03:50 lr 0.000301 wd 0.0500 time 0.3952 (0.4083) data time 0.0008 (0.0069) model time 0.3944 (0.3985) loss 7.3319 (6.8475) grad_norm 2.5574 (3.4078) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:21:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][70/625] eta 0:03:45 lr 0.000301 wd 0.0500 time 0.3955 (0.4068) data time 0.0007 (0.0061) model time 0.3948 (0.3978) loss 7.1440 (6.8551) grad_norm 5.1772 (3.4528) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:21:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][80/625] eta 0:03:41 lr 0.000301 wd 0.0500 time 0.3973 (0.4059) data time 0.0009 (0.0054) model time 0.3964 (0.3981) loss 6.8817 (6.8361) grad_norm 2.8387 (3.4024) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:21:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][90/625] eta 0:03:36 lr 0.000301 wd 0.0500 time 0.4020 (0.4050) data time 0.0006 (0.0049) model time 0.4013 (0.3977) loss 5.5487 (6.8392) grad_norm 4.3795 (3.3459) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:21:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][100/625] eta 0:03:33 lr 0.000300 wd 0.0500 time 0.3968 (0.4060) data time 0.0007 (0.0045) model time 0.3961 (0.4010) loss 6.6495 (6.8199) grad_norm 2.7905 (3.2917) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:21:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][110/625] eta 0:03:30 lr 0.000300 wd 0.0500 time 0.5128 (0.4090) data time 0.0009 (0.0042) model time 0.5120 (0.4073) loss 8.4085 (6.8282) grad_norm 3.4731 (3.3233) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:21:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][120/625] eta 0:03:31 lr 0.000300 wd 0.0500 time 
0.5607 (0.4188) data time 0.0008 (0.0039) model time 0.5599 (0.4243) loss 7.6937 (6.8361) grad_norm 2.7842 (3.4654) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:21:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][130/625] eta 0:03:28 lr 0.000300 wd 0.0500 time 0.5863 (0.4208) data time 0.0009 (0.0037) model time 0.5854 (0.4269) loss 6.5940 (6.8271) grad_norm 2.9767 (3.4640) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:22:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][140/625] eta 0:03:25 lr 0.000300 wd 0.0500 time 0.3939 (0.4244) data time 0.0009 (0.0035) model time 0.3929 (0.4317) loss 7.6027 (6.8248) grad_norm 2.3954 (3.5612) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:22:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][150/625] eta 0:03:21 lr 0.000300 wd 0.0500 time 0.3924 (0.4248) data time 0.0009 (0.0033) model time 0.3915 (0.4315) loss 6.8179 (6.8389) grad_norm 3.3284 (3.5498) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:22:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][160/625] eta 0:03:16 lr 0.000300 wd 0.0500 time 0.3972 (0.4231) data time 0.0006 (0.0031) model time 0.3966 (0.4284) loss 5.2092 (6.8347) grad_norm 2.7695 (3.5518) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:22:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][170/625] eta 0:03:11 lr 0.000300 wd 0.0500 time 0.3977 (0.4217) data time 0.0007 (0.0030) model time 0.3971 (0.4259) loss 6.9866 (6.8339) grad_norm 2.1738 (3.4918) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:22:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][180/625] eta 0:03:07 lr 0.000300 wd 0.0500 time 0.4118 (0.4206) data time 0.0006 (0.0029) model time 0.4112 (0.4239) loss 6.2323 (6.8246) grad_norm 1.6966 (3.4219) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 07:22:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][190/625] eta 0:03:02 lr 0.000300 wd 0.0500 time 0.3894 (0.4195) data time 0.0009 (0.0028) model time 0.3885 (0.4222) loss 7.5900 (6.8159) grad_norm 1.6315 (3.3640) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:22:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][200/625] eta 0:02:57 lr 0.000300 wd 0.0500 time 0.3969 (0.4184) data time 0.0006 (0.0027) model time 0.3963 (0.4204) loss 7.4605 (6.8115) grad_norm 3.5552 (3.3392) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:22:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][210/625] eta 0:02:53 lr 0.000299 wd 0.0500 time 0.3963 (0.4174) data time 0.0009 (0.0026) model time 0.3954 (0.4189) loss 7.6924 (6.7969) grad_norm 1.9529 (3.3128) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:22:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][220/625] eta 0:02:48 lr 0.000299 wd 0.0500 time 0.4014 (0.4165) data time 0.0008 (0.0025) model time 0.4006 (0.4177) loss 5.8940 (6.7899) grad_norm 2.2675 (3.2934) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:22:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][230/625] eta 0:02:44 lr 0.000299 wd 0.0500 time 0.4005 (0.4158) data time 0.0007 (0.0024) model time 0.3999 (0.4166) loss 6.1568 (6.7838) grad_norm 2.8965 (3.2615) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:22:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][240/625] eta 0:02:40 lr 0.000299 wd 0.0500 time 0.3959 (0.4157) data time 0.0008 (0.0024) model time 0.3951 (0.4164) loss 7.7273 (6.7961) grad_norm 2.4624 (3.2638) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:22:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][250/625] eta 0:02:35 lr 0.000299 wd 0.0500 time 
0.4003 (0.4151) data time 0.0009 (0.0023) model time 0.3994 (0.4156) loss 7.3660 (6.7884) grad_norm 2.7156 (3.2327) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:22:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][260/625] eta 0:02:31 lr 0.000299 wd 0.0500 time 0.4058 (0.4146) data time 0.0009 (0.0022) model time 0.4049 (0.4148) loss 6.6705 (6.7793) grad_norm 2.8793 (3.2135) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:22:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][270/625] eta 0:02:26 lr 0.000299 wd 0.0500 time 0.3980 (0.4140) data time 0.0008 (0.0022) model time 0.3973 (0.4141) loss 6.4705 (6.7769) grad_norm 2.5958 (3.1948) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:22:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][280/625] eta 0:02:22 lr 0.000299 wd 0.0500 time 0.4022 (0.4135) data time 0.0009 (0.0021) model time 0.4013 (0.4135) loss 7.5156 (6.7922) grad_norm 2.6094 (3.1794) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:23:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][290/625] eta 0:02:18 lr 0.000299 wd 0.0500 time 0.3953 (0.4130) data time 0.0010 (0.0021) model time 0.3943 (0.4128) loss 7.2089 (6.7963) grad_norm 2.1032 (3.1743) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:23:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][300/625] eta 0:02:14 lr 0.000299 wd 0.0500 time 0.3988 (0.4125) data time 0.0008 (0.0021) model time 0.3979 (0.4122) loss 6.8985 (6.8060) grad_norm 2.3431 (3.1541) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:23:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][310/625] eta 0:02:09 lr 0.000299 wd 0.0500 time 0.5800 (0.4127) data time 0.0009 (0.0020) model time 0.5792 (0.4124) loss 6.4263 (6.8031) grad_norm 1.9683 (3.1651) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 07:23:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][320/625] eta 0:02:05 lr 0.000298 wd 0.0500 time 0.4037 (0.4122) data time 0.0008 (0.0020) model time 0.4029 (0.4118) loss 7.7784 (6.8141) grad_norm 2.3831 (inf) loss_scale 256.0000 (506.4174) mem 14939MB [2024-07-25 07:23:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][330/625] eta 0:02:02 lr 0.000298 wd 0.0500 time 0.5939 (0.4136) data time 0.0009 (0.0019) model time 0.5930 (0.4134) loss 6.9143 (6.8211) grad_norm 5.5015 (inf) loss_scale 256.0000 (498.8520) mem 14939MB [2024-07-25 07:23:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][340/625] eta 0:01:58 lr 0.000298 wd 0.0500 time 0.5889 (0.4158) data time 0.0008 (0.0019) model time 0.5880 (0.4160) loss 5.4769 (6.8220) grad_norm 2.8872 (inf) loss_scale 256.0000 (491.7302) mem 14939MB [2024-07-25 07:23:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][350/625] eta 0:01:54 lr 0.000298 wd 0.0500 time 0.5136 (0.4170) data time 0.0008 (0.0019) model time 0.5127 (0.4173) loss 5.7540 (6.8271) grad_norm 2.1527 (inf) loss_scale 256.0000 (485.0142) mem 14939MB [2024-07-25 07:23:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][360/625] eta 0:01:50 lr 0.000298 wd 0.0500 time 0.3952 (0.4175) data time 0.0009 (0.0018) model time 0.3943 (0.4180) loss 7.2411 (6.8251) grad_norm 1.9705 (inf) loss_scale 256.0000 (478.6704) mem 14939MB [2024-07-25 07:23:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][370/625] eta 0:01:46 lr 0.000298 wd 0.0500 time 0.3982 (0.4181) data time 0.0008 (0.0018) model time 0.3974 (0.4185) loss 6.2868 (6.8123) grad_norm 2.4662 (inf) loss_scale 256.0000 (472.6685) mem 14939MB [2024-07-25 07:23:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][380/625] eta 0:01:42 lr 0.000298 wd 0.0500 time 0.3973 (0.4176) 
data time 0.0009 (0.0018) model time 0.3964 (0.4179) loss 6.5367 (6.8188) grad_norm 2.0581 (inf) loss_scale 256.0000 (466.9816) mem 14939MB [2024-07-25 07:23:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][390/625] eta 0:01:38 lr 0.000298 wd 0.0500 time 0.3993 (0.4171) data time 0.0006 (0.0018) model time 0.3987 (0.4174) loss 6.0944 (6.8127) grad_norm 2.4688 (inf) loss_scale 256.0000 (461.5857) mem 14939MB [2024-07-25 07:23:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][400/625] eta 0:01:33 lr 0.000298 wd 0.0500 time 0.3991 (0.4167) data time 0.0008 (0.0017) model time 0.3983 (0.4169) loss 5.5460 (6.8170) grad_norm 1.8728 (inf) loss_scale 256.0000 (456.4589) mem 14939MB [2024-07-25 07:23:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][410/625] eta 0:01:29 lr 0.000298 wd 0.0500 time 0.3968 (0.4163) data time 0.0010 (0.0017) model time 0.3958 (0.4164) loss 5.8815 (6.8217) grad_norm 2.9325 (inf) loss_scale 256.0000 (451.5815) mem 14939MB [2024-07-25 07:23:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][420/625] eta 0:01:25 lr 0.000298 wd 0.0500 time 0.3970 (0.4159) data time 0.0006 (0.0017) model time 0.3963 (0.4159) loss 6.2080 (6.8142) grad_norm 4.1697 (inf) loss_scale 256.0000 (446.9359) mem 14939MB [2024-07-25 07:24:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][430/625] eta 0:01:21 lr 0.000297 wd 0.0500 time 0.3987 (0.4155) data time 0.0006 (0.0017) model time 0.3980 (0.4155) loss 6.9742 (6.8213) grad_norm 2.9874 (inf) loss_scale 256.0000 (442.5058) mem 14939MB [2024-07-25 07:24:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][440/625] eta 0:01:16 lr 0.000297 wd 0.0500 time 0.4006 (0.4152) data time 0.0007 (0.0017) model time 0.3999 (0.4150) loss 7.1529 (6.8200) grad_norm 3.0231 (inf) loss_scale 256.0000 (438.2766) mem 14939MB [2024-07-25 07:24:09 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][450/625] eta 0:01:12 lr 0.000297 wd 0.0500 time 0.4013 (0.4148) data time 0.0007 (0.0016) model time 0.4006 (0.4146) loss 7.3486 (6.8146) grad_norm 1.9154 (inf) loss_scale 256.0000 (434.2350) mem 14939MB [2024-07-25 07:24:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][460/625] eta 0:01:08 lr 0.000297 wd 0.0500 time 0.3983 (0.4147) data time 0.0006 (0.0016) model time 0.3977 (0.4145) loss 6.3024 (6.8128) grad_norm 1.8317 (inf) loss_scale 256.0000 (430.3688) mem 14939MB [2024-07-25 07:24:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][470/625] eta 0:01:04 lr 0.000297 wd 0.0500 time 0.3982 (0.4144) data time 0.0007 (0.0016) model time 0.3975 (0.4141) loss 6.2827 (6.8119) grad_norm 3.6899 (inf) loss_scale 256.0000 (426.6667) mem 14939MB [2024-07-25 07:24:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][480/625] eta 0:01:00 lr 0.000297 wd 0.0500 time 0.3999 (0.4141) data time 0.0008 (0.0016) model time 0.3991 (0.4137) loss 7.5620 (6.8106) grad_norm 13.0699 (inf) loss_scale 256.0000 (423.1185) mem 14939MB [2024-07-25 07:24:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][490/625] eta 0:00:55 lr 0.000297 wd 0.0500 time 0.3988 (0.4138) data time 0.0008 (0.0016) model time 0.3980 (0.4134) loss 5.7519 (6.8085) grad_norm 2.2831 (inf) loss_scale 256.0000 (419.7149) mem 14939MB [2024-07-25 07:24:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][500/625] eta 0:00:51 lr 0.000297 wd 0.0500 time 0.4008 (0.4135) data time 0.0008 (0.0016) model time 0.4000 (0.4131) loss 7.2404 (6.8142) grad_norm 2.2931 (inf) loss_scale 256.0000 (416.4471) mem 14939MB [2024-07-25 07:24:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][510/625] eta 0:00:47 lr 0.000297 wd 0.0500 time 0.3995 (0.4133) data time 0.0006 (0.0015) model 
time 0.3989 (0.4128) loss 6.9962 (6.8096) grad_norm 2.8801 (inf) loss_scale 256.0000 (413.3072) mem 14939MB [2024-07-25 07:24:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][520/625] eta 0:00:43 lr 0.000297 wd 0.0500 time 0.3975 (0.4130) data time 0.0009 (0.0015) model time 0.3965 (0.4125) loss 7.0198 (6.8099) grad_norm 2.4371 (inf) loss_scale 256.0000 (410.2879) mem 14939MB [2024-07-25 07:24:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][530/625] eta 0:00:39 lr 0.000297 wd 0.0500 time 0.4018 (0.4128) data time 0.0006 (0.0015) model time 0.4011 (0.4122) loss 7.9234 (6.8084) grad_norm 2.1168 (inf) loss_scale 256.0000 (407.3823) mem 14939MB [2024-07-25 07:24:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][540/625] eta 0:00:35 lr 0.000296 wd 0.0500 time 0.3997 (0.4127) data time 0.0007 (0.0015) model time 0.3990 (0.4122) loss 6.6263 (6.8038) grad_norm 2.8756 (inf) loss_scale 256.0000 (404.5841) mem 14939MB [2024-07-25 07:24:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][550/625] eta 0:00:31 lr 0.000296 wd 0.0500 time 0.4000 (0.4138) data time 0.0009 (0.0015) model time 0.3991 (0.4134) loss 7.9602 (6.8020) grad_norm 2.8717 (inf) loss_scale 256.0000 (401.8875) mem 14939MB [2024-07-25 07:24:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][560/625] eta 0:00:27 lr 0.000296 wd 0.0500 time 0.5986 (0.4158) data time 0.0009 (0.0015) model time 0.5977 (0.4155) loss 7.0970 (6.8046) grad_norm 2.5326 (inf) loss_scale 256.0000 (399.2870) mem 14939MB [2024-07-25 07:25:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][570/625] eta 0:00:22 lr 0.000296 wd 0.0500 time 0.6012 (0.4168) data time 0.0008 (0.0015) model time 0.6004 (0.4167) loss 6.2403 (6.8012) grad_norm 1.7357 (inf) loss_scale 256.0000 (396.7776) mem 14939MB [2024-07-25 07:25:05 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [208/300][580/625] eta 0:00:18 lr 0.000296 wd 0.0500 time 0.5605 (0.4176) data time 0.0006 (0.0015) model time 0.5599 (0.4176) loss 6.1922 (6.7968) grad_norm 1.8571 (inf) loss_scale 256.0000 (394.3546) mem 14939MB [2024-07-25 07:25:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][590/625] eta 0:00:14 lr 0.000296 wd 0.0500 time 0.3978 (0.4175) data time 0.0007 (0.0014) model time 0.3971 (0.4174) loss 6.6495 (6.7956) grad_norm 3.1335 (inf) loss_scale 256.0000 (392.0135) mem 14939MB [2024-07-25 07:25:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][600/625] eta 0:00:10 lr 0.000296 wd 0.0500 time 0.3975 (0.4172) data time 0.0007 (0.0014) model time 0.3968 (0.4171) loss 6.6657 (6.7863) grad_norm 5.3642 (inf) loss_scale 256.0000 (389.7504) mem 14939MB [2024-07-25 07:25:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][610/625] eta 0:00:06 lr 0.000296 wd 0.0500 time 0.3949 (0.4169) data time 0.0006 (0.0014) model time 0.3944 (0.4168) loss 6.5863 (6.7883) grad_norm 3.3972 (inf) loss_scale 256.0000 (387.5614) mem 14939MB [2024-07-25 07:25:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][620/625] eta 0:00:02 lr 0.000296 wd 0.0500 time 0.3952 (0.4166) data time 0.0006 (0.0014) model time 0.3946 (0.4164) loss 6.9850 (6.7910) grad_norm 2.3838 (inf) loss_scale 256.0000 (385.4428) mem 14939MB [2024-07-25 07:25:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 208 training takes 0:04:20 [2024-07-25 07:25:22 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 07:25:23 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 07:25:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.484 (0.484) Loss 0.5562 (0.5562) Acc@1 89.844 (89.844) Acc@5 98.682 (98.682) Mem 14939MB [2024-07-25 07:25:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.122) Loss 0.8574 (0.6865) Acc@1 81.348 (86.461) Acc@5 96.777 (97.723) Mem 14939MB [2024-07-25 07:25:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 0.9561 (0.8001) Acc@1 78.662 (83.375) Acc@5 95.850 (96.663) Mem 14939MB [2024-07-25 07:25:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.965 Acc@5 96.627 [2024-07-25 07:25:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.0% [2024-07-25 07:25:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.97% [2024-07-25 07:25:26 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 07:25:27 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 07:25:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.460 (0.460) Loss 0.5454 (0.5454) Acc@1 89.941 (89.941) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 07:25:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.8447 (0.6730) Acc@1 81.641 (86.648) Acc@5 96.387 (97.798) Mem 14939MB [2024-07-25 07:25:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9717 (0.7858) Acc@1 77.246 (83.559) Acc@5 95.508 (96.742) Mem 14939MB [2024-07-25 07:25:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.131 Acc@5 96.711 [2024-07-25 07:25:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-07-25 07:25:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.13% [2024-07-25 07:25:29 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 07:25:30 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 07:25:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][0/625] eta 0:07:49 lr 0.000296 wd 0.0500 time 0.7519 (0.7519) data time 0.3666 (0.3666) model time 0.0000 (0.0000) loss 6.0101 (6.0101) grad_norm 1.8597 (1.8597) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:25:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][10/625] eta 0:04:25 lr 0.000296 wd 0.0500 time 0.4019 (0.4310) data time 0.0009 (0.0341) model time 0.0000 (0.0000) loss 6.8966 (6.5859) grad_norm 2.4205 (2.8536) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:25:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][20/625] eta 0:04:11 lr 0.000295 wd 0.0500 time 0.3989 (0.4154) data time 0.0007 (0.0182) model time 0.0000 (0.0000) loss 6.1394 (6.7291) grad_norm 2.0628 (2.7668) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:25:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][30/625] eta 0:04:03 lr 0.000295 wd 0.0500 time 0.3988 (0.4097) data time 0.0006 (0.0126) model time 0.0000 (0.0000) loss 7.5578 (6.8021) grad_norm 1.6376 (2.7555) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:25:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][40/625] eta 0:03:58 lr 0.000295 wd 0.0500 time 0.3966 (0.4076) data time 0.0008 (0.0097) model time 0.0000 (0.0000) loss 7.5018 (6.7622) grad_norm 2.3329 (3.0419) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:25:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][50/625] eta 0:03:53 lr 0.000295 wd 0.0500 time 0.3991 (0.4062) data time 0.0006 (0.0080) model time 0.0000 (0.0000) loss 5.4321 (6.7067) grad_norm 2.6644 (3.1027) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:25:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][60/625] eta 0:03:48 lr 0.000295 wd 0.0500 time 0.3937 (0.4049) 
data time 0.0007 (0.0068) model time 0.3930 (0.3972) loss 7.9836 (6.7766) grad_norm 3.5042 (3.3073) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:25:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][70/625] eta 0:03:44 lr 0.000295 wd 0.0500 time 0.3958 (0.4038) data time 0.0008 (0.0060) model time 0.3950 (0.3969) loss 7.6293 (6.8171) grad_norm 2.5563 (3.3598) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:26:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][80/625] eta 0:03:39 lr 0.000295 wd 0.0500 time 0.3969 (0.4030) data time 0.0007 (0.0053) model time 0.3962 (0.3968) loss 6.3141 (6.7843) grad_norm 3.5407 (3.2585) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:26:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][90/625] eta 0:03:35 lr 0.000295 wd 0.0500 time 0.3956 (0.4024) data time 0.0008 (0.0048) model time 0.3948 (0.3968) loss 6.6965 (6.8057) grad_norm 3.4146 (3.2847) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:26:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][100/625] eta 0:03:31 lr 0.000295 wd 0.0500 time 0.3966 (0.4021) data time 0.0007 (0.0044) model time 0.3959 (0.3972) loss 5.9544 (6.8100) grad_norm 2.2438 (3.3481) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:26:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][110/625] eta 0:03:27 lr 0.000295 wd 0.0500 time 0.3978 (0.4020) data time 0.0009 (0.0041) model time 0.3969 (0.3975) loss 6.9927 (6.7859) grad_norm 2.6205 (3.2773) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:26:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][120/625] eta 0:03:22 lr 0.000295 wd 0.0500 time 0.3983 (0.4016) data time 0.0008 (0.0038) model time 0.3975 (0.3974) loss 6.3992 (6.7827) grad_norm 3.2944 (3.2402) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
07:26:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][130/625] eta 0:03:19 lr 0.000294 wd 0.0500 time 0.3953 (0.4026) data time 0.0008 (0.0036) model time 0.3944 (0.3996) loss 5.8195 (6.7820) grad_norm 2.4926 (3.1973) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:26:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][140/625] eta 0:03:15 lr 0.000294 wd 0.0500 time 0.5643 (0.4035) data time 0.0007 (0.0034) model time 0.5636 (0.4012) loss 6.3958 (6.7974) grad_norm 3.9692 (3.2356) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:26:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][150/625] eta 0:03:13 lr 0.000294 wd 0.0500 time 0.5680 (0.4079) data time 0.0006 (0.0032) model time 0.5673 (0.4080) loss 5.9300 (6.8005) grad_norm 1.9862 (3.2190) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:26:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][160/625] eta 0:03:11 lr 0.000294 wd 0.0500 time 0.5451 (0.4122) data time 0.0008 (0.0031) model time 0.5443 (0.4142) loss 7.6272 (6.7937) grad_norm 2.6524 (3.2287) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:26:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][170/625] eta 0:03:09 lr 0.000294 wd 0.0500 time 0.5985 (0.4170) data time 0.0008 (0.0029) model time 0.5977 (0.4207) loss 7.4101 (6.8176) grad_norm 4.3998 (3.2184) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:26:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][180/625] eta 0:03:06 lr 0.000294 wd 0.0500 time 0.4066 (0.4199) data time 0.0009 (0.0028) model time 0.4057 (0.4244) loss 6.8813 (6.8348) grad_norm 3.0114 (3.2015) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:26:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][190/625] eta 0:03:02 lr 0.000294 wd 0.0500 time 0.3959 (0.4188) data 
time 0.0010 (0.0027) model time 0.3949 (0.4226) loss 6.0042 (6.8255) grad_norm 3.5691 (3.2146) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:26:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][200/625] eta 0:02:57 lr 0.000294 wd 0.0500 time 0.3966 (0.4188) data time 0.0010 (0.0026) model time 0.3957 (0.4223) loss 7.5996 (6.8248) grad_norm 1.9868 (3.1867) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:26:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][210/625] eta 0:02:53 lr 0.000294 wd 0.0500 time 0.3979 (0.4179) data time 0.0009 (0.0026) model time 0.3971 (0.4209) loss 6.9437 (6.8218) grad_norm 2.1804 (3.1721) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:27:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][220/625] eta 0:02:48 lr 0.000294 wd 0.0500 time 0.4030 (0.4172) data time 0.0007 (0.0025) model time 0.4024 (0.4197) loss 8.3954 (6.8500) grad_norm 2.4980 (3.1588) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:27:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][230/625] eta 0:02:44 lr 0.000294 wd 0.0500 time 0.4052 (0.4165) data time 0.0008 (0.0024) model time 0.4044 (0.4186) loss 7.4171 (6.8304) grad_norm 2.4053 (3.1425) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:27:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][240/625] eta 0:02:40 lr 0.000293 wd 0.0500 time 0.3993 (0.4158) data time 0.0006 (0.0023) model time 0.3986 (0.4175) loss 6.2244 (6.8213) grad_norm 4.4184 (3.1597) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:27:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][250/625] eta 0:02:35 lr 0.000293 wd 0.0500 time 0.3978 (0.4150) data time 0.0008 (0.0023) model time 0.3969 (0.4164) loss 6.4629 (6.8159) grad_norm 2.1759 (3.1518) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
07:27:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][260/625] eta 0:02:31 lr 0.000293 wd 0.0500 time 0.3934 (0.4144) data time 0.0009 (0.0022) model time 0.3925 (0.4155) loss 6.4131 (6.8197) grad_norm 1.9130 (3.1581) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:27:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][270/625] eta 0:02:26 lr 0.000293 wd 0.0500 time 0.3994 (0.4138) data time 0.0008 (0.0022) model time 0.3986 (0.4147) loss 6.8251 (6.8328) grad_norm 2.0649 (3.1435) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:27:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][280/625] eta 0:02:22 lr 0.000293 wd 0.0500 time 0.3987 (0.4132) data time 0.0006 (0.0021) model time 0.3981 (0.4139) loss 5.7989 (6.8295) grad_norm 3.1485 (3.1283) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:27:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][290/625] eta 0:02:18 lr 0.000293 wd 0.0500 time 0.3998 (0.4127) data time 0.0007 (0.0021) model time 0.3992 (0.4133) loss 7.7242 (6.8318) grad_norm 6.1882 (3.1169) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:27:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][300/625] eta 0:02:13 lr 0.000293 wd 0.0500 time 0.3967 (0.4123) data time 0.0007 (0.0020) model time 0.3961 (0.4127) loss 6.8809 (6.8232) grad_norm 2.8736 (3.1109) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:27:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][310/625] eta 0:02:09 lr 0.000293 wd 0.0500 time 0.4001 (0.4119) data time 0.0007 (0.0020) model time 0.3995 (0.4122) loss 5.6165 (6.8170) grad_norm 2.7776 (3.0949) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:27:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][320/625] eta 0:02:05 lr 0.000293 wd 0.0500 time 0.4000 (0.4115) data 
time 0.0008 (0.0020) model time 0.3992 (0.4117) loss 6.7075 (6.8188) grad_norm 3.1502 (3.1062) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:27:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][330/625] eta 0:02:01 lr 0.000293 wd 0.0500 time 0.3996 (0.4112) data time 0.0006 (0.0019) model time 0.3990 (0.4113) loss 5.8679 (6.8108) grad_norm 4.2410 (3.1093) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:27:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][340/625] eta 0:01:57 lr 0.000293 wd 0.0500 time 0.3962 (0.4109) data time 0.0007 (0.0019) model time 0.3955 (0.4108) loss 7.5940 (6.8099) grad_norm 2.7026 (3.1051) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:27:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][350/625] eta 0:01:53 lr 0.000292 wd 0.0500 time 0.3968 (0.4110) data time 0.0009 (0.0019) model time 0.3960 (0.4110) loss 7.5886 (6.8116) grad_norm 2.3552 (3.1677) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:27:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][360/625] eta 0:01:48 lr 0.000292 wd 0.0500 time 0.5494 (0.4111) data time 0.0008 (0.0018) model time 0.5486 (0.4110) loss 7.1530 (6.8175) grad_norm 2.3279 (3.1541) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:28:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][370/625] eta 0:01:45 lr 0.000292 wd 0.0500 time 0.6112 (0.4131) data time 0.0006 (0.0018) model time 0.6106 (0.4134) loss 6.6331 (6.8144) grad_norm 1.8206 (3.1463) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:28:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][380/625] eta 0:01:41 lr 0.000292 wd 0.0500 time 0.5870 (0.4154) data time 0.0007 (0.0018) model time 0.5863 (0.4159) loss 6.7750 (6.8172) grad_norm 1.9254 (3.1405) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
07:28:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][390/625] eta 0:01:37 lr 0.000292 wd 0.0500 time 0.3972 (0.4162) data time 0.0007 (0.0018) model time 0.3965 (0.4169) loss 7.4301 (6.8109) grad_norm 1.9891 (3.1204) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:28:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][400/625] eta 0:01:34 lr 0.000292 wd 0.0500 time 0.3952 (0.4185) data time 0.0009 (0.0017) model time 0.3943 (0.4195) loss 7.0792 (6.8172) grad_norm 2.1089 (3.1030) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:28:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][410/625] eta 0:01:29 lr 0.000292 wd 0.0500 time 0.3986 (0.4180) data time 0.0009 (0.0017) model time 0.3977 (0.4189) loss 7.1325 (6.8210) grad_norm 2.4008 (3.0921) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:28:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][420/625] eta 0:01:25 lr 0.000292 wd 0.0500 time 0.4211 (0.4180) data time 0.0008 (0.0017) model time 0.4203 (0.4188) loss 7.0768 (6.8202) grad_norm 2.6280 (3.0838) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:28:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][430/625] eta 0:01:21 lr 0.000292 wd 0.0500 time 0.4038 (0.4176) data time 0.0009 (0.0017) model time 0.4029 (0.4183) loss 7.3767 (6.8245) grad_norm 2.1899 (3.1337) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:28:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][440/625] eta 0:01:17 lr 0.000292 wd 0.0500 time 0.4029 (0.4172) data time 0.0007 (0.0017) model time 0.4022 (0.4178) loss 7.0290 (6.8222) grad_norm 2.6367 (3.1208) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:28:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][450/625] eta 0:01:12 lr 0.000292 wd 0.0500 time 0.3996 (0.4168) data 
time 0.0009 (0.0016) model time 0.3987 (0.4173) loss 5.7821 (6.8230) grad_norm 2.4281 (3.2043) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:28:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][460/625] eta 0:01:08 lr 0.000291 wd 0.0500 time 0.3953 (0.4164) data time 0.0008 (0.0016) model time 0.3945 (0.4168) loss 6.4357 (6.8319) grad_norm 1.9415 (3.1851) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:28:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][470/625] eta 0:01:04 lr 0.000291 wd 0.0500 time 0.3974 (0.4160) data time 0.0009 (0.0016) model time 0.3965 (0.4163) loss 6.8134 (6.8341) grad_norm 5.0545 (3.1759) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:28:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][480/625] eta 0:01:00 lr 0.000291 wd 0.0500 time 0.3991 (0.4156) data time 0.0008 (0.0016) model time 0.3983 (0.4159) loss 7.1024 (6.8375) grad_norm 4.9341 (3.2929) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:28:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][490/625] eta 0:00:56 lr 0.000291 wd 0.0500 time 0.3984 (0.4153) data time 0.0009 (0.0016) model time 0.3975 (0.4155) loss 5.8995 (6.8357) grad_norm 1.9896 (3.2779) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:28:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][500/625] eta 0:00:51 lr 0.000291 wd 0.0500 time 0.4036 (0.4150) data time 0.0006 (0.0016) model time 0.4029 (0.4151) loss 5.9133 (6.8306) grad_norm 2.0180 (3.2633) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:29:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][510/625] eta 0:00:47 lr 0.000291 wd 0.0500 time 0.3986 (0.4147) data time 0.0010 (0.0015) model time 0.3976 (0.4148) loss 5.8474 (6.8351) grad_norm 2.3785 (3.2563) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
07:29:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][520/625] eta 0:00:43 lr 0.000291 wd 0.0500 time 0.3958 (0.4144) data time 0.0007 (0.0015) model time 0.3951 (0.4144) loss 5.9639 (6.8370) grad_norm 3.6816 (3.2487) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:29:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][530/625] eta 0:00:39 lr 0.000291 wd 0.0500 time 0.3998 (0.4141) data time 0.0009 (0.0015) model time 0.3990 (0.4141) loss 5.5313 (6.8374) grad_norm 2.8182 (3.2736) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:29:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][540/625] eta 0:00:35 lr 0.000291 wd 0.0500 time 0.4021 (0.4138) data time 0.0007 (0.0015) model time 0.4014 (0.4138) loss 6.5855 (6.8409) grad_norm 2.1035 (3.2652) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:29:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][550/625] eta 0:00:31 lr 0.000291 wd 0.0500 time 0.3963 (0.4135) data time 0.0009 (0.0015) model time 0.3954 (0.4134) loss 7.9789 (6.8386) grad_norm 3.2483 (3.2893) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:29:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][560/625] eta 0:00:26 lr 0.000291 wd 0.0500 time 0.3963 (0.4132) data time 0.0008 (0.0015) model time 0.3954 (0.4131) loss 8.2692 (6.8370) grad_norm 2.7116 (3.2827) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:29:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][570/625] eta 0:00:22 lr 0.000290 wd 0.0500 time 0.3989 (0.4132) data time 0.0008 (0.0015) model time 0.3981 (0.4130) loss 6.6345 (6.8381) grad_norm 2.8413 (3.2881) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:29:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][580/625] eta 0:00:18 lr 0.000290 wd 0.0500 time 0.6131 (0.4133) data 
time 0.0007 (0.0015) model time 0.6125 (0.4132) loss 6.8331 (6.8342) grad_norm 1.8050 (3.2692) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:29:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][590/625] eta 0:00:14 lr 0.000290 wd 0.0500 time 0.5542 (0.4144) data time 0.0006 (0.0014) model time 0.5536 (0.4144) loss 6.7555 (6.8381) grad_norm 2.8654 (3.2599) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:29:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][600/625] eta 0:00:10 lr 0.000290 wd 0.0500 time 0.5768 (0.4162) data time 0.0006 (0.0014) model time 0.5762 (0.4163) loss 6.5704 (6.8348) grad_norm 2.9038 (3.2476) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:29:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][610/625] eta 0:00:06 lr 0.000290 wd 0.0500 time 0.5592 (0.4171) data time 0.0006 (0.0014) model time 0.5586 (0.4173) loss 6.7883 (6.8323) grad_norm 2.2426 (3.2356) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:29:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][620/625] eta 0:00:02 lr 0.000290 wd 0.0500 time 0.3956 (0.4177) data time 0.0004 (0.0014) model time 0.3952 (0.4179) loss 6.8501 (6.8292) grad_norm 2.2298 (3.2250) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:29:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 209 training takes 0:04:20 [2024-07-25 07:29:51 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 07:29:52 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 07:29:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.5601 (0.5601) Acc@1 89.453 (89.453) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 07:29:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8584 (0.6790) Acc@1 81.738 (86.492) Acc@5 96.484 (97.807) Mem 14939MB [2024-07-25 07:29:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9658 (0.7909) Acc@1 77.393 (83.389) Acc@5 94.971 (96.677) Mem 14939MB [2024-07-25 07:29:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.931 Acc@5 96.657 [2024-07-25 07:29:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.9% [2024-07-25 07:29:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.766 (0.766) Loss 0.5444 (0.5444) Acc@1 89.941 (89.941) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 07:29:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.152) Loss 0.8433 (0.6722) Acc@1 81.641 (86.661) Acc@5 96.484 (97.825) Mem 14939MB [2024-07-25 07:29:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.120) Loss 0.9707 (0.7850) Acc@1 77.295 (83.598) Acc@5 95.605 (96.763) Mem 14939MB [2024-07-25 07:29:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.167 Acc@5 96.731 [2024-07-25 07:29:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.2% [2024-07-25 07:29:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.17% [2024-07-25 07:29:58 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 07:29:59 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 07:29:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][0/625] eta 0:09:03 lr 0.000290 wd 0.0500 time 0.8693 (0.8693) data time 0.4921 (0.4921) model time 0.0000 (0.0000) loss 6.5248 (6.5248) grad_norm 3.6190 (3.6190) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:30:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][10/625] eta 0:04:31 lr 0.000290 wd 0.0500 time 0.3957 (0.4415) data time 0.0007 (0.0455) model time 0.0000 (0.0000) loss 6.7094 (6.7835) grad_norm 3.1337 (3.5766) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:30:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][20/625] eta 0:04:15 lr 0.000290 wd 0.0500 time 0.3963 (0.4221) data time 0.0007 (0.0242) model time 0.0000 (0.0000) loss 7.7120 (6.7776) grad_norm 4.1364 (3.4940) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:30:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][30/625] eta 0:04:06 lr 0.000290 wd 0.0500 time 0.3990 (0.4151) data time 0.0009 (0.0167) model time 0.0000 (0.0000) loss 5.6898 (6.7365) grad_norm 2.3461 (3.2039) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:30:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][40/625] eta 0:04:01 lr 0.000290 wd 0.0500 time 0.3983 (0.4135) data time 0.0009 (0.0128) model time 0.0000 (0.0000) loss 7.3119 (6.7811) grad_norm 6.5886 (3.1432) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:30:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][50/625] eta 0:03:58 lr 0.000290 wd 0.0500 time 0.3895 (0.4145) data time 0.0009 (0.0105) model time 0.0000 (0.0000) loss 6.8728 (6.7756) grad_norm 3.5539 (3.1941) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 07:30:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][60/625] eta 0:03:53 lr 0.000289 wd 0.0500 time 0.4023 (0.4125) data time 0.0008 (0.0089) model time 0.4015 (0.4011) loss 6.9727 (6.8405) grad_norm 3.6612 (3.1499) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:30:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][70/625] eta 0:03:48 lr 0.000289 wd 0.0500 time 0.4043 (0.4111) data time 0.0010 (0.0078) model time 0.4033 (0.4014) loss 7.1686 (6.8201) grad_norm 2.8115 (3.1343) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:30:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][80/625] eta 0:03:43 lr 0.000289 wd 0.0500 time 0.3947 (0.4102) data time 0.0007 (0.0069) model time 0.3939 (0.4021) loss 6.7479 (6.8246) grad_norm 2.5733 (3.0789) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:30:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][90/625] eta 0:03:38 lr 0.000289 wd 0.0500 time 0.4035 (0.4091) data time 0.0006 (0.0063) model time 0.4028 (0.4014) loss 6.2597 (6.7685) grad_norm 1.6719 (3.0465) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:30:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][100/625] eta 0:03:34 lr 0.000289 wd 0.0500 time 0.3975 (0.4084) data time 0.0009 (0.0057) model time 0.3966 (0.4013) loss 7.9124 (6.8180) grad_norm 3.1691 (3.0212) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:30:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][110/625] eta 0:03:30 lr 0.000289 wd 0.0500 time 0.4016 (0.4078) data time 0.0007 (0.0053) model time 0.4010 (0.4012) loss 6.2785 (6.8295) grad_norm 3.2932 (3.0605) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:30:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][120/625] eta 0:03:26 lr 0.000289 wd 0.0500 time 
0.4118 (0.4082) data time 0.0007 (0.0049) model time 0.4111 (0.4027) loss 7.6476 (6.8454) grad_norm 3.1751 (3.0733) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:30:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][130/625] eta 0:03:21 lr 0.000289 wd 0.0500 time 0.3981 (0.4081) data time 0.0009 (0.0046) model time 0.3972 (0.4031) loss 5.9717 (6.8906) grad_norm 2.6154 (3.0857) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:30:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][140/625] eta 0:03:17 lr 0.000289 wd 0.0500 time 0.3961 (0.4074) data time 0.0007 (0.0043) model time 0.3954 (0.4025) loss 6.2894 (6.9014) grad_norm 2.1630 (3.1002) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:31:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][150/625] eta 0:03:13 lr 0.000289 wd 0.0500 time 0.4100 (0.4069) data time 0.0006 (0.0041) model time 0.4093 (0.4021) loss 7.3107 (6.9008) grad_norm 1.8648 (3.0878) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:31:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][160/625] eta 0:03:09 lr 0.000289 wd 0.0500 time 0.3963 (0.4066) data time 0.0009 (0.0039) model time 0.3954 (0.4021) loss 6.2285 (6.8840) grad_norm 2.3789 (3.0632) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:31:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][170/625] eta 0:03:05 lr 0.000288 wd 0.0500 time 0.3959 (0.4068) data time 0.0007 (0.0037) model time 0.3952 (0.4027) loss 8.0226 (6.9181) grad_norm 2.3531 (3.0080) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:31:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][180/625] eta 0:03:02 lr 0.000288 wd 0.0500 time 0.4029 (0.4096) data time 0.0007 (0.0035) model time 0.4022 (0.4068) loss 6.9568 (6.9042) grad_norm 3.2929 (3.1064) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 07:31:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][190/625] eta 0:03:00 lr 0.000288 wd 0.0500 time 0.6056 (0.4158) data time 0.0009 (0.0034) model time 0.6047 (0.4154) loss 6.4626 (6.9213) grad_norm 5.4990 (3.1275) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:31:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][200/625] eta 0:02:58 lr 0.000288 wd 0.0500 time 0.3948 (0.4203) data time 0.0009 (0.0033) model time 0.3939 (0.4215) loss 8.1312 (6.9234) grad_norm 1.7052 (3.1084) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:31:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][210/625] eta 0:02:55 lr 0.000288 wd 0.0500 time 0.4134 (0.4234) data time 0.0006 (0.0032) model time 0.4127 (0.4254) loss 7.2146 (6.9192) grad_norm 4.1010 (3.1016) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:31:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][220/625] eta 0:02:51 lr 0.000288 wd 0.0500 time 0.4010 (0.4240) data time 0.0007 (0.0031) model time 0.4003 (0.4260) loss 7.9392 (6.9082) grad_norm 2.3771 (3.0932) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:31:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][230/625] eta 0:02:47 lr 0.000288 wd 0.0500 time 0.3999 (0.4231) data time 0.0007 (0.0030) model time 0.3992 (0.4247) loss 5.7693 (6.8990) grad_norm 2.8526 (3.0726) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:31:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][240/625] eta 0:02:42 lr 0.000288 wd 0.0500 time 0.4046 (0.4223) data time 0.0008 (0.0029) model time 0.4037 (0.4236) loss 6.6476 (6.8954) grad_norm 1.8057 (3.0573) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:31:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][250/625] eta 0:02:38 lr 0.000288 wd 0.0500 time 
0.4034 (0.4216) data time 0.0006 (0.0028) model time 0.4028 (0.4226) loss 5.7825 (6.8786) grad_norm 1.6591 (3.0607) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:31:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][260/625] eta 0:02:33 lr 0.000288 wd 0.0500 time 0.4078 (0.4210) data time 0.0008 (0.0027) model time 0.4070 (0.4217) loss 5.9077 (6.8806) grad_norm 1.7914 (3.0851) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:31:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][270/625] eta 0:02:29 lr 0.000288 wd 0.0500 time 0.4036 (0.4203) data time 0.0006 (0.0026) model time 0.4029 (0.4208) loss 5.9138 (6.8814) grad_norm 4.1391 (3.0762) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:31:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][280/625] eta 0:02:24 lr 0.000287 wd 0.0500 time 0.4103 (0.4198) data time 0.0006 (0.0026) model time 0.4097 (0.4201) loss 7.2589 (6.8882) grad_norm 2.8965 (3.0826) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:32:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][290/625] eta 0:02:20 lr 0.000287 wd 0.0500 time 0.4068 (0.4194) data time 0.0006 (0.0025) model time 0.4062 (0.4196) loss 6.1839 (6.8877) grad_norm 3.6957 (3.0749) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:32:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][300/625] eta 0:02:16 lr 0.000287 wd 0.0500 time 0.4005 (0.4190) data time 0.0008 (0.0025) model time 0.3997 (0.4191) loss 6.8139 (6.8895) grad_norm 1.8258 (3.0655) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:32:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][310/625] eta 0:02:11 lr 0.000287 wd 0.0500 time 0.4046 (0.4185) data time 0.0008 (0.0024) model time 0.4038 (0.4185) loss 7.7220 (6.8909) grad_norm 2.6419 (3.0515) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 07:32:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][320/625] eta 0:02:07 lr 0.000287 wd 0.0500 time 0.4034 (0.4179) data time 0.0006 (0.0023) model time 0.4028 (0.4177) loss 6.8376 (6.8875) grad_norm 3.2216 (3.0569) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:32:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][330/625] eta 0:02:03 lr 0.000287 wd 0.0500 time 0.4027 (0.4174) data time 0.0007 (0.0023) model time 0.4020 (0.4171) loss 6.5694 (6.8756) grad_norm 2.1465 (3.0432) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:32:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][340/625] eta 0:01:58 lr 0.000287 wd 0.0500 time 0.4063 (0.4170) data time 0.0006 (0.0023) model time 0.4057 (0.4166) loss 6.6710 (6.8641) grad_norm 2.1582 (3.0579) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:32:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][350/625] eta 0:01:54 lr 0.000287 wd 0.0500 time 0.3965 (0.4166) data time 0.0007 (0.0022) model time 0.3958 (0.4161) loss 6.4355 (6.8657) grad_norm 2.0270 (3.0385) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:32:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][360/625] eta 0:01:50 lr 0.000287 wd 0.0500 time 0.3987 (0.4162) data time 0.0006 (0.0022) model time 0.3981 (0.4157) loss 6.7364 (6.8680) grad_norm 2.7961 (3.0187) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:32:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][370/625] eta 0:01:46 lr 0.000287 wd 0.0500 time 0.4023 (0.4158) data time 0.0007 (0.0021) model time 0.4016 (0.4151) loss 7.3912 (6.8706) grad_norm 2.0900 (2.9996) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:32:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][380/625] eta 0:01:41 lr 0.000287 wd 0.0500 time 
0.3967 (0.4154) data time 0.0008 (0.0021) model time 0.3960 (0.4147) loss 6.8590 (6.8758) grad_norm 2.2632 (2.9899) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:32:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][390/625] eta 0:01:37 lr 0.000286 wd 0.0500 time 0.4000 (0.4154) data time 0.0008 (0.0021) model time 0.3992 (0.4147) loss 7.3963 (6.8682) grad_norm 2.1829 (3.0053) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:32:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][400/625] eta 0:01:33 lr 0.000286 wd 0.0500 time 0.3996 (0.4162) data time 0.0007 (0.0020) model time 0.3989 (0.4157) loss 7.1311 (6.8507) grad_norm 2.8834 (3.0111) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:32:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][410/625] eta 0:01:29 lr 0.000286 wd 0.0500 time 0.6020 (0.4175) data time 0.0010 (0.0020) model time 0.6009 (0.4171) loss 6.7396 (6.8507) grad_norm 2.3743 (3.0160) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:32:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][420/625] eta 0:01:25 lr 0.000286 wd 0.0500 time 0.3966 (0.4190) data time 0.0009 (0.0020) model time 0.3957 (0.4188) loss 6.2027 (6.8468) grad_norm 3.3649 (3.0712) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:33:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][430/625] eta 0:01:22 lr 0.000286 wd 0.0500 time 0.5608 (0.4209) data time 0.0007 (0.0020) model time 0.5601 (0.4209) loss 7.0544 (6.8477) grad_norm 4.0398 (3.0777) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:33:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][440/625] eta 0:01:17 lr 0.000286 wd 0.0500 time 0.4039 (0.4210) data time 0.0008 (0.0019) model time 0.4031 (0.4210) loss 5.9100 (6.8414) grad_norm 7.1363 (3.1305) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 07:33:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][450/625] eta 0:01:13 lr 0.000286 wd 0.0500 time 0.4004 (0.4205) data time 0.0007 (0.0019) model time 0.3997 (0.4205) loss 7.1262 (6.8451) grad_norm 6.3184 (3.1505) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:33:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][460/625] eta 0:01:09 lr 0.000286 wd 0.0500 time 0.3947 (0.4201) data time 0.0007 (0.0019) model time 0.3940 (0.4199) loss 7.3624 (6.8412) grad_norm 2.3683 (3.1694) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:33:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][470/625] eta 0:01:05 lr 0.000286 wd 0.0500 time 0.3964 (0.4197) data time 0.0007 (0.0019) model time 0.3956 (0.4195) loss 7.1807 (6.8394) grad_norm 2.4807 (3.1548) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:33:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][480/625] eta 0:01:00 lr 0.000286 wd 0.0500 time 0.3990 (0.4193) data time 0.0009 (0.0018) model time 0.3982 (0.4190) loss 7.7494 (6.8388) grad_norm 1.7597 (3.1980) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:33:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][490/625] eta 0:00:56 lr 0.000286 wd 0.0500 time 0.4008 (0.4189) data time 0.0006 (0.0018) model time 0.4002 (0.4186) loss 6.2659 (6.8373) grad_norm 2.8665 (3.2040) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:33:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][500/625] eta 0:00:52 lr 0.000285 wd 0.0500 time 0.3984 (0.4185) data time 0.0007 (0.0018) model time 0.3978 (0.4181) loss 6.4067 (6.8356) grad_norm 2.7510 (3.1907) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:33:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][510/625] eta 0:00:48 lr 0.000285 wd 0.0500 time 
0.3967 (0.4182) data time 0.0009 (0.0018) model time 0.3959 (0.4177) loss 6.2829 (6.8345) grad_norm 2.6299 (3.1977) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:33:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][520/625] eta 0:00:43 lr 0.000285 wd 0.0500 time 0.3969 (0.4178) data time 0.0008 (0.0018) model time 0.3961 (0.4173) loss 6.7263 (6.8335) grad_norm 1.8023 (3.1861) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:33:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][530/625] eta 0:00:39 lr 0.000285 wd 0.0500 time 0.4012 (0.4175) data time 0.0006 (0.0017) model time 0.4006 (0.4170) loss 5.7946 (6.8318) grad_norm 2.4946 (3.1755) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:33:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][540/625] eta 0:00:35 lr 0.000285 wd 0.0500 time 0.4007 (0.4172) data time 0.0009 (0.0017) model time 0.3998 (0.4167) loss 6.1048 (6.8293) grad_norm 2.3559 (3.1643) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:33:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][550/625] eta 0:00:31 lr 0.000285 wd 0.0500 time 0.4039 (0.4169) data time 0.0010 (0.0017) model time 0.4029 (0.4163) loss 6.3378 (6.8347) grad_norm 2.1107 (3.1531) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:33:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][560/625] eta 0:00:27 lr 0.000285 wd 0.0500 time 0.4006 (0.4166) data time 0.0008 (0.0017) model time 0.3998 (0.4160) loss 7.9201 (6.8362) grad_norm 2.5074 (3.1475) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:33:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][570/625] eta 0:00:22 lr 0.000285 wd 0.0500 time 0.3951 (0.4163) data time 0.0006 (0.0017) model time 0.3944 (0.4157) loss 6.2474 (6.8336) grad_norm 2.3573 (3.1458) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 07:34:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][580/625] eta 0:00:18 lr 0.000285 wd 0.0500 time 0.4018 (0.4160) data time 0.0009 (0.0017) model time 0.4009 (0.4154) loss 7.0271 (6.8331) grad_norm 2.1139 (3.1590) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:34:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][590/625] eta 0:00:14 lr 0.000285 wd 0.0500 time 0.3975 (0.4158) data time 0.0009 (0.0016) model time 0.3966 (0.4151) loss 7.0820 (6.8288) grad_norm 2.4070 (3.1542) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:34:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][600/625] eta 0:00:10 lr 0.000285 wd 0.0500 time 0.4011 (0.4155) data time 0.0006 (0.0016) model time 0.4005 (0.4147) loss 5.6720 (6.8269) grad_norm 4.7125 (3.1439) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:34:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][610/625] eta 0:00:06 lr 0.000284 wd 0.0500 time 0.3957 (0.4154) data time 0.0005 (0.0016) model time 0.3952 (0.4147) loss 6.5282 (6.8248) grad_norm 2.7200 (3.1332) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:34:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][620/625] eta 0:00:02 lr 0.000284 wd 0.0500 time 0.5937 (0.4160) data time 0.0004 (0.0016) model time 0.5932 (0.4153) loss 7.3293 (6.8190) grad_norm 3.7094 (3.1818) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:34:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 210 training takes 0:04:19 [2024-07-25 07:34:18 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 07:34:19 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 07:34:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.546 (0.546) Loss 0.5591 (0.5591) Acc@1 89.551 (89.551) Acc@5 98.730 (98.730) Mem 14939MB [2024-07-25 07:34:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.128) Loss 0.8726 (0.6854) Acc@1 81.543 (86.470) Acc@5 96.533 (97.754) Mem 14939MB [2024-07-25 07:34:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.108) Loss 0.9692 (0.8042) Acc@1 77.393 (83.317) Acc@5 94.971 (96.594) Mem 14939MB [2024-07-25 07:34:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.903 Acc@5 96.587 [2024-07-25 07:34:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.9% [2024-07-25 07:34:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.836 (0.836) Loss 0.5439 (0.5439) Acc@1 89.941 (89.941) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 07:34:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.165) Loss 0.8423 (0.6718) Acc@1 81.592 (86.657) Acc@5 96.436 (97.816) Mem 14939MB [2024-07-25 07:34:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.127) Loss 0.9692 (0.7843) Acc@1 77.441 (83.624) Acc@5 95.654 (96.761) Mem 14939MB [2024-07-25 07:34:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.181 Acc@5 96.733 [2024-07-25 07:34:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.2% [2024-07-25 07:34:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.18% [2024-07-25 07:34:25 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 07:34:26 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 07:34:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][0/625] eta 0:08:37 lr 0.000284 wd 0.0500 time 0.8277 (0.8277) data time 0.4277 (0.4277) model time 0.0000 (0.0000) loss 6.1162 (6.1162) grad_norm 4.2032 (4.2032) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:34:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][10/625] eta 0:05:22 lr 0.000284 wd 0.0500 time 0.3964 (0.5250) data time 0.0007 (0.0396) model time 0.0000 (0.0000) loss 6.2848 (7.0269) grad_norm 2.1594 (2.8992) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:34:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][20/625] eta 0:05:04 lr 0.000284 wd 0.0500 time 0.6078 (0.5038) data time 0.0008 (0.0212) model time 0.0000 (0.0000) loss 6.8195 (6.8066) grad_norm 2.1928 (3.0683) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:34:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][30/625] eta 0:04:48 lr 0.000284 wd 0.0500 time 0.3960 (0.4855) data time 0.0007 (0.0146) model time 0.0000 (0.0000) loss 5.3783 (6.6813) grad_norm 2.8726 (2.8880) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:34:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][40/625] eta 0:04:31 lr 0.000284 wd 0.0500 time 0.3957 (0.4648) data time 0.0008 (0.0112) model time 0.0000 (0.0000) loss 7.3974 (6.7498) grad_norm 2.0798 (2.7880) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:34:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][50/625] eta 0:04:19 lr 0.000284 wd 0.0500 time 0.4003 (0.4519) data time 0.0009 (0.0092) model time 0.0000 (0.0000) loss 7.1288 (6.7332) grad_norm 2.1653 (2.8754) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 07:34:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][60/625] eta 0:04:10 lr 0.000284 wd 0.0500 time 0.4033 (0.4433) data time 0.0008 (0.0078) model time 0.4025 (0.3986) loss 6.9825 (6.7417) grad_norm 3.4086 (2.9321) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:34:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][70/625] eta 0:04:02 lr 0.000284 wd 0.0500 time 0.3968 (0.4372) data time 0.0007 (0.0068) model time 0.3962 (0.3990) loss 7.2096 (6.7186) grad_norm 2.8730 (2.8857) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:35:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][80/625] eta 0:03:55 lr 0.000284 wd 0.0500 time 0.4187 (0.4329) data time 0.0008 (0.0061) model time 0.4179 (0.3997) loss 6.1461 (6.7488) grad_norm 3.3331 (2.8842) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:35:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][90/625] eta 0:03:49 lr 0.000284 wd 0.0500 time 0.3965 (0.4295) data time 0.0007 (0.0055) model time 0.3958 (0.4001) loss 7.1961 (6.7250) grad_norm 2.3496 (2.8731) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:35:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][100/625] eta 0:03:43 lr 0.000283 wd 0.0500 time 0.4016 (0.4266) data time 0.0008 (0.0051) model time 0.4008 (0.3999) loss 8.0544 (6.7667) grad_norm 2.5671 (2.8740) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:35:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][110/625] eta 0:03:38 lr 0.000283 wd 0.0500 time 0.4049 (0.4241) data time 0.0007 (0.0047) model time 0.4042 (0.3997) loss 6.9566 (6.8021) grad_norm 2.9784 (2.8794) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:35:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][120/625] eta 0:03:33 lr 0.000283 wd 0.0500 time 
0.4026 (0.4221) data time 0.0007 (0.0044) model time 0.4020 (0.3995) loss 6.7515 (6.7552) grad_norm 2.7373 (2.8620) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:35:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][130/625] eta 0:03:28 lr 0.000283 wd 0.0500 time 0.3960 (0.4204) data time 0.0009 (0.0041) model time 0.3951 (0.3995) loss 6.8400 (6.7789) grad_norm 3.9402 (2.8728) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:35:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][140/625] eta 0:03:23 lr 0.000283 wd 0.0500 time 0.4041 (0.4189) data time 0.0010 (0.0039) model time 0.4031 (0.3994) loss 6.8353 (6.7717) grad_norm 2.4957 (2.8932) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:35:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][150/625] eta 0:03:19 lr 0.000283 wd 0.0500 time 0.4000 (0.4191) data time 0.0009 (0.0037) model time 0.3991 (0.4016) loss 5.3769 (6.7628) grad_norm 2.5836 (2.9506) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:35:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][160/625] eta 0:03:14 lr 0.000283 wd 0.0500 time 0.3936 (0.4181) data time 0.0008 (0.0035) model time 0.3928 (0.4016) loss 6.8228 (6.7579) grad_norm 2.5095 (2.9341) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:35:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][170/625] eta 0:03:09 lr 0.000283 wd 0.0500 time 0.4078 (0.4170) data time 0.0007 (0.0033) model time 0.4071 (0.4014) loss 7.6513 (6.7788) grad_norm 2.8130 (2.8949) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:35:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][180/625] eta 0:03:05 lr 0.000283 wd 0.0500 time 0.4025 (0.4162) data time 0.0007 (0.0032) model time 0.4018 (0.4013) loss 6.8449 (6.7655) grad_norm 2.2905 (2.8639) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 07:35:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][190/625] eta 0:03:00 lr 0.000283 wd 0.0500 time 0.3946 (0.4152) data time 0.0008 (0.0031) model time 0.3938 (0.4010) loss 5.3530 (6.7782) grad_norm 3.2634 (2.8409) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:35:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][200/625] eta 0:02:56 lr 0.000283 wd 0.0500 time 0.4038 (0.4152) data time 0.0007 (0.0030) model time 0.4031 (0.4018) loss 7.3731 (6.7911) grad_norm 4.8003 (2.8407) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:35:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][210/625] eta 0:02:52 lr 0.000282 wd 0.0500 time 0.4107 (0.4145) data time 0.0009 (0.0029) model time 0.4098 (0.4017) loss 7.3537 (6.7893) grad_norm 2.2871 (2.8306) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:35:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][220/625] eta 0:02:49 lr 0.000282 wd 0.0500 time 0.5225 (0.4176) data time 0.0007 (0.0028) model time 0.5218 (0.4064) loss 7.2837 (6.7856) grad_norm 2.8469 (2.8251) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:36:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][230/625] eta 0:02:45 lr 0.000282 wd 0.0500 time 0.3959 (0.4199) data time 0.0008 (0.0027) model time 0.3950 (0.4099) loss 5.5358 (6.7675) grad_norm 2.5273 (2.8124) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:36:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][240/625] eta 0:02:42 lr 0.000282 wd 0.0500 time 0.3996 (0.4214) data time 0.0008 (0.0026) model time 0.3987 (0.4124) loss 7.2508 (6.7650) grad_norm 1.8917 (2.8034) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:36:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][250/625] eta 0:02:38 lr 0.000282 wd 0.0500 time 
0.4076 (0.4233) data time 0.0007 (0.0025) model time 0.4068 (0.4152) loss 7.6253 (6.7704) grad_norm 3.7281 (2.8071) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:36:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][260/625] eta 0:02:34 lr 0.000282 wd 0.0500 time 0.3988 (0.4228) data time 0.0007 (0.0025) model time 0.3981 (0.4149) loss 7.1167 (6.7790) grad_norm 2.5501 (2.8132) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:36:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][270/625] eta 0:02:29 lr 0.000282 wd 0.0500 time 0.3991 (0.4220) data time 0.0008 (0.0024) model time 0.3984 (0.4143) loss 6.9208 (6.7807) grad_norm 3.4791 (2.8130) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:36:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][280/625] eta 0:02:25 lr 0.000282 wd 0.0500 time 0.4017 (0.4212) data time 0.0008 (0.0024) model time 0.4009 (0.4136) loss 7.8847 (6.7976) grad_norm 4.5970 (2.8461) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:36:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][290/625] eta 0:02:20 lr 0.000282 wd 0.0500 time 0.4034 (0.4205) data time 0.0007 (0.0023) model time 0.4026 (0.4130) loss 6.0437 (6.7912) grad_norm 2.4721 (2.8485) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:36:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][300/625] eta 0:02:16 lr 0.000282 wd 0.0500 time 0.4034 (0.4199) data time 0.0007 (0.0023) model time 0.4027 (0.4125) loss 6.0626 (6.7890) grad_norm 3.2090 (2.8431) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:36:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][310/625] eta 0:02:12 lr 0.000282 wd 0.0500 time 0.3982 (0.4193) data time 0.0006 (0.0022) model time 0.3975 (0.4120) loss 7.2760 (6.7991) grad_norm 2.2855 (2.8246) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 07:36:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][320/625] eta 0:02:07 lr 0.000281 wd 0.0500 time 0.3997 (0.4186) data time 0.0008 (0.0022) model time 0.3989 (0.4115) loss 6.2355 (6.7947) grad_norm 2.4366 (2.8466) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:36:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][330/625] eta 0:02:03 lr 0.000281 wd 0.0500 time 0.4081 (0.4182) data time 0.0007 (0.0021) model time 0.4075 (0.4113) loss 6.7828 (6.7873) grad_norm 1.6404 (2.8470) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:36:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][340/625] eta 0:01:59 lr 0.000281 wd 0.0500 time 0.3993 (0.4177) data time 0.0010 (0.0021) model time 0.3984 (0.4108) loss 5.4184 (6.7784) grad_norm 2.8338 (2.8361) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:36:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][350/625] eta 0:01:54 lr 0.000281 wd 0.0500 time 0.3960 (0.4171) data time 0.0009 (0.0021) model time 0.3951 (0.4104) loss 7.3271 (6.7774) grad_norm 2.3068 (2.8377) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:36:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][360/625] eta 0:01:50 lr 0.000281 wd 0.0500 time 0.3994 (0.4166) data time 0.0009 (0.0020) model time 0.3985 (0.4100) loss 8.1235 (6.7819) grad_norm 1.7571 (2.8414) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:37:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][370/625] eta 0:01:46 lr 0.000281 wd 0.0500 time 0.3985 (0.4167) data time 0.0007 (0.0020) model time 0.3978 (0.4102) loss 7.6328 (6.7939) grad_norm 2.1962 (2.8260) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:37:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][380/625] eta 0:01:41 lr 0.000281 wd 0.0500 time 
0.4001 (0.4163) data time 0.0009 (0.0020) model time 0.3992 (0.4099) loss 7.0466 (6.7884) grad_norm 2.1336 (2.8234) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:37:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][390/625] eta 0:01:37 lr 0.000281 wd 0.0500 time 0.3987 (0.4159) data time 0.0009 (0.0019) model time 0.3979 (0.4097) loss 6.9085 (6.7909) grad_norm 2.1494 (2.8282) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:37:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][400/625] eta 0:01:33 lr 0.000281 wd 0.0500 time 0.4006 (0.4156) data time 0.0008 (0.0019) model time 0.3997 (0.4094) loss 6.1367 (6.7941) grad_norm 3.1747 (2.8573) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:37:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][410/625] eta 0:01:29 lr 0.000281 wd 0.0500 time 0.4051 (0.4152) data time 0.0009 (0.0019) model time 0.4042 (0.4091) loss 7.0869 (6.7916) grad_norm 3.2738 (2.8928) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:37:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][420/625] eta 0:01:25 lr 0.000281 wd 0.0500 time 0.4025 (0.4152) data time 0.0009 (0.0019) model time 0.4016 (0.4093) loss 8.0413 (6.7947) grad_norm 3.4558 (2.8961) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:37:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][430/625] eta 0:01:20 lr 0.000281 wd 0.0500 time 0.5692 (0.4153) data time 0.0007 (0.0018) model time 0.5685 (0.4095) loss 6.4469 (6.7904) grad_norm 2.4596 (2.8878) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:37:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][440/625] eta 0:01:16 lr 0.000280 wd 0.0500 time 0.4001 (0.4160) data time 0.0008 (0.0018) model time 0.3993 (0.4105) loss 7.7412 (6.7930) grad_norm 4.8208 (2.9542) loss_scale 512.0000 (257.1610) mem 
14939MB [2024-07-25 07:37:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][450/625] eta 0:01:13 lr 0.000280 wd 0.0500 time 0.5980 (0.4186) data time 0.0007 (0.0018) model time 0.5973 (0.4136) loss 6.3073 (6.7948) grad_norm 2.3891 (2.9500) loss_scale 512.0000 (262.8115) mem 14939MB [2024-07-25 07:37:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][460/625] eta 0:01:09 lr 0.000280 wd 0.0500 time 0.5798 (0.4199) data time 0.0009 (0.0018) model time 0.5789 (0.4151) loss 7.7703 (6.7946) grad_norm 3.3685 (2.9720) loss_scale 512.0000 (268.2169) mem 14939MB [2024-07-25 07:37:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][470/625] eta 0:01:05 lr 0.000280 wd 0.0500 time 0.3953 (0.4212) data time 0.0011 (0.0017) model time 0.3943 (0.4166) loss 7.6754 (6.7925) grad_norm 2.9119 (2.9801) loss_scale 512.0000 (273.3928) mem 14939MB [2024-07-25 07:37:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][480/625] eta 0:01:01 lr 0.000280 wd 0.0500 time 0.3998 (0.4211) data time 0.0010 (0.0017) model time 0.3988 (0.4166) loss 6.2759 (6.7860) grad_norm 1.8955 (2.9700) loss_scale 512.0000 (278.3534) mem 14939MB [2024-07-25 07:37:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][490/625] eta 0:00:56 lr 0.000280 wd 0.0500 time 0.4002 (0.4207) data time 0.0008 (0.0017) model time 0.3993 (0.4162) loss 8.0557 (6.7958) grad_norm 2.6971 (2.9642) loss_scale 512.0000 (283.1120) mem 14939MB [2024-07-25 07:37:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][500/625] eta 0:00:52 lr 0.000280 wd 0.0500 time 0.4029 (0.4203) data time 0.0009 (0.0017) model time 0.4021 (0.4159) loss 7.2844 (6.7975) grad_norm 1.8833 (2.9516) loss_scale 512.0000 (287.6806) mem 14939MB [2024-07-25 07:38:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][510/625] eta 0:00:48 lr 0.000280 wd 0.0500 time 
0.3946 (0.4199) data time 0.0007 (0.0017) model time 0.3939 (0.4155) loss 7.6270 (6.8030) grad_norm 6.8625 (2.9559) loss_scale 512.0000 (292.0705) mem 14939MB [2024-07-25 07:38:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][520/625] eta 0:00:44 lr 0.000280 wd 0.0500 time 0.3997 (0.4196) data time 0.0007 (0.0017) model time 0.3990 (0.4152) loss 7.2892 (6.8031) grad_norm 2.2768 (2.9535) loss_scale 512.0000 (296.2917) mem 14939MB [2024-07-25 07:38:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][530/625] eta 0:00:39 lr 0.000280 wd 0.0500 time 0.4110 (0.4192) data time 0.0009 (0.0016) model time 0.4101 (0.4149) loss 5.4211 (6.8050) grad_norm 2.7300 (2.9739) loss_scale 512.0000 (300.3540) mem 14939MB [2024-07-25 07:38:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][540/625] eta 0:00:35 lr 0.000280 wd 0.0500 time 0.4029 (0.4189) data time 0.0007 (0.0016) model time 0.4022 (0.4146) loss 5.8941 (6.7962) grad_norm 2.7662 (2.9961) loss_scale 512.0000 (304.2662) mem 14939MB [2024-07-25 07:38:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][550/625] eta 0:00:31 lr 0.000279 wd 0.0500 time 0.3975 (0.4186) data time 0.0006 (0.0016) model time 0.3968 (0.4143) loss 8.1476 (6.8050) grad_norm 4.2434 (3.0271) loss_scale 512.0000 (308.0363) mem 14939MB [2024-07-25 07:38:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][560/625] eta 0:00:27 lr 0.000279 wd 0.0500 time 0.3995 (0.4182) data time 0.0009 (0.0016) model time 0.3986 (0.4140) loss 7.5647 (6.8063) grad_norm 3.2714 (3.0908) loss_scale 512.0000 (311.6720) mem 14939MB [2024-07-25 07:38:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][570/625] eta 0:00:22 lr 0.000279 wd 0.0500 time 0.3955 (0.4179) data time 0.0006 (0.0016) model time 0.3948 (0.4137) loss 7.0421 (6.8036) grad_norm 2.3051 (3.0952) loss_scale 512.0000 (315.1804) mem 
14939MB [2024-07-25 07:38:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][580/625] eta 0:00:18 lr 0.000279 wd 0.0500 time 0.3966 (0.4176) data time 0.0006 (0.0016) model time 0.3959 (0.4135) loss 5.3428 (6.8002) grad_norm 2.6815 (3.0850) loss_scale 512.0000 (318.5680) mem 14939MB [2024-07-25 07:38:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][590/625] eta 0:00:14 lr 0.000279 wd 0.0500 time 0.4012 (0.4177) data time 0.0010 (0.0016) model time 0.4003 (0.4136) loss 6.5798 (6.8026) grad_norm 1.5622 (3.0704) loss_scale 512.0000 (321.8409) mem 14939MB [2024-07-25 07:38:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][600/625] eta 0:00:10 lr 0.000279 wd 0.0500 time 0.4001 (0.4174) data time 0.0006 (0.0015) model time 0.3994 (0.4134) loss 6.5551 (6.8032) grad_norm 3.1648 (3.0651) loss_scale 512.0000 (325.0050) mem 14939MB [2024-07-25 07:38:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][610/625] eta 0:00:06 lr 0.000279 wd 0.0500 time 0.4020 (0.4172) data time 0.0006 (0.0015) model time 0.4014 (0.4131) loss 6.9954 (6.8033) grad_norm 8.9530 (3.0730) loss_scale 512.0000 (328.0655) mem 14939MB [2024-07-25 07:38:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][620/625] eta 0:00:02 lr 0.000279 wd 0.0500 time 0.3963 (0.4169) data time 0.0006 (0.0015) model time 0.3956 (0.4129) loss 6.7281 (6.7991) grad_norm 2.6537 (3.0916) loss_scale 512.0000 (331.0274) mem 14939MB [2024-07-25 07:38:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 211 training takes 0:04:20 [2024-07-25 07:38:46 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 07:38:47 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 07:38:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.516 (0.516) Loss 0.5469 (0.5469) Acc@1 89.990 (89.990) Acc@5 98.682 (98.682) Mem 14939MB
[2024-07-25 07:38:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.126) Loss 0.8706 (0.6833) Acc@1 81.299 (86.519) Acc@5 96.289 (97.758) Mem 14939MB
[2024-07-25 07:38:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.106) Loss 0.9658 (0.7968) Acc@1 77.588 (83.489) Acc@5 94.873 (96.584) Mem 14939MB
[2024-07-25 07:38:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.087 Acc@5 96.563
[2024-07-25 07:38:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.1%
[2024-07-25 07:38:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.09%
[2024-07-25 07:38:50 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving......
[2024-07-25 07:38:51 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!!
[2024-07-25 07:38:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.496 (0.496) Loss 0.5439 (0.5439) Acc@1 89.941 (89.941) Acc@5 98.779 (98.779) Mem 14939MB
[2024-07-25 07:38:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.124) Loss 0.8413 (0.6714) Acc@1 81.592 (86.665) Acc@5 96.484 (97.820) Mem 14939MB
[2024-07-25 07:38:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 0.9663 (0.7835) Acc@1 77.441 (83.636) Acc@5 95.654 (96.763) Mem 14939MB
[2024-07-25 07:38:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.197 Acc@5 96.737
[2024-07-25 07:38:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.2%
[2024-07-25 07:38:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.20%
[2024-07-25 07:38:53 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 07:38:54 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!!
[2024-07-25 07:38:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][0/625] eta 0:08:39 lr 0.000279 wd 0.0500 time 0.8305 (0.8305) data time 0.4466 (0.4466) model time 0.0000 (0.0000) loss 6.6224 (6.6224) grad_norm 1.9596 (1.9596) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:38:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][10/625] eta 0:04:31 lr 0.000279 wd 0.0500 time 0.4081 (0.4422) data time 0.0009 (0.0414) model time 0.0000 (0.0000) loss 6.9208 (7.0489) grad_norm 4.7290 (2.9828) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:39:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][20/625] eta 0:04:20 lr 0.000279 wd 0.0500 time 0.4059 (0.4303) data time 0.0007 (0.0220) model time 0.0000 (0.0000) loss 7.4488 (6.8061) grad_norm 2.2429 (2.7162) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:39:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][30/625] eta 0:04:18 lr 0.000279 wd 0.0500 time 0.4009 (0.4339) data time 0.0009 (0.0152) model time 0.0000 (0.0000) loss 7.3053 (6.8184) grad_norm 2.9614 (2.5976) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:39:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][40/625] eta 0:04:16 lr 0.000278 wd 0.0500 time 0.6233 (0.4383) data time 0.0009 (0.0117) model time 0.0000 (0.0000) loss 7.1207 (6.8407) grad_norm 6.2300 (2.6752) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:39:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][50/625] eta 0:04:13 lr 0.000278 wd 0.0500 time 0.3980 (0.4404) data time 0.0008 (0.0096) model time 0.0000 (0.0000) loss 6.4179 (6.7791) grad_norm 2.3070 (3.0228) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:39:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][60/625] eta 0:04:15 lr 0.000278 wd 0.0500 time 0.5912 (0.4517) 
data time 0.0009 (0.0081) model time 0.5903 (0.5081) loss 6.3507 (6.7699) grad_norm 2.5136 (3.0164) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:39:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][70/625] eta 0:04:10 lr 0.000278 wd 0.0500 time 0.4103 (0.4517) data time 0.0009 (0.0071) model time 0.4093 (0.4795) loss 5.9276 (6.7645) grad_norm 4.6845 (3.0576) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:39:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][80/625] eta 0:04:03 lr 0.000278 wd 0.0500 time 0.5724 (0.4474) data time 0.0009 (0.0063) model time 0.5715 (0.4584) loss 7.6062 (6.7758) grad_norm 2.3873 (3.0191) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:39:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][90/625] eta 0:03:56 lr 0.000278 wd 0.0500 time 0.4000 (0.4417) data time 0.0009 (0.0057) model time 0.3992 (0.4426) loss 7.2416 (6.7912) grad_norm 2.6833 (2.9809) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:39:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][100/625] eta 0:03:49 lr 0.000278 wd 0.0500 time 0.3990 (0.4377) data time 0.0009 (0.0052) model time 0.3981 (0.4340) loss 5.1479 (6.7561) grad_norm 3.3977 (3.0273) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:39:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][110/625] eta 0:03:43 lr 0.000278 wd 0.0500 time 0.4012 (0.4346) data time 0.0009 (0.0049) model time 0.4003 (0.4287) loss 6.8472 (6.7314) grad_norm 3.2684 (3.1194) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:39:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][120/625] eta 0:03:38 lr 0.000278 wd 0.0500 time 0.3978 (0.4320) data time 0.0008 (0.0045) model time 0.3970 (0.4250) loss 7.1635 (6.7528) grad_norm 5.3949 (3.1190) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
07:39:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][130/625] eta 0:03:32 lr 0.000278 wd 0.0500 time 0.4028 (0.4296) data time 0.0010 (0.0042) model time 0.4019 (0.4219) loss 7.9387 (6.7675) grad_norm 2.7282 (3.1064) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:39:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][140/625] eta 0:03:27 lr 0.000278 wd 0.0500 time 0.3957 (0.4276) data time 0.0009 (0.0040) model time 0.3948 (0.4195) loss 5.6320 (6.7216) grad_norm 3.5682 (3.2073) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:39:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][150/625] eta 0:03:22 lr 0.000277 wd 0.0500 time 0.3972 (0.4259) data time 0.0009 (0.0038) model time 0.3963 (0.4176) loss 7.2874 (6.7206) grad_norm 1.9015 (3.1843) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:40:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][160/625] eta 0:03:17 lr 0.000277 wd 0.0500 time 0.4063 (0.4244) data time 0.0007 (0.0036) model time 0.4056 (0.4161) loss 6.9833 (6.7340) grad_norm 4.4189 (3.1671) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:40:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][170/625] eta 0:03:12 lr 0.000277 wd 0.0500 time 0.3941 (0.4230) data time 0.0008 (0.0034) model time 0.3932 (0.4148) loss 6.6116 (6.7242) grad_norm 6.3415 (3.1803) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:40:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][180/625] eta 0:03:07 lr 0.000277 wd 0.0500 time 0.3950 (0.4219) data time 0.0008 (0.0033) model time 0.3942 (0.4138) loss 7.4401 (6.7321) grad_norm 2.4575 (3.2974) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:40:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][190/625] eta 0:03:03 lr 0.000277 wd 0.0500 time 0.4111 (0.4209) data 
time 0.0008 (0.0032) model time 0.4103 (0.4129) loss 7.3761 (6.7345) grad_norm 2.1701 (3.4065) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:40:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][200/625] eta 0:02:58 lr 0.000277 wd 0.0500 time 0.4075 (0.4199) data time 0.0006 (0.0030) model time 0.4068 (0.4121) loss 6.6668 (6.7302) grad_norm 2.9856 (3.3891) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:40:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][210/625] eta 0:02:53 lr 0.000277 wd 0.0500 time 0.3994 (0.4190) data time 0.0007 (0.0029) model time 0.3987 (0.4114) loss 5.9970 (6.7283) grad_norm 2.8695 (3.3731) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:40:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][220/625] eta 0:02:49 lr 0.000277 wd 0.0500 time 0.3960 (0.4183) data time 0.0009 (0.0028) model time 0.3951 (0.4108) loss 7.3060 (6.7537) grad_norm 3.8743 (3.3617) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:40:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][230/625] eta 0:02:44 lr 0.000277 wd 0.0500 time 0.4033 (0.4176) data time 0.0009 (0.0028) model time 0.4024 (0.4103) loss 6.5197 (6.7471) grad_norm 2.0053 (3.3543) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:40:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][240/625] eta 0:02:40 lr 0.000277 wd 0.0500 time 0.4030 (0.4174) data time 0.0009 (0.0027) model time 0.4021 (0.4104) loss 6.7794 (6.7560) grad_norm 3.8368 (3.3391) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:40:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][250/625] eta 0:02:36 lr 0.000277 wd 0.0500 time 0.4098 (0.4183) data time 0.0007 (0.0026) model time 0.4091 (0.4118) loss 6.9411 (6.7572) grad_norm 4.2939 (3.4112) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
07:40:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][260/625] eta 0:02:33 lr 0.000276 wd 0.0500 time 0.5870 (0.4202) data time 0.0008 (0.0025) model time 0.5862 (0.4144) loss 7.4396 (6.7609) grad_norm 2.3980 (3.4142) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:40:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][270/625] eta 0:02:30 lr 0.000276 wd 0.0500 time 0.5772 (0.4241) data time 0.0009 (0.0025) model time 0.5763 (0.4195) loss 7.0962 (6.7648) grad_norm 2.3322 (3.3985) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:40:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][280/625] eta 0:02:26 lr 0.000276 wd 0.0500 time 0.4023 (0.4257) data time 0.0007 (0.0024) model time 0.4016 (0.4216) loss 7.7743 (6.7665) grad_norm 2.3928 (3.3626) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:40:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][290/625] eta 0:02:22 lr 0.000276 wd 0.0500 time 0.4084 (0.4263) data time 0.0007 (0.0024) model time 0.4077 (0.4225) loss 7.0357 (6.7760) grad_norm 4.5631 (3.3449) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:41:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][300/625] eta 0:02:18 lr 0.000276 wd 0.0500 time 0.3998 (0.4255) data time 0.0006 (0.0023) model time 0.3992 (0.4216) loss 7.9634 (6.7878) grad_norm 2.5061 (3.3246) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:41:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][310/625] eta 0:02:13 lr 0.000276 wd 0.0500 time 0.3872 (0.4253) data time 0.0007 (0.0023) model time 0.3865 (0.4215) loss 6.8425 (6.7862) grad_norm 2.6075 (3.2893) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:41:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][320/625] eta 0:02:09 lr 0.000276 wd 0.0500 time 0.4010 (0.4245) data 
time 0.0008 (0.0022) model time 0.4001 (0.4207) loss 6.8568 (6.7857) grad_norm 1.5726 (3.2655) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:41:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][330/625] eta 0:02:05 lr 0.000276 wd 0.0500 time 0.3987 (0.4238) data time 0.0007 (0.0022) model time 0.3980 (0.4199) loss 5.6641 (6.7807) grad_norm 3.1986 (3.2960) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:41:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][340/625] eta 0:02:00 lr 0.000276 wd 0.0500 time 0.4005 (0.4231) data time 0.0006 (0.0021) model time 0.3999 (0.4192) loss 6.4663 (6.7758) grad_norm 3.0887 (3.3083) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:41:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][350/625] eta 0:01:56 lr 0.000276 wd 0.0500 time 0.4078 (0.4225) data time 0.0008 (0.0021) model time 0.4070 (0.4187) loss 6.6238 (6.7739) grad_norm 1.9035 (3.3140) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:41:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][360/625] eta 0:01:51 lr 0.000276 wd 0.0500 time 0.3970 (0.4219) data time 0.0007 (0.0021) model time 0.3963 (0.4181) loss 7.9322 (6.7812) grad_norm 2.4444 (3.3051) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:41:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][370/625] eta 0:01:47 lr 0.000275 wd 0.0500 time 0.3957 (0.4214) data time 0.0009 (0.0020) model time 0.3947 (0.4176) loss 6.9410 (6.7816) grad_norm 2.3029 (3.2994) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:41:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][380/625] eta 0:01:43 lr 0.000275 wd 0.0500 time 0.4012 (0.4209) data time 0.0006 (0.0020) model time 0.4006 (0.4170) loss 5.7611 (6.7829) grad_norm 6.2108 (3.3047) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
07:41:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][390/625] eta 0:01:38 lr 0.000275 wd 0.0500 time 0.4117 (0.4205) data time 0.0007 (0.0020) model time 0.4110 (0.4166) loss 6.6661 (6.7866) grad_norm 2.3599 (3.2860) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:41:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][400/625] eta 0:01:34 lr 0.000275 wd 0.0500 time 0.4163 (0.4202) data time 0.0007 (0.0019) model time 0.4156 (0.4164) loss 6.7522 (6.7880) grad_norm 3.1311 (3.2679) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:41:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][410/625] eta 0:01:30 lr 0.000275 wd 0.0500 time 0.4053 (0.4198) data time 0.0007 (0.0019) model time 0.4045 (0.4161) loss 7.4684 (6.7875) grad_norm 2.4997 (3.2533) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:41:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][420/625] eta 0:01:25 lr 0.000275 wd 0.0500 time 0.3971 (0.4195) data time 0.0007 (0.0019) model time 0.3965 (0.4158) loss 7.7175 (6.7824) grad_norm 2.2266 (3.2379) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:41:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][430/625] eta 0:01:21 lr 0.000275 wd 0.0500 time 0.4047 (0.4191) data time 0.0009 (0.0019) model time 0.4038 (0.4154) loss 5.8923 (6.7865) grad_norm 2.5244 (3.2320) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:41:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][440/625] eta 0:01:17 lr 0.000275 wd 0.0500 time 0.4009 (0.4187) data time 0.0007 (0.0018) model time 0.4002 (0.4150) loss 6.4824 (6.7871) grad_norm 2.7987 (3.2314) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:42:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][450/625] eta 0:01:13 lr 0.000275 wd 0.0500 time 0.3977 (0.4183) data 
time 0.0008 (0.0018) model time 0.3969 (0.4146) loss 5.4993 (6.7833) grad_norm 5.2797 (3.2317) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:42:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][460/625] eta 0:01:09 lr 0.000275 wd 0.0500 time 0.4060 (0.4182) data time 0.0008 (0.0018) model time 0.4053 (0.4146) loss 6.5816 (6.7930) grad_norm 2.5615 (3.2239) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:42:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][470/625] eta 0:01:04 lr 0.000275 wd 0.0500 time 0.3957 (0.4186) data time 0.0007 (0.0018) model time 0.3951 (0.4151) loss 7.7640 (6.7959) grad_norm 3.3110 (3.2123) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:42:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][480/625] eta 0:01:00 lr 0.000275 wd 0.0500 time 0.6098 (0.4198) data time 0.0007 (0.0017) model time 0.6092 (0.4165) loss 6.6128 (6.7971) grad_norm 2.3505 (3.2112) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:42:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][490/625] eta 0:00:56 lr 0.000274 wd 0.0500 time 0.5933 (0.4213) data time 0.0008 (0.0017) model time 0.5925 (0.4183) loss 6.8469 (6.7958) grad_norm 2.4623 (3.2075) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:42:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][500/625] eta 0:00:52 lr 0.000274 wd 0.0500 time 0.5777 (0.4229) data time 0.0008 (0.0017) model time 0.5769 (0.4201) loss 6.0097 (6.7902) grad_norm 2.6829 (3.2055) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:42:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][510/625] eta 0:00:48 lr 0.000274 wd 0.0500 time 0.5832 (0.4238) data time 0.0007 (0.0017) model time 0.5825 (0.4212) loss 6.5124 (6.7857) grad_norm 3.0504 (3.2164) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
07:42:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][520/625] eta 0:00:44 lr 0.000274 wd 0.0500 time 0.4013 (0.4234) data time 0.0009 (0.0017) model time 0.4004 (0.4207) loss 8.0104 (6.7876) grad_norm 2.7104 (3.2162) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:42:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][530/625] eta 0:00:40 lr 0.000274 wd 0.0500 time 0.4026 (0.4233) data time 0.0007 (0.0017) model time 0.4019 (0.4207) loss 6.3079 (6.7831) grad_norm 2.2427 (3.2025) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:42:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][540/625] eta 0:00:35 lr 0.000274 wd 0.0500 time 0.4181 (0.4229) data time 0.0008 (0.0016) model time 0.4173 (0.4203) loss 7.6677 (6.7845) grad_norm 2.4039 (3.2295) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:42:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][550/625] eta 0:00:31 lr 0.000274 wd 0.0500 time 0.3952 (0.4225) data time 0.0009 (0.0016) model time 0.3943 (0.4199) loss 8.0583 (6.7911) grad_norm 2.3254 (3.2187) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:42:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][560/625] eta 0:00:27 lr 0.000274 wd 0.0500 time 0.3958 (0.4221) data time 0.0009 (0.0016) model time 0.3949 (0.4195) loss 6.5201 (6.7980) grad_norm 2.4317 (3.2278) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:42:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][570/625] eta 0:00:23 lr 0.000274 wd 0.0500 time 0.3988 (0.4218) data time 0.0007 (0.0016) model time 0.3982 (0.4191) loss 6.9399 (6.7980) grad_norm 5.9871 (3.2261) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:42:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][580/625] eta 0:00:18 lr 0.000274 wd 0.0500 time 0.4017 (0.4214) data 
time 0.0007 (0.0016) model time 0.4010 (0.4188) loss 6.4123 (6.7998) grad_norm 3.2254 (3.2246) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:43:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][590/625] eta 0:00:14 lr 0.000274 wd 0.0500 time 0.3920 (0.4211) data time 0.0007 (0.0016) model time 0.3913 (0.4184) loss 6.4178 (6.7973) grad_norm 1.6819 (3.2219) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:43:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][600/625] eta 0:00:10 lr 0.000273 wd 0.0500 time 0.3973 (0.4208) data time 0.0008 (0.0016) model time 0.3965 (0.4182) loss 7.1135 (6.7926) grad_norm 3.9458 (3.2350) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:43:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][610/625] eta 0:00:06 lr 0.000273 wd 0.0500 time 0.3971 (0.4205) data time 0.0006 (0.0016) model time 0.3964 (0.4178) loss 6.5683 (6.7959) grad_norm 3.5464 (3.2392) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:43:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][620/625] eta 0:00:02 lr 0.000273 wd 0.0500 time 0.4006 (0.4201) data time 0.0004 (0.0015) model time 0.4001 (0.4175) loss 7.5746 (6.7964) grad_norm 2.1860 (3.2352) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:43:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 212 training takes 0:04:22 [2024-07-25 07:43:17 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 07:43:18 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 07:43:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.498 (0.498) Loss 0.5542 (0.5542) Acc@1 89.307 (89.307) Acc@5 98.584 (98.584) Mem 14939MB
[2024-07-25 07:43:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.124) Loss 0.8521 (0.6694) Acc@1 81.885 (86.470) Acc@5 96.338 (97.705) Mem 14939MB
[2024-07-25 07:43:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 0.9395 (0.7816) Acc@1 78.271 (83.512) Acc@5 95.703 (96.612) Mem 14939MB
[2024-07-25 07:43:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.053 Acc@5 96.575
[2024-07-25 07:43:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.1%
[2024-07-25 07:43:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 7.998 (7.998) Loss 0.5439 (0.5439) Acc@1 89.990 (89.990) Acc@5 98.828 (98.828) Mem 14939MB
[2024-07-25 07:43:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.806) Loss 0.8418 (0.6711) Acc@1 81.641 (86.697) Acc@5 96.436 (97.820) Mem 14939MB
[2024-07-25 07:43:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.463) Loss 0.9658 (0.7828) Acc@1 77.441 (83.661) Acc@5 95.703 (96.768) Mem 14939MB
[2024-07-25 07:43:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.223 Acc@5 96.743
[2024-07-25 07:43:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.2%
[2024-07-25 07:43:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.22%
[2024-07-25 07:43:30 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 07:43:31 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 07:43:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][0/625] eta 0:09:50 lr 0.000273 wd 0.0500 time 0.9446 (0.9446) data time 0.5638 (0.5638) model time 0.0000 (0.0000) loss 7.8981 (7.8981) grad_norm 1.6969 (1.6969) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:43:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][10/625] eta 0:04:34 lr 0.000273 wd 0.0500 time 0.3998 (0.4459) data time 0.0006 (0.0520) model time 0.0000 (0.0000) loss 7.0044 (7.2944) grad_norm 4.6178 (2.7225) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:43:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][20/625] eta 0:04:16 lr 0.000273 wd 0.0500 time 0.3953 (0.4240) data time 0.0007 (0.0276) model time 0.0000 (0.0000) loss 6.0545 (6.7936) grad_norm 2.7429 (3.2385) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:43:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][30/625] eta 0:04:07 lr 0.000273 wd 0.0500 time 0.3954 (0.4156) data time 0.0007 (0.0190) model time 0.0000 (0.0000) loss 6.9551 (6.6843) grad_norm 2.1134 (3.2900) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:43:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][40/625] eta 0:04:00 lr 0.000273 wd 0.0500 time 0.3991 (0.4113) data time 0.0008 (0.0146) model time 0.0000 (0.0000) loss 7.9460 (6.7711) grad_norm 2.6249 (3.5680) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:43:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][50/625] eta 0:03:56 lr 0.000273 wd 0.0500 time 0.5656 (0.4121) data time 0.0008 (0.0119) model time 0.0000 (0.0000) loss 5.8690 (6.7412) grad_norm 2.0701 (3.7696) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 07:43:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][60/625] eta 0:03:53 lr 0.000273 wd 0.0500 time 0.4041 (0.4129) data time 0.0007 (0.0101) model time 0.4034 (0.4164) loss 6.5995 (6.6605) grad_norm 4.0476 (3.7813) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:44:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][70/625] eta 0:03:50 lr 0.000273 wd 0.0500 time 0.3960 (0.4160) data time 0.0008 (0.0088) model time 0.3952 (0.4252) loss 6.7856 (6.7126) grad_norm 1.8697 (3.6498) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:44:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][80/625] eta 0:03:51 lr 0.000273 wd 0.0500 time 0.5482 (0.4244) data time 0.0008 (0.0078) model time 0.5474 (0.4444) loss 7.1102 (6.7095) grad_norm 2.5290 (3.6836) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:44:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][90/625] eta 0:03:49 lr 0.000272 wd 0.0500 time 0.3973 (0.4295) data time 0.0007 (0.0070) model time 0.3966 (0.4509) loss 8.0657 (6.7531) grad_norm 3.5234 (3.8520) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:44:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][100/625] eta 0:03:46 lr 0.000272 wd 0.0500 time 0.3976 (0.4321) data time 0.0006 (0.0064) model time 0.3970 (0.4517) loss 6.4744 (6.7386) grad_norm 2.1562 (3.7786) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:44:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][110/625] eta 0:03:43 lr 0.000272 wd 0.0500 time 0.4019 (0.4339) data time 0.0007 (0.0059) model time 0.4011 (0.4516) loss 5.4330 (6.7452) grad_norm 2.2258 (3.7025) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:44:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][120/625] eta 0:03:37 lr 0.000272 wd 0.0500 time 
0.4060 (0.4312) data time 0.0009 (0.0055) model time 0.4052 (0.4443) loss 7.2701 (6.7613) grad_norm 2.1654 (3.6691) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:44:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][130/625] eta 0:03:32 lr 0.000272 wd 0.0500 time 0.3931 (0.4288) data time 0.0007 (0.0051) model time 0.3924 (0.4386) loss 5.6824 (6.7628) grad_norm 2.1396 (3.6027) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:44:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][140/625] eta 0:03:27 lr 0.000272 wd 0.0500 time 0.3972 (0.4268) data time 0.0008 (0.0048) model time 0.3964 (0.4343) loss 7.5524 (6.7443) grad_norm 2.2146 (3.5407) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:44:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][150/625] eta 0:03:21 lr 0.000272 wd 0.0500 time 0.4017 (0.4251) data time 0.0007 (0.0046) model time 0.4010 (0.4309) loss 6.7881 (6.7629) grad_norm 3.6770 (3.5899) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:44:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][160/625] eta 0:03:16 lr 0.000272 wd 0.0500 time 0.4034 (0.4235) data time 0.0009 (0.0043) model time 0.4024 (0.4280) loss 6.0978 (6.7752) grad_norm 4.3952 (3.5706) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:44:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][170/625] eta 0:03:12 lr 0.000272 wd 0.0500 time 0.4011 (0.4222) data time 0.0009 (0.0041) model time 0.4002 (0.4256) loss 7.5365 (6.7702) grad_norm 2.8053 (3.5582) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:44:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][180/625] eta 0:03:07 lr 0.000272 wd 0.0500 time 0.4102 (0.4211) data time 0.0007 (0.0040) model time 0.4095 (0.4237) loss 6.3290 (6.7525) grad_norm 1.9807 (3.6411) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 07:44:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][190/625] eta 0:03:02 lr 0.000272 wd 0.0500 time 0.3999 (0.4201) data time 0.0007 (0.0038) model time 0.3992 (0.4222) loss 7.7540 (6.7552) grad_norm 2.5778 (3.6342) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:44:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][200/625] eta 0:02:58 lr 0.000271 wd 0.0500 time 0.3998 (0.4191) data time 0.0007 (0.0036) model time 0.3991 (0.4206) loss 7.7126 (6.7358) grad_norm 3.1632 (3.7505) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:44:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][210/625] eta 0:02:53 lr 0.000271 wd 0.0500 time 0.4019 (0.4184) data time 0.0008 (0.0035) model time 0.4010 (0.4196) loss 6.5892 (6.7428) grad_norm 7.1689 (3.7620) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:45:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][220/625] eta 0:02:49 lr 0.000271 wd 0.0500 time 0.3930 (0.4175) data time 0.0006 (0.0034) model time 0.3923 (0.4184) loss 5.6669 (6.7563) grad_norm 2.7779 (3.7379) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:45:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][230/625] eta 0:02:44 lr 0.000271 wd 0.0500 time 0.4121 (0.4168) data time 0.0008 (0.0033) model time 0.4113 (0.4174) loss 6.5367 (6.7479) grad_norm 3.6682 (3.7261) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:45:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][240/625] eta 0:02:40 lr 0.000271 wd 0.0500 time 0.3947 (0.4161) data time 0.0010 (0.0032) model time 0.3937 (0.4164) loss 7.9975 (6.7536) grad_norm 2.7532 (3.6795) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:45:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][250/625] eta 0:02:35 lr 0.000271 wd 0.0500 time 
0.3967 (0.4155) data time 0.0009 (0.0031) model time 0.3958 (0.4155) loss 7.3605 (6.7606) grad_norm 2.6723 (3.6627) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:45:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][260/625] eta 0:02:31 lr 0.000271 wd 0.0500 time 0.3992 (0.4150) data time 0.0007 (0.0030) model time 0.3985 (0.4148) loss 6.8429 (6.7637) grad_norm 2.8183 (3.6107) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:45:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][270/625] eta 0:02:27 lr 0.000271 wd 0.0500 time 0.5855 (0.4151) data time 0.0007 (0.0029) model time 0.5849 (0.4149) loss 6.1508 (6.7625) grad_norm 2.3662 (3.5790) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:45:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][280/625] eta 0:02:23 lr 0.000271 wd 0.0500 time 0.6380 (0.4160) data time 0.0009 (0.0028) model time 0.6371 (0.4160) loss 6.3424 (6.7681) grad_norm 3.2453 (3.6107) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:45:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][290/625] eta 0:02:19 lr 0.000271 wd 0.0500 time 0.6187 (0.4173) data time 0.0009 (0.0028) model time 0.6179 (0.4175) loss 6.2043 (6.7572) grad_norm 3.1379 (3.5872) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:45:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][300/625] eta 0:02:16 lr 0.000271 wd 0.0500 time 0.5956 (0.4191) data time 0.0007 (0.0027) model time 0.5948 (0.4197) loss 6.9077 (6.7659) grad_norm 2.9630 (3.5643) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:45:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][310/625] eta 0:02:12 lr 0.000270 wd 0.0500 time 0.5862 (0.4214) data time 0.0009 (0.0026) model time 0.5853 (0.4224) loss 7.2114 (6.7841) grad_norm 2.9127 (3.5321) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 07:45:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][320/625] eta 0:02:08 lr 0.000270 wd 0.0500 time 0.3946 (0.4224) data time 0.0007 (0.0026) model time 0.3939 (0.4235) loss 7.1766 (6.7835) grad_norm 4.2354 (3.6067) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:45:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][330/625] eta 0:02:04 lr 0.000270 wd 0.0500 time 0.4051 (0.4231) data time 0.0007 (0.0025) model time 0.4044 (0.4243) loss 7.8611 (6.7962) grad_norm 2.9731 (3.5926) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:45:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][340/625] eta 0:02:00 lr 0.000270 wd 0.0500 time 0.3962 (0.4224) data time 0.0007 (0.0025) model time 0.3955 (0.4234) loss 6.6095 (6.8126) grad_norm 2.3885 (3.6217) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:45:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][350/625] eta 0:01:55 lr 0.000270 wd 0.0500 time 0.3962 (0.4218) data time 0.0010 (0.0024) model time 0.3952 (0.4226) loss 6.6024 (6.8020) grad_norm 2.4300 (3.5993) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:46:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][360/625] eta 0:01:51 lr 0.000270 wd 0.0500 time 0.4005 (0.4212) data time 0.0008 (0.0024) model time 0.3997 (0.4219) loss 6.6722 (6.7959) grad_norm 2.1188 (3.5752) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:46:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][370/625] eta 0:01:47 lr 0.000270 wd 0.0500 time 0.3983 (0.4207) data time 0.0009 (0.0024) model time 0.3975 (0.4212) loss 6.9765 (6.7944) grad_norm 1.8389 (3.5774) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:46:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][380/625] eta 0:01:42 lr 0.000270 wd 0.0500 time 
0.3999 (0.4203) data time 0.0009 (0.0023) model time 0.3990 (0.4207) loss 6.4433 (6.7909) grad_norm 2.5218 (3.5852) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:46:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][390/625] eta 0:01:38 lr 0.000270 wd 0.0500 time 0.3981 (0.4198) data time 0.0009 (0.0023) model time 0.3972 (0.4201) loss 6.6852 (6.7906) grad_norm 3.7021 (3.5822) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:46:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][400/625] eta 0:01:34 lr 0.000270 wd 0.0500 time 0.4010 (0.4193) data time 0.0008 (0.0022) model time 0.4002 (0.4195) loss 6.6988 (6.7870) grad_norm 4.5359 (3.6015) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:46:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][410/625] eta 0:01:30 lr 0.000270 wd 0.0500 time 0.3963 (0.4188) data time 0.0009 (0.0022) model time 0.3954 (0.4189) loss 7.1474 (6.7915) grad_norm 1.7591 (3.5876) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:46:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][420/625] eta 0:01:25 lr 0.000270 wd 0.0500 time 0.3966 (0.4183) data time 0.0008 (0.0022) model time 0.3958 (0.4183) loss 7.3982 (6.7903) grad_norm 2.1297 (3.5853) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:46:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][430/625] eta 0:01:21 lr 0.000269 wd 0.0500 time 0.4074 (0.4179) data time 0.0006 (0.0021) model time 0.4067 (0.4179) loss 6.8728 (6.7921) grad_norm 4.2778 (3.5818) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:46:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][440/625] eta 0:01:17 lr 0.000269 wd 0.0500 time 0.3967 (0.4175) data time 0.0007 (0.0021) model time 0.3960 (0.4174) loss 8.3642 (6.8030) grad_norm 3.5791 (3.5683) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 07:46:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][450/625] eta 0:01:12 lr 0.000269 wd 0.0500 time 0.4037 (0.4171) data time 0.0007 (0.0021) model time 0.4030 (0.4169) loss 7.5697 (6.8122) grad_norm 2.4461 (3.5477) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:46:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][460/625] eta 0:01:08 lr 0.000269 wd 0.0500 time 0.3993 (0.4167) data time 0.0008 (0.0021) model time 0.3985 (0.4165) loss 7.1169 (6.8154) grad_norm 2.2994 (3.5322) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:46:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][470/625] eta 0:01:04 lr 0.000269 wd 0.0500 time 0.3955 (0.4164) data time 0.0008 (0.0020) model time 0.3946 (0.4161) loss 7.4628 (6.8086) grad_norm 2.3856 (3.5069) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:46:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][480/625] eta 0:01:00 lr 0.000269 wd 0.0500 time 0.3959 (0.4161) data time 0.0007 (0.0020) model time 0.3952 (0.4157) loss 6.4403 (6.8049) grad_norm 2.2329 (3.4926) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:46:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][490/625] eta 0:00:56 lr 0.000269 wd 0.0500 time 0.4062 (0.4158) data time 0.0008 (0.0020) model time 0.4053 (0.4154) loss 7.0864 (6.8040) grad_norm 1.8202 (3.4718) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:47:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][500/625] eta 0:00:52 lr 0.000269 wd 0.0500 time 0.3993 (0.4160) data time 0.0007 (0.0020) model time 0.3985 (0.4156) loss 7.9400 (6.8134) grad_norm 2.0969 (3.4649) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:47:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][510/625] eta 0:00:47 lr 0.000269 wd 0.0500 time 
0.3967 (0.4168) data time 0.0008 (0.0019) model time 0.3959 (0.4164) loss 7.8736 (6.8169) grad_norm 2.7541 (3.4434) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:47:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][520/625] eta 0:00:43 lr 0.000269 wd 0.0500 time 0.5971 (0.4182) data time 0.0009 (0.0019) model time 0.5962 (0.4181) loss 7.0149 (6.8200) grad_norm 2.4007 (3.4314) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:47:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][530/625] eta 0:00:39 lr 0.000269 wd 0.0500 time 0.5903 (0.4196) data time 0.0010 (0.0019) model time 0.5893 (0.4196) loss 7.9482 (6.8229) grad_norm 3.8419 (3.4422) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:47:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][540/625] eta 0:00:35 lr 0.000268 wd 0.0500 time 0.3976 (0.4202) data time 0.0009 (0.0019) model time 0.3967 (0.4202) loss 7.5300 (6.8316) grad_norm 2.4443 (3.4628) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:47:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][550/625] eta 0:00:31 lr 0.000268 wd 0.0500 time 0.4025 (0.4209) data time 0.0006 (0.0019) model time 0.4019 (0.4210) loss 7.7233 (6.8281) grad_norm 2.4086 (3.4904) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:47:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][560/625] eta 0:00:27 lr 0.000268 wd 0.0500 time 0.4027 (0.4206) data time 0.0007 (0.0018) model time 0.4020 (0.4206) loss 7.0976 (6.8284) grad_norm 2.3546 (3.5019) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:47:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][570/625] eta 0:00:23 lr 0.000268 wd 0.0500 time 0.4126 (0.4203) data time 0.0009 (0.0018) model time 0.4117 (0.4203) loss 7.9382 (6.8293) grad_norm 2.0139 (3.4909) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 07:47:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][580/625] eta 0:00:18 lr 0.000268 wd 0.0500 time 0.4157 (0.4201) data time 0.0009 (0.0018) model time 0.4148 (0.4200) loss 7.1221 (6.8264) grad_norm 3.9509 (3.4814) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:47:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][590/625] eta 0:00:14 lr 0.000268 wd 0.0500 time 0.4068 (0.4199) data time 0.0007 (0.0018) model time 0.4061 (0.4198) loss 7.2616 (6.8199) grad_norm 3.8724 (3.4703) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:47:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][600/625] eta 0:00:10 lr 0.000268 wd 0.0500 time 0.3967 (0.4197) data time 0.0009 (0.0018) model time 0.3958 (0.4196) loss 5.7344 (6.8247) grad_norm 2.0664 (3.4924) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:47:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][610/625] eta 0:00:06 lr 0.000268 wd 0.0500 time 0.3970 (0.4194) data time 0.0005 (0.0018) model time 0.3965 (0.4192) loss 6.5321 (6.8222) grad_norm 3.5374 (3.4948) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:47:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][620/625] eta 0:00:02 lr 0.000268 wd 0.0500 time 0.3957 (0.4191) data time 0.0007 (0.0017) model time 0.3950 (0.4189) loss 6.2513 (6.8225) grad_norm 3.9677 (3.4975) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:47:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 213 training takes 0:04:21 [2024-07-25 07:47:53 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 07:47:54 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 07:47:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.657 (0.657) Loss 0.5684 (0.5684) Acc@1 89.941 (89.941) Acc@5 98.584 (98.584) Mem 14939MB
[2024-07-25 07:47:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.138) Loss 0.8525 (0.6832) Acc@1 82.178 (86.728) Acc@5 96.582 (97.758) Mem 14939MB
[2024-07-25 07:47:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.113) Loss 0.9727 (0.7965) Acc@1 77.832 (83.594) Acc@5 95.020 (96.594) Mem 14939MB
[2024-07-25 07:47:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.135 Acc@5 96.565
[2024-07-25 07:47:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.1%
[2024-07-25 07:47:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.14%
[2024-07-25 07:47:57 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving......
[2024-07-25 07:47:57 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!!
[2024-07-25 07:48:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 3.114 (3.114) Loss 0.5435 (0.5435) Acc@1 89.941 (89.941) Acc@5 98.828 (98.828) Mem 14939MB
[2024-07-25 07:48:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.102 (0.364) Loss 0.8403 (0.6708) Acc@1 81.836 (86.683) Acc@5 96.484 (97.829) Mem 14939MB
[2024-07-25 07:48:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.231) Loss 0.9648 (0.7824) Acc@1 77.490 (83.652) Acc@5 95.703 (96.770) Mem 14939MB
[2024-07-25 07:48:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.227 Acc@5 96.747
[2024-07-25 07:48:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.2%
[2024-07-25 07:48:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.23%
[2024-07-25 07:48:03 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 07:48:03 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!!
[2024-07-25 07:48:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][0/625] eta 0:11:08 lr 0.000268 wd 0.0500 time 1.0696 (1.0696) data time 0.4349 (0.4349) model time 0.0000 (0.0000) loss 6.5121 (6.5121) grad_norm 4.4762 (4.4762) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:48:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][10/625] eta 0:04:50 lr 0.000268 wd 0.0500 time 0.4011 (0.4721) data time 0.0008 (0.0403) model time 0.0000 (0.0000) loss 5.9311 (6.5264) grad_norm 2.6970 (2.8329) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:48:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][20/625] eta 0:04:27 lr 0.000268 wd 0.0500 time 0.4103 (0.4417) data time 0.0007 (0.0215) model time 0.0000 (0.0000) loss 7.3116 (6.7374) grad_norm 2.4902 (3.0881) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:48:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][30/625] eta 0:04:16 lr 0.000267 wd 0.0500 time 0.4149 (0.4306) data time 0.0010 (0.0148) model time 0.0000 (0.0000) loss 7.3268 (6.8213) grad_norm 2.3242 (3.1962) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:48:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][40/625] eta 0:04:08 lr 0.000267 wd 0.0500 time 0.4061 (0.4251) data time 0.0007 (0.0114) model time 0.0000 (0.0000) loss 6.5352 (6.8001) grad_norm 2.0570 (3.0684) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:48:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][50/625] eta 0:04:02 lr 0.000267 wd 0.0500 time 0.3980 (0.4210) data time 0.0010 (0.0094) model time 0.0000 (0.0000) loss 7.4432 (6.7461) grad_norm 2.3014 (2.9670) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:48:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][60/625] eta 0:03:55 lr 0.000267 wd 0.0500 time 0.4068 (0.4176) 
data time 0.0009 (0.0080) model time 0.4059 (0.3993) loss 6.5646 (6.7368) grad_norm 5.1976 (3.0174) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:48:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][70/625] eta 0:03:50 lr 0.000267 wd 0.0500 time 0.3972 (0.4148) data time 0.0006 (0.0070) model time 0.3965 (0.3983) loss 6.7270 (6.7220) grad_norm 4.6302 (3.1011) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:48:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][80/625] eta 0:03:45 lr 0.000267 wd 0.0500 time 0.4039 (0.4128) data time 0.0006 (0.0062) model time 0.4032 (0.3981) loss 6.8960 (6.7384) grad_norm 4.8682 (3.3096) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:48:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][90/625] eta 0:03:41 lr 0.000267 wd 0.0500 time 0.4033 (0.4136) data time 0.0009 (0.0056) model time 0.4024 (0.4032) loss 6.1956 (6.7073) grad_norm 3.1169 (3.3137) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:48:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][100/625] eta 0:03:37 lr 0.000267 wd 0.0500 time 0.3951 (0.4143) data time 0.0010 (0.0051) model time 0.3941 (0.4067) loss 6.6815 (6.7054) grad_norm 4.6622 (3.4147) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:48:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][110/625] eta 0:03:35 lr 0.000267 wd 0.0500 time 0.4008 (0.4181) data time 0.0008 (0.0047) model time 0.3999 (0.4148) loss 7.4734 (6.7170) grad_norm 3.0126 (3.3654) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:48:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][120/625] eta 0:03:33 lr 0.000267 wd 0.0500 time 0.5927 (0.4223) data time 0.0008 (0.0044) model time 0.5919 (0.4224) loss 6.8905 (6.7154) grad_norm 3.1163 (3.3174) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
07:48:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][130/625] eta 0:03:30 lr 0.000267 wd 0.0500 time 0.4034 (0.4262) data time 0.0008 (0.0041) model time 0.4026 (0.4286) loss 7.8020 (6.6936) grad_norm 3.1460 (3.2566) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:49:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][140/625] eta 0:03:26 lr 0.000267 wd 0.0500 time 0.5143 (0.4262) data time 0.0007 (0.0039) model time 0.5135 (0.4284) loss 6.8183 (6.6977) grad_norm 4.1943 (3.2232) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:49:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][150/625] eta 0:03:22 lr 0.000266 wd 0.0500 time 0.4047 (0.4271) data time 0.0006 (0.0037) model time 0.4041 (0.4293) loss 6.7364 (6.7032) grad_norm 3.3472 (3.1853) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:49:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][160/625] eta 0:03:17 lr 0.000266 wd 0.0500 time 0.3955 (0.4253) data time 0.0009 (0.0035) model time 0.3946 (0.4265) loss 5.7736 (6.6778) grad_norm 2.6506 (3.1731) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:49:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][170/625] eta 0:03:12 lr 0.000266 wd 0.0500 time 0.4079 (0.4239) data time 0.0007 (0.0034) model time 0.4073 (0.4243) loss 7.9658 (6.6980) grad_norm 1.8318 (3.1697) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:49:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][180/625] eta 0:03:08 lr 0.000266 wd 0.0500 time 0.4153 (0.4226) data time 0.0006 (0.0032) model time 0.4147 (0.4224) loss 6.7670 (6.6957) grad_norm 2.3170 (3.1675) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:49:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][190/625] eta 0:03:03 lr 0.000266 wd 0.0500 time 0.4027 (0.4216) data 
time 0.0008 (0.0031) model time 0.4019 (0.4210) loss 6.7094 (6.6893) grad_norm 4.2317 (3.1526) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:49:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][200/625] eta 0:02:58 lr 0.000266 wd 0.0500 time 0.4088 (0.4211) data time 0.0008 (0.0030) model time 0.4079 (0.4203) loss 6.5689 (6.7004) grad_norm 2.3871 (3.1018) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:49:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][210/625] eta 0:02:54 lr 0.000266 wd 0.0500 time 0.4037 (0.4205) data time 0.0009 (0.0029) model time 0.4028 (0.4195) loss 6.9315 (6.7119) grad_norm 2.4614 (3.0843) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:49:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][220/625] eta 0:02:50 lr 0.000266 wd 0.0500 time 0.3993 (0.4205) data time 0.0008 (0.0028) model time 0.3985 (0.4195) loss 6.9656 (6.7192) grad_norm 1.6518 (3.0625) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:49:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][230/625] eta 0:02:45 lr 0.000266 wd 0.0500 time 0.4032 (0.4197) data time 0.0008 (0.0027) model time 0.4023 (0.4185) loss 7.8241 (6.7051) grad_norm 2.4101 (3.0444) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:49:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][240/625] eta 0:02:41 lr 0.000266 wd 0.0500 time 0.3952 (0.4187) data time 0.0008 (0.0026) model time 0.3944 (0.4173) loss 7.2122 (6.7041) grad_norm 2.2484 (3.0271) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:49:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][250/625] eta 0:02:36 lr 0.000266 wd 0.0500 time 0.3998 (0.4179) data time 0.0008 (0.0026) model time 0.3990 (0.4163) loss 6.2331 (6.6923) grad_norm 4.8443 (3.0187) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
07:49:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][260/625] eta 0:02:32 lr 0.000265 wd 0.0500 time 0.3964 (0.4171) data time 0.0008 (0.0025) model time 0.3956 (0.4154) loss 6.6395 (6.7040) grad_norm 3.1059 (3.0394) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:49:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][270/625] eta 0:02:27 lr 0.000265 wd 0.0500 time 0.3952 (0.4165) data time 0.0009 (0.0024) model time 0.3943 (0.4146) loss 7.5938 (6.7253) grad_norm 4.7660 (3.0632) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:50:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][280/625] eta 0:02:23 lr 0.000265 wd 0.0500 time 0.3944 (0.4159) data time 0.0009 (0.0024) model time 0.3935 (0.4139) loss 7.2732 (6.7223) grad_norm 2.3069 (3.0535) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:50:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][290/625] eta 0:02:19 lr 0.000265 wd 0.0500 time 0.4005 (0.4153) data time 0.0009 (0.0023) model time 0.3996 (0.4133) loss 5.5599 (6.7156) grad_norm 2.4240 (3.0897) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:50:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][300/625] eta 0:02:14 lr 0.000265 wd 0.0500 time 0.3975 (0.4148) data time 0.0009 (0.0023) model time 0.3966 (0.4126) loss 6.9576 (6.7101) grad_norm 3.9934 (3.0908) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:50:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][310/625] eta 0:02:10 lr 0.000265 wd 0.0500 time 0.3977 (0.4149) data time 0.0007 (0.0022) model time 0.3970 (0.4128) loss 7.2635 (6.7192) grad_norm 3.8401 (3.0813) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:50:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][320/625] eta 0:02:06 lr 0.000265 wd 0.0500 time 0.5416 (0.4154) data 
time 0.0009 (0.0022) model time 0.5407 (0.4136) loss 7.5710 (6.7187) grad_norm 2.5559 (3.1352) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:50:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][330/625] eta 0:02:02 lr 0.000265 wd 0.0500 time 0.4044 (0.4159) data time 0.0009 (0.0021) model time 0.4035 (0.4141) loss 6.4821 (6.7150) grad_norm 3.1570 (3.1235) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:50:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][340/625] eta 0:01:59 lr 0.000265 wd 0.0500 time 0.6160 (0.4182) data time 0.0008 (0.0021) model time 0.6152 (0.4169) loss 7.4040 (6.7192) grad_norm 1.9729 (3.1304) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:50:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][350/625] eta 0:01:55 lr 0.000265 wd 0.0500 time 0.3954 (0.4196) data time 0.0007 (0.0021) model time 0.3948 (0.4186) loss 5.9764 (6.7202) grad_norm 5.9916 (3.2017) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:50:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][360/625] eta 0:01:51 lr 0.000265 wd 0.0500 time 0.5977 (0.4206) data time 0.0009 (0.0020) model time 0.5968 (0.4197) loss 7.1042 (6.7262) grad_norm 2.4553 (3.2112) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:50:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][370/625] eta 0:01:47 lr 0.000264 wd 0.0500 time 0.3930 (0.4210) data time 0.0007 (0.0020) model time 0.3923 (0.4201) loss 5.2707 (6.7187) grad_norm 2.1890 (3.2333) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:50:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][380/625] eta 0:01:42 lr 0.000264 wd 0.0500 time 0.3970 (0.4204) data time 0.0008 (0.0020) model time 0.3962 (0.4194) loss 5.7875 (6.7127) grad_norm 2.3347 (3.2248) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
07:50:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][390/625] eta 0:01:38 lr 0.000264 wd 0.0500 time 0.3991 (0.4198) data time 0.0009 (0.0019) model time 0.3982 (0.4188) loss 7.1797 (6.7185) grad_norm 2.2966 (3.2046) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:50:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][400/625] eta 0:01:34 lr 0.000264 wd 0.0500 time 0.4018 (0.4192) data time 0.0009 (0.0019) model time 0.4010 (0.4182) loss 7.0654 (6.7187) grad_norm 2.8487 (3.1913) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:50:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][410/625] eta 0:01:30 lr 0.000264 wd 0.0500 time 0.4056 (0.4187) data time 0.0007 (0.0019) model time 0.4048 (0.4175) loss 6.9046 (6.7268) grad_norm 3.8391 (3.1768) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:51:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][420/625] eta 0:01:25 lr 0.000264 wd 0.0500 time 0.3948 (0.4182) data time 0.0007 (0.0019) model time 0.3942 (0.4170) loss 6.8006 (6.7292) grad_norm 2.4084 (3.1690) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:51:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][430/625] eta 0:01:21 lr 0.000264 wd 0.0500 time 0.3998 (0.4178) data time 0.0007 (0.0018) model time 0.3991 (0.4166) loss 6.6145 (6.7395) grad_norm 2.4323 (3.1582) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:51:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][440/625] eta 0:01:17 lr 0.000264 wd 0.0500 time 0.4032 (0.4178) data time 0.0009 (0.0018) model time 0.4024 (0.4165) loss 6.6359 (6.7421) grad_norm 2.6888 (3.1666) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:51:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][450/625] eta 0:01:13 lr 0.000264 wd 0.0500 time 0.3961 (0.4175) data 
time 0.0007 (0.0018) model time 0.3953 (0.4162) loss 6.5281 (6.7378) grad_norm 2.1487 (3.1670) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:51:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][460/625] eta 0:01:08 lr 0.000264 wd 0.0500 time 0.3971 (0.4172) data time 0.0009 (0.0018) model time 0.3962 (0.4159) loss 6.9710 (6.7387) grad_norm 3.8995 (3.1691) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:51:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][470/625] eta 0:01:04 lr 0.000264 wd 0.0500 time 0.3976 (0.4168) data time 0.0010 (0.0018) model time 0.3967 (0.4154) loss 7.1464 (6.7430) grad_norm 2.6988 (3.1765) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:51:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][480/625] eta 0:01:00 lr 0.000264 wd 0.0500 time 0.4109 (0.4164) data time 0.0008 (0.0017) model time 0.4102 (0.4151) loss 7.9373 (6.7471) grad_norm 2.1354 (3.1656) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:51:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][490/625] eta 0:00:56 lr 0.000263 wd 0.0500 time 0.4002 (0.4161) data time 0.0006 (0.0017) model time 0.3996 (0.4146) loss 5.5281 (6.7398) grad_norm 1.7583 (3.1780) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:51:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][500/625] eta 0:00:51 lr 0.000263 wd 0.0500 time 0.4123 (0.4158) data time 0.0006 (0.0017) model time 0.4117 (0.4144) loss 6.5615 (6.7363) grad_norm 3.0172 (3.1609) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:51:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][510/625] eta 0:00:47 lr 0.000263 wd 0.0500 time 0.4057 (0.4156) data time 0.0006 (0.0017) model time 0.4050 (0.4141) loss 7.6063 (6.7371) grad_norm 3.3783 (3.1599) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 
07:51:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][520/625] eta 0:00:43 lr 0.000263 wd 0.0500 time 0.4074 (0.4152) data time 0.0008 (0.0017) model time 0.4065 (0.4138) loss 6.8847 (6.7302) grad_norm 2.3421 (3.1642) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:51:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][530/625] eta 0:00:39 lr 0.000263 wd 0.0500 time 0.3984 (0.4153) data time 0.0006 (0.0017) model time 0.3978 (0.4139) loss 7.0101 (6.7342) grad_norm 3.8236 (3.1941) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:51:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][540/625] eta 0:00:35 lr 0.000263 wd 0.0500 time 0.5966 (0.4157) data time 0.0009 (0.0016) model time 0.5958 (0.4143) loss 5.7764 (6.7360) grad_norm 2.4108 (3.1856) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:51:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][550/625] eta 0:00:31 lr 0.000263 wd 0.0500 time 0.5918 (0.4163) data time 0.0009 (0.0016) model time 0.5909 (0.4150) loss 6.1403 (6.7423) grad_norm 4.5295 (3.2106) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:51:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][560/625] eta 0:00:27 lr 0.000263 wd 0.0500 time 0.5964 (0.4172) data time 0.0009 (0.0016) model time 0.5955 (0.4160) loss 7.7959 (6.7368) grad_norm 2.2647 (3.2061) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 07:52:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][570/625] eta 0:00:22 lr 0.000263 wd 0.0500 time 0.4004 (0.4181) data time 0.0009 (0.0016) model time 0.3995 (0.4170) loss 6.5421 (6.7425) grad_norm 3.0316 (3.2133) loss_scale 1024.0000 (518.2767) mem 14939MB [2024-07-25 07:52:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][580/625] eta 0:00:18 lr 0.000263 wd 0.0500 time 0.3982 (0.4184) data 
time 0.0007 (0.0016) model time 0.3976 (0.4173) loss 6.5181 (6.7442) grad_norm 2.6461 (3.2135) loss_scale 1024.0000 (526.9811) mem 14939MB [2024-07-25 07:52:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][590/625] eta 0:00:14 lr 0.000263 wd 0.0500 time 0.3999 (0.4185) data time 0.0007 (0.0016) model time 0.3992 (0.4174) loss 5.8475 (6.7412) grad_norm 3.4512 (3.2094) loss_scale 1024.0000 (535.3909) mem 14939MB [2024-07-25 07:52:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][600/625] eta 0:00:10 lr 0.000262 wd 0.0500 time 0.3924 (0.4182) data time 0.0010 (0.0016) model time 0.3914 (0.4171) loss 7.2926 (6.7393) grad_norm 2.3141 (3.1987) loss_scale 1024.0000 (543.5208) mem 14939MB [2024-07-25 07:52:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][610/625] eta 0:00:06 lr 0.000262 wd 0.0500 time 0.3991 (0.4179) data time 0.0006 (0.0016) model time 0.3984 (0.4168) loss 6.0548 (6.7394) grad_norm 2.5662 (3.1906) loss_scale 1024.0000 (551.3846) mem 14939MB [2024-07-25 07:52:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][620/625] eta 0:00:02 lr 0.000262 wd 0.0500 time 0.4059 (0.4176) data time 0.0007 (0.0015) model time 0.4052 (0.4165) loss 6.5913 (6.7479) grad_norm 2.7527 (3.1844) loss_scale 1024.0000 (558.9952) mem 14939MB [2024-07-25 07:52:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 214 training takes 0:04:20 [2024-07-25 07:52:24 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 07:52:25 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 07:52:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.757 (0.757) Loss 0.5669 (0.5669) Acc@1 89.746 (89.746) Acc@5 98.730 (98.730) Mem 14939MB
[2024-07-25 07:52:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.147) Loss 0.8789 (0.6891) Acc@1 80.957 (86.630) Acc@5 96.338 (97.754) Mem 14939MB
[2024-07-25 07:52:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.118) Loss 0.9824 (0.8010) Acc@1 77.734 (83.536) Acc@5 95.020 (96.587) Mem 14939MB
[2024-07-25 07:52:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.145 Acc@5 96.555
[2024-07-25 07:52:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.1%
[2024-07-25 07:52:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.15%
[2024-07-25 07:52:28 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving......
[2024-07-25 07:52:29 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!!
[2024-07-25 07:52:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.647 (1.647) Loss 0.5430 (0.5430) Acc@1 89.941 (89.941) Acc@5 98.877 (98.877) Mem 14939MB
[2024-07-25 07:52:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.228) Loss 0.8403 (0.6703) Acc@1 81.689 (86.705) Acc@5 96.533 (97.852) Mem 14939MB
[2024-07-25 07:52:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.161) Loss 0.9639 (0.7816) Acc@1 77.539 (83.675) Acc@5 95.654 (96.777) Mem 14939MB
[2024-07-25 07:52:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.253 Acc@5 96.753
[2024-07-25 07:52:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.3%
[2024-07-25 07:52:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.25%
[2024-07-25 07:52:33 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 07:52:34 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!!
[2024-07-25 07:52:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][0/625] eta 0:08:11 lr 0.000262 wd 0.0500 time 0.7870 (0.7870) data time 0.3985 (0.3985) model time 0.0000 (0.0000) loss 7.2191 (7.2191) grad_norm 3.2582 (3.2582) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 07:52:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][10/625] eta 0:04:33 lr 0.000262 wd 0.0500 time 0.4116 (0.4451) data time 0.0007 (0.0370) model time 0.0000 (0.0000) loss 5.2223 (6.4396) grad_norm 6.5072 (3.7012) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 07:52:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][20/625] eta 0:04:17 lr 0.000262 wd 0.0500 time 0.3827 (0.4262) data time 0.0010 (0.0198) model time 0.0000 (0.0000) loss 6.1105 (6.5252) grad_norm 3.5703 (3.8102) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 07:52:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][30/625] eta 0:04:08 lr 0.000262 wd 0.0500 time 0.3942 (0.4178) data time 0.0007 (0.0137) model time 0.0000 (0.0000) loss 7.0696 (6.6710) grad_norm 3.2454 (3.6095) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 07:52:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][40/625] eta 0:04:01 lr 0.000262 wd 0.0500 time 0.3985 (0.4136) data time 0.0007 (0.0106) model time 0.0000 (0.0000) loss 7.0568 (6.7509) grad_norm 2.2572 (3.4037) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 07:52:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][50/625] eta 0:03:56 lr 0.000262 wd 0.0500 time 0.3972 (0.4113) data time 0.0009 (0.0087) model time 0.0000 (0.0000) loss 5.7774 (6.7667) grad_norm 3.2874 (3.2973) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 07:52:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][60/625] eta 0:03:51 lr 0.000262 wd 0.0500 time 
0.4000 (0.4091) data time 0.0010 (0.0074) model time 0.3990 (0.3970) loss 7.3430 (6.7745) grad_norm 5.7716 (3.2639) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 07:53:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][70/625] eta 0:03:46 lr 0.000262 wd 0.0500 time 0.4010 (0.4080) data time 0.0007 (0.0065) model time 0.4003 (0.3988) loss 7.4793 (6.7868) grad_norm 4.4596 (3.3500) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 07:53:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][80/625] eta 0:03:41 lr 0.000262 wd 0.0500 time 0.3923 (0.4067) data time 0.0006 (0.0058) model time 0.3917 (0.3980) loss 6.8801 (6.7670) grad_norm 2.2731 (3.3267) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 07:53:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][90/625] eta 0:03:37 lr 0.000261 wd 0.0500 time 0.4029 (0.4060) data time 0.0007 (0.0052) model time 0.4022 (0.3985) loss 7.0243 (6.7735) grad_norm 3.0427 (3.2883) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 07:53:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][100/625] eta 0:03:32 lr 0.000261 wd 0.0500 time 0.3968 (0.4053) data time 0.0007 (0.0048) model time 0.3961 (0.3985) loss 5.8862 (6.7902) grad_norm 3.5480 (3.1989) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 07:53:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][110/625] eta 0:03:28 lr 0.000261 wd 0.0500 time 0.3936 (0.4046) data time 0.0009 (0.0044) model time 0.3927 (0.3981) loss 7.3353 (6.7761) grad_norm 2.6122 (3.1255) loss_scale 1024.0000 (1024.0000) mem 14939MB [2024-07-25 07:53:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][120/625] eta 0:03:24 lr 0.000261 wd 0.0500 time 0.4053 (0.4044) data time 0.0008 (0.0041) model time 0.4046 (0.3986) loss 8.2819 (6.7991) grad_norm 2.5687 (3.0861) loss_scale 1024.0000 (1024.0000) 
mem 14939MB [2024-07-25 07:53:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][130/625] eta 0:03:20 lr 0.000261 wd 0.0500 time 0.4026 (0.4055) data time 0.0008 (0.0039) model time 0.4018 (0.4009) loss 7.0796 (6.8195) grad_norm 4.4574 (inf) loss_scale 512.0000 (984.9160) mem 14939MB [2024-07-25 07:53:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][140/625] eta 0:03:17 lr 0.000261 wd 0.0500 time 0.3997 (0.4082) data time 0.0010 (0.0037) model time 0.3987 (0.4057) loss 7.1931 (6.8597) grad_norm 2.8523 (inf) loss_scale 512.0000 (951.3759) mem 14939MB [2024-07-25 07:53:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][150/625] eta 0:03:15 lr 0.000261 wd 0.0500 time 0.5635 (0.4124) data time 0.0009 (0.0035) model time 0.5626 (0.4122) loss 6.4875 (6.8300) grad_norm 2.2767 (inf) loss_scale 512.0000 (922.2781) mem 14939MB [2024-07-25 07:53:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][160/625] eta 0:03:14 lr 0.000261 wd 0.0500 time 0.5787 (0.4184) data time 0.0009 (0.0033) model time 0.5779 (0.4209) loss 7.6679 (6.8532) grad_norm 3.7040 (inf) loss_scale 512.0000 (896.7950) mem 14939MB [2024-07-25 07:53:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][170/625] eta 0:03:11 lr 0.000261 wd 0.0500 time 0.5521 (0.4213) data time 0.0007 (0.0032) model time 0.5515 (0.4248) loss 6.5924 (6.8578) grad_norm 2.0124 (inf) loss_scale 512.0000 (874.2924) mem 14939MB [2024-07-25 07:53:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][180/625] eta 0:03:08 lr 0.000261 wd 0.0500 time 0.3985 (0.4232) data time 0.0006 (0.0030) model time 0.3979 (0.4271) loss 6.1152 (6.8520) grad_norm 1.9152 (inf) loss_scale 512.0000 (854.2762) mem 14939MB [2024-07-25 07:53:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][190/625] eta 0:03:03 lr 0.000261 wd 0.0500 time 0.3986 (0.4227) 
data time 0.0008 (0.0029) model time 0.3978 (0.4261) loss 6.7654 (6.8387) grad_norm 1.7446 (inf) loss_scale 512.0000 (836.3560) mem 14939MB [2024-07-25 07:53:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][200/625] eta 0:02:59 lr 0.000261 wd 0.0500 time 0.3980 (0.4216) data time 0.0008 (0.0028) model time 0.3972 (0.4242) loss 7.9588 (6.8305) grad_norm 2.5893 (inf) loss_scale 512.0000 (820.2189) mem 14939MB [2024-07-25 07:54:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][210/625] eta 0:02:54 lr 0.000260 wd 0.0500 time 0.4039 (0.4207) data time 0.0009 (0.0027) model time 0.4030 (0.4228) loss 6.3237 (6.8213) grad_norm 9.8271 (inf) loss_scale 512.0000 (805.6114) mem 14939MB [2024-07-25 07:54:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][220/625] eta 0:02:50 lr 0.000260 wd 0.0500 time 0.4052 (0.4199) data time 0.0008 (0.0026) model time 0.4043 (0.4217) loss 7.4194 (6.8189) grad_norm 3.3425 (inf) loss_scale 512.0000 (792.3258) mem 14939MB [2024-07-25 07:54:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][230/625] eta 0:02:45 lr 0.000260 wd 0.0500 time 0.4013 (0.4192) data time 0.0008 (0.0026) model time 0.4005 (0.4206) loss 7.2886 (6.8202) grad_norm 3.6005 (inf) loss_scale 512.0000 (780.1905) mem 14939MB [2024-07-25 07:54:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][240/625] eta 0:02:41 lr 0.000260 wd 0.0500 time 0.4027 (0.4185) data time 0.0006 (0.0025) model time 0.4021 (0.4196) loss 7.9916 (6.8208) grad_norm 2.1154 (inf) loss_scale 512.0000 (769.0622) mem 14939MB [2024-07-25 07:54:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][250/625] eta 0:02:36 lr 0.000260 wd 0.0500 time 0.3977 (0.4178) data time 0.0009 (0.0024) model time 0.3968 (0.4186) loss 7.5828 (6.8284) grad_norm 3.9808 (inf) loss_scale 512.0000 (758.8207) mem 14939MB [2024-07-25 07:54:23 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][260/625] eta 0:02:32 lr 0.000260 wd 0.0500 time 0.3982 (0.4171) data time 0.0007 (0.0024) model time 0.3975 (0.4177) loss 7.8532 (6.8265) grad_norm 2.7075 (inf) loss_scale 512.0000 (749.3640) mem 14939MB [2024-07-25 07:54:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][270/625] eta 0:02:27 lr 0.000260 wd 0.0500 time 0.4016 (0.4164) data time 0.0008 (0.0023) model time 0.4008 (0.4168) loss 7.3185 (6.8260) grad_norm 2.9290 (inf) loss_scale 512.0000 (740.6052) mem 14939MB [2024-07-25 07:54:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][280/625] eta 0:02:23 lr 0.000260 wd 0.0500 time 0.3954 (0.4158) data time 0.0007 (0.0022) model time 0.3947 (0.4160) loss 7.4180 (6.8137) grad_norm 2.0699 (inf) loss_scale 256.0000 (727.9146) mem 14939MB [2024-07-25 07:54:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][290/625] eta 0:02:19 lr 0.000260 wd 0.0500 time 0.3993 (0.4153) data time 0.0009 (0.0022) model time 0.3984 (0.4153) loss 7.5043 (6.8208) grad_norm 2.6078 (inf) loss_scale 256.0000 (711.6976) mem 14939MB [2024-07-25 07:54:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][300/625] eta 0:02:14 lr 0.000260 wd 0.0500 time 0.4090 (0.4153) data time 0.0009 (0.0022) model time 0.4082 (0.4153) loss 7.4744 (6.8246) grad_norm 2.6224 (inf) loss_scale 256.0000 (696.5581) mem 14939MB [2024-07-25 07:54:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][310/625] eta 0:02:10 lr 0.000260 wd 0.0500 time 0.3992 (0.4148) data time 0.0007 (0.0021) model time 0.3985 (0.4147) loss 5.4070 (6.8095) grad_norm 2.0315 (inf) loss_scale 256.0000 (682.3923) mem 14939MB [2024-07-25 07:54:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][320/625] eta 0:02:06 lr 0.000259 wd 0.0500 time 0.4064 (0.4145) data time 0.0007 (0.0021) model 
time 0.4058 (0.4143) loss 6.7375 (6.8094) grad_norm 7.3154 (inf) loss_scale 256.0000 (669.1090) mem 14939MB [2024-07-25 07:54:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][330/625] eta 0:02:02 lr 0.000259 wd 0.0500 time 0.3983 (0.4141) data time 0.0007 (0.0020) model time 0.3976 (0.4138) loss 5.6136 (6.7963) grad_norm 2.4862 (inf) loss_scale 256.0000 (656.6284) mem 14939MB [2024-07-25 07:54:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][340/625] eta 0:01:57 lr 0.000259 wd 0.0500 time 0.4014 (0.4137) data time 0.0009 (0.0020) model time 0.4006 (0.4134) loss 7.9847 (6.7999) grad_norm 3.4343 (inf) loss_scale 256.0000 (644.8798) mem 14939MB [2024-07-25 07:54:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][350/625] eta 0:01:53 lr 0.000259 wd 0.0500 time 0.4015 (0.4139) data time 0.0006 (0.0020) model time 0.4008 (0.4135) loss 7.1325 (6.8126) grad_norm 2.6458 (inf) loss_scale 256.0000 (633.8006) mem 14939MB [2024-07-25 07:55:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][360/625] eta 0:01:49 lr 0.000259 wd 0.0500 time 0.5155 (0.4151) data time 0.0007 (0.0019) model time 0.5148 (0.4149) loss 7.9774 (6.8176) grad_norm 2.1813 (inf) loss_scale 256.0000 (623.3352) mem 14939MB [2024-07-25 07:55:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][370/625] eta 0:01:46 lr 0.000259 wd 0.0500 time 0.4001 (0.4160) data time 0.0009 (0.0019) model time 0.3992 (0.4159) loss 6.6818 (6.8131) grad_norm 1.9600 (inf) loss_scale 256.0000 (613.4340) mem 14939MB [2024-07-25 07:55:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][380/625] eta 0:01:42 lr 0.000259 wd 0.0500 time 0.4166 (0.4175) data time 0.0007 (0.0019) model time 0.4159 (0.4176) loss 5.7383 (6.8022) grad_norm 2.1355 (inf) loss_scale 256.0000 (604.0525) mem 14939MB [2024-07-25 07:55:18 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [215/300][390/625] eta 0:01:38 lr 0.000259 wd 0.0500 time 0.5994 (0.4199) data time 0.0006 (0.0018) model time 0.5989 (0.4203) loss 6.9832 (6.8062) grad_norm 2.0722 (inf) loss_scale 256.0000 (595.1509) mem 14939MB [2024-07-25 07:55:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][400/625] eta 0:01:34 lr 0.000259 wd 0.0500 time 0.5566 (0.4216) data time 0.0009 (0.0018) model time 0.5558 (0.4223) loss 7.9520 (6.8123) grad_norm 22.2219 (inf) loss_scale 256.0000 (586.6933) mem 14939MB [2024-07-25 07:55:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][410/625] eta 0:01:30 lr 0.000259 wd 0.0500 time 0.4005 (0.4213) data time 0.0008 (0.0018) model time 0.3997 (0.4219) loss 7.3760 (6.8117) grad_norm 3.1077 (inf) loss_scale 256.0000 (578.6472) mem 14939MB [2024-07-25 07:55:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][420/625] eta 0:01:26 lr 0.000259 wd 0.0500 time 0.4013 (0.4208) data time 0.0007 (0.0018) model time 0.4006 (0.4212) loss 6.2592 (6.8164) grad_norm 3.3084 (inf) loss_scale 256.0000 (570.9834) mem 14939MB [2024-07-25 07:55:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][430/625] eta 0:01:21 lr 0.000259 wd 0.0500 time 0.4002 (0.4203) data time 0.0007 (0.0018) model time 0.3995 (0.4207) loss 5.6854 (6.8166) grad_norm 2.0203 (inf) loss_scale 256.0000 (563.6752) mem 14939MB [2024-07-25 07:55:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][440/625] eta 0:01:17 lr 0.000258 wd 0.0500 time 0.4001 (0.4199) data time 0.0007 (0.0017) model time 0.3994 (0.4202) loss 7.1129 (6.8200) grad_norm 3.5425 (inf) loss_scale 256.0000 (556.6984) mem 14939MB [2024-07-25 07:55:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][450/625] eta 0:01:13 lr 0.000258 wd 0.0500 time 0.4025 (0.4195) data time 0.0009 (0.0017) model time 0.4016 (0.4197) loss 
5.9199 (6.8202) grad_norm 3.4598 (inf) loss_scale 256.0000 (550.0310) mem 14939MB [2024-07-25 07:55:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][460/625] eta 0:01:09 lr 0.000258 wd 0.0500 time 0.3958 (0.4190) data time 0.0008 (0.0017) model time 0.3949 (0.4192) loss 7.1551 (6.8208) grad_norm 2.3076 (inf) loss_scale 256.0000 (543.6529) mem 14939MB [2024-07-25 07:55:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][470/625] eta 0:01:04 lr 0.000258 wd 0.0500 time 0.4093 (0.4187) data time 0.0009 (0.0017) model time 0.4084 (0.4187) loss 7.3949 (6.8177) grad_norm 3.6542 (inf) loss_scale 256.0000 (537.5456) mem 14939MB [2024-07-25 07:55:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][480/625] eta 0:01:00 lr 0.000258 wd 0.0500 time 0.4086 (0.4183) data time 0.0008 (0.0017) model time 0.4078 (0.4183) loss 7.6907 (6.8171) grad_norm 3.2361 (inf) loss_scale 256.0000 (531.6923) mem 14939MB [2024-07-25 07:55:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][490/625] eta 0:00:56 lr 0.000258 wd 0.0500 time 0.3980 (0.4179) data time 0.0007 (0.0016) model time 0.3974 (0.4179) loss 6.0229 (6.8266) grad_norm 2.5181 (inf) loss_scale 256.0000 (526.0774) mem 14939MB [2024-07-25 07:56:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][500/625] eta 0:00:52 lr 0.000258 wd 0.0500 time 0.4003 (0.4176) data time 0.0009 (0.0016) model time 0.3994 (0.4175) loss 7.2496 (6.8282) grad_norm 2.2969 (inf) loss_scale 256.0000 (520.6866) mem 14939MB [2024-07-25 07:56:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][510/625] eta 0:00:47 lr 0.000258 wd 0.0500 time 0.4048 (0.4173) data time 0.0007 (0.0016) model time 0.4041 (0.4172) loss 7.3090 (6.8264) grad_norm 3.2095 (inf) loss_scale 256.0000 (515.5068) mem 14939MB [2024-07-25 07:56:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[215/300][520/625] eta 0:00:43 lr 0.000258 wd 0.0500 time 0.3942 (0.4170) data time 0.0011 (0.0016) model time 0.3932 (0.4168) loss 6.0319 (6.8170) grad_norm 1.8896 (inf) loss_scale 256.0000 (510.5259) mem 14939MB [2024-07-25 07:56:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][530/625] eta 0:00:39 lr 0.000258 wd 0.0500 time 0.3981 (0.4167) data time 0.0010 (0.0016) model time 0.3972 (0.4164) loss 5.8027 (6.8144) grad_norm 2.9063 (inf) loss_scale 256.0000 (505.7326) mem 14939MB [2024-07-25 07:56:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][540/625] eta 0:00:35 lr 0.000258 wd 0.0500 time 0.4024 (0.4164) data time 0.0008 (0.0016) model time 0.4016 (0.4161) loss 6.8941 (6.8110) grad_norm 3.5269 (inf) loss_scale 256.0000 (501.1165) mem 14939MB [2024-07-25 07:56:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][550/625] eta 0:00:31 lr 0.000258 wd 0.0500 time 0.4061 (0.4161) data time 0.0009 (0.0016) model time 0.4053 (0.4157) loss 6.5775 (6.8119) grad_norm 2.4587 (inf) loss_scale 256.0000 (496.6679) mem 14939MB [2024-07-25 07:56:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][560/625] eta 0:00:27 lr 0.000257 wd 0.0500 time 0.4013 (0.4158) data time 0.0008 (0.0015) model time 0.4005 (0.4154) loss 6.6977 (6.8113) grad_norm 2.9371 (inf) loss_scale 256.0000 (492.3779) mem 14939MB [2024-07-25 07:56:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][570/625] eta 0:00:22 lr 0.000257 wd 0.0500 time 0.3994 (0.4158) data time 0.0009 (0.0015) model time 0.3984 (0.4154) loss 7.0103 (6.8061) grad_norm 1.9202 (inf) loss_scale 256.0000 (488.2382) mem 14939MB [2024-07-25 07:56:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][580/625] eta 0:00:18 lr 0.000257 wd 0.0500 time 0.6006 (0.4165) data time 0.0007 (0.0015) model time 0.5999 (0.4161) loss 5.5616 (6.8108) grad_norm 3.1577 (inf) 
loss_scale 256.0000 (484.2410) mem 14939MB [2024-07-25 07:56:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][590/625] eta 0:00:14 lr 0.000257 wd 0.0500 time 0.4009 (0.4171) data time 0.0007 (0.0015) model time 0.4002 (0.4168) loss 6.2556 (6.8087) grad_norm 2.7880 (inf) loss_scale 256.0000 (480.3790) mem 14939MB [2024-07-25 07:56:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][600/625] eta 0:00:10 lr 0.000257 wd 0.0500 time 0.6090 (0.4186) data time 0.0009 (0.0015) model time 0.6081 (0.4184) loss 7.4752 (6.8035) grad_norm 2.1457 (inf) loss_scale 256.0000 (476.6456) mem 14939MB [2024-07-25 07:56:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][610/625] eta 0:00:06 lr 0.000257 wd 0.0500 time 0.4052 (0.4194) data time 0.0005 (0.0015) model time 0.4047 (0.4193) loss 7.0043 (6.7995) grad_norm 2.7472 (inf) loss_scale 256.0000 (473.0344) mem 14939MB [2024-07-25 07:56:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][620/625] eta 0:00:02 lr 0.000257 wd 0.0500 time 0.3957 (0.4199) data time 0.0005 (0.0015) model time 0.3952 (0.4198) loss 5.9331 (6.7958) grad_norm 3.3512 (inf) loss_scale 256.0000 (469.5395) mem 14939MB [2024-07-25 07:56:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 215 training takes 0:04:22 [2024-07-25 07:56:56 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 07:56:57 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 07:56:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.468 (0.468) Loss 0.5581 (0.5581) Acc@1 89.502 (89.502) Acc@5 98.682 (98.682) Mem 14939MB [2024-07-25 07:56:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.8623 (0.6780) Acc@1 81.250 (86.439) Acc@5 96.533 (97.798) Mem 14939MB [2024-07-25 07:56:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9814 (0.7915) Acc@1 77.344 (83.522) Acc@5 94.922 (96.652) Mem 14939MB [2024-07-25 07:57:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.115 Acc@5 96.635 [2024-07-25 07:57:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-07-25 07:57:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 5.547 (5.547) Loss 0.5430 (0.5430) Acc@1 89.941 (89.941) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 07:57:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.582) Loss 0.8389 (0.6697) Acc@1 81.787 (86.714) Acc@5 96.484 (97.856) Mem 14939MB [2024-07-25 07:57:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.345) Loss 0.9629 (0.7810) Acc@1 77.490 (83.696) Acc@5 95.703 (96.810) Mem 14939MB [2024-07-25 07:57:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.269 Acc@5 96.777 [2024-07-25 07:57:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-07-25 07:57:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.27% [2024-07-25 07:57:07 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 07:57:08 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 07:57:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][0/625] eta 0:08:31 lr 0.000257 wd 0.0500 time 0.8189 (0.8189) data time 0.4406 (0.4406) model time 0.0000 (0.0000) loss 6.5264 (6.5264) grad_norm 2.3053 (2.3053) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:57:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][10/625] eta 0:04:38 lr 0.000257 wd 0.0500 time 0.4042 (0.4536) data time 0.0007 (0.0408) model time 0.0000 (0.0000) loss 8.4872 (7.0582) grad_norm 5.5053 (3.3004) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:57:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][20/625] eta 0:04:19 lr 0.000257 wd 0.0500 time 0.3961 (0.4286) data time 0.0009 (0.0218) model time 0.0000 (0.0000) loss 6.6214 (7.0183) grad_norm 5.2747 (3.7418) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:57:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][30/625] eta 0:04:09 lr 0.000257 wd 0.0500 time 0.4062 (0.4194) data time 0.0009 (0.0150) model time 0.0000 (0.0000) loss 6.8221 (6.9302) grad_norm 3.3900 (3.9063) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:57:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][40/625] eta 0:04:03 lr 0.000257 wd 0.0500 time 0.4032 (0.4159) data time 0.0009 (0.0116) model time 0.0000 (0.0000) loss 8.1312 (6.9157) grad_norm 2.9373 (3.7984) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:57:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][50/625] eta 0:03:57 lr 0.000256 wd 0.0500 time 0.3973 (0.4138) data time 0.0006 (0.0095) model time 0.0000 (0.0000) loss 6.7441 (6.8830) grad_norm 5.2990 (4.0260) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 07:57:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][60/625] eta 0:03:52 lr 0.000256 wd 0.0500 time 0.4083 (0.4123) data time 0.0007 (0.0081) model time 0.4076 (0.4033) loss 7.9887 (6.8817) grad_norm 4.5578 (3.9159) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:57:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][70/625] eta 0:03:47 lr 0.000256 wd 0.0500 time 0.3941 (0.4105) data time 0.0007 (0.0070) model time 0.3934 (0.4013) loss 5.8215 (6.8762) grad_norm 2.8040 (3.8148) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:57:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][80/625] eta 0:03:42 lr 0.000256 wd 0.0500 time 0.3979 (0.4090) data time 0.0007 (0.0063) model time 0.3972 (0.4000) loss 6.7277 (6.8671) grad_norm 1.9709 (3.6818) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:57:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][90/625] eta 0:03:38 lr 0.000256 wd 0.0500 time 0.3999 (0.4080) data time 0.0009 (0.0057) model time 0.3990 (0.3996) loss 7.8537 (6.8842) grad_norm 4.8915 (4.1178) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:57:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][100/625] eta 0:03:33 lr 0.000256 wd 0.0500 time 0.4003 (0.4073) data time 0.0007 (0.0052) model time 0.3997 (0.3998) loss 5.8351 (6.8803) grad_norm 2.4701 (4.0354) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:57:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][110/625] eta 0:03:29 lr 0.000256 wd 0.0500 time 0.3994 (0.4070) data time 0.0007 (0.0048) model time 0.3987 (0.4002) loss 7.8319 (6.8916) grad_norm 3.5817 (4.1389) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:57:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][120/625] eta 0:03:25 lr 0.000256 wd 0.0500 time 
0.4160 (0.4064) data time 0.0007 (0.0045) model time 0.4153 (0.4002) loss 7.2335 (6.8802) grad_norm 3.8962 (4.0768) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:58:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][130/625] eta 0:03:20 lr 0.000256 wd 0.0500 time 0.3866 (0.4061) data time 0.0011 (0.0042) model time 0.3855 (0.4002) loss 7.2919 (6.8971) grad_norm 3.2440 (4.0313) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:58:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][140/625] eta 0:03:16 lr 0.000256 wd 0.0500 time 0.4004 (0.4057) data time 0.0007 (0.0040) model time 0.3997 (0.4002) loss 7.1376 (6.9213) grad_norm 2.2133 (3.9977) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:58:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][150/625] eta 0:03:13 lr 0.000256 wd 0.0500 time 0.4049 (0.4065) data time 0.0008 (0.0038) model time 0.4041 (0.4019) loss 5.8832 (6.9101) grad_norm 2.5214 (3.9541) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:58:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][160/625] eta 0:03:09 lr 0.000255 wd 0.0500 time 0.4018 (0.4069) data time 0.0010 (0.0036) model time 0.4008 (0.4029) loss 6.3270 (6.9019) grad_norm 54.4571 (4.2143) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:58:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][170/625] eta 0:03:04 lr 0.000255 wd 0.0500 time 0.4050 (0.4065) data time 0.0009 (0.0034) model time 0.4041 (0.4026) loss 5.7524 (6.8865) grad_norm 2.4875 (4.1580) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:58:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][180/625] eta 0:03:01 lr 0.000255 wd 0.0500 time 0.4015 (0.4080) data time 0.0009 (0.0033) model time 0.4007 (0.4049) loss 7.5509 (6.8935) grad_norm 4.9542 (4.0977) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 07:58:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][190/625] eta 0:02:59 lr 0.000255 wd 0.0500 time 0.3981 (0.4126) data time 0.0006 (0.0031) model time 0.3975 (0.4113) loss 5.7132 (6.9023) grad_norm 4.3804 (4.0344) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:58:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][200/625] eta 0:02:57 lr 0.000255 wd 0.0500 time 0.4020 (0.4173) data time 0.0007 (0.0030) model time 0.4013 (0.4177) loss 7.1386 (6.8968) grad_norm 2.2629 (3.9957) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:58:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][210/625] eta 0:02:54 lr 0.000255 wd 0.0500 time 0.3967 (0.4208) data time 0.0008 (0.0029) model time 0.3959 (0.4221) loss 6.0918 (6.9013) grad_norm 3.3105 (3.9495) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:58:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][220/625] eta 0:02:51 lr 0.000255 wd 0.0500 time 0.3937 (0.4222) data time 0.0008 (0.0028) model time 0.3929 (0.4239) loss 6.4482 (6.9068) grad_norm 8.8368 (3.9358) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:58:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][230/625] eta 0:02:46 lr 0.000255 wd 0.0500 time 0.3997 (0.4218) data time 0.0008 (0.0027) model time 0.3989 (0.4232) loss 6.6185 (6.9043) grad_norm 2.6034 (3.9099) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:58:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][240/625] eta 0:02:42 lr 0.000255 wd 0.0500 time 0.3922 (0.4208) data time 0.0006 (0.0027) model time 0.3915 (0.4218) loss 6.9587 (6.8826) grad_norm 2.3563 (3.8623) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:58:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][250/625] eta 0:02:37 lr 0.000255 wd 0.0500 time 
0.3977 (0.4202) data time 0.0007 (0.0026) model time 0.3970 (0.4210) loss 6.2099 (6.8815) grad_norm 3.9053 (3.8108) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:58:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][260/625] eta 0:02:33 lr 0.000255 wd 0.0500 time 0.3970 (0.4194) data time 0.0006 (0.0025) model time 0.3963 (0.4199) loss 6.2588 (6.8777) grad_norm 2.5015 (3.7784) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:59:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][270/625] eta 0:02:28 lr 0.000255 wd 0.0500 time 0.3975 (0.4187) data time 0.0008 (0.0025) model time 0.3968 (0.4189) loss 6.9640 (6.8617) grad_norm 2.2194 (3.7317) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:59:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][280/625] eta 0:02:24 lr 0.000254 wd 0.0500 time 0.3890 (0.4180) data time 0.0009 (0.0024) model time 0.3882 (0.4181) loss 7.0289 (6.8658) grad_norm 2.6068 (3.7032) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:59:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][290/625] eta 0:02:19 lr 0.000254 wd 0.0500 time 0.3980 (0.4174) data time 0.0007 (0.0024) model time 0.3973 (0.4173) loss 5.9118 (6.8633) grad_norm 2.8696 (3.7078) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:59:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][300/625] eta 0:02:15 lr 0.000254 wd 0.0500 time 0.3915 (0.4168) data time 0.0009 (0.0023) model time 0.3906 (0.4166) loss 7.2932 (6.8498) grad_norm 4.0439 (3.6866) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:59:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][310/625] eta 0:02:11 lr 0.000254 wd 0.0500 time 0.3992 (0.4163) data time 0.0008 (0.0023) model time 0.3984 (0.4160) loss 6.0101 (6.8401) grad_norm 1.8712 (3.6594) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 07:59:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][320/625] eta 0:02:06 lr 0.000254 wd 0.0500 time 0.3947 (0.4158) data time 0.0009 (0.0022) model time 0.3937 (0.4153) loss 6.5538 (6.8348) grad_norm 2.1591 (3.6315) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:59:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][330/625] eta 0:02:02 lr 0.000254 wd 0.0500 time 0.3965 (0.4153) data time 0.0007 (0.0022) model time 0.3958 (0.4148) loss 6.2037 (6.8282) grad_norm 3.4684 (3.6141) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:59:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][340/625] eta 0:01:58 lr 0.000254 wd 0.0500 time 0.4021 (0.4149) data time 0.0006 (0.0021) model time 0.4014 (0.4142) loss 5.7741 (6.8352) grad_norm 2.2418 (3.5822) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:59:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][350/625] eta 0:01:53 lr 0.000254 wd 0.0500 time 0.3948 (0.4145) data time 0.0007 (0.0021) model time 0.3941 (0.4138) loss 7.0213 (6.8348) grad_norm 2.6420 (3.5539) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:59:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][360/625] eta 0:01:49 lr 0.000254 wd 0.0500 time 0.3976 (0.4141) data time 0.0007 (0.0021) model time 0.3969 (0.4133) loss 7.1835 (6.8378) grad_norm 4.9005 (3.5371) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:59:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][370/625] eta 0:01:45 lr 0.000254 wd 0.0500 time 0.4132 (0.4142) data time 0.0008 (0.0020) model time 0.4123 (0.4134) loss 6.7603 (6.8332) grad_norm 4.0644 (3.5352) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:59:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][380/625] eta 0:01:41 lr 0.000254 wd 0.0500 time 
0.3961 (0.4139) data time 0.0006 (0.0020) model time 0.3954 (0.4130) loss 7.1859 (6.8296) grad_norm 2.3550 (3.5047) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:59:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][390/625] eta 0:01:37 lr 0.000253 wd 0.0500 time 0.3964 (0.4139) data time 0.0009 (0.0020) model time 0.3956 (0.4130) loss 5.7524 (6.8311) grad_norm 4.3153 (3.4909) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:59:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][400/625] eta 0:01:33 lr 0.000253 wd 0.0500 time 0.3995 (0.4145) data time 0.0006 (0.0019) model time 0.3988 (0.4138) loss 6.4414 (6.8236) grad_norm 2.5080 (3.5183) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 07:59:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][410/625] eta 0:01:29 lr 0.000253 wd 0.0500 time 0.5929 (0.4158) data time 0.0009 (0.0019) model time 0.5919 (0.4153) loss 5.5136 (6.8114) grad_norm 1.9343 (3.5343) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:00:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][420/625] eta 0:01:25 lr 0.000253 wd 0.0500 time 0.3963 (0.4172) data time 0.0009 (0.0019) model time 0.3954 (0.4168) loss 7.3813 (6.8105) grad_norm 8.5878 (3.5490) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:00:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][430/625] eta 0:01:21 lr 0.000253 wd 0.0500 time 0.6060 (0.4184) data time 0.0009 (0.0019) model time 0.6052 (0.4181) loss 5.3413 (6.7945) grad_norm 3.3215 (3.5477) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:00:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][440/625] eta 0:01:17 lr 0.000253 wd 0.0500 time 0.3989 (0.4187) data time 0.0008 (0.0018) model time 0.3980 (0.4185) loss 6.4420 (6.7935) grad_norm 3.5099 (3.5407) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 08:00:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][450/625] eta 0:01:13 lr 0.000253 wd 0.0500 time 0.3939 (0.4186) data time 0.0007 (0.0018) model time 0.3931 (0.4184) loss 6.5718 (6.7943) grad_norm 1.9364 (3.5202) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:00:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][460/625] eta 0:01:09 lr 0.000253 wd 0.0500 time 0.3979 (0.4182) data time 0.0009 (0.0018) model time 0.3970 (0.4179) loss 7.6929 (6.7856) grad_norm 1.8128 (3.5110) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:00:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][470/625] eta 0:01:04 lr 0.000253 wd 0.0500 time 0.4018 (0.4178) data time 0.0008 (0.0018) model time 0.4010 (0.4174) loss 7.0782 (6.7923) grad_norm 2.2194 (3.4921) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:00:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][480/625] eta 0:01:00 lr 0.000253 wd 0.0500 time 0.3984 (0.4174) data time 0.0009 (0.0018) model time 0.3975 (0.4169) loss 6.2703 (6.7851) grad_norm 2.5948 (3.4686) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:00:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][490/625] eta 0:00:56 lr 0.000253 wd 0.0500 time 0.3933 (0.4170) data time 0.0009 (0.0017) model time 0.3924 (0.4165) loss 5.6420 (6.7772) grad_norm 2.4454 (3.4637) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:00:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][500/625] eta 0:00:52 lr 0.000253 wd 0.0500 time 0.3971 (0.4166) data time 0.0007 (0.0017) model time 0.3965 (0.4161) loss 7.2760 (6.7835) grad_norm 5.4996 (3.5474) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:00:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][510/625] eta 0:00:47 lr 0.000252 wd 0.0500 time 
0.3963 (0.4162) data time 0.0006 (0.0017) model time 0.3957 (0.4157) loss 6.2130 (6.7853) grad_norm 2.7322 (3.5306) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:00:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][520/625] eta 0:00:43 lr 0.000252 wd 0.0500 time 0.3968 (0.4159) data time 0.0009 (0.0017) model time 0.3959 (0.4153) loss 6.0689 (6.7865) grad_norm 3.3284 (3.5194) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:00:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][530/625] eta 0:00:39 lr 0.000252 wd 0.0500 time 0.3998 (0.4156) data time 0.0009 (0.0017) model time 0.3989 (0.4150) loss 6.3291 (6.7799) grad_norm 3.7258 (3.5162) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:00:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][540/625] eta 0:00:35 lr 0.000252 wd 0.0500 time 0.3974 (0.4153) data time 0.0008 (0.0016) model time 0.3966 (0.4146) loss 7.5489 (6.7772) grad_norm 3.3831 (3.5245) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:00:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][550/625] eta 0:00:31 lr 0.000252 wd 0.0500 time 0.3970 (0.4150) data time 0.0008 (0.0016) model time 0.3962 (0.4143) loss 6.3177 (6.7803) grad_norm 3.0829 (3.5109) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:01:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][560/625] eta 0:00:26 lr 0.000252 wd 0.0500 time 0.3952 (0.4147) data time 0.0006 (0.0016) model time 0.3945 (0.4140) loss 5.7268 (6.7753) grad_norm 2.9475 (3.4993) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:01:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][570/625] eta 0:00:22 lr 0.000252 wd 0.0500 time 0.3964 (0.4145) data time 0.0009 (0.0016) model time 0.3955 (0.4137) loss 7.6313 (6.7856) grad_norm 2.9965 (3.5079) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 08:01:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][580/625] eta 0:00:18 lr 0.000252 wd 0.0500 time 0.3942 (0.4142) data time 0.0006 (0.0016) model time 0.3936 (0.4134) loss 5.9907 (6.7831) grad_norm 3.4005 (3.4971) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:01:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][590/625] eta 0:00:14 lr 0.000252 wd 0.0500 time 0.4004 (0.4141) data time 0.0007 (0.0016) model time 0.3997 (0.4133) loss 7.1363 (6.7809) grad_norm 2.6188 (3.4891) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:01:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][600/625] eta 0:00:10 lr 0.000252 wd 0.0500 time 0.3988 (0.4138) data time 0.0007 (0.0016) model time 0.3981 (0.4130) loss 6.9037 (6.7787) grad_norm 2.8081 (3.4799) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:01:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][610/625] eta 0:00:06 lr 0.000252 wd 0.0500 time 0.3996 (0.4139) data time 0.0005 (0.0016) model time 0.3990 (0.4131) loss 6.1227 (6.7757) grad_norm 2.8738 (3.4691) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:01:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][620/625] eta 0:00:02 lr 0.000252 wd 0.0500 time 0.3960 (0.4145) data time 0.0005 (0.0015) model time 0.3955 (0.4137) loss 5.8296 (6.7749) grad_norm 3.5052 (3.4668) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:01:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 216 training takes 0:04:19 [2024-07-25 08:01:27 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 08:01:28 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 08:01:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.472 (0.472) Loss 0.5596 (0.5596) Acc@1 89.990 (89.990) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 08:01:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.121) Loss 0.8755 (0.6868) Acc@1 81.445 (86.599) Acc@5 96.436 (97.816) Mem 14939MB [2024-07-25 08:01:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9692 (0.8006) Acc@1 77.490 (83.508) Acc@5 95.361 (96.659) Mem 14939MB [2024-07-25 08:01:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.127 Acc@5 96.643 [2024-07-25 08:01:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-07-25 08:01:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.742 (0.742) Loss 0.5420 (0.5420) Acc@1 89.941 (89.941) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 08:01:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.153) Loss 0.8389 (0.6692) Acc@1 81.738 (86.737) Acc@5 96.436 (97.865) Mem 14939MB [2024-07-25 08:01:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.121) Loss 0.9604 (0.7803) Acc@1 77.588 (83.715) Acc@5 95.752 (96.826) Mem 14939MB [2024-07-25 08:01:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.293 Acc@5 96.793 [2024-07-25 08:01:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-07-25 08:01:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.29% [2024-07-25 08:01:33 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 08:01:34 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 08:01:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][0/625] eta 0:08:28 lr 0.000251 wd 0.0500 time 0.8134 (0.8134) data time 0.4372 (0.4372) model time 0.0000 (0.0000) loss 6.5418 (6.5418) grad_norm 2.5178 (2.5178) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:01:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][10/625] eta 0:05:18 lr 0.000251 wd 0.0500 time 0.5763 (0.5173) data time 0.0008 (0.0405) model time 0.0000 (0.0000) loss 6.6604 (6.7975) grad_norm 2.0971 (2.7498) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:01:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][20/625] eta 0:04:57 lr 0.000251 wd 0.0500 time 0.5246 (0.4918) data time 0.0007 (0.0216) model time 0.0000 (0.0000) loss 6.2913 (6.7362) grad_norm 2.3567 (3.3445) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:01:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][30/625] eta 0:04:49 lr 0.000251 wd 0.0500 time 0.3964 (0.4869) data time 0.0009 (0.0149) model time 0.0000 (0.0000) loss 7.5220 (6.8100) grad_norm 3.8965 (3.3458) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:01:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][40/625] eta 0:04:33 lr 0.000251 wd 0.0500 time 0.3976 (0.4682) data time 0.0009 (0.0115) model time 0.0000 (0.0000) loss 6.9633 (6.8962) grad_norm 1.9841 (3.1979) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:01:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][50/625] eta 0:04:23 lr 0.000251 wd 0.0500 time 0.4038 (0.4580) data time 0.0009 (0.0094) model time 0.0000 (0.0000) loss 6.7016 (6.9080) grad_norm 3.3227 (3.0990) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 08:02:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][60/625] eta 0:04:13 lr 0.000251 wd 0.0500 time 0.3958 (0.4479) data time 0.0008 (0.0080) model time 0.3950 (0.3957) loss 5.7509 (6.8419) grad_norm 2.7161 (2.9679) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:02:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][70/625] eta 0:04:04 lr 0.000251 wd 0.0500 time 0.3955 (0.4407) data time 0.0009 (0.0070) model time 0.3946 (0.3959) loss 6.4179 (6.8306) grad_norm 2.4551 (2.9197) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:02:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][80/625] eta 0:03:57 lr 0.000251 wd 0.0500 time 0.3980 (0.4355) data time 0.0009 (0.0062) model time 0.3971 (0.3964) loss 6.2571 (6.8036) grad_norm 4.4797 (2.8798) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:02:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][90/625] eta 0:03:50 lr 0.000251 wd 0.0500 time 0.3970 (0.4314) data time 0.0009 (0.0056) model time 0.3962 (0.3968) loss 7.2705 (6.7667) grad_norm 2.2914 (2.8754) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:02:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][100/625] eta 0:03:44 lr 0.000251 wd 0.0500 time 0.4002 (0.4281) data time 0.0006 (0.0051) model time 0.3996 (0.3968) loss 6.4653 (6.7772) grad_norm 2.5797 (2.8462) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:02:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][110/625] eta 0:03:39 lr 0.000251 wd 0.0500 time 0.4011 (0.4254) data time 0.0006 (0.0048) model time 0.4005 (0.3969) loss 7.6305 (6.7913) grad_norm 2.3498 (2.8296) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:02:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][120/625] eta 0:03:34 lr 0.000250 wd 0.0500 time 
0.3953 (0.4241) data time 0.0009 (0.0044) model time 0.3944 (0.3987) loss 6.4385 (6.7799) grad_norm 4.6204 (2.8529) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:02:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][130/625] eta 0:03:29 lr 0.000250 wd 0.0500 time 0.3955 (0.4223) data time 0.0009 (0.0042) model time 0.3946 (0.3987) loss 7.7851 (6.7594) grad_norm 2.9848 (2.8868) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:02:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][140/625] eta 0:03:23 lr 0.000250 wd 0.0500 time 0.4011 (0.4206) data time 0.0009 (0.0039) model time 0.4002 (0.3985) loss 6.4569 (6.7576) grad_norm 2.3371 (2.8735) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:02:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][150/625] eta 0:03:19 lr 0.000250 wd 0.0500 time 0.3999 (0.4191) data time 0.0006 (0.0037) model time 0.3992 (0.3984) loss 8.1271 (6.7691) grad_norm 3.1821 (3.0454) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:02:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][160/625] eta 0:03:14 lr 0.000250 wd 0.0500 time 0.3996 (0.4179) data time 0.0006 (0.0035) model time 0.3989 (0.3984) loss 7.3564 (6.7662) grad_norm 4.7039 (3.0662) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:02:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][170/625] eta 0:03:09 lr 0.000250 wd 0.0500 time 0.3993 (0.4168) data time 0.0006 (0.0034) model time 0.3987 (0.3985) loss 6.6750 (6.7574) grad_norm 2.0771 (3.0980) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:02:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][180/625] eta 0:03:05 lr 0.000250 wd 0.0500 time 0.3978 (0.4158) data time 0.0007 (0.0032) model time 0.3971 (0.3984) loss 6.4519 (6.7539) grad_norm 2.2898 (3.1716) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 08:02:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][190/625] eta 0:03:00 lr 0.000250 wd 0.0500 time 0.3984 (0.4149) data time 0.0009 (0.0031) model time 0.3976 (0.3984) loss 5.5076 (6.7359) grad_norm 2.6502 (3.1795) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:02:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][200/625] eta 0:02:56 lr 0.000250 wd 0.0500 time 0.3969 (0.4150) data time 0.0009 (0.0030) model time 0.3960 (0.3995) loss 6.9168 (6.7471) grad_norm 2.5544 (3.1710) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:03:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][210/625] eta 0:02:52 lr 0.000250 wd 0.0500 time 0.6084 (0.4152) data time 0.0006 (0.0029) model time 0.6077 (0.4008) loss 6.8776 (6.7470) grad_norm 2.5679 (3.1489) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:03:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][220/625] eta 0:02:49 lr 0.000250 wd 0.0500 time 0.3967 (0.4178) data time 0.0006 (0.0028) model time 0.3961 (0.4050) loss 7.3314 (6.7591) grad_norm 2.1438 (3.1382) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:03:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][230/625] eta 0:02:46 lr 0.000250 wd 0.0500 time 0.5643 (0.4208) data time 0.0006 (0.0027) model time 0.5637 (0.4094) loss 7.0752 (6.7502) grad_norm 4.0165 (3.1375) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:03:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][240/625] eta 0:02:42 lr 0.000249 wd 0.0500 time 0.4051 (0.4228) data time 0.0007 (0.0026) model time 0.4044 (0.4125) loss 7.0320 (6.7507) grad_norm 2.2787 (3.1399) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:03:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][250/625] eta 0:02:39 lr 0.000249 wd 0.0500 time 
0.5651 (0.4248) data time 0.0006 (0.0026) model time 0.5645 (0.4156) loss 7.7225 (6.7664) grad_norm 4.1871 (3.1566) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:03:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][260/625] eta 0:02:35 lr 0.000249 wd 0.0500 time 0.5650 (0.4249) data time 0.0006 (0.0025) model time 0.5644 (0.4161) loss 6.4391 (6.7683) grad_norm 2.0643 (3.1394) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:03:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][270/625] eta 0:02:30 lr 0.000249 wd 0.0500 time 0.3971 (0.4240) data time 0.0007 (0.0024) model time 0.3964 (0.4153) loss 7.4939 (6.7718) grad_norm 1.6462 (3.1308) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:03:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][280/625] eta 0:02:25 lr 0.000249 wd 0.0500 time 0.3995 (0.4232) data time 0.0010 (0.0024) model time 0.3985 (0.4146) loss 7.5870 (6.7696) grad_norm 2.3669 (3.1597) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:03:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][290/625] eta 0:02:21 lr 0.000249 wd 0.0500 time 0.3976 (0.4224) data time 0.0007 (0.0023) model time 0.3969 (0.4140) loss 6.8688 (6.7675) grad_norm 4.0796 (3.1549) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:03:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][300/625] eta 0:02:17 lr 0.000249 wd 0.0500 time 0.3993 (0.4216) data time 0.0007 (0.0023) model time 0.3986 (0.4134) loss 8.3677 (6.7633) grad_norm 4.6460 (3.1662) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:03:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][310/625] eta 0:02:12 lr 0.000249 wd 0.0500 time 0.3974 (0.4209) data time 0.0007 (0.0022) model time 0.3967 (0.4128) loss 6.8546 (6.7579) grad_norm 2.1117 (3.1736) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 08:03:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][320/625] eta 0:02:08 lr 0.000249 wd 0.0500 time 0.3972 (0.4203) data time 0.0009 (0.0022) model time 0.3963 (0.4123) loss 6.5913 (6.7725) grad_norm 3.2493 (3.1594) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:03:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][330/625] eta 0:02:03 lr 0.000249 wd 0.0500 time 0.4164 (0.4198) data time 0.0009 (0.0021) model time 0.4155 (0.4120) loss 6.5771 (6.7685) grad_norm 5.2410 (3.1822) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:03:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][340/625] eta 0:01:59 lr 0.000249 wd 0.0500 time 0.3996 (0.4197) data time 0.0007 (0.0021) model time 0.3989 (0.4121) loss 6.8737 (6.7582) grad_norm 4.5339 (3.2375) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:04:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][350/625] eta 0:01:55 lr 0.000248 wd 0.0500 time 0.3973 (0.4190) data time 0.0007 (0.0021) model time 0.3966 (0.4116) loss 7.1378 (6.7522) grad_norm 3.8721 (3.2384) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:04:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][360/625] eta 0:01:50 lr 0.000248 wd 0.0500 time 0.4028 (0.4185) data time 0.0008 (0.0020) model time 0.4020 (0.4112) loss 8.1838 (6.7541) grad_norm 3.3835 (3.2807) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:04:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][370/625] eta 0:01:46 lr 0.000248 wd 0.0500 time 0.3936 (0.4180) data time 0.0009 (0.0020) model time 0.3927 (0.4108) loss 7.6124 (6.7548) grad_norm 2.0928 (3.2571) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:04:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][380/625] eta 0:01:42 lr 0.000248 wd 0.0500 time 
0.3988 (0.4175) data time 0.0007 (0.0020) model time 0.3981 (0.4104) loss 5.8884 (6.7568) grad_norm 3.1145 (3.2587) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:04:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][390/625] eta 0:01:38 lr 0.000248 wd 0.0500 time 0.3996 (0.4170) data time 0.0007 (0.0019) model time 0.3989 (0.4101) loss 7.0087 (6.7633) grad_norm 3.8203 (3.2581) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:04:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][400/625] eta 0:01:33 lr 0.000248 wd 0.0500 time 0.3988 (0.4166) data time 0.0007 (0.0019) model time 0.3981 (0.4097) loss 7.3379 (6.7672) grad_norm 3.1290 (3.2495) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:04:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][410/625] eta 0:01:29 lr 0.000248 wd 0.0500 time 0.3956 (0.4161) data time 0.0009 (0.0019) model time 0.3947 (0.4094) loss 7.2753 (6.7718) grad_norm 3.5157 (3.2842) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:04:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][420/625] eta 0:01:25 lr 0.000248 wd 0.0500 time 0.3963 (0.4162) data time 0.0008 (0.0019) model time 0.3955 (0.4096) loss 6.6254 (6.7660) grad_norm 3.4654 (3.2787) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:04:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][430/625] eta 0:01:21 lr 0.000248 wd 0.0500 time 0.4069 (0.4163) data time 0.0007 (0.0018) model time 0.4062 (0.4099) loss 7.0890 (6.7670) grad_norm 8.5735 (3.2884) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:04:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][440/625] eta 0:01:17 lr 0.000248 wd 0.0500 time 0.5881 (0.4172) data time 0.0009 (0.0018) model time 0.5872 (0.4110) loss 6.4455 (6.7635) grad_norm 3.2415 (3.2962) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 08:04:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][450/625] eta 0:01:13 lr 0.000248 wd 0.0500 time 0.5954 (0.4186) data time 0.0009 (0.0018) model time 0.5945 (0.4127) loss 7.7021 (6.7732) grad_norm 2.4889 (3.3031) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:04:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][460/625] eta 0:01:09 lr 0.000248 wd 0.0500 time 0.3963 (0.4196) data time 0.0009 (0.0018) model time 0.3954 (0.4140) loss 7.2036 (6.7749) grad_norm 2.3819 (3.3098) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:04:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][470/625] eta 0:01:05 lr 0.000247 wd 0.0500 time 0.5853 (0.4211) data time 0.0006 (0.0017) model time 0.5846 (0.4158) loss 6.6832 (6.7679) grad_norm 5.1166 (3.3159) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:04:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][480/625] eta 0:01:01 lr 0.000247 wd 0.0500 time 0.4005 (0.4210) data time 0.0007 (0.0017) model time 0.3998 (0.4158) loss 6.8071 (6.7726) grad_norm 3.2514 (3.3295) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:05:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][490/625] eta 0:00:56 lr 0.000247 wd 0.0500 time 0.4085 (0.4209) data time 0.0007 (0.0017) model time 0.4078 (0.4158) loss 5.7563 (6.7710) grad_norm 2.8248 (3.3520) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:05:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][500/625] eta 0:00:52 lr 0.000247 wd 0.0500 time 0.3989 (0.4205) data time 0.0007 (0.0017) model time 0.3983 (0.4154) loss 6.5598 (6.7780) grad_norm 3.1376 (3.3571) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:05:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][510/625] eta 0:00:48 lr 0.000247 wd 0.0500 time 
0.3965 (0.4200) data time 0.0009 (0.0017) model time 0.3956 (0.4150) loss 7.0034 (6.7704) grad_norm 2.5463 (3.3470) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:05:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][520/625] eta 0:00:44 lr 0.000247 wd 0.0500 time 0.3958 (0.4196) data time 0.0008 (0.0017) model time 0.3950 (0.4146) loss 6.4801 (6.7728) grad_norm 2.8321 (3.3683) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:05:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][530/625] eta 0:00:39 lr 0.000247 wd 0.0500 time 0.3980 (0.4192) data time 0.0008 (0.0016) model time 0.3972 (0.4142) loss 7.3119 (6.7705) grad_norm 2.4726 (3.3493) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:05:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][540/625] eta 0:00:35 lr 0.000247 wd 0.0500 time 0.4067 (0.4188) data time 0.0007 (0.0016) model time 0.4060 (0.4139) loss 6.5085 (6.7760) grad_norm 2.3638 (3.3330) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:05:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][550/625] eta 0:00:31 lr 0.000247 wd 0.0500 time 0.3952 (0.4184) data time 0.0007 (0.0016) model time 0.3944 (0.4135) loss 7.2520 (6.7791) grad_norm 3.1413 (inf) loss_scale 128.0000 (254.3739) mem 14939MB [2024-07-25 08:05:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][560/625] eta 0:00:27 lr 0.000247 wd 0.0500 time 0.3999 (0.4184) data time 0.0006 (0.0016) model time 0.3993 (0.4136) loss 6.2328 (6.7807) grad_norm 2.4586 (inf) loss_scale 128.0000 (252.1212) mem 14939MB [2024-07-25 08:05:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][570/625] eta 0:00:22 lr 0.000247 wd 0.0500 time 0.3933 (0.4180) data time 0.0009 (0.0016) model time 0.3924 (0.4133) loss 7.5992 (6.7845) grad_norm 2.8126 (inf) loss_scale 128.0000 (249.9475) mem 14939MB 
[2024-07-25 08:05:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][580/625] eta 0:00:18 lr 0.000247 wd 0.0500 time 0.4012 (0.4177) data time 0.0008 (0.0016) model time 0.4004 (0.4130) loss 7.2820 (6.7939) grad_norm 5.3340 (inf) loss_scale 128.0000 (247.8485) mem 14939MB [2024-07-25 08:05:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][590/625] eta 0:00:14 lr 0.000246 wd 0.0500 time 0.3971 (0.4174) data time 0.0007 (0.0016) model time 0.3964 (0.4127) loss 7.6527 (6.7942) grad_norm 3.0014 (inf) loss_scale 128.0000 (245.8206) mem 14939MB [2024-07-25 08:05:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][600/625] eta 0:00:10 lr 0.000246 wd 0.0500 time 0.3987 (0.4171) data time 0.0007 (0.0015) model time 0.3979 (0.4125) loss 6.8666 (6.7967) grad_norm 2.0649 (inf) loss_scale 128.0000 (243.8602) mem 14939MB [2024-07-25 08:05:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][610/625] eta 0:00:06 lr 0.000246 wd 0.0500 time 0.4000 (0.4168) data time 0.0005 (0.0015) model time 0.3995 (0.4122) loss 7.1201 (6.7987) grad_norm 2.3738 (inf) loss_scale 128.0000 (241.9640) mem 14939MB [2024-07-25 08:05:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][620/625] eta 0:00:02 lr 0.000246 wd 0.0500 time 0.3969 (0.4165) data time 0.0004 (0.0015) model time 0.3965 (0.4120) loss 6.1106 (6.7943) grad_norm 4.0308 (inf) loss_scale 128.0000 (240.1288) mem 14939MB [2024-07-25 08:05:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 217 training takes 0:04:20 [2024-07-25 08:05:55 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 08:05:55 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 08:05:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.454 (0.454) Loss 0.5562 (0.5562) Acc@1 89.209 (89.209) Acc@5 98.682 (98.682) Mem 14939MB [2024-07-25 08:05:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8677 (0.6781) Acc@1 81.104 (86.634) Acc@5 96.387 (97.803) Mem 14939MB [2024-07-25 08:05:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9673 (0.7913) Acc@1 77.734 (83.519) Acc@5 95.117 (96.666) Mem 14939MB [2024-07-25 08:05:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.099 Acc@5 96.623 [2024-07-25 08:05:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-07-25 08:05:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.762 (0.762) Loss 0.5425 (0.5425) Acc@1 89.893 (89.893) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 08:06:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.155) Loss 0.8379 (0.6688) Acc@1 81.689 (86.714) Acc@5 96.436 (97.856) Mem 14939MB [2024-07-25 08:06:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 0.9595 (0.7798) Acc@1 77.637 (83.710) Acc@5 95.703 (96.819) Mem 14939MB [2024-07-25 08:06:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.293 Acc@5 96.785 [2024-07-25 08:06:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-07-25 08:06:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][0/625] eta 0:13:46 lr 0.000246 wd 0.0500 time 1.3224 (1.3224) data time 0.4631 (0.4631) model time 0.0000 (0.0000) loss 7.7701 (7.7701) grad_norm 2.7592 (2.7592) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:06:06 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][10/625] eta 0:04:56 lr 0.000246 wd 0.0500 time 0.3965 (0.4814) data time 0.0009 (0.0429) model time 0.0000 (0.0000) loss 7.5556 (7.3812) grad_norm 3.4354 (2.8328) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:06:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][20/625] eta 0:04:32 lr 0.000246 wd 0.0500 time 0.4019 (0.4501) data time 0.0007 (0.0229) model time 0.0000 (0.0000) loss 6.8477 (7.1305) grad_norm 2.3791 (2.6576) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:06:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][30/625] eta 0:04:30 lr 0.000246 wd 0.0500 time 0.5613 (0.4539) data time 0.0007 (0.0157) model time 0.0000 (0.0000) loss 7.2355 (7.0763) grad_norm 2.7919 (2.6217) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:06:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][40/625] eta 0:04:27 lr 0.000246 wd 0.0500 time 0.3978 (0.4577) data time 0.0009 (0.0121) model time 0.0000 (0.0000) loss 7.4584 (7.0946) grad_norm 2.1947 (2.6486) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:06:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][50/625] eta 0:04:27 lr 0.000246 wd 0.0500 time 0.5887 (0.4651) data time 0.0007 (0.0099) model time 0.0000 (0.0000) loss 6.1796 (7.0886) grad_norm 3.2923 (2.6546) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:06:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][60/625] eta 0:04:23 lr 0.000246 wd 0.0500 time 0.5805 (0.4665) data time 0.0006 (0.0084) model time 0.5799 (0.4727) loss 6.6277 (7.0043) grad_norm 2.5280 (2.6553) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:06:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][70/625] eta 0:04:16 lr 0.000246 wd 0.0500 time 0.3963 (0.4614) data time 0.0007 
(0.0073) model time 0.3957 (0.4511) loss 6.2989 (6.9609) grad_norm 2.2898 (2.6511) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:06:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][80/625] eta 0:04:09 lr 0.000245 wd 0.0500 time 0.4045 (0.4580) data time 0.0009 (0.0065) model time 0.4036 (0.4453) loss 8.0828 (6.9716) grad_norm 2.5714 (2.6465) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:06:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][90/625] eta 0:04:01 lr 0.000245 wd 0.0500 time 0.4070 (0.4515) data time 0.0009 (0.0059) model time 0.4061 (0.4335) loss 6.7783 (6.9673) grad_norm 3.4211 (2.7242) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:06:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][100/625] eta 0:03:54 lr 0.000245 wd 0.0500 time 0.4006 (0.4464) data time 0.0006 (0.0054) model time 0.3999 (0.4265) loss 6.9123 (6.9451) grad_norm 3.8714 (2.6960) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:06:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][110/625] eta 0:03:47 lr 0.000245 wd 0.0500 time 0.3993 (0.4421) data time 0.0016 (0.0050) model time 0.3977 (0.4218) loss 6.2832 (6.9056) grad_norm 2.5491 (2.7428) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:06:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][120/625] eta 0:03:41 lr 0.000245 wd 0.0500 time 0.3996 (0.4386) data time 0.0009 (0.0047) model time 0.3987 (0.4186) loss 7.3349 (6.8685) grad_norm 2.7436 (2.7290) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:06:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][130/625] eta 0:03:36 lr 0.000245 wd 0.0500 time 0.3966 (0.4368) data time 0.0009 (0.0044) model time 0.3957 (0.4180) loss 6.1354 (6.8585) grad_norm 2.8748 (2.7872) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:07:02 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][140/625] eta 0:03:30 lr 0.000245 wd 0.0500 time 0.3984 (0.4342) data time 0.0008 (0.0041) model time 0.3976 (0.4158) loss 5.8089 (6.8372) grad_norm 3.0510 (2.8400) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:07:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][150/625] eta 0:03:25 lr 0.000245 wd 0.0500 time 0.3998 (0.4319) data time 0.0007 (0.0039) model time 0.3991 (0.4142) loss 6.0665 (6.8277) grad_norm 2.3813 (2.8418) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:07:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][160/625] eta 0:03:19 lr 0.000245 wd 0.0500 time 0.4072 (0.4300) data time 0.0008 (0.0037) model time 0.4063 (0.4129) loss 6.6457 (6.8132) grad_norm 2.5813 (2.8717) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:07:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][170/625] eta 0:03:14 lr 0.000245 wd 0.0500 time 0.3995 (0.4283) data time 0.0009 (0.0035) model time 0.3987 (0.4118) loss 7.5628 (6.8035) grad_norm 4.5397 (2.9449) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:07:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][180/625] eta 0:03:09 lr 0.000245 wd 0.0500 time 0.4013 (0.4268) data time 0.0006 (0.0034) model time 0.4007 (0.4109) loss 5.5292 (6.7933) grad_norm 3.6395 (2.9507) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:07:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][190/625] eta 0:03:05 lr 0.000245 wd 0.0500 time 0.3999 (0.4253) data time 0.0008 (0.0032) model time 0.3991 (0.4100) loss 7.5066 (6.8191) grad_norm 2.7671 (2.9145) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:07:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][200/625] eta 0:03:00 lr 0.000244 wd 0.0500 time 0.3997 (0.4241) data time 
0.0006 (0.0031) model time 0.3991 (0.4093) loss 6.3280 (6.8037) grad_norm 2.1777 (2.9337) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:07:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][210/625] eta 0:02:55 lr 0.000244 wd 0.0500 time 0.3986 (0.4229) data time 0.0007 (0.0030) model time 0.3979 (0.4086) loss 5.7212 (6.7905) grad_norm 2.0951 (2.9088) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:07:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][220/625] eta 0:02:50 lr 0.000244 wd 0.0500 time 0.3997 (0.4218) data time 0.0006 (0.0029) model time 0.3991 (0.4080) loss 5.3863 (6.7896) grad_norm 2.0149 (2.9058) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:07:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][230/625] eta 0:02:46 lr 0.000244 wd 0.0500 time 0.3973 (0.4209) data time 0.0008 (0.0028) model time 0.3964 (0.4075) loss 5.5895 (6.7905) grad_norm 2.8606 (2.9250) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:07:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][240/625] eta 0:02:41 lr 0.000244 wd 0.0500 time 0.4009 (0.4205) data time 0.0009 (0.0027) model time 0.4000 (0.4077) loss 6.7128 (6.7927) grad_norm 3.7962 (2.9543) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:07:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][250/625] eta 0:02:37 lr 0.000244 wd 0.0500 time 0.3940 (0.4209) data time 0.0006 (0.0027) model time 0.3934 (0.4089) loss 7.1395 (6.8019) grad_norm 2.7193 (2.9503) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:07:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][260/625] eta 0:02:34 lr 0.000244 wd 0.0500 time 0.3989 (0.4229) data time 0.0008 (0.0026) model time 0.3981 (0.4119) loss 6.7697 (6.7949) grad_norm 2.4224 (2.9458) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:07:56 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][270/625] eta 0:02:31 lr 0.000244 wd 0.0500 time 0.5640 (0.4260) data time 0.0010 (0.0025) model time 0.5630 (0.4161) loss 8.0947 (6.8016) grad_norm 2.2705 (2.9282) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:08:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][280/625] eta 0:02:27 lr 0.000244 wd 0.0500 time 0.5528 (0.4272) data time 0.0008 (0.0025) model time 0.5520 (0.4180) loss 7.4619 (6.7979) grad_norm 1.9380 (2.9094) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:08:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][290/625] eta 0:02:23 lr 0.000244 wd 0.0500 time 0.3954 (0.4280) data time 0.0009 (0.0024) model time 0.3945 (0.4193) loss 5.8753 (6.7880) grad_norm 2.5835 (2.9083) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:08:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][300/625] eta 0:02:19 lr 0.000244 wd 0.0500 time 0.3986 (0.4280) data time 0.0007 (0.0024) model time 0.3979 (0.4196) loss 6.6293 (6.7827) grad_norm 2.5532 (2.9063) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:08:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][310/625] eta 0:02:14 lr 0.000244 wd 0.0500 time 0.3973 (0.4270) data time 0.0007 (0.0023) model time 0.3966 (0.4187) loss 6.3501 (6.7792) grad_norm 2.1792 (2.8952) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:08:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][320/625] eta 0:02:09 lr 0.000243 wd 0.0500 time 0.3994 (0.4261) data time 0.0010 (0.0023) model time 0.3984 (0.4179) loss 7.1912 (6.7777) grad_norm 3.3090 (2.9197) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:08:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][330/625] eta 0:02:05 lr 0.000243 wd 0.0500 time 0.3981 (0.4253) data time 
0.0011 (0.0022) model time 0.3970 (0.4172) loss 6.9506 (6.7778) grad_norm 2.4154 (2.9171) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:08:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][340/625] eta 0:02:00 lr 0.000243 wd 0.0500 time 0.3965 (0.4245) data time 0.0008 (0.0022) model time 0.3957 (0.4165) loss 6.5847 (6.7783) grad_norm 3.1846 (2.9156) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:08:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][350/625] eta 0:01:56 lr 0.000243 wd 0.0500 time 0.3978 (0.4243) data time 0.0007 (0.0021) model time 0.3971 (0.4165) loss 6.6249 (6.7758) grad_norm 3.4956 (2.9216) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:08:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][360/625] eta 0:01:52 lr 0.000243 wd 0.0500 time 0.3962 (0.4236) data time 0.0006 (0.0021) model time 0.3955 (0.4160) loss 7.7845 (6.7677) grad_norm 2.1616 (2.9244) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:08:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][370/625] eta 0:01:47 lr 0.000243 wd 0.0500 time 0.4094 (0.4229) data time 0.0008 (0.0021) model time 0.4086 (0.4154) loss 6.8442 (6.7747) grad_norm 2.5529 (2.9253) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:08:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][380/625] eta 0:01:43 lr 0.000243 wd 0.0500 time 0.3974 (0.4224) data time 0.0008 (0.0020) model time 0.3966 (0.4150) loss 6.3898 (6.7780) grad_norm 2.5654 (2.9108) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:08:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][390/625] eta 0:01:39 lr 0.000243 wd 0.0500 time 0.3942 (0.4218) data time 0.0007 (0.0020) model time 0.3935 (0.4145) loss 7.5820 (6.7801) grad_norm 3.1502 (2.9041) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:08:50 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][400/625] eta 0:01:34 lr 0.000243 wd 0.0500 time 0.3956 (0.4212) data time 0.0009 (0.0020) model time 0.3947 (0.4139) loss 7.7881 (6.7815) grad_norm 2.1622 (3.0473) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:08:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][410/625] eta 0:01:30 lr 0.000243 wd 0.0500 time 0.4036 (0.4206) data time 0.0009 (0.0019) model time 0.4027 (0.4135) loss 7.2381 (6.7896) grad_norm 2.7362 (3.0397) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:08:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][420/625] eta 0:01:26 lr 0.000243 wd 0.0500 time 0.3967 (0.4201) data time 0.0008 (0.0019) model time 0.3959 (0.4130) loss 7.5007 (6.7938) grad_norm 2.7196 (3.0439) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:09:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][430/625] eta 0:01:21 lr 0.000243 wd 0.0500 time 0.3986 (0.4196) data time 0.0007 (0.0019) model time 0.3979 (0.4126) loss 7.2119 (6.7905) grad_norm 3.1382 (3.0550) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:09:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][440/625] eta 0:01:17 lr 0.000242 wd 0.0500 time 0.4002 (0.4191) data time 0.0008 (0.0019) model time 0.3994 (0.4123) loss 7.2779 (6.7838) grad_norm 2.1090 (3.0560) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:09:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][450/625] eta 0:01:13 lr 0.000242 wd 0.0500 time 0.3995 (0.4187) data time 0.0007 (0.0018) model time 0.3988 (0.4119) loss 7.0277 (6.7807) grad_norm 3.2662 (3.0641) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:09:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][460/625] eta 0:01:09 lr 0.000242 wd 0.0500 time 0.3974 (0.4185) data time 
0.0009 (0.0018) model time 0.3965 (0.4119) loss 6.7532 (6.7831) grad_norm 2.9985 (3.0895) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:09:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][470/625] eta 0:01:04 lr 0.000242 wd 0.0500 time 0.5774 (0.4191) data time 0.0008 (0.0018) model time 0.5766 (0.4127) loss 6.7277 (6.7798) grad_norm 20.6702 (3.1133) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:09:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][480/625] eta 0:01:00 lr 0.000242 wd 0.0500 time 0.5963 (0.4206) data time 0.0007 (0.0018) model time 0.5956 (0.4145) loss 6.3575 (6.7820) grad_norm 2.5920 (3.1035) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:09:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][490/625] eta 0:00:56 lr 0.000242 wd 0.0500 time 0.6010 (0.4221) data time 0.0009 (0.0018) model time 0.6000 (0.4163) loss 5.9394 (6.7831) grad_norm 3.9895 (3.1064) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:09:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][500/625] eta 0:00:52 lr 0.000242 wd 0.0500 time 0.5881 (0.4232) data time 0.0006 (0.0017) model time 0.5875 (0.4177) loss 7.7350 (6.7885) grad_norm 3.2643 (3.1132) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:09:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][510/625] eta 0:00:48 lr 0.000242 wd 0.0500 time 0.3973 (0.4237) data time 0.0009 (0.0017) model time 0.3965 (0.4183) loss 7.2712 (6.7960) grad_norm 2.9363 (3.1202) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:09:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][520/625] eta 0:00:44 lr 0.000242 wd 0.0500 time 0.4199 (0.4237) data time 0.0009 (0.0017) model time 0.4190 (0.4184) loss 6.0662 (6.7907) grad_norm 3.5595 (3.1187) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:09:46 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][530/625] eta 0:00:40 lr 0.000242 wd 0.0500 time 0.3988 (0.4232) data time 0.0007 (0.0017) model time 0.3981 (0.4180) loss 8.2922 (6.7890) grad_norm 3.5180 (3.1666) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:09:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][540/625] eta 0:00:35 lr 0.000242 wd 0.0500 time 0.3943 (0.4228) data time 0.0006 (0.0017) model time 0.3937 (0.4176) loss 6.6425 (6.7879) grad_norm 3.2380 (3.1619) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:09:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][550/625] eta 0:00:31 lr 0.000242 wd 0.0500 time 0.3973 (0.4223) data time 0.0006 (0.0017) model time 0.3967 (0.4171) loss 6.8081 (6.7869) grad_norm 4.2566 (3.1795) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:09:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][560/625] eta 0:00:27 lr 0.000241 wd 0.0500 time 0.6243 (0.4223) data time 0.0008 (0.0016) model time 0.6235 (0.4172) loss 5.5642 (6.7855) grad_norm 2.2806 (3.1867) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:10:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][570/625] eta 0:00:23 lr 0.000241 wd 0.0500 time 0.4156 (0.4218) data time 0.0007 (0.0016) model time 0.4149 (0.4168) loss 7.3214 (6.7887) grad_norm 2.1519 (3.1855) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:10:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][580/625] eta 0:00:18 lr 0.000241 wd 0.0500 time 0.3960 (0.4214) data time 0.0006 (0.0016) model time 0.3954 (0.4164) loss 6.6392 (6.7902) grad_norm 1.9561 (3.1745) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:10:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][590/625] eta 0:00:14 lr 0.000241 wd 0.0500 time 0.4007 (0.4210) data time 
0.0006 (0.0016) model time 0.4001 (0.4161) loss 7.2197 (6.7912) grad_norm 2.5551 (3.1648) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:10:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][600/625] eta 0:00:10 lr 0.000241 wd 0.0500 time 0.4043 (0.4207) data time 0.0006 (0.0016) model time 0.4037 (0.4158) loss 6.4263 (6.7867) grad_norm 3.0705 (3.1623) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:10:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][610/625] eta 0:00:06 lr 0.000241 wd 0.0500 time 0.3990 (0.4203) data time 0.0005 (0.0016) model time 0.3984 (0.4154) loss 8.0426 (6.7928) grad_norm 2.7695 (3.1651) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:10:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][620/625] eta 0:00:02 lr 0.000241 wd 0.0500 time 0.3921 (0.4199) data time 0.0006 (0.0016) model time 0.3915 (0.4151) loss 5.7319 (6.7850) grad_norm 4.3552 (3.1698) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:10:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 218 training takes 0:04:22 [2024-07-25 08:10:23 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 08:10:24 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 08:10:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.472 (0.472) Loss 0.5498 (0.5498) Acc@1 90.234 (90.234) Acc@5 98.730 (98.730) Mem 14939MB
[2024-07-25 08:10:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.8564 (0.6741) Acc@1 81.299 (86.754) Acc@5 96.387 (97.745) Mem 14939MB
[2024-07-25 08:10:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9639 (0.7819) Acc@1 76.904 (83.731) Acc@5 94.727 (96.647) Mem 14939MB
[2024-07-25 08:10:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.347 Acc@5 96.597
[2024-07-25 08:10:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.3%
[2024-07-25 08:10:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.35%
[2024-07-25 08:10:27 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving......
[2024-07-25 08:10:28 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!!
[2024-07-25 08:10:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.484 (0.484) Loss 0.5420 (0.5420) Acc@1 89.941 (89.941) Acc@5 98.926 (98.926) Mem 14939MB
[2024-07-25 08:10:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.122) Loss 0.8379 (0.6686) Acc@1 81.934 (86.750) Acc@5 96.533 (97.878) Mem 14939MB
[2024-07-25 08:10:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 0.9595 (0.7793) Acc@1 77.637 (83.747) Acc@5 95.605 (96.831) Mem 14939MB
[2024-07-25 08:10:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.333 Acc@5 96.793
[2024-07-25 08:10:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.3%
[2024-07-25 08:10:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.33%
[2024-07-25 08:10:30 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 08:10:31 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!!
[2024-07-25 08:10:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][0/625] eta 0:08:21 lr 0.000241 wd 0.0500 time 0.8020 (0.8020) data time 0.4256 (0.4256) model time 0.0000 (0.0000) loss 5.1811 (5.1811) grad_norm 2.0342 (2.0342) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:10:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][10/625] eta 0:04:28 lr 0.000241 wd 0.0500 time 0.4022 (0.4360) data time 0.0009 (0.0395) model time 0.0000 (0.0000) loss 6.2765 (6.6934) grad_norm 3.6259 (3.1121) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:10:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][20/625] eta 0:04:12 lr 0.000241 wd 0.0500 time 0.3979 (0.4177) data time 0.0007 (0.0211) model time 0.0000 (0.0000) loss 5.2148 (6.7766) grad_norm 3.2913 (3.3658) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:10:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][30/625] eta 0:04:04 lr 0.000241 wd 0.0500 time 0.4002 (0.4115) data time 0.0007 (0.0145) model time 0.0000 (0.0000) loss 7.3170 (6.9204) grad_norm 2.4125 (3.9827) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:10:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][40/625] eta 0:03:59 lr 0.000241 wd 0.0500 time 0.4070 (0.4086) data time 0.0009 (0.0112) model time 0.0000 (0.0000) loss 6.5547 (6.8715) grad_norm 2.2968 (3.7752) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:10:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][50/625] eta 0:03:55 lr 0.000240 wd 0.0500 time 0.5552 (0.4100) data time 0.0007 (0.0091) model time 0.0000 (0.0000) loss 6.9505 (6.8943) grad_norm 3.6739 (3.7141) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:10:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][60/625] eta 0:03:52 lr 0.000240 wd 0.0500 time 0.5632 (0.4108) 
data time 0.0009 (0.0078) model time 0.5623 (0.4140) loss 6.5667 (6.8155) grad_norm 2.3490 (3.5185) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:11:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][70/625] eta 0:03:51 lr 0.000240 wd 0.0500 time 0.5891 (0.4163) data time 0.0008 (0.0068) model time 0.5883 (0.4315) loss 6.2193 (6.8059) grad_norm 2.6345 (3.5496) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:11:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][80/625] eta 0:03:49 lr 0.000240 wd 0.0500 time 0.5367 (0.4206) data time 0.0006 (0.0061) model time 0.5360 (0.4376) loss 7.0540 (6.7890) grad_norm 5.7259 (3.4678) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:11:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][90/625] eta 0:03:49 lr 0.000240 wd 0.0500 time 0.5896 (0.4282) data time 0.0009 (0.0055) model time 0.5887 (0.4505) loss 7.2548 (6.7965) grad_norm 2.0462 (3.3658) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:11:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][100/625] eta 0:03:46 lr 0.000240 wd 0.0500 time 0.6002 (0.4308) data time 0.0007 (0.0050) model time 0.5994 (0.4512) loss 5.9502 (6.8008) grad_norm 3.5634 (3.3057) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:11:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][110/625] eta 0:03:42 lr 0.000240 wd 0.0500 time 0.5261 (0.4324) data time 0.0007 (0.0047) model time 0.5255 (0.4505) loss 5.8482 (6.7960) grad_norm 2.2192 (3.2936) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:11:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][120/625] eta 0:03:37 lr 0.000240 wd 0.0500 time 0.3956 (0.4308) data time 0.0009 (0.0043) model time 0.3947 (0.4452) loss 6.4066 (6.7969) grad_norm 3.0225 (3.3647) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
08:11:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][130/625] eta 0:03:32 lr 0.000240 wd 0.0500 time 0.3986 (0.4283) data time 0.0007 (0.0041) model time 0.3979 (0.4392) loss 7.4172 (6.7883) grad_norm 3.3486 (3.3606) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:11:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][140/625] eta 0:03:26 lr 0.000240 wd 0.0500 time 0.3941 (0.4261) data time 0.0007 (0.0038) model time 0.3933 (0.4344) loss 6.4547 (6.7953) grad_norm 3.2364 (3.3137) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:11:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][150/625] eta 0:03:21 lr 0.000240 wd 0.0500 time 0.4021 (0.4243) data time 0.0007 (0.0036) model time 0.4015 (0.4307) loss 7.4253 (6.7578) grad_norm 2.9608 (3.2800) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:11:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][160/625] eta 0:03:16 lr 0.000240 wd 0.0500 time 0.3961 (0.4227) data time 0.0007 (0.0035) model time 0.3954 (0.4277) loss 7.1104 (6.7697) grad_norm 3.0773 (3.2459) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:11:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][170/625] eta 0:03:11 lr 0.000239 wd 0.0500 time 0.3979 (0.4212) data time 0.0008 (0.0033) model time 0.3972 (0.4252) loss 7.4629 (6.7672) grad_norm 3.0968 (3.2567) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:11:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][180/625] eta 0:03:06 lr 0.000239 wd 0.0500 time 0.3972 (0.4201) data time 0.0008 (0.0032) model time 0.3964 (0.4232) loss 6.2238 (6.7609) grad_norm 8.2433 (3.3476) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:11:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][190/625] eta 0:03:02 lr 0.000239 wd 0.0500 time 0.4161 (0.4191) data 
time 0.0009 (0.0030) model time 0.4152 (0.4216) loss 6.1775 (6.7705) grad_norm 4.3993 (3.3808) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:11:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][200/625] eta 0:02:57 lr 0.000239 wd 0.0500 time 0.3979 (0.4183) data time 0.0007 (0.0029) model time 0.3972 (0.4203) loss 6.3582 (6.7639) grad_norm 3.4353 (3.3746) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:11:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][210/625] eta 0:02:53 lr 0.000239 wd 0.0500 time 0.4009 (0.4174) data time 0.0009 (0.0028) model time 0.4000 (0.4190) loss 7.6026 (6.7632) grad_norm 2.0963 (3.3684) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:12:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][220/625] eta 0:02:48 lr 0.000239 wd 0.0500 time 0.3932 (0.4167) data time 0.0006 (0.0027) model time 0.3925 (0.4179) loss 5.5942 (6.7564) grad_norm 3.0086 (3.3555) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:12:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][230/625] eta 0:02:44 lr 0.000239 wd 0.0500 time 0.3965 (0.4160) data time 0.0008 (0.0027) model time 0.3957 (0.4168) loss 6.2576 (6.7598) grad_norm 2.0387 (3.3562) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:12:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][240/625] eta 0:02:39 lr 0.000239 wd 0.0500 time 0.3982 (0.4152) data time 0.0009 (0.0026) model time 0.3973 (0.4157) loss 6.8767 (6.7584) grad_norm 2.9481 (3.3387) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:12:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][250/625] eta 0:02:35 lr 0.000239 wd 0.0500 time 0.3960 (0.4145) data time 0.0006 (0.0025) model time 0.3954 (0.4148) loss 7.0749 (6.7408) grad_norm 2.3380 (3.3187) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
08:12:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][260/625] eta 0:02:31 lr 0.000239 wd 0.0500 time 0.3969 (0.4139) data time 0.0008 (0.0024) model time 0.3961 (0.4140) loss 7.1974 (6.7520) grad_norm 2.0642 (3.2836) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:12:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][270/625] eta 0:02:26 lr 0.000239 wd 0.0500 time 0.3970 (0.4139) data time 0.0009 (0.0024) model time 0.3961 (0.4139) loss 5.8307 (6.7569) grad_norm 2.1177 (3.2558) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:12:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][280/625] eta 0:02:22 lr 0.000239 wd 0.0500 time 0.6068 (0.4141) data time 0.0009 (0.0023) model time 0.6059 (0.4142) loss 6.0958 (6.7709) grad_norm 3.2275 (3.2454) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:12:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][290/625] eta 0:02:19 lr 0.000238 wd 0.0500 time 0.5952 (0.4159) data time 0.0010 (0.0023) model time 0.5943 (0.4164) loss 7.1511 (6.7666) grad_norm 4.3288 (3.2889) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:12:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][300/625] eta 0:02:15 lr 0.000238 wd 0.0500 time 0.5788 (0.4179) data time 0.0010 (0.0022) model time 0.5778 (0.4187) loss 7.2888 (6.7744) grad_norm 3.0154 (3.3322) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:12:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][310/625] eta 0:02:12 lr 0.000238 wd 0.0500 time 0.5822 (0.4202) data time 0.0006 (0.0022) model time 0.5816 (0.4214) loss 6.6601 (6.7612) grad_norm 3.1624 (3.3367) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:12:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][320/625] eta 0:02:08 lr 0.000238 wd 0.0500 time 0.3994 (0.4212) data 
time 0.0006 (0.0021) model time 0.3987 (0.4225) loss 7.4150 (6.7621) grad_norm 2.6993 (3.3140) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:12:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][330/625] eta 0:02:04 lr 0.000238 wd 0.0500 time 0.3970 (0.4211) data time 0.0008 (0.0021) model time 0.3962 (0.4223) loss 6.6239 (6.7521) grad_norm 1.8181 (3.2893) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:12:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][340/625] eta 0:01:59 lr 0.000238 wd 0.0500 time 0.4020 (0.4210) data time 0.0007 (0.0021) model time 0.4013 (0.4221) loss 5.4289 (6.7476) grad_norm 2.2773 (3.2563) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:12:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][350/625] eta 0:01:55 lr 0.000238 wd 0.0500 time 0.4002 (0.4204) data time 0.0006 (0.0020) model time 0.3996 (0.4214) loss 7.1976 (6.7505) grad_norm 21.8565 (3.3152) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:13:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][360/625] eta 0:01:51 lr 0.000238 wd 0.0500 time 0.3965 (0.4199) data time 0.0008 (0.0020) model time 0.3957 (0.4207) loss 5.9355 (6.7495) grad_norm 2.2748 (3.3097) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:13:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][370/625] eta 0:01:46 lr 0.000238 wd 0.0500 time 0.3981 (0.4193) data time 0.0006 (0.0020) model time 0.3974 (0.4200) loss 6.1637 (6.7380) grad_norm 2.5333 (3.3037) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:13:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][380/625] eta 0:01:42 lr 0.000238 wd 0.0500 time 0.4004 (0.4188) data time 0.0008 (0.0019) model time 0.3995 (0.4193) loss 7.3588 (6.7460) grad_norm 1.8784 (3.2956) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
08:13:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][390/625] eta 0:01:38 lr 0.000238 wd 0.0500 time 0.3974 (0.4183) data time 0.0009 (0.0019) model time 0.3965 (0.4187) loss 7.2021 (6.7379) grad_norm 1.9804 (3.2791) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:13:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][400/625] eta 0:01:34 lr 0.000238 wd 0.0500 time 0.4021 (0.4178) data time 0.0009 (0.0019) model time 0.4012 (0.4181) loss 7.1075 (6.7387) grad_norm 3.3002 (3.2687) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:13:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][410/625] eta 0:01:29 lr 0.000237 wd 0.0500 time 0.4035 (0.4175) data time 0.0007 (0.0019) model time 0.4028 (0.4177) loss 6.6267 (6.7425) grad_norm 4.0303 (3.2557) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:13:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][420/625] eta 0:01:25 lr 0.000237 wd 0.0500 time 0.4082 (0.4170) data time 0.0007 (0.0018) model time 0.4075 (0.4172) loss 6.7241 (6.7486) grad_norm 3.8625 (3.2546) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:13:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][430/625] eta 0:01:21 lr 0.000237 wd 0.0500 time 0.3958 (0.4166) data time 0.0007 (0.0018) model time 0.3951 (0.4167) loss 5.2220 (6.7472) grad_norm 7.6522 (3.2482) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:13:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][440/625] eta 0:01:17 lr 0.000237 wd 0.0500 time 0.3959 (0.4162) data time 0.0008 (0.0018) model time 0.3951 (0.4162) loss 6.3348 (6.7456) grad_norm 2.4375 (3.2683) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:13:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][450/625] eta 0:01:12 lr 0.000237 wd 0.0500 time 0.3971 (0.4158) data 
time 0.0006 (0.0018) model time 0.3964 (0.4158) loss 5.7497 (6.7427) grad_norm 2.4042 (3.3404) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:13:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][460/625] eta 0:01:08 lr 0.000237 wd 0.0500 time 0.3946 (0.4155) data time 0.0009 (0.0017) model time 0.3937 (0.4153) loss 7.3554 (6.7472) grad_norm 2.6012 (3.3301) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:13:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][470/625] eta 0:01:04 lr 0.000237 wd 0.0500 time 0.3988 (0.4151) data time 0.0007 (0.0017) model time 0.3981 (0.4149) loss 5.4880 (6.7520) grad_norm 1.9808 (3.3236) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:13:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][480/625] eta 0:01:00 lr 0.000237 wd 0.0500 time 0.3972 (0.4148) data time 0.0007 (0.0017) model time 0.3965 (0.4146) loss 7.1106 (6.7553) grad_norm 1.7864 (3.3089) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:13:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][490/625] eta 0:00:56 lr 0.000237 wd 0.0500 time 0.3959 (0.4149) data time 0.0008 (0.0017) model time 0.3950 (0.4146) loss 6.6844 (6.7564) grad_norm 2.3604 (3.2891) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:13:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][500/625] eta 0:00:51 lr 0.000237 wd 0.0500 time 0.4004 (0.4150) data time 0.0006 (0.0017) model time 0.3997 (0.4148) loss 5.9258 (6.7556) grad_norm 3.8023 (3.2810) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:14:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][510/625] eta 0:00:47 lr 0.000237 wd 0.0500 time 0.5577 (0.4160) data time 0.0006 (0.0017) model time 0.5570 (0.4158) loss 6.1966 (6.7515) grad_norm 4.2997 (3.2911) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
08:14:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][520/625] eta 0:00:43 lr 0.000237 wd 0.0500 time 0.5685 (0.4168) data time 0.0006 (0.0016) model time 0.5680 (0.4167) loss 6.6095 (6.7475) grad_norm 3.2024 (3.2793) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:14:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][530/625] eta 0:00:39 lr 0.000236 wd 0.0500 time 0.3966 (0.4178) data time 0.0009 (0.0016) model time 0.3957 (0.4178) loss 7.8450 (6.7520) grad_norm 4.4131 (3.2862) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:14:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][540/625] eta 0:00:35 lr 0.000236 wd 0.0500 time 0.5817 (0.4186) data time 0.0009 (0.0016) model time 0.5808 (0.4187) loss 6.3395 (6.7536) grad_norm 3.0487 (3.3012) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:14:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][550/625] eta 0:00:31 lr 0.000236 wd 0.0500 time 0.4069 (0.4185) data time 0.0006 (0.0016) model time 0.4062 (0.4186) loss 7.0808 (6.7560) grad_norm 2.6520 (3.2973) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:14:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][560/625] eta 0:00:27 lr 0.000236 wd 0.0500 time 0.4000 (0.4186) data time 0.0006 (0.0016) model time 0.3994 (0.4186) loss 6.4920 (6.7550) grad_norm 3.4846 (3.2928) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:14:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][570/625] eta 0:00:23 lr 0.000236 wd 0.0500 time 0.3996 (0.4183) data time 0.0009 (0.0016) model time 0.3987 (0.4182) loss 7.2014 (6.7607) grad_norm 2.5797 (3.3097) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:14:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][580/625] eta 0:00:18 lr 0.000236 wd 0.0500 time 0.3993 (0.4179) data 
time 0.0009 (0.0016) model time 0.3984 (0.4179) loss 6.8397 (6.7607) grad_norm 1.9761 (3.2959) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:14:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][590/625] eta 0:00:14 lr 0.000236 wd 0.0500 time 0.4000 (0.4177) data time 0.0007 (0.0015) model time 0.3994 (0.4175) loss 7.2675 (6.7595) grad_norm 2.2844 (3.2836) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:14:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][600/625] eta 0:00:10 lr 0.000236 wd 0.0500 time 0.3993 (0.4174) data time 0.0009 (0.0015) model time 0.3984 (0.4172) loss 7.2734 (6.7665) grad_norm 2.4440 (3.2748) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:14:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][610/625] eta 0:00:06 lr 0.000236 wd 0.0500 time 0.3982 (0.4171) data time 0.0004 (0.0015) model time 0.3978 (0.4169) loss 7.2425 (6.7670) grad_norm 1.8580 (3.2609) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:14:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][620/625] eta 0:00:02 lr 0.000236 wd 0.0500 time 0.4001 (0.4168) data time 0.0004 (0.0015) model time 0.3997 (0.4166) loss 6.5436 (6.7672) grad_norm 2.9603 (3.2533) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:14:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 219 training takes 0:04:20 [2024-07-25 08:14:51 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 08:14:52 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 08:14:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.5483 (0.5483) Acc@1 89.941 (89.941) Acc@5 98.730 (98.730) Mem 14939MB
[2024-07-25 08:14:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.8408 (0.6656) Acc@1 82.080 (86.705) Acc@5 96.191 (97.736) Mem 14939MB
[2024-07-25 08:14:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9521 (0.7809) Acc@1 77.441 (83.638) Acc@5 95.166 (96.673) Mem 14939MB
[2024-07-25 08:14:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.297 Acc@5 96.651
[2024-07-25 08:14:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.3%
[2024-07-25 08:14:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.785 (0.785) Loss 0.5420 (0.5420) Acc@1 89.893 (89.893) Acc@5 98.926 (98.926) Mem 14939MB
[2024-07-25 08:14:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.151) Loss 0.8374 (0.6681) Acc@1 81.934 (86.759) Acc@5 96.631 (97.874) Mem 14939MB
[2024-07-25 08:14:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.120) Loss 0.9575 (0.7786) Acc@1 77.686 (83.775) Acc@5 95.508 (96.828) Mem 14939MB
[2024-07-25 08:14:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.365 Acc@5 96.791
[2024-07-25 08:14:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.4%
[2024-07-25 08:14:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.37%
[2024-07-25 08:14:58 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 08:14:59 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 08:14:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][0/625] eta 0:08:27 lr 0.000236 wd 0.0500 time 0.8115 (0.8115) data time 0.4353 (0.4353) model time 0.0000 (0.0000) loss 5.4036 (5.4036) grad_norm 6.5011 (6.5011) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:15:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][10/625] eta 0:04:28 lr 0.000236 wd 0.0500 time 0.3986 (0.4365) data time 0.0008 (0.0403) model time 0.0000 (0.0000) loss 7.0735 (6.4626) grad_norm 2.4534 (3.2653) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:15:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][20/625] eta 0:04:14 lr 0.000235 wd 0.0500 time 0.3980 (0.4201) data time 0.0008 (0.0215) model time 0.0000 (0.0000) loss 6.2143 (6.7165) grad_norm 2.5898 (3.0143) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:15:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][30/625] eta 0:04:05 lr 0.000235 wd 0.0500 time 0.4000 (0.4134) data time 0.0008 (0.0148) model time 0.0000 (0.0000) loss 7.3506 (6.7563) grad_norm 2.5464 (2.9699) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:15:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][40/625] eta 0:04:02 lr 0.000235 wd 0.0500 time 0.3920 (0.4146) data time 0.0007 (0.0114) model time 0.0000 (0.0000) loss 7.4277 (6.6798) grad_norm 2.5238 (3.0948) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:15:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][50/625] eta 0:03:56 lr 0.000235 wd 0.0500 time 0.3978 (0.4113) data time 0.0008 (0.0093) model time 0.0000 (0.0000) loss 6.5036 (6.6686) grad_norm 2.8846 (3.8138) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 08:15:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][60/625] eta 0:03:51 lr 0.000235 wd 0.0500 time 0.3991 (0.4092) data time 0.0010 (0.0079) model time 0.3981 (0.3979) loss 6.7414 (6.6772) grad_norm 2.8864 (3.7475) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:15:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][70/625] eta 0:03:46 lr 0.000235 wd 0.0500 time 0.3978 (0.4074) data time 0.0008 (0.0069) model time 0.3970 (0.3965) loss 5.9548 (6.6347) grad_norm 2.0574 (3.6177) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:15:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][80/625] eta 0:03:41 lr 0.000235 wd 0.0500 time 0.4010 (0.4062) data time 0.0009 (0.0062) model time 0.4001 (0.3966) loss 6.7609 (6.6246) grad_norm 2.1741 (3.5036) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:15:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][90/625] eta 0:03:38 lr 0.000235 wd 0.0500 time 0.3933 (0.4075) data time 0.0009 (0.0056) model time 0.3924 (0.4019) loss 5.7264 (6.6241) grad_norm 1.8715 (3.4975) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:15:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][100/625] eta 0:03:34 lr 0.000235 wd 0.0500 time 0.4042 (0.4084) data time 0.0008 (0.0051) model time 0.4034 (0.4046) loss 7.4779 (6.6374) grad_norm 3.0112 (3.5599) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:15:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][110/625] eta 0:03:32 lr 0.000235 wd 0.0500 time 0.3986 (0.4118) data time 0.0008 (0.0047) model time 0.3978 (0.4114) loss 6.3320 (6.6288) grad_norm 3.2081 (3.4778) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:15:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][120/625] eta 0:03:31 lr 0.000235 wd 0.0500 time 
0.5895 (0.4182) data time 0.0006 (0.0044) model time 0.5889 (0.4224) loss 6.1144 (6.6374) grad_norm 3.6378 (3.4860) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:15:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][130/625] eta 0:03:27 lr 0.000235 wd 0.0500 time 0.4000 (0.4198) data time 0.0006 (0.0041) model time 0.3994 (0.4244) loss 6.3272 (6.6700) grad_norm 2.1683 (3.5164) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:15:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][140/625] eta 0:03:25 lr 0.000234 wd 0.0500 time 0.5488 (0.4232) data time 0.0008 (0.0039) model time 0.5480 (0.4292) loss 6.9829 (6.6533) grad_norm 81.1217 (4.0060) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:16:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][150/625] eta 0:03:20 lr 0.000234 wd 0.0500 time 0.5850 (0.4229) data time 0.0006 (0.0037) model time 0.5844 (0.4279) loss 6.4301 (6.6466) grad_norm 1.8342 (3.9559) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:16:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][160/625] eta 0:03:16 lr 0.000234 wd 0.0500 time 0.4016 (0.4224) data time 0.0007 (0.0035) model time 0.4009 (0.4267) loss 7.6015 (6.6635) grad_norm 2.0843 (3.9181) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:16:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][170/625] eta 0:03:11 lr 0.000234 wd 0.0500 time 0.3982 (0.4210) data time 0.0007 (0.0033) model time 0.3975 (0.4243) loss 6.9119 (6.6656) grad_norm 2.0529 (3.8329) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:16:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][180/625] eta 0:03:06 lr 0.000234 wd 0.0500 time 0.3985 (0.4197) data time 0.0008 (0.0032) model time 0.3977 (0.4222) loss 5.9861 (6.6826) grad_norm 3.1044 (3.7700) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 08:16:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][190/625] eta 0:03:02 lr 0.000234 wd 0.0500 time 0.3975 (0.4187) data time 0.0006 (0.0031) model time 0.3969 (0.4206) loss 6.1440 (6.6679) grad_norm 1.7586 (3.8706) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:16:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][200/625] eta 0:02:57 lr 0.000234 wd 0.0500 time 0.3987 (0.4178) data time 0.0008 (0.0030) model time 0.3978 (0.4192) loss 6.0597 (6.6633) grad_norm 3.3860 (3.8065) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:16:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][210/625] eta 0:02:53 lr 0.000234 wd 0.0500 time 0.3963 (0.4170) data time 0.0008 (0.0029) model time 0.3955 (0.4180) loss 7.5672 (6.6751) grad_norm 2.6377 (3.7443) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:16:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][220/625] eta 0:02:48 lr 0.000234 wd 0.0500 time 0.3978 (0.4161) data time 0.0009 (0.0028) model time 0.3970 (0.4168) loss 7.0376 (6.6649) grad_norm 4.0088 (3.6993) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:16:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][230/625] eta 0:02:44 lr 0.000234 wd 0.0500 time 0.3984 (0.4155) data time 0.0007 (0.0027) model time 0.3977 (0.4159) loss 6.6746 (6.6594) grad_norm 2.8536 (3.6602) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:16:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][240/625] eta 0:02:39 lr 0.000234 wd 0.0500 time 0.4060 (0.4148) data time 0.0008 (0.0026) model time 0.4052 (0.4149) loss 7.4303 (6.6665) grad_norm 2.8354 (3.6317) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:16:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][250/625] eta 0:02:35 lr 0.000234 wd 0.0500 time 
0.3978 (0.4142) data time 0.0007 (0.0025) model time 0.3971 (0.4141) loss 7.6814 (6.6712) grad_norm 2.9261 (3.6384) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:16:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][260/625] eta 0:02:31 lr 0.000233 wd 0.0500 time 0.3967 (0.4142) data time 0.0009 (0.0025) model time 0.3958 (0.4141) loss 6.3277 (6.6662) grad_norm 2.9095 (3.5985) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:16:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][270/625] eta 0:02:26 lr 0.000233 wd 0.0500 time 0.3998 (0.4137) data time 0.0006 (0.0024) model time 0.3992 (0.4134) loss 7.2630 (6.6801) grad_norm 2.0437 (3.5684) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:16:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][280/625] eta 0:02:22 lr 0.000233 wd 0.0500 time 0.3970 (0.4131) data time 0.0009 (0.0024) model time 0.3962 (0.4127) loss 6.9963 (6.6883) grad_norm 2.6120 (3.5533) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:16:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][290/625] eta 0:02:18 lr 0.000233 wd 0.0500 time 0.4133 (0.4127) data time 0.0006 (0.0023) model time 0.4126 (0.4122) loss 7.5150 (6.6977) grad_norm 3.2271 (3.5463) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:17:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][300/625] eta 0:02:14 lr 0.000233 wd 0.0500 time 0.5777 (0.4129) data time 0.0007 (0.0023) model time 0.5769 (0.4124) loss 7.4030 (6.7148) grad_norm 2.6726 (3.5356) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:17:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][310/625] eta 0:02:09 lr 0.000233 wd 0.0500 time 0.3975 (0.4125) data time 0.0009 (0.0022) model time 0.3967 (0.4119) loss 5.9780 (6.7258) grad_norm 3.8335 (3.5319) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 08:17:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][320/625] eta 0:02:05 lr 0.000233 wd 0.0500 time 0.5411 (0.4129) data time 0.0009 (0.0022) model time 0.5402 (0.4124) loss 7.3075 (6.7379) grad_norm 2.6132 (3.5192) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:17:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][330/625] eta 0:02:02 lr 0.000233 wd 0.0500 time 0.3963 (0.4139) data time 0.0009 (0.0021) model time 0.3954 (0.4135) loss 6.9552 (6.7504) grad_norm 3.3072 (3.5261) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:17:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][340/625] eta 0:01:58 lr 0.000233 wd 0.0500 time 0.5890 (0.4161) data time 0.0009 (0.0021) model time 0.5881 (0.4161) loss 7.6220 (6.7581) grad_norm 3.2299 (3.5238) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:17:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][350/625] eta 0:01:54 lr 0.000233 wd 0.0500 time 0.5869 (0.4169) data time 0.0009 (0.0021) model time 0.5860 (0.4171) loss 6.0299 (6.7535) grad_norm 4.0048 (3.5149) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:17:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][360/625] eta 0:01:50 lr 0.000233 wd 0.0500 time 0.3975 (0.4180) data time 0.0007 (0.0020) model time 0.3968 (0.4182) loss 7.1914 (6.7484) grad_norm 2.7184 (3.4984) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:17:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][370/625] eta 0:01:46 lr 0.000233 wd 0.0500 time 0.5831 (0.4182) data time 0.0008 (0.0020) model time 0.5823 (0.4185) loss 8.0102 (6.7522) grad_norm 2.9679 (3.4908) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:17:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][380/625] eta 0:01:42 lr 0.000232 wd 0.0500 time 
0.4065 (0.4181) data time 0.0006 (0.0020) model time 0.4059 (0.4183) loss 7.3693 (6.7598) grad_norm 2.2637 (3.4905) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:17:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][390/625] eta 0:01:38 lr 0.000232 wd 0.0500 time 0.3961 (0.4176) data time 0.0009 (0.0019) model time 0.3953 (0.4177) loss 7.1462 (6.7702) grad_norm 2.0380 (3.4620) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:17:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][400/625] eta 0:01:33 lr 0.000232 wd 0.0500 time 0.3957 (0.4171) data time 0.0009 (0.0019) model time 0.3948 (0.4171) loss 7.7481 (6.7724) grad_norm 1.8676 (3.4534) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:17:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][410/625] eta 0:01:29 lr 0.000232 wd 0.0500 time 0.3994 (0.4166) data time 0.0009 (0.0019) model time 0.3985 (0.4165) loss 7.3976 (6.7707) grad_norm 2.1212 (3.4364) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:17:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][420/625] eta 0:01:25 lr 0.000232 wd 0.0500 time 0.3949 (0.4161) data time 0.0006 (0.0019) model time 0.3942 (0.4160) loss 6.5232 (6.7640) grad_norm 3.4403 (3.4511) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:17:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][430/625] eta 0:01:21 lr 0.000232 wd 0.0500 time 0.3996 (0.4157) data time 0.0009 (0.0018) model time 0.3988 (0.4155) loss 5.8338 (6.7628) grad_norm 11.5365 (3.4570) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:18:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][440/625] eta 0:01:16 lr 0.000232 wd 0.0500 time 0.3985 (0.4154) data time 0.0007 (0.0018) model time 0.3978 (0.4151) loss 6.5403 (6.7579) grad_norm 2.7277 (3.4411) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 08:18:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][450/625] eta 0:01:12 lr 0.000232 wd 0.0500 time 0.4029 (0.4151) data time 0.0007 (0.0018) model time 0.4022 (0.4147) loss 7.4887 (6.7543) grad_norm 2.8854 (3.4488) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:18:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][460/625] eta 0:01:08 lr 0.000232 wd 0.0500 time 0.3945 (0.4147) data time 0.0009 (0.0018) model time 0.3937 (0.4143) loss 5.6548 (6.7555) grad_norm 2.2845 (3.4260) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:18:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][470/625] eta 0:01:04 lr 0.000232 wd 0.0500 time 0.3994 (0.4144) data time 0.0006 (0.0017) model time 0.3988 (0.4140) loss 6.2930 (6.7586) grad_norm 2.5314 (3.3980) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:18:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][480/625] eta 0:01:00 lr 0.000232 wd 0.0500 time 0.4000 (0.4143) data time 0.0007 (0.0017) model time 0.3994 (0.4139) loss 5.9073 (6.7551) grad_norm 2.0457 (3.3782) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:18:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][490/625] eta 0:00:55 lr 0.000232 wd 0.0500 time 0.3978 (0.4140) data time 0.0007 (0.0017) model time 0.3972 (0.4135) loss 6.4511 (6.7577) grad_norm 3.1373 (3.3705) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:18:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][500/625] eta 0:00:51 lr 0.000231 wd 0.0500 time 0.3996 (0.4137) data time 0.0006 (0.0017) model time 0.3990 (0.4132) loss 6.7599 (6.7621) grad_norm 2.0971 (3.3660) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:18:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][510/625] eta 0:00:47 lr 0.000231 wd 0.0500 time 
0.4009 (0.4134) data time 0.0008 (0.0017) model time 0.4002 (0.4129) loss 6.8832 (6.7610) grad_norm 2.7359 (3.3677) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:18:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][520/625] eta 0:00:43 lr 0.000231 wd 0.0500 time 0.4024 (0.4132) data time 0.0009 (0.0017) model time 0.4015 (0.4125) loss 8.3265 (6.7664) grad_norm 3.5629 (3.3702) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:18:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][530/625] eta 0:00:39 lr 0.000231 wd 0.0500 time 0.3965 (0.4133) data time 0.0010 (0.0016) model time 0.3955 (0.4126) loss 6.8561 (6.7709) grad_norm 4.3777 (3.3642) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:18:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][540/625] eta 0:00:35 lr 0.000231 wd 0.0500 time 0.5861 (0.4138) data time 0.0008 (0.0016) model time 0.5853 (0.4132) loss 5.8877 (6.7636) grad_norm 2.9924 (3.3579) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:18:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][550/625] eta 0:00:31 lr 0.000231 wd 0.0500 time 0.5975 (0.4147) data time 0.0006 (0.0016) model time 0.5969 (0.4143) loss 5.4488 (6.7634) grad_norm 3.0846 (3.3458) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:18:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][560/625] eta 0:00:27 lr 0.000231 wd 0.0500 time 0.5847 (0.4157) data time 0.0007 (0.0016) model time 0.5840 (0.4153) loss 5.8055 (6.7668) grad_norm 4.1531 (3.3414) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:18:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][570/625] eta 0:00:22 lr 0.000231 wd 0.0500 time 0.3978 (0.4166) data time 0.0008 (0.0016) model time 0.3971 (0.4163) loss 7.4631 (6.7626) grad_norm 2.3150 (3.3373) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 08:19:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][580/625] eta 0:00:18 lr 0.000231 wd 0.0500 time 0.3979 (0.4176) data time 0.0008 (0.0016) model time 0.3971 (0.4174) loss 7.5679 (6.7651) grad_norm 2.1675 (3.3374) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:19:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][590/625] eta 0:00:14 lr 0.000231 wd 0.0500 time 0.3959 (0.4178) data time 0.0006 (0.0016) model time 0.3953 (0.4176) loss 6.1142 (6.7687) grad_norm 3.6961 (3.3426) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:19:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][600/625] eta 0:00:10 lr 0.000231 wd 0.0500 time 0.4001 (0.4178) data time 0.0006 (0.0015) model time 0.3995 (0.4176) loss 7.5519 (6.7694) grad_norm 1.9245 (3.3425) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:19:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][610/625] eta 0:00:06 lr 0.000231 wd 0.0500 time 0.3961 (0.4175) data time 0.0004 (0.0015) model time 0.3957 (0.4172) loss 7.1497 (6.7646) grad_norm 1.8892 (3.3383) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:19:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][620/625] eta 0:00:02 lr 0.000231 wd 0.0500 time 0.3966 (0.4172) data time 0.0005 (0.0015) model time 0.3961 (0.4169) loss 7.2482 (6.7592) grad_norm 3.3757 (3.3427) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:19:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 220 training takes 0:04:20 [2024-07-25 08:19:19 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 08:19:20 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 08:19:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.486 (0.486) Loss 0.5493 (0.5493) Acc@1 90.332 (90.332) Acc@5 98.730 (98.730) Mem 14939MB [2024-07-25 08:19:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.122) Loss 0.8428 (0.6753) Acc@1 82.129 (86.874) Acc@5 96.484 (97.834) Mem 14939MB [2024-07-25 08:19:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 0.9639 (0.7887) Acc@1 77.637 (83.666) Acc@5 95.215 (96.731) Mem 14939MB [2024-07-25 08:19:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.333 Acc@5 96.695 [2024-07-25 08:19:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-07-25 08:19:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.804 (0.804) Loss 0.5420 (0.5420) Acc@1 89.844 (89.844) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 08:19:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.154) Loss 0.8359 (0.6678) Acc@1 81.982 (86.768) Acc@5 96.680 (97.883) Mem 14939MB [2024-07-25 08:19:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.121) Loss 0.9565 (0.7779) Acc@1 77.783 (83.784) Acc@5 95.557 (96.833) Mem 14939MB [2024-07-25 08:19:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.379 Acc@5 96.795 [2024-07-25 08:19:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-07-25 08:19:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.38% [2024-07-25 08:19:26 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 08:19:26 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 08:19:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][0/625] eta 0:08:30 lr 0.000230 wd 0.0500 time 0.8172 (0.8172) data time 0.4416 (0.4416) model time 0.0000 (0.0000) loss 6.9069 (6.9069) grad_norm 2.4167 (2.4167) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:19:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][10/625] eta 0:04:35 lr 0.000230 wd 0.0500 time 0.3967 (0.4478) data time 0.0009 (0.0409) model time 0.0000 (0.0000) loss 7.2858 (6.8552) grad_norm 2.4813 (3.3113) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:19:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][20/625] eta 0:04:16 lr 0.000230 wd 0.0500 time 0.3968 (0.4236) data time 0.0007 (0.0218) model time 0.0000 (0.0000) loss 7.1944 (6.8629) grad_norm 3.3219 (3.3925) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:19:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][30/625] eta 0:04:07 lr 0.000230 wd 0.0500 time 0.3953 (0.4161) data time 0.0007 (0.0150) model time 0.0000 (0.0000) loss 5.9261 (6.7819) grad_norm 2.7375 (3.2205) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:19:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][40/625] eta 0:04:00 lr 0.000230 wd 0.0500 time 0.4036 (0.4114) data time 0.0006 (0.0116) model time 0.0000 (0.0000) loss 6.1172 (6.7279) grad_norm 3.6096 (3.2634) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:19:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][50/625] eta 0:03:54 lr 0.000230 wd 0.0500 time 0.3943 (0.4085) data time 0.0007 (0.0095) model time 0.0000 (0.0000) loss 5.7081 (6.7206) grad_norm 2.2852 (3.1101) loss_scale 256.0000 (145.5686) mem 
14939MB [2024-07-25 08:19:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][60/625] eta 0:03:49 lr 0.000230 wd 0.0500 time 0.4023 (0.4069) data time 0.0006 (0.0080) model time 0.4017 (0.3978) loss 6.0541 (6.6608) grad_norm 2.9431 (3.0001) loss_scale 256.0000 (163.6721) mem 14939MB [2024-07-25 08:19:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][70/625] eta 0:03:45 lr 0.000230 wd 0.0500 time 0.3998 (0.4060) data time 0.0009 (0.0070) model time 0.3989 (0.3989) loss 7.0704 (6.6366) grad_norm 4.7200 (2.9333) loss_scale 256.0000 (176.6761) mem 14939MB [2024-07-25 08:19:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][80/625] eta 0:03:40 lr 0.000230 wd 0.0500 time 0.3982 (0.4052) data time 0.0008 (0.0063) model time 0.3973 (0.3988) loss 6.4629 (6.7003) grad_norm 2.5825 (2.9411) loss_scale 256.0000 (186.4691) mem 14939MB [2024-07-25 08:20:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][90/625] eta 0:03:36 lr 0.000230 wd 0.0500 time 0.3962 (0.4044) data time 0.0009 (0.0057) model time 0.3953 (0.3984) loss 7.3814 (6.6988) grad_norm 1.8216 (2.8692) loss_scale 256.0000 (194.1099) mem 14939MB [2024-07-25 08:20:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][100/625] eta 0:03:31 lr 0.000230 wd 0.0500 time 0.3971 (0.4038) data time 0.0006 (0.0052) model time 0.3965 (0.3981) loss 7.1051 (6.7429) grad_norm 3.9028 (2.8818) loss_scale 256.0000 (200.2376) mem 14939MB [2024-07-25 08:20:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][110/625] eta 0:03:27 lr 0.000230 wd 0.0500 time 0.4028 (0.4032) data time 0.0006 (0.0048) model time 0.4022 (0.3979) loss 6.0496 (6.7158) grad_norm 4.8530 (2.8793) loss_scale 256.0000 (205.2613) mem 14939MB [2024-07-25 08:20:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][120/625] eta 0:03:24 lr 0.000229 wd 0.0500 time 
0.4003 (0.4041) data time 0.0009 (0.0045) model time 0.3994 (0.4001) loss 7.2141 (6.7267) grad_norm 4.0864 (2.9158) loss_scale 256.0000 (209.4545) mem 14939MB [2024-07-25 08:20:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][130/625] eta 0:03:19 lr 0.000229 wd 0.0500 time 0.3944 (0.4036) data time 0.0006 (0.0042) model time 0.3938 (0.3997) loss 6.3326 (6.7444) grad_norm 2.2046 (2.9106) loss_scale 256.0000 (213.0076) mem 14939MB [2024-07-25 08:20:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][140/625] eta 0:03:17 lr 0.000229 wd 0.0500 time 0.5777 (0.4075) data time 0.0009 (0.0039) model time 0.5769 (0.4061) loss 6.8532 (6.7489) grad_norm 2.1321 (2.9752) loss_scale 256.0000 (216.0567) mem 14939MB [2024-07-25 08:20:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][150/625] eta 0:03:14 lr 0.000229 wd 0.0500 time 0.6062 (0.4097) data time 0.0008 (0.0037) model time 0.6053 (0.4095) loss 7.4632 (6.7263) grad_norm 3.3946 (2.9583) loss_scale 256.0000 (218.7020) mem 14939MB [2024-07-25 08:20:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][160/625] eta 0:03:13 lr 0.000229 wd 0.0500 time 0.5693 (0.4156) data time 0.0006 (0.0035) model time 0.5687 (0.4181) loss 5.7965 (6.7200) grad_norm 7.4524 (2.9551) loss_scale 256.0000 (221.0186) mem 14939MB [2024-07-25 08:20:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][170/625] eta 0:03:10 lr 0.000229 wd 0.0500 time 0.5564 (0.4197) data time 0.0007 (0.0034) model time 0.5557 (0.4237) loss 7.2902 (6.7383) grad_norm 2.5203 (3.0323) loss_scale 256.0000 (223.0643) mem 14939MB [2024-07-25 08:20:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][180/625] eta 0:03:07 lr 0.000229 wd 0.0500 time 0.3957 (0.4206) data time 0.0008 (0.0032) model time 0.3948 (0.4245) loss 6.6687 (6.7627) grad_norm 2.1248 (3.0178) loss_scale 256.0000 (224.8840) mem 
14939MB [2024-07-25 08:20:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][190/625] eta 0:03:03 lr 0.000229 wd 0.0500 time 0.3979 (0.4208) data time 0.0009 (0.0031) model time 0.3970 (0.4245) loss 6.2701 (6.7606) grad_norm 1.9585 (3.0173) loss_scale 256.0000 (226.5131) mem 14939MB [2024-07-25 08:20:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][200/625] eta 0:02:58 lr 0.000229 wd 0.0500 time 0.3970 (0.4198) data time 0.0009 (0.0030) model time 0.3961 (0.4228) loss 7.2844 (6.7414) grad_norm 2.3075 (3.0060) loss_scale 256.0000 (227.9801) mem 14939MB [2024-07-25 08:20:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][210/625] eta 0:02:53 lr 0.000229 wd 0.0500 time 0.3950 (0.4187) data time 0.0007 (0.0029) model time 0.3943 (0.4212) loss 6.6272 (6.7360) grad_norm 3.1929 (3.0081) loss_scale 256.0000 (229.3081) mem 14939MB [2024-07-25 08:20:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][220/625] eta 0:02:49 lr 0.000229 wd 0.0500 time 0.3950 (0.4178) data time 0.0008 (0.0028) model time 0.3942 (0.4198) loss 6.8720 (6.7323) grad_norm 2.6661 (3.0548) loss_scale 256.0000 (230.5158) mem 14939MB [2024-07-25 08:21:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][230/625] eta 0:02:45 lr 0.000229 wd 0.0500 time 0.3985 (0.4178) data time 0.0009 (0.0027) model time 0.3976 (0.4197) loss 6.5281 (6.7349) grad_norm 2.0130 (3.0741) loss_scale 256.0000 (231.6190) mem 14939MB [2024-07-25 08:21:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][240/625] eta 0:02:40 lr 0.000228 wd 0.0500 time 0.4002 (0.4170) data time 0.0009 (0.0026) model time 0.3994 (0.4185) loss 7.3246 (6.7213) grad_norm 2.8131 (3.1652) loss_scale 256.0000 (232.6307) mem 14939MB [2024-07-25 08:21:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][250/625] eta 0:02:36 lr 0.000228 wd 0.0500 time 
0.3962 (0.4162) data time 0.0008 (0.0026) model time 0.3954 (0.4174) loss 7.5780 (6.7226) grad_norm 2.6204 (3.1466) loss_scale 256.0000 (233.5618) mem 14939MB [2024-07-25 08:21:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][260/625] eta 0:02:31 lr 0.000228 wd 0.0500 time 0.3950 (0.4155) data time 0.0008 (0.0025) model time 0.3941 (0.4164) loss 6.5957 (6.7222) grad_norm 2.4868 (3.1505) loss_scale 256.0000 (234.4215) mem 14939MB [2024-07-25 08:21:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][270/625] eta 0:02:27 lr 0.000228 wd 0.0500 time 0.3958 (0.4149) data time 0.0007 (0.0024) model time 0.3952 (0.4156) loss 6.1755 (6.7194) grad_norm 3.5483 (3.2400) loss_scale 256.0000 (235.2177) mem 14939MB [2024-07-25 08:21:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][280/625] eta 0:02:22 lr 0.000228 wd 0.0500 time 0.3968 (0.4143) data time 0.0008 (0.0024) model time 0.3960 (0.4147) loss 6.5807 (6.7191) grad_norm 2.4118 (3.2497) loss_scale 256.0000 (235.9573) mem 14939MB [2024-07-25 08:21:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][290/625] eta 0:02:18 lr 0.000228 wd 0.0500 time 0.4020 (0.4137) data time 0.0007 (0.0023) model time 0.4013 (0.4141) loss 7.5186 (6.7244) grad_norm 3.7383 (3.2492) loss_scale 256.0000 (236.6460) mem 14939MB [2024-07-25 08:21:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][300/625] eta 0:02:14 lr 0.000228 wd 0.0500 time 0.3978 (0.4133) data time 0.0009 (0.0023) model time 0.3969 (0.4134) loss 5.7679 (6.7080) grad_norm 1.9687 (3.2649) loss_scale 256.0000 (237.2890) mem 14939MB [2024-07-25 08:21:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][310/625] eta 0:02:10 lr 0.000228 wd 0.0500 time 0.3956 (0.4128) data time 0.0009 (0.0022) model time 0.3947 (0.4128) loss 7.7856 (6.7171) grad_norm 1.9938 (3.2551) loss_scale 256.0000 (237.8907) mem 
14939MB [2024-07-25 08:21:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][320/625] eta 0:02:05 lr 0.000228 wd 0.0500 time 0.4007 (0.4123) data time 0.0007 (0.0022) model time 0.4000 (0.4122) loss 7.5933 (6.7271) grad_norm 2.4356 (3.2273) loss_scale 256.0000 (238.4548) mem 14939MB [2024-07-25 08:21:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][330/625] eta 0:02:01 lr 0.000228 wd 0.0500 time 0.3963 (0.4119) data time 0.0006 (0.0021) model time 0.3957 (0.4117) loss 6.0469 (6.7237) grad_norm 2.2070 (3.2002) loss_scale 256.0000 (238.9849) mem 14939MB [2024-07-25 08:21:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][340/625] eta 0:01:57 lr 0.000228 wd 0.0500 time 0.3980 (0.4119) data time 0.0008 (0.0021) model time 0.3972 (0.4117) loss 6.4790 (6.7210) grad_norm 2.1110 (3.1734) loss_scale 256.0000 (239.4839) mem 14939MB [2024-07-25 08:21:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][350/625] eta 0:01:53 lr 0.000228 wd 0.0500 time 0.3982 (0.4115) data time 0.0009 (0.0021) model time 0.3972 (0.4113) loss 8.0929 (6.7238) grad_norm 2.1930 (3.1446) loss_scale 256.0000 (239.9544) mem 14939MB [2024-07-25 08:21:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][360/625] eta 0:01:49 lr 0.000227 wd 0.0500 time 0.3974 (0.4122) data time 0.0007 (0.0020) model time 0.3966 (0.4120) loss 6.2633 (6.7263) grad_norm 2.0795 (3.1303) loss_scale 256.0000 (240.3989) mem 14939MB [2024-07-25 08:22:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][370/625] eta 0:01:45 lr 0.000227 wd 0.0500 time 0.5731 (0.4138) data time 0.0009 (0.0020) model time 0.5722 (0.4139) loss 6.1406 (6.7295) grad_norm 2.4556 (3.1206) loss_scale 256.0000 (240.8194) mem 14939MB [2024-07-25 08:22:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][380/625] eta 0:01:41 lr 0.000227 wd 0.0500 time 
0.3991 (0.4160) data time 0.0009 (0.0020) model time 0.3982 (0.4164) loss 7.6947 (6.7277) grad_norm 3.4969 (3.1266) loss_scale 256.0000 (241.2178) mem 14939MB [2024-07-25 08:22:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][390/625] eta 0:01:37 lr 0.000227 wd 0.0500 time 0.3971 (0.4168) data time 0.0009 (0.0019) model time 0.3962 (0.4173) loss 7.0782 (6.7363) grad_norm 7.2044 (3.1376) loss_scale 256.0000 (241.5959) mem 14939MB [2024-07-25 08:22:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][400/625] eta 0:01:34 lr 0.000227 wd 0.0500 time 0.5325 (0.4182) data time 0.0008 (0.0019) model time 0.5317 (0.4188) loss 7.0482 (6.7376) grad_norm 2.3590 (3.1373) loss_scale 256.0000 (241.9551) mem 14939MB [2024-07-25 08:22:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][410/625] eta 0:01:29 lr 0.000227 wd 0.0500 time 0.3993 (0.4181) data time 0.0008 (0.0019) model time 0.3984 (0.4186) loss 7.5699 (6.7447) grad_norm 5.6441 (3.1404) loss_scale 256.0000 (242.2968) mem 14939MB [2024-07-25 08:22:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][420/625] eta 0:01:25 lr 0.000227 wd 0.0500 time 0.4126 (0.4176) data time 0.0008 (0.0019) model time 0.4118 (0.4180) loss 7.3643 (6.7504) grad_norm 1.7747 (3.1625) loss_scale 256.0000 (242.6223) mem 14939MB [2024-07-25 08:22:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][430/625] eta 0:01:21 lr 0.000227 wd 0.0500 time 0.3986 (0.4172) data time 0.0007 (0.0018) model time 0.3979 (0.4175) loss 6.0253 (6.7436) grad_norm 3.8043 (3.1627) loss_scale 256.0000 (242.9327) mem 14939MB [2024-07-25 08:22:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][440/625] eta 0:01:17 lr 0.000227 wd 0.0500 time 0.3988 (0.4168) data time 0.0008 (0.0018) model time 0.3980 (0.4170) loss 6.6520 (6.7435) grad_norm 3.3378 (3.1542) loss_scale 256.0000 (243.2290) mem 
14939MB [2024-07-25 08:22:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][450/625] eta 0:01:12 lr 0.000227 wd 0.0500 time 0.3989 (0.4168) data time 0.0006 (0.0018) model time 0.3983 (0.4171) loss 6.4465 (6.7462) grad_norm 3.1184 (3.1630) loss_scale 256.0000 (243.5122) mem 14939MB [2024-07-25 08:22:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][460/625] eta 0:01:08 lr 0.000227 wd 0.0500 time 0.4023 (0.4165) data time 0.0006 (0.0018) model time 0.4017 (0.4166) loss 5.5106 (6.7422) grad_norm 2.4342 (3.1594) loss_scale 256.0000 (243.7831) mem 14939MB [2024-07-25 08:22:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][470/625] eta 0:01:04 lr 0.000227 wd 0.0500 time 0.3972 (0.4161) data time 0.0008 (0.0017) model time 0.3963 (0.4162) loss 7.0866 (6.7397) grad_norm 2.6520 (3.1474) loss_scale 256.0000 (244.0425) mem 14939MB [2024-07-25 08:22:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][480/625] eta 0:01:00 lr 0.000227 wd 0.0500 time 0.3978 (0.4157) data time 0.0008 (0.0017) model time 0.3970 (0.4158) loss 6.8803 (6.7431) grad_norm 4.7607 (3.1583) loss_scale 256.0000 (244.2911) mem 14939MB [2024-07-25 08:22:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][490/625] eta 0:00:56 lr 0.000226 wd 0.0500 time 0.3998 (0.4154) data time 0.0007 (0.0017) model time 0.3991 (0.4153) loss 6.4374 (6.7373) grad_norm 1.9567 (3.1498) loss_scale 256.0000 (244.5295) mem 14939MB [2024-07-25 08:22:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][500/625] eta 0:00:51 lr 0.000226 wd 0.0500 time 0.3988 (0.4150) data time 0.0008 (0.0017) model time 0.3979 (0.4149) loss 7.6471 (6.7381) grad_norm 4.2018 (3.1517) loss_scale 256.0000 (244.7585) mem 14939MB [2024-07-25 08:22:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][510/625] eta 0:00:47 lr 0.000226 wd 0.0500 time 
0.3957 (0.4147) data time 0.0008 (0.0017) model time 0.3949 (0.4146) loss 6.6850 (6.7392) grad_norm 2.0503 (3.1575) loss_scale 256.0000 (244.9785) mem 14939MB [2024-07-25 08:23:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][520/625] eta 0:00:43 lr 0.000226 wd 0.0500 time 0.3990 (0.4144) data time 0.0008 (0.0017) model time 0.3981 (0.4143) loss 6.4975 (6.7289) grad_norm 1.7071 (3.1538) loss_scale 256.0000 (245.1900) mem 14939MB [2024-07-25 08:23:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][530/625] eta 0:00:39 lr 0.000226 wd 0.0500 time 0.3997 (0.4141) data time 0.0009 (0.0016) model time 0.3988 (0.4139) loss 6.9414 (6.7291) grad_norm 2.2094 (3.1533) loss_scale 256.0000 (245.3936) mem 14939MB [2024-07-25 08:23:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][540/625] eta 0:00:35 lr 0.000226 wd 0.0500 time 0.4007 (0.4139) data time 0.0010 (0.0016) model time 0.3997 (0.4136) loss 6.4901 (6.7304) grad_norm 2.6968 (3.1660) loss_scale 256.0000 (245.5896) mem 14939MB [2024-07-25 08:23:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][550/625] eta 0:00:31 lr 0.000226 wd 0.0500 time 0.3946 (0.4136) data time 0.0007 (0.0016) model time 0.3939 (0.4134) loss 5.6979 (6.7344) grad_norm 2.1853 (3.1617) loss_scale 256.0000 (245.7786) mem 14939MB [2024-07-25 08:23:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][560/625] eta 0:00:26 lr 0.000226 wd 0.0500 time 0.3949 (0.4136) data time 0.0008 (0.0016) model time 0.3940 (0.4133) loss 6.5960 (6.7407) grad_norm 1.7815 (3.1549) loss_scale 256.0000 (245.9608) mem 14939MB [2024-07-25 08:23:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][570/625] eta 0:00:22 lr 0.000226 wd 0.0500 time 0.3971 (0.4134) data time 0.0009 (0.0016) model time 0.3962 (0.4131) loss 5.5769 (6.7394) grad_norm 9.7991 (3.1887) loss_scale 256.0000 (246.1366) mem 
14939MB [2024-07-25 08:23:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][580/625] eta 0:00:18 lr 0.000226 wd 0.0500 time 0.3951 (0.4137) data time 0.0007 (0.0016) model time 0.3944 (0.4134) loss 7.7026 (6.7458) grad_norm 3.2088 (3.1950) loss_scale 256.0000 (246.3064) mem 14939MB [2024-07-25 08:23:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][590/625] eta 0:00:14 lr 0.000226 wd 0.0500 time 0.3875 (0.4144) data time 0.0011 (0.0016) model time 0.3864 (0.4142) loss 7.3625 (6.7504) grad_norm 2.6672 (3.2045) loss_scale 256.0000 (246.4704) mem 14939MB [2024-07-25 08:23:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][600/625] eta 0:00:10 lr 0.000226 wd 0.0500 time 0.5992 (0.4164) data time 0.0010 (0.0015) model time 0.5982 (0.4163) loss 6.4994 (6.7418) grad_norm 4.5467 (3.2001) loss_scale 256.0000 (246.6290) mem 14939MB [2024-07-25 08:23:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][610/625] eta 0:00:06 lr 0.000225 wd 0.0500 time 0.5627 (0.4168) data time 0.0004 (0.0015) model time 0.5623 (0.4167) loss 6.6913 (6.7433) grad_norm 2.1108 (3.1942) loss_scale 256.0000 (246.7823) mem 14939MB [2024-07-25 08:23:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][620/625] eta 0:00:02 lr 0.000225 wd 0.0500 time 0.5592 (0.4174) data time 0.0006 (0.0015) model time 0.5586 (0.4174) loss 7.1564 (6.7440) grad_norm 3.5832 (3.1942) loss_scale 256.0000 (246.9308) mem 14939MB [2024-07-25 08:23:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 221 training takes 0:04:20 [2024-07-25 08:23:47 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 08:23:48 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 08:23:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.480 (0.480) Loss 0.5439 (0.5439) Acc@1 89.990 (89.990) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 08:23:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.121) Loss 0.8560 (0.6782) Acc@1 81.689 (86.590) Acc@5 96.582 (97.776) Mem 14939MB [2024-07-25 08:23:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9482 (0.7827) Acc@1 77.832 (83.682) Acc@5 95.264 (96.780) Mem 14939MB [2024-07-25 08:23:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.331 Acc@5 96.731 [2024-07-25 08:23:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-07-25 08:23:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.759 (0.759) Loss 0.5420 (0.5420) Acc@1 89.941 (89.941) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 08:23:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.153) Loss 0.8345 (0.6673) Acc@1 81.982 (86.799) Acc@5 96.631 (97.887) Mem 14939MB [2024-07-25 08:23:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.121) Loss 0.9551 (0.7773) Acc@1 77.783 (83.803) Acc@5 95.459 (96.842) Mem 14939MB [2024-07-25 08:23:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.401 Acc@5 96.803 [2024-07-25 08:23:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-07-25 08:23:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.40% [2024-07-25 08:23:54 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 08:23:55 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 08:23:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][0/625] eta 0:08:22 lr 0.000225 wd 0.0500 time 0.8040 (0.8040) data time 0.4257 (0.4257) model time 0.0000 (0.0000) loss 7.6898 (7.6898) grad_norm 2.0971 (2.0971) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:24:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][10/625] eta 0:04:37 lr 0.000225 wd 0.0500 time 0.3957 (0.4512) data time 0.0006 (0.0395) model time 0.0000 (0.0000) loss 6.2279 (6.8358) grad_norm 4.1449 (2.8635) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:24:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][20/625] eta 0:04:18 lr 0.000225 wd 0.0500 time 0.4045 (0.4266) data time 0.0009 (0.0211) model time 0.0000 (0.0000) loss 6.9563 (6.8076) grad_norm 2.7344 (2.6445) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:24:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][30/625] eta 0:04:08 lr 0.000225 wd 0.0500 time 0.4004 (0.4174) data time 0.0007 (0.0145) model time 0.0000 (0.0000) loss 5.6460 (6.7756) grad_norm 1.8204 (3.0187) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:24:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][40/625] eta 0:04:01 lr 0.000225 wd 0.0500 time 0.3993 (0.4128) data time 0.0008 (0.0112) model time 0.0000 (0.0000) loss 6.1714 (6.8062) grad_norm 4.8047 (3.2195) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:24:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][50/625] eta 0:03:55 lr 0.000225 wd 0.0500 time 0.4012 (0.4098) data time 0.0006 (0.0092) model time 0.0000 (0.0000) loss 6.1030 (6.8099) grad_norm 3.1152 (inf) loss_scale 128.0000 (250.9804) mem 14939MB 
[2024-07-25 08:24:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][60/625] eta 0:03:50 lr 0.000225 wd 0.0500 time 0.3968 (0.4080) data time 0.0008 (0.0078) model time 0.3960 (0.3981) loss 6.6498 (6.8227) grad_norm 4.8820 (inf) loss_scale 128.0000 (230.8197) mem 14939MB [2024-07-25 08:24:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][70/625] eta 0:03:45 lr 0.000225 wd 0.0500 time 0.4143 (0.4069) data time 0.0008 (0.0068) model time 0.4135 (0.3987) loss 7.2006 (6.8089) grad_norm 7.9608 (inf) loss_scale 128.0000 (216.3380) mem 14939MB [2024-07-25 08:24:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][80/625] eta 0:03:41 lr 0.000225 wd 0.0500 time 0.3934 (0.4061) data time 0.0009 (0.0061) model time 0.3925 (0.3989) loss 6.6311 (6.8197) grad_norm 2.0421 (inf) loss_scale 128.0000 (205.4321) mem 14939MB [2024-07-25 08:24:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][90/625] eta 0:03:36 lr 0.000225 wd 0.0500 time 0.3961 (0.4053) data time 0.0007 (0.0055) model time 0.3955 (0.3987) loss 6.4949 (6.8067) grad_norm 1.9862 (inf) loss_scale 128.0000 (196.9231) mem 14939MB [2024-07-25 08:24:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][100/625] eta 0:03:32 lr 0.000225 wd 0.0500 time 0.4343 (0.4051) data time 0.0006 (0.0050) model time 0.4336 (0.3996) loss 6.8301 (6.8042) grad_norm 2.3029 (inf) loss_scale 128.0000 (190.0990) mem 14939MB [2024-07-25 08:24:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][110/625] eta 0:03:28 lr 0.000224 wd 0.0500 time 0.3938 (0.4045) data time 0.0010 (0.0046) model time 0.3929 (0.3992) loss 5.6909 (6.7575) grad_norm 3.0097 (inf) loss_scale 128.0000 (184.5045) mem 14939MB [2024-07-25 08:24:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][120/625] eta 0:03:24 lr 0.000224 wd 0.0500 time 0.3972 (0.4040) data time 
0.0006 (0.0043) model time 0.3966 (0.3990) loss 7.2632 (6.7657) grad_norm 3.3990 (inf) loss_scale 128.0000 (179.8347) mem 14939MB [2024-07-25 08:24:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][130/625] eta 0:03:19 lr 0.000224 wd 0.0500 time 0.3873 (0.4037) data time 0.0008 (0.0040) model time 0.3865 (0.3990) loss 7.9499 (6.7888) grad_norm 3.3007 (inf) loss_scale 128.0000 (175.8779) mem 14939MB [2024-07-25 08:24:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][140/625] eta 0:03:15 lr 0.000224 wd 0.0500 time 0.3970 (0.4033) data time 0.0010 (0.0038) model time 0.3961 (0.3989) loss 7.5835 (6.8243) grad_norm 2.5043 (inf) loss_scale 128.0000 (172.4823) mem 14939MB [2024-07-25 08:24:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][150/625] eta 0:03:11 lr 0.000224 wd 0.0500 time 0.3960 (0.4030) data time 0.0007 (0.0036) model time 0.3953 (0.3987) loss 6.7569 (6.8281) grad_norm 1.9530 (inf) loss_scale 128.0000 (169.5364) mem 14939MB [2024-07-25 08:25:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][160/625] eta 0:03:07 lr 0.000224 wd 0.0500 time 0.3960 (0.4037) data time 0.0007 (0.0034) model time 0.3954 (0.4000) loss 7.3364 (6.8232) grad_norm 2.7030 (inf) loss_scale 128.0000 (166.9565) mem 14939MB [2024-07-25 08:25:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][170/625] eta 0:03:04 lr 0.000224 wd 0.0500 time 0.5900 (0.4047) data time 0.0006 (0.0033) model time 0.5893 (0.4017) loss 7.2161 (6.8312) grad_norm 2.5232 (inf) loss_scale 128.0000 (164.6784) mem 14939MB [2024-07-25 08:25:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][180/625] eta 0:03:00 lr 0.000224 wd 0.0500 time 0.4039 (0.4067) data time 0.0009 (0.0031) model time 0.4030 (0.4047) loss 6.8993 (6.8317) grad_norm 3.1619 (inf) loss_scale 128.0000 (162.6519) mem 14939MB [2024-07-25 08:25:13 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][190/625] eta 0:02:58 lr 0.000224 wd 0.0500 time 0.3937 (0.4112) data time 0.0009 (0.0030) model time 0.3928 (0.4109) loss 6.6581 (6.8100) grad_norm 2.1264 (inf) loss_scale 128.0000 (160.8377) mem 14939MB [2024-07-25 08:25:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][200/625] eta 0:02:56 lr 0.000224 wd 0.0500 time 0.5869 (0.4158) data time 0.0008 (0.0029) model time 0.5861 (0.4171) loss 6.0601 (6.8145) grad_norm 5.4608 (inf) loss_scale 128.0000 (159.2040) mem 14939MB [2024-07-25 08:25:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][210/625] eta 0:02:53 lr 0.000224 wd 0.0500 time 0.3980 (0.4178) data time 0.0006 (0.0028) model time 0.3973 (0.4196) loss 7.4710 (6.8057) grad_norm 4.3044 (inf) loss_scale 128.0000 (157.7251) mem 14939MB [2024-07-25 08:25:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][220/625] eta 0:02:49 lr 0.000224 wd 0.0500 time 0.3978 (0.4179) data time 0.0008 (0.0027) model time 0.3970 (0.4196) loss 6.1032 (6.7921) grad_norm 3.3341 (inf) loss_scale 128.0000 (156.3801) mem 14939MB [2024-07-25 08:25:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][230/625] eta 0:02:45 lr 0.000223 wd 0.0500 time 0.3964 (0.4178) data time 0.0008 (0.0026) model time 0.3955 (0.4192) loss 6.6897 (6.7857) grad_norm 2.2848 (inf) loss_scale 128.0000 (155.1515) mem 14939MB [2024-07-25 08:25:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][240/625] eta 0:02:40 lr 0.000223 wd 0.0500 time 0.4001 (0.4170) data time 0.0008 (0.0026) model time 0.3993 (0.4181) loss 6.3495 (6.7680) grad_norm 2.3155 (inf) loss_scale 128.0000 (154.0249) mem 14939MB [2024-07-25 08:25:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][250/625] eta 0:02:36 lr 0.000223 wd 0.0500 time 0.3970 (0.4164) data time 0.0008 (0.0025) model 
time 0.3962 (0.4172) loss 6.9113 (6.7655) grad_norm 2.7446 (inf) loss_scale 128.0000 (152.9880) mem 14939MB [2024-07-25 08:25:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][260/625] eta 0:02:31 lr 0.000223 wd 0.0500 time 0.3986 (0.4157) data time 0.0006 (0.0024) model time 0.3980 (0.4164) loss 6.3811 (6.7641) grad_norm 3.3636 (inf) loss_scale 128.0000 (152.0307) mem 14939MB [2024-07-25 08:25:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][270/625] eta 0:02:27 lr 0.000223 wd 0.0500 time 0.4015 (0.4152) data time 0.0006 (0.0024) model time 0.4009 (0.4157) loss 6.3455 (6.7739) grad_norm 2.4558 (inf) loss_scale 128.0000 (151.1439) mem 14939MB [2024-07-25 08:25:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][280/625] eta 0:02:23 lr 0.000223 wd 0.0500 time 0.3978 (0.4146) data time 0.0008 (0.0023) model time 0.3970 (0.4149) loss 7.7948 (6.7795) grad_norm 2.2865 (inf) loss_scale 128.0000 (150.3203) mem 14939MB [2024-07-25 08:25:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][290/625] eta 0:02:18 lr 0.000223 wd 0.0500 time 0.3992 (0.4141) data time 0.0006 (0.0023) model time 0.3986 (0.4143) loss 6.1408 (6.7775) grad_norm 2.5920 (inf) loss_scale 128.0000 (149.5533) mem 14939MB [2024-07-25 08:25:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][300/625] eta 0:02:14 lr 0.000223 wd 0.0500 time 0.3986 (0.4137) data time 0.0008 (0.0022) model time 0.3978 (0.4138) loss 7.6439 (6.7815) grad_norm 2.0539 (inf) loss_scale 128.0000 (148.8372) mem 14939MB [2024-07-25 08:26:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][310/625] eta 0:02:10 lr 0.000223 wd 0.0500 time 0.3806 (0.4133) data time 0.0010 (0.0022) model time 0.3796 (0.4132) loss 7.5686 (6.7813) grad_norm 12.8035 (inf) loss_scale 128.0000 (148.1672) mem 14939MB [2024-07-25 08:26:07 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [222/300][320/625] eta 0:02:05 lr 0.000223 wd 0.0500 time 0.4001 (0.4129) data time 0.0008 (0.0021) model time 0.3994 (0.4127) loss 7.3944 (6.7852) grad_norm 4.2179 (inf) loss_scale 128.0000 (147.5389) mem 14939MB [2024-07-25 08:26:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][330/625] eta 0:02:01 lr 0.000223 wd 0.0500 time 0.3992 (0.4125) data time 0.0006 (0.0021) model time 0.3987 (0.4122) loss 6.3611 (6.7799) grad_norm 2.9286 (inf) loss_scale 128.0000 (146.9486) mem 14939MB [2024-07-25 08:26:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][340/625] eta 0:01:57 lr 0.000223 wd 0.0500 time 0.3949 (0.4120) data time 0.0006 (0.0020) model time 0.3942 (0.4117) loss 7.0773 (6.7814) grad_norm 3.1406 (inf) loss_scale 128.0000 (146.3930) mem 14939MB [2024-07-25 08:26:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][350/625] eta 0:01:53 lr 0.000222 wd 0.0500 time 0.3970 (0.4117) data time 0.0008 (0.0020) model time 0.3962 (0.4112) loss 6.5810 (6.7768) grad_norm 3.2922 (inf) loss_scale 128.0000 (145.8689) mem 14939MB [2024-07-25 08:26:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][360/625] eta 0:01:49 lr 0.000222 wd 0.0500 time 0.3994 (0.4113) data time 0.0008 (0.0020) model time 0.3986 (0.4108) loss 7.5505 (6.7772) grad_norm 1.9555 (inf) loss_scale 128.0000 (145.3740) mem 14939MB [2024-07-25 08:26:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][370/625] eta 0:01:44 lr 0.000222 wd 0.0500 time 0.3965 (0.4110) data time 0.0008 (0.0019) model time 0.3957 (0.4104) loss 6.5437 (6.7838) grad_norm 3.0076 (inf) loss_scale 128.0000 (144.9057) mem 14939MB [2024-07-25 08:26:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][380/625] eta 0:01:40 lr 0.000222 wd 0.0500 time 0.3960 (0.4111) data time 0.0008 (0.0019) model time 0.3952 (0.4105) loss 
7.3497 (6.7944) grad_norm 1.9578 (inf) loss_scale 128.0000 (144.4619) mem 14939MB [2024-07-25 08:26:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][390/625] eta 0:01:36 lr 0.000222 wd 0.0500 time 0.4058 (0.4108) data time 0.0006 (0.0019) model time 0.4051 (0.4102) loss 7.4394 (6.7948) grad_norm 2.7989 (inf) loss_scale 128.0000 (144.0409) mem 14939MB [2024-07-25 08:26:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][400/625] eta 0:01:32 lr 0.000222 wd 0.0500 time 0.4096 (0.4126) data time 0.0007 (0.0018) model time 0.4089 (0.4122) loss 7.3423 (6.8008) grad_norm 3.2219 (inf) loss_scale 128.0000 (143.6409) mem 14939MB [2024-07-25 08:26:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][410/625] eta 0:01:29 lr 0.000222 wd 0.0500 time 0.5783 (0.4144) data time 0.0009 (0.0018) model time 0.5775 (0.4143) loss 6.4659 (6.7927) grad_norm 2.7208 (inf) loss_scale 128.0000 (143.2603) mem 14939MB [2024-07-25 08:26:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][420/625] eta 0:01:25 lr 0.000222 wd 0.0500 time 0.3986 (0.4170) data time 0.0009 (0.0018) model time 0.3977 (0.4172) loss 5.9606 (6.7905) grad_norm 2.7638 (inf) loss_scale 128.0000 (142.8979) mem 14939MB [2024-07-25 08:26:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][430/625] eta 0:01:21 lr 0.000222 wd 0.0500 time 0.3989 (0.4177) data time 0.0008 (0.0018) model time 0.3981 (0.4179) loss 7.5182 (6.7862) grad_norm 2.6079 (inf) loss_scale 128.0000 (142.5522) mem 14939MB [2024-07-25 08:26:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][440/625] eta 0:01:17 lr 0.000222 wd 0.0500 time 0.3948 (0.4180) data time 0.0006 (0.0017) model time 0.3942 (0.4183) loss 5.7824 (6.7832) grad_norm 2.2500 (inf) loss_scale 128.0000 (142.2222) mem 14939MB [2024-07-25 08:27:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[222/300][450/625] eta 0:01:13 lr 0.000222 wd 0.0500 time 0.4049 (0.4178) data time 0.0006 (0.0017) model time 0.4043 (0.4181) loss 6.8655 (6.7770) grad_norm 2.3953 (inf) loss_scale 128.0000 (141.9069) mem 14939MB [2024-07-25 08:27:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][460/625] eta 0:01:08 lr 0.000222 wd 0.0500 time 0.3931 (0.4174) data time 0.0008 (0.0017) model time 0.3923 (0.4176) loss 7.1693 (6.7798) grad_norm 3.4777 (inf) loss_scale 128.0000 (141.6052) mem 14939MB [2024-07-25 08:27:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][470/625] eta 0:01:04 lr 0.000221 wd 0.0500 time 0.3981 (0.4170) data time 0.0008 (0.0017) model time 0.3973 (0.4171) loss 6.8233 (6.7694) grad_norm 2.0577 (inf) loss_scale 128.0000 (141.3163) mem 14939MB [2024-07-25 08:27:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][480/625] eta 0:01:00 lr 0.000221 wd 0.0500 time 0.4139 (0.4166) data time 0.0008 (0.0017) model time 0.4132 (0.4167) loss 5.9575 (6.7702) grad_norm 3.1798 (inf) loss_scale 128.0000 (141.0395) mem 14939MB [2024-07-25 08:27:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][490/625] eta 0:00:56 lr 0.000221 wd 0.0500 time 0.3978 (0.4163) data time 0.0008 (0.0016) model time 0.3970 (0.4163) loss 7.2028 (6.7713) grad_norm 9.2050 (inf) loss_scale 128.0000 (140.7739) mem 14939MB [2024-07-25 08:27:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][500/625] eta 0:00:51 lr 0.000221 wd 0.0500 time 0.4011 (0.4160) data time 0.0007 (0.0016) model time 0.4004 (0.4159) loss 7.2387 (6.7706) grad_norm 3.5414 (inf) loss_scale 128.0000 (140.5190) mem 14939MB [2024-07-25 08:27:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][510/625] eta 0:00:47 lr 0.000221 wd 0.0500 time 0.4003 (0.4156) data time 0.0009 (0.0016) model time 0.3994 (0.4155) loss 6.3902 (6.7682) grad_norm 2.2468 (inf) 
loss_scale 128.0000 (140.2740) mem 14939MB [2024-07-25 08:27:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][520/625] eta 0:00:43 lr 0.000221 wd 0.0500 time 0.3940 (0.4153) data time 0.0006 (0.0016) model time 0.3934 (0.4151) loss 5.6509 (6.7588) grad_norm 1.8312 (inf) loss_scale 128.0000 (140.0384) mem 14939MB [2024-07-25 08:27:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][530/625] eta 0:00:39 lr 0.000221 wd 0.0500 time 0.4001 (0.4150) data time 0.0008 (0.0016) model time 0.3994 (0.4148) loss 7.2455 (6.7642) grad_norm 2.2179 (inf) loss_scale 128.0000 (139.8117) mem 14939MB [2024-07-25 08:27:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][540/625] eta 0:00:35 lr 0.000221 wd 0.0500 time 0.4098 (0.4147) data time 0.0006 (0.0016) model time 0.4092 (0.4145) loss 5.7518 (6.7597) grad_norm 2.0020 (inf) loss_scale 128.0000 (139.5933) mem 14939MB [2024-07-25 08:27:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][550/625] eta 0:00:31 lr 0.000221 wd 0.0500 time 0.3984 (0.4145) data time 0.0008 (0.0016) model time 0.3976 (0.4142) loss 6.1752 (6.7564) grad_norm 2.7227 (inf) loss_scale 128.0000 (139.3829) mem 14939MB [2024-07-25 08:27:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][560/625] eta 0:00:26 lr 0.000221 wd 0.0500 time 0.3968 (0.4142) data time 0.0008 (0.0015) model time 0.3960 (0.4139) loss 6.1875 (6.7627) grad_norm 1.8465 (inf) loss_scale 128.0000 (139.1800) mem 14939MB [2024-07-25 08:27:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][570/625] eta 0:00:22 lr 0.000221 wd 0.0500 time 0.4068 (0.4140) data time 0.0006 (0.0015) model time 0.4063 (0.4136) loss 7.5596 (6.7653) grad_norm 16.7821 (inf) loss_scale 128.0000 (138.9842) mem 14939MB [2024-07-25 08:27:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][580/625] eta 0:00:18 lr 
0.000221 wd 0.0500 time 0.3947 (0.4137) data time 0.0008 (0.0015) model time 0.3940 (0.4133) loss 6.9205 (6.7627) grad_norm 3.0586 (inf) loss_scale 128.0000 (138.7952) mem 14939MB [2024-07-25 08:27:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][590/625] eta 0:00:14 lr 0.000221 wd 0.0500 time 0.3977 (0.4135) data time 0.0008 (0.0015) model time 0.3969 (0.4131) loss 7.8034 (6.7655) grad_norm 3.7274 (inf) loss_scale 128.0000 (138.6125) mem 14939MB [2024-07-25 08:28:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][600/625] eta 0:00:10 lr 0.000220 wd 0.0500 time 0.4194 (0.4136) data time 0.0009 (0.0015) model time 0.4185 (0.4131) loss 7.4514 (6.7694) grad_norm 2.9905 (inf) loss_scale 128.0000 (138.4359) mem 14939MB [2024-07-25 08:28:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][610/625] eta 0:00:06 lr 0.000220 wd 0.0500 time 0.5531 (0.4135) data time 0.0007 (0.0015) model time 0.5524 (0.4131) loss 6.1982 (6.7700) grad_norm 2.4202 (inf) loss_scale 128.0000 (138.2651) mem 14939MB [2024-07-25 08:28:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][620/625] eta 0:00:02 lr 0.000220 wd 0.0500 time 0.3981 (0.4141) data time 0.0006 (0.0015) model time 0.3975 (0.4138) loss 6.9294 (6.7719) grad_norm 3.7542 (inf) loss_scale 128.0000 (138.0998) mem 14939MB [2024-07-25 08:28:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 222 training takes 0:04:18 [2024-07-25 08:28:14 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 08:28:15 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 08:28:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.442 (0.442) Loss 0.5430 (0.5430) Acc@1 90.039 (90.039) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 08:28:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.8623 (0.6786) Acc@1 81.982 (86.617) Acc@5 96.680 (97.852) Mem 14939MB [2024-07-25 08:28:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9658 (0.7911) Acc@1 77.783 (83.650) Acc@5 95.264 (96.742) Mem 14939MB [2024-07-25 08:28:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.303 Acc@5 96.711 [2024-07-25 08:28:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-07-25 08:28:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.958 (0.958) Loss 0.5420 (0.5420) Acc@1 90.039 (90.039) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 08:28:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.165) Loss 0.8335 (0.6668) Acc@1 82.129 (86.825) Acc@5 96.631 (97.883) Mem 14939MB [2024-07-25 08:28:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.127) Loss 0.9546 (0.7765) Acc@1 77.881 (83.833) Acc@5 95.459 (96.842) Mem 14939MB [2024-07-25 08:28:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.431 Acc@5 96.803 [2024-07-25 08:28:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-07-25 08:28:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.43% [2024-07-25 08:28:20 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 08:28:21 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 08:28:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][0/625] eta 0:08:16 lr 0.000220 wd 0.0500 time 0.7946 (0.7946) data time 0.4070 (0.4070) model time 0.0000 (0.0000) loss 5.5394 (5.5394) grad_norm 2.8446 (2.8446) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:28:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][10/625] eta 0:05:28 lr 0.000220 wd 0.0500 time 0.5842 (0.5348) data time 0.0008 (0.0377) model time 0.0000 (0.0000) loss 6.2923 (6.6250) grad_norm 2.5476 (2.5418) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:28:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][20/625] eta 0:05:03 lr 0.000220 wd 0.0500 time 0.5259 (0.5018) data time 0.0007 (0.0201) model time 0.0000 (0.0000) loss 6.1328 (6.6824) grad_norm 4.2317 (2.6343) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:28:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][30/625] eta 0:04:52 lr 0.000220 wd 0.0500 time 0.3960 (0.4915) data time 0.0008 (0.0139) model time 0.0000 (0.0000) loss 8.3213 (6.7337) grad_norm 4.1910 (2.7787) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:28:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][40/625] eta 0:04:34 lr 0.000220 wd 0.0500 time 0.3927 (0.4684) data time 0.0007 (0.0107) model time 0.0000 (0.0000) loss 6.3450 (6.6710) grad_norm 2.3923 (2.7790) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:28:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][50/625] eta 0:04:22 lr 0.000220 wd 0.0500 time 0.3999 (0.4569) data time 0.0006 (0.0087) model time 0.0000 (0.0000) loss 6.0964 (6.6904) grad_norm 2.6563 (2.8676) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 08:28:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][60/625] eta 0:04:12 lr 0.000220 wd 0.0500 time 0.4034 (0.4474) data time 0.0006 (0.0074) model time 0.4028 (0.3979) loss 6.2187 (6.6801) grad_norm 3.4459 (3.3735) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:28:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][70/625] eta 0:04:04 lr 0.000220 wd 0.0500 time 0.3960 (0.4402) data time 0.0006 (0.0065) model time 0.3954 (0.3969) loss 5.5784 (6.6805) grad_norm 2.8538 (3.3527) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:28:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][80/625] eta 0:03:57 lr 0.000220 wd 0.0500 time 0.4054 (0.4350) data time 0.0006 (0.0058) model time 0.4048 (0.3971) loss 6.2692 (6.6741) grad_norm 6.2362 (3.5302) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:29:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][90/625] eta 0:03:50 lr 0.000219 wd 0.0500 time 0.3999 (0.4310) data time 0.0006 (0.0052) model time 0.3993 (0.3973) loss 6.8762 (6.6465) grad_norm 2.9717 (3.6872) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:29:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][100/625] eta 0:03:44 lr 0.000219 wd 0.0500 time 0.4008 (0.4278) data time 0.0006 (0.0048) model time 0.4002 (0.3974) loss 6.1525 (6.6104) grad_norm 2.9359 (3.6302) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:29:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][110/625] eta 0:03:39 lr 0.000219 wd 0.0500 time 0.3992 (0.4253) data time 0.0008 (0.0044) model time 0.3983 (0.3977) loss 6.6610 (6.5991) grad_norm 2.2806 (3.6216) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:29:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][120/625] eta 0:03:33 lr 0.000219 wd 0.0500 time 
0.4011 (0.4232) data time 0.0007 (0.0041) model time 0.4004 (0.3980) loss 4.7572 (6.5890) grad_norm 1.9938 (3.6038) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:29:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][130/625] eta 0:03:28 lr 0.000219 wd 0.0500 time 0.3997 (0.4215) data time 0.0006 (0.0039) model time 0.3991 (0.3982) loss 6.4466 (6.6148) grad_norm 4.2553 (3.5811) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:29:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][140/625] eta 0:03:23 lr 0.000219 wd 0.0500 time 0.4008 (0.4200) data time 0.0009 (0.0036) model time 0.3999 (0.3983) loss 6.6166 (6.5913) grad_norm 3.2740 (3.5273) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:29:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][150/625] eta 0:03:18 lr 0.000219 wd 0.0500 time 0.3986 (0.4187) data time 0.0008 (0.0035) model time 0.3978 (0.3985) loss 6.8829 (6.6157) grad_norm 1.9832 (3.4938) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:29:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][160/625] eta 0:03:14 lr 0.000219 wd 0.0500 time 0.3996 (0.4176) data time 0.0006 (0.0033) model time 0.3990 (0.3985) loss 6.3586 (6.6188) grad_norm 3.8635 (3.7027) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:29:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][170/625] eta 0:03:09 lr 0.000219 wd 0.0500 time 0.4008 (0.4166) data time 0.0006 (0.0031) model time 0.4002 (0.3987) loss 6.0396 (6.6304) grad_norm 2.1442 (3.6539) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:29:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][180/625] eta 0:03:04 lr 0.000219 wd 0.0500 time 0.4060 (0.4157) data time 0.0008 (0.0030) model time 0.4053 (0.3988) loss 7.3880 (6.6360) grad_norm 3.6558 (3.6181) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 08:29:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][190/625] eta 0:03:00 lr 0.000219 wd 0.0500 time 0.3984 (0.4160) data time 0.0007 (0.0029) model time 0.3978 (0.4003) loss 7.0321 (6.6540) grad_norm 5.6606 (3.6233) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:29:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][200/625] eta 0:02:56 lr 0.000219 wd 0.0500 time 0.3998 (0.4157) data time 0.0008 (0.0028) model time 0.3991 (0.4009) loss 7.6603 (6.6728) grad_norm 2.2004 (3.6073) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:29:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][210/625] eta 0:02:52 lr 0.000219 wd 0.0500 time 0.3992 (0.4167) data time 0.0008 (0.0027) model time 0.3985 (0.4031) loss 5.8232 (6.6556) grad_norm 2.5502 (3.6019) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:29:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][220/625] eta 0:02:49 lr 0.000218 wd 0.0500 time 0.4029 (0.4186) data time 0.0009 (0.0026) model time 0.4020 (0.4063) loss 7.7549 (6.6620) grad_norm 2.8110 (3.5799) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:29:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][230/625] eta 0:02:46 lr 0.000218 wd 0.0500 time 0.6042 (0.4222) data time 0.0006 (0.0025) model time 0.6036 (0.4116) loss 6.6479 (6.6593) grad_norm 2.0420 (3.5629) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:30:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][240/625] eta 0:02:42 lr 0.000218 wd 0.0500 time 0.3969 (0.4228) data time 0.0007 (0.0025) model time 0.3962 (0.4129) loss 7.0281 (6.6694) grad_norm 4.0950 (3.5642) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:30:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][250/625] eta 0:02:39 lr 0.000218 wd 0.0500 time 
0.3945 (0.4243) data time 0.0006 (0.0024) model time 0.3939 (0.4152) loss 5.8930 (6.6655) grad_norm 2.2697 (3.6300) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:30:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][260/625] eta 0:02:34 lr 0.000218 wd 0.0500 time 0.3990 (0.4240) data time 0.0006 (0.0023) model time 0.3984 (0.4152) loss 7.4262 (6.6725) grad_norm 2.7165 (3.6216) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:30:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][270/625] eta 0:02:30 lr 0.000218 wd 0.0500 time 0.3945 (0.4236) data time 0.0006 (0.0023) model time 0.3939 (0.4151) loss 7.0065 (6.6898) grad_norm 2.3578 (3.5970) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:30:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][280/625] eta 0:02:25 lr 0.000218 wd 0.0500 time 0.3976 (0.4227) data time 0.0006 (0.0022) model time 0.3970 (0.4144) loss 6.2506 (6.6862) grad_norm 3.3614 (3.6082) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:30:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][290/625] eta 0:02:21 lr 0.000218 wd 0.0500 time 0.3976 (0.4219) data time 0.0006 (0.0022) model time 0.3970 (0.4137) loss 5.5676 (6.6823) grad_norm 3.2864 (3.5989) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:30:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][300/625] eta 0:02:16 lr 0.000218 wd 0.0500 time 0.3975 (0.4211) data time 0.0008 (0.0021) model time 0.3967 (0.4131) loss 5.8226 (6.6823) grad_norm 3.6348 (3.5964) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:30:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][310/625] eta 0:02:12 lr 0.000218 wd 0.0500 time 0.4033 (0.4204) data time 0.0008 (0.0021) model time 0.4025 (0.4124) loss 6.9063 (6.6827) grad_norm 2.7024 (3.5841) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 08:30:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][320/625] eta 0:02:08 lr 0.000218 wd 0.0500 time 0.3990 (0.4198) data time 0.0008 (0.0020) model time 0.3982 (0.4120) loss 6.9103 (6.6765) grad_norm 12.7008 (3.5917) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:30:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][330/625] eta 0:02:03 lr 0.000218 wd 0.0500 time 0.3924 (0.4191) data time 0.0006 (0.0020) model time 0.3918 (0.4115) loss 6.0920 (6.6810) grad_norm 2.5878 (3.5765) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:30:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][340/625] eta 0:01:59 lr 0.000217 wd 0.0500 time 0.3941 (0.4186) data time 0.0008 (0.0020) model time 0.3933 (0.4111) loss 6.1969 (6.6711) grad_norm 3.9621 (3.7161) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:30:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][350/625] eta 0:01:54 lr 0.000217 wd 0.0500 time 0.3979 (0.4180) data time 0.0008 (0.0019) model time 0.3971 (0.4106) loss 7.0977 (6.6730) grad_norm 2.4830 (3.7239) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:30:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][360/625] eta 0:01:50 lr 0.000217 wd 0.0500 time 0.3964 (0.4175) data time 0.0007 (0.0019) model time 0.3957 (0.4103) loss 7.0253 (6.6779) grad_norm 2.1343 (3.7143) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:30:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][370/625] eta 0:01:46 lr 0.000217 wd 0.0500 time 0.3984 (0.4170) data time 0.0008 (0.0019) model time 0.3976 (0.4099) loss 7.3196 (6.6818) grad_norm 2.4767 (3.6924) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:31:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][380/625] eta 0:01:42 lr 0.000217 wd 0.0500 time 
0.4060 (0.4165) data time 0.0008 (0.0018) model time 0.4052 (0.4095) loss 5.8378 (6.6855) grad_norm 3.8120 (3.6892) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:31:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][390/625] eta 0:01:37 lr 0.000217 wd 0.0500 time 0.3969 (0.4161) data time 0.0008 (0.0018) model time 0.3960 (0.4092) loss 6.3540 (6.6864) grad_norm 2.1757 (3.7022) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:31:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][400/625] eta 0:01:33 lr 0.000217 wd 0.0500 time 0.3958 (0.4156) data time 0.0008 (0.0018) model time 0.3950 (0.4089) loss 7.1789 (6.6978) grad_norm 5.2654 (3.6926) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:31:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][410/625] eta 0:01:29 lr 0.000217 wd 0.0500 time 0.6024 (0.4157) data time 0.0007 (0.0018) model time 0.6018 (0.4091) loss 5.6173 (6.7004) grad_norm 3.0881 (3.6781) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:31:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][420/625] eta 0:01:25 lr 0.000217 wd 0.0500 time 0.3948 (0.4157) data time 0.0006 (0.0017) model time 0.3941 (0.4092) loss 6.4503 (6.7029) grad_norm 2.0246 (3.6511) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:31:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][430/625] eta 0:01:21 lr 0.000217 wd 0.0500 time 0.5398 (0.4163) data time 0.0007 (0.0017) model time 0.5391 (0.4100) loss 7.1335 (6.7075) grad_norm 3.2799 (3.6275) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:31:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][440/625] eta 0:01:17 lr 0.000217 wd 0.0500 time 0.6019 (0.4167) data time 0.0006 (0.0017) model time 0.6013 (0.4107) loss 6.2694 (6.7100) grad_norm 1.7698 (3.6022) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 08:31:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][450/625] eta 0:01:13 lr 0.000217 wd 0.0500 time 0.5992 (0.4183) data time 0.0007 (0.0017) model time 0.5985 (0.4126) loss 6.0153 (6.7070) grad_norm 8.3911 (3.6214) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:31:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][460/625] eta 0:01:09 lr 0.000217 wd 0.0500 time 0.3967 (0.4191) data time 0.0006 (0.0017) model time 0.3960 (0.4136) loss 7.0709 (6.7090) grad_norm 2.7631 (3.5994) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:31:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][470/625] eta 0:01:05 lr 0.000216 wd 0.0500 time 0.5984 (0.4205) data time 0.0009 (0.0016) model time 0.5976 (0.4153) loss 7.3249 (6.7101) grad_norm 2.4359 (3.5776) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:31:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][480/625] eta 0:01:00 lr 0.000216 wd 0.0500 time 0.3958 (0.4200) data time 0.0006 (0.0016) model time 0.3952 (0.4148) loss 5.7996 (6.7119) grad_norm 3.9108 (3.5708) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:31:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][490/625] eta 0:00:56 lr 0.000216 wd 0.0500 time 0.3953 (0.4199) data time 0.0007 (0.0016) model time 0.3946 (0.4148) loss 7.3553 (6.7194) grad_norm 3.3176 (3.5508) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:31:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][500/625] eta 0:00:52 lr 0.000216 wd 0.0500 time 0.3961 (0.4195) data time 0.0008 (0.0016) model time 0.3953 (0.4144) loss 6.8297 (6.7123) grad_norm 1.9873 (3.5420) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:31:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][510/625] eta 0:00:48 lr 0.000216 wd 0.0500 time 
0.4001 (0.4191) data time 0.0008 (0.0016) model time 0.3993 (0.4141) loss 6.6455 (6.7078) grad_norm 4.4126 (3.5447) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:31:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][520/625] eta 0:00:43 lr 0.000216 wd 0.0500 time 0.4002 (0.4187) data time 0.0008 (0.0016) model time 0.3994 (0.4138) loss 6.9967 (6.7067) grad_norm 2.6689 (3.5867) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:32:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][530/625] eta 0:00:39 lr 0.000216 wd 0.0500 time 0.3974 (0.4183) data time 0.0007 (0.0015) model time 0.3967 (0.4134) loss 6.9613 (6.7107) grad_norm 1.9458 (3.5690) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:32:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][540/625] eta 0:00:35 lr 0.000216 wd 0.0500 time 0.4024 (0.4179) data time 0.0006 (0.0015) model time 0.4018 (0.4131) loss 7.9477 (6.7129) grad_norm 35.0364 (3.6242) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:32:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][550/625] eta 0:00:31 lr 0.000216 wd 0.0500 time 0.3971 (0.4176) data time 0.0008 (0.0015) model time 0.3963 (0.4128) loss 6.9863 (6.7179) grad_norm 2.2809 (3.6112) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:32:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][560/625] eta 0:00:27 lr 0.000216 wd 0.0500 time 0.3977 (0.4172) data time 0.0008 (0.0015) model time 0.3969 (0.4124) loss 7.2426 (6.7163) grad_norm 1.9227 (3.6110) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:32:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][570/625] eta 0:00:22 lr 0.000216 wd 0.0500 time 0.4002 (0.4169) data time 0.0008 (0.0015) model time 0.3994 (0.4122) loss 5.8244 (6.7139) grad_norm 4.0049 (3.5938) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 08:32:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][580/625] eta 0:00:18 lr 0.000216 wd 0.0500 time 0.3950 (0.4166) data time 0.0008 (0.0015) model time 0.3942 (0.4119) loss 6.8044 (6.7174) grad_norm 3.3914 (3.5768) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:32:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][590/625] eta 0:00:14 lr 0.000215 wd 0.0500 time 0.3988 (0.4163) data time 0.0006 (0.0015) model time 0.3981 (0.4116) loss 6.6551 (6.7190) grad_norm 12.5805 (3.5987) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:32:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][600/625] eta 0:00:10 lr 0.000215 wd 0.0500 time 0.3987 (0.4159) data time 0.0006 (0.0015) model time 0.3981 (0.4114) loss 6.7974 (6.7184) grad_norm 2.2057 (3.6083) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:32:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][610/625] eta 0:00:06 lr 0.000215 wd 0.0500 time 0.3955 (0.4157) data time 0.0004 (0.0015) model time 0.3951 (0.4111) loss 5.9930 (6.7168) grad_norm 3.2290 (3.6268) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:32:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][620/625] eta 0:00:02 lr 0.000215 wd 0.0500 time 0.3973 (0.4154) data time 0.0006 (0.0014) model time 0.3967 (0.4109) loss 8.1952 (6.7202) grad_norm 3.2475 (3.6181) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:32:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 223 training takes 0:04:19 [2024-07-25 08:32:41 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 08:32:42 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 08:32:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.465 (0.465) Loss 0.5356 (0.5356) Acc@1 90.234 (90.234) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 08:32:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.8452 (0.6707) Acc@1 82.178 (86.843) Acc@5 96.484 (97.812) Mem 14939MB [2024-07-25 08:32:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9424 (0.7800) Acc@1 78.418 (83.824) Acc@5 95.459 (96.817) Mem 14939MB [2024-07-25 08:32:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.417 Acc@5 96.783 [2024-07-25 08:32:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-07-25 08:32:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.42% [2024-07-25 08:32:44 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 08:32:45 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 08:32:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.468 (0.468) Loss 0.5415 (0.5415) Acc@1 90.088 (90.088) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 08:32:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.8330 (0.6664) Acc@1 82.178 (86.852) Acc@5 96.582 (97.896) Mem 14939MB [2024-07-25 08:32:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9541 (0.7759) Acc@1 77.930 (83.847) Acc@5 95.459 (96.854) Mem 14939MB [2024-07-25 08:32:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.447 Acc@5 96.813 [2024-07-25 08:32:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-07-25 08:32:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.45% [2024-07-25 08:32:48 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 08:32:49 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 08:32:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][0/625] eta 0:08:31 lr 0.000215 wd 0.0500 time 0.8179 (0.8179) data time 0.4423 (0.4423) model time 0.0000 (0.0000) loss 6.1777 (6.1777) grad_norm 7.1745 (7.1745) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:32:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][10/625] eta 0:04:37 lr 0.000215 wd 0.0500 time 0.3972 (0.4520) data time 0.0008 (0.0409) model time 0.0000 (0.0000) loss 5.6681 (6.5110) grad_norm 2.5237 (6.2680) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:32:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][20/625] eta 0:04:22 lr 0.000215 wd 0.0500 time 0.5464 (0.4331) data time 0.0006 (0.0218) model time 0.0000 (0.0000) loss 7.3131 (6.4621) grad_norm 3.8254 (6.4461) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:33:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][30/625] eta 0:04:20 lr 0.000215 wd 0.0500 time 0.5884 (0.4381) data time 0.0007 (0.0150) model time 0.0000 (0.0000) loss 7.1053 (6.5702) grad_norm 3.0664 (5.5549) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:33:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][40/625] eta 0:04:23 lr 0.000215 wd 0.0500 time 0.3969 (0.4502) data time 0.0007 (0.0116) model time 0.0000 (0.0000) loss 4.9486 (6.6028) grad_norm 2.0968 (4.9888) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:33:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][50/625] eta 0:04:26 lr 0.000215 wd 0.0500 time 0.5975 (0.4636) data time 0.0007 (0.0095) model time 0.0000 (0.0000) loss 7.2593 (6.6441) grad_norm 2.7210 (4.6049) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:33:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][60/625] eta 0:04:20 lr 0.000215 wd 0.0500 time 0.5765 (0.4614) 
data time 0.0006 (0.0080) model time 0.5759 (0.4494) loss 5.9298 (6.6749) grad_norm 3.3486 (4.4018) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:33:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][70/625] eta 0:04:14 lr 0.000215 wd 0.0500 time 0.4055 (0.4580) data time 0.0009 (0.0070) model time 0.4046 (0.4428) loss 6.9615 (6.7325) grad_norm 3.0815 (4.4041) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:33:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][80/625] eta 0:04:06 lr 0.000215 wd 0.0500 time 0.5755 (0.4531) data time 0.0009 (0.0063) model time 0.5746 (0.4344) loss 7.7053 (6.7392) grad_norm 4.2533 (4.3533) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:33:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][90/625] eta 0:03:59 lr 0.000214 wd 0.0500 time 0.3978 (0.4471) data time 0.0007 (0.0057) model time 0.3971 (0.4253) loss 6.9604 (6.7164) grad_norm 2.5431 (4.2836) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:33:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][100/625] eta 0:03:52 lr 0.000214 wd 0.0500 time 0.3981 (0.4424) data time 0.0007 (0.0052) model time 0.3973 (0.4200) loss 6.8874 (6.6947) grad_norm 3.5715 (4.5675) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:33:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][110/625] eta 0:03:45 lr 0.000214 wd 0.0500 time 0.3966 (0.4384) data time 0.0007 (0.0048) model time 0.3960 (0.4162) loss 6.6359 (6.6869) grad_norm 2.2656 (4.3914) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:33:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][120/625] eta 0:03:39 lr 0.000214 wd 0.0500 time 0.3969 (0.4351) data time 0.0009 (0.0045) model time 0.3960 (0.4136) loss 6.5664 (6.6662) grad_norm 3.5661 (4.2612) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
08:33:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][130/625] eta 0:03:33 lr 0.000214 wd 0.0500 time 0.3997 (0.4323) data time 0.0007 (0.0042) model time 0.3990 (0.4115) loss 6.5590 (6.6708) grad_norm 1.7954 (4.1017) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:33:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][140/625] eta 0:03:28 lr 0.000214 wd 0.0500 time 0.3965 (0.4308) data time 0.0006 (0.0039) model time 0.3959 (0.4114) loss 5.3903 (6.6393) grad_norm 2.2846 (3.9954) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:33:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][150/625] eta 0:03:23 lr 0.000214 wd 0.0500 time 0.3930 (0.4286) data time 0.0009 (0.0037) model time 0.3922 (0.4100) loss 7.1173 (6.6549) grad_norm 2.2312 (3.9945) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:33:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][160/625] eta 0:03:18 lr 0.000214 wd 0.0500 time 0.3998 (0.4268) data time 0.0008 (0.0036) model time 0.3990 (0.4089) loss 7.6798 (6.6650) grad_norm 2.3291 (3.9433) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:34:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][170/625] eta 0:03:13 lr 0.000214 wd 0.0500 time 0.3967 (0.4250) data time 0.0007 (0.0034) model time 0.3961 (0.4078) loss 6.9902 (6.6667) grad_norm 4.6842 (3.9064) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:34:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][180/625] eta 0:03:08 lr 0.000214 wd 0.0500 time 0.3978 (0.4235) data time 0.0009 (0.0033) model time 0.3970 (0.4070) loss 7.2481 (6.6503) grad_norm 2.4174 (3.8489) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:34:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][190/625] eta 0:03:03 lr 0.000214 wd 0.0500 time 0.3993 (0.4222) data 
time 0.0008 (0.0031) model time 0.3984 (0.4063) loss 6.1814 (6.6629) grad_norm 2.3842 (3.7890) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:34:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][200/625] eta 0:02:58 lr 0.000214 wd 0.0500 time 0.3996 (0.4209) data time 0.0007 (0.0030) model time 0.3989 (0.4056) loss 6.4863 (6.6629) grad_norm 2.3204 (3.7906) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:34:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][210/625] eta 0:02:54 lr 0.000214 wd 0.0500 time 0.3969 (0.4198) data time 0.0008 (0.0029) model time 0.3961 (0.4050) loss 5.6907 (6.6659) grad_norm 3.3921 (3.8139) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:34:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][220/625] eta 0:02:49 lr 0.000213 wd 0.0500 time 0.3957 (0.4188) data time 0.0008 (0.0028) model time 0.3949 (0.4046) loss 6.1212 (6.6654) grad_norm 2.9111 (3.8291) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:34:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][230/625] eta 0:02:45 lr 0.000213 wd 0.0500 time 0.3966 (0.4184) data time 0.0006 (0.0027) model time 0.3960 (0.4048) loss 6.9354 (6.6682) grad_norm 1.9762 (3.7822) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:34:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][240/625] eta 0:02:40 lr 0.000213 wd 0.0500 time 0.4000 (0.4176) data time 0.0010 (0.0026) model time 0.3990 (0.4044) loss 7.4869 (6.6683) grad_norm 2.6850 (3.7342) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:34:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][250/625] eta 0:02:37 lr 0.000213 wd 0.0500 time 0.4050 (0.4190) data time 0.0008 (0.0026) model time 0.4042 (0.4068) loss 5.3339 (6.6575) grad_norm 2.2344 (3.6925) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
08:34:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][260/625] eta 0:02:33 lr 0.000213 wd 0.0500 time 0.4016 (0.4206) data time 0.0007 (0.0025) model time 0.4009 (0.4093) loss 6.4462 (6.6542) grad_norm 2.0829 (3.6550) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:34:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][270/625] eta 0:02:30 lr 0.000213 wd 0.0500 time 0.5848 (0.4244) data time 0.0006 (0.0024) model time 0.5842 (0.4145) loss 5.8327 (6.6465) grad_norm 2.2264 (3.6344) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:34:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][280/625] eta 0:02:26 lr 0.000213 wd 0.0500 time 0.5545 (0.4255) data time 0.0008 (0.0024) model time 0.5537 (0.4163) loss 6.8185 (6.6427) grad_norm 2.4906 (3.5914) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:34:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][290/625] eta 0:02:22 lr 0.000213 wd 0.0500 time 0.3929 (0.4258) data time 0.0009 (0.0023) model time 0.3919 (0.4169) loss 5.8134 (6.6373) grad_norm 7.2259 (3.5680) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:34:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][300/625] eta 0:02:18 lr 0.000213 wd 0.0500 time 0.4074 (0.4249) data time 0.0006 (0.0023) model time 0.4068 (0.4162) loss 6.7834 (6.6504) grad_norm 2.2174 (3.5430) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:35:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][310/625] eta 0:02:13 lr 0.000213 wd 0.0500 time 0.3974 (0.4247) data time 0.0008 (0.0022) model time 0.3966 (0.4163) loss 6.4689 (6.6478) grad_norm 2.6465 (3.5898) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:35:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][320/625] eta 0:02:09 lr 0.000213 wd 0.0500 time 0.3989 (0.4240) data 
time 0.0008 (0.0022) model time 0.3981 (0.4157) loss 7.5995 (6.6535) grad_norm 3.6664 (3.5946) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:35:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][330/625] eta 0:02:04 lr 0.000213 wd 0.0500 time 0.3993 (0.4232) data time 0.0008 (0.0021) model time 0.3985 (0.4150) loss 7.2601 (6.6595) grad_norm 2.9153 (3.5756) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:35:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][340/625] eta 0:02:00 lr 0.000212 wd 0.0500 time 0.4001 (0.4225) data time 0.0008 (0.0021) model time 0.3993 (0.4144) loss 6.4171 (6.6578) grad_norm 3.0401 (3.5660) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:35:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][350/625] eta 0:01:55 lr 0.000212 wd 0.0500 time 0.3960 (0.4218) data time 0.0007 (0.0021) model time 0.3953 (0.4138) loss 7.0220 (6.6569) grad_norm 7.8758 (3.5527) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:35:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][360/625] eta 0:01:51 lr 0.000212 wd 0.0500 time 0.5392 (0.4217) data time 0.0006 (0.0020) model time 0.5385 (0.4140) loss 7.5155 (6.6667) grad_norm 3.0607 (3.5377) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:35:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][370/625] eta 0:01:47 lr 0.000212 wd 0.0500 time 0.3983 (0.4211) data time 0.0006 (0.0020) model time 0.3977 (0.4135) loss 6.3159 (6.6719) grad_norm 2.5873 (3.5036) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:35:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][380/625] eta 0:01:43 lr 0.000212 wd 0.0500 time 0.4015 (0.4207) data time 0.0008 (0.0020) model time 0.4007 (0.4132) loss 5.9508 (6.6635) grad_norm 5.3925 (3.5108) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
08:35:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][390/625] eta 0:01:38 lr 0.000212 wd 0.0500 time 0.3981 (0.4201) data time 0.0006 (0.0019) model time 0.3975 (0.4128) loss 5.7849 (6.6664) grad_norm 4.4212 (3.5422) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:35:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][400/625] eta 0:01:34 lr 0.000212 wd 0.0500 time 0.3978 (0.4196) data time 0.0008 (0.0019) model time 0.3970 (0.4123) loss 6.7100 (6.6579) grad_norm 3.0473 (3.5468) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:35:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][410/625] eta 0:01:30 lr 0.000212 wd 0.0500 time 0.3984 (0.4191) data time 0.0008 (0.0019) model time 0.3976 (0.4120) loss 6.3534 (6.6609) grad_norm 1.9776 (3.5236) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:35:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][420/625] eta 0:01:25 lr 0.000212 wd 0.0500 time 0.4007 (0.4188) data time 0.0006 (0.0019) model time 0.4001 (0.4118) loss 5.9323 (6.6606) grad_norm 1.9674 (3.5085) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:35:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][430/625] eta 0:01:21 lr 0.000212 wd 0.0500 time 0.3955 (0.4183) data time 0.0008 (0.0018) model time 0.3947 (0.4114) loss 7.5093 (6.6578) grad_norm 2.5263 (3.4857) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:35:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][440/625] eta 0:01:17 lr 0.000212 wd 0.0500 time 0.3946 (0.4179) data time 0.0008 (0.0018) model time 0.3937 (0.4111) loss 8.1491 (6.6619) grad_norm 4.2232 (3.4719) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:35:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][450/625] eta 0:01:13 lr 0.000212 wd 0.0500 time 0.3994 (0.4177) data 
time 0.0007 (0.0018) model time 0.3987 (0.4111) loss 6.0035 (6.6594) grad_norm 3.7220 (3.4581) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:36:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][460/625] eta 0:01:08 lr 0.000212 wd 0.0500 time 0.3992 (0.4174) data time 0.0008 (0.0018) model time 0.3985 (0.4108) loss 7.3810 (6.6568) grad_norm 2.8638 (3.4456) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:36:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][470/625] eta 0:01:04 lr 0.000211 wd 0.0500 time 0.3973 (0.4181) data time 0.0010 (0.0017) model time 0.3963 (0.4118) loss 7.3382 (6.6644) grad_norm 3.0747 (3.4328) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:36:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][480/625] eta 0:01:00 lr 0.000211 wd 0.0500 time 0.5878 (0.4200) data time 0.0009 (0.0017) model time 0.5869 (0.4140) loss 7.1063 (6.6616) grad_norm 2.4455 (3.4217) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:36:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][490/625] eta 0:00:56 lr 0.000211 wd 0.0500 time 0.5986 (0.4218) data time 0.0008 (0.0017) model time 0.5977 (0.4162) loss 6.1410 (6.6589) grad_norm 2.5614 (3.4121) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:36:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][500/625] eta 0:00:52 lr 0.000211 wd 0.0500 time 0.3998 (0.4220) data time 0.0007 (0.0017) model time 0.3991 (0.4165) loss 7.5789 (6.6639) grad_norm 3.6067 (3.4223) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:36:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][510/625] eta 0:00:48 lr 0.000211 wd 0.0500 time 0.4086 (0.4224) data time 0.0007 (0.0017) model time 0.4079 (0.4170) loss 5.6616 (6.6660) grad_norm 3.6691 (3.4222) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
08:36:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][520/625] eta 0:00:44 lr 0.000211 wd 0.0500 time 0.3988 (0.4220) data time 0.0006 (0.0017) model time 0.3982 (0.4166) loss 6.2056 (6.6698) grad_norm 3.2866 (3.4745) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:36:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][530/625] eta 0:00:40 lr 0.000211 wd 0.0500 time 0.4102 (0.4219) data time 0.0007 (0.0016) model time 0.4096 (0.4166) loss 7.0540 (6.6727) grad_norm 3.2343 (3.4784) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:36:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][540/625] eta 0:00:35 lr 0.000211 wd 0.0500 time 0.3964 (0.4214) data time 0.0009 (0.0016) model time 0.3956 (0.4162) loss 7.3689 (6.6720) grad_norm 3.1648 (3.4683) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:36:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][550/625] eta 0:00:31 lr 0.000211 wd 0.0500 time 0.3978 (0.4210) data time 0.0008 (0.0016) model time 0.3970 (0.4159) loss 6.4988 (6.6713) grad_norm 2.1356 (3.4620) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:36:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][560/625] eta 0:00:27 lr 0.000211 wd 0.0500 time 0.4031 (0.4207) data time 0.0008 (0.0016) model time 0.4023 (0.4156) loss 7.5185 (6.6774) grad_norm 2.2440 (3.4658) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:36:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][570/625] eta 0:00:23 lr 0.000211 wd 0.0500 time 0.3970 (0.4202) data time 0.0007 (0.0016) model time 0.3963 (0.4152) loss 7.0267 (6.6724) grad_norm 6.2755 (3.5185) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:36:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][580/625] eta 0:00:18 lr 0.000211 wd 0.0500 time 0.3977 (0.4198) data 
time 0.0006 (0.0016) model time 0.3971 (0.4148) loss 7.1015 (6.6777) grad_norm 2.4436 (3.5029) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:36:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][590/625] eta 0:00:14 lr 0.000210 wd 0.0500 time 0.3983 (0.4198) data time 0.0007 (0.0016) model time 0.3976 (0.4148) loss 6.1237 (6.6744) grad_norm 2.9121 (3.4989) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:37:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][600/625] eta 0:00:10 lr 0.000210 wd 0.0500 time 0.3989 (0.4194) data time 0.0008 (0.0015) model time 0.3981 (0.4145) loss 7.6091 (6.6786) grad_norm 3.4639 (3.4885) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:37:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][610/625] eta 0:00:06 lr 0.000210 wd 0.0500 time 0.3968 (0.4191) data time 0.0006 (0.0015) model time 0.3962 (0.4142) loss 8.1691 (6.6802) grad_norm 2.2096 (3.4757) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:37:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][620/625] eta 0:00:02 lr 0.000210 wd 0.0500 time 0.3902 (0.4187) data time 0.0007 (0.0015) model time 0.3895 (0.4139) loss 6.3652 (6.6761) grad_norm 1.9387 (3.4755) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:37:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 224 training takes 0:04:21 [2024-07-25 08:37:10 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 08:37:11 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 08:37:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.473 (0.473) Loss 0.5444 (0.5444) Acc@1 89.893 (89.893) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 08:37:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.122) Loss 0.8442 (0.6745) Acc@1 81.738 (86.790) Acc@5 96.484 (97.900) Mem 14939MB [2024-07-25 08:37:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 0.9785 (0.7852) Acc@1 77.295 (83.738) Acc@5 95.166 (96.819) Mem 14939MB [2024-07-25 08:37:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.369 Acc@5 96.797 [2024-07-25 08:37:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-07-25 08:37:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.850 (0.850) Loss 0.5415 (0.5415) Acc@1 90.088 (90.088) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 08:37:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.157) Loss 0.8311 (0.6657) Acc@1 82.227 (86.883) Acc@5 96.533 (97.914) Mem 14939MB [2024-07-25 08:37:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 0.9526 (0.7750) Acc@1 77.881 (83.880) Acc@5 95.361 (96.875) Mem 14939MB [2024-07-25 08:37:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.477 Acc@5 96.833 [2024-07-25 08:37:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-07-25 08:37:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.48% [2024-07-25 08:37:17 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 08:37:18 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 08:37:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][0/625] eta 0:08:00 lr 0.000210 wd 0.0500 time 0.7682 (0.7682) data time 0.3915 (0.3915) model time 0.0000 (0.0000) loss 7.1283 (7.1283) grad_norm 2.8582 (2.8582) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:37:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][10/625] eta 0:04:25 lr 0.000210 wd 0.0500 time 0.3945 (0.4315) data time 0.0008 (0.0363) model time 0.0000 (0.0000) loss 7.2944 (6.8122) grad_norm 3.3839 (6.1220) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:37:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][20/625] eta 0:04:12 lr 0.000210 wd 0.0500 time 0.3967 (0.4166) data time 0.0006 (0.0194) model time 0.0000 (0.0000) loss 7.6877 (6.8202) grad_norm 2.5040 (4.7736) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:37:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][30/625] eta 0:04:04 lr 0.000210 wd 0.0500 time 0.4127 (0.4111) data time 0.0009 (0.0134) model time 0.0000 (0.0000) loss 7.4164 (6.9061) grad_norm 2.4473 (4.0663) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:37:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][40/625] eta 0:03:58 lr 0.000210 wd 0.0500 time 0.3997 (0.4084) data time 0.0007 (0.0103) model time 0.0000 (0.0000) loss 7.0439 (6.7912) grad_norm 5.7372 (4.2811) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:37:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][50/625] eta 0:03:56 lr 0.000210 wd 0.0500 time 0.3987 (0.4108) data time 0.0009 (0.0085) model time 0.0000 (0.0000) loss 6.8394 (6.7628) grad_norm 4.1623 (3.9986) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 08:37:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][60/625] eta 0:03:52 lr 0.000210 wd 0.0500 time 0.6107 (0.4121) data time 0.0006 (0.0072) model time 0.6101 (0.4176) loss 7.7633 (6.7662) grad_norm 2.0559 (3.8433) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:37:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][70/625] eta 0:03:51 lr 0.000210 wd 0.0500 time 0.5974 (0.4171) data time 0.0006 (0.0063) model time 0.5968 (0.4321) loss 7.5565 (6.7337) grad_norm 3.3212 (3.7485) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:37:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][80/625] eta 0:03:50 lr 0.000210 wd 0.0500 time 0.5976 (0.4230) data time 0.0008 (0.0056) model time 0.5967 (0.4429) loss 6.6474 (6.6729) grad_norm 2.7588 (3.6290) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:37:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][90/625] eta 0:03:47 lr 0.000209 wd 0.0500 time 0.5722 (0.4253) data time 0.0009 (0.0051) model time 0.5714 (0.4430) loss 6.7025 (6.6507) grad_norm 2.4392 (3.6062) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:38:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][100/625] eta 0:03:45 lr 0.000209 wd 0.0500 time 0.6035 (0.4298) data time 0.0009 (0.0047) model time 0.6027 (0.4483) loss 5.1779 (6.6740) grad_norm 2.3173 (3.4918) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:38:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][110/625] eta 0:03:41 lr 0.000209 wd 0.0500 time 0.3992 (0.4300) data time 0.0006 (0.0043) model time 0.3986 (0.4454) loss 7.8124 (6.6928) grad_norm 3.6570 (3.4424) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:38:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][120/625] eta 0:03:36 lr 0.000209 wd 0.0500 time 
0.4021 (0.4297) data time 0.0006 (0.0040) model time 0.4015 (0.4426) loss 6.1831 (6.6607) grad_norm 3.6213 (3.4191) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:38:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][130/625] eta 0:03:31 lr 0.000209 wd 0.0500 time 0.3972 (0.4275) data time 0.0009 (0.0038) model time 0.3963 (0.4374) loss 7.0675 (6.6962) grad_norm 2.9379 (3.4166) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:38:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][140/625] eta 0:03:26 lr 0.000209 wd 0.0500 time 0.3992 (0.4255) data time 0.0009 (0.0036) model time 0.3983 (0.4330) loss 6.0538 (6.6472) grad_norm 2.8883 (3.3851) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:38:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][150/625] eta 0:03:21 lr 0.000209 wd 0.0500 time 0.3948 (0.4238) data time 0.0006 (0.0034) model time 0.3942 (0.4296) loss 6.3086 (6.6381) grad_norm 3.3043 (3.4134) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:38:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][160/625] eta 0:03:16 lr 0.000209 wd 0.0500 time 0.3973 (0.4223) data time 0.0008 (0.0032) model time 0.3966 (0.4268) loss 8.4720 (6.6554) grad_norm 2.6006 (3.4089) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:38:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][170/625] eta 0:03:11 lr 0.000209 wd 0.0500 time 0.4083 (0.4211) data time 0.0009 (0.0031) model time 0.4073 (0.4247) loss 6.0233 (6.6563) grad_norm 3.0878 (3.3957) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 08:38:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][180/625] eta 0:03:06 lr 0.000209 wd 0.0500 time 0.3954 (0.4200) data time 0.0009 (0.0030) model time 0.3945 (0.4229) loss 5.5229 (6.6565) grad_norm 3.2339 (3.3944) loss_scale 256.0000 (132.9503) mem 
14939MB [2024-07-25 08:38:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][190/625] eta 0:03:02 lr 0.000209 wd 0.0500 time 0.4008 (0.4190) data time 0.0006 (0.0029) model time 0.4002 (0.4211) loss 8.1364 (6.6741) grad_norm 3.6872 (3.3625) loss_scale 256.0000 (139.3927) mem 14939MB [2024-07-25 08:38:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][200/625] eta 0:02:57 lr 0.000209 wd 0.0500 time 0.3987 (0.4180) data time 0.0006 (0.0028) model time 0.3981 (0.4197) loss 6.7395 (6.6772) grad_norm 3.4647 (3.3394) loss_scale 256.0000 (145.1940) mem 14939MB [2024-07-25 08:38:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][210/625] eta 0:02:53 lr 0.000209 wd 0.0500 time 0.3990 (0.4172) data time 0.0006 (0.0027) model time 0.3984 (0.4184) loss 7.3527 (6.6752) grad_norm 2.3237 (3.2967) loss_scale 256.0000 (150.4455) mem 14939MB [2024-07-25 08:38:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][220/625] eta 0:02:48 lr 0.000208 wd 0.0500 time 0.4012 (0.4165) data time 0.0006 (0.0026) model time 0.4006 (0.4174) loss 6.5014 (6.6679) grad_norm 2.1419 (3.2614) loss_scale 256.0000 (155.2217) mem 14939MB [2024-07-25 08:38:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][230/625] eta 0:02:44 lr 0.000208 wd 0.0500 time 0.3966 (0.4157) data time 0.0008 (0.0025) model time 0.3958 (0.4162) loss 5.5645 (6.6765) grad_norm 2.1160 (3.2300) loss_scale 256.0000 (159.5844) mem 14939MB [2024-07-25 08:38:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][240/625] eta 0:02:39 lr 0.000208 wd 0.0500 time 0.3956 (0.4149) data time 0.0009 (0.0024) model time 0.3947 (0.4152) loss 8.3234 (6.6723) grad_norm 2.2875 (3.3591) loss_scale 256.0000 (163.5851) mem 14939MB [2024-07-25 08:39:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][250/625] eta 0:02:35 lr 0.000208 wd 0.0500 time 
0.3966 (0.4143) data time 0.0006 (0.0024) model time 0.3959 (0.4143) loss 7.2545 (6.6755) grad_norm 4.4657 (3.3519) loss_scale 256.0000 (167.2669) mem 14939MB [2024-07-25 08:39:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][260/625] eta 0:02:30 lr 0.000208 wd 0.0500 time 0.4020 (0.4137) data time 0.0007 (0.0023) model time 0.4013 (0.4135) loss 7.5324 (6.6739) grad_norm 3.2461 (3.3307) loss_scale 256.0000 (170.6667) mem 14939MB [2024-07-25 08:39:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][270/625] eta 0:02:26 lr 0.000208 wd 0.0500 time 0.3970 (0.4135) data time 0.0007 (0.0022) model time 0.3964 (0.4133) loss 6.9171 (6.6692) grad_norm 2.5963 (3.3170) loss_scale 256.0000 (173.8155) mem 14939MB [2024-07-25 08:39:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][280/625] eta 0:02:22 lr 0.000208 wd 0.0500 time 0.3956 (0.4135) data time 0.0007 (0.0022) model time 0.3949 (0.4133) loss 6.6169 (6.6458) grad_norm 3.1572 (3.3626) loss_scale 256.0000 (176.7402) mem 14939MB [2024-07-25 08:39:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][290/625] eta 0:02:19 lr 0.000208 wd 0.0500 time 0.5933 (0.4152) data time 0.0008 (0.0021) model time 0.5925 (0.4154) loss 6.7057 (6.6505) grad_norm 5.3413 (3.3527) loss_scale 256.0000 (179.4639) mem 14939MB [2024-07-25 08:39:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][300/625] eta 0:02:15 lr 0.000208 wd 0.0500 time 0.5700 (0.4174) data time 0.0009 (0.0021) model time 0.5691 (0.4180) loss 7.5522 (6.6629) grad_norm 2.3549 (3.3495) loss_scale 256.0000 (182.0066) mem 14939MB [2024-07-25 08:39:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][310/625] eta 0:02:12 lr 0.000208 wd 0.0500 time 0.5808 (0.4191) data time 0.0007 (0.0021) model time 0.5801 (0.4199) loss 6.4885 (6.6688) grad_norm 3.2369 (3.3347) loss_scale 256.0000 (184.3859) mem 
14939MB [2024-07-25 08:39:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][320/625] eta 0:02:08 lr 0.000208 wd 0.0500 time 0.3958 (0.4199) data time 0.0006 (0.0020) model time 0.3952 (0.4208) loss 5.7891 (6.6594) grad_norm 2.1121 (3.3256) loss_scale 256.0000 (186.6168) mem 14939MB [2024-07-25 08:39:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][330/625] eta 0:02:03 lr 0.000208 wd 0.0500 time 0.3956 (0.4199) data time 0.0008 (0.0020) model time 0.3948 (0.4208) loss 6.2096 (6.6701) grad_norm 2.8272 (3.3143) loss_scale 256.0000 (188.7130) mem 14939MB [2024-07-25 08:39:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][340/625] eta 0:01:59 lr 0.000207 wd 0.0500 time 0.3984 (0.4202) data time 0.0008 (0.0020) model time 0.3975 (0.4210) loss 5.8209 (6.6713) grad_norm 2.2569 (3.2994) loss_scale 256.0000 (190.6862) mem 14939MB [2024-07-25 08:39:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][350/625] eta 0:01:55 lr 0.000207 wd 0.0500 time 0.3964 (0.4196) data time 0.0007 (0.0019) model time 0.3957 (0.4202) loss 6.3375 (6.6711) grad_norm 2.5927 (3.3075) loss_scale 256.0000 (192.5470) mem 14939MB [2024-07-25 08:39:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][360/625] eta 0:01:51 lr 0.000207 wd 0.0500 time 0.3970 (0.4190) data time 0.0007 (0.0019) model time 0.3963 (0.4195) loss 7.2709 (6.6719) grad_norm 2.4536 (3.3004) loss_scale 256.0000 (194.3047) mem 14939MB [2024-07-25 08:39:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][370/625] eta 0:01:46 lr 0.000207 wd 0.0500 time 0.3990 (0.4184) data time 0.0007 (0.0019) model time 0.3983 (0.4188) loss 6.3891 (6.6723) grad_norm 1.8785 (3.3525) loss_scale 256.0000 (195.9677) mem 14939MB [2024-07-25 08:39:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][380/625] eta 0:01:42 lr 0.000207 wd 0.0500 time 
0.3958 (0.4178) data time 0.0006 (0.0018) model time 0.3952 (0.4181) loss 5.7660 (6.6697) grad_norm 2.9154 (3.3424) loss_scale 256.0000 (197.5433) mem 14939MB [2024-07-25 08:40:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][390/625] eta 0:01:38 lr 0.000207 wd 0.0500 time 0.3987 (0.4173) data time 0.0008 (0.0018) model time 0.3979 (0.4175) loss 6.3716 (6.6592) grad_norm 2.9079 (3.3425) loss_scale 256.0000 (199.0384) mem 14939MB [2024-07-25 08:40:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][400/625] eta 0:01:33 lr 0.000207 wd 0.0500 time 0.3964 (0.4168) data time 0.0007 (0.0018) model time 0.3958 (0.4169) loss 5.6065 (6.6713) grad_norm 2.1755 (3.3245) loss_scale 256.0000 (200.4589) mem 14939MB [2024-07-25 08:40:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][410/625] eta 0:01:29 lr 0.000207 wd 0.0500 time 0.3959 (0.4164) data time 0.0009 (0.0018) model time 0.3951 (0.4164) loss 7.8470 (6.6836) grad_norm 2.8244 (3.3020) loss_scale 256.0000 (201.8102) mem 14939MB [2024-07-25 08:40:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][420/625] eta 0:01:25 lr 0.000207 wd 0.0500 time 0.3964 (0.4160) data time 0.0008 (0.0017) model time 0.3956 (0.4159) loss 7.3149 (6.6859) grad_norm 2.9954 (3.3767) loss_scale 256.0000 (203.0974) mem 14939MB [2024-07-25 08:40:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][430/625] eta 0:01:21 lr 0.000207 wd 0.0500 time 0.3986 (0.4156) data time 0.0009 (0.0017) model time 0.3977 (0.4154) loss 6.2677 (6.6915) grad_norm 2.6586 (3.3842) loss_scale 256.0000 (204.3248) mem 14939MB [2024-07-25 08:40:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][440/625] eta 0:01:16 lr 0.000207 wd 0.0500 time 0.3982 (0.4152) data time 0.0007 (0.0017) model time 0.3975 (0.4150) loss 6.0847 (6.6864) grad_norm 2.6655 (3.3652) loss_scale 256.0000 (205.4966) mem 
14939MB [2024-07-25 08:40:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][450/625] eta 0:01:12 lr 0.000207 wd 0.0500 time 0.3999 (0.4148) data time 0.0008 (0.0017) model time 0.3991 (0.4145) loss 7.2387 (6.6891) grad_norm 4.6288 (3.3689) loss_scale 256.0000 (206.6164) mem 14939MB [2024-07-25 08:40:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][460/625] eta 0:01:08 lr 0.000207 wd 0.0500 time 0.3988 (0.4145) data time 0.0009 (0.0017) model time 0.3980 (0.4141) loss 7.3912 (6.6933) grad_norm 2.0349 (3.3646) loss_scale 256.0000 (207.6876) mem 14939MB [2024-07-25 08:40:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][470/625] eta 0:01:04 lr 0.000206 wd 0.0500 time 0.3975 (0.4142) data time 0.0008 (0.0016) model time 0.3967 (0.4138) loss 6.9280 (6.6928) grad_norm 4.6027 (3.3586) loss_scale 256.0000 (208.7134) mem 14939MB [2024-07-25 08:40:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][480/625] eta 0:01:00 lr 0.000206 wd 0.0500 time 0.3975 (0.4138) data time 0.0006 (0.0016) model time 0.3969 (0.4134) loss 6.5375 (6.6992) grad_norm 2.9437 (3.3567) loss_scale 256.0000 (209.6965) mem 14939MB [2024-07-25 08:40:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][490/625] eta 0:00:55 lr 0.000206 wd 0.0500 time 0.4031 (0.4138) data time 0.0006 (0.0016) model time 0.4025 (0.4133) loss 6.9455 (6.7074) grad_norm 3.0475 (3.3670) loss_scale 256.0000 (210.6395) mem 14939MB [2024-07-25 08:40:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][500/625] eta 0:00:51 lr 0.000206 wd 0.0500 time 0.3984 (0.4139) data time 0.0008 (0.0016) model time 0.3976 (0.4135) loss 6.4600 (6.7067) grad_norm 1.9120 (3.3717) loss_scale 256.0000 (211.5449) mem 14939MB [2024-07-25 08:40:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][510/625] eta 0:00:47 lr 0.000206 wd 0.0500 time 
0.5956 (0.4150) data time 0.0008 (0.0016) model time 0.5948 (0.4147) loss 7.5663 (6.7102) grad_norm 3.3224 (3.3953) loss_scale 256.0000 (212.4149) mem 14939MB [2024-07-25 08:40:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][520/625] eta 0:00:43 lr 0.000206 wd 0.0500 time 0.5926 (0.4168) data time 0.0009 (0.0016) model time 0.5917 (0.4166) loss 7.0457 (6.7112) grad_norm 3.7005 (3.4029) loss_scale 256.0000 (213.2514) mem 14939MB [2024-07-25 08:40:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][530/625] eta 0:00:39 lr 0.000206 wd 0.0500 time 0.4015 (0.4177) data time 0.0009 (0.0015) model time 0.4006 (0.4176) loss 7.6290 (6.7079) grad_norm 2.6603 (3.4183) loss_scale 256.0000 (214.0565) mem 14939MB [2024-07-25 08:41:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][540/625] eta 0:00:35 lr 0.000206 wd 0.0500 time 0.5871 (0.4192) data time 0.0009 (0.0015) model time 0.5862 (0.4193) loss 7.2533 (6.7035) grad_norm 4.2071 (3.4209) loss_scale 256.0000 (214.8318) mem 14939MB [2024-07-25 08:41:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][550/625] eta 0:00:31 lr 0.000206 wd 0.0500 time 0.3983 (0.4191) data time 0.0007 (0.0015) model time 0.3976 (0.4191) loss 7.5255 (6.6997) grad_norm 2.5879 (3.4257) loss_scale 256.0000 (215.5789) mem 14939MB [2024-07-25 08:41:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][560/625] eta 0:00:27 lr 0.000206 wd 0.0500 time 0.5741 (0.4193) data time 0.0008 (0.0015) model time 0.5733 (0.4193) loss 7.0744 (6.7012) grad_norm 3.7835 (3.4175) loss_scale 256.0000 (216.2995) mem 14939MB [2024-07-25 08:41:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][570/625] eta 0:00:23 lr 0.000206 wd 0.0500 time 0.4035 (0.4189) data time 0.0007 (0.0015) model time 0.4028 (0.4189) loss 7.7905 (6.7026) grad_norm 2.3085 (3.5016) loss_scale 256.0000 (216.9947) mem 
14939MB [2024-07-25 08:41:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][580/625] eta 0:00:18 lr 0.000206 wd 0.0500 time 0.3984 (0.4186) data time 0.0007 (0.0015) model time 0.3976 (0.4185) loss 7.0323 (6.7017) grad_norm 2.7801 (3.5112) loss_scale 256.0000 (217.6661) mem 14939MB [2024-07-25 08:41:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][590/625] eta 0:00:14 lr 0.000206 wd 0.0500 time 0.3978 (0.4182) data time 0.0009 (0.0015) model time 0.3969 (0.4181) loss 6.1830 (6.6991) grad_norm 2.6026 (3.4964) loss_scale 256.0000 (218.3147) mem 14939MB [2024-07-25 08:41:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][600/625] eta 0:00:10 lr 0.000205 wd 0.0500 time 0.3991 (0.4179) data time 0.0008 (0.0015) model time 0.3982 (0.4178) loss 5.3555 (6.6910) grad_norm 2.3101 (3.4851) loss_scale 256.0000 (218.9418) mem 14939MB [2024-07-25 08:41:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][610/625] eta 0:00:06 lr 0.000205 wd 0.0500 time 0.3976 (0.4176) data time 0.0006 (0.0015) model time 0.3970 (0.4174) loss 6.9920 (6.6936) grad_norm 2.2597 (3.4728) loss_scale 256.0000 (219.5483) mem 14939MB [2024-07-25 08:41:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][620/625] eta 0:00:02 lr 0.000205 wd 0.0500 time 0.4021 (0.4173) data time 0.0007 (0.0014) model time 0.4015 (0.4171) loss 7.2597 (6.6980) grad_norm 2.5610 (3.5434) loss_scale 256.0000 (220.1353) mem 14939MB [2024-07-25 08:41:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 225 training takes 0:04:20 [2024-07-25 08:41:38 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 08:41:39 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 08:41:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.474 (0.474) Loss 0.5308 (0.5308) Acc@1 90.576 (90.576) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 08:41:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.121) Loss 0.8555 (0.6711) Acc@1 81.250 (86.763) Acc@5 96.387 (97.825) Mem 14939MB [2024-07-25 08:41:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9443 (0.7799) Acc@1 78.174 (83.736) Acc@5 95.605 (96.780) Mem 14939MB [2024-07-25 08:41:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.343 Acc@5 96.751 [2024-07-25 08:41:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-07-25 08:41:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.797 (0.797) Loss 0.5410 (0.5410) Acc@1 90.088 (90.088) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 08:41:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.159) Loss 0.8291 (0.6653) Acc@1 82.324 (86.919) Acc@5 96.582 (97.918) Mem 14939MB [2024-07-25 08:41:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.124) Loss 0.9517 (0.7744) Acc@1 77.832 (83.912) Acc@5 95.459 (96.887) Mem 14939MB [2024-07-25 08:41:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.509 Acc@5 96.839 [2024-07-25 08:41:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-07-25 08:41:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.51% [2024-07-25 08:41:45 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 08:41:46 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 08:41:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][0/625] eta 0:08:49 lr 0.000205 wd 0.0500 time 0.8476 (0.8476) data time 0.4700 (0.4700) model time 0.0000 (0.0000) loss 7.3260 (7.3260) grad_norm 6.6559 (6.6559) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:41:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][10/625] eta 0:04:29 lr 0.000205 wd 0.0500 time 0.3977 (0.4387) data time 0.0009 (0.0435) model time 0.0000 (0.0000) loss 5.9781 (6.3558) grad_norm 3.5890 (3.4491) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:41:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][20/625] eta 0:04:13 lr 0.000205 wd 0.0500 time 0.3978 (0.4197) data time 0.0009 (0.0232) model time 0.0000 (0.0000) loss 7.3116 (6.4307) grad_norm 2.6301 (3.2718) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:41:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][30/625] eta 0:04:05 lr 0.000205 wd 0.0500 time 0.3976 (0.4129) data time 0.0008 (0.0160) model time 0.0000 (0.0000) loss 5.6979 (6.5225) grad_norm 4.6854 (3.4753) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:42:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][40/625] eta 0:03:59 lr 0.000205 wd 0.0500 time 0.3964 (0.4092) data time 0.0007 (0.0123) model time 0.0000 (0.0000) loss 6.1943 (6.4907) grad_norm 14.6126 (3.7414) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:42:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][50/625] eta 0:03:54 lr 0.000205 wd 0.0500 time 0.4049 (0.4072) data time 0.0007 (0.0100) model time 0.0000 (0.0000) loss 6.0932 (6.4665) grad_norm 2.6882 (3.6668) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 08:42:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][60/625] eta 0:03:49 lr 0.000205 wd 0.0500 time 0.3972 (0.4055) data time 0.0007 (0.0085) model time 0.3965 (0.3961) loss 6.2230 (6.5490) grad_norm 6.1758 (3.6555) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:42:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][70/625] eta 0:03:44 lr 0.000205 wd 0.0500 time 0.3970 (0.4045) data time 0.0007 (0.0074) model time 0.3962 (0.3968) loss 7.0297 (6.5394) grad_norm 2.4621 (3.8384) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:42:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][80/625] eta 0:03:39 lr 0.000205 wd 0.0500 time 0.3987 (0.4037) data time 0.0008 (0.0066) model time 0.3979 (0.3968) loss 7.0005 (6.6165) grad_norm 9.1427 (3.7760) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:42:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][90/625] eta 0:03:37 lr 0.000205 wd 0.0500 time 0.3953 (0.4069) data time 0.0007 (0.0060) model time 0.3946 (0.4057) loss 5.5012 (6.6097) grad_norm 3.2788 (3.6639) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:42:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][100/625] eta 0:03:34 lr 0.000204 wd 0.0500 time 0.3977 (0.4082) data time 0.0006 (0.0055) model time 0.3971 (0.4084) loss 6.5504 (6.5964) grad_norm 1.8189 (3.5897) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:42:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][110/625] eta 0:03:34 lr 0.000204 wd 0.0500 time 0.5586 (0.4169) data time 0.0006 (0.0050) model time 0.5580 (0.4244) loss 7.4064 (6.6084) grad_norm 3.6693 (3.7448) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:42:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][120/625] eta 0:03:33 lr 0.000204 wd 0.0500 time 
0.5731 (0.4226) data time 0.0007 (0.0047) model time 0.5724 (0.4330) loss 7.6186 (6.6359) grad_norm 2.1582 (3.6373) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:42:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][130/625] eta 0:03:30 lr 0.000204 wd 0.0500 time 0.4006 (0.4251) data time 0.0009 (0.0044) model time 0.3998 (0.4357) loss 7.7486 (6.6193) grad_norm 3.5217 (3.5943) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:42:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][140/625] eta 0:03:27 lr 0.000204 wd 0.0500 time 0.5153 (0.4277) data time 0.0009 (0.0042) model time 0.5143 (0.4385) loss 7.2543 (6.6506) grad_norm 3.9391 (3.6046) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:42:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][150/625] eta 0:03:22 lr 0.000204 wd 0.0500 time 0.3983 (0.4258) data time 0.0008 (0.0039) model time 0.3974 (0.4344) loss 7.0757 (6.6652) grad_norm 4.7608 (3.7747) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:42:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][160/625] eta 0:03:17 lr 0.000204 wd 0.0500 time 0.3972 (0.4252) data time 0.0009 (0.0037) model time 0.3963 (0.4328) loss 6.1014 (6.6610) grad_norm 2.4634 (3.7241) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:42:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][170/625] eta 0:03:12 lr 0.000204 wd 0.0500 time 0.3994 (0.4237) data time 0.0007 (0.0036) model time 0.3988 (0.4299) loss 7.8505 (6.6604) grad_norm 2.8573 (3.6706) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:43:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][180/625] eta 0:03:07 lr 0.000204 wd 0.0500 time 0.3991 (0.4224) data time 0.0007 (0.0034) model time 0.3984 (0.4276) loss 6.8843 (6.6498) grad_norm 4.4727 (3.6317) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 08:43:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][190/625] eta 0:03:03 lr 0.000204 wd 0.0500 time 0.4006 (0.4211) data time 0.0007 (0.0033) model time 0.3999 (0.4254) loss 7.6092 (6.6591) grad_norm 1.9853 (3.6396) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:43:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][200/625] eta 0:02:58 lr 0.000204 wd 0.0500 time 0.4022 (0.4200) data time 0.0007 (0.0032) model time 0.4016 (0.4235) loss 7.7867 (6.6738) grad_norm 2.6176 (3.5966) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:43:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][210/625] eta 0:02:53 lr 0.000204 wd 0.0500 time 0.3969 (0.4190) data time 0.0009 (0.0030) model time 0.3960 (0.4220) loss 5.7776 (6.6644) grad_norm 3.1054 (3.5700) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:43:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][220/625] eta 0:02:49 lr 0.000204 wd 0.0500 time 0.3980 (0.4183) data time 0.0010 (0.0029) model time 0.3971 (0.4208) loss 6.4996 (6.6830) grad_norm 5.1264 (3.8117) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:43:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][230/625] eta 0:02:44 lr 0.000203 wd 0.0500 time 0.3996 (0.4175) data time 0.0006 (0.0028) model time 0.3990 (0.4196) loss 6.0890 (6.6822) grad_norm 2.3311 (3.7741) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:43:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][240/625] eta 0:02:40 lr 0.000203 wd 0.0500 time 0.3957 (0.4167) data time 0.0006 (0.0028) model time 0.3951 (0.4185) loss 7.1871 (6.6752) grad_norm 3.5780 (3.7420) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:43:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][250/625] eta 0:02:36 lr 0.000203 wd 0.0500 time 
0.3983 (0.4161) data time 0.0006 (0.0027) model time 0.3976 (0.4175) loss 6.4464 (6.6786) grad_norm 3.3792 (3.7413) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:43:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][260/625] eta 0:02:31 lr 0.000203 wd 0.0500 time 0.3978 (0.4154) data time 0.0008 (0.0026) model time 0.3969 (0.4166) loss 7.4155 (6.6811) grad_norm 2.9989 (3.7008) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:43:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][270/625] eta 0:02:27 lr 0.000203 wd 0.0500 time 0.4006 (0.4148) data time 0.0007 (0.0026) model time 0.3998 (0.4157) loss 6.1258 (6.6814) grad_norm 7.3643 (3.7085) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:43:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][280/625] eta 0:02:22 lr 0.000203 wd 0.0500 time 0.4013 (0.4142) data time 0.0006 (0.0025) model time 0.4007 (0.4150) loss 6.3399 (6.6870) grad_norm 3.9998 (3.7044) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:43:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][290/625] eta 0:02:18 lr 0.000203 wd 0.0500 time 0.3982 (0.4137) data time 0.0007 (0.0024) model time 0.3975 (0.4143) loss 5.9952 (6.6951) grad_norm 7.2222 (3.7071) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:43:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][300/625] eta 0:02:14 lr 0.000203 wd 0.0500 time 0.4033 (0.4132) data time 0.0006 (0.0024) model time 0.4027 (0.4136) loss 6.1861 (6.7003) grad_norm 2.7540 (3.6771) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:43:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][310/625] eta 0:02:10 lr 0.000203 wd 0.0500 time 0.6118 (0.4147) data time 0.0008 (0.0023) model time 0.6109 (0.4153) loss 5.8009 (6.7047) grad_norm 2.9752 (3.6572) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 08:43:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][320/625] eta 0:02:06 lr 0.000203 wd 0.0500 time 0.5416 (0.4146) data time 0.0006 (0.0023) model time 0.5410 (0.4152) loss 5.9180 (6.7043) grad_norm 3.6818 (3.6305) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:44:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][330/625] eta 0:02:02 lr 0.000203 wd 0.0500 time 0.5878 (0.4163) data time 0.0009 (0.0022) model time 0.5869 (0.4171) loss 7.5127 (6.7110) grad_norm 2.2365 (3.5897) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:44:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][340/625] eta 0:01:59 lr 0.000203 wd 0.0500 time 0.5937 (0.4187) data time 0.0008 (0.0022) model time 0.5930 (0.4198) loss 6.4512 (6.7073) grad_norm 3.4008 (3.5773) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:44:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][350/625] eta 0:01:55 lr 0.000202 wd 0.0500 time 0.5784 (0.4203) data time 0.0006 (0.0022) model time 0.5777 (0.4217) loss 6.5241 (6.7051) grad_norm 2.7850 (3.5722) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:44:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][360/625] eta 0:01:51 lr 0.000202 wd 0.0500 time 0.4050 (0.4206) data time 0.0007 (0.0021) model time 0.4043 (0.4220) loss 7.2164 (6.6978) grad_norm 2.4533 (3.5699) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:44:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][370/625] eta 0:01:47 lr 0.000202 wd 0.0500 time 0.3985 (0.4203) data time 0.0007 (0.0021) model time 0.3977 (0.4216) loss 6.4637 (6.6956) grad_norm 3.4017 (3.5583) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:44:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][380/625] eta 0:01:42 lr 0.000202 wd 0.0500 time 
0.3982 (0.4202) data time 0.0007 (0.0021) model time 0.3974 (0.4214) loss 6.6715 (6.6961) grad_norm 2.2366 (3.5676) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:44:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][390/625] eta 0:01:38 lr 0.000202 wd 0.0500 time 0.3986 (0.4197) data time 0.0006 (0.0020) model time 0.3980 (0.4207) loss 7.6770 (6.7031) grad_norm 3.4188 (3.6100) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:44:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][400/625] eta 0:01:34 lr 0.000202 wd 0.0500 time 0.4010 (0.4192) data time 0.0008 (0.0020) model time 0.4002 (0.4201) loss 6.8044 (6.7012) grad_norm 3.3794 (3.5863) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:44:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][410/625] eta 0:01:30 lr 0.000202 wd 0.0500 time 0.3964 (0.4187) data time 0.0007 (0.0020) model time 0.3957 (0.4195) loss 7.5458 (6.7024) grad_norm 2.4747 (3.5871) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:44:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][420/625] eta 0:01:25 lr 0.000202 wd 0.0500 time 0.3981 (0.4183) data time 0.0008 (0.0019) model time 0.3973 (0.4190) loss 7.0520 (6.7027) grad_norm 2.3443 (3.5633) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:44:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][430/625] eta 0:01:21 lr 0.000202 wd 0.0500 time 0.3945 (0.4178) data time 0.0009 (0.0019) model time 0.3937 (0.4184) loss 6.3571 (6.6976) grad_norm 1.8210 (3.5336) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:44:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][440/625] eta 0:01:17 lr 0.000202 wd 0.0500 time 0.3970 (0.4173) data time 0.0009 (0.0019) model time 0.3962 (0.4178) loss 6.3557 (6.6902) grad_norm 3.1962 (3.5230) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 08:44:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][450/625] eta 0:01:12 lr 0.000202 wd 0.0500 time 0.3960 (0.4169) data time 0.0009 (0.0019) model time 0.3952 (0.4173) loss 7.4909 (6.6973) grad_norm 2.4914 (3.5032) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:44:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][460/625] eta 0:01:08 lr 0.000202 wd 0.0500 time 0.4002 (0.4166) data time 0.0008 (0.0018) model time 0.3994 (0.4169) loss 5.1637 (6.6935) grad_norm 2.3383 (3.6118) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:45:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][470/625] eta 0:01:04 lr 0.000202 wd 0.0500 time 0.3945 (0.4162) data time 0.0008 (0.0018) model time 0.3937 (0.4165) loss 7.7098 (6.6872) grad_norm 1.9706 (3.6046) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:45:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][480/625] eta 0:01:00 lr 0.000201 wd 0.0500 time 0.4003 (0.4159) data time 0.0007 (0.0018) model time 0.3996 (0.4161) loss 6.1887 (6.6803) grad_norm 2.6787 (3.5847) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:45:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][490/625] eta 0:00:56 lr 0.000201 wd 0.0500 time 0.3991 (0.4156) data time 0.0006 (0.0018) model time 0.3984 (0.4157) loss 6.4409 (6.6867) grad_norm 4.7482 (3.5759) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:45:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][500/625] eta 0:00:51 lr 0.000201 wd 0.0500 time 0.3967 (0.4153) data time 0.0008 (0.0018) model time 0.3959 (0.4154) loss 5.8804 (6.6869) grad_norm 4.4919 (3.5681) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:45:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][510/625] eta 0:00:47 lr 0.000201 wd 0.0500 time 
0.3987 (0.4150) data time 0.0006 (0.0017) model time 0.3981 (0.4150) loss 6.3081 (6.6876) grad_norm 2.7083 (3.5866) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:45:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][520/625] eta 0:00:43 lr 0.000201 wd 0.0500 time 0.4025 (0.4147) data time 0.0006 (0.0017) model time 0.4019 (0.4147) loss 7.5908 (6.6895) grad_norm 3.0746 (3.5710) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:45:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][530/625] eta 0:00:39 lr 0.000201 wd 0.0500 time 0.3956 (0.4147) data time 0.0008 (0.0017) model time 0.3947 (0.4146) loss 6.4731 (6.6900) grad_norm 2.0522 (3.5640) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:45:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][540/625] eta 0:00:35 lr 0.000201 wd 0.0500 time 0.5686 (0.4151) data time 0.0009 (0.0017) model time 0.5678 (0.4151) loss 6.4614 (6.6951) grad_norm 3.6928 (3.5627) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:45:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][550/625] eta 0:00:31 lr 0.000201 wd 0.0500 time 0.5747 (0.4160) data time 0.0009 (0.0017) model time 0.5738 (0.4160) loss 6.6105 (6.6927) grad_norm 1.8453 (3.5648) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:45:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][560/625] eta 0:00:27 lr 0.000201 wd 0.0500 time 0.5779 (0.4176) data time 0.0008 (0.0017) model time 0.5770 (0.4178) loss 6.4876 (6.6964) grad_norm 4.2806 (3.5976) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:45:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][570/625] eta 0:00:23 lr 0.000201 wd 0.0500 time 0.5934 (0.4183) data time 0.0006 (0.0016) model time 0.5928 (0.4186) loss 6.6161 (6.6973) grad_norm 4.2603 (3.6070) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 08:45:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][580/625] eta 0:00:18 lr 0.000201 wd 0.0500 time 0.3959 (0.4187) data time 0.0007 (0.0016) model time 0.3952 (0.4190) loss 6.8576 (6.6939) grad_norm 3.0248 (3.6029) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:45:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][590/625] eta 0:00:14 lr 0.000201 wd 0.0500 time 0.3977 (0.4186) data time 0.0008 (0.0016) model time 0.3970 (0.4188) loss 6.8594 (6.6973) grad_norm 2.3457 (3.5983) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:45:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][600/625] eta 0:00:10 lr 0.000201 wd 0.0500 time 0.4139 (0.4184) data time 0.0006 (0.0016) model time 0.4133 (0.4186) loss 6.0460 (6.6913) grad_norm 2.6593 (3.5997) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:46:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][610/625] eta 0:00:06 lr 0.000200 wd 0.0500 time 0.4002 (0.4181) data time 0.0004 (0.0016) model time 0.3997 (0.4183) loss 5.5279 (6.6893) grad_norm 2.6363 (3.5899) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:46:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][620/625] eta 0:00:02 lr 0.000200 wd 0.0500 time 0.3941 (0.4178) data time 0.0004 (0.0016) model time 0.3937 (0.4179) loss 6.8636 (6.6927) grad_norm 2.0797 (3.5757) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:46:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 226 training takes 0:04:21 [2024-07-25 08:46:07 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 08:46:08 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 08:46:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.458 (0.458) Loss 0.5386 (0.5386) Acc@1 89.844 (89.844) Acc@5 99.072 (99.072) Mem 14939MB
[2024-07-25 08:46:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8457 (0.6707) Acc@1 81.934 (86.856) Acc@5 96.533 (97.829) Mem 14939MB
[2024-07-25 08:46:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9487 (0.7778) Acc@1 77.197 (83.829) Acc@5 95.605 (96.817) Mem 14939MB
[2024-07-25 08:46:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.477 Acc@5 96.797
[2024-07-25 08:46:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5%
[2024-07-25 08:46:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.48%
[2024-07-25 08:46:10 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving......
[2024-07-25 08:46:11 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!!
[2024-07-25 08:46:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.470 (0.470) Loss 0.5400 (0.5400) Acc@1 90.039 (90.039) Acc@5 98.975 (98.975) Mem 14939MB
[2024-07-25 08:46:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.121) Loss 0.8281 (0.6648) Acc@1 82.324 (86.919) Acc@5 96.631 (97.909) Mem 14939MB
[2024-07-25 08:46:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9497 (0.7737) Acc@1 77.881 (83.917) Acc@5 95.508 (96.894) Mem 14939MB
[2024-07-25 08:46:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.511 Acc@5 96.847
[2024-07-25 08:46:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.5%
[2024-07-25 08:46:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.51%
[2024-07-25 08:46:14 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 08:46:14 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!!
[2024-07-25 08:46:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][0/625] eta 0:08:36 lr 0.000200 wd 0.0500 time 0.8263 (0.8263) data time 0.4505 (0.4505) model time 0.0000 (0.0000) loss 5.9213 (5.9213) grad_norm 2.3727 (2.3727) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:46:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][10/625] eta 0:04:28 lr 0.000200 wd 0.0500 time 0.4046 (0.4366) data time 0.0009 (0.0418) model time 0.0000 (0.0000) loss 7.5066 (6.4491) grad_norm 2.3213 (3.4070) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:46:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][20/625] eta 0:04:13 lr 0.000200 wd 0.0500 time 0.3968 (0.4187) data time 0.0007 (0.0223) model time 0.0000 (0.0000) loss 6.0361 (6.5132) grad_norm 2.9571 (5.6605) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:46:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][30/625] eta 0:04:08 lr 0.000200 wd 0.0500 time 0.3975 (0.4180) data time 0.0006 (0.0153) model time 0.0000 (0.0000) loss 5.8926 (6.6088) grad_norm 2.4503 (4.9433) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:46:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][40/625] eta 0:04:01 lr 0.000200 wd 0.0500 time 0.3999 (0.4132) data time 0.0008 (0.0118) model time 0.0000 (0.0000) loss 6.5458 (6.5902) grad_norm 3.8229 (4.6277) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:46:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][50/625] eta 0:03:56 lr 0.000200 wd 0.0500 time 0.4039 (0.4110) data time 0.0006 (0.0096) model time 0.0000 (0.0000) loss 5.9034 (6.5973) grad_norm 3.0029 (4.3734) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:46:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][60/625] eta 0:03:51 lr 0.000200 wd 0.0500 time 0.3978 (0.4093) 
data time 0.0006 (0.0082) model time 0.3972 (0.3998) loss 7.6440 (6.6042) grad_norm 2.9075 (4.0940) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:46:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][70/625] eta 0:03:46 lr 0.000200 wd 0.0500 time 0.3977 (0.4081) data time 0.0008 (0.0071) model time 0.3969 (0.3997) loss 6.4747 (6.6073) grad_norm 10.2395 (4.1328) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:46:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][80/625] eta 0:03:41 lr 0.000200 wd 0.0500 time 0.3954 (0.4069) data time 0.0006 (0.0063) model time 0.3948 (0.3990) loss 7.0630 (6.5954) grad_norm 2.6087 (4.1114) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:46:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][90/625] eta 0:03:37 lr 0.000200 wd 0.0500 time 0.4031 (0.4061) data time 0.0008 (0.0057) model time 0.4023 (0.3989) loss 7.6476 (6.6264) grad_norm 3.5774 (4.2378) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:46:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][100/625] eta 0:03:32 lr 0.000200 wd 0.0500 time 0.3976 (0.4054) data time 0.0007 (0.0052) model time 0.3970 (0.3988) loss 7.4012 (6.6648) grad_norm 2.2285 (4.0738) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:46:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][110/625] eta 0:03:28 lr 0.000199 wd 0.0500 time 0.3940 (0.4048) data time 0.0006 (0.0048) model time 0.3933 (0.3987) loss 5.4787 (6.6260) grad_norm 3.0611 (3.9778) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:47:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][120/625] eta 0:03:24 lr 0.000199 wd 0.0500 time 0.3934 (0.4054) data time 0.0008 (0.0045) model time 0.3926 (0.4005) loss 6.6085 (6.6389) grad_norm 1.8532 (3.8559) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
08:47:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][130/625] eta 0:03:21 lr 0.000199 wd 0.0500 time 0.3963 (0.4064) data time 0.0008 (0.0042) model time 0.3955 (0.4027) loss 6.0634 (6.6485) grad_norm 2.4078 (3.7626) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:47:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][140/625] eta 0:03:17 lr 0.000199 wd 0.0500 time 0.5428 (0.4082) data time 0.0007 (0.0040) model time 0.5421 (0.4059) loss 8.1830 (6.6399) grad_norm 2.2687 (3.6873) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:47:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][150/625] eta 0:03:15 lr 0.000199 wd 0.0500 time 0.5976 (0.4113) data time 0.0009 (0.0038) model time 0.5967 (0.4107) loss 6.0507 (6.6467) grad_norm 2.1933 (3.6552) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:47:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][160/625] eta 0:03:12 lr 0.000199 wd 0.0500 time 0.4032 (0.4144) data time 0.0007 (0.0036) model time 0.4025 (0.4152) loss 7.7598 (6.6336) grad_norm 4.3741 (3.6067) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:47:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][170/625] eta 0:03:10 lr 0.000199 wd 0.0500 time 0.3981 (0.4179) data time 0.0009 (0.0034) model time 0.3972 (0.4200) loss 7.2144 (6.6351) grad_norm 2.3910 (3.6632) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:47:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][180/625] eta 0:03:06 lr 0.000199 wd 0.0500 time 0.3978 (0.4189) data time 0.0006 (0.0033) model time 0.3972 (0.4212) loss 7.0005 (6.6340) grad_norm 3.6933 (3.6723) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:47:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][190/625] eta 0:03:02 lr 0.000199 wd 0.0500 time 0.5660 (0.4188) data 
time 0.0009 (0.0031) model time 0.5652 (0.4208) loss 5.6633 (6.6339) grad_norm 1.7023 (3.6476) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:47:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][200/625] eta 0:02:57 lr 0.000199 wd 0.0500 time 0.4017 (0.4178) data time 0.0007 (0.0030) model time 0.4010 (0.4192) loss 6.8063 (6.6468) grad_norm 2.5441 (3.6442) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:47:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][210/625] eta 0:02:53 lr 0.000199 wd 0.0500 time 0.4014 (0.4169) data time 0.0008 (0.0029) model time 0.4005 (0.4179) loss 6.6101 (6.6572) grad_norm 2.5561 (3.6710) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:47:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][220/625] eta 0:02:48 lr 0.000199 wd 0.0500 time 0.4015 (0.4162) data time 0.0007 (0.0028) model time 0.4008 (0.4169) loss 6.9092 (6.6525) grad_norm 4.1953 (3.6665) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:47:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][230/625] eta 0:02:44 lr 0.000199 wd 0.0500 time 0.3969 (0.4154) data time 0.0008 (0.0027) model time 0.3960 (0.4159) loss 6.7093 (6.6575) grad_norm 3.0487 (3.6496) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:47:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][240/625] eta 0:02:39 lr 0.000198 wd 0.0500 time 0.3960 (0.4147) data time 0.0007 (0.0027) model time 0.3953 (0.4149) loss 6.7409 (6.6526) grad_norm 3.6484 (3.6276) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:47:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][250/625] eta 0:02:35 lr 0.000198 wd 0.0500 time 0.4027 (0.4148) data time 0.0008 (0.0026) model time 0.4019 (0.4149) loss 7.8065 (6.6455) grad_norm 2.8320 (3.5914) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
08:48:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][260/625] eta 0:02:31 lr 0.000198 wd 0.0500 time 0.3992 (0.4141) data time 0.0009 (0.0025) model time 0.3983 (0.4141) loss 7.2102 (6.6628) grad_norm 2.2992 (3.5962) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:48:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][270/625] eta 0:02:26 lr 0.000198 wd 0.0500 time 0.3964 (0.4136) data time 0.0008 (0.0025) model time 0.3955 (0.4134) loss 6.3387 (6.6546) grad_norm 2.3156 (3.5760) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:48:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][280/625] eta 0:02:22 lr 0.000198 wd 0.0500 time 0.4062 (0.4131) data time 0.0006 (0.0024) model time 0.4055 (0.4128) loss 6.0797 (6.6507) grad_norm 2.1334 (3.5407) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:48:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][290/625] eta 0:02:18 lr 0.000198 wd 0.0500 time 0.3984 (0.4126) data time 0.0009 (0.0023) model time 0.3975 (0.4122) loss 6.5347 (6.6586) grad_norm 2.6477 (3.5131) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:48:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][300/625] eta 0:02:14 lr 0.000198 wd 0.0500 time 0.4079 (0.4124) data time 0.0008 (0.0023) model time 0.4071 (0.4118) loss 5.5106 (6.6552) grad_norm 2.2005 (3.4806) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:48:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][310/625] eta 0:02:09 lr 0.000198 wd 0.0500 time 0.3965 (0.4120) data time 0.0007 (0.0023) model time 0.3958 (0.4114) loss 7.3719 (6.6582) grad_norm 2.0377 (3.4707) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:48:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][320/625] eta 0:02:05 lr 0.000198 wd 0.0500 time 0.3989 (0.4116) data 
time 0.0007 (0.0022) model time 0.3982 (0.4109) loss 5.7386 (6.6686) grad_norm 2.7699 (3.4416) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:48:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][330/625] eta 0:02:01 lr 0.000198 wd 0.0500 time 0.3980 (0.4112) data time 0.0006 (0.0022) model time 0.3973 (0.4104) loss 7.1729 (6.6745) grad_norm 3.0111 (3.5343) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:48:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][340/625] eta 0:01:57 lr 0.000198 wd 0.0500 time 0.3964 (0.4113) data time 0.0007 (0.0021) model time 0.3957 (0.4106) loss 6.1726 (6.6694) grad_norm 2.4274 (3.5160) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:48:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][350/625] eta 0:01:53 lr 0.000198 wd 0.0500 time 0.4110 (0.4115) data time 0.0007 (0.0021) model time 0.4104 (0.4108) loss 5.9298 (6.6640) grad_norm 2.5183 (3.4883) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:48:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][360/625] eta 0:01:49 lr 0.000198 wd 0.0500 time 0.5668 (0.4121) data time 0.0007 (0.0021) model time 0.5661 (0.4114) loss 5.8646 (6.6543) grad_norm 2.1967 (3.4570) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:48:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][370/625] eta 0:01:45 lr 0.000197 wd 0.0500 time 0.6096 (0.4136) data time 0.0009 (0.0020) model time 0.6087 (0.4132) loss 6.6855 (6.6508) grad_norm 2.7258 (3.4889) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:48:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][380/625] eta 0:01:41 lr 0.000197 wd 0.0500 time 0.5802 (0.4153) data time 0.0008 (0.0020) model time 0.5794 (0.4152) loss 8.6620 (6.6597) grad_norm 2.3121 (3.4767) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
08:48:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][390/625] eta 0:01:37 lr 0.000197 wd 0.0500 time 0.3944 (0.4165) data time 0.0007 (0.0020) model time 0.3937 (0.4165) loss 7.1807 (6.6521) grad_norm 3.8524 (3.4951) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:49:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][400/625] eta 0:01:33 lr 0.000197 wd 0.0500 time 0.3980 (0.4168) data time 0.0009 (0.0019) model time 0.3971 (0.4168) loss 6.8663 (6.6497) grad_norm 4.3751 (3.4873) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:49:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][410/625] eta 0:01:29 lr 0.000197 wd 0.0500 time 0.3972 (0.4167) data time 0.0007 (0.0019) model time 0.3965 (0.4167) loss 6.7222 (6.6570) grad_norm 2.1932 (3.4831) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:49:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][420/625] eta 0:01:25 lr 0.000197 wd 0.0500 time 0.3953 (0.4163) data time 0.0007 (0.0019) model time 0.3946 (0.4162) loss 5.8274 (6.6492) grad_norm 2.5163 (3.4751) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:49:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][430/625] eta 0:01:21 lr 0.000197 wd 0.0500 time 0.3972 (0.4158) data time 0.0011 (0.0019) model time 0.3961 (0.4157) loss 6.3061 (6.6431) grad_norm 2.9953 (3.4538) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:49:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][440/625] eta 0:01:16 lr 0.000197 wd 0.0500 time 0.4107 (0.4155) data time 0.0011 (0.0018) model time 0.4096 (0.4153) loss 6.6624 (6.6442) grad_norm 2.3198 (3.4326) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:49:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][450/625] eta 0:01:12 lr 0.000197 wd 0.0500 time 0.3969 (0.4151) data 
time 0.0008 (0.0018) model time 0.3962 (0.4148) loss 6.0107 (6.6482) grad_norm 3.9474 (3.4524) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:49:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][460/625] eta 0:01:08 lr 0.000197 wd 0.0500 time 0.3984 (0.4148) data time 0.0009 (0.0018) model time 0.3975 (0.4144) loss 6.3497 (6.6459) grad_norm 2.8378 (3.4339) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:49:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][470/625] eta 0:01:04 lr 0.000197 wd 0.0500 time 0.4047 (0.4147) data time 0.0006 (0.0018) model time 0.4041 (0.4143) loss 5.7945 (6.6404) grad_norm 4.0440 (3.4366) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:49:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][480/625] eta 0:01:00 lr 0.000197 wd 0.0500 time 0.3977 (0.4143) data time 0.0009 (0.0017) model time 0.3968 (0.4139) loss 7.1589 (6.6429) grad_norm 3.0674 (3.4351) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:49:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][490/625] eta 0:00:55 lr 0.000197 wd 0.0500 time 0.3927 (0.4140) data time 0.0009 (0.0017) model time 0.3918 (0.4135) loss 6.5730 (6.6421) grad_norm 2.2551 (3.4197) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:49:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][500/625] eta 0:00:51 lr 0.000196 wd 0.0500 time 0.4021 (0.4137) data time 0.0006 (0.0017) model time 0.4015 (0.4131) loss 6.4016 (6.6403) grad_norm 5.5073 (3.4284) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:49:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][510/625] eta 0:00:47 lr 0.000196 wd 0.0500 time 0.3959 (0.4133) data time 0.0006 (0.0017) model time 0.3953 (0.4128) loss 6.3024 (6.6434) grad_norm 5.2671 (3.4617) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
08:49:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][520/625] eta 0:00:43 lr 0.000196 wd 0.0500 time 0.4013 (0.4131) data time 0.0005 (0.0017) model time 0.4007 (0.4125) loss 7.0734 (6.6523) grad_norm 2.8844 (3.4605) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:49:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][530/625] eta 0:00:39 lr 0.000196 wd 0.0500 time 0.4076 (0.4129) data time 0.0006 (0.0017) model time 0.4070 (0.4123) loss 5.8041 (6.6584) grad_norm 4.5558 (3.4758) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:49:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][540/625] eta 0:00:35 lr 0.000196 wd 0.0500 time 0.3984 (0.4126) data time 0.0006 (0.0016) model time 0.3978 (0.4120) loss 5.9581 (6.6560) grad_norm 4.5326 (3.4925) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:50:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][550/625] eta 0:00:30 lr 0.000196 wd 0.0500 time 0.3993 (0.4124) data time 0.0007 (0.0016) model time 0.3986 (0.4117) loss 6.1839 (6.6507) grad_norm 4.4893 (3.4951) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:50:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][560/625] eta 0:00:26 lr 0.000196 wd 0.0500 time 0.3995 (0.4125) data time 0.0006 (0.0016) model time 0.3988 (0.4118) loss 5.9092 (6.6407) grad_norm 3.2741 (3.4825) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:50:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][570/625] eta 0:00:22 lr 0.000196 wd 0.0500 time 0.3988 (0.4126) data time 0.0007 (0.0016) model time 0.3981 (0.4119) loss 6.1091 (6.6451) grad_norm 2.4313 (3.4865) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:50:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][580/625] eta 0:00:18 lr 0.000196 wd 0.0500 time 0.4002 (0.4130) data 
time 0.0006 (0.0016) model time 0.3996 (0.4124) loss 6.1447 (6.6431) grad_norm 3.8999 (3.4838) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:50:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][590/625] eta 0:00:14 lr 0.000196 wd 0.0500 time 0.6026 (0.4141) data time 0.0008 (0.0016) model time 0.6018 (0.4136) loss 6.7047 (6.6451) grad_norm 7.6661 (3.4897) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:50:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][600/625] eta 0:00:10 lr 0.000196 wd 0.0500 time 0.5993 (0.4155) data time 0.0006 (0.0015) model time 0.5988 (0.4151) loss 7.5424 (6.6522) grad_norm 1.9480 (3.4819) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:50:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][610/625] eta 0:00:06 lr 0.000196 wd 0.0500 time 0.3971 (0.4167) data time 0.0006 (0.0015) model time 0.3965 (0.4164) loss 6.2905 (6.6526) grad_norm 1.9028 (3.4923) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:50:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][620/625] eta 0:00:02 lr 0.000196 wd 0.0500 time 0.3979 (0.4168) data time 0.0007 (0.0015) model time 0.3973 (0.4165) loss 5.1901 (6.6508) grad_norm 3.1117 (3.4846) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:50:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 227 training takes 0:04:20 [2024-07-25 08:50:35 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 08:50:36 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 08:50:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.472 (0.472) Loss 0.5464 (0.5464) Acc@1 89.941 (89.941) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 08:50:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.121) Loss 0.8428 (0.6722) Acc@1 82.080 (86.985) Acc@5 96.631 (97.843) Mem 14939MB [2024-07-25 08:50:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9722 (0.7842) Acc@1 78.125 (83.852) Acc@5 95.117 (96.766) Mem 14939MB [2024-07-25 08:50:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.421 Acc@5 96.757 [2024-07-25 08:50:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-07-25 08:50:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.857 (0.857) Loss 0.5400 (0.5400) Acc@1 90.137 (90.137) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 08:50:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.158) Loss 0.8291 (0.6646) Acc@1 82.324 (86.874) Acc@5 96.582 (97.909) Mem 14939MB [2024-07-25 08:50:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.124) Loss 0.9487 (0.7733) Acc@1 78.076 (83.915) Acc@5 95.508 (96.896) Mem 14939MB [2024-07-25 08:50:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.513 Acc@5 96.845 [2024-07-25 08:50:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-07-25 08:50:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.51% [2024-07-25 08:50:41 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 08:50:42 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 08:50:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][0/625] eta 0:08:11 lr 0.000196 wd 0.0500 time 0.7870 (0.7870) data time 0.4096 (0.4096) model time 0.0000 (0.0000) loss 6.7165 (6.7165) grad_norm 3.0523 (3.0523) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:50:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][10/625] eta 0:04:44 lr 0.000195 wd 0.0500 time 0.3990 (0.4630) data time 0.0006 (0.0380) model time 0.0000 (0.0000) loss 6.4899 (6.4921) grad_norm 1.9519 (3.3230) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:50:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][20/625] eta 0:04:21 lr 0.000195 wd 0.0500 time 0.3994 (0.4323) data time 0.0006 (0.0203) model time 0.0000 (0.0000) loss 6.7595 (6.5976) grad_norm 3.0782 (2.8844) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:50:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][30/625] eta 0:04:11 lr 0.000195 wd 0.0500 time 0.3996 (0.4220) data time 0.0006 (0.0140) model time 0.0000 (0.0000) loss 7.0972 (6.6484) grad_norm 3.9072 (3.0268) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:50:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][40/625] eta 0:04:03 lr 0.000195 wd 0.0500 time 0.3978 (0.4163) data time 0.0009 (0.0108) model time 0.0000 (0.0000) loss 7.0552 (6.6309) grad_norm 2.5677 (3.0118) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:51:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][50/625] eta 0:03:57 lr 0.000195 wd 0.0500 time 0.4325 (0.4135) data time 0.0006 (0.0088) model time 0.0000 (0.0000) loss 6.6376 (6.5990) grad_norm 4.5623 (3.2394) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 08:51:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][60/625] eta 0:03:52 lr 0.000195 wd 0.0500 time 0.3952 (0.4112) data time 0.0008 (0.0075) model time 0.3944 (0.3990) loss 7.2984 (6.6448) grad_norm 4.9384 (3.4907) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:51:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][70/625] eta 0:03:47 lr 0.000195 wd 0.0500 time 0.3957 (0.4096) data time 0.0006 (0.0066) model time 0.3951 (0.3989) loss 7.2124 (6.6925) grad_norm 5.3920 (3.4454) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:51:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][80/625] eta 0:03:42 lr 0.000195 wd 0.0500 time 0.4106 (0.4082) data time 0.0009 (0.0058) model time 0.4097 (0.3985) loss 7.4674 (6.7443) grad_norm 2.7706 (3.3605) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:51:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][90/625] eta 0:03:37 lr 0.000195 wd 0.0500 time 0.3976 (0.4074) data time 0.0009 (0.0053) model time 0.3967 (0.3988) loss 7.9229 (6.7745) grad_norm 1.9547 (3.5812) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:51:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][100/625] eta 0:03:33 lr 0.000195 wd 0.0500 time 0.3960 (0.4063) data time 0.0008 (0.0049) model time 0.3952 (0.3982) loss 6.8738 (6.7595) grad_norm 1.8254 (3.4873) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:51:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][110/625] eta 0:03:29 lr 0.000195 wd 0.0500 time 0.4338 (0.4060) data time 0.0008 (0.0045) model time 0.4330 (0.3989) loss 7.9176 (6.7763) grad_norm 2.3320 (3.4260) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:51:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][120/625] eta 0:03:24 lr 0.000195 wd 0.0500 time 
0.4041 (0.4056) data time 0.0007 (0.0042) model time 0.4034 (0.3991) loss 6.4387 (6.7659) grad_norm 3.5542 (3.4112) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:51:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][130/625] eta 0:03:20 lr 0.000195 wd 0.0500 time 0.3958 (0.4050) data time 0.0009 (0.0039) model time 0.3949 (0.3988) loss 7.2235 (6.7646) grad_norm 2.4025 (3.4664) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:51:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][140/625] eta 0:03:16 lr 0.000194 wd 0.0500 time 0.4084 (0.4052) data time 0.0006 (0.0037) model time 0.4077 (0.3998) loss 6.8723 (6.7937) grad_norm 3.0491 (3.4878) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:51:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][150/625] eta 0:03:13 lr 0.000194 wd 0.0500 time 0.5507 (0.4063) data time 0.0009 (0.0035) model time 0.5497 (0.4019) loss 6.0114 (6.7836) grad_norm 3.1527 (3.4918) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:51:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][160/625] eta 0:03:08 lr 0.000194 wd 0.0500 time 0.3960 (0.4059) data time 0.0009 (0.0033) model time 0.3951 (0.4016) loss 6.4347 (6.7808) grad_norm 2.7750 (3.4649) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:51:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][170/625] eta 0:03:05 lr 0.000194 wd 0.0500 time 0.4003 (0.4077) data time 0.0007 (0.0032) model time 0.3997 (0.4045) loss 6.5852 (6.7784) grad_norm 2.8069 (3.4611) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:51:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][180/625] eta 0:03:02 lr 0.000194 wd 0.0500 time 0.3945 (0.4093) data time 0.0006 (0.0031) model time 0.3939 (0.4068) loss 7.2211 (6.7858) grad_norm 3.0397 (3.4707) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 08:52:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][190/625] eta 0:03:00 lr 0.000194 wd 0.0500 time 0.3959 (0.4138) data time 0.0009 (0.0029) model time 0.3950 (0.4132) loss 7.6602 (6.7854) grad_norm 3.2914 (3.4389) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:52:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][200/625] eta 0:02:57 lr 0.000194 wd 0.0500 time 0.3995 (0.4179) data time 0.0009 (0.0028) model time 0.3987 (0.4186) loss 6.1177 (6.7760) grad_norm 3.0439 (3.4798) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:52:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][210/625] eta 0:02:53 lr 0.000194 wd 0.0500 time 0.3951 (0.4188) data time 0.0009 (0.0027) model time 0.3943 (0.4197) loss 6.4049 (6.7697) grad_norm 3.7263 (3.4511) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:52:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][220/625] eta 0:02:49 lr 0.000194 wd 0.0500 time 0.3973 (0.4187) data time 0.0007 (0.0027) model time 0.3966 (0.4195) loss 5.9939 (6.7793) grad_norm 2.6639 (3.4387) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:52:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][230/625] eta 0:02:45 lr 0.000194 wd 0.0500 time 0.3962 (0.4191) data time 0.0009 (0.0026) model time 0.3953 (0.4199) loss 5.9480 (6.7671) grad_norm 2.4303 (3.4014) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:52:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][240/625] eta 0:02:41 lr 0.000194 wd 0.0500 time 0.4013 (0.4183) data time 0.0009 (0.0025) model time 0.4004 (0.4188) loss 7.8504 (6.7722) grad_norm 3.3605 (3.3797) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:52:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][250/625] eta 0:02:36 lr 0.000194 wd 0.0500 time 
0.3967 (0.4175) data time 0.0008 (0.0024) model time 0.3959 (0.4178) loss 7.3144 (6.7750) grad_norm 3.6519 (3.4086) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:52:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][260/625] eta 0:02:32 lr 0.000194 wd 0.0500 time 0.3973 (0.4168) data time 0.0006 (0.0024) model time 0.3966 (0.4167) loss 6.5913 (6.7760) grad_norm 4.4539 (3.4053) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:52:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][270/625] eta 0:02:27 lr 0.000193 wd 0.0500 time 0.4008 (0.4160) data time 0.0007 (0.0023) model time 0.4001 (0.4158) loss 6.3955 (6.7693) grad_norm 2.7985 (3.3901) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:52:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][280/625] eta 0:02:23 lr 0.000193 wd 0.0500 time 0.3951 (0.4153) data time 0.0007 (0.0023) model time 0.3944 (0.4149) loss 7.1546 (6.7672) grad_norm 8.6181 (3.3969) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:52:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][290/625] eta 0:02:18 lr 0.000193 wd 0.0500 time 0.3992 (0.4148) data time 0.0007 (0.0022) model time 0.3985 (0.4142) loss 6.8720 (6.7631) grad_norm 2.6423 (3.4339) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:52:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][300/625] eta 0:02:14 lr 0.000193 wd 0.0500 time 0.3981 (0.4142) data time 0.0006 (0.0022) model time 0.3974 (0.4135) loss 6.1073 (6.7738) grad_norm 1.7794 (3.4079) loss_scale 512.0000 (257.7010) mem 14939MB [2024-07-25 08:52:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][310/625] eta 0:02:10 lr 0.000193 wd 0.0500 time 0.3936 (0.4137) data time 0.0008 (0.0021) model time 0.3928 (0.4129) loss 5.5503 (6.7772) grad_norm 8.4134 (3.3977) loss_scale 512.0000 (265.8778) mem 
14939MB [2024-07-25 08:52:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][320/625] eta 0:02:06 lr 0.000193 wd 0.0500 time 0.3951 (0.4132) data time 0.0007 (0.0021) model time 0.3944 (0.4123) loss 7.1883 (6.7672) grad_norm 2.1615 (3.4119) loss_scale 512.0000 (273.5452) mem 14939MB [2024-07-25 08:52:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][330/625] eta 0:02:01 lr 0.000193 wd 0.0500 time 0.3951 (0.4127) data time 0.0009 (0.0020) model time 0.3943 (0.4118) loss 7.4265 (6.7837) grad_norm 4.7159 (3.4067) loss_scale 512.0000 (280.7492) mem 14939MB [2024-07-25 08:53:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][340/625] eta 0:01:57 lr 0.000193 wd 0.0500 time 0.3960 (0.4124) data time 0.0008 (0.0020) model time 0.3952 (0.4113) loss 7.3709 (6.7870) grad_norm 3.9606 (3.3852) loss_scale 512.0000 (287.5308) mem 14939MB [2024-07-25 08:53:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][350/625] eta 0:01:53 lr 0.000193 wd 0.0500 time 0.3984 (0.4119) data time 0.0008 (0.0020) model time 0.3976 (0.4109) loss 7.3497 (6.7817) grad_norm 1.9069 (3.3655) loss_scale 512.0000 (293.9259) mem 14939MB [2024-07-25 08:53:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][360/625] eta 0:01:49 lr 0.000193 wd 0.0500 time 0.3988 (0.4116) data time 0.0006 (0.0019) model time 0.3981 (0.4104) loss 6.6851 (6.7934) grad_norm 1.8906 (3.3470) loss_scale 512.0000 (299.9668) mem 14939MB [2024-07-25 08:53:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][370/625] eta 0:01:44 lr 0.000193 wd 0.0500 time 0.3986 (0.4112) data time 0.0008 (0.0019) model time 0.3978 (0.4101) loss 7.9378 (6.7964) grad_norm 2.0319 (3.3322) loss_scale 512.0000 (305.6819) mem 14939MB [2024-07-25 08:53:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][380/625] eta 0:01:40 lr 0.000193 wd 0.0500 time 
0.4039 (0.4114) data time 0.0007 (0.0019) model time 0.4032 (0.4102) loss 7.6164 (6.7933) grad_norm 2.0807 (3.3060) loss_scale 512.0000 (311.0971) mem 14939MB [2024-07-25 08:53:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][390/625] eta 0:01:36 lr 0.000193 wd 0.0500 time 0.3985 (0.4120) data time 0.0007 (0.0019) model time 0.3978 (0.4109) loss 6.1670 (6.7908) grad_norm 2.4338 (3.3163) loss_scale 512.0000 (316.2353) mem 14939MB [2024-07-25 08:53:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][400/625] eta 0:01:32 lr 0.000192 wd 0.0500 time 0.5721 (0.4125) data time 0.0007 (0.0018) model time 0.5714 (0.4116) loss 7.2401 (6.7921) grad_norm 3.2450 (3.3271) loss_scale 512.0000 (321.1172) mem 14939MB [2024-07-25 08:53:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][410/625] eta 0:01:29 lr 0.000192 wd 0.0500 time 0.5939 (0.4146) data time 0.0006 (0.0018) model time 0.5933 (0.4139) loss 5.5830 (6.7879) grad_norm 2.3908 (3.3123) loss_scale 512.0000 (325.7616) mem 14939MB [2024-07-25 08:53:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][420/625] eta 0:01:25 lr 0.000192 wd 0.0500 time 0.4057 (0.4162) data time 0.0008 (0.0018) model time 0.4049 (0.4157) loss 5.3334 (6.7888) grad_norm 2.5669 (3.2976) loss_scale 512.0000 (330.1853) mem 14939MB [2024-07-25 08:53:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][430/625] eta 0:01:21 lr 0.000192 wd 0.0500 time 0.3955 (0.4169) data time 0.0009 (0.0018) model time 0.3946 (0.4166) loss 6.5571 (6.7907) grad_norm 4.4084 (3.2927) loss_scale 512.0000 (334.4037) mem 14939MB [2024-07-25 08:53:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][440/625] eta 0:01:17 lr 0.000192 wd 0.0500 time 0.4014 (0.4169) data time 0.0009 (0.0017) model time 0.4005 (0.4165) loss 7.5261 (6.7920) grad_norm 4.8494 (3.3213) loss_scale 512.0000 (338.4308) mem 
14939MB [2024-07-25 08:53:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][450/625] eta 0:01:13 lr 0.000192 wd 0.0500 time 0.3984 (0.4172) data time 0.0008 (0.0017) model time 0.3975 (0.4169) loss 6.4509 (6.7828) grad_norm 2.4481 (3.3218) loss_scale 512.0000 (342.2794) mem 14939MB [2024-07-25 08:53:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][460/625] eta 0:01:08 lr 0.000192 wd 0.0500 time 0.3999 (0.4168) data time 0.0009 (0.0017) model time 0.3990 (0.4164) loss 7.6021 (6.7785) grad_norm 3.8838 (3.3267) loss_scale 512.0000 (345.9610) mem 14939MB [2024-07-25 08:53:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][470/625] eta 0:01:04 lr 0.000192 wd 0.0500 time 0.3967 (0.4165) data time 0.0007 (0.0017) model time 0.3960 (0.4160) loss 7.5077 (6.7748) grad_norm 5.0761 (3.3231) loss_scale 512.0000 (349.4862) mem 14939MB [2024-07-25 08:54:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][480/625] eta 0:01:00 lr 0.000192 wd 0.0500 time 0.3971 (0.4161) data time 0.0008 (0.0017) model time 0.3962 (0.4156) loss 6.7702 (6.7682) grad_norm 1.9375 (3.2995) loss_scale 512.0000 (352.8649) mem 14939MB [2024-07-25 08:54:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][490/625] eta 0:00:56 lr 0.000192 wd 0.0500 time 0.3990 (0.4157) data time 0.0006 (0.0016) model time 0.3984 (0.4152) loss 5.9816 (6.7592) grad_norm 1.9894 (3.3212) loss_scale 512.0000 (356.1059) mem 14939MB [2024-07-25 08:54:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][500/625] eta 0:00:51 lr 0.000192 wd 0.0500 time 0.4085 (0.4154) data time 0.0008 (0.0016) model time 0.4077 (0.4148) loss 5.7246 (6.7597) grad_norm 2.6271 (3.3189) loss_scale 512.0000 (359.2176) mem 14939MB [2024-07-25 08:54:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][510/625] eta 0:00:47 lr 0.000192 wd 0.0500 time 
0.4018 (0.4151) data time 0.0006 (0.0016) model time 0.4012 (0.4145) loss 6.3840 (6.7612) grad_norm 2.8886 (3.3080) loss_scale 512.0000 (362.2074) mem 14939MB [2024-07-25 08:54:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][520/625] eta 0:00:43 lr 0.000192 wd 0.0500 time 0.3993 (0.4148) data time 0.0008 (0.0016) model time 0.3986 (0.4141) loss 7.7805 (6.7613) grad_norm 3.3952 (3.3045) loss_scale 512.0000 (365.0825) mem 14939MB [2024-07-25 08:54:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][530/625] eta 0:00:39 lr 0.000191 wd 0.0500 time 0.3984 (0.4146) data time 0.0006 (0.0016) model time 0.3978 (0.4139) loss 6.2090 (6.7566) grad_norm 4.2053 (3.3093) loss_scale 512.0000 (367.8493) mem 14939MB [2024-07-25 08:54:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][540/625] eta 0:00:35 lr 0.000191 wd 0.0500 time 0.3992 (0.4143) data time 0.0007 (0.0016) model time 0.3986 (0.4135) loss 7.4683 (6.7621) grad_norm 2.8253 (3.3121) loss_scale 512.0000 (370.5139) mem 14939MB [2024-07-25 08:54:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][550/625] eta 0:00:31 lr 0.000191 wd 0.0500 time 0.3979 (0.4140) data time 0.0008 (0.0016) model time 0.3971 (0.4132) loss 6.0083 (6.7647) grad_norm 2.3055 (3.3068) loss_scale 512.0000 (373.0817) mem 14939MB [2024-07-25 08:54:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][560/625] eta 0:00:26 lr 0.000191 wd 0.0500 time 0.3990 (0.4137) data time 0.0008 (0.0015) model time 0.3982 (0.4129) loss 6.7828 (6.7582) grad_norm 2.3060 (3.2933) loss_scale 512.0000 (375.5579) mem 14939MB [2024-07-25 08:54:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][570/625] eta 0:00:22 lr 0.000191 wd 0.0500 time 0.4008 (0.4134) data time 0.0007 (0.0015) model time 0.4001 (0.4126) loss 6.7408 (6.7608) grad_norm 1.9724 (3.3025) loss_scale 512.0000 (377.9475) mem 
14939MB [2024-07-25 08:54:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][580/625] eta 0:00:18 lr 0.000191 wd 0.0500 time 0.3972 (0.4132) data time 0.0007 (0.0015) model time 0.3965 (0.4123) loss 7.2117 (6.7603) grad_norm 2.1837 (3.2930) loss_scale 512.0000 (380.2547) mem 14939MB [2024-07-25 08:54:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][590/625] eta 0:00:14 lr 0.000191 wd 0.0500 time 0.3961 (0.4129) data time 0.0009 (0.0015) model time 0.3952 (0.4120) loss 7.1434 (6.7630) grad_norm 2.5438 (3.2820) loss_scale 512.0000 (382.4839) mem 14939MB [2024-07-25 08:54:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][600/625] eta 0:00:10 lr 0.000191 wd 0.0500 time 0.3977 (0.4129) data time 0.0008 (0.0015) model time 0.3969 (0.4120) loss 5.7805 (6.7624) grad_norm 5.3165 (3.2897) loss_scale 512.0000 (384.6389) mem 14939MB [2024-07-25 08:54:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][610/625] eta 0:00:06 lr 0.000191 wd 0.0500 time 0.5671 (0.4135) data time 0.0004 (0.0015) model time 0.5667 (0.4127) loss 5.8928 (6.7607) grad_norm 2.3440 (3.2919) loss_scale 512.0000 (386.7234) mem 14939MB [2024-07-25 08:54:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][620/625] eta 0:00:02 lr 0.000191 wd 0.0500 time 0.5608 (0.4137) data time 0.0005 (0.0015) model time 0.5603 (0.4129) loss 5.7753 (6.7588) grad_norm 2.6032 (3.2855) loss_scale 512.0000 (388.7407) mem 14939MB [2024-07-25 08:55:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 228 training takes 0:04:18 [2024-07-25 08:55:01 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 08:55:02 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 08:55:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.464 (0.464) Loss 0.5391 (0.5391) Acc@1 89.990 (89.990) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 08:55:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.8413 (0.6684) Acc@1 81.592 (86.870) Acc@5 96.875 (97.892) Mem 14939MB [2024-07-25 08:55:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9438 (0.7786) Acc@1 77.734 (83.831) Acc@5 95.459 (96.889) Mem 14939MB [2024-07-25 08:55:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.413 Acc@5 96.861 [2024-07-25 08:55:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-07-25 08:55:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.854 (0.854) Loss 0.5396 (0.5396) Acc@1 90.137 (90.137) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 08:55:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.160) Loss 0.8291 (0.6645) Acc@1 82.373 (86.892) Acc@5 96.582 (97.923) Mem 14939MB [2024-07-25 08:55:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.124) Loss 0.9473 (0.7729) Acc@1 78.125 (83.938) Acc@5 95.459 (96.915) Mem 14939MB [2024-07-25 08:55:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.533 Acc@5 96.865 [2024-07-25 08:55:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-07-25 08:55:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.53% [2024-07-25 08:55:07 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 08:55:08 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 08:55:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][0/625] eta 0:07:34 lr 0.000191 wd 0.0500 time 0.7270 (0.7270) data time 0.3457 (0.3457) model time 0.0000 (0.0000) loss 6.3900 (6.3900) grad_norm 3.9265 (3.9265) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:55:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][10/625] eta 0:05:13 lr 0.000191 wd 0.0500 time 0.4000 (0.5093) data time 0.0008 (0.0322) model time 0.0000 (0.0000) loss 6.6638 (6.6600) grad_norm 3.5655 (2.9032) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:55:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][20/625] eta 0:04:59 lr 0.000191 wd 0.0500 time 0.5909 (0.4945) data time 0.0007 (0.0172) model time 0.0000 (0.0000) loss 5.7285 (6.6203) grad_norm 2.8291 (3.1714) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:55:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][30/625] eta 0:04:41 lr 0.000190 wd 0.0500 time 0.4014 (0.4737) data time 0.0007 (0.0119) model time 0.0000 (0.0000) loss 7.5801 (6.6631) grad_norm 2.3959 (2.9483) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:55:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][40/625] eta 0:04:26 lr 0.000190 wd 0.0500 time 0.3986 (0.4553) data time 0.0009 (0.0092) model time 0.0000 (0.0000) loss 7.3393 (6.6405) grad_norm 1.5748 (2.7426) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:55:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][50/625] eta 0:04:17 lr 0.000190 wd 0.0500 time 0.3999 (0.4479) data time 0.0008 (0.0076) model time 0.0000 (0.0000) loss 5.8666 (6.6283) grad_norm 3.5925 (2.7373) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 08:55:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][60/625] eta 0:04:08 lr 0.000190 wd 0.0500 time 0.3968 (0.4397) data time 0.0008 (0.0065) model time 0.3960 (0.3966) loss 7.5092 (6.6637) grad_norm 3.1571 (2.8692) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:55:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][70/625] eta 0:04:00 lr 0.000190 wd 0.0500 time 0.3954 (0.4336) data time 0.0006 (0.0057) model time 0.3948 (0.3963) loss 6.1038 (6.6589) grad_norm 3.3067 (2.8723) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:55:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][80/625] eta 0:03:53 lr 0.000190 wd 0.0500 time 0.3976 (0.4292) data time 0.0008 (0.0051) model time 0.3969 (0.3965) loss 8.0166 (6.7148) grad_norm 2.1105 (2.9156) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:55:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][90/625] eta 0:03:47 lr 0.000190 wd 0.0500 time 0.3983 (0.4257) data time 0.0007 (0.0046) model time 0.3976 (0.3966) loss 6.8276 (6.7299) grad_norm 2.9347 (2.8803) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:55:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][100/625] eta 0:03:41 lr 0.000190 wd 0.0500 time 0.3987 (0.4228) data time 0.0006 (0.0042) model time 0.3982 (0.3965) loss 5.5293 (6.6474) grad_norm 2.1806 (2.9104) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:55:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][110/625] eta 0:03:36 lr 0.000190 wd 0.0500 time 0.3980 (0.4207) data time 0.0006 (0.0039) model time 0.3973 (0.3968) loss 6.9943 (6.6485) grad_norm 3.8156 (2.9805) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:55:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][120/625] eta 0:03:31 lr 0.000190 wd 0.0500 time 
0.3997 (0.4191) data time 0.0008 (0.0036) model time 0.3988 (0.3972) loss 7.3586 (6.6479) grad_norm 2.6912 (2.9484) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:56:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][130/625] eta 0:03:26 lr 0.000190 wd 0.0500 time 0.3963 (0.4178) data time 0.0006 (0.0034) model time 0.3956 (0.3978) loss 6.8076 (6.6383) grad_norm 4.0049 (2.9859) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:56:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][140/625] eta 0:03:21 lr 0.000190 wd 0.0500 time 0.3986 (0.4164) data time 0.0007 (0.0032) model time 0.3979 (0.3978) loss 6.5328 (6.6462) grad_norm 2.2070 (3.0319) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:56:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][150/625] eta 0:03:17 lr 0.000190 wd 0.0500 time 0.3995 (0.4153) data time 0.0008 (0.0031) model time 0.3986 (0.3978) loss 7.5417 (6.6399) grad_norm 3.3029 (3.0280) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:56:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][160/625] eta 0:03:12 lr 0.000189 wd 0.0500 time 0.3975 (0.4143) data time 0.0008 (0.0029) model time 0.3967 (0.3978) loss 7.5981 (6.6459) grad_norm 4.2782 (3.0528) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:56:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][170/625] eta 0:03:08 lr 0.000189 wd 0.0500 time 0.3988 (0.4134) data time 0.0008 (0.0028) model time 0.3980 (0.3979) loss 6.7330 (6.6288) grad_norm 3.4441 (3.0584) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:56:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][180/625] eta 0:03:03 lr 0.000189 wd 0.0500 time 0.4001 (0.4126) data time 0.0006 (0.0027) model time 0.3995 (0.3979) loss 5.4857 (6.6384) grad_norm 4.7338 (3.0654) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 08:56:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][190/625] eta 0:02:59 lr 0.000189 wd 0.0500 time 0.3964 (0.4129) data time 0.0008 (0.0026) model time 0.3956 (0.3993) loss 7.2841 (6.6276) grad_norm 2.7583 (3.0608) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:56:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][200/625] eta 0:02:56 lr 0.000189 wd 0.0500 time 0.6184 (0.4143) data time 0.0009 (0.0025) model time 0.6175 (0.4020) loss 6.0194 (6.6133) grad_norm 1.9605 (3.0360) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:56:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][210/625] eta 0:02:51 lr 0.000189 wd 0.0500 time 0.4008 (0.4144) data time 0.0006 (0.0024) model time 0.4002 (0.4029) loss 7.3835 (6.6348) grad_norm 3.6017 (3.1425) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:56:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][220/625] eta 0:02:48 lr 0.000189 wd 0.0500 time 0.5876 (0.4167) data time 0.0006 (0.0024) model time 0.5869 (0.4065) loss 7.4500 (6.6358) grad_norm 4.5444 (3.1986) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:56:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][230/625] eta 0:02:45 lr 0.000189 wd 0.0500 time 0.3945 (0.4182) data time 0.0008 (0.0023) model time 0.3937 (0.4089) loss 5.7893 (6.6453) grad_norm 2.5097 (3.1960) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:56:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][240/625] eta 0:02:41 lr 0.000189 wd 0.0500 time 0.3966 (0.4204) data time 0.0009 (0.0022) model time 0.3958 (0.4122) loss 6.5953 (6.6429) grad_norm 13.4247 (3.2472) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:56:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][250/625] eta 0:02:37 lr 0.000189 wd 0.0500 time 
0.5947 (0.4209) data time 0.0008 (0.0022) model time 0.5939 (0.4132) loss 5.5055 (6.6285) grad_norm 3.3634 (3.2402) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:56:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][260/625] eta 0:02:33 lr 0.000189 wd 0.0500 time 0.3995 (0.4200) data time 0.0006 (0.0021) model time 0.3988 (0.4124) loss 6.3763 (6.6246) grad_norm 3.3222 (3.2413) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:57:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][270/625] eta 0:02:29 lr 0.000189 wd 0.0500 time 0.4093 (0.4200) data time 0.0008 (0.0021) model time 0.4085 (0.4127) loss 6.2334 (6.6225) grad_norm 3.6665 (3.2621) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:57:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][280/625] eta 0:02:24 lr 0.000189 wd 0.0500 time 0.3959 (0.4192) data time 0.0009 (0.0020) model time 0.3950 (0.4120) loss 5.9864 (6.6271) grad_norm 3.0211 (3.2503) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:57:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][290/625] eta 0:02:20 lr 0.000189 wd 0.0500 time 0.3990 (0.4185) data time 0.0006 (0.0020) model time 0.3984 (0.4114) loss 7.6616 (6.6206) grad_norm 2.3759 (3.2213) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:57:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][300/625] eta 0:02:15 lr 0.000188 wd 0.0500 time 0.4049 (0.4179) data time 0.0009 (0.0019) model time 0.4040 (0.4109) loss 5.3908 (6.6117) grad_norm 2.0804 (3.2103) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:57:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][310/625] eta 0:02:11 lr 0.000188 wd 0.0500 time 0.3937 (0.4172) data time 0.0007 (0.0019) model time 0.3930 (0.4104) loss 8.2210 (6.6089) grad_norm 3.1148 (3.2246) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 08:57:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][320/625] eta 0:02:07 lr 0.000188 wd 0.0500 time 0.3963 (0.4167) data time 0.0006 (0.0019) model time 0.3957 (0.4100) loss 6.4855 (6.5989) grad_norm 2.6035 (3.2335) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:57:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][330/625] eta 0:02:02 lr 0.000188 wd 0.0500 time 0.4016 (0.4161) data time 0.0010 (0.0018) model time 0.4007 (0.4095) loss 6.7734 (6.5978) grad_norm 2.5732 (3.2762) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:57:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][340/625] eta 0:01:58 lr 0.000188 wd 0.0500 time 0.4010 (0.4156) data time 0.0007 (0.0018) model time 0.4003 (0.4091) loss 7.2857 (6.5916) grad_norm 3.3370 (3.2747) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:57:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][350/625] eta 0:01:54 lr 0.000188 wd 0.0500 time 0.3989 (0.4151) data time 0.0008 (0.0018) model time 0.3982 (0.4087) loss 5.3719 (6.5871) grad_norm 2.1396 (3.2971) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:57:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][360/625] eta 0:01:49 lr 0.000188 wd 0.0500 time 0.4025 (0.4147) data time 0.0007 (0.0018) model time 0.4018 (0.4084) loss 6.1842 (6.5863) grad_norm 15.8508 (3.3421) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:57:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][370/625] eta 0:01:45 lr 0.000188 wd 0.0500 time 0.3967 (0.4143) data time 0.0009 (0.0017) model time 0.3958 (0.4081) loss 5.4495 (6.5930) grad_norm 1.8304 (3.3149) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:57:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][380/625] eta 0:01:41 lr 0.000188 wd 0.0500 time 
0.4044 (0.4141) data time 0.0010 (0.0017) model time 0.4034 (0.4080) loss 7.7043 (6.6041) grad_norm 3.4087 (3.2931) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:57:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][390/625] eta 0:01:37 lr 0.000188 wd 0.0500 time 0.4031 (0.4137) data time 0.0006 (0.0017) model time 0.4024 (0.4077) loss 7.6276 (6.6018) grad_norm 3.5118 (3.3076) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:57:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][400/625] eta 0:01:32 lr 0.000188 wd 0.0500 time 0.3945 (0.4133) data time 0.0006 (0.0017) model time 0.3939 (0.4075) loss 7.7176 (6.6058) grad_norm 1.9043 (3.3814) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:57:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][410/625] eta 0:01:28 lr 0.000188 wd 0.0500 time 0.4001 (0.4133) data time 0.0009 (0.0016) model time 0.3993 (0.4076) loss 6.7349 (6.6084) grad_norm 4.2774 (3.3855) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:58:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][420/625] eta 0:01:24 lr 0.000188 wd 0.0500 time 0.3813 (0.4138) data time 0.0006 (0.0016) model time 0.3807 (0.4083) loss 7.8442 (6.6072) grad_norm 9.6080 (3.3977) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:58:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][430/625] eta 0:01:20 lr 0.000187 wd 0.0500 time 0.3926 (0.4137) data time 0.0007 (0.0016) model time 0.3919 (0.4083) loss 6.0082 (6.6112) grad_norm 3.1534 (3.4007) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:58:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][440/625] eta 0:01:16 lr 0.000187 wd 0.0500 time 0.3963 (0.4151) data time 0.0007 (0.0016) model time 0.3956 (0.4100) loss 7.7359 (6.6055) grad_norm 2.9601 (3.3948) loss_scale 512.0000 (512.0000) mem 
14939MB [2024-07-25 08:58:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][450/625] eta 0:01:12 lr 0.000187 wd 0.0500 time 0.5692 (0.4169) data time 0.0006 (0.0016) model time 0.5686 (0.4121) loss 7.2162 (6.6014) grad_norm 2.9783 (3.3798) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:58:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][460/625] eta 0:01:08 lr 0.000187 wd 0.0500 time 0.3969 (0.4177) data time 0.0008 (0.0015) model time 0.3961 (0.4131) loss 7.2113 (6.5972) grad_norm 2.3917 (3.3677) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:58:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][470/625] eta 0:01:04 lr 0.000187 wd 0.0500 time 0.5711 (0.4179) data time 0.0006 (0.0015) model time 0.5704 (0.4135) loss 7.7110 (6.6010) grad_norm 5.1212 (3.3633) loss_scale 512.0000 (512.0000) mem 14939MB [2024-07-25 08:58:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][480/625] eta 0:01:00 lr 0.000187 wd 0.0500 time 0.4150 (0.4176) data time 0.0007 (0.0015) model time 0.4143 (0.4132) loss 6.8464 (6.5946) grad_norm 2.8859 (inf) loss_scale 256.0000 (508.2744) mem 14939MB [2024-07-25 08:58:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][490/625] eta 0:00:56 lr 0.000187 wd 0.0500 time 0.3983 (0.4174) data time 0.0007 (0.0015) model time 0.3976 (0.4131) loss 5.7279 (6.5964) grad_norm 3.3756 (inf) loss_scale 256.0000 (503.1365) mem 14939MB [2024-07-25 08:58:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][500/625] eta 0:00:52 lr 0.000187 wd 0.0500 time 0.4134 (0.4171) data time 0.0007 (0.0015) model time 0.4127 (0.4128) loss 6.8202 (6.6053) grad_norm 5.6313 (inf) loss_scale 256.0000 (498.2036) mem 14939MB [2024-07-25 08:58:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][510/625] eta 0:00:47 lr 0.000187 wd 0.0500 time 0.4024 
(0.4167) data time 0.0009 (0.0015) model time 0.4016 (0.4125) loss 7.3066 (6.6123) grad_norm 2.0706 (inf) loss_scale 256.0000 (493.4638) mem 14939MB [2024-07-25 08:58:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][520/625] eta 0:00:43 lr 0.000187 wd 0.0500 time 0.3986 (0.4164) data time 0.0009 (0.0015) model time 0.3977 (0.4122) loss 7.2688 (6.6169) grad_norm 2.2023 (inf) loss_scale 256.0000 (488.9060) mem 14939MB [2024-07-25 08:58:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][530/625] eta 0:00:39 lr 0.000187 wd 0.0500 time 0.3968 (0.4161) data time 0.0010 (0.0015) model time 0.3958 (0.4119) loss 6.4501 (6.6215) grad_norm 4.9277 (inf) loss_scale 256.0000 (484.5198) mem 14939MB [2024-07-25 08:58:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][540/625] eta 0:00:35 lr 0.000187 wd 0.0500 time 0.3887 (0.4158) data time 0.0007 (0.0014) model time 0.3880 (0.4116) loss 7.1991 (6.6247) grad_norm 3.0877 (inf) loss_scale 256.0000 (480.2957) mem 14939MB [2024-07-25 08:58:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][550/625] eta 0:00:31 lr 0.000187 wd 0.0500 time 0.3994 (0.4155) data time 0.0006 (0.0014) model time 0.3987 (0.4113) loss 5.6190 (6.6178) grad_norm 3.0672 (inf) loss_scale 256.0000 (476.2250) mem 14939MB [2024-07-25 08:59:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][560/625] eta 0:00:26 lr 0.000186 wd 0.0500 time 0.3985 (0.4152) data time 0.0007 (0.0014) model time 0.3978 (0.4111) loss 7.0427 (6.6279) grad_norm 6.3307 (inf) loss_scale 256.0000 (472.2995) mem 14939MB [2024-07-25 08:59:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][570/625] eta 0:00:22 lr 0.000186 wd 0.0500 time 0.3980 (0.4149) data time 0.0008 (0.0014) model time 0.3972 (0.4108) loss 6.8980 (6.6255) grad_norm 3.3913 (inf) loss_scale 256.0000 (468.5114) mem 14939MB [2024-07-25 08:59:09 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][580/625] eta 0:00:18 lr 0.000186 wd 0.0500 time 0.3997 (0.4146) data time 0.0007 (0.0014) model time 0.3991 (0.4105) loss 7.4719 (6.6249) grad_norm 2.4574 (inf) loss_scale 256.0000 (464.8537) mem 14939MB [2024-07-25 08:59:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][590/625] eta 0:00:14 lr 0.000186 wd 0.0500 time 0.3975 (0.4143) data time 0.0008 (0.0014) model time 0.3967 (0.4103) loss 5.9076 (6.6236) grad_norm 3.3559 (inf) loss_scale 256.0000 (461.3198) mem 14939MB [2024-07-25 08:59:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][600/625] eta 0:00:10 lr 0.000186 wd 0.0500 time 0.4025 (0.4140) data time 0.0008 (0.0014) model time 0.4017 (0.4101) loss 5.7086 (6.6257) grad_norm 3.8330 (inf) loss_scale 256.0000 (457.9035) mem 14939MB [2024-07-25 08:59:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][610/625] eta 0:00:06 lr 0.000186 wd 0.0500 time 0.3963 (0.4138) data time 0.0004 (0.0014) model time 0.3959 (0.4099) loss 7.1879 (6.6206) grad_norm 2.1219 (inf) loss_scale 256.0000 (454.5990) mem 14939MB [2024-07-25 08:59:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][620/625] eta 0:00:02 lr 0.000186 wd 0.0500 time 0.3977 (0.4136) data time 0.0006 (0.0014) model time 0.3971 (0.4097) loss 6.3036 (6.6185) grad_norm 3.4931 (inf) loss_scale 256.0000 (451.4010) mem 14939MB [2024-07-25 08:59:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 229 training takes 0:04:18 [2024-07-25 08:59:27 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 08:59:28 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 08:59:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.444 (0.444) Loss 0.5493 (0.5493) Acc@1 89.893 (89.893) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 08:59:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.8398 (0.6737) Acc@1 82.422 (87.003) Acc@5 96.582 (97.807) Mem 14939MB [2024-07-25 08:59:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9424 (0.7840) Acc@1 78.369 (83.970) Acc@5 95.752 (96.852) Mem 14939MB [2024-07-25 08:59:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.541 Acc@5 96.839 [2024-07-25 08:59:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-07-25 08:59:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.54% [2024-07-25 08:59:30 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 08:59:31 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 08:59:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.450 (0.450) Loss 0.5400 (0.5400) Acc@1 90.186 (90.186) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 08:59:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.8281 (0.6644) Acc@1 82.422 (86.919) Acc@5 96.582 (97.931) Mem 14939MB [2024-07-25 08:59:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9468 (0.7726) Acc@1 78.125 (83.956) Acc@5 95.508 (96.928) Mem 14939MB [2024-07-25 08:59:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.559 Acc@5 96.879 [2024-07-25 08:59:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-07-25 08:59:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.56% [2024-07-25 08:59:34 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 08:59:35 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 08:59:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][0/625] eta 0:07:46 lr 0.000186 wd 0.0500 time 0.7471 (0.7471) data time 0.3550 (0.3550) model time 0.0000 (0.0000) loss 7.0276 (7.0276) grad_norm 3.6027 (3.6027) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:59:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][10/625] eta 0:04:24 lr 0.000186 wd 0.0500 time 0.3955 (0.4305) data time 0.0006 (0.0330) model time 0.0000 (0.0000) loss 6.4524 (6.5917) grad_norm 2.5182 (2.8201) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:59:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][20/625] eta 0:04:21 lr 0.000186 wd 0.0500 time 0.3948 (0.4328) data time 0.0008 (0.0177) model time 0.0000 (0.0000) loss 6.2476 (6.6088) grad_norm 1.9714 (2.9571) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:59:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][30/625] eta 0:04:21 lr 0.000186 wd 0.0500 time 0.5789 (0.4402) data time 0.0009 (0.0123) model time 0.0000 (0.0000) loss 6.5764 (6.5793) grad_norm 1.8879 (2.9182) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:59:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][40/625] eta 0:04:22 lr 0.000186 wd 0.0500 time 0.3955 (0.4484) data time 0.0007 (0.0095) model time 0.0000 (0.0000) loss 5.8146 (6.5695) grad_norm 1.9132 (2.8417) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 08:59:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][50/625] eta 0:04:22 lr 0.000186 wd 0.0500 time 0.5926 (0.4569) data time 0.0007 (0.0078) model time 0.0000 (0.0000) loss 7.4133 (6.6799) grad_norm 2.8441 (2.8269) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:00:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][60/625] eta 0:04:15 lr 0.000186 wd 0.0500 time 0.3974 (0.4526) 
data time 0.0008 (0.0066) model time 0.3967 (0.4297) loss 7.1045 (6.6791) grad_norm 2.1738 (2.8490) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:00:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][70/625] eta 0:04:11 lr 0.000185 wd 0.0500 time 0.4005 (0.4531) data time 0.0006 (0.0058) model time 0.3999 (0.4427) loss 7.2862 (6.7233) grad_norm 1.8169 (2.8624) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:00:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][80/625] eta 0:04:03 lr 0.000185 wd 0.0500 time 0.3975 (0.4474) data time 0.0006 (0.0052) model time 0.3968 (0.4306) loss 8.4159 (6.7893) grad_norm 5.2106 (3.6231) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:00:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][90/625] eta 0:03:56 lr 0.000185 wd 0.0500 time 0.3983 (0.4421) data time 0.0006 (0.0047) model time 0.3977 (0.4225) loss 6.2931 (6.7216) grad_norm 3.7500 (3.6087) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:00:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][100/625] eta 0:03:49 lr 0.000185 wd 0.0500 time 0.3979 (0.4378) data time 0.0006 (0.0043) model time 0.3974 (0.4175) loss 6.5978 (6.7133) grad_norm 3.3870 (3.9945) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:00:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][110/625] eta 0:03:43 lr 0.000185 wd 0.0500 time 0.3990 (0.4342) data time 0.0007 (0.0040) model time 0.3983 (0.4142) loss 7.4817 (6.7646) grad_norm 2.6098 (3.8979) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:00:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][120/625] eta 0:03:37 lr 0.000185 wd 0.0500 time 0.3990 (0.4313) data time 0.0006 (0.0037) model time 0.3984 (0.4119) loss 5.9049 (6.7463) grad_norm 3.2072 (3.7954) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
09:00:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][130/625] eta 0:03:32 lr 0.000185 wd 0.0500 time 0.3974 (0.4288) data time 0.0006 (0.0035) model time 0.3967 (0.4101) loss 6.4262 (6.7258) grad_norm 3.3173 (3.7402) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:00:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][140/625] eta 0:03:26 lr 0.000185 wd 0.0500 time 0.3987 (0.4266) data time 0.0006 (0.0033) model time 0.3980 (0.4088) loss 7.3874 (6.7390) grad_norm 3.3147 (3.6856) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:00:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][150/625] eta 0:03:21 lr 0.000185 wd 0.0500 time 0.4056 (0.4248) data time 0.0008 (0.0031) model time 0.4049 (0.4077) loss 6.6767 (6.7198) grad_norm 3.5247 (3.6372) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:00:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][160/625] eta 0:03:17 lr 0.000185 wd 0.0500 time 0.3976 (0.4239) data time 0.0006 (0.0030) model time 0.3970 (0.4079) loss 7.2150 (6.7201) grad_norm 2.4382 (3.5748) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:00:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][170/625] eta 0:03:12 lr 0.000185 wd 0.0500 time 0.4080 (0.4226) data time 0.0006 (0.0028) model time 0.4074 (0.4072) loss 5.3798 (6.7140) grad_norm 2.2242 (3.5115) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:00:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][180/625] eta 0:03:07 lr 0.000185 wd 0.0500 time 0.3964 (0.4212) data time 0.0008 (0.0027) model time 0.3956 (0.4064) loss 7.4297 (6.7219) grad_norm 2.0427 (3.4731) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:00:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][190/625] eta 0:03:02 lr 0.000185 wd 0.0500 time 0.4022 (0.4201) data 
time 0.0006 (0.0026) model time 0.4016 (0.4060) loss 5.5388 (6.6961) grad_norm 3.0663 (3.4808) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:00:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][200/625] eta 0:02:58 lr 0.000184 wd 0.0500 time 0.3993 (0.4192) data time 0.0008 (0.0025) model time 0.3985 (0.4056) loss 5.8678 (6.6839) grad_norm 2.3351 (3.4341) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:01:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][210/625] eta 0:02:53 lr 0.000184 wd 0.0500 time 0.3990 (0.4182) data time 0.0008 (0.0025) model time 0.3982 (0.4051) loss 7.3224 (6.6951) grad_norm 2.0971 (3.3877) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:01:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][220/625] eta 0:02:49 lr 0.000184 wd 0.0500 time 0.3983 (0.4174) data time 0.0009 (0.0024) model time 0.3974 (0.4049) loss 6.8135 (6.6944) grad_norm 2.4942 (3.3563) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:01:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][230/625] eta 0:02:44 lr 0.000184 wd 0.0500 time 0.4018 (0.4174) data time 0.0007 (0.0023) model time 0.4011 (0.4055) loss 7.4665 (6.7043) grad_norm 2.4648 (3.3609) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:01:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][240/625] eta 0:02:40 lr 0.000184 wd 0.0500 time 0.3966 (0.4176) data time 0.0006 (0.0022) model time 0.3959 (0.4063) loss 5.2382 (6.7045) grad_norm 3.2245 (3.3486) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:01:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][250/625] eta 0:02:37 lr 0.000184 wd 0.0500 time 0.5923 (0.4188) data time 0.0007 (0.0022) model time 0.5916 (0.4084) loss 6.9005 (6.6978) grad_norm 1.9035 (3.3797) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
09:01:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][260/625] eta 0:02:33 lr 0.000184 wd 0.0500 time 0.3970 (0.4207) data time 0.0006 (0.0021) model time 0.3964 (0.4112) loss 6.5860 (6.7135) grad_norm 2.6345 (3.3536) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:01:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][270/625] eta 0:02:30 lr 0.000184 wd 0.0500 time 0.4001 (0.4227) data time 0.0006 (0.0021) model time 0.3994 (0.4141) loss 6.7674 (6.7160) grad_norm 1.8490 (3.3583) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:01:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][280/625] eta 0:02:26 lr 0.000184 wd 0.0500 time 0.5357 (0.4236) data time 0.0006 (0.0020) model time 0.5351 (0.4154) loss 6.2881 (6.7015) grad_norm 3.8583 (3.3634) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:01:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][290/625] eta 0:02:21 lr 0.000184 wd 0.0500 time 0.3983 (0.4233) data time 0.0007 (0.0020) model time 0.3977 (0.4154) loss 5.8577 (6.7021) grad_norm 2.7766 (3.3465) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:01:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][300/625] eta 0:02:17 lr 0.000184 wd 0.0500 time 0.3979 (0.4225) data time 0.0006 (0.0020) model time 0.3973 (0.4147) loss 6.2135 (6.7037) grad_norm 2.1265 (3.3121) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:01:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][310/625] eta 0:02:12 lr 0.000184 wd 0.0500 time 0.4022 (0.4221) data time 0.0009 (0.0019) model time 0.4013 (0.4145) loss 7.3619 (6.7031) grad_norm 2.3210 (3.3359) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:01:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][320/625] eta 0:02:08 lr 0.000184 wd 0.0500 time 0.3991 (0.4214) data 
time 0.0007 (0.0019) model time 0.3984 (0.4139) loss 6.0395 (6.6915) grad_norm 3.3708 (3.3706) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:01:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][330/625] eta 0:02:04 lr 0.000183 wd 0.0500 time 0.3996 (0.4207) data time 0.0006 (0.0019) model time 0.3990 (0.4134) loss 6.2895 (6.6848) grad_norm 2.5601 (3.3611) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:01:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][340/625] eta 0:01:59 lr 0.000183 wd 0.0500 time 0.3963 (0.4201) data time 0.0007 (0.0018) model time 0.3956 (0.4128) loss 6.1612 (6.6816) grad_norm 1.6543 (3.3512) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:02:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][350/625] eta 0:01:55 lr 0.000183 wd 0.0500 time 0.3995 (0.4195) data time 0.0007 (0.0018) model time 0.3988 (0.4124) loss 6.6352 (6.6837) grad_norm 3.6659 (3.3455) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:02:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][360/625] eta 0:01:51 lr 0.000183 wd 0.0500 time 0.3970 (0.4189) data time 0.0007 (0.0018) model time 0.3963 (0.4119) loss 6.7456 (6.6857) grad_norm 2.6329 (3.3369) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:02:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][370/625] eta 0:01:46 lr 0.000183 wd 0.0500 time 0.3960 (0.4183) data time 0.0007 (0.0017) model time 0.3953 (0.4114) loss 7.8666 (6.6946) grad_norm 1.5878 (3.3238) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:02:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][380/625] eta 0:01:42 lr 0.000183 wd 0.0500 time 0.3975 (0.4181) data time 0.0008 (0.0017) model time 0.3968 (0.4114) loss 7.0441 (6.6819) grad_norm 2.2888 (3.2974) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
09:02:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][390/625] eta 0:01:38 lr 0.000183 wd 0.0500 time 0.3968 (0.4177) data time 0.0008 (0.0017) model time 0.3960 (0.4110) loss 6.5529 (6.6702) grad_norm 4.4313 (3.2826) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:02:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][400/625] eta 0:01:33 lr 0.000183 wd 0.0500 time 0.3963 (0.4172) data time 0.0007 (0.0017) model time 0.3956 (0.4106) loss 6.8237 (6.6697) grad_norm 3.2103 (3.2834) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:02:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][410/625] eta 0:01:29 lr 0.000183 wd 0.0500 time 0.3977 (0.4168) data time 0.0008 (0.0016) model time 0.3969 (0.4103) loss 7.5779 (6.6804) grad_norm 2.5873 (3.2797) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:02:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][420/625] eta 0:01:25 lr 0.000183 wd 0.0500 time 0.3931 (0.4163) data time 0.0009 (0.0016) model time 0.3922 (0.4099) loss 6.6041 (6.6794) grad_norm 4.5238 (3.2850) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:02:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][430/625] eta 0:01:21 lr 0.000183 wd 0.0500 time 0.3970 (0.4158) data time 0.0008 (0.0016) model time 0.3963 (0.4096) loss 7.0197 (6.6899) grad_norm 2.5247 (3.2725) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:02:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][440/625] eta 0:01:16 lr 0.000183 wd 0.0500 time 0.3984 (0.4154) data time 0.0010 (0.0016) model time 0.3974 (0.4092) loss 6.2113 (6.6905) grad_norm 2.3449 (3.2608) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:02:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][450/625] eta 0:01:12 lr 0.000183 wd 0.0500 time 0.3952 (0.4154) data 
time 0.0009 (0.0016) model time 0.3943 (0.4093) loss 6.8082 (6.6854) grad_norm 3.0399 (3.2705) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:02:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][460/625] eta 0:01:08 lr 0.000183 wd 0.0500 time 0.3988 (0.4154) data time 0.0006 (0.0016) model time 0.3981 (0.4095) loss 5.8074 (6.6747) grad_norm 3.7114 (3.2939) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:02:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][470/625] eta 0:01:04 lr 0.000182 wd 0.0500 time 0.5842 (0.4163) data time 0.0009 (0.0015) model time 0.5833 (0.4106) loss 6.4292 (6.6686) grad_norm 2.5139 (3.2766) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:02:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][480/625] eta 0:01:00 lr 0.000182 wd 0.0500 time 0.5546 (0.4179) data time 0.0008 (0.0015) model time 0.5538 (0.4125) loss 8.0423 (6.6763) grad_norm 2.1970 (3.2765) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:03:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][490/625] eta 0:00:56 lr 0.000182 wd 0.0500 time 0.5924 (0.4194) data time 0.0009 (0.0015) model time 0.5915 (0.4142) loss 7.1821 (6.6765) grad_norm 4.9187 (3.3000) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:03:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][500/625] eta 0:00:52 lr 0.000182 wd 0.0500 time 0.5938 (0.4198) data time 0.0006 (0.0015) model time 0.5932 (0.4148) loss 6.4479 (6.6727) grad_norm 6.0760 (3.2940) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:03:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][510/625] eta 0:00:48 lr 0.000182 wd 0.0500 time 0.3998 (0.4202) data time 0.0008 (0.0015) model time 0.3990 (0.4153) loss 7.4917 (6.6762) grad_norm 2.0805 (3.2886) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
09:03:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][520/625] eta 0:00:44 lr 0.000182 wd 0.0500 time 0.4003 (0.4198) data time 0.0006 (0.0015) model time 0.3996 (0.4149) loss 7.0088 (6.6728) grad_norm 4.6053 (3.2946) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:03:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][530/625] eta 0:00:39 lr 0.000182 wd 0.0500 time 0.4072 (0.4198) data time 0.0007 (0.0015) model time 0.4065 (0.4150) loss 6.1601 (6.6729) grad_norm 10.6501 (3.2928) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:03:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][540/625] eta 0:00:35 lr 0.000182 wd 0.0500 time 0.4011 (0.4194) data time 0.0008 (0.0015) model time 0.4003 (0.4147) loss 8.2537 (6.6771) grad_norm 4.1839 (3.2872) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:03:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][550/625] eta 0:00:31 lr 0.000182 wd 0.0500 time 0.4006 (0.4190) data time 0.0006 (0.0014) model time 0.4000 (0.4143) loss 7.5307 (6.6757) grad_norm 2.0858 (3.2835) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:03:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][560/625] eta 0:00:27 lr 0.000182 wd 0.0500 time 0.4014 (0.4186) data time 0.0008 (0.0014) model time 0.4006 (0.4140) loss 6.9547 (6.6733) grad_norm 3.3472 (3.2813) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:03:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][570/625] eta 0:00:23 lr 0.000182 wd 0.0500 time 0.3947 (0.4183) data time 0.0007 (0.0014) model time 0.3941 (0.4137) loss 6.4219 (6.6791) grad_norm 3.3984 (3.2703) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:03:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][580/625] eta 0:00:18 lr 0.000182 wd 0.0500 time 0.3998 (0.4180) data 
time 0.0008 (0.0014) model time 0.3989 (0.4134) loss 5.7061 (6.6817) grad_norm 5.3309 (3.3023) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:03:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][590/625] eta 0:00:14 lr 0.000182 wd 0.0500 time 0.4034 (0.4177) data time 0.0007 (0.0014) model time 0.4028 (0.4132) loss 7.1050 (6.6818) grad_norm 3.8455 (3.3032) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:03:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][600/625] eta 0:00:10 lr 0.000181 wd 0.0500 time 0.3977 (0.4177) data time 0.0008 (0.0014) model time 0.3969 (0.4133) loss 6.9068 (6.6810) grad_norm 1.8260 (3.2932) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:03:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][610/625] eta 0:00:06 lr 0.000181 wd 0.0500 time 0.3977 (0.4174) data time 0.0006 (0.0014) model time 0.3971 (0.4130) loss 7.4472 (6.6797) grad_norm 2.1196 (3.2990) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:03:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][620/625] eta 0:00:02 lr 0.000181 wd 0.0500 time 0.3975 (0.4171) data time 0.0006 (0.0014) model time 0.3969 (0.4127) loss 6.3468 (6.6741) grad_norm 3.4200 (3.3019) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:03:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 230 training takes 0:04:20 [2024-07-25 09:03:55 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 09:03:56 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 09:03:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.5405 (0.5405) Acc@1 90.039 (90.039) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 09:03:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.087 (0.120) Loss 0.8403 (0.6683) Acc@1 82.031 (87.061) Acc@5 96.631 (97.909) Mem 14939MB [2024-07-25 09:03:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9468 (0.7766) Acc@1 77.832 (83.945) Acc@5 95.312 (96.882) Mem 14939MB [2024-07-25 09:03:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.593 Acc@5 96.867 [2024-07-25 09:03:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-07-25 09:03:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.59% [2024-07-25 09:03:59 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 09:04:00 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 09:04:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.448 (0.448) Loss 0.5400 (0.5400) Acc@1 90.186 (90.186) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 09:04:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.8271 (0.6642) Acc@1 82.324 (86.923) Acc@5 96.631 (97.927) Mem 14939MB [2024-07-25 09:04:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9453 (0.7720) Acc@1 78.076 (83.963) Acc@5 95.459 (96.935) Mem 14939MB [2024-07-25 09:04:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.559 Acc@5 96.885 [2024-07-25 09:04:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-07-25 09:04:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][0/625] eta 0:12:53 lr 0.000181 wd 0.0500 time 1.2378 (1.2378) data time 0.4774 (0.4774) model time 0.0000 (0.0000) loss 6.3383 (6.3383) grad_norm 3.0487 (3.0487) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:04:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][10/625] eta 0:04:51 lr 0.000181 wd 0.0500 time 0.3975 (0.4746) data time 0.0009 (0.0441) model time 0.0000 (0.0000) loss 6.6765 (6.9642) grad_norm 3.0366 (2.9994) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:04:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][20/625] eta 0:04:25 lr 0.000181 wd 0.0500 time 0.3973 (0.4383) data time 0.0008 (0.0235) model time 0.0000 (0.0000) loss 5.7974 (6.8015) grad_norm 5.5317 (3.8390) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:04:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][30/625] eta 0:04:13 lr 0.000181 wd 0.0500 time 0.3995 (0.4253) data time 0.0006 (0.0162) model time 0.0000 (0.0000) loss 6.9894 (6.7991) grad_norm 2.1727 
(4.0574) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:04:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][40/625] eta 0:04:08 lr 0.000181 wd 0.0500 time 0.3985 (0.4246) data time 0.0008 (0.0124) model time 0.0000 (0.0000) loss 7.7275 (6.7920) grad_norm 2.0996 (3.7025) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:04:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][50/625] eta 0:04:03 lr 0.000181 wd 0.0500 time 0.5806 (0.4235) data time 0.0006 (0.0101) model time 0.0000 (0.0000) loss 5.7588 (6.7044) grad_norm 1.8505 (3.5032) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:04:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][60/625] eta 0:03:58 lr 0.000181 wd 0.0500 time 0.4014 (0.4225) data time 0.0008 (0.0086) model time 0.4006 (0.4164) loss 6.4286 (6.6857) grad_norm 2.3375 (3.4140) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:04:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][70/625] eta 0:03:58 lr 0.000181 wd 0.0500 time 0.5937 (0.4302) data time 0.0008 (0.0075) model time 0.5929 (0.4463) loss 7.8072 (6.7123) grad_norm 3.4103 (3.3331) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:04:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][80/625] eta 0:03:57 lr 0.000181 wd 0.0500 time 0.5866 (0.4358) data time 0.0006 (0.0067) model time 0.5860 (0.4559) loss 6.9263 (6.7039) grad_norm 2.1626 (3.2763) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:04:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][90/625] eta 0:03:55 lr 0.000181 wd 0.0500 time 0.5853 (0.4394) data time 0.0007 (0.0060) model time 0.5847 (0.4588) loss 5.5608 (6.7221) grad_norm 3.3765 (3.2590) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:04:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][100/625] 
eta 0:03:50 lr 0.000181 wd 0.0500 time 0.3969 (0.4389) data time 0.0006 (0.0055) model time 0.3963 (0.4537) loss 7.0205 (6.7288) grad_norm 2.1598 (3.2778) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:04:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][110/625] eta 0:03:44 lr 0.000180 wd 0.0500 time 0.3955 (0.4368) data time 0.0009 (0.0051) model time 0.3946 (0.4474) loss 6.8931 (6.7539) grad_norm 3.2150 (3.2689) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:04:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][120/625] eta 0:03:39 lr 0.000180 wd 0.0500 time 0.4005 (0.4352) data time 0.0006 (0.0047) model time 0.3999 (0.4428) loss 5.5181 (6.7430) grad_norm 2.8326 (3.2540) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:04:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][130/625] eta 0:03:34 lr 0.000180 wd 0.0500 time 0.3995 (0.4337) data time 0.0008 (0.0044) model time 0.3987 (0.4395) loss 6.7050 (6.7062) grad_norm 4.1343 (3.2396) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:05:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][140/625] eta 0:03:29 lr 0.000180 wd 0.0500 time 0.4028 (0.4313) data time 0.0006 (0.0042) model time 0.4022 (0.4349) loss 5.7140 (6.7121) grad_norm 2.8129 (3.3337) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:05:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][150/625] eta 0:03:23 lr 0.000180 wd 0.0500 time 0.3971 (0.4292) data time 0.0006 (0.0040) model time 0.3964 (0.4313) loss 7.1426 (6.7133) grad_norm 2.0925 (3.2764) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:05:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][160/625] eta 0:03:18 lr 0.000180 wd 0.0500 time 0.3972 (0.4274) data time 0.0008 (0.0038) model time 0.3964 (0.4284) loss 7.5296 (6.6941) grad_norm 4.5982 (3.2565) 
loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:05:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][170/625] eta 0:03:13 lr 0.000180 wd 0.0500 time 0.4016 (0.4257) data time 0.0006 (0.0036) model time 0.4010 (0.4258) loss 6.1789 (6.6846) grad_norm 2.4824 (3.2904) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:05:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][180/625] eta 0:03:08 lr 0.000180 wd 0.0500 time 0.3950 (0.4241) data time 0.0006 (0.0034) model time 0.3944 (0.4235) loss 7.1085 (6.6918) grad_norm 2.4694 (3.2753) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:05:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][190/625] eta 0:03:03 lr 0.000180 wd 0.0500 time 0.3957 (0.4227) data time 0.0006 (0.0033) model time 0.3952 (0.4216) loss 6.2257 (6.6786) grad_norm 2.2063 (3.2579) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:05:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][200/625] eta 0:02:59 lr 0.000180 wd 0.0500 time 0.4004 (0.4215) data time 0.0006 (0.0032) model time 0.3998 (0.4200) loss 6.1990 (6.6715) grad_norm 3.0954 (3.2552) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:05:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][210/625] eta 0:02:54 lr 0.000180 wd 0.0500 time 0.3977 (0.4205) data time 0.0008 (0.0031) model time 0.3968 (0.4187) loss 7.3216 (6.6905) grad_norm 2.7691 (3.2263) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:05:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][220/625] eta 0:02:49 lr 0.000180 wd 0.0500 time 0.3968 (0.4194) data time 0.0009 (0.0030) model time 0.3959 (0.4174) loss 6.4195 (6.6844) grad_norm 2.7074 (3.2579) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:05:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][230/625] eta 
0:02:45 lr 0.000180 wd 0.0500 time 0.4047 (0.4185) data time 0.0008 (0.0029) model time 0.4038 (0.4163) loss 6.7812 (6.6706) grad_norm 4.5755 (3.2859) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:05:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][240/625] eta 0:02:40 lr 0.000180 wd 0.0500 time 0.3973 (0.4176) data time 0.0006 (0.0028) model time 0.3967 (0.4153) loss 7.1812 (6.6832) grad_norm 3.6341 (3.2773) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:05:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][250/625] eta 0:02:36 lr 0.000179 wd 0.0500 time 0.3974 (0.4169) data time 0.0009 (0.0027) model time 0.3965 (0.4144) loss 7.1248 (6.6933) grad_norm 2.1645 (3.2785) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:05:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][260/625] eta 0:02:32 lr 0.000179 wd 0.0500 time 0.3986 (0.4167) data time 0.0008 (0.0026) model time 0.3978 (0.4142) loss 6.8338 (6.7132) grad_norm 4.7750 (3.3448) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:05:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][270/625] eta 0:02:27 lr 0.000179 wd 0.0500 time 0.3980 (0.4160) data time 0.0009 (0.0026) model time 0.3972 (0.4135) loss 7.2712 (6.7195) grad_norm 2.6062 (3.3251) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:05:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][280/625] eta 0:02:23 lr 0.000179 wd 0.0500 time 0.6053 (0.4170) data time 0.0008 (0.0025) model time 0.6045 (0.4148) loss 5.3445 (6.7229) grad_norm 2.6141 (3.3052) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:06:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][290/625] eta 0:02:19 lr 0.000179 wd 0.0500 time 0.5509 (0.4179) data time 0.0008 (0.0024) model time 0.5501 (0.4159) loss 6.0915 (6.7142) grad_norm 5.7211 (inf) 
loss_scale 128.0000 (253.8007) mem 14939MB [2024-07-25 09:06:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][300/625] eta 0:02:16 lr 0.000179 wd 0.0500 time 0.5814 (0.4197) data time 0.0008 (0.0024) model time 0.5806 (0.4181) loss 6.2714 (6.7147) grad_norm 2.6364 (inf) loss_scale 128.0000 (249.6213) mem 14939MB [2024-07-25 09:06:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][310/625] eta 0:02:12 lr 0.000179 wd 0.0500 time 0.3965 (0.4213) data time 0.0009 (0.0023) model time 0.3957 (0.4201) loss 7.7301 (6.7143) grad_norm 2.9525 (inf) loss_scale 128.0000 (245.7106) mem 14939MB [2024-07-25 09:06:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][320/625] eta 0:02:08 lr 0.000179 wd 0.0500 time 0.3976 (0.4217) data time 0.0008 (0.0023) model time 0.3968 (0.4205) loss 6.5173 (6.7201) grad_norm 2.8387 (inf) loss_scale 128.0000 (242.0436) mem 14939MB [2024-07-25 09:06:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][330/625] eta 0:02:04 lr 0.000179 wd 0.0500 time 0.4002 (0.4216) data time 0.0006 (0.0023) model time 0.3996 (0.4204) loss 5.8860 (6.7126) grad_norm 2.3862 (inf) loss_scale 128.0000 (238.5982) mem 14939MB [2024-07-25 09:06:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][340/625] eta 0:02:00 lr 0.000179 wd 0.0500 time 0.3974 (0.4213) data time 0.0010 (0.0022) model time 0.3964 (0.4201) loss 7.3401 (6.7201) grad_norm 3.0331 (inf) loss_scale 128.0000 (235.3548) mem 14939MB [2024-07-25 09:06:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][350/625] eta 0:01:55 lr 0.000179 wd 0.0500 time 0.3984 (0.4210) data time 0.0009 (0.0022) model time 0.3975 (0.4198) loss 7.5744 (6.7160) grad_norm 3.6242 (inf) loss_scale 128.0000 (232.2963) mem 14939MB [2024-07-25 09:06:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][360/625] eta 0:01:51 lr 0.000179 
wd 0.0500 time 0.4017 (0.4204) data time 0.0008 (0.0021) model time 0.4009 (0.4191) loss 6.5861 (6.7156) grad_norm 1.8578 (inf) loss_scale 128.0000 (229.4072) mem 14939MB [2024-07-25 09:06:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][370/625] eta 0:01:47 lr 0.000179 wd 0.0500 time 0.3985 (0.4199) data time 0.0009 (0.0021) model time 0.3976 (0.4185) loss 6.4484 (6.7182) grad_norm 2.0684 (inf) loss_scale 128.0000 (226.6739) mem 14939MB [2024-07-25 09:06:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][380/625] eta 0:01:42 lr 0.000178 wd 0.0500 time 0.4037 (0.4194) data time 0.0009 (0.0021) model time 0.4029 (0.4179) loss 7.0784 (6.7167) grad_norm 2.5807 (inf) loss_scale 128.0000 (224.0840) mem 14939MB [2024-07-25 09:06:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][390/625] eta 0:01:38 lr 0.000178 wd 0.0500 time 0.4047 (0.4189) data time 0.0006 (0.0020) model time 0.4041 (0.4174) loss 6.7709 (6.7163) grad_norm 5.8043 (inf) loss_scale 128.0000 (221.6266) mem 14939MB [2024-07-25 09:06:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][400/625] eta 0:01:34 lr 0.000178 wd 0.0500 time 0.4051 (0.4184) data time 0.0006 (0.0020) model time 0.4045 (0.4169) loss 6.0698 (6.7115) grad_norm 2.2670 (inf) loss_scale 128.0000 (219.2918) mem 14939MB [2024-07-25 09:06:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][410/625] eta 0:01:29 lr 0.000178 wd 0.0500 time 0.3999 (0.4179) data time 0.0007 (0.0020) model time 0.3992 (0.4163) loss 4.9942 (6.6986) grad_norm 3.8812 (inf) loss_scale 128.0000 (217.0706) mem 14939MB [2024-07-25 09:06:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][420/625] eta 0:01:25 lr 0.000178 wd 0.0500 time 0.4005 (0.4175) data time 0.0008 (0.0019) model time 0.3996 (0.4158) loss 8.1417 (6.6989) grad_norm 1.9836 (inf) loss_scale 128.0000 (214.9549) mem 14939MB 
[2024-07-25 09:07:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][430/625] eta 0:01:21 lr 0.000178 wd 0.0500 time 0.3949 (0.4170) data time 0.0007 (0.0019) model time 0.3942 (0.4153) loss 5.4315 (6.6973) grad_norm 4.0659 (inf) loss_scale 128.0000 (212.9374) mem 14939MB [2024-07-25 09:07:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][440/625] eta 0:01:17 lr 0.000178 wd 0.0500 time 0.4007 (0.4167) data time 0.0008 (0.0019) model time 0.3998 (0.4150) loss 6.3310 (6.6974) grad_norm 2.7656 (inf) loss_scale 128.0000 (211.0113) mem 14939MB [2024-07-25 09:07:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][450/625] eta 0:01:12 lr 0.000178 wd 0.0500 time 0.3985 (0.4163) data time 0.0007 (0.0019) model time 0.3978 (0.4146) loss 5.8730 (6.6960) grad_norm 3.5073 (inf) loss_scale 128.0000 (209.1707) mem 14939MB [2024-07-25 09:07:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][460/625] eta 0:01:08 lr 0.000178 wd 0.0500 time 0.3971 (0.4160) data time 0.0007 (0.0018) model time 0.3965 (0.4142) loss 6.4212 (6.6845) grad_norm 3.2158 (inf) loss_scale 128.0000 (207.4100) mem 14939MB [2024-07-25 09:07:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][470/625] eta 0:01:04 lr 0.000178 wd 0.0500 time 0.3990 (0.4156) data time 0.0009 (0.0018) model time 0.3981 (0.4138) loss 7.4622 (6.6884) grad_norm 2.4536 (inf) loss_scale 128.0000 (205.7240) mem 14939MB [2024-07-25 09:07:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][480/625] eta 0:01:00 lr 0.000178 wd 0.0500 time 0.3983 (0.4154) data time 0.0007 (0.0018) model time 0.3976 (0.4137) loss 7.4689 (6.6803) grad_norm 3.1052 (inf) loss_scale 128.0000 (204.1081) mem 14939MB [2024-07-25 09:07:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][490/625] eta 0:00:56 lr 0.000178 wd 0.0500 time 0.3972 (0.4151) data time 
0.0009 (0.0018) model time 0.3963 (0.4133) loss 6.2842 (6.6815) grad_norm 4.1890 (inf) loss_scale 128.0000 (202.5580) mem 14939MB [2024-07-25 09:07:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][500/625] eta 0:00:51 lr 0.000178 wd 0.0500 time 0.3992 (0.4154) data time 0.0009 (0.0018) model time 0.3983 (0.4136) loss 5.6271 (6.6718) grad_norm 10.3536 (inf) loss_scale 128.0000 (201.0699) mem 14939MB [2024-07-25 09:07:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][510/625] eta 0:00:47 lr 0.000178 wd 0.0500 time 0.6015 (0.4162) data time 0.0007 (0.0017) model time 0.6008 (0.4145) loss 5.6904 (6.6653) grad_norm 2.7366 (inf) loss_scale 128.0000 (199.6399) mem 14939MB [2024-07-25 09:07:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][520/625] eta 0:00:43 lr 0.000177 wd 0.0500 time 0.3981 (0.4169) data time 0.0008 (0.0017) model time 0.3973 (0.4153) loss 5.7206 (6.6690) grad_norm 3.0910 (inf) loss_scale 128.0000 (198.2649) mem 14939MB [2024-07-25 09:07:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][530/625] eta 0:00:39 lr 0.000177 wd 0.0500 time 0.3944 (0.4185) data time 0.0007 (0.0017) model time 0.3937 (0.4171) loss 7.9503 (6.6766) grad_norm 3.7191 (inf) loss_scale 128.0000 (196.9416) mem 14939MB [2024-07-25 09:07:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][540/625] eta 0:00:35 lr 0.000177 wd 0.0500 time 0.5758 (0.4192) data time 0.0008 (0.0017) model time 0.5750 (0.4179) loss 7.0843 (6.6796) grad_norm 3.4193 (inf) loss_scale 128.0000 (195.6673) mem 14939MB [2024-07-25 09:07:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][550/625] eta 0:00:31 lr 0.000177 wd 0.0500 time 0.3996 (0.4188) data time 0.0006 (0.0017) model time 0.3990 (0.4175) loss 6.5834 (6.6863) grad_norm 2.7443 (inf) loss_scale 128.0000 (194.4392) mem 14939MB [2024-07-25 09:07:57 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][560/625] eta 0:00:27 lr 0.000177 wd 0.0500 time 0.3959 (0.4186) data time 0.0009 (0.0017) model time 0.3950 (0.4173) loss 5.9460 (6.6828) grad_norm 5.1979 (inf) loss_scale 128.0000 (193.2549) mem 14939MB [2024-07-25 09:08:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][570/625] eta 0:00:23 lr 0.000177 wd 0.0500 time 0.4039 (0.4186) data time 0.0006 (0.0017) model time 0.4034 (0.4172) loss 7.3936 (6.6786) grad_norm 2.7365 (inf) loss_scale 128.0000 (192.1121) mem 14939MB [2024-07-25 09:08:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][580/625] eta 0:00:18 lr 0.000177 wd 0.0500 time 0.3939 (0.4182) data time 0.0006 (0.0016) model time 0.3933 (0.4169) loss 5.7215 (6.6786) grad_norm 2.3234 (inf) loss_scale 128.0000 (191.0086) mem 14939MB [2024-07-25 09:08:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][590/625] eta 0:00:14 lr 0.000177 wd 0.0500 time 0.3975 (0.4179) data time 0.0008 (0.0016) model time 0.3967 (0.4165) loss 6.7200 (6.6775) grad_norm 4.0177 (inf) loss_scale 128.0000 (189.9425) mem 14939MB [2024-07-25 09:08:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][600/625] eta 0:00:10 lr 0.000177 wd 0.0500 time 0.4094 (0.4176) data time 0.0008 (0.0016) model time 0.4086 (0.4162) loss 5.5501 (6.6725) grad_norm 2.5449 (inf) loss_scale 128.0000 (188.9118) mem 14939MB [2024-07-25 09:08:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][610/625] eta 0:00:06 lr 0.000177 wd 0.0500 time 0.3889 (0.4172) data time 0.0006 (0.0016) model time 0.3883 (0.4158) loss 7.1301 (6.6738) grad_norm 2.9482 (inf) loss_scale 128.0000 (187.9149) mem 14939MB [2024-07-25 09:08:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][620/625] eta 0:00:02 lr 0.000177 wd 0.0500 time 0.3949 (0.4169) data time 0.0006 (0.0016) model 
time 0.3943 (0.4155) loss 7.1355 (6.6736) grad_norm 2.5628 (inf) loss_scale 128.0000 (186.9501) mem 14939MB [2024-07-25 09:08:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 231 training takes 0:04:20 [2024-07-25 09:08:23 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 09:08:24 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 09:08:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.451 (0.451) Loss 0.5537 (0.5537) Acc@1 89.893 (89.893) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 09:08:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.8301 (0.6696) Acc@1 82.471 (86.927) Acc@5 96.582 (97.838) Mem 14939MB [2024-07-25 09:08:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9297 (0.7775) Acc@1 78.711 (83.959) Acc@5 95.215 (96.794) Mem 14939MB [2024-07-25 09:08:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.599 Acc@5 96.769 [2024-07-25 09:08:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-07-25 09:08:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.60% [2024-07-25 09:08:26 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 09:08:27 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 09:08:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.452 (0.452) Loss 0.5400 (0.5400) Acc@1 90.186 (90.186) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 09:08:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.8276 (0.6641) Acc@1 82.227 (86.932) Acc@5 96.729 (97.936) Mem 14939MB [2024-07-25 09:08:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9429 (0.7715) Acc@1 78.174 (83.980) Acc@5 95.508 (96.942) Mem 14939MB [2024-07-25 09:08:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.575 Acc@5 96.885 [2024-07-25 09:08:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-07-25 09:08:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.58% [2024-07-25 09:08:30 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 09:08:31 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 09:08:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][0/625] eta 0:08:02 lr 0.000177 wd 0.0500 time 0.7728 (0.7728) data time 0.3899 (0.3899) model time 0.0000 (0.0000) loss 7.0098 (7.0098) grad_norm 3.3579 (3.3579) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:08:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][10/625] eta 0:04:26 lr 0.000177 wd 0.0500 time 0.3963 (0.4327) data time 0.0008 (0.0362) model time 0.0000 (0.0000) loss 7.1609 (6.4022) grad_norm 2.9020 (3.0070) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:08:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][20/625] eta 0:04:11 lr 0.000177 wd 0.0500 time 0.3969 (0.4165) data time 0.0007 (0.0193) model time 0.0000 (0.0000) loss 5.1308 (6.5011) grad_norm 2.7983 (3.0437) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:08:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][30/625] eta 0:04:04 lr 0.000176 wd 0.0500 time 0.4008 (0.4107) data time 0.0008 (0.0134) model time 0.0000 (0.0000) loss 7.3234 (6.5003) grad_norm 1.7477 (2.8646) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:08:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][40/625] eta 0:03:58 lr 0.000176 wd 0.0500 time 0.3979 (0.4077) data time 0.0009 (0.0103) model time 0.0000 (0.0000) loss 6.8750 (6.5955) grad_norm 2.9069 (2.8039) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:08:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][50/625] eta 0:03:53 lr 0.000176 wd 0.0500 time 0.4051 (0.4061) data time 0.0008 (0.0084) model time 0.0000 (0.0000) loss 5.4456 (6.5462) grad_norm 3.3843 (2.7857) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:08:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][60/625] eta 0:03:48 lr 0.000176 wd 0.0500 time 0.3976 (0.4047) 
data time 0.0006 (0.0072) model time 0.3970 (0.3970) loss 6.2209 (6.5579) grad_norm 8.1834 (3.1128) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:08:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][70/625] eta 0:03:45 lr 0.000176 wd 0.0500 time 0.3966 (0.4066) data time 0.0010 (0.0063) model time 0.3956 (0.4071) loss 7.2050 (6.6237) grad_norm 3.5329 (3.2375) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:09:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][80/625] eta 0:03:41 lr 0.000176 wd 0.0500 time 0.3994 (0.4071) data time 0.0007 (0.0056) model time 0.3987 (0.4081) loss 7.0941 (6.6305) grad_norm 2.9785 (3.3427) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:09:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][90/625] eta 0:03:38 lr 0.000176 wd 0.0500 time 0.6130 (0.4086) data time 0.0008 (0.0051) model time 0.6121 (0.4111) loss 6.9953 (6.6542) grad_norm 3.9419 (3.3446) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:09:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][100/625] eta 0:03:35 lr 0.000176 wd 0.0500 time 0.3953 (0.4112) data time 0.0010 (0.0047) model time 0.3944 (0.4157) loss 5.5247 (6.6559) grad_norm 2.3749 (3.7045) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:09:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][110/625] eta 0:03:33 lr 0.000176 wd 0.0500 time 0.3967 (0.4150) data time 0.0009 (0.0043) model time 0.3958 (0.4218) loss 7.4546 (6.6577) grad_norm 2.2426 (3.6406) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:09:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][120/625] eta 0:03:31 lr 0.000176 wd 0.0500 time 0.5967 (0.4193) data time 0.0009 (0.0040) model time 0.5958 (0.4281) loss 6.9868 (6.6384) grad_norm 3.2296 (3.6032) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
09:09:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][130/625] eta 0:03:27 lr 0.000176 wd 0.0500 time 0.3959 (0.4193) data time 0.0008 (0.0038) model time 0.3951 (0.4269) loss 5.6102 (6.6490) grad_norm 3.3132 (3.5323) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:09:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][140/625] eta 0:03:24 lr 0.000176 wd 0.0500 time 0.3989 (0.4216) data time 0.0007 (0.0036) model time 0.3982 (0.4296) loss 7.7764 (6.6590) grad_norm 2.8095 (3.5124) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:09:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][150/625] eta 0:03:19 lr 0.000176 wd 0.0500 time 0.3957 (0.4201) data time 0.0008 (0.0034) model time 0.3949 (0.4264) loss 7.0927 (6.6598) grad_norm 2.4123 (3.7395) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:09:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][160/625] eta 0:03:15 lr 0.000175 wd 0.0500 time 0.3991 (0.4198) data time 0.0009 (0.0032) model time 0.3983 (0.4253) loss 6.8850 (6.6628) grad_norm 2.2918 (3.6705) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:09:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][170/625] eta 0:03:10 lr 0.000175 wd 0.0500 time 0.3943 (0.4185) data time 0.0009 (0.0031) model time 0.3934 (0.4230) loss 6.5937 (6.6626) grad_norm 1.8493 (3.5907) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:09:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][180/625] eta 0:03:05 lr 0.000175 wd 0.0500 time 0.3965 (0.4174) data time 0.0008 (0.0030) model time 0.3957 (0.4210) loss 7.0348 (6.6615) grad_norm 12.2372 (3.6152) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:09:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][190/625] eta 0:03:01 lr 0.000175 wd 0.0500 time 0.3991 (0.4164) data 
time 0.0006 (0.0028) model time 0.3985 (0.4194) loss 7.1691 (6.6564) grad_norm 1.7342 (3.6158) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:09:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][200/625] eta 0:02:56 lr 0.000175 wd 0.0500 time 0.4029 (0.4156) data time 0.0009 (0.0027) model time 0.4019 (0.4180) loss 7.2352 (6.6525) grad_norm 30.6826 (3.7206) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:09:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][210/625] eta 0:02:52 lr 0.000175 wd 0.0500 time 0.3989 (0.4149) data time 0.0007 (0.0027) model time 0.3982 (0.4169) loss 5.9032 (6.6501) grad_norm 107.0509 (4.2521) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:10:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][220/625] eta 0:02:47 lr 0.000175 wd 0.0500 time 0.4052 (0.4142) data time 0.0007 (0.0026) model time 0.4045 (0.4159) loss 6.2332 (6.6331) grad_norm 3.8265 (4.2231) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:10:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][230/625] eta 0:02:43 lr 0.000175 wd 0.0500 time 0.4064 (0.4136) data time 0.0006 (0.0025) model time 0.4057 (0.4150) loss 6.2578 (6.6188) grad_norm 4.6854 (4.1989) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:10:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][240/625] eta 0:02:39 lr 0.000175 wd 0.0500 time 0.3995 (0.4131) data time 0.0006 (0.0024) model time 0.3989 (0.4141) loss 6.2718 (6.6176) grad_norm 2.8886 (4.1888) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:10:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][250/625] eta 0:02:34 lr 0.000175 wd 0.0500 time 0.3988 (0.4125) data time 0.0007 (0.0024) model time 0.3982 (0.4134) loss 7.3605 (6.6183) grad_norm 4.2310 (4.2048) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
09:10:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][260/625] eta 0:02:30 lr 0.000175 wd 0.0500 time 0.4005 (0.4120) data time 0.0009 (0.0023) model time 0.3996 (0.4126) loss 7.0999 (6.6310) grad_norm 1.6289 (4.1904) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:10:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][270/625] eta 0:02:26 lr 0.000175 wd 0.0500 time 0.3984 (0.4115) data time 0.0006 (0.0022) model time 0.3978 (0.4120) loss 7.2989 (6.6308) grad_norm 3.0495 (4.1446) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:10:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][280/625] eta 0:02:21 lr 0.000175 wd 0.0500 time 0.3989 (0.4111) data time 0.0009 (0.0022) model time 0.3980 (0.4114) loss 6.5327 (6.6316) grad_norm 2.0508 (4.1773) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:10:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][290/625] eta 0:02:17 lr 0.000175 wd 0.0500 time 0.5975 (0.4114) data time 0.0009 (0.0022) model time 0.5966 (0.4117) loss 7.4399 (6.6421) grad_norm 2.2183 (4.1166) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:10:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][300/625] eta 0:02:13 lr 0.000174 wd 0.0500 time 0.3933 (0.4114) data time 0.0009 (0.0021) model time 0.3923 (0.4117) loss 6.7457 (6.6449) grad_norm 1.8960 (4.0517) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:10:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][310/625] eta 0:02:09 lr 0.000174 wd 0.0500 time 0.4007 (0.4116) data time 0.0007 (0.0021) model time 0.4001 (0.4118) loss 5.9371 (6.6366) grad_norm 2.0683 (3.9943) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:10:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][320/625] eta 0:02:05 lr 0.000174 wd 0.0500 time 0.3994 (0.4120) data 
time 0.0007 (0.0020) model time 0.3987 (0.4124) loss 7.2444 (6.6393) grad_norm 2.1479 (3.9515) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:10:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][330/625] eta 0:02:02 lr 0.000174 wd 0.0500 time 0.3990 (0.4140) data time 0.0007 (0.0020) model time 0.3983 (0.4146) loss 5.6973 (6.6350) grad_norm 2.5239 (3.9400) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:10:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][340/625] eta 0:01:58 lr 0.000174 wd 0.0500 time 0.6045 (0.4154) data time 0.0009 (0.0020) model time 0.6036 (0.4163) loss 5.9084 (6.6311) grad_norm 3.9443 (3.9619) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:10:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][350/625] eta 0:01:54 lr 0.000174 wd 0.0500 time 0.5318 (0.4158) data time 0.0006 (0.0019) model time 0.5312 (0.4167) loss 7.0535 (6.6452) grad_norm 2.6362 (3.9684) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:11:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][360/625] eta 0:01:50 lr 0.000174 wd 0.0500 time 0.3964 (0.4159) data time 0.0012 (0.0019) model time 0.3953 (0.4167) loss 7.9325 (6.6609) grad_norm 2.6401 (3.9288) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:11:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][370/625] eta 0:01:45 lr 0.000174 wd 0.0500 time 0.4021 (0.4154) data time 0.0009 (0.0019) model time 0.4012 (0.4161) loss 7.4937 (6.6616) grad_norm 3.2863 (3.9167) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:11:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][380/625] eta 0:01:41 lr 0.000174 wd 0.0500 time 0.3981 (0.4153) data time 0.0006 (0.0018) model time 0.3976 (0.4159) loss 6.7070 (6.6657) grad_norm 2.7970 (3.9148) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
09:11:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][390/625] eta 0:01:37 lr 0.000174 wd 0.0500 time 0.3991 (0.4149) data time 0.0007 (0.0018) model time 0.3984 (0.4154) loss 6.2043 (6.6635) grad_norm 4.8610 (3.9198) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:11:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][400/625] eta 0:01:33 lr 0.000174 wd 0.0500 time 0.3976 (0.4145) data time 0.0006 (0.0018) model time 0.3970 (0.4150) loss 6.4550 (6.6540) grad_norm 3.2366 (3.8937) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:11:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][410/625] eta 0:01:29 lr 0.000174 wd 0.0500 time 0.3976 (0.4142) data time 0.0007 (0.0018) model time 0.3970 (0.4145) loss 7.0048 (6.6573) grad_norm 4.0237 (3.8735) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:11:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][420/625] eta 0:01:24 lr 0.000174 wd 0.0500 time 0.4011 (0.4138) data time 0.0008 (0.0017) model time 0.4003 (0.4141) loss 5.7868 (6.6552) grad_norm 3.3002 (3.8446) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:11:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][430/625] eta 0:01:20 lr 0.000174 wd 0.0500 time 0.4163 (0.4135) data time 0.0007 (0.0017) model time 0.4156 (0.4137) loss 5.0390 (6.6480) grad_norm 2.9742 (3.8377) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:11:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][440/625] eta 0:01:16 lr 0.000173 wd 0.0500 time 0.3989 (0.4132) data time 0.0008 (0.0017) model time 0.3982 (0.4134) loss 6.6647 (6.6501) grad_norm 3.8513 (3.8201) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:11:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][450/625] eta 0:01:12 lr 0.000173 wd 0.0500 time 0.3987 (0.4130) data 
time 0.0008 (0.0017) model time 0.3979 (0.4130) loss 6.8958 (6.6485) grad_norm 2.6157 (3.8220) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:11:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][460/625] eta 0:01:08 lr 0.000173 wd 0.0500 time 0.3987 (0.4127) data time 0.0008 (0.0017) model time 0.3979 (0.4127) loss 6.5645 (6.6538) grad_norm 3.1612 (3.8065) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:11:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][470/625] eta 0:01:03 lr 0.000173 wd 0.0500 time 0.3979 (0.4123) data time 0.0009 (0.0016) model time 0.3970 (0.4123) loss 6.8688 (6.6559) grad_norm 2.3784 (3.7795) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:11:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][480/625] eta 0:00:59 lr 0.000173 wd 0.0500 time 0.3991 (0.4121) data time 0.0009 (0.0016) model time 0.3982 (0.4120) loss 6.0015 (6.6643) grad_norm 4.7178 (3.7668) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:11:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][490/625] eta 0:00:55 lr 0.000173 wd 0.0500 time 0.4075 (0.4119) data time 0.0007 (0.0016) model time 0.4068 (0.4117) loss 6.5823 (6.6577) grad_norm 3.2470 (3.7591) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:11:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][500/625] eta 0:00:51 lr 0.000173 wd 0.0500 time 0.3958 (0.4116) data time 0.0009 (0.0016) model time 0.3949 (0.4114) loss 6.6916 (6.6567) grad_norm 2.4881 (3.7688) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:12:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][510/625] eta 0:00:47 lr 0.000173 wd 0.0500 time 0.3970 (0.4117) data time 0.0010 (0.0016) model time 0.3961 (0.4115) loss 6.4010 (6.6567) grad_norm 1.6825 (3.7517) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
09:12:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][520/625] eta 0:00:43 lr 0.000173 wd 0.0500 time 0.3973 (0.4118) data time 0.0008 (0.0016) model time 0.3965 (0.4117) loss 6.9977 (6.6571) grad_norm 3.7826 (3.7479) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:12:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][530/625] eta 0:00:39 lr 0.000173 wd 0.0500 time 0.4002 (0.4120) data time 0.0008 (0.0015) model time 0.3994 (0.4119) loss 6.8903 (6.6566) grad_norm 3.3653 (3.7418) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:12:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][540/625] eta 0:00:35 lr 0.000173 wd 0.0500 time 0.3990 (0.4123) data time 0.0007 (0.0015) model time 0.3983 (0.4121) loss 6.5191 (6.6544) grad_norm 2.1169 (3.7241) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:12:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][550/625] eta 0:00:31 lr 0.000173 wd 0.0500 time 0.5829 (0.4134) data time 0.0007 (0.0015) model time 0.5822 (0.4134) loss 6.7380 (6.6501) grad_norm 2.1041 (3.7047) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:12:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][560/625] eta 0:00:26 lr 0.000173 wd 0.0500 time 0.5864 (0.4146) data time 0.0008 (0.0015) model time 0.5857 (0.4147) loss 6.1387 (6.6450) grad_norm 2.9017 (3.6927) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:12:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][570/625] eta 0:00:22 lr 0.000172 wd 0.0500 time 0.5873 (0.4150) data time 0.0006 (0.0015) model time 0.5866 (0.4150) loss 5.4499 (6.6402) grad_norm 2.3649 (3.6848) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:12:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][580/625] eta 0:00:18 lr 0.000172 wd 0.0500 time 0.4018 (0.4153) data 
time 0.0007 (0.0015) model time 0.4011 (0.4153) loss 6.4901 (6.6375) grad_norm 4.9210 (3.6939) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:12:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][590/625] eta 0:00:14 lr 0.000172 wd 0.0500 time 0.3988 (0.4150) data time 0.0008 (0.0015) model time 0.3979 (0.4150) loss 5.4014 (6.6363) grad_norm 4.3469 (3.6799) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:12:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][600/625] eta 0:00:10 lr 0.000172 wd 0.0500 time 0.4026 (0.4149) data time 0.0009 (0.0015) model time 0.4018 (0.4150) loss 7.4090 (6.6380) grad_norm 4.7001 (3.6872) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:12:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][610/625] eta 0:00:06 lr 0.000172 wd 0.0500 time 0.3964 (0.4147) data time 0.0006 (0.0015) model time 0.3958 (0.4147) loss 6.4891 (6.6384) grad_norm 2.4157 (3.6842) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:12:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][620/625] eta 0:00:02 lr 0.000172 wd 0.0500 time 0.4077 (0.4145) data time 0.0005 (0.0014) model time 0.4072 (0.4144) loss 5.3255 (6.6334) grad_norm 4.3597 (3.6762) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:12:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 232 training takes 0:04:18 [2024-07-25 09:12:49 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 09:12:50 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 09:12:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.462 (0.462) Loss 0.5532 (0.5532) Acc@1 89.746 (89.746) Acc@5 98.926 (98.926) Mem 14939MB
[2024-07-25 09:12:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.8301 (0.6764) Acc@1 82.373 (86.923) Acc@5 96.680 (97.843) Mem 14939MB
[2024-07-25 09:12:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9492 (0.7846) Acc@1 77.881 (83.882) Acc@5 95.215 (96.812) Mem 14939MB
[2024-07-25 09:12:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.515 Acc@5 96.811
[2024-07-25 09:12:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5%
[2024-07-25 09:12:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.840 (0.840) Loss 0.5396 (0.5396) Acc@1 90.332 (90.332) Acc@5 98.926 (98.926) Mem 14939MB
[2024-07-25 09:12:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.157) Loss 0.8267 (0.6638) Acc@1 82.227 (86.932) Acc@5 96.729 (97.918) Mem 14939MB
[2024-07-25 09:12:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 0.9429 (0.7710) Acc@1 78.125 (84.010) Acc@5 95.508 (96.924) Mem 14939MB
[2024-07-25 09:12:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.615 Acc@5 96.875
[2024-07-25 09:12:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.6%
[2024-07-25 09:12:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.62%
[2024-07-25 09:12:56 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 09:12:57 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 09:12:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][0/625] eta 0:08:14 lr 0.000172 wd 0.0500 time 0.7908 (0.7908) data time 0.3998 (0.3998) model time 0.0000 (0.0000) loss 6.3442 (6.3442) grad_norm 2.4439 (2.4439) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:13:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][10/625] eta 0:04:27 lr 0.000172 wd 0.0500 time 0.3949 (0.4349) data time 0.0006 (0.0371) model time 0.0000 (0.0000) loss 5.2370 (6.3058) grad_norm 5.9736 (3.4857) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:13:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][20/625] eta 0:04:13 lr 0.000172 wd 0.0500 time 0.3958 (0.4184) data time 0.0009 (0.0198) model time 0.0000 (0.0000) loss 7.3719 (6.5602) grad_norm 2.2266 (3.3515) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:13:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][30/625] eta 0:04:05 lr 0.000172 wd 0.0500 time 0.3981 (0.4121) data time 0.0007 (0.0137) model time 0.0000 (0.0000) loss 7.4331 (6.5878) grad_norm 3.3992 (4.5153) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:13:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][40/625] eta 0:04:01 lr 0.000172 wd 0.0500 time 0.3966 (0.4133) data time 0.0007 (0.0105) model time 0.0000 (0.0000) loss 6.9372 (6.6549) grad_norm 3.1668 (4.6211) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:13:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][50/625] eta 0:03:55 lr 0.000172 wd 0.0500 time 0.3956 (0.4104) data time 0.0008 (0.0086) model time 0.0000 (0.0000) loss 5.9173 (6.6374) grad_norm 1.9957 (5.1259) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 09:13:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][60/625] eta 0:03:50 lr 0.000172 wd 0.0500 time 0.3971 (0.4085) data time 0.0009 (0.0073) model time 0.3963 (0.3981) loss 7.2314 (6.6713) grad_norm 2.4136 (4.8111) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:13:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][70/625] eta 0:03:46 lr 0.000172 wd 0.0500 time 0.4058 (0.4073) data time 0.0010 (0.0064) model time 0.4048 (0.3987) loss 7.2191 (6.6569) grad_norm 2.4960 (4.5699) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:13:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][80/625] eta 0:03:41 lr 0.000171 wd 0.0500 time 0.4022 (0.4066) data time 0.0009 (0.0057) model time 0.4012 (0.3992) loss 5.5697 (6.6538) grad_norm 2.6402 (4.4484) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:13:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][90/625] eta 0:03:37 lr 0.000171 wd 0.0500 time 0.4115 (0.4059) data time 0.0007 (0.0052) model time 0.4108 (0.3993) loss 6.4965 (6.6365) grad_norm 3.3981 (4.3310) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:13:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][100/625] eta 0:03:32 lr 0.000171 wd 0.0500 time 0.3957 (0.4052) data time 0.0009 (0.0048) model time 0.3948 (0.3991) loss 7.5966 (6.6655) grad_norm 2.0702 (4.1834) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:13:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][110/625] eta 0:03:28 lr 0.000171 wd 0.0500 time 0.3974 (0.4047) data time 0.0009 (0.0044) model time 0.3965 (0.3991) loss 7.5576 (6.6879) grad_norm 4.4193 (4.1034) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:13:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][120/625] eta 0:03:25 lr 0.000171 wd 0.0500 time 
0.4146 (0.4060) data time 0.0010 (0.0041) model time 0.4136 (0.4019) loss 7.1028 (6.6966) grad_norm 5.4085 (4.0581) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:13:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][130/625] eta 0:03:21 lr 0.000171 wd 0.0500 time 0.3976 (0.4071) data time 0.0007 (0.0039) model time 0.3970 (0.4041) loss 6.1354 (6.7016) grad_norm 3.6179 (4.1191) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:13:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][140/625] eta 0:03:18 lr 0.000171 wd 0.0500 time 0.3985 (0.4101) data time 0.0009 (0.0037) model time 0.3976 (0.4090) loss 6.7618 (6.6869) grad_norm 2.7389 (4.1568) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:13:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][150/625] eta 0:03:16 lr 0.000171 wd 0.0500 time 0.5309 (0.4127) data time 0.0009 (0.0035) model time 0.5300 (0.4131) loss 6.6032 (6.6978) grad_norm 72.4506 (4.6078) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:14:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][160/625] eta 0:03:14 lr 0.000171 wd 0.0500 time 0.5594 (0.4188) data time 0.0007 (0.0033) model time 0.5587 (0.4218) loss 6.3533 (6.6970) grad_norm 4.2922 (4.5342) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:14:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][170/625] eta 0:03:10 lr 0.000171 wd 0.0500 time 0.3993 (0.4188) data time 0.0007 (0.0032) model time 0.3986 (0.4215) loss 5.5485 (6.7014) grad_norm 2.0996 (4.4815) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:14:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][180/625] eta 0:03:06 lr 0.000171 wd 0.0500 time 0.3948 (0.4190) data time 0.0008 (0.0030) model time 0.3940 (0.4216) loss 5.7083 (6.6995) grad_norm 3.5007 (4.4365) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 09:14:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][190/625] eta 0:03:02 lr 0.000171 wd 0.0500 time 0.4019 (0.4189) data time 0.0008 (0.0029) model time 0.4010 (0.4211) loss 5.8434 (6.6889) grad_norm 1.6479 (4.3810) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:14:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][200/625] eta 0:02:57 lr 0.000171 wd 0.0500 time 0.3974 (0.4178) data time 0.0010 (0.0028) model time 0.3964 (0.4195) loss 6.1423 (6.6907) grad_norm 2.6273 (4.3235) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:14:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][210/625] eta 0:02:53 lr 0.000171 wd 0.0500 time 0.3977 (0.4169) data time 0.0006 (0.0027) model time 0.3972 (0.4182) loss 7.1565 (6.6876) grad_norm 3.2131 (4.2991) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:14:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][220/625] eta 0:02:48 lr 0.000170 wd 0.0500 time 0.3954 (0.4161) data time 0.0007 (0.0026) model time 0.3947 (0.4170) loss 6.0040 (6.6757) grad_norm 2.2781 (4.2381) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:14:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][230/625] eta 0:02:44 lr 0.000170 wd 0.0500 time 0.3976 (0.4153) data time 0.0008 (0.0026) model time 0.3968 (0.4158) loss 7.4762 (6.6716) grad_norm 2.1920 (4.1785) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:14:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][240/625] eta 0:02:39 lr 0.000170 wd 0.0500 time 0.3956 (0.4145) data time 0.0008 (0.0025) model time 0.3948 (0.4148) loss 6.6429 (6.6745) grad_norm 3.5529 (4.1301) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:14:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][250/625] eta 0:02:35 lr 0.000170 wd 0.0500 time 
0.3941 (0.4139) data time 0.0009 (0.0024) model time 0.3932 (0.4139) loss 7.2007 (6.6753) grad_norm 2.1833 (4.1249) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:14:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][260/625] eta 0:02:31 lr 0.000170 wd 0.0500 time 0.3989 (0.4137) data time 0.0009 (0.0024) model time 0.3980 (0.4137) loss 6.5142 (6.6637) grad_norm 3.1721 (4.0541) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:14:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][270/625] eta 0:02:26 lr 0.000170 wd 0.0500 time 0.3949 (0.4131) data time 0.0006 (0.0023) model time 0.3943 (0.4129) loss 5.5055 (6.6784) grad_norm 3.8806 (4.0129) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:14:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][280/625] eta 0:02:22 lr 0.000170 wd 0.0500 time 0.3955 (0.4126) data time 0.0007 (0.0023) model time 0.3948 (0.4122) loss 6.1868 (6.6761) grad_norm 4.2992 (3.9652) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:14:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][290/625] eta 0:02:18 lr 0.000170 wd 0.0500 time 0.4071 (0.4122) data time 0.0006 (0.0022) model time 0.4065 (0.4117) loss 7.4485 (6.6848) grad_norm 2.3284 (3.9314) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:15:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][300/625] eta 0:02:13 lr 0.000170 wd 0.0500 time 0.3949 (0.4117) data time 0.0008 (0.0022) model time 0.3941 (0.4111) loss 7.7230 (6.6841) grad_norm 2.2137 (3.8891) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:15:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][310/625] eta 0:02:09 lr 0.000170 wd 0.0500 time 0.3976 (0.4112) data time 0.0008 (0.0021) model time 0.3968 (0.4105) loss 7.8881 (6.6778) grad_norm 1.6492 (3.8417) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 09:15:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][320/625] eta 0:02:05 lr 0.000170 wd 0.0500 time 0.3979 (0.4108) data time 0.0007 (0.0021) model time 0.3972 (0.4101) loss 7.0176 (6.6781) grad_norm 3.2586 (3.8697) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:15:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][330/625] eta 0:02:01 lr 0.000170 wd 0.0500 time 0.3968 (0.4105) data time 0.0007 (0.0020) model time 0.3961 (0.4097) loss 7.4950 (6.6615) grad_norm 3.6247 (3.8518) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:15:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][340/625] eta 0:01:57 lr 0.000170 wd 0.0500 time 0.6092 (0.4113) data time 0.0009 (0.0020) model time 0.6084 (0.4106) loss 7.2414 (6.6615) grad_norm 1.8721 (3.8115) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:15:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][350/625] eta 0:01:53 lr 0.000170 wd 0.0500 time 0.5452 (0.4113) data time 0.0008 (0.0020) model time 0.5443 (0.4106) loss 6.5823 (6.6600) grad_norm 1.6900 (3.8615) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:15:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][360/625] eta 0:01:49 lr 0.000169 wd 0.0500 time 0.4048 (0.4122) data time 0.0007 (0.0019) model time 0.4041 (0.4116) loss 7.5228 (6.6537) grad_norm 2.1493 (3.8424) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:15:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][370/625] eta 0:01:45 lr 0.000169 wd 0.0500 time 0.6152 (0.4134) data time 0.0008 (0.0019) model time 0.6144 (0.4130) loss 5.6691 (6.6528) grad_norm 2.1094 (3.8082) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:15:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][380/625] eta 0:01:41 lr 0.000169 wd 0.0500 time 
0.4072 (0.4152) data time 0.0006 (0.0019) model time 0.4065 (0.4151) loss 6.5570 (6.6532) grad_norm 1.8558 (3.7909) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:15:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][390/625] eta 0:01:37 lr 0.000169 wd 0.0500 time 0.4000 (0.4152) data time 0.0006 (0.0018) model time 0.3994 (0.4151) loss 6.4583 (6.6411) grad_norm 1.9763 (3.7555) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:15:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][400/625] eta 0:01:33 lr 0.000169 wd 0.0500 time 0.3974 (0.4157) data time 0.0006 (0.0018) model time 0.3968 (0.4156) loss 6.0691 (6.6424) grad_norm 3.7374 (3.7438) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:15:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][410/625] eta 0:01:29 lr 0.000169 wd 0.0500 time 0.4000 (0.4158) data time 0.0007 (0.0018) model time 0.3992 (0.4157) loss 7.2859 (6.6387) grad_norm 3.0332 (3.7489) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:15:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][420/625] eta 0:01:25 lr 0.000169 wd 0.0500 time 0.3922 (0.4153) data time 0.0008 (0.0018) model time 0.3914 (0.4151) loss 5.9617 (6.6360) grad_norm 3.2651 (3.7667) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:15:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][430/625] eta 0:01:20 lr 0.000169 wd 0.0500 time 0.3952 (0.4149) data time 0.0006 (0.0018) model time 0.3946 (0.4146) loss 5.6363 (6.6323) grad_norm 2.1970 (3.7560) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:15:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][440/625] eta 0:01:16 lr 0.000169 wd 0.0500 time 0.3955 (0.4145) data time 0.0009 (0.0017) model time 0.3946 (0.4142) loss 6.9715 (6.6291) grad_norm 3.6519 (3.7462) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 09:16:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][450/625] eta 0:01:12 lr 0.000169 wd 0.0500 time 0.3985 (0.4141) data time 0.0009 (0.0017) model time 0.3976 (0.4137) loss 7.4090 (6.6300) grad_norm 2.1451 (3.7265) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:16:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][460/625] eta 0:01:08 lr 0.000169 wd 0.0500 time 0.3985 (0.4138) data time 0.0009 (0.0017) model time 0.3976 (0.4134) loss 6.7932 (6.6256) grad_norm 2.4832 (3.7132) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:16:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][470/625] eta 0:01:04 lr 0.000169 wd 0.0500 time 0.4035 (0.4135) data time 0.0007 (0.0017) model time 0.4028 (0.4130) loss 6.3410 (6.6263) grad_norm 1.9867 (3.6870) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:16:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][480/625] eta 0:00:59 lr 0.000169 wd 0.0500 time 0.3974 (0.4134) data time 0.0006 (0.0017) model time 0.3968 (0.4129) loss 6.7863 (6.6242) grad_norm 3.4084 (3.6684) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:16:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][490/625] eta 0:00:55 lr 0.000169 wd 0.0500 time 0.3946 (0.4131) data time 0.0008 (0.0016) model time 0.3938 (0.4126) loss 6.1328 (6.6274) grad_norm 4.8942 (3.6565) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:16:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][500/625] eta 0:00:51 lr 0.000168 wd 0.0500 time 0.3974 (0.4128) data time 0.0008 (0.0016) model time 0.3966 (0.4123) loss 6.6102 (6.6281) grad_norm 2.1092 (3.6525) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:16:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][510/625] eta 0:00:47 lr 0.000168 wd 0.0500 time 
0.3984 (0.4125) data time 0.0008 (0.0016) model time 0.3976 (0.4119) loss 6.2122 (6.6215) grad_norm 4.7701 (3.6474) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:16:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][520/625] eta 0:00:43 lr 0.000168 wd 0.0500 time 0.3969 (0.4122) data time 0.0007 (0.0016) model time 0.3962 (0.4116) loss 7.1792 (6.6266) grad_norm 3.0333 (3.6390) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:16:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][530/625] eta 0:00:39 lr 0.000168 wd 0.0500 time 0.3949 (0.4120) data time 0.0009 (0.0016) model time 0.3940 (0.4114) loss 7.1350 (6.6295) grad_norm 2.8038 (3.6292) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:16:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][540/625] eta 0:00:35 lr 0.000168 wd 0.0500 time 0.3975 (0.4118) data time 0.0006 (0.0016) model time 0.3969 (0.4112) loss 6.8971 (6.6308) grad_norm 3.3912 (3.6167) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:16:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][550/625] eta 0:00:30 lr 0.000168 wd 0.0500 time 0.3991 (0.4117) data time 0.0006 (0.0015) model time 0.3985 (0.4110) loss 7.1850 (6.6299) grad_norm 30.9914 (3.6515) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:16:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][560/625] eta 0:00:26 lr 0.000168 wd 0.0500 time 0.4023 (0.4116) data time 0.0008 (0.0015) model time 0.4015 (0.4110) loss 5.7681 (6.6263) grad_norm 4.3489 (3.6357) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:16:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][570/625] eta 0:00:22 lr 0.000168 wd 0.0500 time 0.5975 (0.4124) data time 0.0008 (0.0015) model time 0.5966 (0.4118) loss 6.4871 (6.6320) grad_norm 3.1130 (3.6301) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 09:16:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][580/625] eta 0:00:18 lr 0.000168 wd 0.0500 time 0.5984 (0.4131) data time 0.0007 (0.0015) model time 0.5977 (0.4125) loss 7.2484 (6.6332) grad_norm 2.2927 (3.6177) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:17:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][590/625] eta 0:00:14 lr 0.000168 wd 0.0500 time 0.3945 (0.4133) data time 0.0009 (0.0015) model time 0.3937 (0.4128) loss 6.5245 (6.6322) grad_norm 3.2403 (3.6243) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:17:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][600/625] eta 0:00:10 lr 0.000168 wd 0.0500 time 0.4024 (0.4150) data time 0.0009 (0.0015) model time 0.4015 (0.4146) loss 6.0827 (6.6401) grad_norm 2.5574 (3.6243) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:17:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][610/625] eta 0:00:06 lr 0.000168 wd 0.0500 time 0.5485 (0.4153) data time 0.0007 (0.0015) model time 0.5479 (0.4149) loss 5.8627 (6.6327) grad_norm 5.3038 (3.6191) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:17:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][620/625] eta 0:00:02 lr 0.000168 wd 0.0500 time 0.3986 (0.4153) data time 0.0004 (0.0015) model time 0.3982 (0.4149) loss 6.4763 (6.6290) grad_norm 3.4229 (3.6076) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:17:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 233 training takes 0:04:19 [2024-07-25 09:17:16 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 09:17:17 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 09:17:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.454 (0.454) Loss 0.5381 (0.5381) Acc@1 89.746 (89.746) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 09:17:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.8262 (0.6584) Acc@1 82.471 (87.012) Acc@5 96.973 (97.878) Mem 14939MB [2024-07-25 09:17:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9263 (0.7646) Acc@1 78.320 (84.068) Acc@5 95.166 (96.908) Mem 14939MB [2024-07-25 09:17:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.677 Acc@5 96.873 [2024-07-25 09:17:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-07-25 09:17:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.68% [2024-07-25 09:17:20 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 09:17:20 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 09:17:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.460 (0.460) Loss 0.5391 (0.5391) Acc@1 90.332 (90.332) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 09:17:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.8262 (0.6632) Acc@1 82.373 (86.958) Acc@5 96.826 (97.927) Mem 14939MB [2024-07-25 09:17:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9419 (0.7704) Acc@1 78.076 (84.033) Acc@5 95.459 (96.931) Mem 14939MB [2024-07-25 09:17:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.633 Acc@5 96.881 [2024-07-25 09:17:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-07-25 09:17:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.63% [2024-07-25 09:17:23 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 09:17:24 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 09:17:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][0/625] eta 0:07:26 lr 0.000168 wd 0.0500 time 0.7137 (0.7137) data time 0.3322 (0.3322) model time 0.0000 (0.0000) loss 6.8081 (6.8081) grad_norm 2.2784 (2.2784) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:17:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][10/625] eta 0:04:28 lr 0.000167 wd 0.0500 time 0.4022 (0.4363) data time 0.0007 (0.0309) model time 0.0000 (0.0000) loss 6.4590 (6.3686) grad_norm 2.2909 (2.5558) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:17:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][20/625] eta 0:04:15 lr 0.000167 wd 0.0500 time 0.3928 (0.4218) data time 0.0009 (0.0166) model time 0.0000 (0.0000) loss 6.4688 (6.4908) grad_norm 2.0001 (2.5755) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:17:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][30/625] eta 0:04:06 lr 0.000167 wd 0.0500 time 0.4024 (0.4147) data time 0.0008 (0.0115) model time 0.0000 (0.0000) loss 5.7620 (6.5961) grad_norm 3.5709 (2.5920) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:17:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][40/625] eta 0:04:00 lr 0.000167 wd 0.0500 time 0.3984 (0.4107) data time 0.0007 (0.0089) model time 0.0000 (0.0000) loss 7.6168 (6.6117) grad_norm 3.1532 (2.7171) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:17:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][50/625] eta 0:03:54 lr 0.000167 wd 0.0500 time 0.3975 (0.4085) data time 0.0007 (0.0073) model time 0.0000 (0.0000) loss 5.5332 (6.5923) grad_norm 2.9188 (3.0425) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:17:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][60/625] eta 0:03:49 lr 0.000167 wd 0.0500 time 0.3987 (0.4070) 
data time 0.0008 (0.0063) model time 0.3979 (0.3982) loss 5.9363 (6.6050) grad_norm 2.1569 (2.9722) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:17:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][70/625] eta 0:03:45 lr 0.000167 wd 0.0500 time 0.3962 (0.4057) data time 0.0008 (0.0055) model time 0.3954 (0.3977) loss 6.9535 (6.5959) grad_norm 3.0846 (2.9798) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:17:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][80/625] eta 0:03:40 lr 0.000167 wd 0.0500 time 0.3969 (0.4053) data time 0.0009 (0.0049) model time 0.3960 (0.3990) loss 8.0334 (6.5961) grad_norm 3.5625 (3.0415) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:18:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][90/625] eta 0:03:36 lr 0.000167 wd 0.0500 time 0.3951 (0.4047) data time 0.0008 (0.0045) model time 0.3943 (0.3989) loss 7.0479 (6.5899) grad_norm 2.8006 (3.1590) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:18:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][100/625] eta 0:03:32 lr 0.000167 wd 0.0500 time 0.3964 (0.4041) data time 0.0008 (0.0041) model time 0.3956 (0.3987) loss 6.3865 (6.5796) grad_norm 2.1778 (3.3087) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:18:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][110/625] eta 0:03:28 lr 0.000167 wd 0.0500 time 0.3977 (0.4041) data time 0.0009 (0.0038) model time 0.3968 (0.3994) loss 7.3718 (6.6048) grad_norm 2.8541 (3.3005) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:18:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][120/625] eta 0:03:23 lr 0.000167 wd 0.0500 time 0.3987 (0.4037) data time 0.0006 (0.0035) model time 0.3981 (0.3994) loss 7.0045 (6.6102) grad_norm 2.3318 (3.2484) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
09:18:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][130/625] eta 0:03:19 lr 0.000167 wd 0.0500 time 0.4020 (0.4033) data time 0.0008 (0.0033) model time 0.4012 (0.3992) loss 6.0695 (6.6401) grad_norm 2.6921 (3.2070) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:18:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][140/625] eta 0:03:15 lr 0.000167 wd 0.0500 time 0.3960 (0.4030) data time 0.0007 (0.0032) model time 0.3952 (0.3990) loss 6.3574 (6.6275) grad_norm 3.4570 (3.1999) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:18:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][150/625] eta 0:03:11 lr 0.000166 wd 0.0500 time 0.5062 (0.4034) data time 0.0009 (0.0030) model time 0.5053 (0.4000) loss 7.3659 (6.6136) grad_norm 1.9594 (3.1418) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:18:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][160/625] eta 0:03:08 lr 0.000166 wd 0.0500 time 0.3980 (0.4044) data time 0.0006 (0.0029) model time 0.3974 (0.4017) loss 6.3166 (6.6247) grad_norm 2.0509 (3.1294) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:18:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][170/625] eta 0:03:04 lr 0.000166 wd 0.0500 time 0.5759 (0.4063) data time 0.0006 (0.0027) model time 0.5752 (0.4046) loss 6.5630 (6.6283) grad_norm 2.5758 (3.1410) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:18:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][180/625] eta 0:03:02 lr 0.000166 wd 0.0500 time 0.3993 (0.4090) data time 0.0009 (0.0026) model time 0.3985 (0.4084) loss 5.9100 (6.6296) grad_norm 4.0238 (3.1618) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:18:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][190/625] eta 0:03:00 lr 0.000166 wd 0.0500 time 0.5746 (0.4141) data 
time 0.0006 (0.0025) model time 0.5740 (0.4153) loss 7.3631 (6.6352) grad_norm 2.2034 (3.3625) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:18:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][200/625] eta 0:02:57 lr 0.000166 wd 0.0500 time 0.4109 (0.4172) data time 0.0009 (0.0025) model time 0.4099 (0.4194) loss 6.3011 (6.6388) grad_norm 2.3476 (3.3795) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:18:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][210/625] eta 0:02:53 lr 0.000166 wd 0.0500 time 0.3980 (0.4176) data time 0.0009 (0.0024) model time 0.3970 (0.4197) loss 6.8307 (6.6502) grad_norm 6.8104 (3.3927) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:18:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][220/625] eta 0:02:49 lr 0.000166 wd 0.0500 time 0.3976 (0.4176) data time 0.0008 (0.0023) model time 0.3968 (0.4195) loss 6.4387 (6.6509) grad_norm 2.3953 (3.3881) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:19:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][230/625] eta 0:02:44 lr 0.000166 wd 0.0500 time 0.4005 (0.4173) data time 0.0007 (0.0022) model time 0.3998 (0.4190) loss 6.1410 (6.6487) grad_norm 3.6736 (3.5236) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:19:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][240/625] eta 0:02:40 lr 0.000166 wd 0.0500 time 0.4008 (0.4166) data time 0.0007 (0.0022) model time 0.4001 (0.4180) loss 5.9769 (6.6389) grad_norm 1.9241 (3.4802) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:19:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][250/625] eta 0:02:35 lr 0.000166 wd 0.0500 time 0.3967 (0.4159) data time 0.0009 (0.0021) model time 0.3958 (0.4170) loss 7.8268 (6.6230) grad_norm 2.7582 (3.4654) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
09:19:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][260/625] eta 0:02:31 lr 0.000166 wd 0.0500 time 0.3982 (0.4152) data time 0.0006 (0.0021) model time 0.3976 (0.4160) loss 6.0928 (6.6081) grad_norm 1.7921 (3.4392) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:19:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][270/625] eta 0:02:27 lr 0.000166 wd 0.0500 time 0.3983 (0.4146) data time 0.0007 (0.0020) model time 0.3977 (0.4152) loss 6.3006 (6.6031) grad_norm 3.7006 (3.4465) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:19:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][280/625] eta 0:02:22 lr 0.000166 wd 0.0500 time 0.3980 (0.4141) data time 0.0009 (0.0020) model time 0.3971 (0.4145) loss 5.5372 (6.6138) grad_norm 4.2936 (3.4397) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:19:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][290/625] eta 0:02:18 lr 0.000165 wd 0.0500 time 0.3986 (0.4135) data time 0.0006 (0.0020) model time 0.3980 (0.4138) loss 5.9898 (6.6212) grad_norm 3.1518 (3.4189) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:19:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][300/625] eta 0:02:14 lr 0.000165 wd 0.0500 time 0.4172 (0.4131) data time 0.0006 (0.0019) model time 0.4166 (0.4132) loss 6.7788 (6.6291) grad_norm 1.9221 (3.3943) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:19:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][310/625] eta 0:02:09 lr 0.000165 wd 0.0500 time 0.3969 (0.4126) data time 0.0006 (0.0019) model time 0.3963 (0.4125) loss 5.6444 (6.6279) grad_norm 2.7930 (3.3878) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:19:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][320/625] eta 0:02:05 lr 0.000165 wd 0.0500 time 0.4000 (0.4121) data 
time 0.0007 (0.0018) model time 0.3993 (0.4120) loss 5.7259 (6.6305) grad_norm 1.9867 (3.3691) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:19:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][330/625] eta 0:02:01 lr 0.000165 wd 0.0500 time 0.3955 (0.4118) data time 0.0006 (0.0018) model time 0.3948 (0.4115) loss 5.2617 (6.6197) grad_norm 2.1490 (3.3895) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:19:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][340/625] eta 0:01:57 lr 0.000165 wd 0.0500 time 0.3965 (0.4114) data time 0.0007 (0.0018) model time 0.3958 (0.4110) loss 5.5837 (6.6203) grad_norm 1.8834 (3.3798) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:19:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][350/625] eta 0:01:53 lr 0.000165 wd 0.0500 time 0.3967 (0.4110) data time 0.0007 (0.0018) model time 0.3960 (0.4106) loss 6.2605 (6.6196) grad_norm 2.8496 (3.3665) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:19:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][360/625] eta 0:01:48 lr 0.000165 wd 0.0500 time 0.3962 (0.4106) data time 0.0009 (0.0017) model time 0.3953 (0.4101) loss 7.0269 (6.6199) grad_norm 2.8132 (3.3527) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:19:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][370/625] eta 0:01:44 lr 0.000165 wd 0.0500 time 0.3986 (0.4103) data time 0.0007 (0.0017) model time 0.3979 (0.4097) loss 5.2335 (6.6165) grad_norm 4.7917 (3.3938) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:20:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][380/625] eta 0:01:40 lr 0.000165 wd 0.0500 time 0.4004 (0.4112) data time 0.0009 (0.0017) model time 0.3995 (0.4108) loss 7.5489 (6.6000) grad_norm 2.0768 (3.4027) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
09:20:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][390/625] eta 0:01:36 lr 0.000165 wd 0.0500 time 0.5986 (0.4119) data time 0.0009 (0.0017) model time 0.5977 (0.4115) loss 5.6821 (6.5988) grad_norm 1.8830 (3.3902) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:20:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][400/625] eta 0:01:32 lr 0.000165 wd 0.0500 time 0.4274 (0.4133) data time 0.0007 (0.0016) model time 0.4267 (0.4131) loss 7.2395 (6.6086) grad_norm 3.0463 (3.4131) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:20:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][410/625] eta 0:01:29 lr 0.000165 wd 0.0500 time 0.3964 (0.4155) data time 0.0008 (0.0016) model time 0.3956 (0.4157) loss 7.7650 (6.6076) grad_norm 6.1085 (3.4189) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:20:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][420/625] eta 0:01:25 lr 0.000165 wd 0.0500 time 0.4011 (0.4169) data time 0.0006 (0.0016) model time 0.4004 (0.4173) loss 7.4372 (6.6185) grad_norm 2.8642 (3.4120) loss_scale 256.0000 (131.0404) mem 14939MB [2024-07-25 09:20:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][430/625] eta 0:01:21 lr 0.000164 wd 0.0500 time 0.5242 (0.4175) data time 0.0008 (0.0016) model time 0.5235 (0.4179) loss 6.8738 (6.6192) grad_norm 2.7908 (3.4171) loss_scale 256.0000 (133.9397) mem 14939MB [2024-07-25 09:20:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][440/625] eta 0:01:17 lr 0.000164 wd 0.0500 time 0.3978 (0.4171) data time 0.0008 (0.0016) model time 0.3970 (0.4174) loss 6.3196 (6.6124) grad_norm 4.0378 (3.4168) loss_scale 256.0000 (136.7075) mem 14939MB [2024-07-25 09:20:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][450/625] eta 0:01:13 lr 0.000164 wd 0.0500 time 0.3972 (0.4172) data 
time 0.0008 (0.0016) model time 0.3964 (0.4175) loss 7.0976 (6.6086) grad_norm 2.7947 (3.4200) loss_scale 256.0000 (139.3525) mem 14939MB [2024-07-25 09:20:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][460/625] eta 0:01:08 lr 0.000164 wd 0.0500 time 0.4003 (0.4168) data time 0.0008 (0.0015) model time 0.3995 (0.4171) loss 5.9198 (6.6090) grad_norm 35.4004 (3.4851) loss_scale 256.0000 (141.8829) mem 14939MB [2024-07-25 09:20:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][470/625] eta 0:01:04 lr 0.000164 wd 0.0500 time 0.4014 (0.4165) data time 0.0009 (0.0015) model time 0.4005 (0.4166) loss 7.1202 (6.6077) grad_norm 2.0302 (3.4659) loss_scale 256.0000 (144.3057) mem 14939MB [2024-07-25 09:20:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][480/625] eta 0:01:00 lr 0.000164 wd 0.0500 time 0.4065 (0.4161) data time 0.0006 (0.0015) model time 0.4059 (0.4162) loss 5.1568 (6.6062) grad_norm 2.6499 (3.4570) loss_scale 256.0000 (146.6279) mem 14939MB [2024-07-25 09:20:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][490/625] eta 0:00:56 lr 0.000164 wd 0.0500 time 0.3989 (0.4158) data time 0.0009 (0.0015) model time 0.3979 (0.4158) loss 7.7208 (6.6093) grad_norm 2.2392 (3.4446) loss_scale 256.0000 (148.8554) mem 14939MB [2024-07-25 09:20:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][500/625] eta 0:00:51 lr 0.000164 wd 0.0500 time 0.3962 (0.4155) data time 0.0007 (0.0015) model time 0.3955 (0.4154) loss 6.1681 (6.6023) grad_norm 4.0080 (3.4446) loss_scale 256.0000 (150.9940) mem 14939MB [2024-07-25 09:20:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][510/625] eta 0:00:47 lr 0.000164 wd 0.0500 time 0.3955 (0.4152) data time 0.0007 (0.0015) model time 0.3947 (0.4151) loss 6.8524 (6.6074) grad_norm 3.2873 (3.4499) loss_scale 256.0000 (153.0489) mem 14939MB [2024-07-25 
09:21:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][520/625] eta 0:00:43 lr 0.000164 wd 0.0500 time 0.3959 (0.4148) data time 0.0008 (0.0015) model time 0.3950 (0.4147) loss 6.4408 (6.6051) grad_norm 2.4056 (3.4576) loss_scale 256.0000 (155.0250) mem 14939MB [2024-07-25 09:21:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][530/625] eta 0:00:39 lr 0.000164 wd 0.0500 time 0.4012 (0.4145) data time 0.0007 (0.0014) model time 0.4006 (0.4143) loss 5.6500 (6.6010) grad_norm 3.7692 (3.4603) loss_scale 256.0000 (156.9266) mem 14939MB [2024-07-25 09:21:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][540/625] eta 0:00:35 lr 0.000164 wd 0.0500 time 0.4072 (0.4142) data time 0.0009 (0.0014) model time 0.4064 (0.4140) loss 6.3361 (6.6006) grad_norm 3.1203 (3.4588) loss_scale 256.0000 (158.7579) mem 14939MB [2024-07-25 09:21:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][550/625] eta 0:00:31 lr 0.000164 wd 0.0500 time 0.3947 (0.4139) data time 0.0009 (0.0014) model time 0.3938 (0.4136) loss 7.0283 (6.5938) grad_norm 3.0528 (3.4440) loss_scale 256.0000 (160.5227) mem 14939MB [2024-07-25 09:21:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][560/625] eta 0:00:26 lr 0.000164 wd 0.0500 time 0.3963 (0.4136) data time 0.0008 (0.0014) model time 0.3955 (0.4133) loss 6.3533 (6.5886) grad_norm 2.7420 (3.4415) loss_scale 256.0000 (162.2246) mem 14939MB [2024-07-25 09:21:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][570/625] eta 0:00:22 lr 0.000163 wd 0.0500 time 0.4005 (0.4134) data time 0.0007 (0.0014) model time 0.3999 (0.4130) loss 5.9106 (6.5918) grad_norm 2.2969 (3.4450) loss_scale 256.0000 (163.8669) mem 14939MB [2024-07-25 09:21:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][580/625] eta 0:00:18 lr 0.000163 wd 0.0500 time 0.3921 (0.4131) data 
time 0.0007 (0.0014) model time 0.3914 (0.4127) loss 5.6214 (6.5941) grad_norm 4.7490 (3.4543) loss_scale 256.0000 (165.4527) mem 14939MB [2024-07-25 09:21:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][590/625] eta 0:00:14 lr 0.000163 wd 0.0500 time 0.4079 (0.4129) data time 0.0006 (0.0014) model time 0.4073 (0.4125) loss 7.5458 (6.5973) grad_norm 3.0118 (3.4699) loss_scale 256.0000 (166.9848) mem 14939MB [2024-07-25 09:21:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][600/625] eta 0:00:10 lr 0.000163 wd 0.0500 time 0.3981 (0.4134) data time 0.0008 (0.0014) model time 0.3973 (0.4130) loss 7.5334 (6.6025) grad_norm 2.6184 (3.4793) loss_scale 256.0000 (168.4659) mem 14939MB [2024-07-25 09:21:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][610/625] eta 0:00:06 lr 0.000163 wd 0.0500 time 0.3977 (0.4137) data time 0.0004 (0.0014) model time 0.3973 (0.4133) loss 6.4552 (6.6040) grad_norm 2.6092 (inf) loss_scale 128.0000 (169.2700) mem 14939MB [2024-07-25 09:21:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][620/625] eta 0:00:02 lr 0.000163 wd 0.0500 time 0.3977 (0.4143) data time 0.0004 (0.0014) model time 0.3972 (0.4140) loss 7.8440 (6.6065) grad_norm 2.8776 (inf) loss_scale 128.0000 (168.6055) mem 14939MB [2024-07-25 09:21:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 234 training takes 0:04:19 [2024-07-25 09:21:43 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 09:21:44 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 09:21:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.446 (0.446) Loss 0.5586 (0.5586) Acc@1 89.551 (89.551) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 09:21:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.118) Loss 0.8350 (0.6723) Acc@1 82.812 (86.910) Acc@5 96.533 (97.865) Mem 14939MB [2024-07-25 09:21:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9277 (0.7787) Acc@1 78.027 (84.019) Acc@5 95.801 (96.882) Mem 14939MB [2024-07-25 09:21:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.643 Acc@5 96.863 [2024-07-25 09:21:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-07-25 09:21:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.787 (0.787) Loss 0.5396 (0.5396) Acc@1 90.234 (90.234) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 09:21:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.156) Loss 0.8252 (0.6631) Acc@1 82.324 (86.950) Acc@5 96.777 (97.918) Mem 14939MB [2024-07-25 09:21:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 0.9419 (0.7701) Acc@1 78.271 (84.038) Acc@5 95.459 (96.917) Mem 14939MB [2024-07-25 09:21:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.639 Acc@5 96.867 [2024-07-25 09:21:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-07-25 09:21:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.64% [2024-07-25 09:21:50 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 09:21:51 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 09:21:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][0/625] eta 0:08:12 lr 0.000163 wd 0.0500 time 0.7873 (0.7873) data time 0.4095 (0.4095) model time 0.0000 (0.0000) loss 6.9745 (6.9745) grad_norm 6.9339 (6.9339) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:21:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][10/625] eta 0:05:02 lr 0.000163 wd 0.0500 time 0.5892 (0.4912) data time 0.0006 (0.0379) model time 0.0000 (0.0000) loss 6.6249 (6.6474) grad_norm 5.2830 (4.2460) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:22:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][20/625] eta 0:04:43 lr 0.000163 wd 0.0500 time 0.5774 (0.4694) data time 0.0009 (0.0202) model time 0.0000 (0.0000) loss 6.4663 (6.6257) grad_norm 2.8260 (3.9119) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:22:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][30/625] eta 0:04:31 lr 0.000163 wd 0.0500 time 0.3946 (0.4567) data time 0.0009 (0.0140) model time 0.0000 (0.0000) loss 7.6712 (6.7569) grad_norm 3.2554 (3.8861) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:22:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][40/625] eta 0:04:20 lr 0.000163 wd 0.0500 time 0.4034 (0.4453) data time 0.0008 (0.0108) model time 0.0000 (0.0000) loss 6.0290 (6.6922) grad_norm 6.6083 (4.1337) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:22:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][50/625] eta 0:04:10 lr 0.000163 wd 0.0500 time 0.3999 (0.4361) data time 0.0007 (0.0088) model time 0.0000 (0.0000) loss 7.7813 (6.6649) grad_norm 2.8793 (4.0357) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 09:22:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][60/625] eta 0:04:02 lr 0.000163 wd 0.0500 time 0.3944 (0.4298) data time 0.0007 (0.0075) model time 0.3937 (0.3966) loss 5.2087 (6.6781) grad_norm 4.1660 (4.2899) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:22:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][70/625] eta 0:03:56 lr 0.000163 wd 0.0500 time 0.3971 (0.4254) data time 0.0009 (0.0066) model time 0.3963 (0.3972) loss 7.7528 (6.6721) grad_norm 2.2362 (4.1163) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:22:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][80/625] eta 0:03:50 lr 0.000163 wd 0.0500 time 0.4022 (0.4221) data time 0.0008 (0.0059) model time 0.4013 (0.3975) loss 6.9954 (6.6976) grad_norm 2.4526 (3.9388) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:22:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][90/625] eta 0:03:44 lr 0.000162 wd 0.0500 time 0.3985 (0.4195) data time 0.0009 (0.0053) model time 0.3976 (0.3976) loss 6.8491 (6.7046) grad_norm 2.6235 (3.8023) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:22:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][100/625] eta 0:03:39 lr 0.000162 wd 0.0500 time 0.4002 (0.4176) data time 0.0009 (0.0049) model time 0.3993 (0.3979) loss 6.4286 (6.6871) grad_norm 3.2471 (3.6776) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:22:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][110/625] eta 0:03:34 lr 0.000162 wd 0.0500 time 0.3958 (0.4158) data time 0.0008 (0.0045) model time 0.3950 (0.3977) loss 7.2899 (6.6890) grad_norm 2.1138 (3.6298) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:22:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][120/625] eta 0:03:29 lr 0.000162 wd 0.0500 time 
0.4001 (0.4144) data time 0.0007 (0.0042) model time 0.3994 (0.3979) loss 7.0340 (6.6968) grad_norm 16.7066 (3.6748) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:22:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][130/625] eta 0:03:24 lr 0.000162 wd 0.0500 time 0.3998 (0.4135) data time 0.0009 (0.0039) model time 0.3989 (0.3983) loss 6.8127 (6.6693) grad_norm 4.9053 (3.6654) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:22:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][140/625] eta 0:03:20 lr 0.000162 wd 0.0500 time 0.4002 (0.4127) data time 0.0009 (0.0037) model time 0.3994 (0.3986) loss 6.4986 (6.6680) grad_norm 2.4686 (3.6531) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:22:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][150/625] eta 0:03:15 lr 0.000162 wd 0.0500 time 0.4164 (0.4119) data time 0.0008 (0.0035) model time 0.4156 (0.3987) loss 5.7900 (6.6344) grad_norm 3.7035 (3.6644) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:22:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][160/625] eta 0:03:11 lr 0.000162 wd 0.0500 time 0.3949 (0.4110) data time 0.0008 (0.0033) model time 0.3941 (0.3985) loss 6.5343 (6.6212) grad_norm 4.3941 (3.7176) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:23:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][170/625] eta 0:03:07 lr 0.000162 wd 0.0500 time 0.3824 (0.4115) data time 0.0010 (0.0032) model time 0.3815 (0.4003) loss 7.3247 (6.6111) grad_norm 1.9973 (3.6381) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:23:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][180/625] eta 0:03:02 lr 0.000162 wd 0.0500 time 0.3981 (0.4108) data time 0.0011 (0.0031) model time 0.3970 (0.4001) loss 8.3552 (6.6293) grad_norm 2.2149 (3.5787) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 09:23:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][190/625] eta 0:02:58 lr 0.000162 wd 0.0500 time 0.3983 (0.4111) data time 0.0009 (0.0030) model time 0.3974 (0.4011) loss 7.4149 (6.6332) grad_norm 2.9410 (3.5320) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:23:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][200/625] eta 0:02:55 lr 0.000162 wd 0.0500 time 0.3959 (0.4122) data time 0.0009 (0.0029) model time 0.3950 (0.4032) loss 7.2749 (6.6406) grad_norm 2.5473 (3.5143) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:23:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][210/625] eta 0:02:52 lr 0.000162 wd 0.0500 time 0.5843 (0.4158) data time 0.0009 (0.0028) model time 0.5834 (0.4085) loss 5.8112 (6.6136) grad_norm 4.1881 (3.4995) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:23:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][220/625] eta 0:02:48 lr 0.000162 wd 0.0500 time 0.3931 (0.4167) data time 0.0009 (0.0027) model time 0.3922 (0.4101) loss 7.2735 (6.6141) grad_norm 3.3365 (3.5387) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:23:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][230/625] eta 0:02:45 lr 0.000161 wd 0.0500 time 0.5867 (0.4187) data time 0.0008 (0.0026) model time 0.5860 (0.4129) loss 5.7935 (6.6117) grad_norm 7.0723 (3.5514) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:23:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][240/625] eta 0:02:41 lr 0.000161 wd 0.0500 time 0.3972 (0.4207) data time 0.0008 (0.0025) model time 0.3964 (0.4157) loss 6.2106 (6.5971) grad_norm 3.7640 (3.5475) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:23:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][250/625] eta 0:02:38 lr 0.000161 wd 0.0500 time 
0.4006 (0.4214) data time 0.0009 (0.0024) model time 0.3997 (0.4168) loss 6.9731 (6.6105) grad_norm 4.3500 (3.5906) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:23:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][260/625] eta 0:02:33 lr 0.000161 wd 0.0500 time 0.3974 (0.4212) data time 0.0006 (0.0024) model time 0.3967 (0.4167) loss 6.6702 (6.6126) grad_norm 2.6202 (3.5740) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:23:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][270/625] eta 0:02:29 lr 0.000161 wd 0.0500 time 0.3966 (0.4203) data time 0.0007 (0.0023) model time 0.3958 (0.4158) loss 7.2019 (6.6210) grad_norm 4.1515 (3.6213) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:23:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][280/625] eta 0:02:24 lr 0.000161 wd 0.0500 time 0.4054 (0.4195) data time 0.0008 (0.0023) model time 0.4046 (0.4150) loss 6.3887 (6.6318) grad_norm 2.6611 (3.6070) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:23:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][290/625] eta 0:02:20 lr 0.000161 wd 0.0500 time 0.3998 (0.4187) data time 0.0009 (0.0022) model time 0.3989 (0.4142) loss 7.4778 (6.6428) grad_norm 3.3383 (3.5934) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:23:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][300/625] eta 0:02:15 lr 0.000161 wd 0.0500 time 0.3988 (0.4181) data time 0.0008 (0.0022) model time 0.3980 (0.4136) loss 6.5342 (6.6481) grad_norm 2.9858 (3.5820) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:24:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][310/625] eta 0:02:11 lr 0.000161 wd 0.0500 time 0.3965 (0.4175) data time 0.0009 (0.0021) model time 0.3957 (0.4130) loss 6.8279 (6.6430) grad_norm 2.4023 (3.5513) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 09:24:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][320/625] eta 0:02:07 lr 0.000161 wd 0.0500 time 0.3989 (0.4169) data time 0.0008 (0.0021) model time 0.3981 (0.4124) loss 5.7342 (6.6453) grad_norm 1.9912 (3.5300) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:24:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][330/625] eta 0:02:02 lr 0.000161 wd 0.0500 time 0.3980 (0.4163) data time 0.0008 (0.0021) model time 0.3972 (0.4119) loss 6.9391 (6.6469) grad_norm 3.2417 (3.5115) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:24:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][340/625] eta 0:01:58 lr 0.000161 wd 0.0500 time 0.4004 (0.4158) data time 0.0008 (0.0020) model time 0.3996 (0.4114) loss 6.4788 (6.6452) grad_norm 3.2683 (3.5029) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:24:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][350/625] eta 0:01:54 lr 0.000161 wd 0.0500 time 0.4063 (0.4154) data time 0.0006 (0.0020) model time 0.4057 (0.4110) loss 6.8543 (6.6516) grad_norm 3.2928 (3.4967) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:24:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][360/625] eta 0:01:49 lr 0.000161 wd 0.0500 time 0.4026 (0.4149) data time 0.0009 (0.0019) model time 0.4017 (0.4106) loss 6.5145 (6.6604) grad_norm 2.1867 (3.4728) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:24:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][370/625] eta 0:01:45 lr 0.000160 wd 0.0500 time 0.4017 (0.4145) data time 0.0008 (0.0019) model time 0.4009 (0.4103) loss 6.6497 (6.6582) grad_norm 2.0639 (3.4982) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:24:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][380/625] eta 0:01:41 lr 0.000160 wd 0.0500 time 
0.3986 (0.4142) data time 0.0009 (0.0019) model time 0.3977 (0.4099) loss 6.8397 (6.6535) grad_norm 2.8641 (3.5073) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:24:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][390/625] eta 0:01:37 lr 0.000160 wd 0.0500 time 0.3972 (0.4141) data time 0.0006 (0.0019) model time 0.3966 (0.4100) loss 8.0782 (6.6554) grad_norm 2.2562 (3.4995) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:24:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][400/625] eta 0:01:33 lr 0.000160 wd 0.0500 time 0.3986 (0.4138) data time 0.0009 (0.0018) model time 0.3977 (0.4097) loss 6.3157 (6.6468) grad_norm 2.3180 (3.4898) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:24:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][410/625] eta 0:01:29 lr 0.000160 wd 0.0500 time 0.6152 (0.4144) data time 0.0007 (0.0018) model time 0.6145 (0.4105) loss 7.3526 (6.6444) grad_norm 1.6890 (3.4874) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:24:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][420/625] eta 0:01:24 lr 0.000160 wd 0.0500 time 0.4094 (0.4145) data time 0.0007 (0.0018) model time 0.4087 (0.4107) loss 6.2664 (6.6355) grad_norm 2.2770 (3.4687) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:24:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][430/625] eta 0:01:21 lr 0.000160 wd 0.0500 time 0.3971 (0.4154) data time 0.0007 (0.0018) model time 0.3964 (0.4118) loss 6.4126 (6.6435) grad_norm 2.3194 (3.4446) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:24:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][440/625] eta 0:01:17 lr 0.000160 wd 0.0500 time 0.5891 (0.4162) data time 0.0008 (0.0017) model time 0.5883 (0.4128) loss 6.8820 (6.6514) grad_norm 1.9146 (3.4276) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 09:24:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][450/625] eta 0:01:13 lr 0.000160 wd 0.0500 time 0.5816 (0.4177) data time 0.0007 (0.0017) model time 0.5809 (0.4145) loss 7.6653 (6.6545) grad_norm 4.0725 (3.4220) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:25:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][460/625] eta 0:01:09 lr 0.000160 wd 0.0500 time 0.5679 (0.4189) data time 0.0008 (0.0017) model time 0.5671 (0.4159) loss 7.6195 (6.6666) grad_norm 3.5134 (3.4272) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:25:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][470/625] eta 0:01:04 lr 0.000160 wd 0.0500 time 0.3991 (0.4187) data time 0.0007 (0.0017) model time 0.3984 (0.4157) loss 7.8676 (6.6688) grad_norm 3.7733 (3.4616) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:25:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][480/625] eta 0:01:00 lr 0.000160 wd 0.0500 time 0.3989 (0.4186) data time 0.0006 (0.0017) model time 0.3983 (0.4157) loss 7.1532 (6.6673) grad_norm 2.9832 (3.4484) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:25:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][490/625] eta 0:00:56 lr 0.000160 wd 0.0500 time 0.4007 (0.4183) data time 0.0009 (0.0017) model time 0.3998 (0.4154) loss 5.7319 (6.6654) grad_norm 2.0129 (3.4394) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:25:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][500/625] eta 0:00:52 lr 0.000160 wd 0.0500 time 0.3947 (0.4181) data time 0.0009 (0.0016) model time 0.3938 (0.4152) loss 7.8440 (6.6639) grad_norm 1.9788 (3.4290) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:25:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][510/625] eta 0:00:48 lr 0.000159 wd 0.0500 time 
0.3975 (0.4178) data time 0.0009 (0.0016) model time 0.3966 (0.4149) loss 7.7874 (6.6652) grad_norm 2.0931 (3.4081) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:25:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][520/625] eta 0:00:43 lr 0.000159 wd 0.0500 time 0.3993 (0.4175) data time 0.0008 (0.0016) model time 0.3986 (0.4146) loss 6.6330 (6.6603) grad_norm 2.6992 (3.3922) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:25:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][530/625] eta 0:00:39 lr 0.000159 wd 0.0500 time 0.3952 (0.4171) data time 0.0009 (0.0016) model time 0.3943 (0.4142) loss 6.7128 (6.6539) grad_norm 2.3019 (3.3791) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:25:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][540/625] eta 0:00:35 lr 0.000159 wd 0.0500 time 0.3975 (0.4169) data time 0.0007 (0.0016) model time 0.3968 (0.4141) loss 5.3938 (6.6485) grad_norm 1.8772 (3.3675) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:25:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][550/625] eta 0:00:31 lr 0.000159 wd 0.0500 time 0.3980 (0.4166) data time 0.0009 (0.0016) model time 0.3971 (0.4138) loss 7.3745 (6.6509) grad_norm 2.8775 (3.3547) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:25:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][560/625] eta 0:00:27 lr 0.000159 wd 0.0500 time 0.3970 (0.4164) data time 0.0008 (0.0015) model time 0.3962 (0.4136) loss 7.0575 (6.6446) grad_norm 2.3867 (3.4553) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:25:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][570/625] eta 0:00:22 lr 0.000159 wd 0.0500 time 0.4009 (0.4162) data time 0.0007 (0.0015) model time 0.4002 (0.4134) loss 6.6924 (6.6476) grad_norm 4.3534 (3.4553) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 09:25:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][580/625] eta 0:00:18 lr 0.000159 wd 0.0500 time 0.5378 (0.4161) data time 0.0009 (0.0015) model time 0.5369 (0.4133) loss 7.3283 (6.6455) grad_norm 3.5489 (3.4492) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:25:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][590/625] eta 0:00:14 lr 0.000159 wd 0.0500 time 0.3967 (0.4158) data time 0.0006 (0.0015) model time 0.3960 (0.4130) loss 6.0049 (6.6484) grad_norm 1.9439 (3.4426) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:26:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][600/625] eta 0:00:10 lr 0.000159 wd 0.0500 time 0.3986 (0.4155) data time 0.0009 (0.0015) model time 0.3978 (0.4128) loss 7.4849 (6.6523) grad_norm 2.8353 (3.4349) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:26:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][610/625] eta 0:00:06 lr 0.000159 wd 0.0500 time 0.3961 (0.4157) data time 0.0004 (0.0015) model time 0.3957 (0.4130) loss 6.5425 (6.6510) grad_norm 3.0594 (3.4268) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:26:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][620/625] eta 0:00:02 lr 0.000159 wd 0.0500 time 0.3977 (0.4154) data time 0.0004 (0.0015) model time 0.3973 (0.4127) loss 6.6070 (6.6496) grad_norm 2.4498 (3.4187) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:26:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 235 training takes 0:04:19 [2024-07-25 09:26:10 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 09:26:11 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 09:26:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.450 (0.450) Loss 0.5352 (0.5352) Acc@1 89.844 (89.844) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 09:26:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8267 (0.6593) Acc@1 81.934 (86.941) Acc@5 96.680 (97.869) Mem 14939MB [2024-07-25 09:26:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9253 (0.7640) Acc@1 78.467 (84.136) Acc@5 95.801 (96.894) Mem 14939MB [2024-07-25 09:26:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.755 Acc@5 96.863 [2024-07-25 09:26:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-07-25 09:26:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.76% [2024-07-25 09:26:14 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 09:26:15 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 09:26:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.447 (0.447) Loss 0.5391 (0.5391) Acc@1 90.234 (90.234) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 09:26:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.118) Loss 0.8242 (0.6626) Acc@1 82.471 (86.994) Acc@5 96.777 (97.918) Mem 14939MB [2024-07-25 09:26:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9399 (0.7693) Acc@1 78.271 (84.077) Acc@5 95.361 (96.919) Mem 14939MB [2024-07-25 09:26:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.667 Acc@5 96.873 [2024-07-25 09:26:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-07-25 09:26:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.67% [2024-07-25 09:26:17 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 09:26:18 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 09:26:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][0/625] eta 0:07:57 lr 0.000159 wd 0.0500 time 0.7638 (0.7638) data time 0.3842 (0.3842) model time 0.0000 (0.0000) loss 5.5522 (5.5522) grad_norm 1.6264 (1.6264) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:26:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][10/625] eta 0:04:38 lr 0.000159 wd 0.0500 time 0.3958 (0.4524) data time 0.0008 (0.0357) model time 0.0000 (0.0000) loss 7.6070 (6.2072) grad_norm 3.2743 (3.3202) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:26:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][20/625] eta 0:04:28 lr 0.000159 wd 0.0500 time 0.3984 (0.4446) data time 0.0007 (0.0190) model time 0.0000 (0.0000) loss 7.1706 (6.2920) grad_norm 2.3408 (3.3499) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:26:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][30/625] eta 0:04:30 lr 0.000158 wd 0.0500 time 0.3984 (0.4541) data time 0.0007 (0.0132) model time 0.0000 (0.0000) loss 7.8035 (6.4070) grad_norm 6.0005 (3.3408) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:26:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][40/625] eta 0:04:30 lr 0.000158 wd 0.0500 time 0.3974 (0.4622) data time 0.0009 (0.0101) model time 0.0000 (0.0000) loss 6.7619 (6.4249) grad_norm 2.2986 (3.2221) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:26:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][50/625] eta 0:04:30 lr 0.000158 wd 0.0500 time 0.5884 (0.4709) data time 0.0007 (0.0083) model time 0.0000 (0.0000) loss 6.6259 (6.4857) grad_norm 2.5539 (3.1716) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:26:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][60/625] eta 0:04:22 lr 0.000158 wd 0.0500 time 0.3933 (0.4641) 
data time 0.0008 (0.0071) model time 0.3925 (0.4285) loss 6.1262 (6.4947) grad_norm 1.9805 (3.1631) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:26:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][70/625] eta 0:04:14 lr 0.000158 wd 0.0500 time 0.5544 (0.4592) data time 0.0007 (0.0062) model time 0.5537 (0.4284) loss 6.6102 (6.4715) grad_norm 3.2388 (3.1669) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:26:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][80/625] eta 0:04:06 lr 0.000158 wd 0.0500 time 0.3969 (0.4517) data time 0.0008 (0.0056) model time 0.3961 (0.4180) loss 7.9647 (6.4975) grad_norm 2.0006 (3.1374) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:26:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][90/625] eta 0:03:58 lr 0.000158 wd 0.0500 time 0.3954 (0.4456) data time 0.0007 (0.0050) model time 0.3947 (0.4124) loss 6.0764 (6.4717) grad_norm 2.4039 (3.2138) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:27:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][100/625] eta 0:03:51 lr 0.000158 wd 0.0500 time 0.3983 (0.4408) data time 0.0011 (0.0046) model time 0.3972 (0.4093) loss 7.4204 (6.5069) grad_norm 2.1429 (3.1364) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:27:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][110/625] eta 0:03:45 lr 0.000158 wd 0.0500 time 0.3937 (0.4372) data time 0.0008 (0.0043) model time 0.3929 (0.4076) loss 6.4652 (6.4867) grad_norm 3.1335 (3.1162) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:27:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][120/625] eta 0:03:39 lr 0.000158 wd 0.0500 time 0.4005 (0.4350) data time 0.0010 (0.0040) model time 0.3995 (0.4079) loss 7.1349 (6.4798) grad_norm 2.7357 (3.1357) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
09:27:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][130/625] eta 0:03:33 lr 0.000158 wd 0.0500 time 0.3991 (0.4322) data time 0.0006 (0.0038) model time 0.3984 (0.4066) loss 7.1902 (6.4805) grad_norm 2.9458 (3.1259) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:27:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][140/625] eta 0:03:28 lr 0.000158 wd 0.0500 time 0.3982 (0.4297) data time 0.0007 (0.0036) model time 0.3975 (0.4055) loss 7.1783 (6.5055) grad_norm 43.0584 (3.3909) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:27:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][150/625] eta 0:03:23 lr 0.000158 wd 0.0500 time 0.3953 (0.4275) data time 0.0006 (0.0034) model time 0.3948 (0.4045) loss 6.3272 (6.5385) grad_norm 3.1618 (3.3683) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:27:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][160/625] eta 0:03:17 lr 0.000158 wd 0.0500 time 0.3969 (0.4257) data time 0.0009 (0.0032) model time 0.3960 (0.4039) loss 6.5102 (6.5732) grad_norm 2.6938 (3.3371) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:27:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][170/625] eta 0:03:12 lr 0.000157 wd 0.0500 time 0.3955 (0.4241) data time 0.0008 (0.0031) model time 0.3948 (0.4034) loss 6.4830 (6.5782) grad_norm 2.9489 (3.2830) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:27:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][180/625] eta 0:03:08 lr 0.000157 wd 0.0500 time 0.3984 (0.4227) data time 0.0007 (0.0029) model time 0.3977 (0.4029) loss 7.6532 (6.5781) grad_norm 3.8168 (3.2606) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:27:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][190/625] eta 0:03:03 lr 0.000157 wd 0.0500 time 0.4024 (0.4214) data 
time 0.0006 (0.0028) model time 0.4018 (0.4026) loss 7.1766 (6.5981) grad_norm 2.2719 (3.2736) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:27:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][200/625] eta 0:02:58 lr 0.000157 wd 0.0500 time 0.3923 (0.4203) data time 0.0009 (0.0027) model time 0.3914 (0.4023) loss 5.7620 (6.6072) grad_norm 3.8059 (3.2709) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:27:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][210/625] eta 0:02:54 lr 0.000157 wd 0.0500 time 0.4038 (0.4193) data time 0.0006 (0.0026) model time 0.4032 (0.4021) loss 6.6498 (6.6149) grad_norm 2.9535 (3.2616) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:27:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][220/625] eta 0:02:49 lr 0.000157 wd 0.0500 time 0.3996 (0.4185) data time 0.0006 (0.0026) model time 0.3990 (0.4019) loss 7.1110 (6.6139) grad_norm 2.9034 (3.2563) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:27:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][230/625] eta 0:02:45 lr 0.000157 wd 0.0500 time 0.3975 (0.4193) data time 0.0007 (0.0025) model time 0.3968 (0.4039) loss 7.5286 (6.6136) grad_norm 2.1269 (3.4502) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:28:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][240/625] eta 0:02:42 lr 0.000157 wd 0.0500 time 0.6030 (0.4210) data time 0.0008 (0.0024) model time 0.6023 (0.4067) loss 6.7924 (6.6095) grad_norm 3.9911 (3.4118) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:28:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][250/625] eta 0:02:38 lr 0.000157 wd 0.0500 time 0.5782 (0.4224) data time 0.0007 (0.0023) model time 0.5775 (0.4092) loss 5.4292 (6.6047) grad_norm 1.8619 (3.3882) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
09:28:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][260/625] eta 0:02:34 lr 0.000157 wd 0.0500 time 0.3960 (0.4237) data time 0.0007 (0.0023) model time 0.3953 (0.4114) loss 5.9546 (6.5925) grad_norm 2.1296 (3.3567) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:28:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][270/625] eta 0:02:31 lr 0.000157 wd 0.0500 time 0.5887 (0.4268) data time 0.0008 (0.0022) model time 0.5879 (0.4157) loss 5.1448 (6.5945) grad_norm 3.1797 (3.3873) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:28:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][280/625] eta 0:02:27 lr 0.000157 wd 0.0500 time 0.3981 (0.4270) data time 0.0010 (0.0022) model time 0.3971 (0.4164) loss 6.8125 (6.5889) grad_norm 2.8062 (3.3763) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:28:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][290/625] eta 0:02:22 lr 0.000157 wd 0.0500 time 0.4019 (0.4264) data time 0.0007 (0.0021) model time 0.4012 (0.4162) loss 5.8785 (6.5804) grad_norm 1.9476 (3.3708) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:28:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][300/625] eta 0:02:18 lr 0.000157 wd 0.0500 time 0.4087 (0.4261) data time 0.0009 (0.0021) model time 0.4078 (0.4162) loss 7.6247 (6.5862) grad_norm 5.5415 (3.3679) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:28:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][310/625] eta 0:02:13 lr 0.000157 wd 0.0500 time 0.3986 (0.4253) data time 0.0007 (0.0021) model time 0.3979 (0.4155) loss 6.7031 (6.5820) grad_norm 2.7106 (3.3750) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:28:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][320/625] eta 0:02:09 lr 0.000156 wd 0.0500 time 0.3956 (0.4244) data 
time 0.0008 (0.0020) model time 0.3947 (0.4148) loss 6.1151 (6.5819) grad_norm 2.6707 (3.4014) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:28:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][330/625] eta 0:02:04 lr 0.000156 wd 0.0500 time 0.3991 (0.4236) data time 0.0008 (0.0020) model time 0.3982 (0.4142) loss 7.2922 (6.5888) grad_norm 2.2272 (3.4115) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:28:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][340/625] eta 0:02:00 lr 0.000156 wd 0.0500 time 0.3949 (0.4233) data time 0.0007 (0.0019) model time 0.3942 (0.4141) loss 6.7739 (6.5869) grad_norm 1.8299 (3.4007) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:28:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][350/625] eta 0:01:56 lr 0.000156 wd 0.0500 time 0.4048 (0.4226) data time 0.0007 (0.0019) model time 0.4041 (0.4136) loss 7.3510 (6.5844) grad_norm 2.3426 (3.3853) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:28:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][360/625] eta 0:01:51 lr 0.000156 wd 0.0500 time 0.3987 (0.4220) data time 0.0009 (0.0019) model time 0.3978 (0.4131) loss 6.2967 (6.5923) grad_norm 1.8249 (3.3726) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:28:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][370/625] eta 0:01:47 lr 0.000156 wd 0.0500 time 0.3961 (0.4213) data time 0.0008 (0.0019) model time 0.3953 (0.4126) loss 5.5188 (6.5860) grad_norm 2.3267 (3.3800) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:28:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][380/625] eta 0:01:43 lr 0.000156 wd 0.0500 time 0.3998 (0.4207) data time 0.0008 (0.0018) model time 0.3990 (0.4121) loss 7.4030 (6.5964) grad_norm 2.5672 (3.3740) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
09:29:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][390/625] eta 0:01:38 lr 0.000156 wd 0.0500 time 0.4009 (0.4201) data time 0.0008 (0.0018) model time 0.4001 (0.4117) loss 7.0172 (6.5939) grad_norm 2.1110 (3.3735) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:29:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][400/625] eta 0:01:34 lr 0.000156 wd 0.0500 time 0.3992 (0.4196) data time 0.0009 (0.0018) model time 0.3983 (0.4113) loss 7.3082 (6.5886) grad_norm 2.6637 (3.3614) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:29:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][410/625] eta 0:01:30 lr 0.000156 wd 0.0500 time 0.3971 (0.4191) data time 0.0006 (0.0018) model time 0.3965 (0.4109) loss 7.1779 (6.5898) grad_norm 3.4261 (3.3701) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:29:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][420/625] eta 0:01:25 lr 0.000156 wd 0.0500 time 0.3999 (0.4187) data time 0.0007 (0.0017) model time 0.3992 (0.4106) loss 5.2183 (6.5952) grad_norm 3.3973 (3.3557) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:29:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][430/625] eta 0:01:21 lr 0.000156 wd 0.0500 time 0.3965 (0.4182) data time 0.0009 (0.0017) model time 0.3957 (0.4103) loss 6.4061 (6.5928) grad_norm 3.3353 (3.3692) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:29:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][440/625] eta 0:01:17 lr 0.000156 wd 0.0500 time 0.5790 (0.4182) data time 0.0006 (0.0017) model time 0.5784 (0.4105) loss 6.4418 (6.5974) grad_norm 4.4165 (3.3680) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:29:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][450/625] eta 0:01:13 lr 0.000156 wd 0.0500 time 0.4023 (0.4182) data 
time 0.0008 (0.0017) model time 0.4015 (0.4106) loss 7.2784 (6.5964) grad_norm 2.0241 (3.3588) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:29:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][460/625] eta 0:01:09 lr 0.000155 wd 0.0500 time 0.3991 (0.4186) data time 0.0009 (0.0017) model time 0.3981 (0.4112) loss 6.4560 (6.6026) grad_norm 2.8810 (3.4636) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:29:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][470/625] eta 0:01:05 lr 0.000155 wd 0.0500 time 0.5967 (0.4198) data time 0.0009 (0.0016) model time 0.5957 (0.4128) loss 8.0248 (6.5983) grad_norm 3.8010 (3.5976) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:29:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][480/625] eta 0:01:00 lr 0.000155 wd 0.0500 time 0.4039 (0.4204) data time 0.0007 (0.0016) model time 0.4032 (0.4136) loss 6.0960 (6.5981) grad_norm 2.7861 (3.6011) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:29:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][490/625] eta 0:00:56 lr 0.000155 wd 0.0500 time 0.5894 (0.4221) data time 0.0009 (0.0016) model time 0.5885 (0.4156) loss 6.8950 (6.5954) grad_norm 3.0254 (3.5847) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:29:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][500/625] eta 0:00:52 lr 0.000155 wd 0.0500 time 0.3952 (0.4222) data time 0.0007 (0.0016) model time 0.3945 (0.4158) loss 7.4418 (6.5995) grad_norm 3.5124 (3.5713) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:29:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][510/625] eta 0:00:48 lr 0.000155 wd 0.0500 time 0.4011 (0.4220) data time 0.0007 (0.0016) model time 0.4004 (0.4157) loss 5.9192 (6.6002) grad_norm 2.8196 (3.5574) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
09:29:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][520/625] eta 0:00:44 lr 0.000155 wd 0.0500 time 0.4018 (0.4218) data time 0.0009 (0.0016) model time 0.4009 (0.4157) loss 6.7562 (6.6014) grad_norm 2.1337 (3.5392) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:30:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][530/625] eta 0:00:40 lr 0.000155 wd 0.0500 time 0.4160 (0.4215) data time 0.0006 (0.0015) model time 0.4154 (0.4154) loss 5.8356 (6.6054) grad_norm 3.3815 (3.5279) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:30:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][540/625] eta 0:00:35 lr 0.000155 wd 0.0500 time 0.4025 (0.4211) data time 0.0006 (0.0015) model time 0.4018 (0.4151) loss 6.8993 (6.6108) grad_norm 3.0289 (3.5111) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:30:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][550/625] eta 0:00:31 lr 0.000155 wd 0.0500 time 0.3997 (0.4207) data time 0.0007 (0.0015) model time 0.3989 (0.4147) loss 6.6385 (6.6106) grad_norm 3.1381 (3.5460) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:30:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][560/625] eta 0:00:27 lr 0.000155 wd 0.0500 time 0.3997 (0.4207) data time 0.0008 (0.0015) model time 0.3989 (0.4148) loss 6.8552 (6.6104) grad_norm 2.5948 (3.5398) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:30:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][570/625] eta 0:00:23 lr 0.000155 wd 0.0500 time 0.3961 (0.4203) data time 0.0008 (0.0015) model time 0.3953 (0.4145) loss 7.5467 (6.6103) grad_norm 2.6900 (3.6093) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:30:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][580/625] eta 0:00:18 lr 0.000155 wd 0.0500 time 0.3972 (0.4199) data 
time 0.0008 (0.0015) model time 0.3964 (0.4142) loss 6.8999 (6.6123) grad_norm 4.0398 (3.6046) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:30:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][590/625] eta 0:00:14 lr 0.000155 wd 0.0500 time 0.3978 (0.4196) data time 0.0007 (0.0015) model time 0.3971 (0.4139) loss 6.7238 (6.6146) grad_norm 3.1541 (3.6017) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:30:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][600/625] eta 0:00:10 lr 0.000154 wd 0.0500 time 0.3974 (0.4192) data time 0.0009 (0.0015) model time 0.3965 (0.4136) loss 6.0944 (6.6096) grad_norm 2.4497 (3.6063) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:30:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][610/625] eta 0:00:06 lr 0.000154 wd 0.0500 time 0.3954 (0.4189) data time 0.0006 (0.0015) model time 0.3948 (0.4133) loss 6.3179 (6.5993) grad_norm 4.8605 (3.6070) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:30:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][620/625] eta 0:00:02 lr 0.000154 wd 0.0500 time 0.3987 (0.4186) data time 0.0006 (0.0015) model time 0.3982 (0.4131) loss 6.7323 (6.6041) grad_norm 2.7868 (3.6019) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:30:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 236 training takes 0:04:21 [2024-07-25 09:30:40 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 09:30:41 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 09:30:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.451 (0.451) Loss 0.5469 (0.5469) Acc@1 90.234 (90.234) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 09:30:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.8428 (0.6697) Acc@1 82.715 (87.207) Acc@5 96.631 (97.931) Mem 14939MB [2024-07-25 09:30:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9404 (0.7784) Acc@1 77.734 (84.091) Acc@5 95.752 (96.928) Mem 14939MB [2024-07-25 09:30:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.695 Acc@5 96.897 [2024-07-25 09:30:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-07-25 09:30:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.836 (0.836) Loss 0.5391 (0.5391) Acc@1 90.283 (90.283) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 09:30:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.158) Loss 0.8252 (0.6624) Acc@1 82.568 (87.021) Acc@5 96.777 (97.918) Mem 14939MB [2024-07-25 09:30:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.123) Loss 0.9380 (0.7688) Acc@1 78.369 (84.124) Acc@5 95.459 (96.928) Mem 14939MB [2024-07-25 09:30:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.713 Acc@5 96.883 [2024-07-25 09:30:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-07-25 09:30:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.71% [2024-07-25 09:30:46 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 09:30:47 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 09:30:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][0/625] eta 0:08:09 lr 0.000154 wd 0.0500 time 0.7825 (0.7825) data time 0.4053 (0.4053) model time 0.0000 (0.0000) loss 7.1895 (7.1895) grad_norm 8.8705 (8.8705) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:30:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][10/625] eta 0:04:26 lr 0.000154 wd 0.0500 time 0.3942 (0.4335) data time 0.0006 (0.0376) model time 0.0000 (0.0000) loss 6.6292 (6.9056) grad_norm 2.8807 (3.7741) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:30:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][20/625] eta 0:04:11 lr 0.000154 wd 0.0500 time 0.3977 (0.4163) data time 0.0008 (0.0201) model time 0.0000 (0.0000) loss 7.0898 (6.7504) grad_norm 2.2487 (3.7818) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:31:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][30/625] eta 0:04:04 lr 0.000154 wd 0.0500 time 0.3961 (0.4111) data time 0.0009 (0.0139) model time 0.0000 (0.0000) loss 6.0269 (6.6677) grad_norm 2.4829 (3.6179) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:31:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][40/625] eta 0:04:01 lr 0.000154 wd 0.0500 time 0.3960 (0.4122) data time 0.0006 (0.0107) model time 0.0000 (0.0000) loss 6.6012 (6.7008) grad_norm 3.1367 (3.5636) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:31:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][50/625] eta 0:03:59 lr 0.000154 wd 0.0500 time 0.5435 (0.4158) data time 0.0007 (0.0088) model time 0.0000 (0.0000) loss 7.1086 (6.7541) grad_norm 2.8877 (3.5108) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 09:31:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][60/625] eta 0:03:58 lr 0.000154 wd 0.0500 time 0.3979 (0.4226) data time 0.0009 (0.0075) model time 0.3970 (0.4562) loss 6.0908 (6.7362) grad_norm 2.5384 (3.4258) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:31:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][70/625] eta 0:03:56 lr 0.000154 wd 0.0500 time 0.3978 (0.4265) data time 0.0006 (0.0065) model time 0.3972 (0.4529) loss 6.7978 (6.7442) grad_norm 2.2534 (3.3400) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:31:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][80/625] eta 0:03:54 lr 0.000154 wd 0.0500 time 0.4016 (0.4306) data time 0.0007 (0.0058) model time 0.4008 (0.4549) loss 6.6248 (6.7563) grad_norm 4.3622 (3.3474) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:31:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][90/625] eta 0:03:52 lr 0.000154 wd 0.0500 time 0.3978 (0.4342) data time 0.0009 (0.0053) model time 0.3969 (0.4568) loss 6.1153 (6.7174) grad_norm 2.5601 (3.3697) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:31:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][100/625] eta 0:03:47 lr 0.000154 wd 0.0500 time 0.4001 (0.4338) data time 0.0011 (0.0048) model time 0.3991 (0.4512) loss 6.9925 (6.7176) grad_norm 6.5948 (3.4200) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:31:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][110/625] eta 0:03:43 lr 0.000154 wd 0.0500 time 0.3972 (0.4333) data time 0.0007 (0.0045) model time 0.3966 (0.4473) loss 7.2203 (6.7513) grad_norm 2.3816 (3.3677) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:31:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][120/625] eta 0:03:37 lr 0.000153 wd 0.0500 time 
0.3960 (0.4303) data time 0.0007 (0.0042) model time 0.3953 (0.4400) loss 6.3997 (6.7187) grad_norm 2.9189 (3.3718) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:31:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][130/625] eta 0:03:31 lr 0.000153 wd 0.0500 time 0.3988 (0.4279) data time 0.0008 (0.0039) model time 0.3980 (0.4348) loss 6.5877 (6.6976) grad_norm 2.3758 (3.3189) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:31:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][140/625] eta 0:03:26 lr 0.000153 wd 0.0500 time 0.3961 (0.4259) data time 0.0009 (0.0037) model time 0.3952 (0.4308) loss 6.8953 (6.6982) grad_norm 2.6444 (3.3026) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:31:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][150/625] eta 0:03:21 lr 0.000153 wd 0.0500 time 0.3965 (0.4241) data time 0.0006 (0.0035) model time 0.3958 (0.4274) loss 6.7290 (6.6893) grad_norm 3.2044 (3.3307) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:31:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][160/625] eta 0:03:16 lr 0.000153 wd 0.0500 time 0.4013 (0.4226) data time 0.0009 (0.0034) model time 0.4004 (0.4249) loss 7.1684 (6.7042) grad_norm 4.1321 (3.4243) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:31:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][170/625] eta 0:03:11 lr 0.000153 wd 0.0500 time 0.3943 (0.4210) data time 0.0006 (0.0032) model time 0.3937 (0.4224) loss 5.8330 (6.6751) grad_norm 3.3771 (3.4038) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:32:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][180/625] eta 0:03:06 lr 0.000153 wd 0.0500 time 0.3976 (0.4198) data time 0.0009 (0.0031) model time 0.3968 (0.4205) loss 5.9537 (6.6444) grad_norm 5.4925 (3.4016) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 09:32:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][190/625] eta 0:03:02 lr 0.000153 wd 0.0500 time 0.3976 (0.4186) data time 0.0006 (0.0030) model time 0.3970 (0.4187) loss 6.9456 (6.6523) grad_norm 2.2245 (3.3804) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:32:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][200/625] eta 0:02:57 lr 0.000153 wd 0.0500 time 0.3929 (0.4176) data time 0.0006 (0.0028) model time 0.3923 (0.4174) loss 6.2592 (6.6632) grad_norm 2.2069 (3.3850) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:32:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][210/625] eta 0:02:52 lr 0.000153 wd 0.0500 time 0.3985 (0.4167) data time 0.0008 (0.0028) model time 0.3976 (0.4161) loss 6.7119 (6.6630) grad_norm 3.3846 (3.3900) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:32:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][220/625] eta 0:02:48 lr 0.000153 wd 0.0500 time 0.4054 (0.4160) data time 0.0007 (0.0027) model time 0.4047 (0.4152) loss 7.1874 (6.6525) grad_norm 3.3974 (3.3524) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:32:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][230/625] eta 0:02:44 lr 0.000153 wd 0.0500 time 0.4051 (0.4154) data time 0.0006 (0.0026) model time 0.4045 (0.4144) loss 4.8692 (6.6495) grad_norm 2.5953 (3.3495) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:32:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][240/625] eta 0:02:39 lr 0.000153 wd 0.0500 time 0.4780 (0.4151) data time 0.0007 (0.0025) model time 0.4773 (0.4140) loss 7.1582 (6.6427) grad_norm 3.5233 (3.4242) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:32:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][250/625] eta 0:02:35 lr 0.000153 wd 0.0500 time 
0.4045 (0.4146) data time 0.0009 (0.0025) model time 0.4036 (0.4134) loss 7.5155 (6.6421) grad_norm 4.0708 (3.5581) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:32:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][260/625] eta 0:02:31 lr 0.000153 wd 0.0500 time 0.4033 (0.4148) data time 0.0006 (0.0024) model time 0.4027 (0.4137) loss 6.5617 (6.6373) grad_norm 2.4988 (3.7741) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:32:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][270/625] eta 0:02:27 lr 0.000152 wd 0.0500 time 0.5938 (0.4157) data time 0.0008 (0.0023) model time 0.5930 (0.4149) loss 7.2724 (6.6457) grad_norm 2.8538 (3.7543) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:32:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][280/625] eta 0:02:23 lr 0.000152 wd 0.0500 time 0.4004 (0.4156) data time 0.0006 (0.0023) model time 0.3998 (0.4147) loss 6.5255 (6.6396) grad_norm 1.7939 (3.7263) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:32:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][290/625] eta 0:02:19 lr 0.000152 wd 0.0500 time 0.3989 (0.4173) data time 0.0008 (0.0022) model time 0.3981 (0.4167) loss 5.6129 (6.6350) grad_norm 5.8345 (3.8328) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:32:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][300/625] eta 0:02:16 lr 0.000152 wd 0.0500 time 0.6086 (0.4192) data time 0.0006 (0.0022) model time 0.6079 (0.4191) loss 5.2954 (6.6299) grad_norm 2.4199 (3.9030) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:32:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][310/625] eta 0:02:12 lr 0.000152 wd 0.0500 time 0.3979 (0.4210) data time 0.0009 (0.0022) model time 0.3970 (0.4211) loss 7.4049 (6.6422) grad_norm 3.0189 (3.8866) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 09:33:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][320/625] eta 0:02:08 lr 0.000152 wd 0.0500 time 0.5398 (0.4218) data time 0.0006 (0.0021) model time 0.5391 (0.4220) loss 6.0688 (6.6394) grad_norm 2.1607 (3.8578) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:33:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][330/625] eta 0:02:04 lr 0.000152 wd 0.0500 time 0.4043 (0.4215) data time 0.0006 (0.0021) model time 0.4038 (0.4217) loss 7.6778 (6.6375) grad_norm 2.7009 (3.8346) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:33:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][340/625] eta 0:01:59 lr 0.000152 wd 0.0500 time 0.3985 (0.4209) data time 0.0006 (0.0020) model time 0.3978 (0.4209) loss 6.6183 (6.6365) grad_norm 2.3649 (3.7962) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:33:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][350/625] eta 0:01:55 lr 0.000152 wd 0.0500 time 0.4003 (0.4204) data time 0.0007 (0.0020) model time 0.3996 (0.4203) loss 5.2785 (6.6285) grad_norm 4.0523 (3.7676) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:33:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][360/625] eta 0:01:51 lr 0.000152 wd 0.0500 time 0.4002 (0.4199) data time 0.0008 (0.0020) model time 0.3994 (0.4197) loss 6.9549 (6.6236) grad_norm 2.6020 (3.7406) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:33:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][370/625] eta 0:01:46 lr 0.000152 wd 0.0500 time 0.3988 (0.4195) data time 0.0005 (0.0019) model time 0.3983 (0.4192) loss 6.3145 (6.6246) grad_norm 2.8044 (3.7562) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:33:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][380/625] eta 0:01:42 lr 0.000152 wd 0.0500 time 
0.3989 (0.4190) data time 0.0006 (0.0019) model time 0.3983 (0.4187) loss 6.7762 (6.6358) grad_norm 2.3880 (3.7259) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:33:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][390/625] eta 0:01:38 lr 0.000152 wd 0.0500 time 0.3975 (0.4186) data time 0.0006 (0.0019) model time 0.3969 (0.4181) loss 5.1652 (6.6281) grad_norm 2.7039 (3.7036) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:33:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][400/625] eta 0:01:34 lr 0.000152 wd 0.0500 time 0.4005 (0.4182) data time 0.0007 (0.0019) model time 0.3998 (0.4177) loss 7.1491 (6.6311) grad_norm 2.7449 (3.6872) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:33:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][410/625] eta 0:01:29 lr 0.000151 wd 0.0500 time 0.3977 (0.4178) data time 0.0008 (0.0018) model time 0.3969 (0.4172) loss 7.2682 (6.6290) grad_norm 2.3992 (3.6640) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:33:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][420/625] eta 0:01:25 lr 0.000151 wd 0.0500 time 0.4016 (0.4173) data time 0.0006 (0.0018) model time 0.4010 (0.4167) loss 7.0290 (6.6307) grad_norm 3.4100 (3.6486) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:33:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][430/625] eta 0:01:21 lr 0.000151 wd 0.0500 time 0.3989 (0.4169) data time 0.0007 (0.0018) model time 0.3982 (0.4162) loss 6.5784 (6.6263) grad_norm 2.0371 (3.6780) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:33:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][440/625] eta 0:01:17 lr 0.000151 wd 0.0500 time 0.4060 (0.4166) data time 0.0008 (0.0018) model time 0.4051 (0.4158) loss 6.3385 (6.6343) grad_norm 3.7512 (3.6756) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 09:33:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][450/625] eta 0:01:12 lr 0.000151 wd 0.0500 time 0.3975 (0.4162) data time 0.0008 (0.0017) model time 0.3967 (0.4154) loss 7.2181 (6.6402) grad_norm 2.9921 (3.6739) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:33:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][460/625] eta 0:01:08 lr 0.000151 wd 0.0500 time 0.4004 (0.4158) data time 0.0008 (0.0017) model time 0.3996 (0.4150) loss 6.1010 (6.6335) grad_norm 3.1630 (3.9913) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:34:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][470/625] eta 0:01:04 lr 0.000151 wd 0.0500 time 0.4001 (0.4155) data time 0.0007 (0.0017) model time 0.3994 (0.4146) loss 6.5691 (6.6342) grad_norm 2.5881 (4.0183) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:34:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][480/625] eta 0:01:00 lr 0.000151 wd 0.0500 time 0.4005 (0.4154) data time 0.0007 (0.0017) model time 0.3997 (0.4144) loss 6.4077 (6.6410) grad_norm 3.2318 (4.0070) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:34:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][490/625] eta 0:00:56 lr 0.000151 wd 0.0500 time 0.3980 (0.4157) data time 0.0009 (0.0017) model time 0.3971 (0.4148) loss 7.3059 (6.6471) grad_norm 2.8682 (4.0005) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:34:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][500/625] eta 0:00:52 lr 0.000151 wd 0.0500 time 0.4014 (0.4161) data time 0.0007 (0.0017) model time 0.4008 (0.4153) loss 6.2817 (6.6540) grad_norm 3.1067 (3.9772) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:34:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][510/625] eta 0:00:47 lr 0.000151 wd 0.0500 time 
0.5700 (0.4169) data time 0.0007 (0.0016) model time 0.5693 (0.4162) loss 6.7267 (6.6555) grad_norm 2.2983 (3.9723) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:34:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][520/625] eta 0:00:43 lr 0.000151 wd 0.0500 time 0.5924 (0.4178) data time 0.0009 (0.0016) model time 0.5915 (0.4172) loss 7.6248 (6.6641) grad_norm 2.1211 (3.9670) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:34:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][530/625] eta 0:00:39 lr 0.000151 wd 0.0500 time 0.3988 (0.4191) data time 0.0008 (0.0016) model time 0.3980 (0.4186) loss 5.2824 (6.6610) grad_norm 2.1040 (3.9407) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:34:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][540/625] eta 0:00:35 lr 0.000151 wd 0.0500 time 0.6017 (0.4195) data time 0.0008 (0.0016) model time 0.6009 (0.4190) loss 7.1366 (6.6637) grad_norm 2.0454 (3.9189) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:34:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][550/625] eta 0:00:31 lr 0.000151 wd 0.0500 time 0.5853 (0.4194) data time 0.0006 (0.0016) model time 0.5847 (0.4189) loss 6.6196 (6.6669) grad_norm 3.0813 (3.9342) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:34:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][560/625] eta 0:00:27 lr 0.000150 wd 0.0500 time 0.3954 (0.4191) data time 0.0007 (0.0016) model time 0.3947 (0.4185) loss 6.8560 (6.6670) grad_norm 2.4758 (3.9384) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:34:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][570/625] eta 0:00:23 lr 0.000150 wd 0.0500 time 0.4019 (0.4187) data time 0.0008 (0.0016) model time 0.4011 (0.4181) loss 6.3721 (6.6626) grad_norm 2.9096 (3.9628) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 09:34:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][580/625] eta 0:00:18 lr 0.000150 wd 0.0500 time 0.3994 (0.4184) data time 0.0007 (0.0016) model time 0.3987 (0.4177) loss 6.0348 (6.6643) grad_norm 3.2793 (3.9490) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:34:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][590/625] eta 0:00:14 lr 0.000150 wd 0.0500 time 0.4095 (0.4180) data time 0.0007 (0.0016) model time 0.4088 (0.4174) loss 5.9964 (6.6632) grad_norm 4.0353 (3.9610) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:34:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][600/625] eta 0:00:10 lr 0.000150 wd 0.0500 time 0.3976 (0.4179) data time 0.0009 (0.0016) model time 0.3967 (0.4172) loss 7.2096 (6.6677) grad_norm 2.9670 (3.9416) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:35:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][610/625] eta 0:00:06 lr 0.000150 wd 0.0500 time 0.4000 (0.4176) data time 0.0004 (0.0015) model time 0.3995 (0.4168) loss 5.5470 (6.6665) grad_norm 2.2125 (3.9302) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:35:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][620/625] eta 0:00:02 lr 0.000150 wd 0.0500 time 0.3954 (0.4172) data time 0.0004 (0.0015) model time 0.3949 (0.4165) loss 6.2168 (6.6627) grad_norm 2.5376 (3.9141) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:35:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 237 training takes 0:04:20 [2024-07-25 09:35:08 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 09:35:09 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 09:35:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.445 (0.445) Loss 0.5527 (0.5527) Acc@1 89.648 (89.648) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 09:35:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.118) Loss 0.8398 (0.6700) Acc@1 82.178 (86.945) Acc@5 96.729 (97.914) Mem 14939MB [2024-07-25 09:35:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.103) Loss 0.9238 (0.7735) Acc@1 78.369 (83.956) Acc@5 95.605 (96.912) Mem 14939MB [2024-07-25 09:35:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.611 Acc@5 96.871 [2024-07-25 09:35:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-07-25 09:35:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.771 (0.771) Loss 0.5386 (0.5386) Acc@1 90.283 (90.283) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 09:35:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.154) Loss 0.8237 (0.6619) Acc@1 82.861 (87.052) Acc@5 96.826 (97.931) Mem 14939MB [2024-07-25 09:35:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.121) Loss 0.9365 (0.7682) Acc@1 78.320 (84.136) Acc@5 95.459 (96.928) Mem 14939MB [2024-07-25 09:35:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.727 Acc@5 96.883 [2024-07-25 09:35:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-07-25 09:35:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.73% [2024-07-25 09:35:14 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 09:35:15 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 09:35:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][0/625] eta 0:08:16 lr 0.000150 wd 0.0500 time 0.7948 (0.7948) data time 0.4060 (0.4060) model time 0.0000 (0.0000) loss 5.8324 (5.8324) grad_norm 2.2780 (2.2780) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:35:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][10/625] eta 0:04:27 lr 0.000150 wd 0.0500 time 0.4005 (0.4346) data time 0.0009 (0.0377) model time 0.0000 (0.0000) loss 6.1382 (6.4996) grad_norm 5.6060 (3.5340) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:35:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][20/625] eta 0:04:12 lr 0.000150 wd 0.0500 time 0.4061 (0.4180) data time 0.0008 (0.0201) model time 0.0000 (0.0000) loss 5.5277 (6.5497) grad_norm 3.7484 (3.5595) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:35:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][30/625] eta 0:04:05 lr 0.000150 wd 0.0500 time 0.3975 (0.4125) data time 0.0009 (0.0140) model time 0.0000 (0.0000) loss 6.1751 (6.4260) grad_norm 3.0310 (3.3130) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:35:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][40/625] eta 0:03:59 lr 0.000150 wd 0.0500 time 0.3963 (0.4093) data time 0.0006 (0.0108) model time 0.0000 (0.0000) loss 7.6915 (6.4646) grad_norm 2.9693 (3.2027) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:35:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][50/625] eta 0:03:54 lr 0.000150 wd 0.0500 time 0.4012 (0.4076) data time 0.0009 (0.0091) model time 0.0000 (0.0000) loss 7.0853 (6.5607) grad_norm 1.7641 (3.0989) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 09:35:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][60/625] eta 0:03:51 lr 0.000150 wd 0.0500 time 0.6018 (0.4096) data time 0.0009 (0.0077) model time 0.6009 (0.4185) loss 7.0619 (6.5521) grad_norm 2.9429 (3.0180) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:35:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][70/625] eta 0:03:46 lr 0.000150 wd 0.0500 time 0.3992 (0.4082) data time 0.0009 (0.0070) model time 0.3984 (0.4079) loss 8.2957 (6.6215) grad_norm 2.1665 (2.9384) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:35:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][80/625] eta 0:03:43 lr 0.000149 wd 0.0500 time 0.5749 (0.4107) data time 0.0006 (0.0062) model time 0.5743 (0.4144) loss 6.5805 (6.6486) grad_norm 3.0286 (3.0015) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:35:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][90/625] eta 0:03:41 lr 0.000149 wd 0.0500 time 0.3942 (0.4147) data time 0.0007 (0.0057) model time 0.3934 (0.4224) loss 6.1278 (6.6425) grad_norm 2.5663 (3.0722) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:35:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][100/625] eta 0:03:39 lr 0.000149 wd 0.0500 time 0.3957 (0.4184) data time 0.0007 (0.0052) model time 0.3950 (0.4282) loss 5.8850 (6.6430) grad_norm 2.0894 (3.1489) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:36:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][110/625] eta 0:03:36 lr 0.000149 wd 0.0500 time 0.4000 (0.4201) data time 0.0007 (0.0048) model time 0.3993 (0.4295) loss 8.2464 (6.6600) grad_norm 2.5155 (3.1954) loss_scale 256.0000 (131.4595) mem 14939MB [2024-07-25 09:36:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][120/625] eta 0:03:35 lr 0.000149 wd 0.0500 time 
0.4998 (0.4267) data time 0.0008 (0.0045) model time 0.4990 (0.4394) loss 7.2401 (6.6562) grad_norm 4.1786 (3.2153) loss_scale 256.0000 (141.7521) mem 14939MB [2024-07-25 09:36:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][130/625] eta 0:03:32 lr 0.000149 wd 0.0500 time 0.4007 (0.4284) data time 0.0009 (0.0042) model time 0.3997 (0.4405) loss 5.9890 (6.6584) grad_norm 3.1059 (3.3521) loss_scale 256.0000 (150.4733) mem 14939MB [2024-07-25 09:36:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][140/625] eta 0:03:28 lr 0.000149 wd 0.0500 time 0.4001 (0.4289) data time 0.0010 (0.0040) model time 0.3991 (0.4399) loss 6.0732 (6.6428) grad_norm 2.1590 (3.3192) loss_scale 256.0000 (157.9574) mem 14939MB [2024-07-25 09:36:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][150/625] eta 0:03:23 lr 0.000149 wd 0.0500 time 0.3995 (0.4278) data time 0.0009 (0.0038) model time 0.3985 (0.4370) loss 7.4466 (6.6492) grad_norm 1.9508 (3.3948) loss_scale 256.0000 (164.4503) mem 14939MB [2024-07-25 09:36:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][160/625] eta 0:03:18 lr 0.000149 wd 0.0500 time 0.4001 (0.4263) data time 0.0007 (0.0036) model time 0.3993 (0.4340) loss 6.1697 (6.6565) grad_norm 2.5988 (3.6003) loss_scale 256.0000 (170.1366) mem 14939MB [2024-07-25 09:36:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][170/625] eta 0:03:13 lr 0.000149 wd 0.0500 time 0.3959 (0.4247) data time 0.0008 (0.0034) model time 0.3951 (0.4309) loss 6.7016 (6.6647) grad_norm 2.0520 (3.6026) loss_scale 256.0000 (175.1579) mem 14939MB [2024-07-25 09:36:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][180/625] eta 0:03:08 lr 0.000149 wd 0.0500 time 0.3978 (0.4234) data time 0.0006 (0.0033) model time 0.3972 (0.4286) loss 5.8512 (6.6470) grad_norm 2.8342 (3.5597) loss_scale 256.0000 (179.6243) mem 
14939MB [2024-07-25 09:36:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][190/625] eta 0:03:03 lr 0.000149 wd 0.0500 time 0.4621 (0.4225) data time 0.0006 (0.0031) model time 0.4615 (0.4269) loss 5.8187 (6.6340) grad_norm 3.6748 (3.5182) loss_scale 256.0000 (183.6230) mem 14939MB [2024-07-25 09:36:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][200/625] eta 0:02:59 lr 0.000149 wd 0.0500 time 0.3968 (0.4214) data time 0.0008 (0.0033) model time 0.3960 (0.4248) loss 6.5236 (6.6310) grad_norm 3.3616 (3.5126) loss_scale 256.0000 (187.2239) mem 14939MB [2024-07-25 09:36:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][210/625] eta 0:02:54 lr 0.000149 wd 0.0500 time 0.4021 (0.4203) data time 0.0006 (0.0032) model time 0.4014 (0.4230) loss 6.9553 (6.6250) grad_norm 3.0339 (3.4607) loss_scale 256.0000 (190.4834) mem 14939MB [2024-07-25 09:36:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][220/625] eta 0:02:49 lr 0.000149 wd 0.0500 time 0.3963 (0.4196) data time 0.0006 (0.0032) model time 0.3957 (0.4218) loss 4.9124 (6.6193) grad_norm 2.8164 (3.5222) loss_scale 256.0000 (193.4480) mem 14939MB [2024-07-25 09:36:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][230/625] eta 0:02:45 lr 0.000148 wd 0.0500 time 0.3977 (0.4187) data time 0.0009 (0.0031) model time 0.3968 (0.4205) loss 6.1273 (6.6203) grad_norm 2.7080 (3.5053) loss_scale 256.0000 (196.1558) mem 14939MB [2024-07-25 09:36:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][240/625] eta 0:02:40 lr 0.000148 wd 0.0500 time 0.3971 (0.4179) data time 0.0009 (0.0030) model time 0.3963 (0.4193) loss 7.6208 (6.6504) grad_norm 2.5693 (3.4682) loss_scale 256.0000 (198.6390) mem 14939MB [2024-07-25 09:37:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][250/625] eta 0:02:36 lr 0.000148 wd 0.0500 time 
0.3952 (0.4173) data time 0.0007 (0.0029) model time 0.3946 (0.4184) loss 6.6336 (6.6548) grad_norm 2.8149 (3.4761) loss_scale 256.0000 (200.9243) mem 14939MB [2024-07-25 09:37:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][260/625] eta 0:02:32 lr 0.000148 wd 0.0500 time 0.3951 (0.4166) data time 0.0008 (0.0028) model time 0.3943 (0.4175) loss 5.9930 (6.6404) grad_norm 3.1959 (3.4515) loss_scale 256.0000 (203.0345) mem 14939MB [2024-07-25 09:37:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][270/625] eta 0:02:27 lr 0.000148 wd 0.0500 time 0.3985 (0.4160) data time 0.0008 (0.0028) model time 0.3977 (0.4167) loss 6.0608 (6.6347) grad_norm 3.4402 (3.4260) loss_scale 256.0000 (204.9889) mem 14939MB [2024-07-25 09:37:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][280/625] eta 0:02:23 lr 0.000148 wd 0.0500 time 0.4082 (0.4158) data time 0.0009 (0.0027) model time 0.4074 (0.4163) loss 7.9381 (6.6402) grad_norm 2.0203 (3.4046) loss_scale 256.0000 (206.8043) mem 14939MB [2024-07-25 09:37:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][290/625] eta 0:02:19 lr 0.000148 wd 0.0500 time 0.3955 (0.4158) data time 0.0006 (0.0027) model time 0.3948 (0.4162) loss 7.2295 (6.6394) grad_norm 46.0956 (3.5715) loss_scale 256.0000 (208.4948) mem 14939MB [2024-07-25 09:37:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][300/625] eta 0:02:15 lr 0.000148 wd 0.0500 time 0.3985 (0.4159) data time 0.0007 (0.0026) model time 0.3978 (0.4163) loss 6.2904 (6.6375) grad_norm 2.7369 (3.5799) loss_scale 256.0000 (210.0731) mem 14939MB [2024-07-25 09:37:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][310/625] eta 0:02:11 lr 0.000148 wd 0.0500 time 0.3993 (0.4169) data time 0.0006 (0.0025) model time 0.3987 (0.4174) loss 7.0669 (6.6500) grad_norm 5.6770 (3.5719) loss_scale 256.0000 (211.5498) mem 
14939MB [2024-07-25 09:37:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][320/625] eta 0:02:07 lr 0.000148 wd 0.0500 time 0.3974 (0.4180) data time 0.0006 (0.0025) model time 0.3968 (0.4187) loss 6.9083 (6.6560) grad_norm 2.4831 (3.5735) loss_scale 256.0000 (212.9346) mem 14939MB [2024-07-25 09:37:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][330/625] eta 0:02:03 lr 0.000148 wd 0.0500 time 0.3990 (0.4197) data time 0.0007 (0.0024) model time 0.3983 (0.4207) loss 6.7597 (6.6505) grad_norm 5.4312 (3.5705) loss_scale 256.0000 (214.2356) mem 14939MB [2024-07-25 09:37:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][340/625] eta 0:02:00 lr 0.000148 wd 0.0500 time 0.5718 (0.4219) data time 0.0010 (0.0024) model time 0.5708 (0.4232) loss 5.8547 (6.6564) grad_norm 4.9315 (3.5690) loss_scale 256.0000 (215.4604) mem 14939MB [2024-07-25 09:37:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][350/625] eta 0:01:56 lr 0.000148 wd 0.0500 time 0.3976 (0.4232) data time 0.0007 (0.0023) model time 0.3969 (0.4247) loss 7.2787 (6.6591) grad_norm 2.1648 (3.5381) loss_scale 256.0000 (216.6154) mem 14939MB [2024-07-25 09:37:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][360/625] eta 0:01:52 lr 0.000148 wd 0.0500 time 0.4040 (0.4231) data time 0.0006 (0.0023) model time 0.4033 (0.4244) loss 6.0551 (6.6555) grad_norm 2.8801 (3.5137) loss_scale 256.0000 (217.7064) mem 14939MB [2024-07-25 09:37:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][370/625] eta 0:01:47 lr 0.000148 wd 0.0500 time 0.3995 (0.4228) data time 0.0007 (0.0023) model time 0.3988 (0.4240) loss 6.2562 (6.6460) grad_norm 2.1288 (3.5073) loss_scale 256.0000 (218.7385) mem 14939MB [2024-07-25 09:37:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][380/625] eta 0:01:43 lr 0.000147 wd 0.0500 time 
0.4002 (0.4222) data time 0.0009 (0.0022) model time 0.3993 (0.4232) loss 6.3929 (6.6407) grad_norm 2.2278 (3.5103) loss_scale 256.0000 (219.7165) mem 14939MB [2024-07-25 09:38:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][390/625] eta 0:01:39 lr 0.000147 wd 0.0500 time 0.4069 (0.4216) data time 0.0008 (0.0022) model time 0.4061 (0.4225) loss 6.5959 (6.6434) grad_norm 3.8761 (3.4875) loss_scale 256.0000 (220.6445) mem 14939MB [2024-07-25 09:38:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][400/625] eta 0:01:34 lr 0.000147 wd 0.0500 time 0.4068 (0.4211) data time 0.0006 (0.0022) model time 0.4062 (0.4219) loss 5.4925 (6.6382) grad_norm 2.0060 (3.4697) loss_scale 256.0000 (221.5262) mem 14939MB [2024-07-25 09:38:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][410/625] eta 0:01:30 lr 0.000147 wd 0.0500 time 0.3993 (0.4205) data time 0.0007 (0.0021) model time 0.3986 (0.4212) loss 6.4939 (6.6409) grad_norm 5.7413 (3.4599) loss_scale 256.0000 (222.3650) mem 14939MB [2024-07-25 09:38:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][420/625] eta 0:01:26 lr 0.000147 wd 0.0500 time 0.4031 (0.4200) data time 0.0007 (0.0021) model time 0.4024 (0.4206) loss 5.8357 (6.6451) grad_norm 3.1102 (3.4547) loss_scale 256.0000 (223.1639) mem 14939MB [2024-07-25 09:38:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][430/625] eta 0:01:21 lr 0.000147 wd 0.0500 time 0.4092 (0.4196) data time 0.0009 (0.0021) model time 0.4083 (0.4201) loss 7.4484 (6.6370) grad_norm 2.0059 (3.4682) loss_scale 256.0000 (223.9258) mem 14939MB [2024-07-25 09:38:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][440/625] eta 0:01:17 lr 0.000147 wd 0.0500 time 0.4004 (0.4191) data time 0.0007 (0.0020) model time 0.3997 (0.4195) loss 6.1979 (6.6335) grad_norm 5.5066 (3.4560) loss_scale 256.0000 (224.6531) mem 
14939MB [2024-07-25 09:38:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][450/625] eta 0:01:13 lr 0.000147 wd 0.0500 time 0.4005 (0.4187) data time 0.0007 (0.0020) model time 0.3998 (0.4190) loss 6.2825 (6.6238) grad_norm 3.2067 (3.4493) loss_scale 256.0000 (225.3481) mem 14939MB [2024-07-25 09:38:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][460/625] eta 0:01:09 lr 0.000147 wd 0.0500 time 0.4017 (0.4183) data time 0.0009 (0.0020) model time 0.4008 (0.4185) loss 6.2943 (6.6296) grad_norm 3.0078 (3.4331) loss_scale 256.0000 (226.0130) mem 14939MB [2024-07-25 09:38:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][470/625] eta 0:01:04 lr 0.000147 wd 0.0500 time 0.4059 (0.4179) data time 0.0006 (0.0020) model time 0.4053 (0.4180) loss 7.7500 (6.6352) grad_norm 3.5760 (3.4205) loss_scale 256.0000 (226.6497) mem 14939MB [2024-07-25 09:38:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][480/625] eta 0:01:00 lr 0.000147 wd 0.0500 time 0.4065 (0.4175) data time 0.0007 (0.0019) model time 0.4058 (0.4176) loss 7.3409 (6.6266) grad_norm 3.7522 (3.4215) loss_scale 256.0000 (227.2599) mem 14939MB [2024-07-25 09:38:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][490/625] eta 0:00:56 lr 0.000147 wd 0.0500 time 0.3960 (0.4172) data time 0.0006 (0.0019) model time 0.3954 (0.4172) loss 5.3954 (6.6204) grad_norm 4.1502 (3.4342) loss_scale 256.0000 (227.8452) mem 14939MB [2024-07-25 09:38:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][500/625] eta 0:00:52 lr 0.000147 wd 0.0500 time 0.3971 (0.4168) data time 0.0007 (0.0019) model time 0.3964 (0.4168) loss 5.5232 (6.6140) grad_norm 1.7713 (3.4194) loss_scale 256.0000 (228.4072) mem 14939MB [2024-07-25 09:38:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][510/625] eta 0:00:47 lr 0.000147 wd 0.0500 time 
0.4101 (0.4168) data time 0.0007 (0.0019) model time 0.4094 (0.4167) loss 5.7697 (6.6207) grad_norm 2.6861 (3.4232) loss_scale 256.0000 (228.9472) mem 14939MB [2024-07-25 09:38:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][520/625] eta 0:00:43 lr 0.000146 wd 0.0500 time 0.4026 (0.4168) data time 0.0010 (0.0019) model time 0.4015 (0.4167) loss 5.9895 (6.6210) grad_norm 3.0833 (3.4205) loss_scale 256.0000 (229.4664) mem 14939MB [2024-07-25 09:38:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][530/625] eta 0:00:39 lr 0.000146 wd 0.0500 time 0.4010 (0.4172) data time 0.0007 (0.0018) model time 0.4003 (0.4171) loss 6.5248 (6.6168) grad_norm 2.3717 (3.4142) loss_scale 256.0000 (229.9661) mem 14939MB [2024-07-25 09:39:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][540/625] eta 0:00:35 lr 0.000146 wd 0.0500 time 0.4071 (0.4175) data time 0.0009 (0.0018) model time 0.4063 (0.4174) loss 7.4697 (6.6205) grad_norm 5.7745 (3.4255) loss_scale 256.0000 (230.4473) mem 14939MB [2024-07-25 09:39:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][550/625] eta 0:00:31 lr 0.000146 wd 0.0500 time 0.4021 (0.4180) data time 0.0007 (0.0018) model time 0.4015 (0.4180) loss 6.3424 (6.6198) grad_norm 4.6112 (3.4451) loss_scale 256.0000 (230.9111) mem 14939MB [2024-07-25 09:39:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][560/625] eta 0:00:27 lr 0.000146 wd 0.0500 time 0.5922 (0.4193) data time 0.0009 (0.0018) model time 0.5913 (0.4194) loss 7.1895 (6.6206) grad_norm 2.9871 (3.4550) loss_scale 256.0000 (231.3583) mem 14939MB [2024-07-25 09:39:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][570/625] eta 0:00:23 lr 0.000146 wd 0.0500 time 0.5860 (0.4200) data time 0.0009 (0.0018) model time 0.5851 (0.4201) loss 7.3828 (6.6259) grad_norm 2.5543 (3.4525) loss_scale 256.0000 (231.7898) mem 
14939MB [2024-07-25 09:39:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][580/625] eta 0:00:18 lr 0.000146 wd 0.0500 time 0.4057 (0.4200) data time 0.0009 (0.0018) model time 0.4048 (0.4201) loss 5.6215 (6.6254) grad_norm 1.7929 (3.4410) loss_scale 256.0000 (232.2065) mem 14939MB [2024-07-25 09:39:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][590/625] eta 0:00:14 lr 0.000146 wd 0.0500 time 0.3963 (0.4199) data time 0.0008 (0.0018) model time 0.3954 (0.4200) loss 5.3444 (6.6237) grad_norm 2.7202 (3.4309) loss_scale 256.0000 (232.6091) mem 14939MB [2024-07-25 09:39:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][600/625] eta 0:00:10 lr 0.000146 wd 0.0500 time 0.3951 (0.4196) data time 0.0008 (0.0017) model time 0.3942 (0.4196) loss 6.8060 (6.6193) grad_norm 2.4661 (3.4264) loss_scale 256.0000 (232.9983) mem 14939MB [2024-07-25 09:39:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][610/625] eta 0:00:06 lr 0.000146 wd 0.0500 time 0.3962 (0.4192) data time 0.0006 (0.0017) model time 0.3956 (0.4192) loss 6.4108 (6.6177) grad_norm 2.9466 (3.4261) loss_scale 256.0000 (233.3748) mem 14939MB [2024-07-25 09:39:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][620/625] eta 0:00:02 lr 0.000146 wd 0.0500 time 0.3964 (0.4188) data time 0.0004 (0.0017) model time 0.3959 (0.4188) loss 7.7455 (6.6208) grad_norm 3.3642 (3.4180) loss_scale 256.0000 (233.7391) mem 14939MB [2024-07-25 09:39:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 238 training takes 0:04:21 [2024-07-25 09:39:37 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 09:39:37 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 09:39:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.455 (0.455) Loss 0.5454 (0.5454) Acc@1 90.283 (90.283) Acc@5 99.023 (99.023) Mem 14939MB
[2024-07-25 09:39:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.8174 (0.6647) Acc@1 82.715 (87.012) Acc@5 96.777 (97.918) Mem 14939MB
[2024-07-25 09:39:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9482 (0.7742) Acc@1 77.588 (83.947) Acc@5 95.801 (96.901) Mem 14939MB
[2024-07-25 09:39:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.599 Acc@5 96.865
[2024-07-25 09:39:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6%
[2024-07-25 09:39:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.865 (0.865) Loss 0.5386 (0.5386) Acc@1 90.234 (90.234) Acc@5 98.975 (98.975) Mem 14939MB
[2024-07-25 09:39:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.157) Loss 0.8232 (0.6617) Acc@1 82.861 (87.056) Acc@5 96.924 (97.931) Mem 14939MB
[2024-07-25 09:39:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 0.9365 (0.7678) Acc@1 78.467 (84.149) Acc@5 95.459 (96.938) Mem 14939MB
[2024-07-25 09:39:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.745 Acc@5 96.889
[2024-07-25 09:39:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7%
[2024-07-25 09:39:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.75%
[2024-07-25 09:39:43 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 09:39:44 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 09:39:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][0/625] eta 0:08:00 lr 0.000146 wd 0.0500 time 0.7683 (0.7683) data time 0.3884 (0.3884) model time 0.0000 (0.0000) loss 6.3301 (6.3301) grad_norm 1.9688 (1.9688) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:39:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][10/625] eta 0:04:25 lr 0.000146 wd 0.0500 time 0.3981 (0.4312) data time 0.0010 (0.0361) model time 0.0000 (0.0000) loss 6.8508 (6.6241) grad_norm 4.0979 (3.2588) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:39:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][20/625] eta 0:04:11 lr 0.000146 wd 0.0500 time 0.3970 (0.4163) data time 0.0007 (0.0193) model time 0.0000 (0.0000) loss 6.7867 (6.6293) grad_norm 4.6634 (3.3758) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:39:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][30/625] eta 0:04:04 lr 0.000146 wd 0.0500 time 0.3980 (0.4105) data time 0.0007 (0.0133) model time 0.0000 (0.0000) loss 6.8841 (6.6225) grad_norm 1.9145 (3.1589) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:40:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][40/625] eta 0:04:00 lr 0.000146 wd 0.0500 time 0.3957 (0.4114) data time 0.0007 (0.0103) model time 0.0000 (0.0000) loss 7.6941 (6.6314) grad_norm 3.5071 (3.0751) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:40:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][50/625] eta 0:03:55 lr 0.000145 wd 0.0500 time 0.3997 (0.4088) data time 0.0007 (0.0084) model time 0.0000 (0.0000) loss 7.6230 (6.5973) grad_norm 10.1722 (3.4548) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 09:40:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][60/625] eta 0:03:49 lr 0.000145 wd 0.0500 time 0.3968 (0.4070) data time 0.0009 (0.0072) model time 0.3959 (0.3970) loss 6.6935 (6.5457) grad_norm 1.8881 (3.4616) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:40:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][70/625] eta 0:03:45 lr 0.000145 wd 0.0500 time 0.4042 (0.4060) data time 0.0007 (0.0063) model time 0.4036 (0.3981) loss 6.2513 (6.5682) grad_norm 3.7282 (3.4137) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:40:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][80/625] eta 0:03:40 lr 0.000145 wd 0.0500 time 0.3977 (0.4054) data time 0.0006 (0.0056) model time 0.3971 (0.3987) loss 6.5615 (6.5474) grad_norm 2.6704 (3.3329) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:40:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][90/625] eta 0:03:36 lr 0.000145 wd 0.0500 time 0.3986 (0.4047) data time 0.0006 (0.0051) model time 0.3980 (0.3986) loss 5.9275 (6.5052) grad_norm 3.3695 (3.3659) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:40:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][100/625] eta 0:03:32 lr 0.000145 wd 0.0500 time 0.3990 (0.4042) data time 0.0008 (0.0047) model time 0.3981 (0.3987) loss 5.9133 (6.5343) grad_norm 4.5106 (3.3421) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:40:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][110/625] eta 0:03:27 lr 0.000145 wd 0.0500 time 0.3992 (0.4038) data time 0.0008 (0.0043) model time 0.3984 (0.3987) loss 5.3601 (6.5517) grad_norm 3.0031 (3.3181) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:40:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][120/625] eta 0:03:25 lr 0.000145 wd 0.0500 time 
0.6100 (0.4068) data time 0.0008 (0.0041) model time 0.6091 (0.4044) loss 5.5225 (6.5423) grad_norm 8.7053 (3.2916) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:40:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][130/625] eta 0:03:22 lr 0.000145 wd 0.0500 time 0.5955 (0.4094) data time 0.0009 (0.0038) model time 0.5946 (0.4090) loss 6.8129 (6.5430) grad_norm 3.8861 (3.3043) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:40:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][140/625] eta 0:03:20 lr 0.000145 wd 0.0500 time 0.5115 (0.4136) data time 0.0010 (0.0036) model time 0.5106 (0.4155) loss 6.8515 (6.5333) grad_norm 3.4052 (3.3327) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:40:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][150/625] eta 0:03:17 lr 0.000145 wd 0.0500 time 0.5852 (0.4163) data time 0.0009 (0.0034) model time 0.5843 (0.4192) loss 6.3227 (6.5511) grad_norm 1.9810 (3.3405) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:40:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][160/625] eta 0:03:15 lr 0.000145 wd 0.0500 time 0.5900 (0.4194) data time 0.0008 (0.0033) model time 0.5892 (0.4234) loss 5.6541 (6.5213) grad_norm 3.3320 (3.3524) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:40:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][170/625] eta 0:03:11 lr 0.000145 wd 0.0500 time 0.3979 (0.4203) data time 0.0007 (0.0031) model time 0.3973 (0.4243) loss 7.4851 (6.5300) grad_norm 2.5308 (3.4239) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:41:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][180/625] eta 0:03:06 lr 0.000145 wd 0.0500 time 0.4014 (0.4197) data time 0.0007 (0.0030) model time 0.4006 (0.4232) loss 7.1285 (6.5448) grad_norm 2.8853 (3.3914) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 09:41:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][190/625] eta 0:03:02 lr 0.000144 wd 0.0500 time 0.3980 (0.4196) data time 0.0009 (0.0029) model time 0.3971 (0.4227) loss 7.7238 (6.5273) grad_norm 3.0098 (3.3822) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:41:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][200/625] eta 0:02:57 lr 0.000144 wd 0.0500 time 0.3960 (0.4186) data time 0.0009 (0.0028) model time 0.3951 (0.4210) loss 8.7163 (6.5431) grad_norm 3.2470 (3.3464) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:41:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][210/625] eta 0:02:53 lr 0.000144 wd 0.0500 time 0.3983 (0.4178) data time 0.0009 (0.0027) model time 0.3975 (0.4197) loss 5.8570 (6.5310) grad_norm 2.1330 (3.3253) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:41:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][220/625] eta 0:02:48 lr 0.000144 wd 0.0500 time 0.4001 (0.4169) data time 0.0006 (0.0026) model time 0.3995 (0.4185) loss 7.0203 (6.5195) grad_norm 3.2656 (3.3069) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:41:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][230/625] eta 0:02:44 lr 0.000144 wd 0.0500 time 0.4006 (0.4162) data time 0.0008 (0.0026) model time 0.3998 (0.4173) loss 7.2673 (6.5385) grad_norm 2.0420 (3.3004) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:41:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][240/625] eta 0:02:39 lr 0.000144 wd 0.0500 time 0.4000 (0.4155) data time 0.0009 (0.0025) model time 0.3992 (0.4164) loss 7.2342 (6.5307) grad_norm 2.1329 (3.2892) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:41:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][250/625] eta 0:02:35 lr 0.000144 wd 0.0500 time 
0.3990 (0.4149) data time 0.0009 (0.0024) model time 0.3982 (0.4155) loss 5.4966 (6.5275) grad_norm 2.2727 (3.3020) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:41:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][260/625] eta 0:02:31 lr 0.000144 wd 0.0500 time 0.5424 (0.4149) data time 0.0007 (0.0024) model time 0.5417 (0.4154) loss 6.7784 (6.5330) grad_norm 2.3992 (3.3245) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:41:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][270/625] eta 0:02:27 lr 0.000144 wd 0.0500 time 0.3976 (0.4142) data time 0.0007 (0.0023) model time 0.3968 (0.4146) loss 6.2728 (6.5305) grad_norm 2.9834 (3.3054) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:41:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][280/625] eta 0:02:22 lr 0.000144 wd 0.0500 time 0.3981 (0.4137) data time 0.0008 (0.0023) model time 0.3972 (0.4138) loss 5.9850 (6.5180) grad_norm 2.1884 (3.3358) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:41:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][290/625] eta 0:02:18 lr 0.000144 wd 0.0500 time 0.3999 (0.4132) data time 0.0007 (0.0022) model time 0.3992 (0.4132) loss 6.6687 (6.5199) grad_norm 2.4875 (3.3140) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:41:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][300/625] eta 0:02:14 lr 0.000144 wd 0.0500 time 0.3979 (0.4127) data time 0.0010 (0.0022) model time 0.3970 (0.4126) loss 7.4675 (6.5064) grad_norm 2.3218 (3.3043) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:41:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][310/625] eta 0:02:09 lr 0.000144 wd 0.0500 time 0.4068 (0.4123) data time 0.0007 (0.0021) model time 0.4061 (0.4121) loss 7.3454 (6.5100) grad_norm 2.6832 (3.2960) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 09:41:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][320/625] eta 0:02:05 lr 0.000144 wd 0.0500 time 0.4011 (0.4120) data time 0.0006 (0.0021) model time 0.4004 (0.4117) loss 5.4670 (6.5088) grad_norm 2.4735 (3.3462) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:42:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][330/625] eta 0:02:01 lr 0.000144 wd 0.0500 time 0.3972 (0.4116) data time 0.0007 (0.0021) model time 0.3964 (0.4112) loss 8.0464 (6.5173) grad_norm 2.0813 (3.3878) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:42:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][340/625] eta 0:01:57 lr 0.000143 wd 0.0500 time 0.5900 (0.4126) data time 0.0009 (0.0020) model time 0.5891 (0.4124) loss 7.5942 (6.5248) grad_norm 4.0200 (3.3808) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:42:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][350/625] eta 0:01:53 lr 0.000143 wd 0.0500 time 0.5111 (0.4129) data time 0.0007 (0.0020) model time 0.5104 (0.4127) loss 7.6489 (6.5322) grad_norm 4.4847 (3.4002) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:42:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][360/625] eta 0:01:49 lr 0.000143 wd 0.0500 time 0.6018 (0.4140) data time 0.0008 (0.0020) model time 0.6010 (0.4139) loss 6.7204 (6.5326) grad_norm 3.2624 (3.4043) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:42:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][370/625] eta 0:01:45 lr 0.000143 wd 0.0500 time 0.3941 (0.4150) data time 0.0008 (0.0019) model time 0.3933 (0.4151) loss 6.6736 (6.5368) grad_norm 3.3163 (3.4307) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:42:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][380/625] eta 0:01:42 lr 0.000143 wd 0.0500 time 
0.5953 (0.4169) data time 0.0009 (0.0019) model time 0.5944 (0.4173) loss 6.9516 (6.5356) grad_norm 3.3844 (3.4071) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:42:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][390/625] eta 0:01:38 lr 0.000143 wd 0.0500 time 0.3951 (0.4176) data time 0.0009 (0.0019) model time 0.3943 (0.4180) loss 5.2444 (6.5377) grad_norm 3.7337 (3.4093) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:42:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][400/625] eta 0:01:33 lr 0.000143 wd 0.0500 time 0.3976 (0.4176) data time 0.0008 (0.0018) model time 0.3968 (0.4180) loss 6.5778 (6.5452) grad_norm 3.0788 (3.4534) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:42:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][410/625] eta 0:01:29 lr 0.000143 wd 0.0500 time 0.4387 (0.4177) data time 0.0009 (0.0018) model time 0.4378 (0.4181) loss 5.4858 (6.5433) grad_norm 2.3163 (3.4416) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:42:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][420/625] eta 0:01:25 lr 0.000143 wd 0.0500 time 0.3952 (0.4173) data time 0.0007 (0.0018) model time 0.3945 (0.4176) loss 5.2340 (6.5389) grad_norm 2.4973 (3.4402) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:42:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][430/625] eta 0:01:21 lr 0.000143 wd 0.0500 time 0.3972 (0.4168) data time 0.0006 (0.0018) model time 0.3966 (0.4170) loss 6.7277 (6.5357) grad_norm 4.4673 (3.5070) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:42:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][440/625] eta 0:01:17 lr 0.000143 wd 0.0500 time 0.3952 (0.4164) data time 0.0007 (0.0018) model time 0.3945 (0.4165) loss 6.6382 (6.5382) grad_norm 2.8709 (3.5003) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 09:42:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][450/625] eta 0:01:12 lr 0.000143 wd 0.0500 time 0.3918 (0.4160) data time 0.0008 (0.0017) model time 0.3910 (0.4160) loss 7.3377 (6.5409) grad_norm 2.1094 (3.4768) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:42:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][460/625] eta 0:01:08 lr 0.000143 wd 0.0500 time 0.3959 (0.4156) data time 0.0009 (0.0017) model time 0.3950 (0.4155) loss 7.2114 (6.5422) grad_norm 7.0319 (3.4660) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:42:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][470/625] eta 0:01:04 lr 0.000143 wd 0.0500 time 0.3969 (0.4152) data time 0.0007 (0.0017) model time 0.3962 (0.4151) loss 6.4008 (6.5396) grad_norm 2.9651 (3.4605) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:43:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][480/625] eta 0:01:00 lr 0.000143 wd 0.0500 time 0.4011 (0.4150) data time 0.0006 (0.0017) model time 0.4005 (0.4148) loss 6.7105 (6.5353) grad_norm 3.5725 (3.4466) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:43:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][490/625] eta 0:00:56 lr 0.000142 wd 0.0500 time 0.3968 (0.4151) data time 0.0008 (0.0017) model time 0.3960 (0.4149) loss 6.0759 (6.5381) grad_norm 4.3943 (3.5382) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:43:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][500/625] eta 0:00:51 lr 0.000142 wd 0.0500 time 0.3974 (0.4148) data time 0.0009 (0.0016) model time 0.3965 (0.4146) loss 6.6814 (6.5428) grad_norm 3.7784 (3.5415) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:43:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][510/625] eta 0:00:47 lr 0.000142 wd 0.0500 time 
0.3974 (0.4146) data time 0.0007 (0.0017) model time 0.3966 (0.4143) loss 5.5029 (6.5411) grad_norm 7.1839 (3.5458) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:43:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][520/625] eta 0:00:43 lr 0.000142 wd 0.0500 time 0.3965 (0.4142) data time 0.0009 (0.0017) model time 0.3957 (0.4139) loss 7.9618 (6.5488) grad_norm 2.5812 (3.5731) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:43:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][530/625] eta 0:00:39 lr 0.000142 wd 0.0500 time 0.4085 (0.4140) data time 0.0009 (0.0017) model time 0.4076 (0.4136) loss 6.0916 (6.5504) grad_norm 4.0342 (3.6517) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:43:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][540/625] eta 0:00:35 lr 0.000142 wd 0.0500 time 0.3979 (0.4137) data time 0.0007 (0.0016) model time 0.3972 (0.4133) loss 5.2341 (6.5525) grad_norm 3.5747 (3.6354) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:43:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][550/625] eta 0:00:31 lr 0.000142 wd 0.0500 time 0.3988 (0.4134) data time 0.0009 (0.0016) model time 0.3980 (0.4130) loss 6.9495 (6.5492) grad_norm 3.2483 (3.6240) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:43:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][560/625] eta 0:00:26 lr 0.000142 wd 0.0500 time 0.4022 (0.4137) data time 0.0007 (0.0016) model time 0.4015 (0.4132) loss 7.0823 (6.5547) grad_norm 2.8316 (3.6178) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:43:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][570/625] eta 0:00:22 lr 0.000142 wd 0.0500 time 0.5918 (0.4140) data time 0.0008 (0.0016) model time 0.5910 (0.4136) loss 6.8141 (6.5596) grad_norm 2.1427 (3.6010) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 09:43:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][580/625] eta 0:00:18 lr 0.000142 wd 0.0500 time 0.5949 (0.4150) data time 0.0006 (0.0016) model time 0.5942 (0.4147) loss 7.0086 (6.5657) grad_norm 2.1299 (3.5831) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:43:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][590/625] eta 0:00:14 lr 0.000142 wd 0.0500 time 0.5934 (0.4166) data time 0.0008 (0.0016) model time 0.5926 (0.4164) loss 7.3929 (6.5708) grad_norm 2.4071 (3.6199) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:43:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][600/625] eta 0:00:10 lr 0.000142 wd 0.0500 time 0.5999 (0.4177) data time 0.0006 (0.0016) model time 0.5993 (0.4177) loss 6.9557 (6.5679) grad_norm 1.7940 (3.6064) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:43:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][610/625] eta 0:00:06 lr 0.000142 wd 0.0500 time 0.5762 (0.4184) data time 0.0006 (0.0016) model time 0.5755 (0.4183) loss 7.1209 (6.5713) grad_norm 2.7659 (3.6031) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:44:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][620/625] eta 0:00:02 lr 0.000142 wd 0.0500 time 0.3982 (0.4181) data time 0.0005 (0.0016) model time 0.3977 (0.4180) loss 6.2575 (6.5721) grad_norm 2.2583 (3.6105) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:44:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 239 training takes 0:04:21 [2024-07-25 09:44:05 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 09:44:06 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 09:44:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.459 (0.459) Loss 0.5308 (0.5308) Acc@1 90.527 (90.527) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 09:44:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.8267 (0.6582) Acc@1 82.666 (87.167) Acc@5 96.777 (97.905) Mem 14939MB [2024-07-25 09:44:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9277 (0.7670) Acc@1 78.955 (84.170) Acc@5 95.361 (96.919) Mem 14939MB [2024-07-25 09:44:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.785 Acc@5 96.883 [2024-07-25 09:44:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-07-25 09:44:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.79% [2024-07-25 09:44:09 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 09:44:09 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 09:44:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.446 (0.446) Loss 0.5381 (0.5381) Acc@1 90.283 (90.283) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 09:44:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.118) Loss 0.8218 (0.6613) Acc@1 82.861 (87.105) Acc@5 96.826 (97.931) Mem 14939MB [2024-07-25 09:44:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9355 (0.7674) Acc@1 78.467 (84.187) Acc@5 95.459 (96.933) Mem 14939MB [2024-07-25 09:44:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.787 Acc@5 96.885 [2024-07-25 09:44:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-07-25 09:44:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.79% [2024-07-25 09:44:12 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 09:44:13 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 09:44:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][0/625] eta 0:08:12 lr 0.000142 wd 0.0500 time 0.7883 (0.7883) data time 0.4141 (0.4141) model time 0.0000 (0.0000) loss 7.0613 (7.0613) grad_norm 1.9787 (1.9787) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:44:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][10/625] eta 0:04:27 lr 0.000142 wd 0.0500 time 0.3983 (0.4346) data time 0.0007 (0.0385) model time 0.0000 (0.0000) loss 5.8661 (6.4843) grad_norm 2.1536 (2.2140) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:44:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][20/625] eta 0:04:12 lr 0.000141 wd 0.0500 time 0.4005 (0.4180) data time 0.0011 (0.0206) model time 0.0000 (0.0000) loss 7.3339 (6.6925) grad_norm 2.8538 (2.5774) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:44:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][30/625] eta 0:04:05 lr 0.000141 wd 0.0500 time 0.3983 (0.4121) data time 0.0006 (0.0142) model time 0.0000 (0.0000) loss 6.6885 (6.6695) grad_norm 3.5977 (2.9032) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:44:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][40/625] eta 0:03:59 lr 0.000141 wd 0.0500 time 0.4018 (0.4089) data time 0.0006 (0.0110) model time 0.0000 (0.0000) loss 6.6531 (6.6177) grad_norm 2.8655 (2.8636) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:44:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][50/625] eta 0:03:54 lr 0.000141 wd 0.0500 time 0.3934 (0.4070) data time 0.0006 (0.0090) model time 0.0000 (0.0000) loss 6.1811 (6.6286) grad_norm 2.2739 (2.7871) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:44:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][60/625] eta 0:03:49 lr 0.000141 wd 0.0500 time 0.3970 (0.4057) 
data time 0.0008 (0.0077) model time 0.3962 (0.3974) loss 6.4881 (6.5985) grad_norm 2.9381 (2.7080) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:44:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][70/625] eta 0:03:45 lr 0.000141 wd 0.0500 time 0.4002 (0.4072) data time 0.0006 (0.0068) model time 0.3995 (0.4065) loss 5.0624 (6.5769) grad_norm 2.8843 (2.7494) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:44:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][80/625] eta 0:03:41 lr 0.000141 wd 0.0500 time 0.3993 (0.4061) data time 0.0010 (0.0060) model time 0.3983 (0.4035) loss 7.4565 (6.5745) grad_norm 2.2062 (2.7688) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:44:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][90/625] eta 0:03:36 lr 0.000141 wd 0.0500 time 0.3956 (0.4053) data time 0.0009 (0.0055) model time 0.3947 (0.4021) loss 7.9431 (6.5831) grad_norm 1.9376 (2.7222) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:44:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][100/625] eta 0:03:32 lr 0.000141 wd 0.0500 time 0.3981 (0.4051) data time 0.0008 (0.0050) model time 0.3973 (0.4021) loss 5.5330 (6.5514) grad_norm 4.5211 (2.7655) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:44:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][110/625] eta 0:03:28 lr 0.000141 wd 0.0500 time 0.3959 (0.4051) data time 0.0006 (0.0046) model time 0.3953 (0.4025) loss 7.0829 (6.5495) grad_norm 2.3944 (2.7912) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:45:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][120/625] eta 0:03:24 lr 0.000141 wd 0.0500 time 0.4069 (0.4046) data time 0.0008 (0.0043) model time 0.4060 (0.4020) loss 6.2247 (6.5676) grad_norm 2.7480 (2.8143) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
09:45:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][130/625] eta 0:03:20 lr 0.000141 wd 0.0500 time 0.3950 (0.4041) data time 0.0009 (0.0041) model time 0.3941 (0.4013) loss 7.0810 (6.5665) grad_norm 2.3447 (2.8023) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:45:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][140/625] eta 0:03:16 lr 0.000141 wd 0.0500 time 0.3941 (0.4045) data time 0.0007 (0.0038) model time 0.3934 (0.4022) loss 5.9483 (6.5834) grad_norm 1.9297 (2.8265) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:45:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][150/625] eta 0:03:12 lr 0.000141 wd 0.0500 time 0.5808 (0.4055) data time 0.0010 (0.0036) model time 0.5798 (0.4037) loss 6.9371 (6.5678) grad_norm 4.0848 (2.8533) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:45:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][160/625] eta 0:03:10 lr 0.000141 wd 0.0500 time 0.4153 (0.4086) data time 0.0007 (0.0035) model time 0.4146 (0.4084) loss 5.0255 (6.5251) grad_norm 2.5254 (2.8528) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:45:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][170/625] eta 0:03:06 lr 0.000140 wd 0.0500 time 0.3954 (0.4107) data time 0.0009 (0.0033) model time 0.3945 (0.4114) loss 7.2376 (6.5621) grad_norm 2.3008 (2.8869) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:45:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][180/625] eta 0:03:04 lr 0.000140 wd 0.0500 time 0.6047 (0.4155) data time 0.0007 (0.0032) model time 0.6040 (0.4179) loss 7.0072 (6.5779) grad_norm 3.4337 (2.9006) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:45:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][190/625] eta 0:03:02 lr 0.000140 wd 0.0500 time 0.3955 (0.4187) data 
time 0.0009 (0.0031) model time 0.3947 (0.4220) loss 7.2325 (6.5685) grad_norm 2.2206 (2.9430) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:45:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][200/625] eta 0:02:59 lr 0.000140 wd 0.0500 time 0.3956 (0.4222) data time 0.0009 (0.0030) model time 0.3947 (0.4264) loss 6.0552 (6.5493) grad_norm 2.1649 (2.9294) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:45:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][210/625] eta 0:02:55 lr 0.000140 wd 0.0500 time 0.3979 (0.4228) data time 0.0006 (0.0029) model time 0.3973 (0.4269) loss 5.8870 (6.5497) grad_norm 2.5452 (2.9835) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:45:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][220/625] eta 0:02:51 lr 0.000140 wd 0.0500 time 0.4043 (0.4225) data time 0.0008 (0.0028) model time 0.4035 (0.4262) loss 6.3060 (6.5444) grad_norm 2.9792 (2.9994) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:45:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][230/625] eta 0:02:46 lr 0.000140 wd 0.0500 time 0.3921 (0.4214) data time 0.0007 (0.0027) model time 0.3914 (0.4246) loss 7.2161 (6.5338) grad_norm 4.2225 (3.0155) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:45:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][240/625] eta 0:02:41 lr 0.000140 wd 0.0500 time 0.3978 (0.4206) data time 0.0007 (0.0026) model time 0.3972 (0.4233) loss 7.4183 (6.5472) grad_norm 2.6915 (3.1045) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:45:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][250/625] eta 0:02:37 lr 0.000140 wd 0.0500 time 0.3972 (0.4196) data time 0.0009 (0.0026) model time 0.3962 (0.4220) loss 7.2910 (6.5477) grad_norm 6.9695 (3.1053) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
09:46:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][260/625] eta 0:02:32 lr 0.000140 wd 0.0500 time 0.3933 (0.4189) data time 0.0006 (0.0025) model time 0.3926 (0.4209) loss 6.7342 (6.5532) grad_norm 3.1460 (3.1007) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:46:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][270/625] eta 0:02:28 lr 0.000140 wd 0.0500 time 0.3987 (0.4181) data time 0.0008 (0.0024) model time 0.3979 (0.4198) loss 6.6222 (6.5449) grad_norm 2.0410 (3.0881) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:46:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][280/625] eta 0:02:24 lr 0.000140 wd 0.0500 time 0.4348 (0.4175) data time 0.0006 (0.0024) model time 0.4342 (0.4190) loss 7.7091 (6.5493) grad_norm 2.9868 (3.0870) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:46:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][290/625] eta 0:02:19 lr 0.000140 wd 0.0500 time 0.4010 (0.4170) data time 0.0009 (0.0024) model time 0.4001 (0.4181) loss 6.0271 (6.5502) grad_norm 5.7944 (3.1303) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:46:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][300/625] eta 0:02:15 lr 0.000140 wd 0.0500 time 0.3988 (0.4165) data time 0.0008 (0.0024) model time 0.3980 (0.4174) loss 6.3140 (6.5451) grad_norm 2.8094 (3.1335) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:46:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][310/625] eta 0:02:11 lr 0.000140 wd 0.0500 time 0.4012 (0.4160) data time 0.0008 (0.0023) model time 0.4004 (0.4167) loss 5.8188 (6.5469) grad_norm 3.3818 (3.1599) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:46:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][320/625] eta 0:02:06 lr 0.000139 wd 0.0500 time 0.3980 (0.4155) data 
time 0.0008 (0.0023) model time 0.3971 (0.4161) loss 8.0701 (6.5489) grad_norm 51.8274 (3.3253) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:46:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][330/625] eta 0:02:02 lr 0.000139 wd 0.0500 time 0.3976 (0.4152) data time 0.0007 (0.0023) model time 0.3969 (0.4156) loss 6.7207 (6.5636) grad_norm 4.6390 (3.3296) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:46:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][340/625] eta 0:01:58 lr 0.000139 wd 0.0500 time 0.3999 (0.4148) data time 0.0006 (0.0022) model time 0.3993 (0.4152) loss 7.4212 (6.5634) grad_norm 2.4424 (3.3167) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:46:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][350/625] eta 0:01:53 lr 0.000139 wd 0.0500 time 0.3959 (0.4144) data time 0.0006 (0.0022) model time 0.3952 (0.4146) loss 6.2674 (6.5743) grad_norm 2.3589 (3.2900) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:46:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][360/625] eta 0:01:49 lr 0.000139 wd 0.0500 time 0.3991 (0.4141) data time 0.0006 (0.0023) model time 0.3985 (0.4141) loss 7.8849 (6.5790) grad_norm 2.4867 (3.2753) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:46:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][370/625] eta 0:01:45 lr 0.000139 wd 0.0500 time 0.6041 (0.4145) data time 0.0006 (0.0023) model time 0.6035 (0.4145) loss 7.0587 (6.5842) grad_norm 3.0664 (3.2782) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:46:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][380/625] eta 0:01:41 lr 0.000139 wd 0.0500 time 0.5337 (0.4149) data time 0.0006 (0.0022) model time 0.5331 (0.4149) loss 6.2479 (6.5803) grad_norm 3.5581 (3.2574) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
09:46:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][390/625] eta 0:01:37 lr 0.000139 wd 0.0500 time 0.3992 (0.4150) data time 0.0008 (0.0022) model time 0.3984 (0.4150) loss 6.9121 (6.5759) grad_norm 2.3290 (3.2279) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:47:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][400/625] eta 0:01:33 lr 0.000139 wd 0.0500 time 0.3932 (0.4169) data time 0.0009 (0.0022) model time 0.3923 (0.4172) loss 6.8463 (6.5754) grad_norm 2.2912 (3.2163) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:47:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][410/625] eta 0:01:29 lr 0.000139 wd 0.0500 time 0.3970 (0.4182) data time 0.0010 (0.0022) model time 0.3960 (0.4185) loss 5.7175 (6.5715) grad_norm 11.5475 (3.2168) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:47:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][420/625] eta 0:01:26 lr 0.000139 wd 0.0500 time 0.5928 (0.4202) data time 0.0007 (0.0022) model time 0.5921 (0.4207) loss 6.7560 (6.5607) grad_norm 3.2781 (3.2258) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:47:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][430/625] eta 0:01:21 lr 0.000139 wd 0.0500 time 0.4072 (0.4201) data time 0.0006 (0.0021) model time 0.4066 (0.4206) loss 5.3769 (6.5559) grad_norm 2.8653 (3.2216) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:47:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][440/625] eta 0:01:17 lr 0.000139 wd 0.0500 time 0.3967 (0.4198) data time 0.0007 (0.0021) model time 0.3960 (0.4202) loss 7.7061 (6.5592) grad_norm 2.4383 (3.2117) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:47:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][450/625] eta 0:01:13 lr 0.000139 wd 0.0500 time 0.4011 (0.4196) data 
time 0.0006 (0.0022) model time 0.4005 (0.4199) loss 6.8780 (6.5553) grad_norm 4.1735 (3.2412) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:47:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][460/625] eta 0:01:09 lr 0.000139 wd 0.0500 time 0.3952 (0.4191) data time 0.0009 (0.0022) model time 0.3943 (0.4193) loss 6.8718 (6.5501) grad_norm 3.7970 (3.2426) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:47:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][470/625] eta 0:01:04 lr 0.000138 wd 0.0500 time 0.3950 (0.4187) data time 0.0007 (0.0022) model time 0.3943 (0.4188) loss 7.6070 (6.5474) grad_norm 2.0327 (3.2451) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:47:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][480/625] eta 0:01:00 lr 0.000138 wd 0.0500 time 0.3958 (0.4183) data time 0.0007 (0.0021) model time 0.3952 (0.4183) loss 7.7391 (6.5465) grad_norm 3.8127 (3.2648) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:47:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][490/625] eta 0:00:56 lr 0.000138 wd 0.0500 time 0.4005 (0.4179) data time 0.0006 (0.0021) model time 0.3998 (0.4178) loss 6.7450 (6.5454) grad_norm 3.2168 (3.2493) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:47:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][500/625] eta 0:00:52 lr 0.000138 wd 0.0500 time 0.3965 (0.4175) data time 0.0007 (0.0021) model time 0.3958 (0.4173) loss 6.5248 (6.5459) grad_norm 2.1662 (3.2510) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:47:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][510/625] eta 0:00:47 lr 0.000138 wd 0.0500 time 0.4010 (0.4171) data time 0.0006 (0.0021) model time 0.4004 (0.4169) loss 6.3021 (6.5457) grad_norm 2.3826 (3.2414) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
09:47:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][520/625] eta 0:00:43 lr 0.000138 wd 0.0500 time 0.3965 (0.4168) data time 0.0008 (0.0020) model time 0.3956 (0.4165) loss 6.1562 (6.5443) grad_norm 2.4422 (3.2329) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:47:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][530/625] eta 0:00:39 lr 0.000138 wd 0.0500 time 0.3956 (0.4164) data time 0.0006 (0.0020) model time 0.3950 (0.4162) loss 6.1861 (6.5408) grad_norm 2.3301 (3.2248) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:47:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][540/625] eta 0:00:35 lr 0.000138 wd 0.0500 time 0.3950 (0.4161) data time 0.0009 (0.0020) model time 0.3942 (0.4157) loss 5.9603 (6.5350) grad_norm 3.1044 (3.2275) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:48:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][550/625] eta 0:00:31 lr 0.000138 wd 0.0500 time 0.3976 (0.4158) data time 0.0008 (0.0020) model time 0.3968 (0.4154) loss 6.3808 (6.5360) grad_norm 3.8435 (3.2728) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:48:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][560/625] eta 0:00:27 lr 0.000138 wd 0.0500 time 0.4003 (0.4155) data time 0.0007 (0.0020) model time 0.3996 (0.4151) loss 6.2664 (6.5340) grad_norm 3.8567 (3.2880) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:48:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][570/625] eta 0:00:22 lr 0.000138 wd 0.0500 time 0.3966 (0.4152) data time 0.0006 (0.0019) model time 0.3960 (0.4148) loss 7.9792 (6.5328) grad_norm 2.2339 (3.2948) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:48:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][580/625] eta 0:00:18 lr 0.000138 wd 0.0500 time 0.3960 (0.4150) data 
time 0.0009 (0.0019) model time 0.3952 (0.4145) loss 7.1587 (6.5370) grad_norm 2.3956 (3.2984) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:48:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][590/625] eta 0:00:14 lr 0.000138 wd 0.0500 time 0.3951 (0.4150) data time 0.0007 (0.0019) model time 0.3944 (0.4146) loss 5.5527 (6.5401) grad_norm 2.3584 (3.2918) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:48:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][600/625] eta 0:00:10 lr 0.000138 wd 0.0500 time 0.6268 (0.4159) data time 0.0007 (0.0019) model time 0.6262 (0.4155) loss 7.8614 (6.5491) grad_norm 2.5199 (3.2871) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:48:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][610/625] eta 0:00:06 lr 0.000138 wd 0.0500 time 0.3960 (0.4159) data time 0.0004 (0.0019) model time 0.3956 (0.4155) loss 7.4035 (6.5528) grad_norm 2.7247 (3.2773) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:48:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][620/625] eta 0:00:02 lr 0.000137 wd 0.0500 time 0.3972 (0.4166) data time 0.0006 (0.0019) model time 0.3967 (0.4163) loss 6.1121 (6.5559) grad_norm 2.3235 (3.2883) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:48:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 240 training takes 0:04:20 [2024-07-25 09:48:34 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 09:48:35 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 09:48:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.450 (0.450) Loss 0.5605 (0.5605) Acc@1 89.600 (89.600) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 09:48:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.8242 (0.6707) Acc@1 82.617 (87.038) Acc@5 96.973 (97.940) Mem 14939MB [2024-07-25 09:48:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9365 (0.7780) Acc@1 78.418 (84.087) Acc@5 95.605 (96.959) Mem 14939MB [2024-07-25 09:48:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.687 Acc@5 96.917 [2024-07-25 09:48:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-07-25 09:48:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.772 (0.772) Loss 0.5381 (0.5381) Acc@1 90.283 (90.283) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 09:48:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.154) Loss 0.8218 (0.6612) Acc@1 82.861 (87.118) Acc@5 96.826 (97.936) Mem 14939MB [2024-07-25 09:48:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.121) Loss 0.9351 (0.7672) Acc@1 78.467 (84.198) Acc@5 95.459 (96.933) Mem 14939MB [2024-07-25 09:48:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.797 Acc@5 96.885 [2024-07-25 09:48:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-07-25 09:48:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.80% [2024-07-25 09:48:40 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 09:48:41 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 09:48:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][0/625] eta 0:08:25 lr 0.000137 wd 0.0500 time 0.8090 (0.8090) data time 0.4278 (0.4278) model time 0.0000 (0.0000) loss 7.5095 (7.5095) grad_norm 2.8197 (2.8197) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:48:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][10/625] eta 0:05:01 lr 0.000137 wd 0.0500 time 0.4048 (0.4909) data time 0.0006 (0.0396) model time 0.0000 (0.0000) loss 6.3444 (6.8480) grad_norm 2.1686 (7.5143) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:48:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][20/625] eta 0:04:39 lr 0.000137 wd 0.0500 time 0.4072 (0.4615) data time 0.0009 (0.0211) model time 0.0000 (0.0000) loss 5.8293 (6.6880) grad_norm 5.5694 (6.0852) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:48:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][30/625] eta 0:04:25 lr 0.000137 wd 0.0500 time 0.3974 (0.4459) data time 0.0008 (0.0146) model time 0.0000 (0.0000) loss 6.2431 (6.5710) grad_norm 2.1505 (4.9522) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:48:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][40/625] eta 0:04:15 lr 0.000137 wd 0.0500 time 0.4044 (0.4373) data time 0.0006 (0.0112) model time 0.0000 (0.0000) loss 7.1265 (6.6082) grad_norm 6.8415 (4.8581) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:49:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][50/625] eta 0:04:06 lr 0.000137 wd 0.0500 time 0.3946 (0.4294) data time 0.0006 (0.0092) model time 0.0000 (0.0000) loss 7.4660 (6.6655) grad_norm 2.7472 (4.5086) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 09:49:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][60/625] eta 0:03:59 lr 0.000137 wd 0.0500 time 0.3991 (0.4245) data time 0.0008 (0.0078) model time 0.3982 (0.3987) loss 5.7678 (6.6169) grad_norm 2.8728 (4.5683) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:49:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][70/625] eta 0:03:53 lr 0.000137 wd 0.0500 time 0.4031 (0.4208) data time 0.0007 (0.0068) model time 0.4024 (0.3980) loss 7.2391 (6.6261) grad_norm 2.0478 (4.4442) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:49:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][80/625] eta 0:03:47 lr 0.000137 wd 0.0500 time 0.3943 (0.4180) data time 0.0007 (0.0061) model time 0.3936 (0.3978) loss 7.4595 (6.6480) grad_norm 2.5841 (4.4283) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:49:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][90/625] eta 0:03:42 lr 0.000137 wd 0.0500 time 0.3956 (0.4159) data time 0.0007 (0.0055) model time 0.3950 (0.3978) loss 6.1783 (6.6592) grad_norm 2.8896 (4.3116) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:49:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][100/625] eta 0:03:37 lr 0.000137 wd 0.0500 time 0.3975 (0.4141) data time 0.0006 (0.0051) model time 0.3969 (0.3976) loss 6.8478 (6.6803) grad_norm 4.2537 (4.3532) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:49:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][110/625] eta 0:03:32 lr 0.000137 wd 0.0500 time 0.4035 (0.4129) data time 0.0007 (0.0047) model time 0.4028 (0.3980) loss 5.3757 (6.6604) grad_norm 2.4093 (4.2613) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:49:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][120/625] eta 0:03:27 lr 0.000137 wd 0.0500 time 
0.3973 (0.4117) data time 0.0009 (0.0044) model time 0.3964 (0.3979) loss 6.6902 (6.6700) grad_norm 3.1785 (4.1234) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:49:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][130/625] eta 0:03:23 lr 0.000137 wd 0.0500 time 0.4001 (0.4107) data time 0.0007 (0.0041) model time 0.3994 (0.3979) loss 6.3418 (6.6573) grad_norm 3.5957 (4.0286) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:49:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][140/625] eta 0:03:18 lr 0.000137 wd 0.0500 time 0.3930 (0.4100) data time 0.0007 (0.0039) model time 0.3922 (0.3981) loss 6.3865 (6.6581) grad_norm 2.3101 (3.9066) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:49:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][150/625] eta 0:03:14 lr 0.000136 wd 0.0500 time 0.3979 (0.4091) data time 0.0009 (0.0037) model time 0.3970 (0.3979) loss 6.2108 (6.6675) grad_norm 2.9688 (3.8262) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:49:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][160/625] eta 0:03:09 lr 0.000136 wd 0.0500 time 0.3976 (0.4084) data time 0.0007 (0.0035) model time 0.3970 (0.3978) loss 6.6989 (6.6613) grad_norm 4.4555 (3.7704) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:49:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][170/625] eta 0:03:05 lr 0.000136 wd 0.0500 time 0.3978 (0.4078) data time 0.0007 (0.0033) model time 0.3971 (0.3977) loss 6.6256 (6.6593) grad_norm 3.3423 (3.7553) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:49:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][180/625] eta 0:03:01 lr 0.000136 wd 0.0500 time 0.4021 (0.4073) data time 0.0009 (0.0032) model time 0.4012 (0.3977) loss 6.8474 (6.6569) grad_norm 2.3112 (3.7193) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 09:49:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][190/625] eta 0:02:58 lr 0.000136 wd 0.0500 time 0.3997 (0.4097) data time 0.0009 (0.0031) model time 0.3987 (0.4016) loss 6.5026 (6.6734) grad_norm 2.9448 (3.6681) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:50:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][200/625] eta 0:02:54 lr 0.000136 wd 0.0500 time 0.3992 (0.4111) data time 0.0006 (0.0030) model time 0.3986 (0.4040) loss 6.3768 (6.6628) grad_norm 2.4338 (3.6579) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:50:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][210/625] eta 0:02:51 lr 0.000136 wd 0.0500 time 0.3988 (0.4131) data time 0.0007 (0.0029) model time 0.3981 (0.4071) loss 6.6145 (6.6652) grad_norm 1.9108 (3.7005) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:50:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][220/625] eta 0:02:48 lr 0.000136 wd 0.0500 time 0.5627 (0.4151) data time 0.0008 (0.0028) model time 0.5619 (0.4100) loss 7.7845 (6.6638) grad_norm 2.3275 (3.6926) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:50:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][230/625] eta 0:02:44 lr 0.000136 wd 0.0500 time 0.4125 (0.4174) data time 0.0009 (0.0027) model time 0.4116 (0.4131) loss 6.0456 (6.6667) grad_norm 3.0819 (3.6886) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:50:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][240/625] eta 0:02:40 lr 0.000136 wd 0.0500 time 0.4004 (0.4181) data time 0.0009 (0.0026) model time 0.3996 (0.4142) loss 6.7815 (6.6589) grad_norm 2.9794 (3.6618) loss_scale 512.0000 (264.4979) mem 14939MB [2024-07-25 09:50:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][250/625] eta 0:02:36 lr 0.000136 wd 0.0500 time 
0.4001 (0.4178) data time 0.0007 (0.0025) model time 0.3994 (0.4140) loss 6.4378 (6.6762) grad_norm 2.3580 (3.6758) loss_scale 512.0000 (274.3586) mem 14939MB [2024-07-25 09:50:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][260/625] eta 0:02:32 lr 0.000136 wd 0.0500 time 0.3976 (0.4176) data time 0.0006 (0.0025) model time 0.3969 (0.4139) loss 7.1260 (6.6674) grad_norm 3.6774 (3.6548) loss_scale 512.0000 (283.4636) mem 14939MB [2024-07-25 09:50:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][270/625] eta 0:02:28 lr 0.000136 wd 0.0500 time 0.3995 (0.4170) data time 0.0006 (0.0024) model time 0.3989 (0.4133) loss 6.5910 (6.6679) grad_norm 8.7697 (3.6455) loss_scale 512.0000 (291.8967) mem 14939MB [2024-07-25 09:50:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][280/625] eta 0:02:23 lr 0.000136 wd 0.0500 time 0.4058 (0.4165) data time 0.0007 (0.0024) model time 0.4051 (0.4128) loss 6.4894 (6.6666) grad_norm 3.6939 (3.6128) loss_scale 512.0000 (299.7295) mem 14939MB [2024-07-25 09:50:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][290/625] eta 0:02:19 lr 0.000136 wd 0.0500 time 0.4010 (0.4160) data time 0.0007 (0.0023) model time 0.4003 (0.4123) loss 6.7458 (6.6821) grad_norm 2.9124 (3.6587) loss_scale 512.0000 (307.0241) mem 14939MB [2024-07-25 09:50:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][300/625] eta 0:02:15 lr 0.000136 wd 0.0500 time 0.4011 (0.4154) data time 0.0010 (0.0023) model time 0.4002 (0.4117) loss 7.7418 (6.6838) grad_norm 4.1059 (3.6378) loss_scale 512.0000 (313.8339) mem 14939MB [2024-07-25 09:50:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][310/625] eta 0:02:10 lr 0.000135 wd 0.0500 time 0.3998 (0.4150) data time 0.0008 (0.0022) model time 0.3990 (0.4113) loss 6.5415 (6.6761) grad_norm 2.1433 (3.6279) loss_scale 512.0000 (320.2058) mem 
14939MB [2024-07-25 09:50:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][320/625] eta 0:02:06 lr 0.000135 wd 0.0500 time 0.4004 (0.4145) data time 0.0007 (0.0022) model time 0.3997 (0.4108) loss 6.9303 (6.6850) grad_norm 3.8209 (3.6061) loss_scale 512.0000 (326.1807) mem 14939MB [2024-07-25 09:50:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][330/625] eta 0:02:02 lr 0.000135 wd 0.0500 time 0.3989 (0.4141) data time 0.0007 (0.0021) model time 0.3982 (0.4105) loss 6.9246 (6.6818) grad_norm 2.4865 (3.6035) loss_scale 512.0000 (331.7946) mem 14939MB [2024-07-25 09:51:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][340/625] eta 0:01:57 lr 0.000135 wd 0.0500 time 0.3920 (0.4137) data time 0.0008 (0.0021) model time 0.3912 (0.4101) loss 5.8877 (6.6918) grad_norm 3.7277 (3.5882) loss_scale 512.0000 (337.0792) mem 14939MB [2024-07-25 09:51:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][350/625] eta 0:01:53 lr 0.000135 wd 0.0500 time 0.3982 (0.4133) data time 0.0010 (0.0021) model time 0.3972 (0.4097) loss 7.1882 (6.6916) grad_norm 5.5909 (3.5943) loss_scale 512.0000 (342.0627) mem 14939MB [2024-07-25 09:51:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][360/625] eta 0:01:49 lr 0.000135 wd 0.0500 time 0.4001 (0.4130) data time 0.0009 (0.0020) model time 0.3991 (0.4094) loss 6.7490 (6.6871) grad_norm 3.1189 (3.6080) loss_scale 512.0000 (346.7701) mem 14939MB [2024-07-25 09:51:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][370/625] eta 0:01:45 lr 0.000135 wd 0.0500 time 0.3974 (0.4126) data time 0.0009 (0.0020) model time 0.3966 (0.4090) loss 6.8148 (6.6842) grad_norm 3.2184 (3.6199) loss_scale 512.0000 (351.2237) mem 14939MB [2024-07-25 09:51:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][380/625] eta 0:01:40 lr 0.000135 wd 0.0500 time 
0.3963 (0.4122) data time 0.0006 (0.0020) model time 0.3957 (0.4087) loss 6.0364 (6.6841) grad_norm 2.8964 (3.5965) loss_scale 512.0000 (355.4436) mem 14939MB [2024-07-25 09:51:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][390/625] eta 0:01:36 lr 0.000135 wd 0.0500 time 0.4011 (0.4119) data time 0.0009 (0.0020) model time 0.4002 (0.4085) loss 6.2396 (6.6789) grad_norm 2.2641 (3.5702) loss_scale 512.0000 (359.4476) mem 14939MB [2024-07-25 09:51:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][400/625] eta 0:01:32 lr 0.000135 wd 0.0500 time 0.4011 (0.4116) data time 0.0007 (0.0019) model time 0.4004 (0.4081) loss 5.9615 (6.6814) grad_norm 2.5309 (3.5936) loss_scale 512.0000 (363.2519) mem 14939MB [2024-07-25 09:51:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][410/625] eta 0:01:28 lr 0.000135 wd 0.0500 time 0.6061 (0.4129) data time 0.0007 (0.0019) model time 0.6054 (0.4097) loss 5.5901 (6.6798) grad_norm 2.8551 (3.6459) loss_scale 512.0000 (366.8710) mem 14939MB [2024-07-25 09:51:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][420/625] eta 0:01:24 lr 0.000135 wd 0.0500 time 0.3986 (0.4129) data time 0.0009 (0.0019) model time 0.3977 (0.4098) loss 7.4320 (6.6795) grad_norm 3.0445 (3.6533) loss_scale 512.0000 (370.3183) mem 14939MB [2024-07-25 09:51:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][430/625] eta 0:01:20 lr 0.000135 wd 0.0500 time 0.3983 (0.4142) data time 0.0007 (0.0019) model time 0.3977 (0.4113) loss 5.3143 (6.6697) grad_norm 2.2790 (3.6530) loss_scale 512.0000 (373.6056) mem 14939MB [2024-07-25 09:51:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][440/625] eta 0:01:16 lr 0.000135 wd 0.0500 time 0.5930 (0.4146) data time 0.0008 (0.0018) model time 0.5922 (0.4118) loss 7.5801 (6.6672) grad_norm 2.5353 (3.6684) loss_scale 512.0000 (376.7438) mem 
14939MB [2024-07-25 09:51:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][450/625] eta 0:01:12 lr 0.000135 wd 0.0500 time 0.3998 (0.4155) data time 0.0006 (0.0018) model time 0.3992 (0.4128) loss 7.2772 (6.6631) grad_norm 4.5139 (3.6737) loss_scale 512.0000 (379.7428) mem 14939MB [2024-07-25 09:51:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][460/625] eta 0:01:08 lr 0.000134 wd 0.0500 time 0.3949 (0.4165) data time 0.0007 (0.0018) model time 0.3942 (0.4140) loss 7.6536 (6.6627) grad_norm 1.9279 (3.6675) loss_scale 512.0000 (382.6117) mem 14939MB [2024-07-25 09:51:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][470/625] eta 0:01:04 lr 0.000134 wd 0.0500 time 0.3970 (0.4166) data time 0.0007 (0.0018) model time 0.3964 (0.4141) loss 6.5787 (6.6689) grad_norm 5.0068 (3.7039) loss_scale 512.0000 (385.3588) mem 14939MB [2024-07-25 09:52:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][480/625] eta 0:01:00 lr 0.000134 wd 0.0500 time 0.3985 (0.4165) data time 0.0009 (0.0017) model time 0.3976 (0.4140) loss 6.4666 (6.6672) grad_norm 3.9394 (inf) loss_scale 256.0000 (385.8628) mem 14939MB [2024-07-25 09:52:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][490/625] eta 0:00:56 lr 0.000134 wd 0.0500 time 0.4010 (0.4161) data time 0.0010 (0.0017) model time 0.4000 (0.4137) loss 6.5123 (6.6672) grad_norm 2.3134 (inf) loss_scale 256.0000 (383.2179) mem 14939MB [2024-07-25 09:52:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][500/625] eta 0:00:51 lr 0.000134 wd 0.0500 time 0.3993 (0.4157) data time 0.0006 (0.0017) model time 0.3986 (0.4133) loss 6.5139 (6.6637) grad_norm 2.2429 (inf) loss_scale 256.0000 (380.6786) mem 14939MB [2024-07-25 09:52:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][510/625] eta 0:00:47 lr 0.000134 wd 0.0500 time 0.3963 
(0.4154) data time 0.0006 (0.0017) model time 0.3957 (0.4130) loss 6.3665 (6.6676) grad_norm 2.8568 (inf) loss_scale 256.0000 (378.2387) mem 14939MB [2024-07-25 09:52:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][520/625] eta 0:00:43 lr 0.000134 wd 0.0500 time 0.4156 (0.4151) data time 0.0006 (0.0017) model time 0.4150 (0.4127) loss 7.7997 (6.6681) grad_norm 2.7758 (inf) loss_scale 256.0000 (375.8925) mem 14939MB [2024-07-25 09:52:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][530/625] eta 0:00:39 lr 0.000134 wd 0.0500 time 0.4004 (0.4147) data time 0.0009 (0.0017) model time 0.3995 (0.4123) loss 6.9485 (6.6618) grad_norm 4.1716 (inf) loss_scale 256.0000 (373.6347) mem 14939MB [2024-07-25 09:52:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][540/625] eta 0:00:35 lr 0.000134 wd 0.0500 time 0.4043 (0.4145) data time 0.0009 (0.0017) model time 0.4034 (0.4121) loss 6.3703 (6.6599) grad_norm 2.8061 (inf) loss_scale 256.0000 (371.4603) mem 14939MB [2024-07-25 09:52:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][550/625] eta 0:00:31 lr 0.000134 wd 0.0500 time 0.3998 (0.4142) data time 0.0006 (0.0016) model time 0.3992 (0.4118) loss 6.1841 (6.6535) grad_norm 2.1362 (inf) loss_scale 256.0000 (369.3648) mem 14939MB [2024-07-25 09:52:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][560/625] eta 0:00:26 lr 0.000134 wd 0.0500 time 0.3938 (0.4139) data time 0.0009 (0.0016) model time 0.3929 (0.4115) loss 7.5103 (6.6495) grad_norm 5.1185 (inf) loss_scale 256.0000 (367.3440) mem 14939MB [2024-07-25 09:52:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][570/625] eta 0:00:22 lr 0.000134 wd 0.0500 time 0.3989 (0.4137) data time 0.0009 (0.0016) model time 0.3980 (0.4113) loss 6.2956 (6.6475) grad_norm 2.4901 (inf) loss_scale 256.0000 (365.3940) mem 14939MB [2024-07-25 09:52:41 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][580/625] eta 0:00:18 lr 0.000134 wd 0.0500 time 0.4000 (0.4134) data time 0.0008 (0.0016) model time 0.3992 (0.4110) loss 5.5373 (6.6418) grad_norm 10.4710 (inf) loss_scale 256.0000 (363.5112) mem 14939MB [2024-07-25 09:52:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][590/625] eta 0:00:14 lr 0.000134 wd 0.0500 time 0.4010 (0.4132) data time 0.0008 (0.0016) model time 0.4003 (0.4108) loss 7.2786 (6.6417) grad_norm 2.3642 (inf) loss_scale 256.0000 (361.6920) mem 14939MB [2024-07-25 09:52:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][600/625] eta 0:00:10 lr 0.000134 wd 0.0500 time 0.3954 (0.4129) data time 0.0006 (0.0016) model time 0.3948 (0.4106) loss 6.6413 (6.6465) grad_norm 2.6062 (inf) loss_scale 256.0000 (359.9334) mem 14939MB [2024-07-25 09:52:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][610/625] eta 0:00:06 lr 0.000133 wd 0.0500 time 0.4015 (0.4128) data time 0.0006 (0.0016) model time 0.4010 (0.4104) loss 7.5279 (6.6481) grad_norm 5.3241 (inf) loss_scale 256.0000 (358.2324) mem 14939MB [2024-07-25 09:52:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][620/625] eta 0:00:02 lr 0.000133 wd 0.0500 time 0.3957 (0.4125) data time 0.0006 (0.0015) model time 0.3951 (0.4102) loss 6.4814 (6.6482) grad_norm 3.2853 (inf) loss_scale 256.0000 (356.5862) mem 14939MB [2024-07-25 09:52:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 241 training takes 0:04:17 [2024-07-25 09:52:59 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 09:53:00 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 09:53:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.462 (0.462) Loss 0.5303 (0.5303) Acc@1 90.576 (90.576) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 09:53:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.126) Loss 0.8267 (0.6593) Acc@1 82.715 (87.145) Acc@5 96.631 (97.900) Mem 14939MB [2024-07-25 09:53:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.107) Loss 0.9209 (0.7654) Acc@1 78.906 (84.194) Acc@5 95.361 (96.882) Mem 14939MB [2024-07-25 09:53:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.825 Acc@5 96.861 [2024-07-25 09:53:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-07-25 09:53:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.83% [2024-07-25 09:53:03 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 09:53:04 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 09:53:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.447 (0.447) Loss 0.5386 (0.5386) Acc@1 90.234 (90.234) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 09:53:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.118) Loss 0.8203 (0.6608) Acc@1 82.861 (87.132) Acc@5 96.875 (97.954) Mem 14939MB [2024-07-25 09:53:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9336 (0.7667) Acc@1 78.564 (84.210) Acc@5 95.459 (96.935) Mem 14939MB [2024-07-25 09:53:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.805 Acc@5 96.889 [2024-07-25 09:53:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-07-25 09:53:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.81% [2024-07-25 09:53:06 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 09:53:07 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 09:53:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][0/625] eta 0:07:38 lr 0.000133 wd 0.0500 time 0.7339 (0.7339) data time 0.3485 (0.3485) model time 0.0000 (0.0000) loss 6.3185 (6.3185) grad_norm 2.7764 (2.7764) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:53:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][10/625] eta 0:04:42 lr 0.000133 wd 0.0500 time 0.3980 (0.4600) data time 0.0007 (0.0324) model time 0.0000 (0.0000) loss 6.2562 (6.2610) grad_norm 2.0546 (2.7161) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:53:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][20/625] eta 0:04:25 lr 0.000133 wd 0.0500 time 0.4000 (0.4395) data time 0.0008 (0.0174) model time 0.0000 (0.0000) loss 6.6222 (6.5495) grad_norm 2.9977 (2.6758) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:53:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][30/625] eta 0:04:27 lr 0.000133 wd 0.0500 time 0.4019 (0.4488) data time 0.0007 (0.0121) model time 0.0000 (0.0000) loss 6.6491 (6.5724) grad_norm 1.9634 (2.8015) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:53:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][40/625] eta 0:04:22 lr 0.000133 wd 0.0500 time 0.4089 (0.4495) data time 0.0007 (0.0093) model time 0.0000 (0.0000) loss 6.2998 (6.6127) grad_norm 3.4941 (3.4487) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:53:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][50/625] eta 0:04:20 lr 0.000133 wd 0.0500 time 0.5954 (0.4536) data time 0.0006 (0.0079) model time 0.0000 (0.0000) loss 6.0172 (6.6258) grad_norm 3.3301 (3.4493) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:53:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][60/625] eta 0:04:14 lr 0.000133 wd 0.0500 time 0.3972 (0.4509) 
data time 0.0006 (0.0068) model time 0.3966 (0.4363) loss 6.6475 (6.6481) grad_norm 3.7205 (3.3715) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:53:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][70/625] eta 0:04:08 lr 0.000133 wd 0.0500 time 0.5473 (0.4476) data time 0.0007 (0.0059) model time 0.5466 (0.4313) loss 6.2784 (6.6830) grad_norm 2.4012 (3.2479) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:53:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][80/625] eta 0:04:00 lr 0.000133 wd 0.0500 time 0.3993 (0.4417) data time 0.0008 (0.0053) model time 0.3985 (0.4205) loss 6.0651 (6.6594) grad_norm 2.8844 (3.3026) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:53:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][90/625] eta 0:03:53 lr 0.000133 wd 0.0500 time 0.4013 (0.4372) data time 0.0008 (0.0048) model time 0.4006 (0.4153) loss 6.2845 (6.6455) grad_norm 6.2500 (3.2312) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:53:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][100/625] eta 0:03:47 lr 0.000133 wd 0.0500 time 0.3979 (0.4338) data time 0.0006 (0.0044) model time 0.3974 (0.4128) loss 6.8775 (6.6118) grad_norm 7.1790 (3.3352) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:53:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][110/625] eta 0:03:41 lr 0.000133 wd 0.0500 time 0.3970 (0.4305) data time 0.0008 (0.0041) model time 0.3962 (0.4101) loss 7.1224 (6.6479) grad_norm 4.0766 (3.4182) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:53:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][120/625] eta 0:03:36 lr 0.000133 wd 0.0500 time 0.3965 (0.4279) data time 0.0008 (0.0038) model time 0.3957 (0.4083) loss 7.0249 (6.6609) grad_norm 14.8952 (3.6265) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
09:54:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][130/625] eta 0:03:30 lr 0.000133 wd 0.0500 time 0.4003 (0.4256) data time 0.0007 (0.0036) model time 0.3996 (0.4068) loss 7.2072 (6.6559) grad_norm 2.8485 (3.5438) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:54:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][140/625] eta 0:03:26 lr 0.000132 wd 0.0500 time 0.4051 (0.4257) data time 0.0008 (0.0034) model time 0.4043 (0.4090) loss 7.1319 (6.6850) grad_norm 3.2464 (3.5142) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:54:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][150/625] eta 0:03:21 lr 0.000132 wd 0.0500 time 0.4008 (0.4240) data time 0.0007 (0.0032) model time 0.4001 (0.4080) loss 5.3854 (6.6732) grad_norm 2.3307 (3.4992) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:54:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][160/625] eta 0:03:16 lr 0.000132 wd 0.0500 time 0.3981 (0.4225) data time 0.0007 (0.0031) model time 0.3974 (0.4072) loss 6.6629 (6.6644) grad_norm 4.3910 (3.4869) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:54:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][170/625] eta 0:03:11 lr 0.000132 wd 0.0500 time 0.3922 (0.4211) data time 0.0008 (0.0030) model time 0.3914 (0.4064) loss 6.5375 (6.6613) grad_norm 1.8539 (3.4157) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:54:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][180/625] eta 0:03:07 lr 0.000132 wd 0.0500 time 0.5063 (0.4204) data time 0.0008 (0.0029) model time 0.5055 (0.4066) loss 7.3843 (6.6564) grad_norm 1.9075 (3.3848) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:54:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][190/625] eta 0:03:02 lr 0.000132 wd 0.0500 time 0.3984 (0.4200) data 
time 0.0007 (0.0028) model time 0.3977 (0.4068) loss 7.6730 (6.6703) grad_norm 2.6612 (3.3335) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:54:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][200/625] eta 0:02:58 lr 0.000132 wd 0.0500 time 0.4008 (0.4189) data time 0.0007 (0.0027) model time 0.4002 (0.4063) loss 5.5473 (6.6702) grad_norm 3.1177 (3.3206) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:54:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][210/625] eta 0:02:53 lr 0.000132 wd 0.0500 time 0.4034 (0.4182) data time 0.0006 (0.0026) model time 0.4027 (0.4060) loss 5.4042 (6.6562) grad_norm 3.9254 (3.3178) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:54:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][220/625] eta 0:02:49 lr 0.000132 wd 0.0500 time 0.3965 (0.4182) data time 0.0006 (0.0025) model time 0.3959 (0.4067) loss 5.8913 (6.6459) grad_norm 2.2606 (3.3116) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:54:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][230/625] eta 0:02:45 lr 0.000132 wd 0.0500 time 0.4056 (0.4191) data time 0.0006 (0.0024) model time 0.4049 (0.4084) loss 7.1131 (6.6515) grad_norm 2.4780 (3.3175) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:54:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][240/625] eta 0:02:41 lr 0.000132 wd 0.0500 time 0.6037 (0.4198) data time 0.0008 (0.0024) model time 0.6029 (0.4099) loss 7.0701 (6.6557) grad_norm 8.6994 (3.3098) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:54:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][250/625] eta 0:02:38 lr 0.000132 wd 0.0500 time 0.5706 (0.4224) data time 0.0006 (0.0023) model time 0.5700 (0.4136) loss 7.5882 (6.6561) grad_norm 2.4069 (3.3248) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
09:54:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][260/625] eta 0:02:34 lr 0.000132 wd 0.0500 time 0.3965 (0.4230) data time 0.0008 (0.0022) model time 0.3957 (0.4148) loss 6.7019 (6.6586) grad_norm 3.6523 (3.2952) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:55:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][270/625] eta 0:02:30 lr 0.000132 wd 0.0500 time 0.5970 (0.4244) data time 0.0007 (0.0022) model time 0.5963 (0.4168) loss 6.6976 (6.6567) grad_norm 3.2017 (3.2936) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:55:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][280/625] eta 0:02:26 lr 0.000132 wd 0.0500 time 0.3975 (0.4247) data time 0.0008 (0.0021) model time 0.3967 (0.4174) loss 7.4194 (6.6499) grad_norm 2.2080 (3.2722) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:55:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][290/625] eta 0:02:22 lr 0.000132 wd 0.0500 time 0.3971 (0.4243) data time 0.0008 (0.0021) model time 0.3963 (0.4171) loss 7.6286 (6.6448) grad_norm 3.3522 (3.2545) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:55:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][300/625] eta 0:02:17 lr 0.000131 wd 0.0500 time 0.3989 (0.4240) data time 0.0008 (0.0021) model time 0.3981 (0.4171) loss 6.5226 (6.6493) grad_norm 2.4739 (3.2394) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:55:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][310/625] eta 0:02:13 lr 0.000131 wd 0.0500 time 0.4094 (0.4233) data time 0.0007 (0.0020) model time 0.4087 (0.4165) loss 7.1531 (6.6612) grad_norm 2.8378 (3.2428) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:55:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][320/625] eta 0:02:08 lr 0.000131 wd 0.0500 time 0.3978 (0.4225) data 
time 0.0008 (0.0020) model time 0.3970 (0.4157) loss 7.6248 (6.6675) grad_norm 3.2070 (3.2297) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:55:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][330/625] eta 0:02:04 lr 0.000131 wd 0.0500 time 0.4017 (0.4218) data time 0.0007 (0.0020) model time 0.4010 (0.4152) loss 5.8727 (6.6621) grad_norm 3.4548 (3.3279) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:55:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][340/625] eta 0:02:00 lr 0.000131 wd 0.0500 time 0.3938 (0.4212) data time 0.0009 (0.0019) model time 0.3929 (0.4146) loss 6.9398 (6.6656) grad_norm 5.4246 (3.3327) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:55:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][350/625] eta 0:01:55 lr 0.000131 wd 0.0500 time 0.3990 (0.4206) data time 0.0008 (0.0019) model time 0.3981 (0.4141) loss 7.0343 (6.6700) grad_norm 1.9687 (3.3670) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:55:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][360/625] eta 0:01:51 lr 0.000131 wd 0.0500 time 0.4001 (0.4204) data time 0.0008 (0.0019) model time 0.3993 (0.4141) loss 6.8445 (6.6690) grad_norm 2.6761 (3.4949) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:55:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][370/625] eta 0:01:47 lr 0.000131 wd 0.0500 time 0.3980 (0.4199) data time 0.0007 (0.0020) model time 0.3973 (0.4135) loss 7.2228 (6.6571) grad_norm 3.5868 (3.4887) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:55:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][380/625] eta 0:01:42 lr 0.000131 wd 0.0500 time 0.3989 (0.4194) data time 0.0006 (0.0020) model time 0.3983 (0.4131) loss 7.9159 (6.6514) grad_norm 3.6492 (3.4816) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
09:55:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][390/625] eta 0:01:38 lr 0.000131 wd 0.0500 time 0.4023 (0.4190) data time 0.0009 (0.0019) model time 0.4014 (0.4127) loss 6.6306 (6.6476) grad_norm 2.7127 (3.4699) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:55:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][400/625] eta 0:01:34 lr 0.000131 wd 0.0500 time 0.3995 (0.4186) data time 0.0007 (0.0019) model time 0.3988 (0.4124) loss 7.1749 (6.6547) grad_norm 1.8853 (3.4443) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:55:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][410/625] eta 0:01:29 lr 0.000131 wd 0.0500 time 0.4026 (0.4181) data time 0.0008 (0.0019) model time 0.4017 (0.4121) loss 6.9584 (6.6462) grad_norm 3.2391 (3.4378) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:56:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][420/625] eta 0:01:25 lr 0.000131 wd 0.0500 time 0.3997 (0.4177) data time 0.0008 (0.0019) model time 0.3988 (0.4117) loss 7.1825 (6.6434) grad_norm 3.4222 (3.4306) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:56:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][430/625] eta 0:01:21 lr 0.000131 wd 0.0500 time 0.3978 (0.4173) data time 0.0007 (0.0018) model time 0.3971 (0.4114) loss 7.2597 (6.6396) grad_norm 3.2588 (3.4221) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:56:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][440/625] eta 0:01:17 lr 0.000131 wd 0.0500 time 0.3973 (0.4173) data time 0.0007 (0.0018) model time 0.3966 (0.4115) loss 6.7694 (6.6300) grad_norm 5.7148 (3.4267) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:56:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][450/625] eta 0:01:13 lr 0.000131 wd 0.0500 time 0.3983 (0.4178) data 
time 0.0006 (0.0018) model time 0.3977 (0.4122) loss 6.1652 (6.6226) grad_norm 13.2785 (3.4521) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:56:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][460/625] eta 0:01:08 lr 0.000130 wd 0.0500 time 0.4032 (0.4178) data time 0.0009 (0.0018) model time 0.4024 (0.4124) loss 6.4345 (6.6210) grad_norm 3.0247 (3.4630) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:56:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][470/625] eta 0:01:05 lr 0.000130 wd 0.0500 time 0.4004 (0.4194) data time 0.0010 (0.0018) model time 0.3994 (0.4143) loss 6.3184 (6.6140) grad_norm 3.1247 (3.4548) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:56:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][480/625] eta 0:01:00 lr 0.000130 wd 0.0500 time 0.4057 (0.4198) data time 0.0007 (0.0017) model time 0.4051 (0.4148) loss 5.1960 (6.6053) grad_norm 2.0434 (3.4509) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:56:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][490/625] eta 0:00:56 lr 0.000130 wd 0.0500 time 0.6011 (0.4209) data time 0.0006 (0.0017) model time 0.6005 (0.4161) loss 5.4678 (6.6074) grad_norm 3.7324 (3.4707) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:56:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][500/625] eta 0:00:52 lr 0.000130 wd 0.0500 time 0.3959 (0.4209) data time 0.0006 (0.0017) model time 0.3953 (0.4162) loss 7.0834 (6.6136) grad_norm 4.5215 (3.4702) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:56:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][510/625] eta 0:00:48 lr 0.000130 wd 0.0500 time 0.3954 (0.4207) data time 0.0008 (0.0017) model time 0.3946 (0.4161) loss 6.7476 (6.6156) grad_norm 2.7316 (3.4811) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
09:56:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][520/625] eta 0:00:44 lr 0.000130 wd 0.0500 time 0.4329 (0.4207) data time 0.0006 (0.0017) model time 0.4322 (0.4161) loss 6.7848 (6.6138) grad_norm 2.2109 (3.4800) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:56:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][530/625] eta 0:00:39 lr 0.000130 wd 0.0500 time 0.3944 (0.4203) data time 0.0006 (0.0017) model time 0.3938 (0.4157) loss 6.1823 (6.6112) grad_norm 4.4112 (3.4738) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 09:56:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][540/625] eta 0:00:35 lr 0.000130 wd 0.0500 time 0.3937 (0.4198) data time 0.0009 (0.0016) model time 0.3929 (0.4153) loss 5.9832 (6.6105) grad_norm 3.9136 (inf) loss_scale 128.0000 (255.0536) mem 14939MB [2024-07-25 09:56:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][550/625] eta 0:00:31 lr 0.000130 wd 0.0500 time 0.4006 (0.4196) data time 0.0007 (0.0016) model time 0.4000 (0.4152) loss 7.2600 (6.6192) grad_norm 4.3035 (inf) loss_scale 128.0000 (252.7477) mem 14939MB [2024-07-25 09:57:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][560/625] eta 0:00:27 lr 0.000130 wd 0.0500 time 0.4021 (0.4193) data time 0.0008 (0.0016) model time 0.4013 (0.4149) loss 6.1754 (6.6239) grad_norm 2.6371 (inf) loss_scale 128.0000 (250.5241) mem 14939MB [2024-07-25 09:57:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][570/625] eta 0:00:23 lr 0.000130 wd 0.0500 time 0.3954 (0.4190) data time 0.0011 (0.0016) model time 0.3943 (0.4146) loss 6.9951 (6.6222) grad_norm 2.5360 (inf) loss_scale 128.0000 (248.3783) mem 14939MB [2024-07-25 09:57:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][580/625] eta 0:00:18 lr 0.000130 wd 0.0500 time 0.3780 (0.4190) data time 0.0010 
(0.0016) model time 0.3770 (0.4146) loss 7.1927 (6.6206) grad_norm 3.0885 (inf) loss_scale 128.0000 (246.3064) mem 14939MB [2024-07-25 09:57:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][590/625] eta 0:00:14 lr 0.000130 wd 0.0500 time 0.3998 (0.4186) data time 0.0009 (0.0016) model time 0.3989 (0.4144) loss 7.9957 (6.6097) grad_norm 3.1362 (inf) loss_scale 128.0000 (244.3046) mem 14939MB [2024-07-25 09:57:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][600/625] eta 0:00:10 lr 0.000130 wd 0.0500 time 0.4026 (0.4184) data time 0.0008 (0.0016) model time 0.4018 (0.4142) loss 5.8483 (6.6052) grad_norm 4.5248 (inf) loss_scale 128.0000 (242.3694) mem 14939MB [2024-07-25 09:57:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][610/625] eta 0:00:06 lr 0.000129 wd 0.0500 time 0.4037 (0.4181) data time 0.0005 (0.0016) model time 0.4031 (0.4139) loss 7.2268 (6.6037) grad_norm 3.5144 (inf) loss_scale 128.0000 (240.4975) mem 14939MB [2024-07-25 09:57:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][620/625] eta 0:00:02 lr 0.000129 wd 0.0500 time 0.3952 (0.4179) data time 0.0006 (0.0015) model time 0.3946 (0.4137) loss 6.0645 (6.6076) grad_norm 1.8989 (inf) loss_scale 128.0000 (238.6860) mem 14939MB [2024-07-25 09:57:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 242 training takes 0:04:21 [2024-07-25 09:57:28 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 09:57:29 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 09:57:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.439 (0.439) Loss 0.5317 (0.5317) Acc@1 90.576 (90.576) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 09:57:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.7983 (0.6521) Acc@1 82.861 (87.367) Acc@5 96.826 (97.914) Mem 14939MB [2024-07-25 09:57:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9258 (0.7614) Acc@1 78.125 (84.273) Acc@5 95.947 (96.938) Mem 14939MB [2024-07-25 09:57:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.861 Acc@5 96.909 [2024-07-25 09:57:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-07-25 09:57:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.86% [2024-07-25 09:57:32 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 09:57:33 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 09:57:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.456 (0.456) Loss 0.5386 (0.5386) Acc@1 90.283 (90.283) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 09:57:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8198 (0.6605) Acc@1 82.959 (87.149) Acc@5 96.875 (97.954) Mem 14939MB [2024-07-25 09:57:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 0.9316 (0.7662) Acc@1 78.467 (84.229) Acc@5 95.557 (96.945) Mem 14939MB [2024-07-25 09:57:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.821 Acc@5 96.895 [2024-07-25 09:57:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-07-25 09:57:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.82% [2024-07-25 09:57:35 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 09:57:36 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 09:57:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][0/625] eta 0:07:57 lr 0.000129 wd 0.0500 time 0.7638 (0.7638) data time 0.3785 (0.3785) model time 0.0000 (0.0000) loss 6.3564 (6.3564) grad_norm 2.6580 (2.6580) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:57:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][10/625] eta 0:04:27 lr 0.000129 wd 0.0500 time 0.3975 (0.4350) data time 0.0007 (0.0353) model time 0.0000 (0.0000) loss 7.1863 (6.5923) grad_norm 2.6124 (2.5904) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:57:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][20/625] eta 0:04:14 lr 0.000129 wd 0.0500 time 0.3986 (0.4202) data time 0.0007 (0.0189) model time 0.0000 (0.0000) loss 5.4615 (6.5214) grad_norm 3.8937 (3.6259) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:57:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][30/625] eta 0:04:09 lr 0.000129 wd 0.0500 time 0.5596 (0.4189) data time 0.0007 (0.0130) model time 0.0000 (0.0000) loss 7.0626 (6.6875) grad_norm 2.1682 (4.0164) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:57:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][40/625] eta 0:04:04 lr 0.000129 wd 0.0500 time 0.3957 (0.4183) data time 0.0008 (0.0101) model time 0.0000 (0.0000) loss 6.8015 (6.6896) grad_norm 9.8913 (4.0210) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:57:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][50/625] eta 0:04:02 lr 0.000129 wd 0.0500 time 0.3977 (0.4220) data time 0.0009 (0.0083) model time 0.0000 (0.0000) loss 6.3673 (6.6579) grad_norm 2.0419 (3.8653) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:58:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][60/625] eta 0:03:59 lr 0.000129 wd 0.0500 time 0.3967 (0.4246) 
data time 0.0007 (0.0071) model time 0.3960 (0.4370) loss 7.1053 (6.6873) grad_norm 3.6470 (4.0149) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:58:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][70/625] eta 0:04:00 lr 0.000129 wd 0.0500 time 0.5758 (0.4334) data time 0.0006 (0.0062) model time 0.5752 (0.4615) loss 7.4112 (6.6870) grad_norm 32.2655 (4.3141) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:58:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][80/625] eta 0:03:55 lr 0.000129 wd 0.0500 time 0.3984 (0.4326) data time 0.0009 (0.0056) model time 0.3976 (0.4496) loss 7.3812 (6.6616) grad_norm 2.1696 (4.7773) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:58:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][90/625] eta 0:03:54 lr 0.000129 wd 0.0500 time 0.5251 (0.4381) data time 0.0008 (0.0050) model time 0.5243 (0.4578) loss 5.6556 (6.6274) grad_norm 3.7149 (4.7930) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:58:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][100/625] eta 0:03:47 lr 0.000129 wd 0.0500 time 0.4022 (0.4343) data time 0.0007 (0.0046) model time 0.4016 (0.4459) loss 5.8735 (6.5794) grad_norm 2.3820 (4.6440) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:58:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][110/625] eta 0:03:43 lr 0.000129 wd 0.0500 time 0.3946 (0.4336) data time 0.0008 (0.0043) model time 0.3938 (0.4425) loss 7.1021 (6.5946) grad_norm 2.4829 (4.4631) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:58:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][120/625] eta 0:03:37 lr 0.000129 wd 0.0500 time 0.3997 (0.4308) data time 0.0007 (0.0040) model time 0.3990 (0.4363) loss 6.2548 (6.5799) grad_norm 8.5084 (4.4513) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
09:58:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][130/625] eta 0:03:32 lr 0.000129 wd 0.0500 time 0.3994 (0.4284) data time 0.0008 (0.0038) model time 0.3986 (0.4316) loss 6.1676 (6.6056) grad_norm 4.2909 (4.3766) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:58:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][140/625] eta 0:03:26 lr 0.000129 wd 0.0500 time 0.3984 (0.4264) data time 0.0008 (0.0036) model time 0.3976 (0.4280) loss 5.9297 (6.6161) grad_norm 2.4617 (4.2665) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:58:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][150/625] eta 0:03:21 lr 0.000128 wd 0.0500 time 0.4008 (0.4247) data time 0.0008 (0.0034) model time 0.4000 (0.4252) loss 5.7353 (6.5941) grad_norm 3.5101 (4.1782) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:58:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][160/625] eta 0:03:16 lr 0.000128 wd 0.0500 time 0.3984 (0.4232) data time 0.0009 (0.0032) model time 0.3975 (0.4229) loss 6.0928 (6.5757) grad_norm 3.7529 (4.1493) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:58:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][170/625] eta 0:03:11 lr 0.000128 wd 0.0500 time 0.3965 (0.4218) data time 0.0008 (0.0031) model time 0.3957 (0.4208) loss 7.0614 (6.5548) grad_norm 3.7874 (4.1317) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:58:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][180/625] eta 0:03:07 lr 0.000128 wd 0.0500 time 0.4006 (0.4208) data time 0.0007 (0.0030) model time 0.3999 (0.4194) loss 6.3213 (6.5552) grad_norm 4.3008 (4.0596) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:58:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][190/625] eta 0:03:02 lr 0.000128 wd 0.0500 time 0.3951 (0.4196) data 
time 0.0006 (0.0029) model time 0.3945 (0.4179) loss 7.3862 (6.5538) grad_norm 2.2906 (4.0390) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:59:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][200/625] eta 0:02:57 lr 0.000128 wd 0.0500 time 0.4017 (0.4188) data time 0.0006 (0.0028) model time 0.4011 (0.4168) loss 7.5652 (6.5577) grad_norm 2.8904 (3.9498) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:59:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][210/625] eta 0:02:53 lr 0.000128 wd 0.0500 time 0.4182 (0.4180) data time 0.0007 (0.0027) model time 0.4175 (0.4159) loss 7.1610 (6.5671) grad_norm 2.2062 (3.9386) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:59:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][220/625] eta 0:02:48 lr 0.000128 wd 0.0500 time 0.4020 (0.4172) data time 0.0007 (0.0026) model time 0.4013 (0.4149) loss 6.6789 (6.5568) grad_norm 2.8608 (3.8926) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:59:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][230/625] eta 0:02:44 lr 0.000128 wd 0.0500 time 0.3943 (0.4165) data time 0.0008 (0.0025) model time 0.3935 (0.4140) loss 5.2151 (6.5394) grad_norm 2.0813 (3.9027) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:59:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][240/625] eta 0:02:40 lr 0.000128 wd 0.0500 time 0.4038 (0.4158) data time 0.0009 (0.0024) model time 0.4029 (0.4132) loss 6.6623 (6.5366) grad_norm 3.2666 (3.8731) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:59:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][250/625] eta 0:02:35 lr 0.000128 wd 0.0500 time 0.3988 (0.4151) data time 0.0007 (0.0024) model time 0.3980 (0.4124) loss 5.7201 (6.5281) grad_norm 3.0688 (3.8269) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
09:59:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][260/625] eta 0:02:31 lr 0.000128 wd 0.0500 time 0.4120 (0.4152) data time 0.0007 (0.0023) model time 0.4113 (0.4127) loss 6.4405 (6.5483) grad_norm 2.3044 (3.7959) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:59:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][270/625] eta 0:02:27 lr 0.000128 wd 0.0500 time 0.3980 (0.4161) data time 0.0007 (0.0023) model time 0.3973 (0.4138) loss 6.5629 (6.5462) grad_norm 2.8082 (3.8314) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:59:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][280/625] eta 0:02:23 lr 0.000128 wd 0.0500 time 0.3955 (0.4161) data time 0.0009 (0.0022) model time 0.3945 (0.4139) loss 6.8478 (6.5526) grad_norm 3.3155 (3.8573) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:59:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][290/625] eta 0:02:20 lr 0.000128 wd 0.0500 time 0.3967 (0.4182) data time 0.0009 (0.0022) model time 0.3958 (0.4164) loss 5.9946 (6.5550) grad_norm 4.4184 (3.9777) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:59:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][300/625] eta 0:02:16 lr 0.000127 wd 0.0500 time 0.3952 (0.4200) data time 0.0009 (0.0022) model time 0.3943 (0.4187) loss 7.0053 (6.5604) grad_norm 3.2766 (3.9740) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:59:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][310/625] eta 0:02:12 lr 0.000127 wd 0.0500 time 0.6115 (0.4219) data time 0.0008 (0.0021) model time 0.6107 (0.4209) loss 6.3229 (6.5559) grad_norm 2.3810 (3.9526) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:59:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][320/625] eta 0:02:08 lr 0.000127 wd 0.0500 time 0.5439 (0.4219) data 
time 0.0007 (0.0021) model time 0.5432 (0.4209) loss 5.8464 (6.5606) grad_norm 3.3686 (3.9188) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:59:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][330/625] eta 0:02:04 lr 0.000127 wd 0.0500 time 0.3983 (0.4215) data time 0.0006 (0.0020) model time 0.3977 (0.4205) loss 7.1482 (6.5680) grad_norm 3.6871 (4.0250) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 09:59:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][340/625] eta 0:01:59 lr 0.000127 wd 0.0500 time 0.3973 (0.4209) data time 0.0007 (0.0020) model time 0.3965 (0.4198) loss 6.4146 (6.5606) grad_norm 2.9923 (4.0068) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:00:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][350/625] eta 0:01:55 lr 0.000127 wd 0.0500 time 0.3998 (0.4202) data time 0.0008 (0.0020) model time 0.3990 (0.4190) loss 7.7284 (6.5655) grad_norm 2.3113 (4.0087) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:00:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][360/625] eta 0:01:51 lr 0.000127 wd 0.0500 time 0.3963 (0.4196) data time 0.0007 (0.0019) model time 0.3957 (0.4183) loss 5.0690 (6.5646) grad_norm 1.8751 (3.9725) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:00:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][370/625] eta 0:01:46 lr 0.000127 wd 0.0500 time 0.4020 (0.4190) data time 0.0007 (0.0019) model time 0.4013 (0.4177) loss 7.3623 (6.5676) grad_norm 3.0697 (3.9498) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:00:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][380/625] eta 0:01:42 lr 0.000127 wd 0.0500 time 0.3958 (0.4185) data time 0.0009 (0.0019) model time 0.3949 (0.4171) loss 6.7401 (6.5652) grad_norm 3.1369 (3.9431) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
10:00:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][390/625] eta 0:01:38 lr 0.000127 wd 0.0500 time 0.3955 (0.4180) data time 0.0009 (0.0019) model time 0.3946 (0.4165) loss 7.1822 (6.5661) grad_norm 3.7862 (3.9704) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:00:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][400/625] eta 0:01:33 lr 0.000127 wd 0.0500 time 0.4060 (0.4176) data time 0.0006 (0.0018) model time 0.4054 (0.4160) loss 6.5246 (6.5681) grad_norm 2.6785 (3.9552) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:00:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][410/625] eta 0:01:29 lr 0.000127 wd 0.0500 time 0.4033 (0.4171) data time 0.0007 (0.0018) model time 0.4025 (0.4155) loss 6.3145 (6.5682) grad_norm 14.4099 (3.9909) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:00:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][420/625] eta 0:01:25 lr 0.000127 wd 0.0500 time 0.3973 (0.4167) data time 0.0009 (0.0018) model time 0.3964 (0.4150) loss 8.3398 (6.5734) grad_norm 2.5797 (3.9780) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:00:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][430/625] eta 0:01:21 lr 0.000127 wd 0.0500 time 0.3949 (0.4162) data time 0.0007 (0.0018) model time 0.3943 (0.4146) loss 6.0322 (6.5654) grad_norm 6.2297 (3.9602) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:00:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][440/625] eta 0:01:16 lr 0.000127 wd 0.0500 time 0.3976 (0.4158) data time 0.0007 (0.0018) model time 0.3969 (0.4141) loss 5.8443 (6.5628) grad_norm 3.2141 (3.9456) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:00:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][450/625] eta 0:01:12 lr 0.000127 wd 0.0500 time 0.4020 (0.4155) data 
time 0.0007 (0.0018) model time 0.4013 (0.4137) loss 6.1644 (6.5639) grad_norm 2.0498 (3.9498) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:00:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][460/625] eta 0:01:08 lr 0.000126 wd 0.0500 time 0.3979 (0.4151) data time 0.0007 (0.0017) model time 0.3972 (0.4133) loss 5.5095 (6.5521) grad_norm 5.3986 (3.9606) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:00:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][470/625] eta 0:01:04 lr 0.000126 wd 0.0500 time 0.3966 (0.4147) data time 0.0009 (0.0017) model time 0.3957 (0.4129) loss 6.8365 (6.5509) grad_norm 2.3662 (3.9398) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:00:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][480/625] eta 0:01:00 lr 0.000126 wd 0.0500 time 0.3955 (0.4148) data time 0.0007 (0.0017) model time 0.3948 (0.4130) loss 6.6360 (6.5429) grad_norm 5.2641 (3.9265) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:01:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][490/625] eta 0:00:56 lr 0.000126 wd 0.0500 time 0.3965 (0.4153) data time 0.0009 (0.0017) model time 0.3956 (0.4136) loss 6.1872 (6.5457) grad_norm 7.6401 (3.9190) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:01:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][500/625] eta 0:00:51 lr 0.000126 wd 0.0500 time 0.3994 (0.4153) data time 0.0006 (0.0017) model time 0.3987 (0.4136) loss 5.7574 (6.5478) grad_norm 3.0296 (3.9043) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:01:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][510/625] eta 0:00:47 lr 0.000126 wd 0.0500 time 0.5719 (0.4162) data time 0.0007 (0.0016) model time 0.5712 (0.4147) loss 6.8962 (6.5420) grad_norm 2.4132 (3.8844) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
10:01:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][520/625] eta 0:00:43 lr 0.000126 wd 0.0500 time 0.5923 (0.4176) data time 0.0008 (0.0016) model time 0.5915 (0.4162) loss 6.4745 (6.5402) grad_norm 2.3525 (3.8715) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:01:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][530/625] eta 0:00:39 lr 0.000126 wd 0.0500 time 0.3953 (0.4186) data time 0.0008 (0.0016) model time 0.3944 (0.4173) loss 7.1384 (6.5420) grad_norm 2.7190 (3.8661) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:01:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][540/625] eta 0:00:35 lr 0.000126 wd 0.0500 time 0.5853 (0.4187) data time 0.0006 (0.0016) model time 0.5846 (0.4175) loss 6.7755 (6.5368) grad_norm 3.2148 (3.8637) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:01:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][550/625] eta 0:00:31 lr 0.000126 wd 0.0500 time 0.5785 (0.4187) data time 0.0006 (0.0016) model time 0.5779 (0.4174) loss 5.7656 (6.5365) grad_norm 2.3777 (3.8582) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:01:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][560/625] eta 0:00:27 lr 0.000126 wd 0.0500 time 0.4045 (0.4183) data time 0.0006 (0.0016) model time 0.4038 (0.4170) loss 6.3185 (6.5398) grad_norm 2.4143 (3.8438) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:01:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][570/625] eta 0:00:22 lr 0.000126 wd 0.0500 time 0.3977 (0.4179) data time 0.0008 (0.0016) model time 0.3969 (0.4166) loss 6.8435 (6.5393) grad_norm 2.1198 (3.8168) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:01:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][580/625] eta 0:00:18 lr 0.000126 wd 0.0500 time 0.3980 (0.4176) data 
time 0.0008 (0.0016) model time 0.3972 (0.4163) loss 7.0620 (6.5401) grad_norm 2.8977 (3.7972) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:01:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][590/625] eta 0:00:14 lr 0.000126 wd 0.0500 time 0.3980 (0.4173) data time 0.0008 (0.0015) model time 0.3972 (0.4160) loss 6.4509 (6.5462) grad_norm 3.6040 (3.7767) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:01:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][600/625] eta 0:00:10 lr 0.000126 wd 0.0500 time 0.3977 (0.4170) data time 0.0007 (0.0015) model time 0.3970 (0.4156) loss 5.6580 (6.5452) grad_norm 2.1769 (3.7737) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:01:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][610/625] eta 0:00:06 lr 0.000126 wd 0.0500 time 0.3967 (0.4167) data time 0.0006 (0.0015) model time 0.3960 (0.4153) loss 5.1536 (6.5491) grad_norm 3.6171 (3.7613) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:01:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][620/625] eta 0:00:02 lr 0.000125 wd 0.0500 time 0.3977 (0.4163) data time 0.0006 (0.0015) model time 0.3971 (0.4149) loss 7.3916 (6.5531) grad_norm 1.9747 (3.7534) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:01:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 243 training takes 0:04:20 [2024-07-25 10:01:56 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 10:01:57 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 10:01:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.454 (0.454) Loss 0.5396 (0.5396) Acc@1 89.648 (89.648) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 10:01:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.8232 (0.6570) Acc@1 82.031 (87.194) Acc@5 96.875 (97.931) Mem 14939MB [2024-07-25 10:01:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9194 (0.7632) Acc@1 78.369 (84.249) Acc@5 95.654 (96.933) Mem 14939MB [2024-07-25 10:01:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.845 Acc@5 96.907 [2024-07-25 10:01:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-07-25 10:02:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.769 (0.769) Loss 0.5376 (0.5376) Acc@1 90.332 (90.332) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 10:02:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.154) Loss 0.8193 (0.6599) Acc@1 83.057 (87.189) Acc@5 96.875 (97.949) Mem 14939MB [2024-07-25 10:02:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.122) Loss 0.9307 (0.7657) Acc@1 78.320 (84.252) Acc@5 95.410 (96.933) Mem 14939MB [2024-07-25 10:02:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.843 Acc@5 96.889 [2024-07-25 10:02:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-07-25 10:02:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.84% [2024-07-25 10:02:02 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 10:02:03 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 10:02:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][0/625] eta 0:08:04 lr 0.000125 wd 0.0500 time 0.7758 (0.7758) data time 0.3977 (0.3977) model time 0.0000 (0.0000) loss 5.7904 (5.7904) grad_norm 2.9483 (2.9483) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:02:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][10/625] eta 0:04:26 lr 0.000125 wd 0.0500 time 0.4037 (0.4329) data time 0.0009 (0.0370) model time 0.0000 (0.0000) loss 6.0417 (6.4689) grad_norm 1.9659 (3.0121) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:02:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][20/625] eta 0:04:11 lr 0.000125 wd 0.0500 time 0.3918 (0.4155) data time 0.0009 (0.0198) model time 0.0000 (0.0000) loss 6.6155 (6.5515) grad_norm 3.0968 (3.0602) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:02:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][30/625] eta 0:04:03 lr 0.000125 wd 0.0500 time 0.3996 (0.4098) data time 0.0007 (0.0137) model time 0.0000 (0.0000) loss 7.2662 (6.6811) grad_norm 4.8540 (3.5099) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:02:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][40/625] eta 0:03:58 lr 0.000125 wd 0.0500 time 0.3950 (0.4071) data time 0.0008 (0.0106) model time 0.0000 (0.0000) loss 7.7326 (6.6924) grad_norm 5.7227 (3.5047) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:02:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][50/625] eta 0:03:53 lr 0.000125 wd 0.0500 time 0.3988 (0.4055) data time 0.0010 (0.0087) model time 0.0000 (0.0000) loss 7.2603 (6.6516) grad_norm 2.2792 (3.4521) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 10:02:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][60/625] eta 0:03:49 lr 0.000125 wd 0.0500 time 0.3985 (0.4063) data time 0.0008 (0.0074) model time 0.3977 (0.4098) loss 7.3728 (6.6698) grad_norm 2.8707 (3.5337) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:02:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][70/625] eta 0:03:46 lr 0.000125 wd 0.0500 time 0.3993 (0.4073) data time 0.0007 (0.0065) model time 0.3986 (0.4110) loss 6.9589 (6.6460) grad_norm 3.6622 (3.6440) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:02:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][80/625] eta 0:03:42 lr 0.000125 wd 0.0500 time 0.4083 (0.4089) data time 0.0008 (0.0058) model time 0.4075 (0.4139) loss 7.0310 (6.6807) grad_norm 2.1833 (3.5536) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:02:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][90/625] eta 0:03:40 lr 0.000125 wd 0.0500 time 0.3953 (0.4120) data time 0.0009 (0.0052) model time 0.3945 (0.4194) loss 6.2705 (6.6805) grad_norm 3.2721 (3.4871) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:02:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][100/625] eta 0:03:39 lr 0.000125 wd 0.0500 time 0.5351 (0.4172) data time 0.0008 (0.0048) model time 0.5342 (0.4283) loss 6.4496 (6.6522) grad_norm 3.5149 (3.6318) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:02:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][110/625] eta 0:03:36 lr 0.000125 wd 0.0500 time 0.5163 (0.4202) data time 0.0007 (0.0045) model time 0.5156 (0.4318) loss 6.4330 (6.6409) grad_norm 3.2433 (3.5917) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:02:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][120/625] eta 0:03:34 lr 0.000125 wd 0.0500 time 
0.5010 (0.4240) data time 0.0007 (0.0041) model time 0.5003 (0.4367) loss 7.2025 (6.6355) grad_norm 2.9311 (3.5694) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:02:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][130/625] eta 0:03:31 lr 0.000125 wd 0.0500 time 0.5896 (0.4278) data time 0.0007 (0.0039) model time 0.5890 (0.4411) loss 7.3342 (6.6025) grad_norm 2.9585 (3.4944) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:03:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][140/625] eta 0:03:27 lr 0.000125 wd 0.0500 time 0.3930 (0.4275) data time 0.0008 (0.0037) model time 0.3923 (0.4390) loss 7.1617 (6.6126) grad_norm 2.6905 (3.5322) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:03:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][150/625] eta 0:03:22 lr 0.000125 wd 0.0500 time 0.3976 (0.4261) data time 0.0008 (0.0035) model time 0.3968 (0.4358) loss 6.8952 (6.6215) grad_norm 2.3060 (3.4669) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:03:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][160/625] eta 0:03:17 lr 0.000124 wd 0.0500 time 0.3991 (0.4246) data time 0.0007 (0.0033) model time 0.3984 (0.4325) loss 6.9333 (6.6205) grad_norm 3.5452 (3.4256) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:03:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][170/625] eta 0:03:12 lr 0.000124 wd 0.0500 time 0.3980 (0.4231) data time 0.0006 (0.0032) model time 0.3974 (0.4298) loss 7.3266 (6.6119) grad_norm 3.0027 (3.3829) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:03:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][180/625] eta 0:03:07 lr 0.000124 wd 0.0500 time 0.3975 (0.4217) data time 0.0006 (0.0031) model time 0.3969 (0.4273) loss 5.8991 (6.6016) grad_norm 2.1877 (3.3403) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 10:03:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][190/625] eta 0:03:02 lr 0.000124 wd 0.0500 time 0.4021 (0.4205) data time 0.0006 (0.0029) model time 0.4015 (0.4251) loss 7.5642 (6.6055) grad_norm 3.0428 (inf) loss_scale 64.0000 (126.6597) mem 14939MB [2024-07-25 10:03:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][200/625] eta 0:02:58 lr 0.000124 wd 0.0500 time 0.3990 (0.4194) data time 0.0008 (0.0028) model time 0.3982 (0.4233) loss 6.5035 (6.6102) grad_norm 6.6505 (inf) loss_scale 64.0000 (123.5423) mem 14939MB [2024-07-25 10:03:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][210/625] eta 0:02:53 lr 0.000124 wd 0.0500 time 0.3970 (0.4184) data time 0.0007 (0.0027) model time 0.3963 (0.4216) loss 6.6225 (6.6030) grad_norm 3.8631 (inf) loss_scale 64.0000 (120.7204) mem 14939MB [2024-07-25 10:03:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][220/625] eta 0:02:49 lr 0.000124 wd 0.0500 time 0.3998 (0.4175) data time 0.0006 (0.0027) model time 0.3992 (0.4202) loss 7.5442 (6.6051) grad_norm 2.5483 (inf) loss_scale 64.0000 (118.1538) mem 14939MB [2024-07-25 10:03:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][230/625] eta 0:02:44 lr 0.000124 wd 0.0500 time 0.3969 (0.4167) data time 0.0009 (0.0026) model time 0.3961 (0.4190) loss 6.9608 (6.6113) grad_norm 3.7943 (inf) loss_scale 64.0000 (115.8095) mem 14939MB [2024-07-25 10:03:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][240/625] eta 0:02:40 lr 0.000124 wd 0.0500 time 0.3978 (0.4159) data time 0.0008 (0.0025) model time 0.3970 (0.4179) loss 7.1763 (6.6135) grad_norm 3.2993 (inf) loss_scale 64.0000 (113.6598) mem 14939MB [2024-07-25 10:03:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][250/625] eta 0:02:35 lr 0.000124 wd 0.0500 time 0.4010 (0.4153) data time 
0.0008 (0.0024) model time 0.4002 (0.4170) loss 6.3824 (6.6194) grad_norm 3.8232 (inf) loss_scale 64.0000 (111.6813) mem 14939MB [2024-07-25 10:03:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][260/625] eta 0:02:31 lr 0.000124 wd 0.0500 time 0.3969 (0.4147) data time 0.0009 (0.0024) model time 0.3960 (0.4160) loss 6.4825 (6.6185) grad_norm 3.6924 (inf) loss_scale 64.0000 (109.8544) mem 14939MB [2024-07-25 10:03:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][270/625] eta 0:02:26 lr 0.000124 wd 0.0500 time 0.4000 (0.4141) data time 0.0008 (0.0023) model time 0.3992 (0.4152) loss 7.3695 (6.6240) grad_norm 2.7282 (inf) loss_scale 64.0000 (108.1624) mem 14939MB [2024-07-25 10:04:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][280/625] eta 0:02:22 lr 0.000124 wd 0.0500 time 0.3988 (0.4140) data time 0.0006 (0.0023) model time 0.3982 (0.4150) loss 6.2453 (6.6277) grad_norm 1.9574 (inf) loss_scale 64.0000 (106.5907) mem 14939MB [2024-07-25 10:04:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][290/625] eta 0:02:18 lr 0.000124 wd 0.0500 time 0.3956 (0.4139) data time 0.0008 (0.0022) model time 0.3948 (0.4148) loss 7.2811 (6.6263) grad_norm 4.5803 (inf) loss_scale 64.0000 (105.1271) mem 14939MB [2024-07-25 10:04:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][300/625] eta 0:02:14 lr 0.000124 wd 0.0500 time 0.4045 (0.4139) data time 0.0008 (0.0022) model time 0.4037 (0.4148) loss 6.7887 (6.6349) grad_norm 3.2205 (inf) loss_scale 64.0000 (103.7608) mem 14939MB [2024-07-25 10:04:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][310/625] eta 0:02:10 lr 0.000124 wd 0.0500 time 0.4033 (0.4145) data time 0.0007 (0.0021) model time 0.4026 (0.4154) loss 6.6812 (6.6330) grad_norm 2.9861 (inf) loss_scale 64.0000 (102.4823) mem 14939MB [2024-07-25 10:04:17 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [244/300][320/625] eta 0:02:06 lr 0.000123 wd 0.0500 time 0.5834 (0.4161) data time 0.0008 (0.0021) model time 0.5825 (0.4172) loss 7.5966 (6.6360) grad_norm 2.5174 (inf) loss_scale 64.0000 (101.2835) mem 14939MB [2024-07-25 10:04:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][330/625] eta 0:02:02 lr 0.000123 wd 0.0500 time 0.3974 (0.4169) data time 0.0007 (0.0021) model time 0.3968 (0.4181) loss 6.3641 (6.6421) grad_norm 2.8269 (inf) loss_scale 64.0000 (100.1571) mem 14939MB [2024-07-25 10:04:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][340/625] eta 0:01:59 lr 0.000123 wd 0.0500 time 0.5831 (0.4186) data time 0.0006 (0.0020) model time 0.5825 (0.4200) loss 6.9049 (6.6510) grad_norm 17.3965 (inf) loss_scale 64.0000 (99.0968) mem 14939MB [2024-07-25 10:04:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][350/625] eta 0:01:55 lr 0.000123 wd 0.0500 time 0.5778 (0.4201) data time 0.0009 (0.0020) model time 0.5769 (0.4217) loss 7.5646 (6.6520) grad_norm 2.9903 (inf) loss_scale 64.0000 (98.0969) mem 14939MB [2024-07-25 10:04:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][360/625] eta 0:01:51 lr 0.000123 wd 0.0500 time 0.3998 (0.4201) data time 0.0007 (0.0020) model time 0.3991 (0.4216) loss 5.5124 (6.6436) grad_norm 2.0689 (inf) loss_scale 64.0000 (97.1524) mem 14939MB [2024-07-25 10:04:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][370/625] eta 0:01:47 lr 0.000123 wd 0.0500 time 0.4009 (0.4198) data time 0.0008 (0.0019) model time 0.4001 (0.4212) loss 6.0596 (6.6356) grad_norm 3.1003 (inf) loss_scale 64.0000 (96.2588) mem 14939MB [2024-07-25 10:04:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][380/625] eta 0:01:42 lr 0.000123 wd 0.0500 time 0.3998 (0.4192) data time 0.0008 (0.0019) model time 0.3991 (0.4205) loss 7.2907 
(6.6246) grad_norm 3.4951 (inf) loss_scale 64.0000 (95.4121) mem 14939MB [2024-07-25 10:04:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][390/625] eta 0:01:38 lr 0.000123 wd 0.0500 time 0.3968 (0.4187) data time 0.0008 (0.0019) model time 0.3960 (0.4198) loss 7.2831 (6.6229) grad_norm 2.3503 (inf) loss_scale 64.0000 (94.6087) mem 14939MB [2024-07-25 10:04:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][400/625] eta 0:01:34 lr 0.000123 wd 0.0500 time 0.3986 (0.4182) data time 0.0007 (0.0018) model time 0.3979 (0.4192) loss 5.6320 (6.6229) grad_norm 2.4188 (inf) loss_scale 64.0000 (93.8454) mem 14939MB [2024-07-25 10:04:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][410/625] eta 0:01:29 lr 0.000123 wd 0.0500 time 0.3975 (0.4177) data time 0.0009 (0.0018) model time 0.3966 (0.4186) loss 6.7870 (6.6270) grad_norm 2.4710 (inf) loss_scale 64.0000 (93.1192) mem 14939MB [2024-07-25 10:04:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][420/625] eta 0:01:25 lr 0.000123 wd 0.0500 time 0.3965 (0.4173) data time 0.0007 (0.0018) model time 0.3958 (0.4181) loss 7.0168 (6.6262) grad_norm 3.2505 (inf) loss_scale 64.0000 (92.4276) mem 14939MB [2024-07-25 10:05:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][430/625] eta 0:01:21 lr 0.000123 wd 0.0500 time 0.3960 (0.4168) data time 0.0007 (0.0018) model time 0.3953 (0.4175) loss 6.2624 (6.6175) grad_norm 3.5807 (inf) loss_scale 64.0000 (91.7680) mem 14939MB [2024-07-25 10:05:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][440/625] eta 0:01:17 lr 0.000123 wd 0.0500 time 0.3990 (0.4165) data time 0.0007 (0.0018) model time 0.3983 (0.4170) loss 5.8335 (6.6231) grad_norm 2.7381 (inf) loss_scale 64.0000 (91.1383) mem 14939MB [2024-07-25 10:05:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][450/625] eta 
0:01:12 lr 0.000123 wd 0.0500 time 0.4004 (0.4161) data time 0.0008 (0.0017) model time 0.3996 (0.4166) loss 7.0041 (6.6184) grad_norm 2.7288 (inf) loss_scale 64.0000 (90.5366) mem 14939MB [2024-07-25 10:05:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][460/625] eta 0:01:08 lr 0.000123 wd 0.0500 time 0.3989 (0.4157) data time 0.0008 (0.0017) model time 0.3981 (0.4161) loss 6.8236 (6.6239) grad_norm 2.1407 (inf) loss_scale 64.0000 (89.9610) mem 14939MB [2024-07-25 10:05:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][470/625] eta 0:01:04 lr 0.000123 wd 0.0500 time 0.3989 (0.4153) data time 0.0006 (0.0017) model time 0.3982 (0.4156) loss 6.1229 (6.6245) grad_norm 2.7092 (inf) loss_scale 64.0000 (89.4098) mem 14939MB [2024-07-25 10:05:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][480/625] eta 0:01:00 lr 0.000122 wd 0.0500 time 0.3994 (0.4150) data time 0.0006 (0.0017) model time 0.3987 (0.4152) loss 5.4304 (6.6287) grad_norm 3.0452 (inf) loss_scale 64.0000 (88.8815) mem 14939MB [2024-07-25 10:05:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][490/625] eta 0:00:55 lr 0.000122 wd 0.0500 time 0.3975 (0.4146) data time 0.0007 (0.0017) model time 0.3968 (0.4148) loss 7.8693 (6.6318) grad_norm 4.3313 (inf) loss_scale 64.0000 (88.3747) mem 14939MB [2024-07-25 10:05:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][500/625] eta 0:00:51 lr 0.000122 wd 0.0500 time 0.4044 (0.4146) data time 0.0007 (0.0016) model time 0.4037 (0.4148) loss 6.7027 (6.6288) grad_norm 2.3312 (inf) loss_scale 64.0000 (87.8882) mem 14939MB [2024-07-25 10:05:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][510/625] eta 0:00:47 lr 0.000122 wd 0.0500 time 0.5565 (0.4146) data time 0.0008 (0.0016) model time 0.5557 (0.4148) loss 6.1072 (6.6234) grad_norm 4.9048 (inf) loss_scale 64.0000 (87.4207) mem 
14939MB [2024-07-25 10:05:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][520/625] eta 0:00:43 lr 0.000122 wd 0.0500 time 0.3946 (0.4146) data time 0.0006 (0.0016) model time 0.3939 (0.4148) loss 5.8003 (6.6200) grad_norm 3.2061 (inf) loss_scale 64.0000 (86.9712) mem 14939MB [2024-07-25 10:05:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][530/625] eta 0:00:39 lr 0.000122 wd 0.0500 time 0.3963 (0.4150) data time 0.0007 (0.0016) model time 0.3957 (0.4152) loss 6.4585 (6.6187) grad_norm 4.1759 (inf) loss_scale 64.0000 (86.5386) mem 14939MB [2024-07-25 10:05:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][540/625] eta 0:00:35 lr 0.000122 wd 0.0500 time 0.6140 (0.4160) data time 0.0007 (0.0016) model time 0.6133 (0.4163) loss 5.4019 (6.6166) grad_norm 12.7736 (inf) loss_scale 64.0000 (86.1220) mem 14939MB [2024-07-25 10:05:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][550/625] eta 0:00:31 lr 0.000122 wd 0.0500 time 0.6064 (0.4170) data time 0.0008 (0.0016) model time 0.6057 (0.4173) loss 7.8245 (6.6195) grad_norm 4.8501 (inf) loss_scale 64.0000 (85.7205) mem 14939MB [2024-07-25 10:05:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][560/625] eta 0:00:27 lr 0.000122 wd 0.0500 time 0.5748 (0.4183) data time 0.0006 (0.0016) model time 0.5742 (0.4187) loss 6.5435 (6.6172) grad_norm 2.5745 (inf) loss_scale 64.0000 (85.3333) mem 14939MB [2024-07-25 10:06:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][570/625] eta 0:00:23 lr 0.000122 wd 0.0500 time 0.3956 (0.4184) data time 0.0008 (0.0015) model time 0.3949 (0.4189) loss 7.7942 (6.6217) grad_norm 3.0794 (inf) loss_scale 64.0000 (84.9597) mem 14939MB [2024-07-25 10:06:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][580/625] eta 0:00:18 lr 0.000122 wd 0.0500 time 0.3960 (0.4185) data time 
0.0008 (0.0015) model time 0.3952 (0.4189) loss 6.3611 (6.6204) grad_norm 2.4615 (inf) loss_scale 64.0000 (84.5990) mem 14939MB [2024-07-25 10:06:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][590/625] eta 0:00:14 lr 0.000122 wd 0.0500 time 0.4000 (0.4184) data time 0.0009 (0.0015) model time 0.3991 (0.4188) loss 6.3537 (6.6226) grad_norm 3.1079 (inf) loss_scale 64.0000 (84.2504) mem 14939MB [2024-07-25 10:06:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][600/625] eta 0:00:10 lr 0.000122 wd 0.0500 time 0.3977 (0.4181) data time 0.0008 (0.0015) model time 0.3969 (0.4184) loss 5.8719 (6.6201) grad_norm 2.6588 (inf) loss_scale 64.0000 (83.9135) mem 14939MB [2024-07-25 10:06:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][610/625] eta 0:00:06 lr 0.000122 wd 0.0500 time 0.3953 (0.4178) data time 0.0004 (0.0015) model time 0.3949 (0.4180) loss 5.7772 (6.6155) grad_norm 2.1168 (inf) loss_scale 64.0000 (83.5876) mem 14939MB [2024-07-25 10:06:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][620/625] eta 0:00:02 lr 0.000122 wd 0.0500 time 0.3960 (0.4174) data time 0.0005 (0.0015) model time 0.3956 (0.4176) loss 6.4947 (6.6103) grad_norm 3.7726 (inf) loss_scale 64.0000 (83.2721) mem 14939MB [2024-07-25 10:06:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 244 training takes 0:04:20 [2024-07-25 10:06:24 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 10:06:25 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 10:06:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.458 (0.458) Loss 0.5356 (0.5356) Acc@1 90.332 (90.332) Acc@5 98.926 (98.926) Mem 14939MB
[2024-07-25 10:06:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8174 (0.6604) Acc@1 82.617 (87.149) Acc@5 96.777 (97.985) Mem 14939MB
[2024-07-25 10:06:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9238 (0.7684) Acc@1 79.102 (84.196) Acc@5 95.850 (96.933) Mem 14939MB
[2024-07-25 10:06:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.775 Acc@5 96.901
[2024-07-25 10:06:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8%
[2024-07-25 10:06:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.768 (0.768) Loss 0.5381 (0.5381) Acc@1 90.332 (90.332) Acc@5 99.023 (99.023) Mem 14939MB
[2024-07-25 10:06:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.153) Loss 0.8193 (0.6601) Acc@1 83.057 (87.211) Acc@5 96.875 (97.963) Mem 14939MB
[2024-07-25 10:06:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.121) Loss 0.9297 (0.7656) Acc@1 78.320 (84.256) Acc@5 95.459 (96.949) Mem 14939MB
[2024-07-25 10:06:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.851 Acc@5 96.903
[2024-07-25 10:06:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9%
[2024-07-25 10:06:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.85%
[2024-07-25 10:06:30 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 10:06:31 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 10:06:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][0/625] eta 0:07:53 lr 0.000122 wd 0.0500 time 0.7581 (0.7581) data time 0.3783 (0.3783) model time 0.0000 (0.0000) loss 6.4858 (6.4858) grad_norm 4.4643 (4.4643) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:06:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][10/625] eta 0:04:24 lr 0.000121 wd 0.0500 time 0.3995 (0.4308) data time 0.0006 (0.0351) model time 0.0000 (0.0000) loss 5.7523 (6.5219) grad_norm 1.9437 (3.2511) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:06:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][20/625] eta 0:04:11 lr 0.000121 wd 0.0500 time 0.3968 (0.4160) data time 0.0007 (0.0188) model time 0.0000 (0.0000) loss 7.1033 (6.4787) grad_norm 2.4881 (2.8323) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:06:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][30/625] eta 0:04:07 lr 0.000121 wd 0.0500 time 0.3975 (0.4153) data time 0.0007 (0.0130) model time 0.0000 (0.0000) loss 5.7129 (6.5514) grad_norm 3.2243 (3.0279) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:06:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][40/625] eta 0:04:00 lr 0.000121 wd 0.0500 time 0.4048 (0.4116) data time 0.0007 (0.0100) model time 0.0000 (0.0000) loss 5.9818 (6.5568) grad_norm 2.9266 (3.2312) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:06:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][50/625] eta 0:03:55 lr 0.000121 wd 0.0500 time 0.4005 (0.4094) data time 0.0006 (0.0082) model time 0.0000 (0.0000) loss 7.0919 (6.5285) grad_norm 5.8961 (3.3444) loss_scale 64.0000 (64.0000) mem 14939MB 
[2024-07-25 10:06:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][60/625] eta 0:03:50 lr 0.000121 wd 0.0500 time 0.4119 (0.4079) data time 0.0007 (0.0070) model time 0.4112 (0.3993) loss 6.5410 (6.5381) grad_norm 2.1781 (3.2484) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:07:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][70/625] eta 0:03:45 lr 0.000121 wd 0.0500 time 0.3977 (0.4066) data time 0.0006 (0.0062) model time 0.3971 (0.3986) loss 6.1025 (6.5314) grad_norm 2.3362 (3.1435) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:07:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][80/625] eta 0:03:41 lr 0.000121 wd 0.0500 time 0.3997 (0.4056) data time 0.0010 (0.0056) model time 0.3988 (0.3981) loss 6.2213 (6.5645) grad_norm 3.7917 (3.1202) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:07:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][90/625] eta 0:03:36 lr 0.000121 wd 0.0500 time 0.4008 (0.4051) data time 0.0006 (0.0050) model time 0.4002 (0.3986) loss 5.9302 (6.5563) grad_norm 2.9281 (3.2185) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:07:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][100/625] eta 0:03:32 lr 0.000121 wd 0.0500 time 0.3993 (0.4044) data time 0.0007 (0.0046) model time 0.3986 (0.3984) loss 6.4003 (6.5462) grad_norm 2.1151 (3.1609) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:07:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][110/625] eta 0:03:28 lr 0.000121 wd 0.0500 time 0.4051 (0.4055) data time 0.0009 (0.0043) model time 0.4042 (0.4012) loss 6.8272 (6.5756) grad_norm 2.8113 (3.1710) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:07:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][120/625] eta 0:03:26 lr 0.000121 wd 0.0500 time 0.5966 (0.4084) data time 
0.0007 (0.0040) model time 0.5960 (0.4068) loss 5.5467 (6.5483) grad_norm 2.4890 (3.2192) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:07:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][130/625] eta 0:03:22 lr 0.000121 wd 0.0500 time 0.6001 (0.4093) data time 0.0006 (0.0038) model time 0.5995 (0.4083) loss 7.5122 (6.5621) grad_norm 3.8939 (3.2265) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:07:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][140/625] eta 0:03:20 lr 0.000121 wd 0.0500 time 0.5871 (0.4138) data time 0.0009 (0.0036) model time 0.5862 (0.4154) loss 5.5763 (6.5426) grad_norm 4.9331 (3.2854) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:07:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][150/625] eta 0:03:18 lr 0.000121 wd 0.0500 time 0.5711 (0.4174) data time 0.0007 (0.0034) model time 0.5703 (0.4206) loss 6.3494 (6.5297) grad_norm 9.4503 (3.2915) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:07:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][160/625] eta 0:03:15 lr 0.000121 wd 0.0500 time 0.3990 (0.4201) data time 0.0006 (0.0032) model time 0.3984 (0.4242) loss 5.4237 (6.5247) grad_norm 2.6470 (3.2631) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:07:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][170/625] eta 0:03:11 lr 0.000121 wd 0.0500 time 0.3965 (0.4207) data time 0.0009 (0.0031) model time 0.3956 (0.4246) loss 7.5784 (6.5275) grad_norm 3.2410 (3.2551) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:07:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][180/625] eta 0:03:07 lr 0.000120 wd 0.0500 time 0.4043 (0.4211) data time 0.0009 (0.0030) model time 0.4033 (0.4248) loss 7.1397 (6.5472) grad_norm 4.8692 (3.3445) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:07:52 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][190/625] eta 0:03:03 lr 0.000120 wd 0.0500 time 0.3990 (0.4207) data time 0.0009 (0.0029) model time 0.3981 (0.4239) loss 7.0176 (6.5429) grad_norm 4.5581 (3.3564) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:07:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][200/625] eta 0:02:58 lr 0.000120 wd 0.0500 time 0.4015 (0.4199) data time 0.0008 (0.0028) model time 0.4007 (0.4225) loss 6.7810 (6.5275) grad_norm 2.5353 (3.3660) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:08:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][210/625] eta 0:02:53 lr 0.000120 wd 0.0500 time 0.4000 (0.4191) data time 0.0009 (0.0027) model time 0.3991 (0.4213) loss 8.2495 (6.5388) grad_norm 3.7195 (3.4253) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:08:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][220/625] eta 0:02:49 lr 0.000120 wd 0.0500 time 0.3951 (0.4184) data time 0.0007 (0.0026) model time 0.3943 (0.4202) loss 7.7556 (6.5432) grad_norm 4.5046 (3.4184) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:08:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][230/625] eta 0:02:44 lr 0.000120 wd 0.0500 time 0.3966 (0.4177) data time 0.0009 (0.0025) model time 0.3957 (0.4192) loss 5.5172 (6.5270) grad_norm 4.4935 (3.4283) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:08:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][240/625] eta 0:02:40 lr 0.000120 wd 0.0500 time 0.3986 (0.4170) data time 0.0007 (0.0025) model time 0.3980 (0.4181) loss 6.9175 (6.5325) grad_norm 3.0617 (3.4567) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:08:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][250/625] eta 0:02:36 lr 0.000120 wd 0.0500 time 0.4028 (0.4170) data time 0.0008 (0.0024) 
model time 0.4020 (0.4180) loss 7.0090 (6.5484) grad_norm 3.8670 (3.4421) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:08:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][260/625] eta 0:02:31 lr 0.000120 wd 0.0500 time 0.4002 (0.4164) data time 0.0008 (0.0023) model time 0.3995 (0.4172) loss 6.2037 (6.5425) grad_norm 2.7212 (3.4371) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:08:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][270/625] eta 0:02:27 lr 0.000120 wd 0.0500 time 0.3928 (0.4159) data time 0.0007 (0.0023) model time 0.3921 (0.4165) loss 6.9012 (6.5653) grad_norm 2.7165 (3.4180) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:08:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][280/625] eta 0:02:23 lr 0.000120 wd 0.0500 time 0.4130 (0.4153) data time 0.0009 (0.0022) model time 0.4121 (0.4158) loss 6.3101 (6.5732) grad_norm 3.0786 (3.4266) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:08:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][290/625] eta 0:02:18 lr 0.000120 wd 0.0500 time 0.4002 (0.4148) data time 0.0008 (0.0022) model time 0.3994 (0.4151) loss 7.3619 (6.5688) grad_norm 5.3868 (3.4456) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:08:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][300/625] eta 0:02:14 lr 0.000120 wd 0.0500 time 0.3939 (0.4143) data time 0.0006 (0.0021) model time 0.3933 (0.4144) loss 7.5971 (6.5708) grad_norm 4.3801 (3.4562) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:08:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][310/625] eta 0:02:10 lr 0.000120 wd 0.0500 time 0.3968 (0.4138) data time 0.0007 (0.0021) model time 0.3961 (0.4138) loss 6.2161 (6.5627) grad_norm 4.4838 (3.4580) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:08:44 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [245/300][320/625] eta 0:02:06 lr 0.000120 wd 0.0500 time 0.4000 (0.4134) data time 0.0008 (0.0021) model time 0.3992 (0.4133) loss 7.0022 (6.5636) grad_norm 2.9232 (3.4479) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:08:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][330/625] eta 0:02:01 lr 0.000120 wd 0.0500 time 0.3940 (0.4133) data time 0.0008 (0.0020) model time 0.3932 (0.4131) loss 7.0601 (6.5728) grad_norm 1.9225 (3.4249) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:08:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][340/625] eta 0:01:58 lr 0.000119 wd 0.0500 time 0.4026 (0.4140) data time 0.0006 (0.0020) model time 0.4020 (0.4140) loss 6.2318 (6.5825) grad_norm 3.4956 (3.4000) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:08:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][350/625] eta 0:01:53 lr 0.000119 wd 0.0500 time 0.5054 (0.4144) data time 0.0007 (0.0020) model time 0.5046 (0.4144) loss 6.9722 (6.5847) grad_norm 2.4509 (3.3923) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:09:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][360/625] eta 0:01:50 lr 0.000119 wd 0.0500 time 0.5853 (0.4161) data time 0.0008 (0.0019) model time 0.5845 (0.4163) loss 7.0184 (6.5831) grad_norm 3.2738 (3.3742) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:09:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][370/625] eta 0:01:46 lr 0.000119 wd 0.0500 time 0.5804 (0.4176) data time 0.0007 (0.0019) model time 0.5797 (0.4181) loss 6.5156 (6.5847) grad_norm 3.3005 (3.3605) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:09:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][380/625] eta 0:01:42 lr 0.000119 wd 0.0500 time 0.3957 (0.4185) data time 0.0009 (0.0019) model time 0.3948 (0.4190) 
loss 7.2113 (6.5856) grad_norm 4.0257 (3.4516) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:09:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][390/625] eta 0:01:38 lr 0.000119 wd 0.0500 time 0.4092 (0.4190) data time 0.0008 (0.0018) model time 0.4084 (0.4195) loss 6.8769 (6.5883) grad_norm 2.7856 (3.7594) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:09:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][400/625] eta 0:01:34 lr 0.000119 wd 0.0500 time 0.3936 (0.4189) data time 0.0007 (0.0018) model time 0.3929 (0.4194) loss 6.7144 (6.5831) grad_norm 2.4410 (3.7324) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:09:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][410/625] eta 0:01:30 lr 0.000119 wd 0.0500 time 0.3967 (0.4187) data time 0.0006 (0.0018) model time 0.3961 (0.4191) loss 6.0106 (6.5779) grad_norm 3.0869 (3.7037) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:09:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][420/625] eta 0:01:25 lr 0.000119 wd 0.0500 time 0.3986 (0.4183) data time 0.0007 (0.0018) model time 0.3979 (0.4186) loss 5.8509 (6.5674) grad_norm 2.1951 (3.6780) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:09:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][430/625] eta 0:01:21 lr 0.000119 wd 0.0500 time 0.4008 (0.4178) data time 0.0007 (0.0017) model time 0.4002 (0.4181) loss 6.1475 (6.5658) grad_norm 2.2732 (3.6544) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:09:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][440/625] eta 0:01:17 lr 0.000119 wd 0.0500 time 0.3962 (0.4174) data time 0.0007 (0.0017) model time 0.3955 (0.4176) loss 7.2136 (6.5671) grad_norm 2.0556 (3.6386) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:09:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [245/300][450/625] eta 0:01:12 lr 0.000119 wd 0.0500 time 0.4070 (0.4170) data time 0.0008 (0.0017) model time 0.4062 (0.4171) loss 7.5066 (6.5690) grad_norm 5.3077 (3.6402) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:09:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][460/625] eta 0:01:08 lr 0.000119 wd 0.0500 time 0.3976 (0.4166) data time 0.0006 (0.0017) model time 0.3970 (0.4166) loss 6.7589 (6.5791) grad_norm 2.7314 (3.6473) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:09:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][470/625] eta 0:01:04 lr 0.000119 wd 0.0500 time 0.3982 (0.4163) data time 0.0008 (0.0017) model time 0.3974 (0.4162) loss 4.9536 (6.5815) grad_norm 2.5212 (3.6605) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:09:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][480/625] eta 0:01:00 lr 0.000119 wd 0.0500 time 0.3986 (0.4163) data time 0.0007 (0.0016) model time 0.3979 (0.4163) loss 6.6101 (6.5789) grad_norm 3.2375 (3.6977) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:09:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][490/625] eta 0:00:56 lr 0.000119 wd 0.0500 time 0.3984 (0.4160) data time 0.0006 (0.0016) model time 0.3978 (0.4159) loss 6.2691 (6.5804) grad_norm 3.8732 (3.6961) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:10:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][500/625] eta 0:00:51 lr 0.000118 wd 0.0500 time 0.3978 (0.4156) data time 0.0009 (0.0016) model time 0.3970 (0.4155) loss 7.0611 (6.5804) grad_norm 2.1163 (3.6790) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:10:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][510/625] eta 0:00:47 lr 0.000118 wd 0.0500 time 0.3980 (0.4153) data time 0.0008 (0.0016) model time 0.3972 (0.4151) loss 7.1142 (6.5871) grad_norm 
2.4697 (3.6658) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:10:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][520/625] eta 0:00:43 lr 0.000118 wd 0.0500 time 0.3962 (0.4150) data time 0.0009 (0.0016) model time 0.3953 (0.4147) loss 6.9322 (6.5874) grad_norm 1.9151 (3.6535) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:10:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][530/625] eta 0:00:39 lr 0.000118 wd 0.0500 time 0.3977 (0.4147) data time 0.0009 (0.0016) model time 0.3968 (0.4144) loss 7.3146 (6.5965) grad_norm 4.0584 (3.6484) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:10:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][540/625] eta 0:00:35 lr 0.000118 wd 0.0500 time 0.4004 (0.4144) data time 0.0008 (0.0016) model time 0.3996 (0.4141) loss 7.2977 (6.6027) grad_norm 3.1661 (3.6349) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:10:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][550/625] eta 0:00:31 lr 0.000118 wd 0.0500 time 0.4018 (0.4145) data time 0.0006 (0.0015) model time 0.4012 (0.4141) loss 5.8841 (6.6074) grad_norm 4.0288 (3.6362) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:10:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][560/625] eta 0:00:26 lr 0.000118 wd 0.0500 time 0.5992 (0.4153) data time 0.0007 (0.0015) model time 0.5985 (0.4150) loss 6.2027 (6.6029) grad_norm 4.3115 (3.6673) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:10:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][570/625] eta 0:00:22 lr 0.000118 wd 0.0500 time 0.5634 (0.4153) data time 0.0009 (0.0015) model time 0.5625 (0.4150) loss 5.9561 (6.5958) grad_norm 2.2423 (3.6581) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:10:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][580/625] eta 
0:00:18 lr 0.000118 wd 0.0500 time 0.3914 (0.4165) data time 0.0008 (0.0015) model time 0.3905 (0.4163) loss 7.2277 (6.5976) grad_norm 2.3952 (3.6539) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:10:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][590/625] eta 0:00:14 lr 0.000118 wd 0.0500 time 0.5712 (0.4177) data time 0.0008 (0.0015) model time 0.5704 (0.4176) loss 7.1067 (6.5985) grad_norm 2.4181 (3.6356) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:10:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][600/625] eta 0:00:10 lr 0.000118 wd 0.0500 time 0.3936 (0.4179) data time 0.0007 (0.0015) model time 0.3929 (0.4179) loss 7.3510 (6.6033) grad_norm 2.3022 (3.6181) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:10:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][610/625] eta 0:00:06 lr 0.000118 wd 0.0500 time 0.5514 (0.4188) data time 0.0004 (0.0015) model time 0.5511 (0.4188) loss 5.6029 (6.6020) grad_norm 2.4693 (3.6000) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:10:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][620/625] eta 0:00:02 lr 0.000118 wd 0.0500 time 0.3975 (0.4185) data time 0.0006 (0.0015) model time 0.3970 (0.4185) loss 7.2613 (6.5998) grad_norm 2.7936 (3.5818) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:10:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 245 training takes 0:04:21 [2024-07-25 10:10:53 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 10:10:54 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 10:10:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.5444 (0.5444) Acc@1 89.844 (89.844) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 10:10:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8218 (0.6599) Acc@1 82.129 (87.145) Acc@5 96.777 (97.927) Mem 14939MB [2024-07-25 10:10:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9175 (0.7637) Acc@1 78.320 (84.235) Acc@5 95.898 (96.954) Mem 14939MB [2024-07-25 10:10:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.813 Acc@5 96.947 [2024-07-25 10:10:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-07-25 10:10:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.826 (0.826) Loss 0.5376 (0.5376) Acc@1 90.381 (90.381) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 10:10:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.158) Loss 0.8179 (0.6596) Acc@1 83.057 (87.211) Acc@5 96.826 (97.954) Mem 14939MB [2024-07-25 10:10:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 0.9287 (0.7650) Acc@1 78.320 (84.252) Acc@5 95.459 (96.949) Mem 14939MB [2024-07-25 10:10:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.845 Acc@5 96.905 [2024-07-25 10:10:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-07-25 10:11:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][0/625] eta 0:13:00 lr 0.000118 wd 0.0500 time 1.2488 (1.2488) data time 0.5730 (0.5730) model time 0.0000 (0.0000) loss 6.5325 (6.5325) grad_norm 1.9413 (1.9413) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:11:05 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][10/625] eta 0:05:01 lr 0.000118 wd 0.0500 time 0.3938 (0.4896) data time 0.0008 (0.0528) model time 0.0000 (0.0000) loss 6.6880 (6.5992) grad_norm 3.1635 (2.6630) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:11:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][20/625] eta 0:04:29 lr 0.000118 wd 0.0500 time 0.4051 (0.4460) data time 0.0006 (0.0281) model time 0.0000 (0.0000) loss 7.1836 (6.5001) grad_norm 3.5021 (2.7598) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:11:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][30/625] eta 0:04:16 lr 0.000118 wd 0.0500 time 0.3970 (0.4307) data time 0.0008 (0.0193) model time 0.0000 (0.0000) loss 7.6165 (6.5190) grad_norm 2.2000 (4.0089) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:11:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][40/625] eta 0:04:09 lr 0.000117 wd 0.0500 time 0.3987 (0.4268) data time 0.0008 (0.0148) model time 0.0000 (0.0000) loss 6.5515 (6.5391) grad_norm 6.2319 (3.9334) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:11:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][50/625] eta 0:04:02 lr 0.000117 wd 0.0500 time 0.3993 (0.4210) data time 0.0006 (0.0120) model time 0.0000 (0.0000) loss 6.7797 (6.5484) grad_norm 8.3042 (3.9276) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:11:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][60/625] eta 0:03:55 lr 0.000117 wd 0.0500 time 0.4076 (0.4176) data time 0.0008 (0.0102) model time 0.4068 (0.3995) loss 7.6231 (6.5301) grad_norm 3.8637 (3.7603) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:11:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][70/625] eta 0:03:50 lr 0.000117 wd 0.0500 time 0.3982 (0.4147) data time 0.0008 (0.0089) model 
time 0.3974 (0.3980) loss 7.6721 (6.4995) grad_norm 2.4790 (3.6453) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:11:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][80/625] eta 0:03:45 lr 0.000117 wd 0.0500 time 0.3983 (0.4129) data time 0.0006 (0.0079) model time 0.3976 (0.3982) loss 5.4278 (6.4703) grad_norm 2.0960 (3.5941) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:11:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][90/625] eta 0:03:40 lr 0.000117 wd 0.0500 time 0.3997 (0.4112) data time 0.0006 (0.0071) model time 0.3991 (0.3980) loss 6.5207 (6.4797) grad_norm 2.0347 (3.5279) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:11:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][100/625] eta 0:03:35 lr 0.000117 wd 0.0500 time 0.3984 (0.4100) data time 0.0006 (0.0065) model time 0.3978 (0.3979) loss 6.4588 (6.5064) grad_norm 4.5442 (3.4460) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:11:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][110/625] eta 0:03:30 lr 0.000117 wd 0.0500 time 0.3980 (0.4089) data time 0.0006 (0.0060) model time 0.3974 (0.3979) loss 5.9153 (6.5065) grad_norm 2.4917 (3.3470) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:11:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][120/625] eta 0:03:26 lr 0.000117 wd 0.0500 time 0.4038 (0.4083) data time 0.0007 (0.0056) model time 0.4031 (0.3981) loss 7.4897 (6.5336) grad_norm 5.5140 (3.5710) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:11:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][130/625] eta 0:03:21 lr 0.000117 wd 0.0500 time 0.3950 (0.4080) data time 0.0006 (0.0052) model time 0.3944 (0.3988) loss 5.3144 (6.5414) grad_norm 7.5767 (3.6621) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:11:57 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [246/300][140/625] eta 0:03:17 lr 0.000117 wd 0.0500 time 0.3979 (0.4073) data time 0.0009 (0.0049) model time 0.3970 (0.3987) loss 6.0428 (6.5461) grad_norm 2.4022 (3.6538) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:12:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][150/625] eta 0:03:13 lr 0.000117 wd 0.0500 time 0.3993 (0.4075) data time 0.0008 (0.0046) model time 0.3985 (0.3998) loss 6.0787 (6.5438) grad_norm 5.1210 (3.8599) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:12:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][160/625] eta 0:03:10 lr 0.000117 wd 0.0500 time 0.3985 (0.4096) data time 0.0008 (0.0044) model time 0.3977 (0.4034) loss 7.2334 (6.5561) grad_norm 2.3979 (3.8758) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:12:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][170/625] eta 0:03:07 lr 0.000117 wd 0.0500 time 0.4008 (0.4111) data time 0.0008 (0.0042) model time 0.4000 (0.4060) loss 5.8009 (6.5078) grad_norm 2.2593 (3.8414) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:12:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][180/625] eta 0:03:03 lr 0.000117 wd 0.0500 time 0.5680 (0.4135) data time 0.0008 (0.0040) model time 0.5672 (0.4096) loss 7.4074 (6.5123) grad_norm 2.7216 (3.7740) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:12:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][190/625] eta 0:03:01 lr 0.000117 wd 0.0500 time 0.4025 (0.4166) data time 0.0007 (0.0038) model time 0.4019 (0.4141) loss 6.8170 (6.5367) grad_norm 3.1020 (3.7345) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:12:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][200/625] eta 0:02:58 lr 0.000117 wd 0.0500 time 0.3926 (0.4204) data time 0.0007 (0.0037) model time 0.3919 (0.4193) 
loss 5.6498 (6.5378) grad_norm 2.8134 (3.7138) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:12:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][210/625] eta 0:02:54 lr 0.000116 wd 0.0500 time 0.4026 (0.4215) data time 0.0008 (0.0036) model time 0.4018 (0.4208) loss 7.3557 (6.5468) grad_norm 2.4520 (3.6671) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:12:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][220/625] eta 0:02:50 lr 0.000116 wd 0.0500 time 0.3997 (0.4206) data time 0.0009 (0.0036) model time 0.3987 (0.4193) loss 6.2002 (6.5346) grad_norm 4.7986 (3.6569) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:12:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][230/625] eta 0:02:46 lr 0.000116 wd 0.0500 time 0.3976 (0.4205) data time 0.0006 (0.0035) model time 0.3969 (0.4193) loss 5.1865 (6.5184) grad_norm 6.6782 (3.6758) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:12:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][240/625] eta 0:02:41 lr 0.000116 wd 0.0500 time 0.4041 (0.4197) data time 0.0009 (0.0034) model time 0.4032 (0.4182) loss 6.8215 (6.5285) grad_norm 3.3750 (3.6793) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:12:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][250/625] eta 0:02:37 lr 0.000116 wd 0.0500 time 0.4004 (0.4190) data time 0.0009 (0.0033) model time 0.3995 (0.4174) loss 7.8559 (6.5361) grad_norm 3.3990 (3.6837) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:12:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][260/625] eta 0:02:32 lr 0.000116 wd 0.0500 time 0.3985 (0.4187) data time 0.0007 (0.0032) model time 0.3979 (0.4170) loss 6.4713 (6.5328) grad_norm 2.1261 (3.6578) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:12:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [246/300][270/625] eta 0:02:28 lr 0.000116 wd 0.0500 time 0.3995 (0.4180) data time 0.0007 (0.0031) model time 0.3989 (0.4162) loss 7.2785 (6.5450) grad_norm 2.7758 (3.6439) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:12:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][280/625] eta 0:02:23 lr 0.000116 wd 0.0500 time 0.4056 (0.4173) data time 0.0007 (0.0031) model time 0.4049 (0.4155) loss 7.3294 (6.5513) grad_norm 3.0687 (3.6289) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:13:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][290/625] eta 0:02:19 lr 0.000116 wd 0.0500 time 0.3974 (0.4167) data time 0.0008 (0.0030) model time 0.3966 (0.4148) loss 6.1155 (6.5491) grad_norm 3.2809 (3.5941) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:13:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][300/625] eta 0:02:15 lr 0.000116 wd 0.0500 time 0.3983 (0.4162) data time 0.0009 (0.0029) model time 0.3974 (0.4141) loss 7.1868 (6.5602) grad_norm 2.1790 (3.5593) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:13:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][310/625] eta 0:02:10 lr 0.000116 wd 0.0500 time 0.3958 (0.4156) data time 0.0008 (0.0029) model time 0.3950 (0.4135) loss 5.7862 (6.5454) grad_norm 2.1657 (3.5421) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:13:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][320/625] eta 0:02:06 lr 0.000116 wd 0.0500 time 0.3970 (0.4150) data time 0.0006 (0.0028) model time 0.3964 (0.4128) loss 7.6037 (6.5523) grad_norm 2.5974 (3.5133) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:13:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][330/625] eta 0:02:02 lr 0.000116 wd 0.0500 time 0.3979 (0.4145) data time 0.0008 (0.0027) model time 0.3971 (0.4123) loss 7.1780 (6.5495) grad_norm 
2.7758 (3.4980) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:13:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][340/625] eta 0:01:58 lr 0.000116 wd 0.0500 time 0.4038 (0.4141) data time 0.0007 (0.0027) model time 0.4032 (0.4119) loss 6.1909 (6.5509) grad_norm 2.3472 (3.4849) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:13:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][350/625] eta 0:01:53 lr 0.000116 wd 0.0500 time 0.4041 (0.4137) data time 0.0009 (0.0026) model time 0.4032 (0.4114) loss 5.6247 (6.5543) grad_norm 3.4222 (3.4678) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:13:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][360/625] eta 0:01:49 lr 0.000116 wd 0.0500 time 0.3985 (0.4133) data time 0.0008 (0.0026) model time 0.3977 (0.4110) loss 6.1297 (6.5440) grad_norm 2.8302 (3.4769) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:13:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][370/625] eta 0:01:45 lr 0.000115 wd 0.0500 time 0.5965 (0.4140) data time 0.0006 (0.0025) model time 0.5959 (0.4119) loss 7.0180 (6.5462) grad_norm 2.5264 (3.5184) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:13:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][380/625] eta 0:01:41 lr 0.000115 wd 0.0500 time 0.3989 (0.4147) data time 0.0008 (0.0025) model time 0.3981 (0.4127) loss 6.0899 (6.5418) grad_norm 2.1318 (3.6118) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:13:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][390/625] eta 0:01:37 lr 0.000115 wd 0.0500 time 0.3958 (0.4153) data time 0.0007 (0.0024) model time 0.3951 (0.4134) loss 4.9984 (6.5371) grad_norm 3.2091 (3.6095) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:13:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][400/625] eta 
0:01:33 lr 0.000115 wd 0.0500 time 0.3987 (0.4162) data time 0.0009 (0.0024) model time 0.3978 (0.4145) loss 6.9375 (6.5393) grad_norm 2.2447 (3.5969) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:13:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][410/625] eta 0:01:29 lr 0.000115 wd 0.0500 time 0.3988 (0.4171) data time 0.0009 (0.0024) model time 0.3979 (0.4155) loss 6.6270 (6.5355) grad_norm 2.8517 (3.5768) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:13:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][420/625] eta 0:01:25 lr 0.000115 wd 0.0500 time 0.3978 (0.4187) data time 0.0009 (0.0023) model time 0.3969 (0.4174) loss 8.5068 (6.5424) grad_norm 1.8712 (3.5647) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:14:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][430/625] eta 0:01:21 lr 0.000115 wd 0.0500 time 0.3981 (0.4194) data time 0.0009 (0.0023) model time 0.3972 (0.4182) loss 7.7190 (6.5538) grad_norm 2.7430 (3.5482) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:14:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][440/625] eta 0:01:17 lr 0.000115 wd 0.0500 time 0.4044 (0.4189) data time 0.0009 (0.0023) model time 0.4035 (0.4177) loss 6.5229 (6.5561) grad_norm 1.9437 (3.5731) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:14:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][450/625] eta 0:01:13 lr 0.000115 wd 0.0500 time 0.3972 (0.4188) data time 0.0008 (0.0022) model time 0.3964 (0.4176) loss 6.0991 (6.5549) grad_norm 2.5152 (3.5648) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:14:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][460/625] eta 0:01:09 lr 0.000115 wd 0.0500 time 0.3992 (0.4184) data time 0.0008 (0.0022) model time 0.3984 (0.4171) loss 7.2051 (6.5542) grad_norm 4.0717 (3.5728) loss_scale 64.0000 
(64.0000) mem 14939MB [2024-07-25 10:14:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][470/625] eta 0:01:04 lr 0.000115 wd 0.0500 time 0.3990 (0.4180) data time 0.0010 (0.0022) model time 0.3980 (0.4166) loss 5.7702 (6.5494) grad_norm 3.3845 (3.5742) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:14:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][480/625] eta 0:01:00 lr 0.000115 wd 0.0500 time 0.3975 (0.4179) data time 0.0007 (0.0022) model time 0.3968 (0.4165) loss 6.1998 (6.5428) grad_norm 2.1265 (3.5561) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:14:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][490/625] eta 0:00:56 lr 0.000115 wd 0.0500 time 0.3965 (0.4175) data time 0.0007 (0.0021) model time 0.3958 (0.4161) loss 7.1099 (6.5429) grad_norm 2.3248 (3.5915) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:14:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][500/625] eta 0:00:52 lr 0.000115 wd 0.0500 time 0.4083 (0.4171) data time 0.0009 (0.0021) model time 0.4074 (0.4157) loss 5.7669 (6.5402) grad_norm 2.3052 (3.6481) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:14:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][510/625] eta 0:00:47 lr 0.000115 wd 0.0500 time 0.3941 (0.4168) data time 0.0009 (0.0021) model time 0.3932 (0.4154) loss 7.1817 (6.5411) grad_norm 6.7009 (3.6398) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:14:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][520/625] eta 0:00:43 lr 0.000115 wd 0.0500 time 0.3983 (0.4165) data time 0.0009 (0.0021) model time 0.3974 (0.4150) loss 7.6567 (6.5365) grad_norm 1.9401 (3.6277) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:14:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][530/625] eta 0:00:39 lr 0.000115 wd 0.0500 time 
0.3983 (0.4161) data time 0.0007 (0.0020) model time 0.3976 (0.4146) loss 6.1527 (6.5403) grad_norm 2.3755 (3.6092) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:14:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][540/625] eta 0:00:35 lr 0.000114 wd 0.0500 time 0.4053 (0.4158) data time 0.0007 (0.0020) model time 0.4046 (0.4143) loss 6.4725 (6.5423) grad_norm 4.0136 (3.6287) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:14:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][550/625] eta 0:00:31 lr 0.000114 wd 0.0500 time 0.3997 (0.4155) data time 0.0007 (0.0020) model time 0.3990 (0.4139) loss 7.6101 (6.5376) grad_norm 3.8049 (3.6189) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:14:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][560/625] eta 0:00:26 lr 0.000114 wd 0.0500 time 0.4059 (0.4152) data time 0.0009 (0.0020) model time 0.4049 (0.4137) loss 7.1504 (6.5355) grad_norm 2.1639 (3.6136) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:14:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][570/625] eta 0:00:22 lr 0.000114 wd 0.0500 time 0.3994 (0.4149) data time 0.0007 (0.0019) model time 0.3988 (0.4134) loss 6.9035 (6.5303) grad_norm 2.2766 (3.5977) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:15:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][580/625] eta 0:00:18 lr 0.000114 wd 0.0500 time 0.3979 (0.4147) data time 0.0007 (0.0019) model time 0.3972 (0.4131) loss 6.0653 (6.5309) grad_norm 3.9243 (3.6198) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:15:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][590/625] eta 0:00:14 lr 0.000114 wd 0.0500 time 0.5956 (0.4151) data time 0.0007 (0.0019) model time 0.5949 (0.4135) loss 7.4041 (6.5354) grad_norm 2.2633 (3.6050) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 
10:15:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][600/625] eta 0:00:10 lr 0.000114 wd 0.0500 time 0.3973 (0.4154) data time 0.0009 (0.0019) model time 0.3964 (0.4139) loss 5.7771 (6.5330) grad_norm 3.7368 (3.5939) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:15:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][610/625] eta 0:00:06 lr 0.000114 wd 0.0500 time 0.5884 (0.4161) data time 0.0004 (0.0019) model time 0.5880 (0.4147) loss 6.7139 (6.5271) grad_norm 2.7260 (3.5924) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:15:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][620/625] eta 0:00:02 lr 0.000114 wd 0.0500 time 0.5439 (0.4171) data time 0.0004 (0.0019) model time 0.5435 (0.4157) loss 6.8620 (6.5304) grad_norm 4.4361 (3.6042) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:15:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 246 training takes 0:04:20 [2024-07-25 10:15:20 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 10:15:21 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 10:15:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.529 (0.529) Loss 0.5396 (0.5396) Acc@1 90.039 (90.039) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 10:15:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.126) Loss 0.8159 (0.6547) Acc@1 82.422 (87.291) Acc@5 97.070 (97.976) Mem 14939MB [2024-07-25 10:15:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.107) Loss 0.9272 (0.7639) Acc@1 77.979 (84.270) Acc@5 95.752 (96.935) Mem 14939MB [2024-07-25 10:15:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.881 Acc@5 96.905 [2024-07-25 10:15:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-07-25 10:15:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.88% [2024-07-25 10:15:23 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 10:15:24 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 10:15:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.462 (0.462) Loss 0.5371 (0.5371) Acc@1 90.234 (90.234) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 10:15:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8164 (0.6594) Acc@1 83.057 (87.211) Acc@5 96.777 (97.945) Mem 14939MB [2024-07-25 10:15:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9287 (0.7646) Acc@1 78.271 (84.275) Acc@5 95.459 (96.952) Mem 14939MB [2024-07-25 10:15:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.871 Acc@5 96.911 [2024-07-25 10:15:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-07-25 10:15:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.87% [2024-07-25 10:15:27 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 10:15:28 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 10:15:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][0/625] eta 0:07:31 lr 0.000114 wd 0.0500 time 0.7227 (0.7227) data time 0.3479 (0.3479) model time 0.0000 (0.0000) loss 7.0862 (7.0862) grad_norm 2.8579 (2.8579) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:15:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][10/625] eta 0:05:01 lr 0.000114 wd 0.0500 time 0.5624 (0.4894) data time 0.0008 (0.0324) model time 0.0000 (0.0000) loss 7.1042 (7.1827) grad_norm 2.3790 (2.8704) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:15:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][20/625] eta 0:04:49 lr 0.000114 wd 0.0500 time 0.5943 (0.4780) data time 0.0007 (0.0179) model time 0.0000 (0.0000) loss 6.7789 (6.8000) grad_norm 3.6902 (2.9783) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:15:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][30/625] eta 0:04:30 lr 0.000114 wd 0.0500 time 0.4107 (0.4553) data time 0.0006 (0.0124) model time 0.0000 (0.0000) loss 6.7673 (6.6358) grad_norm 3.0547 (2.9329) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:15:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][40/625] eta 0:04:20 lr 0.000114 wd 0.0500 time 0.3976 (0.4459) data time 0.0007 (0.0096) model time 0.0000 (0.0000) loss 7.1402 (6.6250) grad_norm 3.2327 (2.8710) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:15:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][50/625] eta 0:04:11 lr 0.000114 wd 0.0500 time 0.3989 (0.4366) data time 0.0006 (0.0079) model time 0.0000 (0.0000) loss 6.6195 (6.6468) grad_norm 3.4470 (3.3810) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:15:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][60/625] eta 0:04:03 lr 0.000114 wd 0.0500 time 0.4036 (0.4305) data time 
0.0006 (0.0067) model time 0.4030 (0.3987) loss 6.0073 (6.6359) grad_norm 2.6638 (3.3258) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:15:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][70/625] eta 0:03:56 lr 0.000114 wd 0.0500 time 0.3957 (0.4260) data time 0.0006 (0.0059) model time 0.3951 (0.3980) loss 7.7782 (6.6309) grad_norm 4.0795 (3.3096) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:16:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][80/625] eta 0:03:50 lr 0.000113 wd 0.0500 time 0.3973 (0.4225) data time 0.0007 (0.0053) model time 0.3966 (0.3976) loss 6.2254 (6.6514) grad_norm 2.3386 (3.2723) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:16:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][90/625] eta 0:03:44 lr 0.000113 wd 0.0500 time 0.4010 (0.4199) data time 0.0006 (0.0048) model time 0.4003 (0.3978) loss 6.1079 (6.6555) grad_norm 2.5869 (3.1885) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:16:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][100/625] eta 0:03:39 lr 0.000113 wd 0.0500 time 0.3969 (0.4180) data time 0.0006 (0.0044) model time 0.3963 (0.3982) loss 7.2189 (6.6221) grad_norm 2.4464 (3.1244) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:16:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][110/625] eta 0:03:34 lr 0.000113 wd 0.0500 time 0.3972 (0.4162) data time 0.0008 (0.0041) model time 0.3963 (0.3981) loss 7.2396 (6.6330) grad_norm 1.9592 (3.1127) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:16:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][120/625] eta 0:03:29 lr 0.000113 wd 0.0500 time 0.4153 (0.4149) data time 0.0007 (0.0038) model time 0.4146 (0.3982) loss 5.3479 (6.6254) grad_norm 2.5889 (3.1033) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:16:22 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][130/625] eta 0:03:24 lr 0.000113 wd 0.0500 time 0.3963 (0.4137) data time 0.0006 (0.0036) model time 0.3956 (0.3982) loss 7.3301 (6.5836) grad_norm 4.2016 (3.0995) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:16:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][140/625] eta 0:03:20 lr 0.000113 wd 0.0500 time 0.3966 (0.4126) data time 0.0009 (0.0034) model time 0.3957 (0.3981) loss 6.8094 (6.5630) grad_norm 1.8471 (3.0910) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:16:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][150/625] eta 0:03:15 lr 0.000113 wd 0.0500 time 0.4096 (0.4117) data time 0.0006 (0.0032) model time 0.4090 (0.3981) loss 7.1837 (6.5803) grad_norm 6.8399 (3.1178) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:16:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][160/625] eta 0:03:11 lr 0.000113 wd 0.0500 time 0.3984 (0.4109) data time 0.0007 (0.0031) model time 0.3977 (0.3982) loss 5.9085 (6.5912) grad_norm 4.2618 (3.2089) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:16:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][170/625] eta 0:03:06 lr 0.000113 wd 0.0500 time 0.3964 (0.4101) data time 0.0008 (0.0029) model time 0.3956 (0.3981) loss 7.0184 (6.5869) grad_norm 2.6738 (3.1918) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:16:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][180/625] eta 0:03:02 lr 0.000113 wd 0.0500 time 0.3994 (0.4096) data time 0.0006 (0.0028) model time 0.3988 (0.3981) loss 7.1247 (6.5827) grad_norm 3.0983 (3.5245) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:16:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][190/625] eta 0:02:59 lr 0.000113 wd 0.0500 time 0.4047 (0.4118) data time 0.0009 (0.0027) 
model time 0.4039 (0.4019) loss 6.1470 (6.5862) grad_norm 4.1824 (3.5140) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:16:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][200/625] eta 0:02:55 lr 0.000113 wd 0.0500 time 0.5974 (0.4141) data time 0.0006 (0.0026) model time 0.5968 (0.4056) loss 6.1293 (6.5865) grad_norm 2.9153 (3.4905) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:16:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][210/625] eta 0:02:52 lr 0.000113 wd 0.0500 time 0.4006 (0.4160) data time 0.0008 (0.0025) model time 0.3998 (0.4086) loss 7.4092 (6.6074) grad_norm 2.9383 (3.4403) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:17:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][220/625] eta 0:02:49 lr 0.000113 wd 0.0500 time 0.5736 (0.4181) data time 0.0007 (0.0025) model time 0.5729 (0.4117) loss 6.3140 (6.5963) grad_norm 2.6672 (3.4847) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:17:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][230/625] eta 0:02:45 lr 0.000113 wd 0.0500 time 0.3969 (0.4187) data time 0.0007 (0.0024) model time 0.3962 (0.4128) loss 5.2755 (6.5980) grad_norm 2.8627 (3.4822) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:17:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][240/625] eta 0:02:41 lr 0.000113 wd 0.0500 time 0.4044 (0.4201) data time 0.0007 (0.0023) model time 0.4036 (0.4148) loss 7.0316 (6.6020) grad_norm 2.0764 (3.4574) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:17:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][250/625] eta 0:02:37 lr 0.000112 wd 0.0500 time 0.3983 (0.4207) data time 0.0009 (0.0023) model time 0.3974 (0.4159) loss 5.6212 (6.5866) grad_norm 2.4683 (3.4264) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:17:18 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [247/300][260/625] eta 0:02:33 lr 0.000112 wd 0.0500 time 0.3990 (0.4199) data time 0.0008 (0.0022) model time 0.3983 (0.4150) loss 5.0335 (6.5734) grad_norm 2.8982 (3.4127) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:17:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][270/625] eta 0:02:29 lr 0.000112 wd 0.0500 time 0.3996 (0.4198) data time 0.0008 (0.0022) model time 0.3988 (0.4150) loss 6.5234 (6.5743) grad_norm 4.3575 (3.4141) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:17:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][280/625] eta 0:02:24 lr 0.000112 wd 0.0500 time 0.4012 (0.4190) data time 0.0007 (0.0021) model time 0.4005 (0.4143) loss 5.7567 (6.5583) grad_norm 3.0261 (3.4017) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:17:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][290/625] eta 0:02:20 lr 0.000112 wd 0.0500 time 0.3972 (0.4184) data time 0.0008 (0.0021) model time 0.3964 (0.4137) loss 7.4007 (6.5578) grad_norm 2.8608 (3.3741) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:17:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][300/625] eta 0:02:15 lr 0.000112 wd 0.0500 time 0.3945 (0.4177) data time 0.0008 (0.0020) model time 0.3937 (0.4131) loss 6.4267 (6.5521) grad_norm 2.5189 (3.3738) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:17:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][310/625] eta 0:02:11 lr 0.000112 wd 0.0500 time 0.3979 (0.4171) data time 0.0007 (0.0020) model time 0.3972 (0.4124) loss 6.7556 (6.5520) grad_norm 4.0324 (3.3951) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:17:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][320/625] eta 0:02:07 lr 0.000112 wd 0.0500 time 0.4025 (0.4169) data time 0.0007 (0.0020) model time 0.4018 (0.4123) 
loss 5.2677 (6.5421) grad_norm 1.7891 (3.3691) loss_scale 128.0000 (65.7944) mem 14939MB [2024-07-25 10:17:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][330/625] eta 0:02:02 lr 0.000112 wd 0.0500 time 0.3917 (0.4163) data time 0.0009 (0.0019) model time 0.3908 (0.4118) loss 5.5571 (6.5470) grad_norm 2.9977 (3.3442) loss_scale 128.0000 (67.6737) mem 14939MB [2024-07-25 10:17:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][340/625] eta 0:01:58 lr 0.000112 wd 0.0500 time 0.3943 (0.4157) data time 0.0008 (0.0019) model time 0.3934 (0.4112) loss 5.7673 (6.5441) grad_norm 2.1999 (3.3463) loss_scale 128.0000 (69.4428) mem 14939MB [2024-07-25 10:17:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][350/625] eta 0:01:54 lr 0.000112 wd 0.0500 time 0.3996 (0.4152) data time 0.0006 (0.0019) model time 0.3989 (0.4107) loss 7.5904 (6.5483) grad_norm 4.8368 (3.3849) loss_scale 128.0000 (71.1111) mem 14939MB [2024-07-25 10:17:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][360/625] eta 0:01:49 lr 0.000112 wd 0.0500 time 0.3957 (0.4148) data time 0.0009 (0.0018) model time 0.3948 (0.4103) loss 6.1465 (6.5461) grad_norm 3.8710 (3.4165) loss_scale 128.0000 (72.6870) mem 14939MB [2024-07-25 10:18:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][370/625] eta 0:01:45 lr 0.000112 wd 0.0500 time 0.3951 (0.4146) data time 0.0006 (0.0018) model time 0.3945 (0.4102) loss 5.5205 (6.5463) grad_norm 3.9478 (3.4402) loss_scale 128.0000 (74.1779) mem 14939MB [2024-07-25 10:18:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][380/625] eta 0:01:41 lr 0.000112 wd 0.0500 time 0.3965 (0.4142) data time 0.0007 (0.0018) model time 0.3958 (0.4099) loss 7.8287 (6.5380) grad_norm 23.4208 (3.5035) loss_scale 128.0000 (75.5906) mem 14939MB [2024-07-25 10:18:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 
367): INFO Train: [247/300][390/625] eta 0:01:37 lr 0.000112 wd 0.0500 time 0.3984 (0.4139) data time 0.0007 (0.0018) model time 0.3978 (0.4096) loss 6.4168 (6.5351) grad_norm 4.0300 (3.5103) loss_scale 128.0000 (76.9309) mem 14939MB [2024-07-25 10:18:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][400/625] eta 0:01:33 lr 0.000112 wd 0.0500 time 0.3986 (0.4135) data time 0.0007 (0.0017) model time 0.3980 (0.4093) loss 5.2956 (6.5277) grad_norm 2.7810 (3.4892) loss_scale 128.0000 (78.2045) mem 14939MB [2024-07-25 10:18:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][410/625] eta 0:01:29 lr 0.000112 wd 0.0500 time 0.3967 (0.4141) data time 0.0009 (0.0017) model time 0.3959 (0.4101) loss 6.9671 (6.5286) grad_norm 2.4882 (3.4785) loss_scale 128.0000 (79.4161) mem 14939MB [2024-07-25 10:18:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][420/625] eta 0:01:25 lr 0.000111 wd 0.0500 time 0.3943 (0.4150) data time 0.0008 (0.0017) model time 0.3935 (0.4112) loss 5.8441 (6.5346) grad_norm 2.7256 (3.4524) loss_scale 128.0000 (80.5701) mem 14939MB [2024-07-25 10:18:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][430/625] eta 0:01:21 lr 0.000111 wd 0.0500 time 0.5105 (0.4161) data time 0.0009 (0.0017) model time 0.5097 (0.4125) loss 7.8559 (6.5357) grad_norm 3.4014 (3.4617) loss_scale 128.0000 (81.6705) mem 14939MB [2024-07-25 10:18:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][440/625] eta 0:01:17 lr 0.000111 wd 0.0500 time 0.6003 (0.4174) data time 0.0009 (0.0017) model time 0.5994 (0.4140) loss 7.7281 (6.5362) grad_norm 7.9405 (3.5531) loss_scale 128.0000 (82.7211) mem 14939MB [2024-07-25 10:18:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][450/625] eta 0:01:13 lr 0.000111 wd 0.0500 time 0.5847 (0.4182) data time 0.0008 (0.0016) model time 0.5839 (0.4150) loss 6.3339 
(6.5294) grad_norm 3.5792 (3.5341) loss_scale 128.0000 (83.7251) mem 14939MB [2024-07-25 10:18:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][460/625] eta 0:01:09 lr 0.000111 wd 0.0500 time 0.4002 (0.4184) data time 0.0008 (0.0016) model time 0.3994 (0.4153) loss 6.7567 (6.5371) grad_norm 3.0987 (3.5478) loss_scale 128.0000 (84.6855) mem 14939MB [2024-07-25 10:18:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][470/625] eta 0:01:04 lr 0.000111 wd 0.0500 time 0.4014 (0.4188) data time 0.0008 (0.0016) model time 0.4006 (0.4158) loss 6.2851 (6.5405) grad_norm 3.1607 (3.5344) loss_scale 128.0000 (85.6051) mem 14939MB [2024-07-25 10:18:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][480/625] eta 0:01:00 lr 0.000111 wd 0.0500 time 0.5475 (0.4186) data time 0.0007 (0.0016) model time 0.5468 (0.4157) loss 6.5442 (6.5474) grad_norm 2.6860 (3.5314) loss_scale 128.0000 (86.4865) mem 14939MB [2024-07-25 10:18:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][490/625] eta 0:00:56 lr 0.000111 wd 0.0500 time 0.4033 (0.4183) data time 0.0008 (0.0016) model time 0.4025 (0.4153) loss 6.3844 (6.5381) grad_norm 3.3601 (3.5251) loss_scale 128.0000 (87.3320) mem 14939MB [2024-07-25 10:18:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][500/625] eta 0:00:52 lr 0.000111 wd 0.0500 time 0.3943 (0.4178) data time 0.0006 (0.0016) model time 0.3936 (0.4149) loss 6.9568 (6.5422) grad_norm 2.8244 (3.5228) loss_scale 128.0000 (88.1437) mem 14939MB [2024-07-25 10:19:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][510/625] eta 0:00:48 lr 0.000111 wd 0.0500 time 0.3965 (0.4175) data time 0.0008 (0.0015) model time 0.3957 (0.4145) loss 6.8960 (6.5421) grad_norm 2.8040 (3.5085) loss_scale 128.0000 (88.9237) mem 14939MB [2024-07-25 10:19:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO 
Train: [247/300][520/625] eta 0:00:43 lr 0.000111 wd 0.0500 time 0.3991 (0.4171) data time 0.0007 (0.0015) model time 0.3985 (0.4142) loss 6.8504 (6.5356) grad_norm 2.8246 (3.5008) loss_scale 128.0000 (89.6737) mem 14939MB [2024-07-25 10:19:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][530/625] eta 0:00:39 lr 0.000111 wd 0.0500 time 0.4003 (0.4168) data time 0.0006 (0.0015) model time 0.3997 (0.4138) loss 6.1135 (6.5359) grad_norm 3.2780 (3.4913) loss_scale 128.0000 (90.3955) mem 14939MB [2024-07-25 10:19:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][540/625] eta 0:00:35 lr 0.000111 wd 0.0500 time 0.3987 (0.4165) data time 0.0006 (0.0015) model time 0.3981 (0.4135) loss 5.7911 (6.5359) grad_norm 3.0142 (3.5413) loss_scale 128.0000 (91.0906) mem 14939MB [2024-07-25 10:19:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][550/625] eta 0:00:31 lr 0.000111 wd 0.0500 time 0.3975 (0.4162) data time 0.0008 (0.0015) model time 0.3967 (0.4132) loss 6.6761 (6.5392) grad_norm 2.6529 (3.6303) loss_scale 128.0000 (91.7604) mem 14939MB [2024-07-25 10:19:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][560/625] eta 0:00:27 lr 0.000111 wd 0.0500 time 0.3966 (0.4158) data time 0.0007 (0.0015) model time 0.3959 (0.4129) loss 7.2812 (6.5414) grad_norm 2.3675 (3.6196) loss_scale 128.0000 (92.4064) mem 14939MB [2024-07-25 10:19:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][570/625] eta 0:00:22 lr 0.000111 wd 0.0500 time 0.3969 (0.4156) data time 0.0008 (0.0015) model time 0.3962 (0.4127) loss 7.3427 (6.5438) grad_norm 2.0545 (3.6008) loss_scale 128.0000 (93.0298) mem 14939MB [2024-07-25 10:19:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][580/625] eta 0:00:18 lr 0.000111 wd 0.0500 time 0.3990 (0.4152) data time 0.0006 (0.0015) model time 0.3984 (0.4124) loss 6.5788 (6.5441) 
grad_norm 2.3931 (3.5899) loss_scale 128.0000 (93.6317) mem 14939MB [2024-07-25 10:19:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][590/625] eta 0:00:14 lr 0.000110 wd 0.0500 time 0.3982 (0.4150) data time 0.0006 (0.0014) model time 0.3975 (0.4121) loss 6.8275 (6.5391) grad_norm 3.2368 (3.5870) loss_scale 128.0000 (94.2132) mem 14939MB [2024-07-25 10:19:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][600/625] eta 0:00:10 lr 0.000110 wd 0.0500 time 0.3966 (0.4147) data time 0.0007 (0.0014) model time 0.3959 (0.4119) loss 6.5698 (6.5394) grad_norm 2.7112 (3.6754) loss_scale 128.0000 (94.7754) mem 14939MB [2024-07-25 10:19:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][610/625] eta 0:00:06 lr 0.000110 wd 0.0500 time 0.4032 (0.4145) data time 0.0007 (0.0014) model time 0.4025 (0.4116) loss 6.0487 (6.5419) grad_norm 3.4762 (3.6651) loss_scale 128.0000 (95.3191) mem 14939MB [2024-07-25 10:19:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][620/625] eta 0:00:02 lr 0.000110 wd 0.0500 time 0.4000 (0.4142) data time 0.0004 (0.0014) model time 0.3996 (0.4114) loss 7.6569 (6.5412) grad_norm 2.4247 (3.6573) loss_scale 128.0000 (95.8454) mem 14939MB [2024-07-25 10:19:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 247 training takes 0:04:18 [2024-07-25 10:19:47 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 10:19:48 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 10:19:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.458 (0.458) Loss 0.5444 (0.5444) Acc@1 90.332 (90.332) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 10:19:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8271 (0.6694) Acc@1 82.275 (87.167) Acc@5 97.070 (97.958) Mem 14939MB [2024-07-25 10:19:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9292 (0.7727) Acc@1 78.223 (84.205) Acc@5 95.654 (96.959) Mem 14939MB [2024-07-25 10:19:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.821 Acc@5 96.923 [2024-07-25 10:19:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-07-25 10:19:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.806 (0.806) Loss 0.5381 (0.5381) Acc@1 90.283 (90.283) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 10:19:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.156) Loss 0.8164 (0.6592) Acc@1 83.105 (87.234) Acc@5 96.826 (97.949) Mem 14939MB [2024-07-25 10:19:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.122) Loss 0.9287 (0.7646) Acc@1 78.320 (84.287) Acc@5 95.557 (96.952) Mem 14939MB [2024-07-25 10:19:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.883 Acc@5 96.909 [2024-07-25 10:19:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-07-25 10:19:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.88% [2024-07-25 10:19:53 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 10:19:54 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 10:19:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][0/625] eta 0:07:49 lr 0.000110 wd 0.0500 time 0.7513 (0.7513) data time 0.3507 (0.3507) model time 0.0000 (0.0000) loss 7.2840 (7.2840) grad_norm 2.4278 (2.4278) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:20:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][10/625] eta 0:04:52 lr 0.000110 wd 0.0500 time 0.5797 (0.4757) data time 0.0005 (0.0326) model time 0.0000 (0.0000) loss 6.8196 (6.5362) grad_norm 3.7041 (6.4881) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:20:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][20/625] eta 0:04:38 lr 0.000110 wd 0.0500 time 0.3967 (0.4599) data time 0.0006 (0.0175) model time 0.0000 (0.0000) loss 5.9352 (6.5707) grad_norm 5.7171 (inf) loss_scale 64.0000 (118.8571) mem 14939MB [2024-07-25 10:20:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][30/625] eta 0:04:36 lr 0.000110 wd 0.0500 time 0.5985 (0.4639) data time 0.0007 (0.0121) model time 0.0000 (0.0000) loss 5.8008 (6.5415) grad_norm 3.5190 (inf) loss_scale 64.0000 (101.1613) mem 14939MB [2024-07-25 10:20:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][40/625] eta 0:04:29 lr 0.000110 wd 0.0500 time 0.3990 (0.4611) data time 0.0009 (0.0094) model time 0.0000 (0.0000) loss 6.6722 (6.5598) grad_norm 5.2267 (inf) loss_scale 64.0000 (92.0976) mem 14939MB [2024-07-25 10:20:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][50/625] eta 0:04:22 lr 0.000110 wd 0.0500 time 0.5687 (0.4574) data time 0.0010 (0.0077) model time 0.0000 (0.0000) loss 7.1265 (6.6225) grad_norm 2.3337 (inf) loss_scale 64.0000 (86.5882) mem 14939MB [2024-07-25 
10:20:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][60/625] eta 0:04:14 lr 0.000110 wd 0.0500 time 0.3966 (0.4505) data time 0.0008 (0.0066) model time 0.3958 (0.4148) loss 6.4498 (6.6050) grad_norm 2.3015 (inf) loss_scale 64.0000 (82.8852) mem 14939MB [2024-07-25 10:20:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][70/625] eta 0:04:07 lr 0.000110 wd 0.0500 time 0.3986 (0.4459) data time 0.0007 (0.0058) model time 0.3979 (0.4157) loss 7.3209 (6.5695) grad_norm 3.3248 (inf) loss_scale 64.0000 (80.2254) mem 14939MB [2024-07-25 10:20:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][80/625] eta 0:04:00 lr 0.000110 wd 0.0500 time 0.3975 (0.4418) data time 0.0007 (0.0051) model time 0.3968 (0.4145) loss 6.3658 (6.5323) grad_norm 2.8718 (inf) loss_scale 64.0000 (78.2222) mem 14939MB [2024-07-25 10:20:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][90/625] eta 0:03:53 lr 0.000110 wd 0.0500 time 0.3970 (0.4369) data time 0.0008 (0.0047) model time 0.3962 (0.4099) loss 6.1254 (6.5479) grad_norm 2.6021 (inf) loss_scale 64.0000 (76.6593) mem 14939MB [2024-07-25 10:20:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][100/625] eta 0:03:47 lr 0.000110 wd 0.0500 time 0.3987 (0.4330) data time 0.0008 (0.0043) model time 0.3979 (0.4074) loss 5.8921 (6.5541) grad_norm 3.2535 (inf) loss_scale 64.0000 (75.4059) mem 14939MB [2024-07-25 10:20:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][110/625] eta 0:03:41 lr 0.000110 wd 0.0500 time 0.3976 (0.4300) data time 0.0008 (0.0040) model time 0.3968 (0.4059) loss 6.2132 (6.5390) grad_norm 2.4413 (inf) loss_scale 64.0000 (74.3784) mem 14939MB [2024-07-25 10:20:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][120/625] eta 0:03:35 lr 0.000110 wd 0.0500 time 0.3988 (0.4274) data time 0.0008 (0.0037) model time 
0.3979 (0.4047) loss 7.4871 (6.5233) grad_norm 2.8836 (inf) loss_scale 64.0000 (73.5207) mem 14939MB [2024-07-25 10:20:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][130/625] eta 0:03:30 lr 0.000110 wd 0.0500 time 0.3958 (0.4252) data time 0.0008 (0.0035) model time 0.3949 (0.4039) loss 6.4976 (6.5305) grad_norm 5.9730 (inf) loss_scale 64.0000 (72.7939) mem 14939MB [2024-07-25 10:20:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][140/625] eta 0:03:25 lr 0.000109 wd 0.0500 time 0.3928 (0.4234) data time 0.0008 (0.0033) model time 0.3920 (0.4034) loss 5.9151 (6.5225) grad_norm 1.9675 (inf) loss_scale 64.0000 (72.1702) mem 14939MB [2024-07-25 10:20:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][150/625] eta 0:03:20 lr 0.000109 wd 0.0500 time 0.4004 (0.4218) data time 0.0009 (0.0031) model time 0.3995 (0.4029) loss 5.6048 (6.5032) grad_norm 2.3650 (inf) loss_scale 64.0000 (71.6291) mem 14939MB [2024-07-25 10:21:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][160/625] eta 0:03:15 lr 0.000109 wd 0.0500 time 0.4004 (0.4205) data time 0.0006 (0.0030) model time 0.3998 (0.4026) loss 5.8484 (6.4938) grad_norm 3.0397 (inf) loss_scale 64.0000 (71.1553) mem 14939MB [2024-07-25 10:21:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][170/625] eta 0:03:10 lr 0.000109 wd 0.0500 time 0.3972 (0.4193) data time 0.0008 (0.0029) model time 0.3963 (0.4023) loss 6.6756 (6.4915) grad_norm 2.6961 (inf) loss_scale 64.0000 (70.7368) mem 14939MB [2024-07-25 10:21:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][180/625] eta 0:03:06 lr 0.000109 wd 0.0500 time 0.3998 (0.4182) data time 0.0006 (0.0028) model time 0.3992 (0.4020) loss 5.8130 (6.4912) grad_norm 14.9854 (inf) loss_scale 64.0000 (70.3646) mem 14939MB [2024-07-25 10:21:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO 
Train: [248/300][190/625] eta 0:03:01 lr 0.000109 wd 0.0500 time 0.4003 (0.4181) data time 0.0008 (0.0027) model time 0.3995 (0.4030) loss 5.9815 (6.4970) grad_norm 5.0861 (inf) loss_scale 64.0000 (70.0314) mem 14939MB [2024-07-25 10:21:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][200/625] eta 0:02:57 lr 0.000109 wd 0.0500 time 0.3977 (0.4172) data time 0.0007 (0.0026) model time 0.3970 (0.4027) loss 6.2619 (6.5075) grad_norm 3.5595 (inf) loss_scale 64.0000 (69.7313) mem 14939MB [2024-07-25 10:21:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][210/625] eta 0:02:52 lr 0.000109 wd 0.0500 time 0.3974 (0.4164) data time 0.0009 (0.0025) model time 0.3965 (0.4025) loss 7.5425 (6.5231) grad_norm 2.0626 (inf) loss_scale 64.0000 (69.4597) mem 14939MB [2024-07-25 10:21:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][220/625] eta 0:02:48 lr 0.000109 wd 0.0500 time 0.5096 (0.4161) data time 0.0009 (0.0024) model time 0.5087 (0.4030) loss 7.1926 (6.5263) grad_norm 2.7580 (inf) loss_scale 64.0000 (69.2127) mem 14939MB [2024-07-25 10:21:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][230/625] eta 0:02:44 lr 0.000109 wd 0.0500 time 0.3970 (0.4164) data time 0.0006 (0.0023) model time 0.3964 (0.4040) loss 6.6889 (6.5255) grad_norm 2.0866 (inf) loss_scale 64.0000 (68.9870) mem 14939MB [2024-07-25 10:21:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][240/625] eta 0:02:40 lr 0.000109 wd 0.0500 time 0.4017 (0.4180) data time 0.0008 (0.0023) model time 0.4008 (0.4066) loss 7.3261 (6.5283) grad_norm 8.2623 (inf) loss_scale 64.0000 (68.7801) mem 14939MB [2024-07-25 10:21:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][250/625] eta 0:02:37 lr 0.000109 wd 0.0500 time 0.3983 (0.4213) data time 0.0006 (0.0022) model time 0.3977 (0.4112) loss 7.4325 (6.5275) grad_norm 2.5445 (inf) 
loss_scale 64.0000 (68.5896) mem 14939MB [2024-07-25 10:21:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][260/625] eta 0:02:34 lr 0.000109 wd 0.0500 time 0.3972 (0.4231) data time 0.0006 (0.0022) model time 0.3967 (0.4139) loss 7.9035 (6.5240) grad_norm 4.5184 (inf) loss_scale 64.0000 (68.4138) mem 14939MB [2024-07-25 10:21:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][270/625] eta 0:02:30 lr 0.000109 wd 0.0500 time 0.3999 (0.4230) data time 0.0006 (0.0021) model time 0.3993 (0.4142) loss 6.0302 (6.5261) grad_norm 1.9226 (inf) loss_scale 64.0000 (68.2509) mem 14939MB [2024-07-25 10:21:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][280/625] eta 0:02:26 lr 0.000109 wd 0.0500 time 0.3996 (0.4237) data time 0.0009 (0.0021) model time 0.3987 (0.4154) loss 5.3685 (6.5244) grad_norm 2.9763 (inf) loss_scale 64.0000 (68.0996) mem 14939MB [2024-07-25 10:21:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][290/625] eta 0:02:21 lr 0.000109 wd 0.0500 time 0.3968 (0.4232) data time 0.0005 (0.0020) model time 0.3963 (0.4151) loss 7.5653 (6.5251) grad_norm 2.3227 (inf) loss_scale 64.0000 (67.9588) mem 14939MB [2024-07-25 10:22:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][300/625] eta 0:02:17 lr 0.000109 wd 0.0500 time 0.3996 (0.4230) data time 0.0006 (0.0020) model time 0.3990 (0.4152) loss 5.9799 (6.5235) grad_norm 2.3162 (inf) loss_scale 64.0000 (67.8272) mem 14939MB [2024-07-25 10:22:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][310/625] eta 0:02:12 lr 0.000108 wd 0.0500 time 0.3970 (0.4222) data time 0.0007 (0.0020) model time 0.3963 (0.4145) loss 6.9030 (6.5277) grad_norm 4.4898 (inf) loss_scale 64.0000 (67.7042) mem 14939MB [2024-07-25 10:22:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][320/625] eta 0:02:08 lr 0.000108 wd 0.0500 
time 0.4164 (0.4215) data time 0.0007 (0.0019) model time 0.4157 (0.4139) loss 7.4132 (6.5282) grad_norm 2.7502 (inf) loss_scale 64.0000 (67.5888) mem 14939MB [2024-07-25 10:22:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][330/625] eta 0:02:04 lr 0.000108 wd 0.0500 time 0.3944 (0.4208) data time 0.0006 (0.0019) model time 0.3938 (0.4133) loss 5.1194 (6.5255) grad_norm 1.7701 (inf) loss_scale 64.0000 (67.4804) mem 14939MB [2024-07-25 10:22:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][340/625] eta 0:01:59 lr 0.000108 wd 0.0500 time 0.3960 (0.4201) data time 0.0010 (0.0019) model time 0.3950 (0.4127) loss 7.0420 (6.5202) grad_norm 3.4980 (inf) loss_scale 64.0000 (67.3783) mem 14939MB [2024-07-25 10:22:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][350/625] eta 0:01:55 lr 0.000108 wd 0.0500 time 0.4086 (0.4195) data time 0.0006 (0.0018) model time 0.4080 (0.4122) loss 5.7267 (6.5197) grad_norm 4.1133 (inf) loss_scale 64.0000 (67.2821) mem 14939MB [2024-07-25 10:22:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][360/625] eta 0:01:51 lr 0.000108 wd 0.0500 time 0.3951 (0.4189) data time 0.0007 (0.0018) model time 0.3945 (0.4118) loss 5.6755 (6.5197) grad_norm 3.7538 (inf) loss_scale 64.0000 (67.1911) mem 14939MB [2024-07-25 10:22:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][370/625] eta 0:01:46 lr 0.000108 wd 0.0500 time 0.3974 (0.4184) data time 0.0006 (0.0018) model time 0.3968 (0.4114) loss 6.9528 (6.5294) grad_norm 3.1275 (inf) loss_scale 64.0000 (67.1051) mem 14939MB [2024-07-25 10:22:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][380/625] eta 0:01:42 lr 0.000108 wd 0.0500 time 0.4098 (0.4179) data time 0.0008 (0.0018) model time 0.4090 (0.4110) loss 6.9997 (6.5247) grad_norm 3.0767 (inf) loss_scale 64.0000 (67.0236) mem 14939MB [2024-07-25 10:22:38 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][390/625] eta 0:01:38 lr 0.000108 wd 0.0500 time 0.3987 (0.4175) data time 0.0007 (0.0017) model time 0.3981 (0.4107) loss 6.4222 (6.5204) grad_norm 2.7560 (inf) loss_scale 64.0000 (66.9463) mem 14939MB [2024-07-25 10:22:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][400/625] eta 0:01:33 lr 0.000108 wd 0.0500 time 0.4029 (0.4171) data time 0.0007 (0.0017) model time 0.4022 (0.4104) loss 5.6573 (6.5068) grad_norm 2.3964 (inf) loss_scale 64.0000 (66.8728) mem 14939MB [2024-07-25 10:22:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][410/625] eta 0:01:29 lr 0.000108 wd 0.0500 time 0.4018 (0.4170) data time 0.0006 (0.0017) model time 0.4012 (0.4105) loss 6.2370 (6.5000) grad_norm 2.3041 (inf) loss_scale 64.0000 (66.8029) mem 14939MB [2024-07-25 10:22:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][420/625] eta 0:01:25 lr 0.000108 wd 0.0500 time 0.3976 (0.4167) data time 0.0009 (0.0017) model time 0.3967 (0.4102) loss 6.9111 (6.5054) grad_norm 2.7415 (inf) loss_scale 64.0000 (66.7363) mem 14939MB [2024-07-25 10:22:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][430/625] eta 0:01:21 lr 0.000108 wd 0.0500 time 0.4029 (0.4163) data time 0.0006 (0.0016) model time 0.4023 (0.4100) loss 5.5111 (6.5093) grad_norm 2.5267 (inf) loss_scale 64.0000 (66.6729) mem 14939MB [2024-07-25 10:22:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][440/625] eta 0:01:17 lr 0.000108 wd 0.0500 time 0.5857 (0.4164) data time 0.0007 (0.0016) model time 0.5850 (0.4102) loss 7.0081 (6.5048) grad_norm 3.1963 (inf) loss_scale 64.0000 (66.6122) mem 14939MB [2024-07-25 10:23:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][450/625] eta 0:01:12 lr 0.000108 wd 0.0500 time 0.3986 (0.4168) data time 0.0006 (0.0016) model time 0.3980 
(0.4108) loss 7.2147 (6.5062) grad_norm 2.4155 (inf) loss_scale 64.0000 (66.5543) mem 14939MB [2024-07-25 10:23:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][460/625] eta 0:01:08 lr 0.000108 wd 0.0500 time 0.5687 (0.4175) data time 0.0007 (0.0016) model time 0.5680 (0.4117) loss 6.3500 (6.5062) grad_norm 3.6176 (inf) loss_scale 64.0000 (66.4989) mem 14939MB [2024-07-25 10:23:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][470/625] eta 0:01:04 lr 0.000108 wd 0.0500 time 0.3997 (0.4190) data time 0.0008 (0.0016) model time 0.3989 (0.4136) loss 5.5640 (6.5081) grad_norm 2.1709 (inf) loss_scale 64.0000 (66.4459) mem 14939MB [2024-07-25 10:23:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][480/625] eta 0:01:00 lr 0.000107 wd 0.0500 time 0.4014 (0.4203) data time 0.0007 (0.0016) model time 0.4007 (0.4150) loss 7.1845 (6.5134) grad_norm 4.0113 (inf) loss_scale 64.0000 (66.3950) mem 14939MB [2024-07-25 10:23:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][490/625] eta 0:00:56 lr 0.000107 wd 0.0500 time 0.5711 (0.4209) data time 0.0007 (0.0015) model time 0.5704 (0.4159) loss 8.1103 (6.5116) grad_norm 2.3972 (inf) loss_scale 64.0000 (66.3462) mem 14939MB [2024-07-25 10:23:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][500/625] eta 0:00:52 lr 0.000107 wd 0.0500 time 0.5667 (0.4213) data time 0.0008 (0.0015) model time 0.5659 (0.4164) loss 7.0150 (6.5123) grad_norm 3.3391 (inf) loss_scale 64.0000 (66.2994) mem 14939MB [2024-07-25 10:23:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][510/625] eta 0:00:48 lr 0.000107 wd 0.0500 time 0.3942 (0.4211) data time 0.0007 (0.0015) model time 0.3934 (0.4162) loss 6.6801 (6.5157) grad_norm 3.5134 (inf) loss_scale 64.0000 (66.2544) mem 14939MB [2024-07-25 10:23:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[248/300][520/625] eta 0:00:44 lr 0.000107 wd 0.0500 time 0.4051 (0.4208) data time 0.0008 (0.0015) model time 0.4042 (0.4160) loss 7.0831 (6.5200) grad_norm 3.0273 (inf) loss_scale 64.0000 (66.2111) mem 14939MB [2024-07-25 10:23:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][530/625] eta 0:00:39 lr 0.000107 wd 0.0500 time 0.3938 (0.4204) data time 0.0006 (0.0015) model time 0.3932 (0.4157) loss 6.6527 (6.5237) grad_norm 3.6930 (inf) loss_scale 64.0000 (66.1695) mem 14939MB [2024-07-25 10:23:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][540/625] eta 0:00:35 lr 0.000107 wd 0.0500 time 0.4060 (0.4200) data time 0.0008 (0.0015) model time 0.4052 (0.4153) loss 7.2593 (6.5290) grad_norm 2.0289 (inf) loss_scale 64.0000 (66.1294) mem 14939MB [2024-07-25 10:23:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][550/625] eta 0:00:31 lr 0.000107 wd 0.0500 time 0.3971 (0.4197) data time 0.0010 (0.0015) model time 0.3961 (0.4150) loss 6.5820 (6.5274) grad_norm 3.0266 (inf) loss_scale 64.0000 (66.0907) mem 14939MB [2024-07-25 10:23:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][560/625] eta 0:00:27 lr 0.000107 wd 0.0500 time 0.3948 (0.4193) data time 0.0009 (0.0015) model time 0.3940 (0.4146) loss 5.7788 (6.5246) grad_norm 2.7656 (inf) loss_scale 64.0000 (66.0535) mem 14939MB [2024-07-25 10:23:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][570/625] eta 0:00:23 lr 0.000107 wd 0.0500 time 0.4055 (0.4189) data time 0.0006 (0.0014) model time 0.4049 (0.4143) loss 6.4363 (6.5218) grad_norm 2.8204 (inf) loss_scale 64.0000 (66.0175) mem 14939MB [2024-07-25 10:23:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][580/625] eta 0:00:18 lr 0.000107 wd 0.0500 time 0.3987 (0.4185) data time 0.0007 (0.0014) model time 0.3980 (0.4140) loss 7.6504 (6.5211) grad_norm 2.7922 (inf) loss_scale 
64.0000 (65.9828) mem 14939MB [2024-07-25 10:24:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][590/625] eta 0:00:14 lr 0.000107 wd 0.0500 time 0.4130 (0.4182) data time 0.0009 (0.0014) model time 0.4120 (0.4137) loss 7.1938 (6.5201) grad_norm 2.7055 (inf) loss_scale 64.0000 (65.9492) mem 14939MB [2024-07-25 10:24:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][600/625] eta 0:00:10 lr 0.000107 wd 0.0500 time 0.3999 (0.4179) data time 0.0007 (0.0014) model time 0.3992 (0.4134) loss 5.2868 (6.5176) grad_norm 3.5140 (inf) loss_scale 64.0000 (65.9168) mem 14939MB [2024-07-25 10:24:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][610/625] eta 0:00:06 lr 0.000107 wd 0.0500 time 0.3986 (0.4176) data time 0.0006 (0.0014) model time 0.3980 (0.4131) loss 6.6996 (6.5231) grad_norm 3.2396 (inf) loss_scale 64.0000 (65.8854) mem 14939MB [2024-07-25 10:24:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][620/625] eta 0:00:02 lr 0.000107 wd 0.0500 time 0.3980 (0.4173) data time 0.0006 (0.0014) model time 0.3974 (0.4129) loss 6.9794 (6.5270) grad_norm 3.2395 (inf) loss_scale 64.0000 (65.8551) mem 14939MB [2024-07-25 10:24:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 248 training takes 0:04:20 [2024-07-25 10:24:15 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 10:24:16 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 10:24:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.454 (0.454) Loss 0.5376 (0.5376) Acc@1 90.332 (90.332) Acc@5 98.877 (98.877) Mem 14939MB
[2024-07-25 10:24:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.8203 (0.6583) Acc@1 82.031 (87.336) Acc@5 96.826 (97.980) Mem 14939MB
[2024-07-25 10:24:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9209 (0.7664) Acc@1 79.248 (84.408) Acc@5 95.605 (96.949) Mem 14939MB
[2024-07-25 10:24:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.013 Acc@5 96.919
[2024-07-25 10:24:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.0%
[2024-07-25 10:24:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 84.01%
[2024-07-25 10:24:19 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving......
[2024-07-25 10:24:20 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!!
[2024-07-25 10:24:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.469 (0.469) Loss 0.5371 (0.5371) Acc@1 90.234 (90.234) Acc@5 99.023 (99.023) Mem 14939MB
[2024-07-25 10:24:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8159 (0.6588) Acc@1 83.154 (87.243) Acc@5 96.777 (97.945) Mem 14939MB
[2024-07-25 10:24:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9272 (0.7641) Acc@1 78.271 (84.303) Acc@5 95.557 (96.956) Mem 14939MB
[2024-07-25 10:24:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.897 Acc@5 96.913
[2024-07-25 10:24:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9%
[2024-07-25 10:24:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.90%
[2024-07-25 10:24:22 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 10:24:23 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!!
[2024-07-25 10:24:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][0/625] eta 0:07:34 lr 0.000107 wd 0.0500 time 0.7268 (0.7268) data time 0.3471 (0.3471) model time 0.0000 (0.0000) loss 6.1218 (6.1218) grad_norm 2.1040 (2.1040) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:24:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][10/625] eta 0:04:23 lr 0.000107 wd 0.0500 time 0.4048 (0.4282) data time 0.0007 (0.0323) model time 0.0000 (0.0000) loss 7.4991 (6.6882) grad_norm 2.9106 (6.1036) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:24:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][20/625] eta 0:04:11 lr 0.000107 wd 0.0500 time 0.3988 (0.4152) data time 0.0006 (0.0173) model time 0.0000 (0.0000) loss 6.9267 (6.5777) grad_norm 5.1187 (4.9641) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:24:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][30/625] eta 0:04:03 lr 0.000106 wd 0.0500 time 0.3997 (0.4098) data time 0.0007 (0.0120) model time 0.0000 (0.0000) loss 6.5115 (6.5272) grad_norm 2.3696 (4.3875) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:24:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][40/625] eta 0:04:02 lr 0.000106 wd 0.0500 time 0.3962 (0.4148) data time 0.0009 (0.0093) model time 0.0000 (0.0000) loss 5.5191 (6.5377) grad_norm 2.6726 (4.0381) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:24:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][50/625] eta 0:04:01 lr 0.000106 wd 0.0500 time 0.5314 (0.4203) data time 0.0006 (0.0077) model time 0.0000 (0.0000) loss 6.7054 (6.5242) grad_norm 17.2507 (4.0669) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:24:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][60/625] eta 0:03:57 lr 0.000106 wd 0.0500 time 0.4230 (0.4200) data time 
0.0008 (0.0066) model time 0.4222 (0.4175) loss 6.8963 (6.5207) grad_norm 2.4259 (3.8965) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:24:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][70/625] eta 0:03:58 lr 0.000106 wd 0.0500 time 0.5731 (0.4294) data time 0.0007 (0.0057) model time 0.5724 (0.4517) loss 6.9873 (6.5455) grad_norm 2.6077 (3.8191) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:24:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][80/625] eta 0:03:54 lr 0.000106 wd 0.0500 time 0.4008 (0.4297) data time 0.0007 (0.0051) model time 0.4001 (0.4448) loss 6.2977 (6.5866) grad_norm 1.9680 (3.7889) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:25:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][90/625] eta 0:03:52 lr 0.000106 wd 0.0500 time 0.5879 (0.4349) data time 0.0008 (0.0047) model time 0.5871 (0.4525) loss 7.1556 (6.5512) grad_norm 3.0079 (3.6893) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:25:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][100/625] eta 0:03:48 lr 0.000106 wd 0.0500 time 0.3984 (0.4345) data time 0.0008 (0.0043) model time 0.3976 (0.4482) loss 7.6287 (6.5901) grad_norm 4.9953 (3.6307) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:25:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][110/625] eta 0:03:42 lr 0.000106 wd 0.0500 time 0.3964 (0.4329) data time 0.0007 (0.0040) model time 0.3957 (0.4428) loss 6.0679 (6.5545) grad_norm 3.2589 (3.5819) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:25:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][120/625] eta 0:03:37 lr 0.000106 wd 0.0500 time 0.3985 (0.4311) data time 0.0007 (0.0037) model time 0.3979 (0.4380) loss 7.7407 (6.5636) grad_norm 3.3349 (3.6661) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:25:19 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][130/625] eta 0:03:32 lr 0.000106 wd 0.0500 time 0.4003 (0.4295) data time 0.0007 (0.0035) model time 0.3997 (0.4346) loss 6.5733 (6.5332) grad_norm 2.6392 (3.6190) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:25:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][140/625] eta 0:03:27 lr 0.000106 wd 0.0500 time 0.3969 (0.4273) data time 0.0009 (0.0033) model time 0.3960 (0.4304) loss 5.4365 (6.5125) grad_norm 3.3383 (3.5722) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:25:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][150/625] eta 0:03:22 lr 0.000106 wd 0.0500 time 0.3992 (0.4255) data time 0.0006 (0.0031) model time 0.3986 (0.4273) loss 6.8448 (6.5227) grad_norm 3.3826 (3.5419) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:25:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][160/625] eta 0:03:17 lr 0.000106 wd 0.0500 time 0.3984 (0.4238) data time 0.0007 (0.0030) model time 0.3977 (0.4246) loss 6.6939 (6.5262) grad_norm 4.1244 (3.5742) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:25:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][170/625] eta 0:03:12 lr 0.000106 wd 0.0500 time 0.4004 (0.4224) data time 0.0006 (0.0029) model time 0.3998 (0.4224) loss 6.8786 (6.5254) grad_norm 1.7262 (3.5443) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:25:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][180/625] eta 0:03:07 lr 0.000106 wd 0.0500 time 0.4053 (0.4211) data time 0.0008 (0.0028) model time 0.4045 (0.4206) loss 7.9250 (6.5216) grad_norm 2.6921 (3.5378) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:25:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][190/625] eta 0:03:02 lr 0.000106 wd 0.0500 time 0.3975 (0.4200) data time 0.0006 (0.0027) 
model time 0.3969 (0.4190) loss 5.6722 (6.5330) grad_norm 2.7202 (3.5182) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:25:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][200/625] eta 0:02:58 lr 0.000105 wd 0.0500 time 0.3952 (0.4189) data time 0.0008 (0.0026) model time 0.3944 (0.4176) loss 7.2701 (6.5259) grad_norm 3.3754 (3.7795) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:25:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][210/625] eta 0:02:53 lr 0.000105 wd 0.0500 time 0.4104 (0.4180) data time 0.0006 (0.0025) model time 0.4098 (0.4164) loss 6.8614 (6.5351) grad_norm 2.4855 (3.7542) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:25:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][220/625] eta 0:02:48 lr 0.000105 wd 0.0500 time 0.3959 (0.4171) data time 0.0006 (0.0024) model time 0.3953 (0.4153) loss 6.9905 (6.5432) grad_norm 2.2677 (3.7246) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:25:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][230/625] eta 0:02:44 lr 0.000105 wd 0.0500 time 0.3947 (0.4163) data time 0.0009 (0.0023) model time 0.3937 (0.4144) loss 5.8595 (6.5238) grad_norm 4.4612 (3.8066) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:26:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][240/625] eta 0:02:39 lr 0.000105 wd 0.0500 time 0.4002 (0.4155) data time 0.0006 (0.0023) model time 0.3995 (0.4134) loss 6.2500 (6.5346) grad_norm 3.6524 (3.8282) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:26:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][250/625] eta 0:02:35 lr 0.000105 wd 0.0500 time 0.4027 (0.4149) data time 0.0006 (0.0022) model time 0.4021 (0.4127) loss 5.1965 (6.5274) grad_norm 3.9924 (3.8484) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:26:11 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [249/300][260/625] eta 0:02:31 lr 0.000105 wd 0.0500 time 0.4008 (0.4151) data time 0.0006 (0.0022) model time 0.4002 (0.4130) loss 6.3982 (6.5155) grad_norm 3.2193 (3.8041) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:26:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][270/625] eta 0:02:27 lr 0.000105 wd 0.0500 time 0.3959 (0.4162) data time 0.0007 (0.0022) model time 0.3952 (0.4143) loss 7.0016 (6.5207) grad_norm 3.3793 (3.7836) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:26:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][280/625] eta 0:02:24 lr 0.000105 wd 0.0500 time 0.5587 (0.4179) data time 0.0006 (0.0021) model time 0.5581 (0.4165) loss 5.8501 (6.5238) grad_norm 1.9691 (3.7485) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:26:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][290/625] eta 0:02:20 lr 0.000105 wd 0.0500 time 0.4072 (0.4188) data time 0.0006 (0.0021) model time 0.4065 (0.4176) loss 5.5708 (6.5242) grad_norm 2.4569 (3.7079) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:26:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][300/625] eta 0:02:16 lr 0.000105 wd 0.0500 time 0.6036 (0.4207) data time 0.0008 (0.0020) model time 0.6028 (0.4199) loss 6.9618 (6.5352) grad_norm 2.2664 (3.7089) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:26:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][310/625] eta 0:02:12 lr 0.000105 wd 0.0500 time 0.5695 (0.4217) data time 0.0009 (0.0020) model time 0.5686 (0.4210) loss 7.8384 (6.5510) grad_norm 1.7565 (3.6993) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:26:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][320/625] eta 0:02:08 lr 0.000105 wd 0.0500 time 0.5316 (0.4220) data time 0.0007 (0.0020) model time 0.5310 (0.4214) 
loss 5.7182 (6.5541) grad_norm 2.0479 (3.6865) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:26:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][330/625] eta 0:02:04 lr 0.000105 wd 0.0500 time 0.3969 (0.4212) data time 0.0009 (0.0019) model time 0.3960 (0.4205) loss 6.4260 (6.5612) grad_norm 5.7097 (3.6779) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:26:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][340/625] eta 0:02:00 lr 0.000105 wd 0.0500 time 0.3936 (0.4211) data time 0.0008 (0.0019) model time 0.3927 (0.4203) loss 7.0540 (6.5575) grad_norm 4.1997 (3.6767) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:26:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][350/625] eta 0:01:55 lr 0.000105 wd 0.0500 time 0.3944 (0.4210) data time 0.0009 (0.0019) model time 0.3935 (0.4202) loss 7.2953 (6.5649) grad_norm 5.4253 (3.6753) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:26:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][360/625] eta 0:01:51 lr 0.000105 wd 0.0500 time 0.3974 (0.4204) data time 0.0007 (0.0018) model time 0.3967 (0.4195) loss 5.8015 (6.5640) grad_norm 2.0966 (3.6549) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:26:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][370/625] eta 0:01:47 lr 0.000104 wd 0.0500 time 0.3979 (0.4198) data time 0.0007 (0.0018) model time 0.3972 (0.4189) loss 6.5685 (6.5753) grad_norm 3.1597 (3.7118) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:27:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][380/625] eta 0:01:42 lr 0.000104 wd 0.0500 time 0.3946 (0.4193) data time 0.0008 (0.0018) model time 0.3938 (0.4183) loss 7.1142 (6.5769) grad_norm 3.1172 (3.6934) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:27:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [249/300][390/625] eta 0:01:38 lr 0.000104 wd 0.0500 time 0.4000 (0.4188) data time 0.0007 (0.0018) model time 0.3994 (0.4177) loss 6.5900 (6.5677) grad_norm 1.8995 (3.6717) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:27:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][400/625] eta 0:01:34 lr 0.000104 wd 0.0500 time 0.3941 (0.4183) data time 0.0009 (0.0017) model time 0.3932 (0.4172) loss 7.2877 (6.5780) grad_norm 3.1255 (3.6534) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:27:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][410/625] eta 0:01:29 lr 0.000104 wd 0.0500 time 0.4003 (0.4179) data time 0.0009 (0.0017) model time 0.3994 (0.4167) loss 5.5183 (6.5788) grad_norm 1.9213 (3.6812) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:27:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][420/625] eta 0:01:25 lr 0.000104 wd 0.0500 time 0.4062 (0.4175) data time 0.0006 (0.0017) model time 0.4056 (0.4162) loss 6.1122 (6.5743) grad_norm 3.1817 (3.6536) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:27:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][430/625] eta 0:01:21 lr 0.000104 wd 0.0500 time 0.3977 (0.4171) data time 0.0007 (0.0017) model time 0.3970 (0.4158) loss 5.6448 (6.5698) grad_norm 2.2927 (3.6289) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:27:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][440/625] eta 0:01:17 lr 0.000104 wd 0.0500 time 0.3975 (0.4167) data time 0.0007 (0.0017) model time 0.3968 (0.4154) loss 6.4191 (6.5653) grad_norm 3.5523 (3.6217) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:27:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][450/625] eta 0:01:12 lr 0.000104 wd 0.0500 time 0.4026 (0.4163) data time 0.0009 (0.0016) model time 0.4017 (0.4150) loss 6.5002 (6.5657) grad_norm 
3.9831 (3.6103) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:27:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][460/625] eta 0:01:08 lr 0.000104 wd 0.0500 time 0.3970 (0.4159) data time 0.0006 (0.0016) model time 0.3963 (0.4145) loss 7.3719 (6.5679) grad_norm 2.0904 (3.5824) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:27:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][470/625] eta 0:01:04 lr 0.000104 wd 0.0500 time 0.3964 (0.4156) data time 0.0006 (0.0016) model time 0.3958 (0.4142) loss 7.1600 (6.5751) grad_norm 2.0715 (3.6329) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:27:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][480/625] eta 0:01:00 lr 0.000104 wd 0.0500 time 0.4020 (0.4156) data time 0.0008 (0.0016) model time 0.4012 (0.4142) loss 5.4698 (6.5685) grad_norm 5.0565 (3.6418) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:27:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][490/625] eta 0:00:56 lr 0.000104 wd 0.0500 time 0.5717 (0.4163) data time 0.0007 (0.0016) model time 0.5710 (0.4150) loss 7.8970 (6.5731) grad_norm 2.6423 (3.6395) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:27:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][500/625] eta 0:00:52 lr 0.000104 wd 0.0500 time 0.3983 (0.4168) data time 0.0008 (0.0016) model time 0.3975 (0.4155) loss 7.2114 (6.5756) grad_norm 2.3601 (3.6332) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:27:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][510/625] eta 0:00:48 lr 0.000104 wd 0.0500 time 0.5823 (0.4184) data time 0.0007 (0.0015) model time 0.5816 (0.4173) loss 6.0265 (6.5785) grad_norm 2.3889 (3.6150) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:28:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][520/625] eta 
0:00:44 lr 0.000104 wd 0.0500 time 0.4189 (0.4194) data time 0.0007 (0.0015) model time 0.4182 (0.4184) loss 5.9448 (6.5850) grad_norm 2.8475 (3.6028) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:28:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][530/625] eta 0:00:39 lr 0.000104 wd 0.0500 time 0.4108 (0.4200) data time 0.0009 (0.0015) model time 0.4100 (0.4191) loss 7.1068 (6.5907) grad_norm 5.0998 (3.5871) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:28:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][540/625] eta 0:00:35 lr 0.000104 wd 0.0500 time 0.5798 (0.4203) data time 0.0006 (0.0015) model time 0.5793 (0.4195) loss 6.1974 (6.5857) grad_norm 4.4722 (3.5997) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:28:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][550/625] eta 0:00:31 lr 0.000103 wd 0.0500 time 0.3970 (0.4199) data time 0.0007 (0.0015) model time 0.3964 (0.4190) loss 6.5507 (6.5923) grad_norm 2.3928 (3.6244) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:28:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][560/625] eta 0:00:27 lr 0.000103 wd 0.0500 time 0.4037 (0.4199) data time 0.0012 (0.0015) model time 0.4025 (0.4190) loss 7.2213 (6.5939) grad_norm 2.5860 (3.6203) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:28:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][570/625] eta 0:00:23 lr 0.000103 wd 0.0500 time 0.3959 (0.4198) data time 0.0007 (0.0015) model time 0.3952 (0.4189) loss 6.3040 (6.5953) grad_norm 2.4988 (3.6157) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:28:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][580/625] eta 0:00:18 lr 0.000103 wd 0.0500 time 0.3987 (0.4195) data time 0.0006 (0.0015) model time 0.3981 (0.4185) loss 6.5078 (6.5938) grad_norm 2.6485 (3.6210) loss_scale 64.0000 
(64.0000) mem 14939MB [2024-07-25 10:28:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][590/625] eta 0:00:14 lr 0.000103 wd 0.0500 time 0.3971 (0.4191) data time 0.0007 (0.0014) model time 0.3964 (0.4182) loss 6.8843 (6.5931) grad_norm 2.3884 (3.6578) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:28:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][600/625] eta 0:00:10 lr 0.000103 wd 0.0500 time 0.4048 (0.4188) data time 0.0009 (0.0014) model time 0.4040 (0.4178) loss 5.4333 (6.5935) grad_norm 4.2578 (3.6552) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:28:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][610/625] eta 0:00:06 lr 0.000103 wd 0.0500 time 0.3971 (0.4185) data time 0.0005 (0.0014) model time 0.3965 (0.4175) loss 7.1510 (6.5908) grad_norm 3.3205 (3.6387) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:28:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][620/625] eta 0:00:02 lr 0.000103 wd 0.0500 time 0.3978 (0.4182) data time 0.0004 (0.0014) model time 0.3974 (0.4171) loss 7.6528 (6.5895) grad_norm 2.5683 (3.7209) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:28:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 249 training takes 0:04:21 [2024-07-25 10:28:44 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 10:28:45 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 10:28:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.464 (0.464) Loss 0.5361 (0.5361) Acc@1 90.381 (90.381) Acc@5 98.877 (98.877) Mem 14939MB
[2024-07-25 10:28:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8228 (0.6587) Acc@1 82.275 (87.287) Acc@5 97.119 (97.963) Mem 14939MB
[2024-07-25 10:28:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9160 (0.7655) Acc@1 79.102 (84.359) Acc@5 95.996 (96.987) Mem 14939MB
[2024-07-25 10:28:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.987 Acc@5 96.957
[2024-07-25 10:28:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.0%
[2024-07-25 10:28:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.797 (0.797) Loss 0.5376 (0.5376) Acc@1 90.186 (90.186) Acc@5 99.023 (99.023) Mem 14939MB
[2024-07-25 10:28:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.154) Loss 0.8164 (0.6586) Acc@1 83.154 (87.278) Acc@5 96.777 (97.936) Mem 14939MB
[2024-07-25 10:28:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 0.9268 (0.7639) Acc@1 78.369 (84.328) Acc@5 95.557 (96.959) Mem 14939MB
[2024-07-25 10:28:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.923 Acc@5 96.917
[2024-07-25 10:28:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9%
[2024-07-25 10:28:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.92%
[2024-07-25 10:28:51 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 10:28:51 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 10:28:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][0/625] eta 0:07:48 lr 0.000103 wd 0.0500 time 0.7491 (0.7491) data time 0.3657 (0.3657) model time 0.0000 (0.0000) loss 5.4125 (5.4125) grad_norm 1.8885 (1.8885) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:28:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][10/625] eta 0:04:24 lr 0.000103 wd 0.0500 time 0.3964 (0.4296) data time 0.0006 (0.0340) model time 0.0000 (0.0000) loss 6.6315 (6.4979) grad_norm 2.6456 (3.7666) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:29:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][20/625] eta 0:04:10 lr 0.000103 wd 0.0500 time 0.3958 (0.4148) data time 0.0009 (0.0182) model time 0.0000 (0.0000) loss 5.7658 (6.3504) grad_norm 2.6891 (3.1309) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:29:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][30/625] eta 0:04:03 lr 0.000103 wd 0.0500 time 0.3967 (0.4099) data time 0.0008 (0.0126) model time 0.0000 (0.0000) loss 5.3443 (6.2920) grad_norm 2.6879 (3.2012) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:29:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][40/625] eta 0:03:58 lr 0.000103 wd 0.0500 time 0.3994 (0.4073) data time 0.0007 (0.0097) model time 0.0000 (0.0000) loss 6.4000 (6.3688) grad_norm 1.9173 (3.3134) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:29:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][50/625] eta 0:03:53 lr 0.000103 wd 0.0500 time 0.3967 (0.4063) data time 0.0007 (0.0080) model time 0.0000 (0.0000) loss 6.5812 (6.3968) grad_norm 2.9556 (3.6998) loss_scale 64.0000 (64.0000) mem 14939MB 
[2024-07-25 10:29:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][60/625] eta 0:03:48 lr 0.000103 wd 0.0500 time 0.4002 (0.4053) data time 0.0008 (0.0068) model time 0.3994 (0.3992) loss 6.8327 (6.3872) grad_norm 2.7218 (4.2096) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:29:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][70/625] eta 0:03:44 lr 0.000103 wd 0.0500 time 0.3973 (0.4047) data time 0.0008 (0.0060) model time 0.3965 (0.3997) loss 6.8423 (6.3632) grad_norm 3.0262 (4.4225) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:29:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][80/625] eta 0:03:42 lr 0.000103 wd 0.0500 time 0.3975 (0.4077) data time 0.0007 (0.0053) model time 0.3968 (0.4093) loss 7.5567 (6.3800) grad_norm 11.7919 (4.5344) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:29:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][90/625] eta 0:03:40 lr 0.000103 wd 0.0500 time 0.5917 (0.4129) data time 0.0008 (0.0048) model time 0.5909 (0.4204) loss 5.8364 (6.3924) grad_norm 3.8280 (4.4246) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:29:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][100/625] eta 0:03:41 lr 0.000102 wd 0.0500 time 0.6003 (0.4221) data time 0.0009 (0.0044) model time 0.5994 (0.4374) loss 5.9911 (6.4439) grad_norm 3.1503 (4.2972) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:29:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][110/625] eta 0:03:40 lr 0.000102 wd 0.0500 time 0.5561 (0.4273) data time 0.0009 (0.0041) model time 0.5552 (0.4444) loss 6.3603 (6.4741) grad_norm 2.8091 (4.1642) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:29:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][120/625] eta 0:03:36 lr 0.000102 wd 0.0500 time 0.5475 (0.4293) data 
time 0.0009 (0.0039) model time 0.5467 (0.4453) loss 6.6350 (6.5049) grad_norm 2.3215 (4.0696) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:29:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][130/625] eta 0:03:32 lr 0.000102 wd 0.0500 time 0.3957 (0.4286) data time 0.0007 (0.0036) model time 0.3951 (0.4420) loss 7.9056 (6.5349) grad_norm 2.6408 (4.0506) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:29:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][140/625] eta 0:03:27 lr 0.000102 wd 0.0500 time 0.3959 (0.4287) data time 0.0010 (0.0034) model time 0.3949 (0.4405) loss 7.3801 (6.5475) grad_norm 2.7086 (4.0276) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:29:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][150/625] eta 0:03:22 lr 0.000102 wd 0.0500 time 0.3996 (0.4267) data time 0.0007 (0.0033) model time 0.3989 (0.4363) loss 6.7847 (6.5405) grad_norm 2.5213 (3.9615) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:30:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][160/625] eta 0:03:18 lr 0.000102 wd 0.0500 time 0.3985 (0.4261) data time 0.0009 (0.0031) model time 0.3977 (0.4344) loss 7.0934 (6.5236) grad_norm 1.9152 (3.8994) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:30:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][170/625] eta 0:03:13 lr 0.000102 wd 0.0500 time 0.4065 (0.4247) data time 0.0006 (0.0030) model time 0.4059 (0.4317) loss 7.0624 (6.5152) grad_norm 3.6057 (3.9764) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:30:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][180/625] eta 0:03:08 lr 0.000102 wd 0.0500 time 0.4019 (0.4234) data time 0.0007 (0.0029) model time 0.4012 (0.4292) loss 6.8358 (6.5190) grad_norm 2.4976 (3.9648) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:30:12 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][190/625] eta 0:03:03 lr 0.000102 wd 0.0500 time 0.3959 (0.4221) data time 0.0009 (0.0028) model time 0.3949 (0.4270) loss 5.6108 (6.5401) grad_norm 3.7226 (4.0613) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:30:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][200/625] eta 0:02:58 lr 0.000102 wd 0.0500 time 0.4002 (0.4210) data time 0.0010 (0.0027) model time 0.3993 (0.4252) loss 6.7242 (6.5349) grad_norm 3.0384 (3.9866) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:30:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][210/625] eta 0:02:54 lr 0.000102 wd 0.0500 time 0.3999 (0.4200) data time 0.0007 (0.0026) model time 0.3993 (0.4235) loss 6.6168 (6.5235) grad_norm 2.2963 (3.9291) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:30:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][220/625] eta 0:02:49 lr 0.000102 wd 0.0500 time 0.3978 (0.4191) data time 0.0007 (0.0025) model time 0.3971 (0.4221) loss 6.1771 (6.5184) grad_norm 3.9140 (3.9939) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:30:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][230/625] eta 0:02:45 lr 0.000102 wd 0.0500 time 0.3988 (0.4182) data time 0.0007 (0.0024) model time 0.3981 (0.4208) loss 6.6603 (6.5093) grad_norm 4.5256 (4.0620) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:30:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][240/625] eta 0:02:40 lr 0.000102 wd 0.0500 time 0.3995 (0.4174) data time 0.0009 (0.0024) model time 0.3986 (0.4196) loss 6.9435 (6.5193) grad_norm 2.4038 (4.0499) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:30:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][250/625] eta 0:02:36 lr 0.000102 wd 0.0500 time 0.3960 (0.4167) data time 0.0007 (0.0023) 
model time 0.3953 (0.4185) loss 7.0015 (6.5208) grad_norm 2.3470 (4.0067) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:30:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][260/625] eta 0:02:31 lr 0.000102 wd 0.0500 time 0.3968 (0.4161) data time 0.0008 (0.0022) model time 0.3959 (0.4176) loss 6.8668 (6.5233) grad_norm 2.9296 (3.9587) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:30:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][270/625] eta 0:02:27 lr 0.000102 wd 0.0500 time 0.4021 (0.4155) data time 0.0008 (0.0022) model time 0.4013 (0.4168) loss 5.4574 (6.5183) grad_norm 2.9003 (3.9521) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:30:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][280/625] eta 0:02:23 lr 0.000101 wd 0.0500 time 0.4010 (0.4149) data time 0.0007 (0.0021) model time 0.4003 (0.4160) loss 5.6687 (6.5119) grad_norm 2.6183 (3.9426) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:30:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][290/625] eta 0:02:18 lr 0.000101 wd 0.0500 time 0.3954 (0.4144) data time 0.0007 (0.0021) model time 0.3947 (0.4152) loss 6.4267 (6.5138) grad_norm 5.3236 (3.9185) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:30:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][300/625] eta 0:02:14 lr 0.000101 wd 0.0500 time 0.5496 (0.4150) data time 0.0006 (0.0021) model time 0.5489 (0.4160) loss 5.8766 (6.5216) grad_norm 2.2961 (3.8856) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:31:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][310/625] eta 0:02:10 lr 0.000101 wd 0.0500 time 0.3955 (0.4156) data time 0.0006 (0.0020) model time 0.3949 (0.4166) loss 6.1870 (6.5228) grad_norm 2.2732 (3.8513) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:31:05 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [250/300][320/625] eta 0:02:07 lr 0.000101 wd 0.0500 time 0.3956 (0.4166) data time 0.0008 (0.0020) model time 0.3948 (0.4177) loss 7.1917 (6.5216) grad_norm 3.1496 (3.8280) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:31:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][330/625] eta 0:02:03 lr 0.000101 wd 0.0500 time 0.3965 (0.4170) data time 0.0009 (0.0019) model time 0.3957 (0.4181) loss 6.3695 (6.5165) grad_norm 2.3876 (3.7888) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:31:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][340/625] eta 0:01:58 lr 0.000101 wd 0.0500 time 0.3958 (0.4173) data time 0.0009 (0.0019) model time 0.3950 (0.4184) loss 6.5807 (6.5172) grad_norm 5.6174 (3.8145) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:31:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][350/625] eta 0:01:54 lr 0.000101 wd 0.0500 time 0.4004 (0.4175) data time 0.0008 (0.0019) model time 0.3995 (0.4185) loss 7.2702 (6.5144) grad_norm 4.3625 (3.8077) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:31:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][360/625] eta 0:01:50 lr 0.000101 wd 0.0500 time 0.3964 (0.4181) data time 0.0008 (0.0019) model time 0.3956 (0.4192) loss 6.9388 (6.5024) grad_norm 2.0526 (3.7812) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:31:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][370/625] eta 0:01:46 lr 0.000101 wd 0.0500 time 0.4019 (0.4175) data time 0.0008 (0.0018) model time 0.4010 (0.4185) loss 6.3976 (6.5070) grad_norm 3.0764 (3.7679) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:31:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][380/625] eta 0:01:42 lr 0.000101 wd 0.0500 time 0.4017 (0.4174) data time 0.0008 (0.0018) model time 0.4009 (0.4182) 
loss 7.2285 (6.5123) grad_norm 2.1827 (3.7329) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:31:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][390/625] eta 0:01:37 lr 0.000101 wd 0.0500 time 0.3960 (0.4169) data time 0.0007 (0.0018) model time 0.3954 (0.4177) loss 5.3381 (6.5083) grad_norm 1.7856 (3.7311) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:31:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][400/625] eta 0:01:33 lr 0.000101 wd 0.0500 time 0.3976 (0.4165) data time 0.0008 (0.0018) model time 0.3968 (0.4171) loss 6.7758 (6.5036) grad_norm 2.2997 (3.7075) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:31:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][410/625] eta 0:01:29 lr 0.000101 wd 0.0500 time 0.3986 (0.4161) data time 0.0009 (0.0017) model time 0.3977 (0.4166) loss 5.5718 (6.4951) grad_norm 3.3328 (3.7055) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:31:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][420/625] eta 0:01:25 lr 0.000101 wd 0.0500 time 0.3954 (0.4156) data time 0.0008 (0.0017) model time 0.3946 (0.4161) loss 5.7997 (6.4979) grad_norm 3.5268 (3.6872) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:31:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][430/625] eta 0:01:20 lr 0.000101 wd 0.0500 time 0.3989 (0.4152) data time 0.0008 (0.0017) model time 0.3981 (0.4156) loss 6.8730 (6.4963) grad_norm 3.1463 (3.6737) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:31:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][440/625] eta 0:01:16 lr 0.000101 wd 0.0500 time 0.3984 (0.4149) data time 0.0006 (0.0017) model time 0.3978 (0.4151) loss 6.8643 (6.4986) grad_norm 3.7075 (3.6720) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:31:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [250/300][450/625] eta 0:01:12 lr 0.000101 wd 0.0500 time 0.3996 (0.4145) data time 0.0006 (0.0017) model time 0.3990 (0.4147) loss 6.1004 (6.4961) grad_norm 2.2984 (3.7017) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:32:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][460/625] eta 0:01:08 lr 0.000100 wd 0.0500 time 0.3992 (0.4142) data time 0.0006 (0.0016) model time 0.3985 (0.4143) loss 6.1866 (6.4986) grad_norm 2.9185 (3.6805) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:32:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][470/625] eta 0:01:04 lr 0.000100 wd 0.0500 time 0.4012 (0.4139) data time 0.0009 (0.0016) model time 0.4003 (0.4140) loss 5.8279 (6.4994) grad_norm 7.9425 (3.6805) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:32:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][480/625] eta 0:00:59 lr 0.000100 wd 0.0500 time 0.3954 (0.4136) data time 0.0006 (0.0016) model time 0.3948 (0.4136) loss 5.4298 (6.4962) grad_norm 3.0479 (3.7134) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:32:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][490/625] eta 0:00:55 lr 0.000100 wd 0.0500 time 0.4022 (0.4133) data time 0.0008 (0.0016) model time 0.4014 (0.4133) loss 6.4245 (6.4974) grad_norm 1.8539 (3.6987) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:32:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][500/625] eta 0:00:51 lr 0.000100 wd 0.0500 time 0.3967 (0.4130) data time 0.0009 (0.0016) model time 0.3958 (0.4129) loss 6.3610 (6.4914) grad_norm 4.1800 (3.6863) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:32:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][510/625] eta 0:00:47 lr 0.000100 wd 0.0500 time 0.4074 (0.4128) data time 0.0008 (0.0016) model time 0.4066 (0.4126) loss 6.8958 (6.4946) grad_norm 
2.6661 (3.6744) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:32:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][520/625] eta 0:00:43 lr 0.000100 wd 0.0500 time 0.6118 (0.4133) data time 0.0008 (0.0015) model time 0.6110 (0.4132) loss 6.4597 (6.4972) grad_norm 2.6283 (3.6684) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:32:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][530/625] eta 0:00:39 lr 0.000100 wd 0.0500 time 0.3936 (0.4137) data time 0.0009 (0.0015) model time 0.3927 (0.4136) loss 6.7294 (6.4981) grad_norm 4.1448 (3.6495) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:32:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][540/625] eta 0:00:35 lr 0.000100 wd 0.0500 time 0.6059 (0.4150) data time 0.0006 (0.0015) model time 0.6053 (0.4151) loss 5.3257 (6.5019) grad_norm 3.2686 (3.6434) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:32:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][550/625] eta 0:00:31 lr 0.000100 wd 0.0500 time 0.5725 (0.4157) data time 0.0008 (0.0015) model time 0.5717 (0.4159) loss 6.8995 (6.5043) grad_norm 2.2099 (3.6459) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:32:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][560/625] eta 0:00:27 lr 0.000100 wd 0.0500 time 0.3985 (0.4163) data time 0.0006 (0.0015) model time 0.3979 (0.4164) loss 6.8901 (6.5003) grad_norm 2.3719 (3.6352) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:32:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][570/625] eta 0:00:22 lr 0.000100 wd 0.0500 time 0.3998 (0.4166) data time 0.0008 (0.0015) model time 0.3991 (0.4167) loss 6.8680 (6.4989) grad_norm 3.3498 (3.6194) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:32:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][580/625] eta 
0:00:18 lr 0.000100 wd 0.0500 time 0.3965 (0.4171) data time 0.0010 (0.0015) model time 0.3955 (0.4173) loss 7.3477 (6.5058) grad_norm 2.3701 (3.6058) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:32:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][590/625] eta 0:00:14 lr 0.000100 wd 0.0500 time 0.3969 (0.4168) data time 0.0006 (0.0015) model time 0.3963 (0.4170) loss 6.4785 (6.5027) grad_norm 3.2771 (3.6040) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:33:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][600/625] eta 0:00:10 lr 0.000100 wd 0.0500 time 0.3989 (0.4167) data time 0.0006 (0.0015) model time 0.3983 (0.4168) loss 6.3768 (6.5030) grad_norm 2.4670 (3.6242) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:33:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][610/625] eta 0:00:06 lr 0.000100 wd 0.0500 time 0.3984 (0.4164) data time 0.0004 (0.0015) model time 0.3979 (0.4165) loss 6.1277 (6.5033) grad_norm 2.1294 (3.6490) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:33:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][620/625] eta 0:00:02 lr 0.000100 wd 0.0500 time 0.3968 (0.4161) data time 0.0004 (0.0014) model time 0.3964 (0.4161) loss 5.4522 (6.4977) grad_norm 2.4304 (3.6400) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:33:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 250 training takes 0:04:19 [2024-07-25 10:33:11 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 10:33:12 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 10:33:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.450 (0.450) Loss 0.5352 (0.5352) Acc@1 90.137 (90.137) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 10:33:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.8223 (0.6546) Acc@1 81.738 (87.327) Acc@5 96.729 (97.954) Mem 14939MB [2024-07-25 10:33:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9185 (0.7611) Acc@1 78.418 (84.289) Acc@5 95.654 (96.975) Mem 14939MB [2024-07-25 10:33:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.913 Acc@5 96.945 [2024-07-25 10:33:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-07-25 10:33:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.899 (0.899) Loss 0.5376 (0.5376) Acc@1 90.186 (90.186) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 10:33:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.161) Loss 0.8159 (0.6584) Acc@1 83.105 (87.305) Acc@5 96.777 (97.936) Mem 14939MB [2024-07-25 10:33:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.125) Loss 0.9258 (0.7634) Acc@1 78.369 (84.347) Acc@5 95.508 (96.961) Mem 14939MB [2024-07-25 10:33:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.949 Acc@5 96.923 [2024-07-25 10:33:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-07-25 10:33:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.95% [2024-07-25 10:33:18 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 10:33:19 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 10:33:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][0/625] eta 0:09:07 lr 0.000100 wd 0.0500 time 0.8761 (0.8761) data time 0.4993 (0.4993) model time 0.0000 (0.0000) loss 5.8623 (5.8623) grad_norm 2.4062 (2.4062) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:33:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][10/625] eta 0:04:31 lr 0.000099 wd 0.0500 time 0.3979 (0.4412) data time 0.0007 (0.0462) model time 0.0000 (0.0000) loss 5.7658 (6.7088) grad_norm 3.8562 (3.1304) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:33:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][20/625] eta 0:04:14 lr 0.000099 wd 0.0500 time 0.3943 (0.4200) data time 0.0007 (0.0246) model time 0.0000 (0.0000) loss 6.5617 (6.6472) grad_norm 2.8520 (2.9203) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:33:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][30/625] eta 0:04:05 lr 0.000099 wd 0.0500 time 0.3986 (0.4126) data time 0.0007 (0.0169) model time 0.0000 (0.0000) loss 6.9150 (6.6177) grad_norm 2.2918 (2.9856) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:33:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][40/625] eta 0:03:59 lr 0.000099 wd 0.0500 time 0.4014 (0.4092) data time 0.0008 (0.0130) model time 0.0000 (0.0000) loss 5.5544 (6.5483) grad_norm 3.4442 (3.0291) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:33:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][50/625] eta 0:03:54 lr 0.000099 wd 0.0500 time 0.3986 (0.4074) data time 0.0006 (0.0106) model time 0.0000 (0.0000) loss 5.4444 (6.5536) grad_norm 11.9861 (3.4083) loss_scale 64.0000 (64.0000) mem 14939MB 
[2024-07-25 10:33:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][60/625] eta 0:03:49 lr 0.000099 wd 0.0500 time 0.3996 (0.4061) data time 0.0009 (0.0090) model time 0.3988 (0.3986) loss 7.3728 (6.5617) grad_norm 4.7608 (3.6218) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:33:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][70/625] eta 0:03:46 lr 0.000099 wd 0.0500 time 0.4102 (0.4081) data time 0.0007 (0.0079) model time 0.4095 (0.4090) loss 5.9118 (6.5929) grad_norm 3.2154 (3.6124) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:33:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][80/625] eta 0:03:41 lr 0.000099 wd 0.0500 time 0.3977 (0.4071) data time 0.0007 (0.0070) model time 0.3970 (0.4057) loss 6.2394 (6.5727) grad_norm 2.0472 (3.5415) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:33:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][90/625] eta 0:03:37 lr 0.000099 wd 0.0500 time 0.3991 (0.4063) data time 0.0007 (0.0063) model time 0.3985 (0.4041) loss 6.3503 (6.5498) grad_norm 2.1722 (3.8237) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:34:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][100/625] eta 0:03:32 lr 0.000099 wd 0.0500 time 0.3978 (0.4056) data time 0.0009 (0.0058) model time 0.3969 (0.4030) loss 6.4101 (6.5463) grad_norm 3.9627 (3.7734) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:34:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][110/625] eta 0:03:28 lr 0.000099 wd 0.0500 time 0.3978 (0.4050) data time 0.0007 (0.0053) model time 0.3971 (0.4022) loss 6.6428 (6.5373) grad_norm 1.8207 (3.6935) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:34:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][120/625] eta 0:03:25 lr 0.000099 wd 0.0500 time 0.4000 (0.4078) data time 
0.0010 (0.0050) model time 0.3991 (0.4072) loss 6.6118 (6.5315) grad_norm 2.3894 (3.8088) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:34:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][130/625] eta 0:03:22 lr 0.000099 wd 0.0500 time 0.5661 (0.4098) data time 0.0008 (0.0047) model time 0.5653 (0.4104) loss 5.7531 (6.5017) grad_norm 4.2340 (3.7582) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:34:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][140/625] eta 0:03:20 lr 0.000099 wd 0.0500 time 0.3959 (0.4139) data time 0.0007 (0.0044) model time 0.3952 (0.4167) loss 6.5581 (6.4886) grad_norm 2.0155 (3.7136) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 10:34:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][150/625] eta 0:03:17 lr 0.000099 wd 0.0500 time 0.6080 (0.4164) data time 0.0007 (0.0042) model time 0.6073 (0.4202) loss 5.8548 (6.4753) grad_norm 2.1579 (3.6311) loss_scale 128.0000 (67.3907) mem 14939MB [2024-07-25 10:34:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][160/625] eta 0:03:14 lr 0.000099 wd 0.0500 time 0.5863 (0.4176) data time 0.0006 (0.0040) model time 0.5856 (0.4215) loss 7.5727 (6.4828) grad_norm 3.1795 (3.5785) loss_scale 128.0000 (71.1553) mem 14939MB [2024-07-25 10:34:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][170/625] eta 0:03:10 lr 0.000099 wd 0.0500 time 0.3955 (0.4184) data time 0.0009 (0.0038) model time 0.3946 (0.4222) loss 7.3714 (6.5020) grad_norm 2.6733 (3.5276) loss_scale 128.0000 (74.4795) mem 14939MB [2024-07-25 10:34:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][180/625] eta 0:03:06 lr 0.000099 wd 0.0500 time 0.4006 (0.4184) data time 0.0007 (0.0036) model time 0.3999 (0.4219) loss 6.5386 (6.5203) grad_norm 3.2495 (3.5689) loss_scale 128.0000 (77.4365) mem 14939MB [2024-07-25 10:34:39 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][190/625] eta 0:03:01 lr 0.000098 wd 0.0500 time 0.3987 (0.4182) data time 0.0007 (0.0035) model time 0.3980 (0.4213) loss 5.5938 (6.5070) grad_norm 2.3583 (3.5527) loss_scale 128.0000 (80.0838) mem 14939MB [2024-07-25 10:34:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][200/625] eta 0:02:57 lr 0.000098 wd 0.0500 time 0.3951 (0.4173) data time 0.0006 (0.0033) model time 0.3945 (0.4198) loss 7.0432 (6.5016) grad_norm 3.5304 (3.5472) loss_scale 128.0000 (82.4677) mem 14939MB [2024-07-25 10:34:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][210/625] eta 0:02:52 lr 0.000098 wd 0.0500 time 0.3983 (0.4165) data time 0.0009 (0.0032) model time 0.3974 (0.4185) loss 8.3517 (6.5158) grad_norm 2.4005 (3.5120) loss_scale 128.0000 (84.6256) mem 14939MB [2024-07-25 10:34:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][220/625] eta 0:02:48 lr 0.000098 wd 0.0500 time 0.3966 (0.4156) data time 0.0008 (0.0031) model time 0.3958 (0.4172) loss 5.8159 (6.5051) grad_norm 2.1640 (3.4692) loss_scale 128.0000 (86.5882) mem 14939MB [2024-07-25 10:34:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][230/625] eta 0:02:43 lr 0.000098 wd 0.0500 time 0.3988 (0.4149) data time 0.0007 (0.0030) model time 0.3981 (0.4161) loss 5.5593 (6.4958) grad_norm 2.9895 (3.9941) loss_scale 128.0000 (88.3810) mem 14939MB [2024-07-25 10:34:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][240/625] eta 0:02:39 lr 0.000098 wd 0.0500 time 0.4079 (0.4142) data time 0.0008 (0.0029) model time 0.4071 (0.4151) loss 6.0583 (6.5062) grad_norm 3.0481 (3.9637) loss_scale 128.0000 (90.0249) mem 14939MB [2024-07-25 10:35:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][250/625] eta 0:02:35 lr 0.000098 wd 0.0500 time 0.3947 (0.4137) data time 0.0009 
(0.0029) model time 0.3939 (0.4145) loss 6.9239 (6.5073) grad_norm 4.0122 (3.9267) loss_scale 128.0000 (91.5378) mem 14939MB [2024-07-25 10:35:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][260/625] eta 0:02:30 lr 0.000098 wd 0.0500 time 0.3981 (0.4131) data time 0.0008 (0.0028) model time 0.3972 (0.4136) loss 5.4570 (6.5055) grad_norm 4.0706 (3.8906) loss_scale 128.0000 (92.9349) mem 14939MB [2024-07-25 10:35:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][270/625] eta 0:02:26 lr 0.000098 wd 0.0500 time 0.4341 (0.4127) data time 0.0007 (0.0027) model time 0.4334 (0.4131) loss 6.9369 (6.5130) grad_norm 2.9933 (3.8450) loss_scale 128.0000 (94.2288) mem 14939MB [2024-07-25 10:35:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][280/625] eta 0:02:22 lr 0.000098 wd 0.0500 time 0.3981 (0.4123) data time 0.0006 (0.0026) model time 0.3975 (0.4125) loss 6.2689 (6.5207) grad_norm 2.1983 (3.8115) loss_scale 128.0000 (95.4306) mem 14939MB [2024-07-25 10:35:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][290/625] eta 0:02:18 lr 0.000098 wd 0.0500 time 0.4011 (0.4126) data time 0.0007 (0.0026) model time 0.4004 (0.4129) loss 6.5602 (6.5147) grad_norm 2.8381 (3.8073) loss_scale 128.0000 (96.5498) mem 14939MB [2024-07-25 10:35:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][300/625] eta 0:02:13 lr 0.000098 wd 0.0500 time 0.3991 (0.4122) data time 0.0007 (0.0025) model time 0.3985 (0.4123) loss 7.2260 (6.5134) grad_norm 3.3471 (3.7847) loss_scale 128.0000 (97.5947) mem 14939MB [2024-07-25 10:35:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][310/625] eta 0:02:09 lr 0.000098 wd 0.0500 time 0.3987 (0.4118) data time 0.0006 (0.0025) model time 0.3981 (0.4118) loss 7.2833 (6.5102) grad_norm 2.4690 (3.8185) loss_scale 128.0000 (98.5723) mem 14939MB [2024-07-25 10:35:31 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][320/625] eta 0:02:05 lr 0.000098 wd 0.0500 time 0.3985 (0.4115) data time 0.0007 (0.0024) model time 0.3978 (0.4114) loss 7.2445 (6.5137) grad_norm 2.1732 (3.7952) loss_scale 128.0000 (99.4891) mem 14939MB [2024-07-25 10:35:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][330/625] eta 0:02:01 lr 0.000098 wd 0.0500 time 0.4001 (0.4111) data time 0.0007 (0.0024) model time 0.3994 (0.4109) loss 6.7934 (6.5188) grad_norm 11.0527 (3.7939) loss_scale 128.0000 (100.3505) mem 14939MB [2024-07-25 10:35:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][340/625] eta 0:01:57 lr 0.000098 wd 0.0500 time 0.5728 (0.4123) data time 0.0009 (0.0023) model time 0.5719 (0.4123) loss 6.6593 (6.5086) grad_norm 2.5768 (3.7828) loss_scale 128.0000 (101.1613) mem 14939MB [2024-07-25 10:35:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][350/625] eta 0:01:53 lr 0.000098 wd 0.0500 time 0.5710 (0.4127) data time 0.0007 (0.0023) model time 0.5703 (0.4128) loss 6.6783 (6.5099) grad_norm 2.8990 (3.7644) loss_scale 128.0000 (101.9259) mem 14939MB [2024-07-25 10:35:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][360/625] eta 0:01:49 lr 0.000098 wd 0.0500 time 0.3962 (0.4142) data time 0.0007 (0.0022) model time 0.3956 (0.4145) loss 6.4960 (6.5039) grad_norm 2.1873 (3.7614) loss_scale 128.0000 (102.6482) mem 14939MB [2024-07-25 10:35:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][370/625] eta 0:01:45 lr 0.000097 wd 0.0500 time 0.5646 (0.4156) data time 0.0008 (0.0022) model time 0.5638 (0.4161) loss 7.1339 (6.5057) grad_norm 3.1254 (3.7587) loss_scale 128.0000 (103.3315) mem 14939MB [2024-07-25 10:35:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][380/625] eta 0:01:42 lr 0.000097 wd 0.0500 time 0.5591 (0.4164) data time 
0.0009 (0.0022) model time 0.5582 (0.4169) loss 6.0440 (6.5081) grad_norm 2.5464 (3.8565) loss_scale 128.0000 (103.9790) mem 14939MB [2024-07-25 10:36:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][390/625] eta 0:01:38 lr 0.000097 wd 0.0500 time 0.5209 (0.4171) data time 0.0009 (0.0021) model time 0.5200 (0.4177) loss 5.8184 (6.5034) grad_norm 2.3689 (3.8455) loss_scale 128.0000 (104.5934) mem 14939MB [2024-07-25 10:36:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][400/625] eta 0:01:33 lr 0.000097 wd 0.0500 time 0.3981 (0.4171) data time 0.0006 (0.0021) model time 0.3975 (0.4177) loss 6.5720 (6.5088) grad_norm 2.7977 (3.8284) loss_scale 128.0000 (105.1771) mem 14939MB [2024-07-25 10:36:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][410/625] eta 0:01:29 lr 0.000097 wd 0.0500 time 0.3982 (0.4171) data time 0.0007 (0.0021) model time 0.3975 (0.4176) loss 6.5332 (6.5026) grad_norm 2.3675 (3.8188) loss_scale 128.0000 (105.7324) mem 14939MB [2024-07-25 10:36:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][420/625] eta 0:01:25 lr 0.000097 wd 0.0500 time 0.3984 (0.4167) data time 0.0008 (0.0020) model time 0.3976 (0.4171) loss 5.3444 (6.5031) grad_norm 2.3939 (3.7993) loss_scale 128.0000 (106.2613) mem 14939MB [2024-07-25 10:36:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][430/625] eta 0:01:21 lr 0.000097 wd 0.0500 time 0.3949 (0.4163) data time 0.0009 (0.0020) model time 0.3940 (0.4166) loss 6.4346 (6.5044) grad_norm 2.1423 (3.7778) loss_scale 128.0000 (106.7657) mem 14939MB [2024-07-25 10:36:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][440/625] eta 0:01:16 lr 0.000097 wd 0.0500 time 0.3989 (0.4159) data time 0.0006 (0.0020) model time 0.3982 (0.4162) loss 7.8631 (6.5071) grad_norm 2.5983 (3.7723) loss_scale 128.0000 (107.2472) mem 14939MB [2024-07-25 10:36:26 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][450/625] eta 0:01:12 lr 0.000097 wd 0.0500 time 0.3986 (0.4156) data time 0.0009 (0.0020) model time 0.3977 (0.4158) loss 6.9815 (6.5036) grad_norm 2.7307 (3.7487) loss_scale 128.0000 (107.7073) mem 14939MB [2024-07-25 10:36:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][460/625] eta 0:01:08 lr 0.000097 wd 0.0500 time 0.3998 (0.4153) data time 0.0009 (0.0019) model time 0.3989 (0.4154) loss 7.6601 (6.5014) grad_norm 8.2595 (3.7290) loss_scale 128.0000 (108.1475) mem 14939MB [2024-07-25 10:36:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][470/625] eta 0:01:04 lr 0.000097 wd 0.0500 time 0.3958 (0.4149) data time 0.0009 (0.0019) model time 0.3949 (0.4150) loss 7.0896 (6.5011) grad_norm 2.3563 (3.7201) loss_scale 128.0000 (108.5690) mem 14939MB [2024-07-25 10:36:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][480/625] eta 0:01:00 lr 0.000097 wd 0.0500 time 0.4012 (0.4146) data time 0.0007 (0.0019) model time 0.4005 (0.4146) loss 6.1869 (6.4961) grad_norm 3.2087 (3.7317) loss_scale 128.0000 (108.9730) mem 14939MB [2024-07-25 10:36:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][490/625] eta 0:00:55 lr 0.000097 wd 0.0500 time 0.3978 (0.4143) data time 0.0007 (0.0019) model time 0.3971 (0.4143) loss 7.2215 (6.4990) grad_norm 3.3235 (3.7242) loss_scale 128.0000 (109.3605) mem 14939MB [2024-07-25 10:36:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][500/625] eta 0:00:51 lr 0.000097 wd 0.0500 time 0.3988 (0.4141) data time 0.0009 (0.0019) model time 0.3979 (0.4140) loss 6.6920 (6.4979) grad_norm 14.8789 (3.7372) loss_scale 128.0000 (109.7325) mem 14939MB [2024-07-25 10:36:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][510/625] eta 0:00:47 lr 0.000097 wd 0.0500 time 0.4025 (0.4140) data time 
0.0009 (0.0018) model time 0.4016 (0.4139) loss 7.3891 (6.4989) grad_norm 2.6359 (3.7196) loss_scale 128.0000 (110.0900) mem 14939MB [2024-07-25 10:36:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][520/625] eta 0:00:43 lr 0.000097 wd 0.0500 time 0.3937 (0.4137) data time 0.0007 (0.0018) model time 0.3930 (0.4136) loss 6.7774 (6.5051) grad_norm 2.5603 (3.7295) loss_scale 128.0000 (110.4338) mem 14939MB [2024-07-25 10:36:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][530/625] eta 0:00:39 lr 0.000097 wd 0.0500 time 0.4005 (0.4135) data time 0.0009 (0.0018) model time 0.3996 (0.4133) loss 5.7367 (6.5033) grad_norm 2.6295 (3.7119) loss_scale 128.0000 (110.7646) mem 14939MB [2024-07-25 10:37:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][540/625] eta 0:00:35 lr 0.000097 wd 0.0500 time 0.3987 (0.4132) data time 0.0009 (0.0018) model time 0.3978 (0.4130) loss 6.3102 (6.5047) grad_norm 2.3278 (3.6883) loss_scale 128.0000 (111.0832) mem 14939MB [2024-07-25 10:37:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][550/625] eta 0:00:30 lr 0.000096 wd 0.0500 time 0.4008 (0.4130) data time 0.0009 (0.0018) model time 0.4000 (0.4127) loss 7.0430 (6.5136) grad_norm 3.2096 (3.7112) loss_scale 128.0000 (111.3902) mem 14939MB [2024-07-25 10:37:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][560/625] eta 0:00:26 lr 0.000096 wd 0.0500 time 0.3989 (0.4133) data time 0.0007 (0.0017) model time 0.3983 (0.4131) loss 6.5752 (6.5104) grad_norm 2.8467 (3.6954) loss_scale 128.0000 (111.6863) mem 14939MB [2024-07-25 10:37:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][570/625] eta 0:00:22 lr 0.000096 wd 0.0500 time 0.5978 (0.4140) data time 0.0006 (0.0017) model time 0.5972 (0.4137) loss 5.3906 (6.5131) grad_norm 2.6064 (3.6881) loss_scale 128.0000 (111.9720) mem 14939MB [2024-07-25 10:37:20 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][580/625] eta 0:00:18 lr 0.000096 wd 0.0500 time 0.3968 (0.4152) data time 0.0009 (0.0017) model time 0.3959 (0.4151) loss 7.2277 (6.5123) grad_norm 2.5946 (3.6740) loss_scale 128.0000 (112.2478) mem 14939MB [2024-07-25 10:37:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][590/625] eta 0:00:14 lr 0.000096 wd 0.0500 time 0.3919 (0.4162) data time 0.0007 (0.0017) model time 0.3912 (0.4161) loss 7.5946 (6.5089) grad_norm 5.9202 (3.6725) loss_scale 128.0000 (112.5144) mem 14939MB [2024-07-25 10:37:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][600/625] eta 0:00:10 lr 0.000096 wd 0.0500 time 0.5969 (0.4170) data time 0.0009 (0.0017) model time 0.5960 (0.4170) loss 7.6432 (6.5108) grad_norm 2.2795 (3.6605) loss_scale 128.0000 (112.7720) mem 14939MB [2024-07-25 10:37:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][610/625] eta 0:00:06 lr 0.000096 wd 0.0500 time 0.5357 (0.4174) data time 0.0004 (0.0017) model time 0.5354 (0.4175) loss 7.0545 (6.5135) grad_norm 3.5203 (3.6477) loss_scale 128.0000 (113.0213) mem 14939MB [2024-07-25 10:37:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][620/625] eta 0:00:02 lr 0.000096 wd 0.0500 time 0.3983 (0.4171) data time 0.0004 (0.0017) model time 0.3979 (0.4171) loss 6.0871 (6.5145) grad_norm 2.4893 (3.6369) loss_scale 128.0000 (113.2625) mem 14939MB [2024-07-25 10:37:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 251 training takes 0:04:20 [2024-07-25 10:37:39 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 10:37:40 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 10:37:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.466 (0.466) Loss 0.5366 (0.5366) Acc@1 90.674 (90.674) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 10:37:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.122) Loss 0.8169 (0.6615) Acc@1 82.764 (87.296) Acc@5 96.924 (97.931) Mem 14939MB [2024-07-25 10:37:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 0.9219 (0.7685) Acc@1 78.418 (84.335) Acc@5 96.094 (96.980) Mem 14939MB [2024-07-25 10:37:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.945 Acc@5 96.947 [2024-07-25 10:37:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-07-25 10:37:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.751 (0.751) Loss 0.5376 (0.5376) Acc@1 90.186 (90.186) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 10:37:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.153) Loss 0.8149 (0.6582) Acc@1 83.203 (87.336) Acc@5 96.826 (97.945) Mem 14939MB [2024-07-25 10:37:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.121) Loss 0.9243 (0.7633) Acc@1 78.320 (84.356) Acc@5 95.508 (96.963) Mem 14939MB [2024-07-25 10:37:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.949 Acc@5 96.923 [2024-07-25 10:37:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-07-25 10:37:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.95% [2024-07-25 10:37:46 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 10:37:47 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 10:37:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][0/625] eta 0:08:29 lr 0.000096 wd 0.0500 time 0.8152 (0.8152) data time 0.4370 (0.4370) model time 0.0000 (0.0000) loss 6.0395 (6.0395) grad_norm 2.6390 (2.6390) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:37:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][10/625] eta 0:04:27 lr 0.000096 wd 0.0500 time 0.3992 (0.4352) data time 0.0009 (0.0405) model time 0.0000 (0.0000) loss 7.2764 (6.4971) grad_norm 2.1889 (3.1990) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:37:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][20/625] eta 0:04:12 lr 0.000096 wd 0.0500 time 0.3968 (0.4174) data time 0.0008 (0.0216) model time 0.0000 (0.0000) loss 6.4805 (6.3937) grad_norm 3.1402 (3.0999) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:37:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][30/625] eta 0:04:05 lr 0.000096 wd 0.0500 time 0.3996 (0.4118) data time 0.0008 (0.0149) model time 0.0000 (0.0000) loss 6.5882 (6.3237) grad_norm 2.1074 (2.9195) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:38:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][40/625] eta 0:04:01 lr 0.000096 wd 0.0500 time 0.3981 (0.4126) data time 0.0007 (0.0118) model time 0.0000 (0.0000) loss 6.8669 (6.4090) grad_norm 2.7012 (2.9715) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:38:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][50/625] eta 0:03:55 lr 0.000096 wd 0.0500 time 0.3966 (0.4098) data time 0.0009 (0.0097) model time 0.0000 (0.0000) loss 6.8310 (6.4427) grad_norm 2.1513 (3.0159) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 10:38:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][60/625] eta 0:03:50 lr 0.000096 wd 0.0500 time 0.3975 (0.4078) data time 0.0009 (0.0082) model time 0.3966 (0.3964) loss 5.2303 (6.3950) grad_norm 2.9982 (3.1222) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:38:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][70/625] eta 0:03:45 lr 0.000096 wd 0.0500 time 0.3964 (0.4062) data time 0.0008 (0.0072) model time 0.3956 (0.3960) loss 6.7741 (6.3883) grad_norm 1.9984 (3.0971) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:38:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][80/625] eta 0:03:40 lr 0.000096 wd 0.0500 time 0.3964 (0.4052) data time 0.0006 (0.0064) model time 0.3958 (0.3965) loss 7.0988 (6.4326) grad_norm 2.1026 (3.1079) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:38:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][90/625] eta 0:03:36 lr 0.000096 wd 0.0500 time 0.4136 (0.4045) data time 0.0007 (0.0058) model time 0.4130 (0.3968) loss 5.4385 (6.4270) grad_norm 10.7291 (3.1687) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:38:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][100/625] eta 0:03:32 lr 0.000096 wd 0.0500 time 0.3972 (0.4039) data time 0.0006 (0.0053) model time 0.3966 (0.3970) loss 7.1613 (6.4578) grad_norm 5.6897 (3.2763) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:38:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][110/625] eta 0:03:27 lr 0.000095 wd 0.0500 time 0.3970 (0.4036) data time 0.0006 (0.0049) model time 0.3964 (0.3975) loss 7.2723 (6.4670) grad_norm 4.3514 (3.3457) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:38:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][120/625] eta 0:03:23 lr 0.000095 wd 0.0500 time 
0.4004 (0.4035) data time 0.0009 (0.0046) model time 0.3996 (0.3979) loss 6.0610 (6.4679) grad_norm 3.0216 (3.4454) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:38:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][130/625] eta 0:03:19 lr 0.000095 wd 0.0500 time 0.3965 (0.4032) data time 0.0007 (0.0043) model time 0.3958 (0.3980) loss 6.5562 (6.4768) grad_norm 2.6484 (3.4213) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:38:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][140/625] eta 0:03:15 lr 0.000095 wd 0.0500 time 0.3976 (0.4031) data time 0.0007 (0.0041) model time 0.3969 (0.3984) loss 7.7631 (6.5127) grad_norm 2.9615 (3.3998) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:38:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][150/625] eta 0:03:11 lr 0.000095 wd 0.0500 time 0.5844 (0.4040) data time 0.0006 (0.0039) model time 0.5837 (0.4002) loss 6.6994 (6.4923) grad_norm 7.3284 (3.3712) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:38:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][160/625] eta 0:03:08 lr 0.000095 wd 0.0500 time 0.3980 (0.4060) data time 0.0007 (0.0037) model time 0.3974 (0.4034) loss 6.4522 (6.4684) grad_norm 2.5509 (3.4263) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:38:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][170/625] eta 0:03:06 lr 0.000095 wd 0.0500 time 0.4087 (0.4096) data time 0.0008 (0.0035) model time 0.4079 (0.4086) loss 6.6439 (6.4725) grad_norm 3.9305 (3.4508) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:39:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][180/625] eta 0:03:03 lr 0.000095 wd 0.0500 time 0.5789 (0.4132) data time 0.0007 (0.0034) model time 0.5782 (0.4136) loss 7.3132 (6.4780) grad_norm 2.8630 (3.4332) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 10:39:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][190/625] eta 0:03:01 lr 0.000095 wd 0.0500 time 0.3964 (0.4163) data time 0.0008 (0.0032) model time 0.3956 (0.4178) loss 6.5424 (6.4806) grad_norm 2.8956 (3.4552) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:39:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][200/625] eta 0:02:58 lr 0.000095 wd 0.0500 time 0.5791 (0.4195) data time 0.0009 (0.0031) model time 0.5783 (0.4219) loss 6.9832 (6.4674) grad_norm 2.2272 (3.4664) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:39:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][210/625] eta 0:02:54 lr 0.000095 wd 0.0500 time 0.3995 (0.4202) data time 0.0009 (0.0030) model time 0.3986 (0.4225) loss 6.9267 (6.4835) grad_norm 2.3420 (3.4256) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:39:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][220/625] eta 0:02:49 lr 0.000095 wd 0.0500 time 0.3988 (0.4192) data time 0.0008 (0.0029) model time 0.3979 (0.4211) loss 7.6835 (6.4877) grad_norm 2.6082 (3.4581) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:39:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][230/625] eta 0:02:45 lr 0.000095 wd 0.0500 time 0.3987 (0.4192) data time 0.0008 (0.0028) model time 0.3979 (0.4209) loss 6.2522 (6.4926) grad_norm 2.6973 (3.8026) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:39:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][240/625] eta 0:02:41 lr 0.000095 wd 0.0500 time 0.4060 (0.4183) data time 0.0007 (0.0028) model time 0.4053 (0.4197) loss 6.1793 (6.4955) grad_norm 3.5010 (3.8213) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:39:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][250/625] eta 0:02:36 lr 0.000095 wd 0.0500 time 
0.3982 (0.4175) data time 0.0009 (0.0027) model time 0.3973 (0.4186) loss 6.3925 (6.5052) grad_norm 2.7362 (3.8894) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:39:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][260/625] eta 0:02:32 lr 0.000095 wd 0.0500 time 0.3975 (0.4173) data time 0.0007 (0.0026) model time 0.3969 (0.4183) loss 6.5397 (6.4940) grad_norm 3.5752 (3.8814) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:39:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][270/625] eta 0:02:27 lr 0.000095 wd 0.0500 time 0.3954 (0.4166) data time 0.0008 (0.0026) model time 0.3945 (0.4173) loss 6.6686 (6.4956) grad_norm 2.2862 (3.8492) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:39:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][280/625] eta 0:02:23 lr 0.000095 wd 0.0500 time 0.4016 (0.4160) data time 0.0009 (0.0025) model time 0.4007 (0.4164) loss 6.7454 (6.5030) grad_norm 4.9173 (3.8256) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:39:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][290/625] eta 0:02:19 lr 0.000095 wd 0.0500 time 0.3976 (0.4153) data time 0.0008 (0.0025) model time 0.3968 (0.4156) loss 5.7247 (6.5003) grad_norm 3.2281 (3.7895) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:39:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][300/625] eta 0:02:14 lr 0.000094 wd 0.0500 time 0.3954 (0.4148) data time 0.0008 (0.0024) model time 0.3946 (0.4149) loss 5.8127 (6.5018) grad_norm 6.8249 (3.7720) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:39:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][310/625] eta 0:02:10 lr 0.000094 wd 0.0500 time 0.3943 (0.4142) data time 0.0008 (0.0024) model time 0.3935 (0.4141) loss 7.2131 (6.5002) grad_norm 3.0658 (3.7620) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 10:39:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][320/625] eta 0:02:06 lr 0.000094 wd 0.0500 time 0.4063 (0.4137) data time 0.0008 (0.0023) model time 0.4055 (0.4135) loss 5.7948 (6.4978) grad_norm 2.3057 (3.7336) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:40:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][330/625] eta 0:02:01 lr 0.000094 wd 0.0500 time 0.3962 (0.4132) data time 0.0007 (0.0023) model time 0.3955 (0.4130) loss 4.9784 (6.4910) grad_norm 8.8038 (3.7390) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:40:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][340/625] eta 0:01:57 lr 0.000094 wd 0.0500 time 0.3974 (0.4128) data time 0.0007 (0.0022) model time 0.3967 (0.4124) loss 6.8412 (6.4810) grad_norm 2.3939 (3.7241) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:40:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][350/625] eta 0:01:53 lr 0.000094 wd 0.0500 time 0.3947 (0.4126) data time 0.0007 (0.0022) model time 0.3941 (0.4122) loss 6.5772 (6.4854) grad_norm 4.1687 (3.7417) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:40:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][360/625] eta 0:01:49 lr 0.000094 wd 0.0500 time 0.3970 (0.4123) data time 0.0009 (0.0021) model time 0.3961 (0.4118) loss 7.7001 (6.4887) grad_norm 2.7620 (3.7297) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:40:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][370/625] eta 0:01:45 lr 0.000094 wd 0.0500 time 0.5792 (0.4124) data time 0.0008 (0.0021) model time 0.5784 (0.4119) loss 6.9129 (6.4863) grad_norm 145.5878 (4.1270) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:40:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][380/625] eta 0:01:41 lr 0.000094 wd 0.0500 time 
0.5287 (0.4134) data time 0.0006 (0.0021) model time 0.5281 (0.4130) loss 6.2578 (6.4886) grad_norm 5.4743 (4.1036) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:40:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][390/625] eta 0:01:37 lr 0.000094 wd 0.0500 time 0.5755 (0.4145) data time 0.0007 (0.0020) model time 0.5749 (0.4143) loss 6.2679 (6.4931) grad_norm 2.9894 (4.0654) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:40:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][400/625] eta 0:01:33 lr 0.000094 wd 0.0500 time 0.4009 (0.4158) data time 0.0008 (0.0020) model time 0.4000 (0.4158) loss 6.9365 (6.4895) grad_norm 3.4340 (4.0395) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:40:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][410/625] eta 0:01:29 lr 0.000094 wd 0.0500 time 0.3995 (0.4167) data time 0.0009 (0.0020) model time 0.3986 (0.4168) loss 7.1683 (6.4898) grad_norm 3.1021 (4.0230) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:40:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][420/625] eta 0:01:25 lr 0.000094 wd 0.0500 time 0.3975 (0.4181) data time 0.0009 (0.0020) model time 0.3966 (0.4184) loss 6.6926 (6.4878) grad_norm 2.1455 (4.0188) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:40:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][430/625] eta 0:01:21 lr 0.000094 wd 0.0500 time 0.3964 (0.4181) data time 0.0009 (0.0019) model time 0.3955 (0.4184) loss 6.0587 (6.4818) grad_norm 3.5124 (4.0139) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:40:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][440/625] eta 0:01:17 lr 0.000094 wd 0.0500 time 0.5832 (0.4182) data time 0.0008 (0.0019) model time 0.5824 (0.4184) loss 5.7758 (6.4809) grad_norm 3.0864 (3.9924) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 10:40:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][450/625] eta 0:01:13 lr 0.000094 wd 0.0500 time 0.3990 (0.4177) data time 0.0006 (0.0019) model time 0.3984 (0.4179) loss 7.0825 (6.4845) grad_norm 2.1409 (3.9962) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:40:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][460/625] eta 0:01:08 lr 0.000094 wd 0.0500 time 0.3961 (0.4173) data time 0.0009 (0.0019) model time 0.3952 (0.4173) loss 5.9356 (6.4841) grad_norm 2.4319 (4.0291) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:41:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][470/625] eta 0:01:04 lr 0.000094 wd 0.0500 time 0.4004 (0.4169) data time 0.0006 (0.0018) model time 0.3998 (0.4168) loss 5.5486 (6.4865) grad_norm 2.8261 (4.0128) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:41:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][480/625] eta 0:01:00 lr 0.000093 wd 0.0500 time 0.3992 (0.4168) data time 0.0007 (0.0018) model time 0.3985 (0.4167) loss 6.1704 (6.4762) grad_norm 5.2731 (4.0228) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:41:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][490/625] eta 0:00:56 lr 0.000093 wd 0.0500 time 0.3979 (0.4164) data time 0.0008 (0.0018) model time 0.3972 (0.4163) loss 6.1433 (6.4751) grad_norm 3.0059 (4.0053) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:41:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][500/625] eta 0:00:52 lr 0.000093 wd 0.0500 time 0.3966 (0.4161) data time 0.0008 (0.0018) model time 0.3958 (0.4159) loss 7.4324 (6.4787) grad_norm 4.4908 (3.9793) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:41:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][510/625] eta 0:00:47 lr 0.000093 wd 0.0500 time 
0.3985 (0.4157) data time 0.0006 (0.0018) model time 0.3979 (0.4155) loss 5.7999 (6.4770) grad_norm 3.5243 (3.9615) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:41:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][520/625] eta 0:00:43 lr 0.000093 wd 0.0500 time 0.3996 (0.4154) data time 0.0008 (0.0017) model time 0.3988 (0.4151) loss 7.6709 (6.4813) grad_norm 2.8395 (3.9469) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:41:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][530/625] eta 0:00:39 lr 0.000093 wd 0.0500 time 0.3989 (0.4151) data time 0.0008 (0.0017) model time 0.3980 (0.4148) loss 5.5777 (6.4834) grad_norm 2.0156 (3.9337) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:41:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][540/625] eta 0:00:35 lr 0.000093 wd 0.0500 time 0.3976 (0.4148) data time 0.0008 (0.0017) model time 0.3967 (0.4144) loss 7.3289 (6.4762) grad_norm 3.0161 (3.9201) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:41:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][550/625] eta 0:00:31 lr 0.000093 wd 0.0500 time 0.3982 (0.4145) data time 0.0007 (0.0017) model time 0.3976 (0.4141) loss 7.4782 (6.4749) grad_norm 4.9261 (3.9119) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:41:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][560/625] eta 0:00:26 lr 0.000093 wd 0.0500 time 0.4028 (0.4142) data time 0.0009 (0.0017) model time 0.4019 (0.4138) loss 6.8851 (6.4709) grad_norm 3.2348 (3.9063) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:41:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][570/625] eta 0:00:22 lr 0.000093 wd 0.0500 time 0.3971 (0.4139) data time 0.0006 (0.0017) model time 0.3966 (0.4134) loss 6.0654 (6.4720) grad_norm 3.8535 (3.9052) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 10:41:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][580/625] eta 0:00:18 lr 0.000093 wd 0.0500 time 0.3978 (0.4136) data time 0.0006 (0.0017) model time 0.3972 (0.4131) loss 7.4714 (6.4733) grad_norm 3.8396 (3.9206) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:41:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][590/625] eta 0:00:14 lr 0.000093 wd 0.0500 time 0.6178 (0.4140) data time 0.0007 (0.0016) model time 0.6171 (0.4135) loss 6.8524 (6.4772) grad_norm 2.4427 (3.9002) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:41:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][600/625] eta 0:00:10 lr 0.000093 wd 0.0500 time 0.5887 (0.4143) data time 0.0009 (0.0016) model time 0.5878 (0.4138) loss 6.4352 (6.4804) grad_norm 3.4237 (3.8830) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:42:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][610/625] eta 0:00:06 lr 0.000093 wd 0.0500 time 0.5911 (0.4147) data time 0.0004 (0.0016) model time 0.5907 (0.4143) loss 6.2541 (6.4779) grad_norm 3.3388 (3.8855) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:42:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][620/625] eta 0:00:02 lr 0.000093 wd 0.0500 time 0.5508 (0.4158) data time 0.0006 (0.0016) model time 0.5502 (0.4154) loss 7.7354 (6.4831) grad_norm 2.2008 (3.8710) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:42:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 252 training takes 0:04:19 [2024-07-25 10:42:07 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 10:42:07 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 10:42:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.455 (0.455) Loss 0.5332 (0.5332) Acc@1 89.990 (89.990) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 10:42:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.8130 (0.6556) Acc@1 82.861 (87.256) Acc@5 96.826 (97.936) Mem 14939MB [2024-07-25 10:42:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9146 (0.7611) Acc@1 78.564 (84.373) Acc@5 95.801 (97.005) Mem 14939MB [2024-07-25 10:42:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.993 Acc@5 96.985 [2024-07-25 10:42:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-07-25 10:42:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.922 (0.922) Loss 0.5371 (0.5371) Acc@1 90.088 (90.088) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 10:42:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.162) Loss 0.8149 (0.6578) Acc@1 83.203 (87.336) Acc@5 96.924 (97.971) Mem 14939MB [2024-07-25 10:42:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.126) Loss 0.9238 (0.7628) Acc@1 78.369 (84.384) Acc@5 95.508 (96.987) Mem 14939MB [2024-07-25 10:42:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.973 Acc@5 96.943 [2024-07-25 10:42:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-07-25 10:42:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.97% [2024-07-25 10:42:13 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 10:42:14 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 10:42:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][0/625] eta 0:08:12 lr 0.000093 wd 0.0500 time 0.7888 (0.7888) data time 0.4058 (0.4058) model time 0.0000 (0.0000) loss 7.0296 (7.0296) grad_norm 2.8145 (2.8145) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:42:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][10/625] eta 0:05:04 lr 0.000093 wd 0.0500 time 0.4042 (0.4952) data time 0.0008 (0.0376) model time 0.0000 (0.0000) loss 6.7063 (6.7726) grad_norm 4.8590 (2.6633) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:42:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][20/625] eta 0:04:51 lr 0.000093 wd 0.0500 time 0.5791 (0.4817) data time 0.0007 (0.0201) model time 0.0000 (0.0000) loss 5.4453 (6.8545) grad_norm 2.2121 (2.7854) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:42:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][30/625] eta 0:04:32 lr 0.000093 wd 0.0500 time 0.3973 (0.4583) data time 0.0006 (0.0139) model time 0.0000 (0.0000) loss 5.7794 (6.6942) grad_norm 4.1388 (3.3226) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:42:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][40/625] eta 0:04:21 lr 0.000092 wd 0.0500 time 0.3983 (0.4475) data time 0.0008 (0.0107) model time 0.0000 (0.0000) loss 6.6998 (6.5959) grad_norm 3.2530 (3.4804) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:42:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][50/625] eta 0:04:11 lr 0.000092 wd 0.0500 time 0.4082 (0.4382) data time 0.0006 (0.0088) model time 0.0000 (0.0000) loss 6.7182 (6.4789) grad_norm 2.3077 (3.4721) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 10:42:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][60/625] eta 0:04:03 lr 0.000092 wd 0.0500 time 0.3985 (0.4318) data time 0.0010 (0.0075) model time 0.3975 (0.3979) loss 6.4317 (6.5201) grad_norm 1.9571 (3.3102) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:42:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][70/625] eta 0:03:57 lr 0.000092 wd 0.0500 time 0.3994 (0.4273) data time 0.0008 (0.0065) model time 0.3986 (0.3984) loss 5.7915 (6.5045) grad_norm 2.3346 (3.2686) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:42:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][80/625] eta 0:03:51 lr 0.000092 wd 0.0500 time 0.4027 (0.4239) data time 0.0008 (0.0058) model time 0.4019 (0.3987) loss 5.5759 (6.4720) grad_norm 2.8181 (3.2789) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:42:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][90/625] eta 0:03:45 lr 0.000092 wd 0.0500 time 0.3986 (0.4211) data time 0.0008 (0.0053) model time 0.3978 (0.3985) loss 6.9703 (6.4688) grad_norm 3.8903 (3.2737) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:42:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][100/625] eta 0:03:39 lr 0.000092 wd 0.0500 time 0.3980 (0.4189) data time 0.0007 (0.0048) model time 0.3973 (0.3984) loss 5.8606 (6.4571) grad_norm 1.9191 (3.2661) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:43:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][110/625] eta 0:03:34 lr 0.000092 wd 0.0500 time 0.3970 (0.4170) data time 0.0007 (0.0045) model time 0.3964 (0.3981) loss 6.2110 (6.4695) grad_norm 2.8453 (3.2002) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:43:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][120/625] eta 0:03:29 lr 0.000092 wd 0.0500 time 
0.3994 (0.4157) data time 0.0007 (0.0042) model time 0.3987 (0.3984) loss 5.7380 (6.4593) grad_norm 1.8829 (3.1867) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:43:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][130/625] eta 0:03:25 lr 0.000092 wd 0.0500 time 0.3958 (0.4144) data time 0.0008 (0.0039) model time 0.3950 (0.3984) loss 7.5458 (6.4806) grad_norm 3.6520 (3.1566) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:43:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][140/625] eta 0:03:20 lr 0.000092 wd 0.0500 time 0.4021 (0.4134) data time 0.0007 (0.0037) model time 0.4014 (0.3985) loss 6.0297 (6.4790) grad_norm 3.6698 (3.1369) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:43:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][150/625] eta 0:03:15 lr 0.000092 wd 0.0500 time 0.3976 (0.4124) data time 0.0007 (0.0035) model time 0.3969 (0.3984) loss 5.9283 (6.4857) grad_norm 3.1459 (3.1275) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:43:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][160/625] eta 0:03:11 lr 0.000092 wd 0.0500 time 0.3965 (0.4115) data time 0.0009 (0.0033) model time 0.3956 (0.3983) loss 7.7124 (6.4841) grad_norm 2.6330 (3.4747) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:43:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][170/625] eta 0:03:06 lr 0.000092 wd 0.0500 time 0.4031 (0.4108) data time 0.0006 (0.0032) model time 0.4025 (0.3983) loss 6.8296 (6.4696) grad_norm 2.6430 (3.4549) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:43:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][180/625] eta 0:03:02 lr 0.000092 wd 0.0500 time 0.3940 (0.4102) data time 0.0006 (0.0031) model time 0.3934 (0.3983) loss 5.6363 (6.4557) grad_norm 2.3771 (3.4265) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 10:43:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][190/625] eta 0:02:59 lr 0.000092 wd 0.0500 time 0.3971 (0.4124) data time 0.0008 (0.0029) model time 0.3963 (0.4021) loss 6.7060 (6.4516) grad_norm 2.3333 (3.3917) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:43:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][200/625] eta 0:02:55 lr 0.000092 wd 0.0500 time 0.4020 (0.4127) data time 0.0007 (0.0028) model time 0.4014 (0.4032) loss 5.8888 (6.4491) grad_norm 2.2545 (3.3646) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:43:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][210/625] eta 0:02:52 lr 0.000092 wd 0.0500 time 0.5837 (0.4159) data time 0.0009 (0.0028) model time 0.5828 (0.4079) loss 7.1497 (6.4749) grad_norm 1.7091 (3.3541) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:43:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][220/625] eta 0:02:49 lr 0.000092 wd 0.0500 time 0.3996 (0.4182) data time 0.0006 (0.0027) model time 0.3990 (0.4113) loss 6.3185 (6.4681) grad_norm 3.7117 (3.3959) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:43:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][230/625] eta 0:02:46 lr 0.000091 wd 0.0500 time 0.3939 (0.4204) data time 0.0008 (0.0026) model time 0.3930 (0.4146) loss 6.2056 (6.4631) grad_norm 3.4369 (3.3856) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:43:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][240/625] eta 0:02:42 lr 0.000091 wd 0.0500 time 0.3928 (0.4209) data time 0.0007 (0.0025) model time 0.3921 (0.4154) loss 6.5190 (6.4604) grad_norm 6.1382 (3.3903) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:43:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][250/625] eta 0:02:37 lr 0.000091 wd 0.0500 time 
0.3934 (0.4207) data time 0.0007 (0.0024) model time 0.3927 (0.4154) loss 6.3009 (6.4632) grad_norm 2.6788 (3.3934) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:44:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][260/625] eta 0:02:33 lr 0.000091 wd 0.0500 time 0.3947 (0.4205) data time 0.0006 (0.0024) model time 0.3940 (0.4154) loss 7.6427 (6.4649) grad_norm 2.7993 (3.3693) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:44:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][270/625] eta 0:02:28 lr 0.000091 wd 0.0500 time 0.3943 (0.4197) data time 0.0008 (0.0023) model time 0.3935 (0.4146) loss 5.8978 (6.4611) grad_norm 9.4549 (3.4011) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:44:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][280/625] eta 0:02:24 lr 0.000091 wd 0.0500 time 0.4027 (0.4189) data time 0.0009 (0.0023) model time 0.4019 (0.4138) loss 5.4580 (6.4611) grad_norm 2.1750 (3.3911) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:44:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][290/625] eta 0:02:20 lr 0.000091 wd 0.0500 time 0.3958 (0.4183) data time 0.0006 (0.0022) model time 0.3952 (0.4132) loss 6.2983 (6.4634) grad_norm 3.7894 (3.3875) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:44:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][300/625] eta 0:02:15 lr 0.000091 wd 0.0500 time 0.3973 (0.4176) data time 0.0008 (0.0022) model time 0.3965 (0.4125) loss 5.9662 (6.4635) grad_norm 2.5885 (3.3860) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:44:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][310/625] eta 0:02:11 lr 0.000091 wd 0.0500 time 0.3986 (0.4170) data time 0.0006 (0.0021) model time 0.3980 (0.4120) loss 7.0966 (6.4729) grad_norm 2.2881 (3.3606) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 10:44:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][320/625] eta 0:02:07 lr 0.000091 wd 0.0500 time 0.3996 (0.4165) data time 0.0008 (0.0021) model time 0.3988 (0.4115) loss 6.2957 (6.4690) grad_norm 4.0678 (3.3381) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:44:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][330/625] eta 0:02:02 lr 0.000091 wd 0.0500 time 0.4018 (0.4160) data time 0.0010 (0.0021) model time 0.4007 (0.4111) loss 6.2245 (6.4683) grad_norm 2.6493 (3.3285) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:44:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][340/625] eta 0:01:58 lr 0.000091 wd 0.0500 time 0.4014 (0.4155) data time 0.0008 (0.0020) model time 0.4006 (0.4107) loss 6.8394 (6.4733) grad_norm 3.5954 (3.3087) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:44:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][350/625] eta 0:01:54 lr 0.000091 wd 0.0500 time 0.3997 (0.4151) data time 0.0006 (0.0020) model time 0.3991 (0.4103) loss 7.0217 (6.4710) grad_norm 1.9363 (3.2974) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:44:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][360/625] eta 0:01:49 lr 0.000091 wd 0.0500 time 0.3968 (0.4147) data time 0.0006 (0.0020) model time 0.3962 (0.4100) loss 6.2314 (6.4661) grad_norm 2.1837 (3.2821) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:44:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][370/625] eta 0:01:45 lr 0.000091 wd 0.0500 time 0.3991 (0.4143) data time 0.0007 (0.0019) model time 0.3984 (0.4097) loss 7.3249 (6.4674) grad_norm 3.9525 (3.2755) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:44:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][380/625] eta 0:01:41 lr 0.000091 wd 0.0500 time 
0.4056 (0.4140) data time 0.0007 (0.0019) model time 0.4050 (0.4094) loss 7.3182 (6.4609) grad_norm 2.2848 (3.2850) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:44:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][390/625] eta 0:01:37 lr 0.000091 wd 0.0500 time 0.3987 (0.4136) data time 0.0008 (0.0019) model time 0.3979 (0.4091) loss 5.4805 (6.4566) grad_norm 2.7406 (3.2740) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:45:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][400/625] eta 0:01:33 lr 0.000091 wd 0.0500 time 0.5025 (0.4135) data time 0.0006 (0.0018) model time 0.5018 (0.4091) loss 7.1174 (6.4605) grad_norm 4.7486 (3.2880) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:45:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][410/625] eta 0:01:28 lr 0.000091 wd 0.0500 time 0.3989 (0.4139) data time 0.0007 (0.0018) model time 0.3982 (0.4096) loss 5.1153 (6.4592) grad_norm 2.4263 (3.2768) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:45:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][420/625] eta 0:01:25 lr 0.000090 wd 0.0500 time 0.5735 (0.4147) data time 0.0006 (0.0018) model time 0.5729 (0.4106) loss 6.1777 (6.4573) grad_norm 2.1822 (3.3061) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:45:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][430/625] eta 0:01:21 lr 0.000090 wd 0.0500 time 0.5635 (0.4161) data time 0.0007 (0.0018) model time 0.5628 (0.4123) loss 6.0613 (6.4581) grad_norm 2.6646 (3.2957) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:45:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][440/625] eta 0:01:17 lr 0.000090 wd 0.0500 time 0.5816 (0.4173) data time 0.0008 (0.0018) model time 0.5808 (0.4137) loss 6.7992 (6.4591) grad_norm 2.4729 (3.2861) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 10:45:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][450/625] eta 0:01:13 lr 0.000090 wd 0.0500 time 0.3960 (0.4180) data time 0.0007 (0.0017) model time 0.3954 (0.4145) loss 6.7605 (6.4584) grad_norm 2.3161 (3.2753) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:45:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][460/625] eta 0:01:09 lr 0.000090 wd 0.0500 time 0.3956 (0.4185) data time 0.0006 (0.0017) model time 0.3949 (0.4152) loss 6.1672 (6.4623) grad_norm 3.4315 (3.2663) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:45:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][470/625] eta 0:01:04 lr 0.000090 wd 0.0500 time 0.4035 (0.4189) data time 0.0006 (0.0017) model time 0.4029 (0.4157) loss 6.6180 (6.4628) grad_norm 3.9827 (3.2695) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:45:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][480/625] eta 0:01:00 lr 0.000090 wd 0.0500 time 0.3978 (0.4187) data time 0.0008 (0.0017) model time 0.3969 (0.4156) loss 7.2238 (6.4695) grad_norm 5.0280 (3.2667) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:45:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][490/625] eta 0:00:56 lr 0.000090 wd 0.0500 time 0.3964 (0.4183) data time 0.0006 (0.0017) model time 0.3958 (0.4152) loss 6.1398 (6.4737) grad_norm 4.4046 (3.2599) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:45:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][500/625] eta 0:00:52 lr 0.000090 wd 0.0500 time 0.4104 (0.4180) data time 0.0006 (0.0016) model time 0.4098 (0.4148) loss 6.1626 (6.4709) grad_norm 4.1544 (3.2728) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:45:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][510/625] eta 0:00:48 lr 0.000090 wd 0.0500 time 
0.3968 (0.4176) data time 0.0007 (0.0016) model time 0.3960 (0.4145) loss 5.4962 (6.4695) grad_norm 3.1416 (3.2729) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:45:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][520/625] eta 0:00:43 lr 0.000090 wd 0.0500 time 0.3981 (0.4173) data time 0.0006 (0.0016) model time 0.3975 (0.4142) loss 7.6668 (6.4698) grad_norm 1.9716 (3.2772) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:45:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][530/625] eta 0:00:39 lr 0.000090 wd 0.0500 time 0.3981 (0.4170) data time 0.0007 (0.0016) model time 0.3974 (0.4139) loss 5.6970 (6.4775) grad_norm 3.2668 (3.2673) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:45:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][540/625] eta 0:00:35 lr 0.000090 wd 0.0500 time 0.4007 (0.4167) data time 0.0008 (0.0016) model time 0.3999 (0.4136) loss 5.8724 (6.4764) grad_norm 3.5743 (3.2789) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:46:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][550/625] eta 0:00:31 lr 0.000090 wd 0.0500 time 0.4017 (0.4164) data time 0.0008 (0.0016) model time 0.4009 (0.4133) loss 5.9148 (6.4811) grad_norm 2.5761 (3.2717) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:46:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][560/625] eta 0:00:27 lr 0.000090 wd 0.0500 time 0.4030 (0.4161) data time 0.0006 (0.0016) model time 0.4024 (0.4131) loss 6.6991 (6.4806) grad_norm 3.7353 (3.2767) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:46:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][570/625] eta 0:00:22 lr 0.000090 wd 0.0500 time 0.3987 (0.4158) data time 0.0006 (0.0015) model time 0.3981 (0.4128) loss 6.8148 (6.4792) grad_norm 2.4741 (3.2768) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 10:46:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][580/625] eta 0:00:18 lr 0.000090 wd 0.0500 time 0.3981 (0.4155) data time 0.0007 (0.0015) model time 0.3974 (0.4125) loss 6.9033 (6.4833) grad_norm 4.8522 (3.2887) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:46:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][590/625] eta 0:00:14 lr 0.000090 wd 0.0500 time 0.3993 (0.4153) data time 0.0008 (0.0015) model time 0.3985 (0.4123) loss 5.9743 (6.4814) grad_norm 7.7908 (3.3041) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:46:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][600/625] eta 0:00:10 lr 0.000090 wd 0.0500 time 0.3980 (0.4151) data time 0.0009 (0.0015) model time 0.3971 (0.4121) loss 7.2526 (6.4762) grad_norm 3.0787 (3.3825) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:46:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][610/625] eta 0:00:06 lr 0.000089 wd 0.0500 time 0.3988 (0.4148) data time 0.0006 (0.0015) model time 0.3982 (0.4118) loss 6.1396 (6.4760) grad_norm 4.9928 (3.3801) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:46:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][620/625] eta 0:00:02 lr 0.000089 wd 0.0500 time 0.5387 (0.4147) data time 0.0006 (0.0015) model time 0.5381 (0.4118) loss 5.4578 (6.4709) grad_norm 2.2086 (3.3724) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:46:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 253 training takes 0:04:19 [2024-07-25 10:46:33 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 10:46:34 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 10:46:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.453 (0.453) Loss 0.5444 (0.5444) Acc@1 90.039 (90.039) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 10:46:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8184 (0.6602) Acc@1 83.203 (87.451) Acc@5 97.119 (98.025) Mem 14939MB [2024-07-25 10:46:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9194 (0.7676) Acc@1 79.297 (84.489) Acc@5 95.752 (97.031) Mem 14939MB [2024-07-25 10:46:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.111 Acc@5 97.003 [2024-07-25 10:46:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-07-25 10:46:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 84.11% [2024-07-25 10:46:37 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 10:46:38 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 10:46:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.449 (0.449) Loss 0.5371 (0.5371) Acc@1 90.088 (90.088) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 10:46:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8145 (0.6575) Acc@1 83.154 (87.336) Acc@5 96.973 (97.976) Mem 14939MB [2024-07-25 10:46:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 0.9224 (0.7624) Acc@1 78.369 (84.405) Acc@5 95.557 (96.989) Mem 14939MB [2024-07-25 10:46:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.991 Acc@5 96.945 [2024-07-25 10:46:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-07-25 10:46:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.99% [2024-07-25 10:46:40 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 10:46:41 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 10:46:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][0/625] eta 0:08:22 lr 0.000089 wd 0.0500 time 0.8041 (0.8041) data time 0.4234 (0.4234) model time 0.0000 (0.0000) loss 6.5051 (6.5051) grad_norm 1.9026 (1.9026) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:46:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][10/625] eta 0:04:45 lr 0.000089 wd 0.0500 time 0.3970 (0.4649) data time 0.0006 (0.0393) model time 0.0000 (0.0000) loss 5.7237 (6.6539) grad_norm 2.5297 (2.4004) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:46:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][20/625] eta 0:04:43 lr 0.000089 wd 0.0500 time 0.5861 (0.4693) data time 0.0007 (0.0210) model time 0.0000 (0.0000) loss 6.5888 (6.5523) grad_norm 4.2657 (2.8180) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:46:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][30/625] eta 0:04:38 lr 0.000089 wd 0.0500 time 0.5782 (0.4683) data time 0.0006 (0.0145) model time 0.0000 (0.0000) loss 6.4699 (6.4785) grad_norm 2.3222 (2.9733) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:47:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][40/625] eta 0:04:31 lr 0.000089 wd 0.0500 time 0.5968 (0.4646) data time 0.0007 (0.0112) model time 0.0000 (0.0000) loss 5.8094 (6.4469) grad_norm 7.0063 (3.0752) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:47:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][50/625] eta 0:04:23 lr 0.000089 wd 0.0500 time 0.5640 (0.4581) data time 0.0009 (0.0092) model time 0.0000 (0.0000) loss 5.0881 (6.3790) grad_norm 3.0570 (3.1328) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:47:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][60/625] eta 0:04:16 lr 0.000089 wd 0.0500 time 0.3951 (0.4533) 
data time 0.0009 (0.0078) model time 0.3942 (0.4283) loss 7.6318 (6.4413) grad_norm 2.7515 (3.3420) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:47:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][70/625] eta 0:04:08 lr 0.000089 wd 0.0500 time 0.4005 (0.4482) data time 0.0008 (0.0068) model time 0.3997 (0.4221) loss 7.8275 (6.4611) grad_norm 2.7157 (3.7825) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:47:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][80/625] eta 0:04:01 lr 0.000089 wd 0.0500 time 0.3964 (0.4436) data time 0.0008 (0.0061) model time 0.3955 (0.4182) loss 6.6953 (6.4638) grad_norm 2.9933 (3.7144) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:47:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][90/625] eta 0:03:54 lr 0.000089 wd 0.0500 time 0.4000 (0.4387) data time 0.0007 (0.0055) model time 0.3993 (0.4131) loss 6.6490 (6.4657) grad_norm 2.0349 (3.5940) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:47:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][100/625] eta 0:03:48 lr 0.000089 wd 0.0500 time 0.3976 (0.4346) data time 0.0007 (0.0051) model time 0.3970 (0.4099) loss 5.4387 (6.4228) grad_norm 15.8326 (3.6884) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:47:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][110/625] eta 0:03:42 lr 0.000089 wd 0.0500 time 0.3959 (0.4316) data time 0.0007 (0.0047) model time 0.3952 (0.4083) loss 7.8739 (6.4119) grad_norm 19.1289 (3.7526) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:47:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][120/625] eta 0:03:36 lr 0.000089 wd 0.0500 time 0.3971 (0.4288) data time 0.0009 (0.0044) model time 0.3962 (0.4066) loss 6.0655 (6.3975) grad_norm 4.1615 (3.8563) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
10:47:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][130/625] eta 0:03:31 lr 0.000089 wd 0.0500 time 0.3968 (0.4265) data time 0.0009 (0.0041) model time 0.3960 (0.4056) loss 6.8583 (6.4070) grad_norm 2.4416 (3.7671) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:47:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][140/625] eta 0:03:25 lr 0.000089 wd 0.0500 time 0.3948 (0.4245) data time 0.0009 (0.0039) model time 0.3939 (0.4046) loss 5.9734 (6.4042) grad_norm 4.2665 (3.7778) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:47:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][150/625] eta 0:03:20 lr 0.000089 wd 0.0500 time 0.3965 (0.4228) data time 0.0008 (0.0037) model time 0.3957 (0.4039) loss 7.0233 (6.4105) grad_norm 2.7750 (3.7401) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:47:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][160/625] eta 0:03:15 lr 0.000089 wd 0.0500 time 0.3991 (0.4213) data time 0.0009 (0.0035) model time 0.3982 (0.4034) loss 6.8393 (6.4235) grad_norm 2.9385 (3.7026) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:47:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][170/625] eta 0:03:11 lr 0.000088 wd 0.0500 time 0.3779 (0.4206) data time 0.0008 (0.0033) model time 0.3771 (0.4038) loss 7.3831 (6.4387) grad_norm 2.5647 (3.6429) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:47:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][180/625] eta 0:03:06 lr 0.000088 wd 0.0500 time 0.3928 (0.4194) data time 0.0007 (0.0032) model time 0.3921 (0.4033) loss 7.0763 (6.4414) grad_norm 1.7638 (3.6437) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:48:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][190/625] eta 0:03:01 lr 0.000088 wd 0.0500 time 0.4130 (0.4184) data 
time 0.0007 (0.0031) model time 0.4124 (0.4031) loss 6.4383 (6.4332) grad_norm 1.9822 (3.6017) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:48:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][200/625] eta 0:02:57 lr 0.000088 wd 0.0500 time 0.3980 (0.4173) data time 0.0007 (0.0030) model time 0.3973 (0.4026) loss 6.9461 (6.4435) grad_norm 3.0338 (3.6825) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:48:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][210/625] eta 0:02:52 lr 0.000088 wd 0.0500 time 0.3941 (0.4165) data time 0.0006 (0.0029) model time 0.3934 (0.4023) loss 6.0554 (6.4434) grad_norm 2.3463 (3.7573) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:48:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][220/625] eta 0:02:48 lr 0.000088 wd 0.0500 time 0.4035 (0.4170) data time 0.0009 (0.0028) model time 0.4027 (0.4038) loss 6.3320 (6.4405) grad_norm 2.6376 (3.7386) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:48:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][230/625] eta 0:02:44 lr 0.000088 wd 0.0500 time 0.3969 (0.4171) data time 0.0007 (0.0027) model time 0.3962 (0.4047) loss 6.2353 (6.4401) grad_norm 2.4096 (3.6765) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:48:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][240/625] eta 0:02:41 lr 0.000088 wd 0.0500 time 0.3941 (0.4199) data time 0.0006 (0.0026) model time 0.3935 (0.4088) loss 7.7326 (6.4592) grad_norm 2.5094 (3.6273) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:48:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][250/625] eta 0:02:38 lr 0.000088 wd 0.0500 time 0.3979 (0.4232) data time 0.0009 (0.0026) model time 0.3970 (0.4134) loss 6.5025 (6.4607) grad_norm 12.3802 (3.6899) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
10:48:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][260/625] eta 0:02:35 lr 0.000088 wd 0.0500 time 0.3970 (0.4250) data time 0.0008 (0.0025) model time 0.3962 (0.4161) loss 7.2697 (6.4732) grad_norm 3.5673 (3.6613) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:48:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][270/625] eta 0:02:31 lr 0.000088 wd 0.0500 time 0.3992 (0.4263) data time 0.0008 (0.0024) model time 0.3984 (0.4181) loss 6.5216 (6.4705) grad_norm 4.6630 (3.6724) loss_scale 256.0000 (129.4170) mem 14939MB [2024-07-25 10:48:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][280/625] eta 0:02:27 lr 0.000088 wd 0.0500 time 0.3970 (0.4265) data time 0.0006 (0.0024) model time 0.3964 (0.4186) loss 5.8736 (6.4753) grad_norm 4.7351 (3.6484) loss_scale 256.0000 (133.9217) mem 14939MB [2024-07-25 10:48:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][290/625] eta 0:02:22 lr 0.000088 wd 0.0500 time 0.4000 (0.4262) data time 0.0006 (0.0023) model time 0.3993 (0.4186) loss 6.7895 (6.4659) grad_norm 3.0175 (3.6987) loss_scale 256.0000 (138.1168) mem 14939MB [2024-07-25 10:48:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][300/625] eta 0:02:18 lr 0.000088 wd 0.0500 time 0.3980 (0.4259) data time 0.0008 (0.0023) model time 0.3972 (0.4184) loss 6.7560 (6.4637) grad_norm 2.6736 (3.7031) loss_scale 256.0000 (142.0332) mem 14939MB [2024-07-25 10:48:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][310/625] eta 0:02:13 lr 0.000088 wd 0.0500 time 0.3995 (0.4250) data time 0.0006 (0.0022) model time 0.3989 (0.4176) loss 6.9964 (6.4694) grad_norm 2.5668 (3.7433) loss_scale 256.0000 (145.6977) mem 14939MB [2024-07-25 10:48:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][320/625] eta 0:02:09 lr 0.000088 wd 0.0500 time 0.3991 (0.4242) data 
time 0.0006 (0.0022) model time 0.3985 (0.4169) loss 5.9466 (6.4582) grad_norm 11.1645 (3.7629) loss_scale 256.0000 (149.1340) mem 14939MB [2024-07-25 10:49:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][330/625] eta 0:02:04 lr 0.000088 wd 0.0500 time 0.4018 (0.4234) data time 0.0006 (0.0022) model time 0.4012 (0.4162) loss 6.1494 (6.4538) grad_norm 2.2525 (3.7273) loss_scale 256.0000 (152.3625) mem 14939MB [2024-07-25 10:49:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][340/625] eta 0:02:00 lr 0.000088 wd 0.0500 time 0.3997 (0.4227) data time 0.0008 (0.0021) model time 0.3988 (0.4156) loss 7.0250 (6.4559) grad_norm 2.5976 (3.7881) loss_scale 256.0000 (155.4018) mem 14939MB [2024-07-25 10:49:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][350/625] eta 0:01:56 lr 0.000088 wd 0.0500 time 0.3987 (0.4220) data time 0.0008 (0.0021) model time 0.3978 (0.4150) loss 7.6542 (6.4681) grad_norm 3.8814 (3.7608) loss_scale 256.0000 (158.2678) mem 14939MB [2024-07-25 10:49:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][360/625] eta 0:01:51 lr 0.000087 wd 0.0500 time 0.3989 (0.4214) data time 0.0007 (0.0021) model time 0.3982 (0.4145) loss 6.3551 (6.4697) grad_norm 3.5838 (3.7568) loss_scale 256.0000 (160.9751) mem 14939MB [2024-07-25 10:49:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][370/625] eta 0:01:47 lr 0.000087 wd 0.0500 time 0.4089 (0.4208) data time 0.0009 (0.0020) model time 0.4080 (0.4140) loss 6.6808 (6.4627) grad_norm 2.6417 (3.7594) loss_scale 256.0000 (163.5364) mem 14939MB [2024-07-25 10:49:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][380/625] eta 0:01:42 lr 0.000087 wd 0.0500 time 0.3989 (0.4202) data time 0.0006 (0.0020) model time 0.3983 (0.4135) loss 6.5954 (6.4683) grad_norm 3.1874 (3.7450) loss_scale 256.0000 (165.9633) mem 14939MB [2024-07-25 
10:49:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][390/625] eta 0:01:38 lr 0.000087 wd 0.0500 time 0.5965 (0.4202) data time 0.0006 (0.0020) model time 0.5959 (0.4137) loss 7.6886 (6.4728) grad_norm 3.6677 (3.7265) loss_scale 256.0000 (168.2660) mem 14939MB [2024-07-25 10:49:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][400/625] eta 0:01:34 lr 0.000087 wd 0.0500 time 0.3998 (0.4197) data time 0.0009 (0.0019) model time 0.3989 (0.4132) loss 6.5236 (6.4750) grad_norm 1.9653 (3.7121) loss_scale 256.0000 (170.4539) mem 14939MB [2024-07-25 10:49:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][410/625] eta 0:01:30 lr 0.000087 wd 0.0500 time 0.3978 (0.4192) data time 0.0006 (0.0019) model time 0.3971 (0.4128) loss 5.4304 (6.4721) grad_norm 2.0764 (3.6822) loss_scale 256.0000 (172.5353) mem 14939MB [2024-07-25 10:49:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][420/625] eta 0:01:25 lr 0.000087 wd 0.0500 time 0.3970 (0.4187) data time 0.0008 (0.0019) model time 0.3962 (0.4124) loss 6.1154 (6.4712) grad_norm 2.5549 (3.6737) loss_scale 256.0000 (174.5178) mem 14939MB [2024-07-25 10:49:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][430/625] eta 0:01:21 lr 0.000087 wd 0.0500 time 0.4017 (0.4183) data time 0.0007 (0.0019) model time 0.4010 (0.4120) loss 4.7704 (6.4717) grad_norm 2.4350 (3.6604) loss_scale 256.0000 (176.4084) mem 14939MB [2024-07-25 10:49:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][440/625] eta 0:01:17 lr 0.000087 wd 0.0500 time 0.4015 (0.4182) data time 0.0009 (0.0018) model time 0.4006 (0.4122) loss 7.1575 (6.4872) grad_norm 2.3373 (3.6596) loss_scale 256.0000 (178.2132) mem 14939MB [2024-07-25 10:49:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][450/625] eta 0:01:13 lr 0.000087 wd 0.0500 time 0.3959 (0.4186) data 
time 0.0007 (0.0018) model time 0.3953 (0.4127) loss 6.7086 (6.4844) grad_norm 2.8947 (3.6618) loss_scale 256.0000 (179.9379) mem 14939MB [2024-07-25 10:49:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][460/625] eta 0:01:09 lr 0.000087 wd 0.0500 time 0.5763 (0.4198) data time 0.0008 (0.0018) model time 0.5755 (0.4141) loss 6.8453 (6.4944) grad_norm 3.1300 (3.6439) loss_scale 256.0000 (181.5879) mem 14939MB [2024-07-25 10:49:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][470/625] eta 0:01:05 lr 0.000087 wd 0.0500 time 0.3994 (0.4211) data time 0.0006 (0.0018) model time 0.3988 (0.4157) loss 6.3866 (6.4945) grad_norm 2.4580 (3.6543) loss_scale 256.0000 (183.1677) mem 14939MB [2024-07-25 10:50:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][480/625] eta 0:01:01 lr 0.000087 wd 0.0500 time 0.3984 (0.4219) data time 0.0009 (0.0018) model time 0.3976 (0.4167) loss 5.3774 (6.4943) grad_norm 2.6269 (3.6581) loss_scale 256.0000 (184.6819) mem 14939MB [2024-07-25 10:50:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][490/625] eta 0:00:57 lr 0.000087 wd 0.0500 time 0.4003 (0.4228) data time 0.0008 (0.0017) model time 0.3995 (0.4179) loss 5.8348 (6.4938) grad_norm 2.7028 (3.6512) loss_scale 256.0000 (186.1344) mem 14939MB [2024-07-25 10:50:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][500/625] eta 0:00:52 lr 0.000087 wd 0.0500 time 0.5689 (0.4229) data time 0.0006 (0.0017) model time 0.5683 (0.4181) loss 7.4248 (6.4960) grad_norm 2.6237 (3.6392) loss_scale 256.0000 (187.5289) mem 14939MB [2024-07-25 10:50:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][510/625] eta 0:00:48 lr 0.000087 wd 0.0500 time 0.3964 (0.4227) data time 0.0006 (0.0017) model time 0.3957 (0.4179) loss 6.5635 (6.4958) grad_norm 2.4099 (3.6283) loss_scale 256.0000 (188.8689) mem 14939MB [2024-07-25 
10:50:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][520/625] eta 0:00:44 lr 0.000087 wd 0.0500 time 0.4131 (0.4225) data time 0.0007 (0.0017) model time 0.4124 (0.4178) loss 6.2969 (6.4990) grad_norm 1.9174 (3.6164) loss_scale 256.0000 (190.1574) mem 14939MB [2024-07-25 10:50:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][530/625] eta 0:00:40 lr 0.000087 wd 0.0500 time 0.3970 (0.4221) data time 0.0006 (0.0017) model time 0.3964 (0.4174) loss 6.0602 (6.4981) grad_norm 3.0652 (3.5967) loss_scale 256.0000 (191.3974) mem 14939MB [2024-07-25 10:50:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][540/625] eta 0:00:35 lr 0.000087 wd 0.0500 time 0.4132 (0.4217) data time 0.0009 (0.0017) model time 0.4123 (0.4171) loss 5.8251 (6.4903) grad_norm 3.2649 (3.5879) loss_scale 256.0000 (192.5915) mem 14939MB [2024-07-25 10:50:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][550/625] eta 0:00:31 lr 0.000087 wd 0.0500 time 0.3967 (0.4214) data time 0.0007 (0.0016) model time 0.3961 (0.4167) loss 6.5762 (6.4890) grad_norm 2.7853 (3.5810) loss_scale 256.0000 (193.7423) mem 14939MB [2024-07-25 10:50:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][560/625] eta 0:00:27 lr 0.000086 wd 0.0500 time 0.3987 (0.4210) data time 0.0009 (0.0016) model time 0.3978 (0.4164) loss 7.1014 (6.4942) grad_norm 2.7292 (3.5694) loss_scale 256.0000 (194.8520) mem 14939MB [2024-07-25 10:50:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][570/625] eta 0:00:23 lr 0.000086 wd 0.0500 time 0.3971 (0.4206) data time 0.0006 (0.0016) model time 0.3964 (0.4160) loss 5.9761 (6.4934) grad_norm 2.9220 (3.5606) loss_scale 256.0000 (195.9229) mem 14939MB [2024-07-25 10:50:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][580/625] eta 0:00:18 lr 0.000086 wd 0.0500 time 0.4018 (0.4202) data 
time 0.0006 (0.0016) model time 0.4012 (0.4157) loss 6.0762 (6.4940) grad_norm 3.2848 (3.5519) loss_scale 256.0000 (196.9570) mem 14939MB [2024-07-25 10:50:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][590/625] eta 0:00:14 lr 0.000086 wd 0.0500 time 0.3982 (0.4198) data time 0.0006 (0.0016) model time 0.3975 (0.4154) loss 6.3266 (6.4971) grad_norm 4.0449 (3.5405) loss_scale 256.0000 (197.9560) mem 14939MB [2024-07-25 10:50:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][600/625] eta 0:00:10 lr 0.000086 wd 0.0500 time 0.4002 (0.4195) data time 0.0007 (0.0016) model time 0.3995 (0.4150) loss 6.9618 (6.4968) grad_norm 3.7422 (3.5500) loss_scale 256.0000 (198.9218) mem 14939MB [2024-07-25 10:50:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][610/625] eta 0:00:06 lr 0.000086 wd 0.0500 time 0.3973 (0.4195) data time 0.0006 (0.0016) model time 0.3968 (0.4151) loss 5.4962 (6.4927) grad_norm 2.2142 (3.5416) loss_scale 256.0000 (199.8560) mem 14939MB [2024-07-25 10:51:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][620/625] eta 0:00:02 lr 0.000086 wd 0.0500 time 0.3956 (0.4191) data time 0.0004 (0.0015) model time 0.3951 (0.4148) loss 7.0015 (6.4934) grad_norm 42.0204 (3.6231) loss_scale 256.0000 (200.7601) mem 14939MB [2024-07-25 10:51:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 254 training takes 0:04:21 [2024-07-25 10:51:03 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 10:51:05 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 10:51:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.435 (0.435) Loss 0.5410 (0.5410) Acc@1 90.283 (90.283) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 10:51:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.8120 (0.6550) Acc@1 83.008 (87.451) Acc@5 96.973 (98.002) Mem 14939MB [2024-07-25 10:51:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9111 (0.7619) Acc@1 79.541 (84.473) Acc@5 95.752 (97.052) Mem 14939MB [2024-07-25 10:51:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.073 Acc@5 97.009 [2024-07-25 10:51:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-07-25 10:51:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.784 (0.784) Loss 0.5376 (0.5376) Acc@1 90.137 (90.137) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 10:51:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.153) Loss 0.8140 (0.6573) Acc@1 83.203 (87.367) Acc@5 96.973 (97.989) Mem 14939MB [2024-07-25 10:51:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.121) Loss 0.9219 (0.7621) Acc@1 78.467 (84.426) Acc@5 95.508 (96.996) Mem 14939MB [2024-07-25 10:51:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.013 Acc@5 96.955 [2024-07-25 10:51:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-07-25 10:51:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.01% [2024-07-25 10:51:10 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 10:51:11 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 10:51:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][0/625] eta 0:07:40 lr 0.000086 wd 0.0500 time 0.7362 (0.7362) data time 0.3570 (0.3570) model time 0.0000 (0.0000) loss 5.5393 (5.5393) grad_norm 4.2399 (4.2399) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 10:51:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][10/625] eta 0:04:24 lr 0.000086 wd 0.0500 time 0.3982 (0.4295) data time 0.0006 (0.0332) model time 0.0000 (0.0000) loss 5.8661 (6.3075) grad_norm 3.7886 (3.4133) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 10:51:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][20/625] eta 0:04:11 lr 0.000086 wd 0.0500 time 0.3969 (0.4149) data time 0.0007 (0.0178) model time 0.0000 (0.0000) loss 5.7757 (6.2862) grad_norm 8.9071 (3.8483) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 10:51:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][30/625] eta 0:04:07 lr 0.000086 wd 0.0500 time 0.3998 (0.4159) data time 0.0008 (0.0123) model time 0.0000 (0.0000) loss 6.0890 (6.3526) grad_norm 41.0502 (4.9804) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 10:51:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][40/625] eta 0:04:04 lr 0.000086 wd 0.0500 time 0.4041 (0.4174) data time 0.0007 (0.0095) model time 0.0000 (0.0000) loss 6.6733 (6.4370) grad_norm 3.3047 (4.4684) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 10:51:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][50/625] eta 0:04:04 lr 0.000086 wd 0.0500 time 0.5610 (0.4251) data time 0.0008 (0.0078) model time 0.0000 (0.0000) loss 5.9022 (6.5041) grad_norm 20.5957 (4.5256) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 10:51:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][60/625] eta 0:04:02 lr 0.000086 wd 0.0500 time 0.4033 (0.4289) data time 0.0009 (0.0067) model time 0.4024 (0.4470) loss 7.5422 (6.4981) grad_norm 2.3897 (5.0248) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 10:51:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][70/625] eta 0:04:02 lr 0.000086 wd 0.0500 time 0.5808 (0.4374) data time 0.0007 (0.0059) model time 0.5801 (0.4679) loss 6.2254 (6.4867) grad_norm 2.4767 (4.6826) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 10:51:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][80/625] eta 0:03:58 lr 0.000086 wd 0.0500 time 0.5905 (0.4384) data time 0.0009 (0.0053) model time 0.5896 (0.4601) loss 6.6373 (6.5107) grad_norm 2.4798 (4.5957) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 10:51:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][90/625] eta 0:03:56 lr 0.000086 wd 0.0500 time 0.5547 (0.4412) data time 0.0008 (0.0048) model time 0.5538 (0.4608) loss 5.7892 (6.5102) grad_norm 3.9971 (4.3854) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 10:51:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][100/625] eta 0:03:50 lr 0.000086 wd 0.0500 time 0.3955 (0.4398) data time 0.0009 (0.0044) model time 0.3946 (0.4540) loss 6.9825 (6.5093) grad_norm 3.1533 (4.2663) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 10:52:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][110/625] eta 0:03:44 lr 0.000086 wd 0.0500 time 0.4972 (0.4368) data time 0.0006 (0.0041) model time 0.4966 (0.4458) loss 5.8817 (6.5124) grad_norm 2.9335 (4.1496) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 10:52:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][120/625] eta 0:03:38 lr 0.000085 wd 0.0500 time 
0.3968 (0.4334) data time 0.0009 (0.0038) model time 0.3959 (0.4386) loss 7.4535 (6.5313) grad_norm 2.5558 (4.0637) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 10:52:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][130/625] eta 0:03:33 lr 0.000085 wd 0.0500 time 0.3952 (0.4306) data time 0.0008 (0.0036) model time 0.3944 (0.4332) loss 5.5982 (6.5110) grad_norm 4.0611 (4.0363) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 10:52:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][140/625] eta 0:03:27 lr 0.000085 wd 0.0500 time 0.4133 (0.4283) data time 0.0009 (0.0034) model time 0.4124 (0.4293) loss 5.8476 (6.5083) grad_norm 2.5688 (3.9637) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 10:52:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][150/625] eta 0:03:22 lr 0.000085 wd 0.0500 time 0.4011 (0.4273) data time 0.0007 (0.0032) model time 0.4005 (0.4275) loss 6.5206 (6.4934) grad_norm 2.5191 (3.9075) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 10:52:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][160/625] eta 0:03:17 lr 0.000085 wd 0.0500 time 0.3989 (0.4255) data time 0.0009 (0.0031) model time 0.3980 (0.4248) loss 7.4048 (6.4855) grad_norm 3.7046 (3.9608) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 10:52:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][170/625] eta 0:03:12 lr 0.000085 wd 0.0500 time 0.3982 (0.4239) data time 0.0008 (0.0029) model time 0.3973 (0.4225) loss 7.2268 (6.4818) grad_norm 3.2518 (3.9512) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 10:52:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][180/625] eta 0:03:08 lr 0.000085 wd 0.0500 time 0.3964 (0.4226) data time 0.0007 (0.0028) model time 0.3957 (0.4207) loss 6.5240 (6.4845) grad_norm 3.8257 (3.9126) loss_scale 256.0000 (256.0000) mem 
14939MB [2024-07-25 10:52:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][190/625] eta 0:03:03 lr 0.000085 wd 0.0500 time 0.3976 (0.4214) data time 0.0009 (0.0027) model time 0.3967 (0.4191) loss 5.6834 (6.4901) grad_norm 12.6021 (3.9198) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 10:52:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][200/625] eta 0:02:58 lr 0.000085 wd 0.0500 time 0.4028 (0.4203) data time 0.0010 (0.0026) model time 0.4018 (0.4178) loss 6.0175 (6.4805) grad_norm 2.4698 (3.8903) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 10:52:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][210/625] eta 0:02:53 lr 0.000085 wd 0.0500 time 0.3974 (0.4192) data time 0.0010 (0.0026) model time 0.3964 (0.4165) loss 7.0341 (6.4865) grad_norm 2.1594 (inf) loss_scale 128.0000 (253.5735) mem 14939MB [2024-07-25 10:52:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][220/625] eta 0:02:49 lr 0.000085 wd 0.0500 time 0.3986 (0.4183) data time 0.0008 (0.0025) model time 0.3978 (0.4153) loss 7.4971 (6.4862) grad_norm 3.1119 (inf) loss_scale 128.0000 (247.8914) mem 14939MB [2024-07-25 10:52:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][230/625] eta 0:02:44 lr 0.000085 wd 0.0500 time 0.3956 (0.4175) data time 0.0007 (0.0024) model time 0.3949 (0.4144) loss 6.9215 (6.4931) grad_norm 30.2651 (inf) loss_scale 128.0000 (242.7013) mem 14939MB [2024-07-25 10:52:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][240/625] eta 0:02:40 lr 0.000085 wd 0.0500 time 0.3968 (0.4167) data time 0.0007 (0.0023) model time 0.3962 (0.4136) loss 6.5349 (6.4871) grad_norm 2.5219 (inf) loss_scale 128.0000 (237.9419) mem 14939MB [2024-07-25 10:52:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][250/625] eta 0:02:36 lr 0.000085 wd 0.0500 time 0.3992 
(0.4166) data time 0.0007 (0.0023) model time 0.3985 (0.4135) loss 6.4117 (6.4956) grad_norm 4.4636 (inf) loss_scale 128.0000 (233.5618) mem 14939MB [2024-07-25 10:53:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][260/625] eta 0:02:32 lr 0.000085 wd 0.0500 time 0.4039 (0.4168) data time 0.0009 (0.0022) model time 0.4030 (0.4139) loss 6.3490 (6.4939) grad_norm 3.1095 (inf) loss_scale 128.0000 (229.5172) mem 14939MB [2024-07-25 10:53:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][270/625] eta 0:02:28 lr 0.000085 wd 0.0500 time 0.5978 (0.4181) data time 0.0006 (0.0022) model time 0.5972 (0.4156) loss 5.8048 (6.4953) grad_norm 2.3316 (inf) loss_scale 128.0000 (225.7712) mem 14939MB [2024-07-25 10:53:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][280/625] eta 0:02:24 lr 0.000085 wd 0.0500 time 0.5843 (0.4201) data time 0.0009 (0.0021) model time 0.5834 (0.4181) loss 7.1981 (6.5030) grad_norm 4.9780 (inf) loss_scale 128.0000 (222.2918) mem 14939MB [2024-07-25 10:53:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][290/625] eta 0:02:20 lr 0.000085 wd 0.0500 time 0.4007 (0.4209) data time 0.0007 (0.0021) model time 0.4000 (0.4191) loss 6.8547 (6.5013) grad_norm 2.0214 (inf) loss_scale 128.0000 (219.0515) mem 14939MB [2024-07-25 10:53:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][300/625] eta 0:02:17 lr 0.000085 wd 0.0500 time 0.5857 (0.4224) data time 0.0006 (0.0020) model time 0.5851 (0.4210) loss 6.6479 (6.5059) grad_norm 4.4111 (inf) loss_scale 128.0000 (216.0266) mem 14939MB [2024-07-25 10:53:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][310/625] eta 0:02:13 lr 0.000085 wd 0.0500 time 0.5812 (0.4232) data time 0.0009 (0.0020) model time 0.5803 (0.4220) loss 7.0894 (6.5031) grad_norm 3.1550 (inf) loss_scale 128.0000 (213.1961) mem 14939MB [2024-07-25 10:53:27 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][320/625] eta 0:02:09 lr 0.000084 wd 0.0500 time 0.5857 (0.4237) data time 0.0009 (0.0020) model time 0.5848 (0.4226) loss 7.4420 (6.5127) grad_norm 4.0645 (inf) loss_scale 128.0000 (210.5421) mem 14939MB [2024-07-25 10:53:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][330/625] eta 0:02:04 lr 0.000084 wd 0.0500 time 0.3982 (0.4230) data time 0.0006 (0.0019) model time 0.3975 (0.4217) loss 5.8154 (6.5096) grad_norm 2.5269 (inf) loss_scale 128.0000 (208.0483) mem 14939MB [2024-07-25 10:53:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][340/625] eta 0:02:00 lr 0.000084 wd 0.0500 time 0.3989 (0.4226) data time 0.0008 (0.0019) model time 0.3981 (0.4213) loss 7.5540 (6.5137) grad_norm 2.8716 (inf) loss_scale 128.0000 (205.7009) mem 14939MB [2024-07-25 10:53:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][350/625] eta 0:01:56 lr 0.000084 wd 0.0500 time 0.3976 (0.4220) data time 0.0008 (0.0019) model time 0.3968 (0.4206) loss 7.2596 (6.5162) grad_norm 4.0841 (inf) loss_scale 128.0000 (203.4872) mem 14939MB [2024-07-25 10:53:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][360/625] eta 0:01:51 lr 0.000084 wd 0.0500 time 0.3972 (0.4213) data time 0.0007 (0.0018) model time 0.3965 (0.4198) loss 6.5687 (6.5260) grad_norm 2.2290 (inf) loss_scale 128.0000 (201.3961) mem 14939MB [2024-07-25 10:53:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][370/625] eta 0:01:47 lr 0.000084 wd 0.0500 time 0.3936 (0.4211) data time 0.0009 (0.0018) model time 0.3927 (0.4196) loss 6.6522 (6.5305) grad_norm 4.0988 (inf) loss_scale 128.0000 (199.4178) mem 14939MB [2024-07-25 10:53:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][380/625] eta 0:01:43 lr 0.000084 wd 0.0500 time 0.3970 (0.4205) data time 0.0007 (0.0018) model 
time 0.3963 (0.4190) loss 7.1046 (6.5335) grad_norm 2.2910 (inf) loss_scale 128.0000 (197.5433) mem 14939MB [2024-07-25 10:53:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][390/625] eta 0:01:38 lr 0.000084 wd 0.0500 time 0.3964 (0.4199) data time 0.0006 (0.0018) model time 0.3958 (0.4183) loss 6.1568 (6.5335) grad_norm 3.8898 (inf) loss_scale 128.0000 (195.7647) mem 14939MB [2024-07-25 10:54:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][400/625] eta 0:01:34 lr 0.000084 wd 0.0500 time 0.3965 (0.4194) data time 0.0009 (0.0017) model time 0.3956 (0.4177) loss 7.0658 (6.5341) grad_norm 8.0149 (inf) loss_scale 128.0000 (194.0748) mem 14939MB [2024-07-25 10:54:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][410/625] eta 0:01:30 lr 0.000084 wd 0.0500 time 0.4025 (0.4189) data time 0.0008 (0.0017) model time 0.4017 (0.4172) loss 7.2372 (6.5268) grad_norm 2.1997 (inf) loss_scale 128.0000 (192.4672) mem 14939MB [2024-07-25 10:54:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][420/625] eta 0:01:25 lr 0.000084 wd 0.0500 time 0.3954 (0.4184) data time 0.0008 (0.0017) model time 0.3946 (0.4166) loss 6.9226 (6.5228) grad_norm 3.6960 (inf) loss_scale 128.0000 (190.9359) mem 14939MB [2024-07-25 10:54:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][430/625] eta 0:01:21 lr 0.000084 wd 0.0500 time 0.3951 (0.4179) data time 0.0009 (0.0017) model time 0.3943 (0.4161) loss 5.9949 (6.5182) grad_norm 2.2859 (inf) loss_scale 128.0000 (189.4756) mem 14939MB [2024-07-25 10:54:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][440/625] eta 0:01:17 lr 0.000084 wd 0.0500 time 0.3970 (0.4175) data time 0.0007 (0.0017) model time 0.3963 (0.4156) loss 5.5189 (6.5189) grad_norm 2.5204 (inf) loss_scale 128.0000 (188.0816) mem 14939MB [2024-07-25 10:54:19 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [255/300][450/625] eta 0:01:12 lr 0.000084 wd 0.0500 time 0.3992 (0.4170) data time 0.0007 (0.0016) model time 0.3985 (0.4152) loss 6.4353 (6.5168) grad_norm 3.1078 (inf) loss_scale 128.0000 (186.7494) mem 14939MB [2024-07-25 10:54:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][460/625] eta 0:01:08 lr 0.000084 wd 0.0500 time 0.3961 (0.4166) data time 0.0009 (0.0016) model time 0.3952 (0.4147) loss 5.6645 (6.5168) grad_norm 2.2611 (inf) loss_scale 128.0000 (185.4751) mem 14939MB [2024-07-25 10:54:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][470/625] eta 0:01:04 lr 0.000084 wd 0.0500 time 0.3955 (0.4166) data time 0.0007 (0.0016) model time 0.3948 (0.4147) loss 6.9162 (6.5133) grad_norm 2.3052 (inf) loss_scale 128.0000 (184.2548) mem 14939MB [2024-07-25 10:54:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][480/625] eta 0:01:00 lr 0.000084 wd 0.0500 time 0.4019 (0.4166) data time 0.0008 (0.0016) model time 0.4011 (0.4147) loss 6.8222 (6.5157) grad_norm 2.5119 (inf) loss_scale 128.0000 (183.0852) mem 14939MB [2024-07-25 10:54:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][490/625] eta 0:00:56 lr 0.000084 wd 0.0500 time 0.3988 (0.4173) data time 0.0007 (0.0016) model time 0.3981 (0.4156) loss 6.3051 (6.5075) grad_norm 3.6885 (inf) loss_scale 128.0000 (181.9633) mem 14939MB [2024-07-25 10:54:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][500/625] eta 0:00:52 lr 0.000084 wd 0.0500 time 0.5731 (0.4180) data time 0.0009 (0.0016) model time 0.5722 (0.4163) loss 6.3669 (6.5078) grad_norm 3.1730 (inf) loss_scale 128.0000 (180.8862) mem 14939MB [2024-07-25 10:54:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][510/625] eta 0:00:48 lr 0.000084 wd 0.0500 time 0.5798 (0.4197) data time 0.0008 (0.0015) model time 0.5790 (0.4183) loss 
5.6973 (6.5071) grad_norm 4.9346 (inf) loss_scale 128.0000 (179.8513) mem 14939MB [2024-07-25 10:54:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][520/625] eta 0:00:44 lr 0.000083 wd 0.0500 time 0.5913 (0.4206) data time 0.0008 (0.0015) model time 0.5905 (0.4193) loss 7.2619 (6.5092) grad_norm 4.3537 (inf) loss_scale 128.0000 (178.8560) mem 14939MB [2024-07-25 10:54:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][530/625] eta 0:00:40 lr 0.000083 wd 0.0500 time 0.4026 (0.4213) data time 0.0006 (0.0015) model time 0.4020 (0.4200) loss 6.7043 (6.5033) grad_norm 2.3764 (inf) loss_scale 128.0000 (177.8983) mem 14939MB [2024-07-25 10:55:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][540/625] eta 0:00:35 lr 0.000083 wd 0.0500 time 0.3963 (0.4218) data time 0.0008 (0.0015) model time 0.3955 (0.4206) loss 7.8403 (6.5092) grad_norm 1.7727 (inf) loss_scale 128.0000 (176.9760) mem 14939MB [2024-07-25 10:55:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][550/625] eta 0:00:31 lr 0.000083 wd 0.0500 time 0.3982 (0.4214) data time 0.0007 (0.0015) model time 0.3975 (0.4201) loss 5.5774 (6.5100) grad_norm 3.2279 (inf) loss_scale 128.0000 (176.0871) mem 14939MB [2024-07-25 10:55:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][560/625] eta 0:00:27 lr 0.000083 wd 0.0500 time 0.4001 (0.4212) data time 0.0008 (0.0015) model time 0.3993 (0.4199) loss 5.5775 (6.5111) grad_norm 2.9958 (inf) loss_scale 128.0000 (175.2299) mem 14939MB [2024-07-25 10:55:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][570/625] eta 0:00:23 lr 0.000083 wd 0.0500 time 0.3966 (0.4208) data time 0.0007 (0.0015) model time 0.3959 (0.4195) loss 5.5149 (6.5074) grad_norm 3.8333 (inf) loss_scale 128.0000 (174.4028) mem 14939MB [2024-07-25 10:55:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[255/300][580/625] eta 0:00:18 lr 0.000083 wd 0.0500 time 0.3966 (0.4204) data time 0.0008 (0.0015) model time 0.3958 (0.4191) loss 6.7722 (6.5123) grad_norm 2.5049 (inf) loss_scale 128.0000 (173.6041) mem 14939MB [2024-07-25 10:55:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][590/625] eta 0:00:14 lr 0.000083 wd 0.0500 time 0.3979 (0.4204) data time 0.0008 (0.0015) model time 0.3971 (0.4191) loss 6.9786 (6.5174) grad_norm 11.0166 (inf) loss_scale 128.0000 (172.8325) mem 14939MB [2024-07-25 10:55:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][600/625] eta 0:00:10 lr 0.000083 wd 0.0500 time 0.3994 (0.4200) data time 0.0007 (0.0014) model time 0.3987 (0.4187) loss 6.6700 (6.5156) grad_norm 4.1541 (inf) loss_scale 128.0000 (172.0865) mem 14939MB [2024-07-25 10:55:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][610/625] eta 0:00:06 lr 0.000083 wd 0.0500 time 0.3986 (0.4197) data time 0.0005 (0.0014) model time 0.3981 (0.4183) loss 5.9473 (6.5131) grad_norm 4.0223 (inf) loss_scale 128.0000 (171.3650) mem 14939MB [2024-07-25 10:55:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][620/625] eta 0:00:02 lr 0.000083 wd 0.0500 time 0.3970 (0.4193) data time 0.0004 (0.0014) model time 0.3966 (0.4180) loss 6.5126 (6.5137) grad_norm 5.0619 (inf) loss_scale 128.0000 (170.6667) mem 14939MB [2024-07-25 10:55:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 255 training takes 0:04:22 [2024-07-25 10:55:33 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 10:55:34 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 10:55:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.451 (0.451) Loss 0.5322 (0.5322) Acc@1 90.332 (90.332) Acc@5 99.072 (99.072) Mem 14939MB [2024-07-25 10:55:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.8076 (0.6543) Acc@1 83.203 (87.442) Acc@5 97.021 (98.002) Mem 14939MB [2024-07-25 10:55:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9170 (0.7596) Acc@1 78.467 (84.438) Acc@5 95.850 (97.035) Mem 14939MB [2024-07-25 10:55:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.029 Acc@5 96.991 [2024-07-25 10:55:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-07-25 10:55:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.914 (0.914) Loss 0.5371 (0.5371) Acc@1 90.137 (90.137) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 10:55:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.162) Loss 0.8130 (0.6568) Acc@1 83.203 (87.380) Acc@5 96.924 (97.971) Mem 14939MB [2024-07-25 10:55:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.125) Loss 0.9214 (0.7618) Acc@1 78.467 (84.442) Acc@5 95.459 (96.984) Mem 14939MB [2024-07-25 10:55:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.021 Acc@5 96.943 [2024-07-25 10:55:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-07-25 10:55:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.02% [2024-07-25 10:55:40 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 10:55:41 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 10:55:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][0/625] eta 0:08:15 lr 0.000083 wd 0.0500 time 0.7934 (0.7934) data time 0.4167 (0.4167) model time 0.0000 (0.0000) loss 5.5534 (5.5534) grad_norm 4.6965 (4.6965) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:55:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][10/625] eta 0:04:26 lr 0.000083 wd 0.0500 time 0.3975 (0.4332) data time 0.0008 (0.0387) model time 0.0000 (0.0000) loss 5.6976 (6.2149) grad_norm 3.1581 (3.4006) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:55:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][20/625] eta 0:04:12 lr 0.000083 wd 0.0500 time 0.4052 (0.4172) data time 0.0009 (0.0207) model time 0.0000 (0.0000) loss 6.2625 (6.3850) grad_norm 4.7999 (3.7123) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:55:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][30/625] eta 0:04:04 lr 0.000083 wd 0.0500 time 0.3951 (0.4111) data time 0.0008 (0.0143) model time 0.0000 (0.0000) loss 6.4650 (6.4557) grad_norm 2.9649 (3.5788) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:55:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][40/625] eta 0:03:58 lr 0.000083 wd 0.0500 time 0.3960 (0.4080) data time 0.0011 (0.0110) model time 0.0000 (0.0000) loss 6.5354 (6.4524) grad_norm 3.0089 (3.4739) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:56:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][50/625] eta 0:03:53 lr 0.000083 wd 0.0500 time 0.3851 (0.4061) data time 0.0009 (0.0090) model time 0.0000 (0.0000) loss 6.2329 (6.4719) grad_norm 3.2723 (3.4050) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 10:56:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][60/625] eta 0:03:48 lr 0.000083 wd 0.0500 time 0.4023 (0.4048) data time 0.0007 (0.0077) model time 0.4016 (0.3974) loss 5.5187 (6.4180) grad_norm 3.2889 (3.6073) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:56:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][70/625] eta 0:03:45 lr 0.000083 wd 0.0500 time 0.3977 (0.4063) data time 0.0007 (0.0067) model time 0.3970 (0.4060) loss 5.9477 (6.4298) grad_norm 2.3493 (3.6191) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:56:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][80/625] eta 0:03:43 lr 0.000083 wd 0.0500 time 0.3955 (0.4095) data time 0.0011 (0.0060) model time 0.3944 (0.4144) loss 7.0453 (6.4564) grad_norm 2.3353 (3.7158) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:56:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][90/625] eta 0:03:40 lr 0.000082 wd 0.0500 time 0.4002 (0.4118) data time 0.0008 (0.0054) model time 0.3994 (0.4183) loss 6.7590 (6.4888) grad_norm 2.1946 (3.6082) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:56:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][100/625] eta 0:03:40 lr 0.000082 wd 0.0500 time 0.3983 (0.4200) data time 0.0009 (0.0050) model time 0.3974 (0.4333) loss 7.2583 (6.5149) grad_norm 3.2353 (3.5313) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:56:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][110/625] eta 0:03:39 lr 0.000082 wd 0.0500 time 0.5193 (0.4259) data time 0.0007 (0.0046) model time 0.5186 (0.4419) loss 6.5673 (6.5251) grad_norm 3.4481 (3.5128) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:56:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][120/625] eta 0:03:36 lr 0.000082 wd 0.0500 time 
0.4007 (0.4279) data time 0.0007 (0.0043) model time 0.4000 (0.4429) loss 7.0086 (6.5424) grad_norm 2.3486 (3.4789) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:56:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][130/625] eta 0:03:32 lr 0.000082 wd 0.0500 time 0.3987 (0.4294) data time 0.0008 (0.0040) model time 0.3979 (0.4434) loss 7.0528 (6.5202) grad_norm 4.0957 (3.4502) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:56:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][140/625] eta 0:03:28 lr 0.000082 wd 0.0500 time 0.3993 (0.4296) data time 0.0009 (0.0038) model time 0.3985 (0.4421) loss 7.8730 (6.5357) grad_norm 2.3202 (3.4204) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:56:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][150/625] eta 0:03:23 lr 0.000082 wd 0.0500 time 0.5794 (0.4287) data time 0.0009 (0.0036) model time 0.5785 (0.4394) loss 5.9299 (6.5300) grad_norm 2.1058 (3.3944) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:56:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][160/625] eta 0:03:18 lr 0.000082 wd 0.0500 time 0.3956 (0.4268) data time 0.0008 (0.0035) model time 0.3948 (0.4355) loss 5.7970 (6.5212) grad_norm 5.8378 (3.3984) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:56:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][170/625] eta 0:03:13 lr 0.000082 wd 0.0500 time 0.4000 (0.4253) data time 0.0008 (0.0033) model time 0.3992 (0.4326) loss 7.7645 (6.5171) grad_norm 2.6267 (3.3623) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:56:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][180/625] eta 0:03:08 lr 0.000082 wd 0.0500 time 0.4025 (0.4239) data time 0.0008 (0.0032) model time 0.4017 (0.4300) loss 6.5854 (6.5097) grad_norm 6.1148 (3.3609) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 10:57:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][190/625] eta 0:03:03 lr 0.000082 wd 0.0500 time 0.4057 (0.4227) data time 0.0008 (0.0030) model time 0.4049 (0.4278) loss 6.5845 (6.5132) grad_norm 3.1632 (3.3811) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:57:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][200/625] eta 0:02:59 lr 0.000082 wd 0.0500 time 0.3984 (0.4215) data time 0.0009 (0.0029) model time 0.3975 (0.4259) loss 6.0472 (6.5225) grad_norm 4.6862 (3.3675) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:57:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][210/625] eta 0:02:54 lr 0.000082 wd 0.0500 time 0.3979 (0.4204) data time 0.0006 (0.0028) model time 0.3973 (0.4241) loss 6.3103 (6.5219) grad_norm 2.8455 (3.3815) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:57:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][220/625] eta 0:02:49 lr 0.000082 wd 0.0500 time 0.4118 (0.4195) data time 0.0010 (0.0027) model time 0.4108 (0.4227) loss 6.2819 (6.5214) grad_norm 3.8225 (3.3520) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:57:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][230/625] eta 0:02:45 lr 0.000082 wd 0.0500 time 0.3956 (0.4186) data time 0.0007 (0.0027) model time 0.3949 (0.4213) loss 6.5632 (6.5294) grad_norm 6.7960 (3.3357) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:57:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][240/625] eta 0:02:40 lr 0.000082 wd 0.0500 time 0.3945 (0.4178) data time 0.0012 (0.0026) model time 0.3933 (0.4201) loss 6.3508 (6.5140) grad_norm 2.6600 (3.3120) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:57:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][250/625] eta 0:02:36 lr 0.000082 wd 0.0500 time 
0.4016 (0.4171) data time 0.0008 (0.0025) model time 0.4007 (0.4191) loss 6.6793 (6.5251) grad_norm 3.3617 (3.3010) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:57:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][260/625] eta 0:02:32 lr 0.000082 wd 0.0500 time 0.4032 (0.4165) data time 0.0006 (0.0025) model time 0.4026 (0.4181) loss 6.2640 (6.5139) grad_norm 3.1459 (3.3459) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:57:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][270/625] eta 0:02:27 lr 0.000082 wd 0.0500 time 0.4026 (0.4158) data time 0.0009 (0.0024) model time 0.4016 (0.4172) loss 7.1895 (6.5156) grad_norm 4.2452 (3.3469) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:57:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][280/625] eta 0:02:23 lr 0.000082 wd 0.0500 time 0.5570 (0.4158) data time 0.0009 (0.0023) model time 0.5561 (0.4171) loss 5.9754 (6.5211) grad_norm 2.9052 (3.3459) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:57:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][290/625] eta 0:02:19 lr 0.000081 wd 0.0500 time 0.3949 (0.4151) data time 0.0009 (0.0023) model time 0.3940 (0.4162) loss 7.5501 (6.5237) grad_norm 1.9754 (3.3302) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:57:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][300/625] eta 0:02:15 lr 0.000081 wd 0.0500 time 0.3946 (0.4160) data time 0.0006 (0.0022) model time 0.3940 (0.4171) loss 7.0890 (6.5178) grad_norm 5.8455 (3.3163) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:57:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][310/625] eta 0:02:11 lr 0.000081 wd 0.0500 time 0.3983 (0.4165) data time 0.0009 (0.0022) model time 0.3974 (0.4176) loss 6.8133 (6.5225) grad_norm 2.4762 (3.3061) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 10:57:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][320/625] eta 0:02:07 lr 0.000081 wd 0.0500 time 0.5457 (0.4187) data time 0.0008 (0.0022) model time 0.5448 (0.4202) loss 6.8051 (6.5351) grad_norm 2.7228 (3.3443) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:58:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][330/625] eta 0:02:03 lr 0.000081 wd 0.0500 time 0.4001 (0.4199) data time 0.0007 (0.0021) model time 0.3994 (0.4216) loss 6.0900 (6.5246) grad_norm 2.1153 (3.3226) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:58:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][340/625] eta 0:02:00 lr 0.000081 wd 0.0500 time 0.4018 (0.4211) data time 0.0008 (0.0021) model time 0.4010 (0.4229) loss 6.0614 (6.5289) grad_norm 2.7204 (3.3156) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:58:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][350/625] eta 0:01:55 lr 0.000081 wd 0.0500 time 0.4017 (0.4218) data time 0.0010 (0.0021) model time 0.4007 (0.4236) loss 5.9874 (6.5240) grad_norm 2.7694 (3.3255) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:58:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][360/625] eta 0:01:51 lr 0.000081 wd 0.0500 time 0.3948 (0.4221) data time 0.0009 (0.0020) model time 0.3940 (0.4239) loss 5.3127 (6.5249) grad_norm 2.7718 (3.3714) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:58:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][370/625] eta 0:01:47 lr 0.000081 wd 0.0500 time 0.3969 (0.4218) data time 0.0006 (0.0020) model time 0.3963 (0.4235) loss 5.7058 (6.5266) grad_norm 3.1835 (3.3796) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:58:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][380/625] eta 0:01:43 lr 0.000081 wd 0.0500 time 
0.3998 (0.4213) data time 0.0006 (0.0020) model time 0.3992 (0.4227) loss 7.0165 (6.5256) grad_norm 5.0402 (3.3880) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:58:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][390/625] eta 0:01:38 lr 0.000081 wd 0.0500 time 0.3971 (0.4207) data time 0.0010 (0.0019) model time 0.3961 (0.4220) loss 6.7646 (6.5209) grad_norm 2.4890 (3.3758) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:58:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][400/625] eta 0:01:34 lr 0.000081 wd 0.0500 time 0.3973 (0.4201) data time 0.0009 (0.0019) model time 0.3964 (0.4213) loss 6.3439 (6.5138) grad_norm 8.8309 (3.3705) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:58:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][410/625] eta 0:01:30 lr 0.000081 wd 0.0500 time 0.3965 (0.4196) data time 0.0007 (0.0019) model time 0.3958 (0.4206) loss 5.4830 (6.5116) grad_norm 1.9735 (3.3664) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 10:58:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][420/625] eta 0:01:25 lr 0.000081 wd 0.0500 time 0.3942 (0.4190) data time 0.0008 (0.0018) model time 0.3934 (0.4200) loss 7.2919 (6.5160) grad_norm 3.2853 (inf) loss_scale 64.0000 (127.3919) mem 14939MB [2024-07-25 10:58:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][430/625] eta 0:01:21 lr 0.000081 wd 0.0500 time 0.3989 (0.4186) data time 0.0008 (0.0018) model time 0.3981 (0.4194) loss 6.6243 (6.5187) grad_norm 3.1023 (inf) loss_scale 64.0000 (125.9211) mem 14939MB [2024-07-25 10:58:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][440/625] eta 0:01:17 lr 0.000081 wd 0.0500 time 0.4039 (0.4181) data time 0.0009 (0.0018) model time 0.4030 (0.4188) loss 5.2883 (6.5183) grad_norm 2.5692 (inf) loss_scale 64.0000 (124.5170) mem 14939MB 
[2024-07-25 10:58:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][450/625] eta 0:01:13 lr 0.000081 wd 0.0500 time 0.3995 (0.4177) data time 0.0007 (0.0018) model time 0.3988 (0.4183) loss 5.7144 (6.5211) grad_norm 3.2258 (inf) loss_scale 64.0000 (123.1752) mem 14939MB [2024-07-25 10:58:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][460/625] eta 0:01:08 lr 0.000081 wd 0.0500 time 0.3995 (0.4174) data time 0.0009 (0.0018) model time 0.3987 (0.4179) loss 7.2154 (6.5267) grad_norm 2.8102 (inf) loss_scale 64.0000 (121.8915) mem 14939MB [2024-07-25 10:58:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][470/625] eta 0:01:04 lr 0.000081 wd 0.0500 time 0.3975 (0.4169) data time 0.0009 (0.0017) model time 0.3966 (0.4174) loss 5.7991 (6.5259) grad_norm 4.2181 (inf) loss_scale 64.0000 (120.6624) mem 14939MB [2024-07-25 10:59:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][480/625] eta 0:01:00 lr 0.000081 wd 0.0500 time 0.3960 (0.4165) data time 0.0007 (0.0017) model time 0.3953 (0.4169) loss 6.7671 (6.5199) grad_norm 2.7330 (inf) loss_scale 64.0000 (119.4844) mem 14939MB [2024-07-25 10:59:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][490/625] eta 0:00:56 lr 0.000080 wd 0.0500 time 0.4044 (0.4162) data time 0.0007 (0.0017) model time 0.4038 (0.4165) loss 6.1749 (6.5167) grad_norm 3.2836 (inf) loss_scale 64.0000 (118.3544) mem 14939MB [2024-07-25 10:59:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][500/625] eta 0:00:51 lr 0.000080 wd 0.0500 time 0.3966 (0.4158) data time 0.0007 (0.0017) model time 0.3959 (0.4161) loss 6.8020 (6.5096) grad_norm 2.2267 (inf) loss_scale 64.0000 (117.2695) mem 14939MB [2024-07-25 10:59:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][510/625] eta 0:00:47 lr 0.000080 wd 0.0500 time 0.3957 (0.4158) data time 0.0007 
(0.0017) model time 0.3950 (0.4160) loss 5.8729 (6.5026) grad_norm 2.2557 (inf) loss_scale 64.0000 (116.2270) mem 14939MB [2024-07-25 10:59:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][520/625] eta 0:00:43 lr 0.000080 wd 0.0500 time 0.4003 (0.4162) data time 0.0009 (0.0017) model time 0.3995 (0.4165) loss 6.6760 (6.5065) grad_norm 4.9106 (inf) loss_scale 64.0000 (115.2246) mem 14939MB [2024-07-25 10:59:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][530/625] eta 0:00:39 lr 0.000080 wd 0.0500 time 0.3986 (0.4166) data time 0.0007 (0.0016) model time 0.3979 (0.4169) loss 6.6472 (6.5012) grad_norm 2.1758 (inf) loss_scale 32.0000 (113.7778) mem 14939MB [2024-07-25 10:59:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][540/625] eta 0:00:35 lr 0.000080 wd 0.0500 time 0.5986 (0.4176) data time 0.0007 (0.0016) model time 0.5979 (0.4180) loss 6.1243 (6.5029) grad_norm 3.0555 (inf) loss_scale 32.0000 (112.2662) mem 14939MB [2024-07-25 10:59:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][550/625] eta 0:00:31 lr 0.000080 wd 0.0500 time 0.5932 (0.4187) data time 0.0009 (0.0016) model time 0.5923 (0.4191) loss 7.3528 (6.4992) grad_norm 1.7498 (inf) loss_scale 32.0000 (110.8094) mem 14939MB [2024-07-25 10:59:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][560/625] eta 0:00:27 lr 0.000080 wd 0.0500 time 0.5841 (0.4197) data time 0.0009 (0.0016) model time 0.5832 (0.4202) loss 7.4214 (6.5040) grad_norm 3.8466 (inf) loss_scale 32.0000 (109.4046) mem 14939MB [2024-07-25 10:59:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][570/625] eta 0:00:23 lr 0.000080 wd 0.0500 time 0.3976 (0.4202) data time 0.0009 (0.0016) model time 0.3967 (0.4208) loss 6.7555 (6.5069) grad_norm 3.0227 (inf) loss_scale 32.0000 (108.0490) mem 14939MB [2024-07-25 10:59:45 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [256/300][580/625] eta 0:00:18 lr 0.000080 wd 0.0500 time 0.3968 (0.4203) data time 0.0009 (0.0016) model time 0.3959 (0.4208) loss 7.3081 (6.5050) grad_norm 3.6687 (inf) loss_scale 32.0000 (106.7401) mem 14939MB [2024-07-25 10:59:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][590/625] eta 0:00:14 lr 0.000080 wd 0.0500 time 0.4012 (0.4203) data time 0.0008 (0.0016) model time 0.4004 (0.4207) loss 6.8876 (6.5070) grad_norm 2.1537 (inf) loss_scale 32.0000 (105.4755) mem 14939MB [2024-07-25 10:59:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][600/625] eta 0:00:10 lr 0.000080 wd 0.0500 time 0.3977 (0.4199) data time 0.0011 (0.0016) model time 0.3966 (0.4204) loss 5.8172 (6.5077) grad_norm 2.3688 (inf) loss_scale 32.0000 (104.2529) mem 14939MB [2024-07-25 10:59:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][610/625] eta 0:00:06 lr 0.000080 wd 0.0500 time 0.3989 (0.4196) data time 0.0004 (0.0016) model time 0.3986 (0.4200) loss 7.0786 (6.5044) grad_norm 4.1718 (inf) loss_scale 32.0000 (103.0704) mem 14939MB [2024-07-25 11:00:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][620/625] eta 0:00:02 lr 0.000080 wd 0.0500 time 0.3985 (0.4193) data time 0.0005 (0.0015) model time 0.3981 (0.4196) loss 6.8245 (6.5085) grad_norm 3.3774 (inf) loss_scale 32.0000 (101.9259) mem 14939MB [2024-07-25 11:00:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 256 training takes 0:04:21 [2024-07-25 11:00:03 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 11:00:04 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 11:00:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.459 (0.459) Loss 0.5430 (0.5430) Acc@1 90.625 (90.625) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 11:00:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8169 (0.6593) Acc@1 82.666 (87.451) Acc@5 96.826 (97.954) Mem 14939MB [2024-07-25 11:00:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9185 (0.7642) Acc@1 78.516 (84.401) Acc@5 95.801 (96.980) Mem 14939MB [2024-07-25 11:00:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.087 Acc@5 96.945 [2024-07-25 11:00:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-07-25 11:00:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.777 (0.777) Loss 0.5371 (0.5371) Acc@1 90.186 (90.186) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 11:00:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.153) Loss 0.8135 (0.6568) Acc@1 83.105 (87.385) Acc@5 96.973 (97.980) Mem 14939MB [2024-07-25 11:00:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.121) Loss 0.9204 (0.7615) Acc@1 78.369 (84.442) Acc@5 95.459 (96.987) Mem 14939MB [2024-07-25 11:00:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.019 Acc@5 96.943 [2024-07-25 11:00:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-07-25 11:00:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][0/625] eta 0:15:37 lr 0.000080 wd 0.0500 time 1.5003 (1.5003) data time 0.4194 (0.4194) model time 0.0000 (0.0000) loss 6.1932 (6.1932) grad_norm 3.7210 (3.7210) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:00:14 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][10/625] eta 0:05:07 lr 0.000080 wd 0.0500 time 0.3962 (0.4995) data time 0.0009 (0.0390) model time 0.0000 (0.0000) loss 6.1694 (6.4994) grad_norm 3.7536 (3.2280) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:00:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][20/625] eta 0:04:33 lr 0.000080 wd 0.0500 time 0.3995 (0.4527) data time 0.0007 (0.0209) model time 0.0000 (0.0000) loss 7.0261 (6.4430) grad_norm 2.8787 (3.2191) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:00:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][30/625] eta 0:04:19 lr 0.000080 wd 0.0500 time 0.4010 (0.4356) data time 0.0008 (0.0144) model time 0.0000 (0.0000) loss 7.0334 (6.5314) grad_norm 5.0771 (3.2044) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:00:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][40/625] eta 0:04:10 lr 0.000080 wd 0.0500 time 0.3956 (0.4275) data time 0.0009 (0.0111) model time 0.0000 (0.0000) loss 7.6037 (6.5216) grad_norm 2.4446 (3.4156) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:00:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][50/625] eta 0:04:02 lr 0.000080 wd 0.0500 time 0.3956 (0.4222) data time 0.0007 (0.0091) model time 0.0000 (0.0000) loss 5.2738 (6.4593) grad_norm 2.8094 (3.4631) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:00:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][60/625] eta 0:03:56 lr 0.000080 wd 0.0500 time 0.3986 (0.4187) data time 0.0006 (0.0077) model time 0.3980 (0.3999) loss 6.3762 (6.3788) grad_norm 4.1386 (3.4641) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:00:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][70/625] eta 0:03:50 lr 0.000079 wd 0.0500 time 0.3966 (0.4161) data time 0.0009 (0.0068) model 
time 0.3957 (0.3995) loss 7.8803 (6.4487) grad_norm 3.7716 (3.3650) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:00:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][80/625] eta 0:03:45 lr 0.000079 wd 0.0500 time 0.3981 (0.4143) data time 0.0008 (0.0060) model time 0.3973 (0.4000) loss 5.1756 (6.4140) grad_norm 1.8517 (3.2915) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:00:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][90/625] eta 0:03:40 lr 0.000079 wd 0.0500 time 0.3970 (0.4127) data time 0.0010 (0.0055) model time 0.3960 (0.3997) loss 6.9635 (6.4605) grad_norm 3.7844 (3.2468) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:00:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][100/625] eta 0:03:36 lr 0.000079 wd 0.0500 time 0.3997 (0.4130) data time 0.0006 (0.0050) model time 0.3991 (0.4027) loss 7.6334 (6.4713) grad_norm 1.9644 (3.2065) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:00:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][110/625] eta 0:03:32 lr 0.000079 wd 0.0500 time 0.3972 (0.4133) data time 0.0007 (0.0046) model time 0.3965 (0.4049) loss 7.0406 (6.4641) grad_norm 2.6955 (3.2051) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:00:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][120/625] eta 0:03:28 lr 0.000079 wd 0.0500 time 0.3974 (0.4136) data time 0.0008 (0.0043) model time 0.3966 (0.4065) loss 6.4409 (6.4888) grad_norm 4.2717 (3.1839) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:01:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][130/625] eta 0:03:27 lr 0.000079 wd 0.0500 time 0.5934 (0.4189) data time 0.0008 (0.0041) model time 0.5926 (0.4159) loss 7.1565 (6.4735) grad_norm 3.2978 (3.2569) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:01:09 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [257/300][140/625] eta 0:03:24 lr 0.000079 wd 0.0500 time 0.3969 (0.4226) data time 0.0009 (0.0038) model time 0.3960 (0.4220) loss 6.4720 (6.4819) grad_norm 11.6768 (3.3029) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:01:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][150/625] eta 0:03:22 lr 0.000079 wd 0.0500 time 0.6245 (0.4260) data time 0.0007 (0.0036) model time 0.6238 (0.4271) loss 5.6202 (6.4563) grad_norm 2.7763 (3.4966) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:01:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][160/625] eta 0:03:19 lr 0.000079 wd 0.0500 time 0.3921 (0.4284) data time 0.0009 (0.0035) model time 0.3912 (0.4304) loss 5.8768 (6.4521) grad_norm 2.3559 (3.4648) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:01:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][170/625] eta 0:03:15 lr 0.000079 wd 0.0500 time 0.5009 (0.4304) data time 0.0008 (0.0033) model time 0.5001 (0.4330) loss 5.7887 (6.4541) grad_norm 2.0134 (3.4295) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:01:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][180/625] eta 0:03:10 lr 0.000079 wd 0.0500 time 0.4002 (0.4288) data time 0.0006 (0.0032) model time 0.3996 (0.4304) loss 6.6434 (6.4501) grad_norm 2.9585 (3.4349) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:01:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][190/625] eta 0:03:06 lr 0.000079 wd 0.0500 time 0.4036 (0.4281) data time 0.0007 (0.0031) model time 0.4029 (0.4294) loss 7.8240 (6.4598) grad_norm 2.1749 (3.4179) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:01:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][200/625] eta 0:03:01 lr 0.000079 wd 0.0500 time 0.4024 (0.4268) data time 0.0008 (0.0030) model time 0.4016 (0.4275) 
loss 6.7262 (6.4624) grad_norm 2.7093 (3.4010) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:01:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][210/625] eta 0:02:56 lr 0.000079 wd 0.0500 time 0.3967 (0.4255) data time 0.0007 (0.0029) model time 0.3961 (0.4257) loss 7.1356 (6.4809) grad_norm 2.9128 (3.3763) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:01:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][220/625] eta 0:02:51 lr 0.000079 wd 0.0500 time 0.4072 (0.4244) data time 0.0009 (0.0028) model time 0.4064 (0.4242) loss 5.9515 (6.4766) grad_norm 2.0873 (3.4183) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:01:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][230/625] eta 0:02:47 lr 0.000079 wd 0.0500 time 0.4036 (0.4235) data time 0.0006 (0.0027) model time 0.4030 (0.4229) loss 5.8455 (6.4650) grad_norm 3.9040 (3.4050) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:01:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][240/625] eta 0:02:42 lr 0.000079 wd 0.0500 time 0.3995 (0.4225) data time 0.0006 (0.0026) model time 0.3988 (0.4217) loss 6.9031 (6.4802) grad_norm 4.3113 (3.4409) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:01:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][250/625] eta 0:02:38 lr 0.000079 wd 0.0500 time 0.3962 (0.4216) data time 0.0009 (0.0025) model time 0.3953 (0.4206) loss 7.0063 (6.4905) grad_norm 3.6362 (3.4252) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:01:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][260/625] eta 0:02:33 lr 0.000079 wd 0.0500 time 0.4013 (0.4208) data time 0.0009 (0.0025) model time 0.4004 (0.4196) loss 6.1657 (6.5045) grad_norm 2.3017 (3.4319) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:02:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [257/300][270/625] eta 0:02:29 lr 0.000078 wd 0.0500 time 0.3982 (0.4200) data time 0.0006 (0.0024) model time 0.3975 (0.4186) loss 4.8822 (6.4952) grad_norm 2.6143 (3.4177) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:02:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][280/625] eta 0:02:24 lr 0.000078 wd 0.0500 time 0.3956 (0.4193) data time 0.0009 (0.0024) model time 0.3947 (0.4178) loss 6.2549 (6.4895) grad_norm 11.4672 (3.4407) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:02:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][290/625] eta 0:02:20 lr 0.000078 wd 0.0500 time 0.4023 (0.4186) data time 0.0006 (0.0023) model time 0.4017 (0.4170) loss 5.6818 (6.4936) grad_norm 39.2303 (3.5539) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:02:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][300/625] eta 0:02:15 lr 0.000078 wd 0.0500 time 0.4036 (0.4180) data time 0.0008 (0.0023) model time 0.4027 (0.4163) loss 6.3309 (6.4973) grad_norm 3.0316 (3.5240) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:02:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][310/625] eta 0:02:11 lr 0.000078 wd 0.0500 time 0.4005 (0.4174) data time 0.0005 (0.0022) model time 0.4000 (0.4155) loss 6.0694 (6.4894) grad_norm 1.9450 (3.5587) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:02:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][320/625] eta 0:02:07 lr 0.000078 wd 0.0500 time 0.3974 (0.4172) data time 0.0008 (0.0022) model time 0.3966 (0.4153) loss 6.1959 (6.4903) grad_norm 2.8855 (3.5513) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:02:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][330/625] eta 0:02:02 lr 0.000078 wd 0.0500 time 0.3936 (0.4166) data time 0.0008 (0.0021) model time 0.3927 (0.4147) loss 7.3940 (6.4995) 
grad_norm 3.2747 (3.5446) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:02:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][340/625] eta 0:01:58 lr 0.000078 wd 0.0500 time 0.3965 (0.4171) data time 0.0008 (0.0021) model time 0.3957 (0.4153) loss 7.6207 (6.5025) grad_norm 4.7818 (3.5299) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:02:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][350/625] eta 0:01:55 lr 0.000078 wd 0.0500 time 0.3944 (0.4186) data time 0.0008 (0.0021) model time 0.3935 (0.4172) loss 7.3215 (6.4920) grad_norm 2.9965 (3.5394) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:02:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][360/625] eta 0:01:51 lr 0.000078 wd 0.0500 time 0.5889 (0.4199) data time 0.0008 (0.0020) model time 0.5880 (0.4187) loss 6.9846 (6.4919) grad_norm 2.7088 (3.5722) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:02:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][370/625] eta 0:01:47 lr 0.000078 wd 0.0500 time 0.3991 (0.4207) data time 0.0009 (0.0020) model time 0.3982 (0.4196) loss 7.6737 (6.4993) grad_norm 2.6671 (3.6268) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:02:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][380/625] eta 0:01:43 lr 0.000078 wd 0.0500 time 0.5646 (0.4219) data time 0.0010 (0.0020) model time 0.5636 (0.4210) loss 7.6457 (6.4979) grad_norm 1.8716 (3.6277) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:02:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][390/625] eta 0:01:39 lr 0.000078 wd 0.0500 time 0.3989 (0.4222) data time 0.0008 (0.0019) model time 0.3981 (0.4214) loss 5.8464 (6.4914) grad_norm 2.2794 (3.6015) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:02:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[257/300][400/625] eta 0:01:34 lr 0.000078 wd 0.0500 time 0.4050 (0.4219) data time 0.0008 (0.0019) model time 0.4042 (0.4210) loss 5.7018 (6.4866) grad_norm 1.9742 (3.5776) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:03:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][410/625] eta 0:01:30 lr 0.000078 wd 0.0500 time 0.3996 (0.4216) data time 0.0009 (0.0019) model time 0.3987 (0.4206) loss 6.7908 (6.4905) grad_norm 2.5249 (3.5760) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:03:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][420/625] eta 0:01:26 lr 0.000078 wd 0.0500 time 0.3979 (0.4210) data time 0.0006 (0.0019) model time 0.3972 (0.4200) loss 6.6288 (6.4955) grad_norm 2.4467 (3.5632) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:03:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][430/625] eta 0:01:21 lr 0.000078 wd 0.0500 time 0.3986 (0.4205) data time 0.0006 (0.0018) model time 0.3980 (0.4194) loss 5.8475 (6.4923) grad_norm 3.0097 (3.5562) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:03:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][440/625] eta 0:01:17 lr 0.000078 wd 0.0500 time 0.3949 (0.4200) data time 0.0009 (0.0018) model time 0.3940 (0.4189) loss 6.2696 (6.4925) grad_norm 3.9128 (3.5667) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:03:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][450/625] eta 0:01:13 lr 0.000078 wd 0.0500 time 0.3946 (0.4195) data time 0.0007 (0.0018) model time 0.3939 (0.4184) loss 7.6474 (6.4927) grad_norm 2.6574 (3.5582) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:03:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][460/625] eta 0:01:09 lr 0.000078 wd 0.0500 time 0.4049 (0.4191) data time 0.0006 (0.0018) model time 0.4042 (0.4179) loss 6.4622 (6.4840) grad_norm 2.2999 
(3.5461) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:03:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][470/625] eta 0:01:04 lr 0.000077 wd 0.0500 time 0.3982 (0.4187) data time 0.0009 (0.0017) model time 0.3973 (0.4175) loss 7.0393 (6.4874) grad_norm 2.2827 (3.5580) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:03:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][480/625] eta 0:01:00 lr 0.000077 wd 0.0500 time 0.3961 (0.4183) data time 0.0008 (0.0017) model time 0.3953 (0.4170) loss 6.8370 (6.4870) grad_norm 2.0754 (3.5357) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:03:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][490/625] eta 0:00:56 lr 0.000077 wd 0.0500 time 0.4057 (0.4179) data time 0.0007 (0.0017) model time 0.4050 (0.4166) loss 7.9451 (6.4976) grad_norm 3.6451 (3.5512) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:03:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][500/625] eta 0:00:52 lr 0.000077 wd 0.0500 time 0.3960 (0.4176) data time 0.0010 (0.0018) model time 0.3950 (0.4162) loss 7.1214 (6.5011) grad_norm 5.1177 (3.5439) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:03:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][510/625] eta 0:00:47 lr 0.000077 wd 0.0500 time 0.4043 (0.4173) data time 0.0008 (0.0017) model time 0.4035 (0.4158) loss 6.5849 (6.4946) grad_norm 2.6204 (3.5362) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:03:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][520/625] eta 0:00:43 lr 0.000077 wd 0.0500 time 0.4174 (0.4170) data time 0.0006 (0.0017) model time 0.4168 (0.4155) loss 6.1615 (6.4921) grad_norm 2.7452 (3.5370) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:03:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][530/625] eta 
0:00:39 lr 0.000077 wd 0.0500 time 0.3906 (0.4167) data time 0.0009 (0.0017) model time 0.3897 (0.4151) loss 6.4135 (6.4866) grad_norm 7.6349 (3.5610) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:03:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][540/625] eta 0:00:35 lr 0.000077 wd 0.0500 time 0.3999 (0.4165) data time 0.0008 (0.0017) model time 0.3992 (0.4150) loss 6.0337 (6.4882) grad_norm 3.5747 (3.8027) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:03:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][550/625] eta 0:00:31 lr 0.000077 wd 0.0500 time 0.3998 (0.4162) data time 0.0008 (0.0017) model time 0.3990 (0.4147) loss 6.8656 (6.4893) grad_norm 2.2503 (3.7932) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:04:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][560/625] eta 0:00:27 lr 0.000077 wd 0.0500 time 0.5771 (0.4169) data time 0.0008 (0.0017) model time 0.5763 (0.4155) loss 5.5534 (6.4928) grad_norm 2.2783 (3.7770) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:04:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][570/625] eta 0:00:22 lr 0.000077 wd 0.0500 time 0.6172 (0.4179) data time 0.0006 (0.0016) model time 0.6166 (0.4166) loss 5.9896 (6.4869) grad_norm 4.4785 (3.7663) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:04:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][580/625] eta 0:00:18 lr 0.000077 wd 0.0500 time 0.6107 (0.4185) data time 0.0008 (0.0016) model time 0.6099 (0.4173) loss 6.4988 (6.4858) grad_norm 2.4671 (3.7517) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:04:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][590/625] eta 0:00:14 lr 0.000077 wd 0.0500 time 0.3967 (0.4193) data time 0.0007 (0.0016) model time 0.3959 (0.4181) loss 6.3179 (6.4817) grad_norm 5.3487 (3.7742) loss_scale 32.0000 
(32.0000) mem 14939MB [2024-07-25 11:04:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][600/625] eta 0:00:10 lr 0.000077 wd 0.0500 time 0.5531 (0.4198) data time 0.0008 (0.0016) model time 0.5523 (0.4187) loss 6.5111 (6.4822) grad_norm 2.7392 (3.7627) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:04:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][610/625] eta 0:00:06 lr 0.000077 wd 0.0500 time 0.3952 (0.4199) data time 0.0004 (0.0016) model time 0.3948 (0.4187) loss 6.0917 (6.4838) grad_norm 3.0794 (3.7486) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:04:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][620/625] eta 0:00:02 lr 0.000077 wd 0.0500 time 0.3956 (0.4197) data time 0.0006 (0.0016) model time 0.3950 (0.4186) loss 5.6695 (6.4802) grad_norm 2.2031 (3.7288) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:04:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 257 training takes 0:04:22 [2024-07-25 11:04:31 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 11:04:32 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 11:04:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.485 (0.485) Loss 0.5444 (0.5444) Acc@1 90.527 (90.527) Acc@5 99.072 (99.072) Mem 14939MB [2024-07-25 11:04:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.123) Loss 0.8145 (0.6614) Acc@1 82.812 (87.447) Acc@5 97.021 (98.002) Mem 14939MB [2024-07-25 11:04:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 0.9136 (0.7651) Acc@1 79.688 (84.505) Acc@5 95.801 (97.049) Mem 14939MB [2024-07-25 11:04:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.087 Acc@5 96.991 [2024-07-25 11:04:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-07-25 11:04:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.993 (0.993) Loss 0.5371 (0.5371) Acc@1 90.234 (90.234) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 11:04:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.173) Loss 0.8125 (0.6565) Acc@1 83.154 (87.398) Acc@5 96.973 (97.985) Mem 14939MB [2024-07-25 11:04:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.131) Loss 0.9199 (0.7611) Acc@1 78.320 (84.440) Acc@5 95.508 (97.008) Mem 14939MB [2024-07-25 11:04:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.021 Acc@5 96.967 [2024-07-25 11:04:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-07-25 11:04:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][0/625] eta 0:17:11 lr 0.000077 wd 0.0500 time 1.6499 (1.6499) data time 0.7647 (0.7647) model time 0.0000 (0.0000) loss 7.1037 (7.1037) grad_norm 2.0616 (2.0616) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:04:43 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][10/625] eta 0:05:15 lr 0.000077 wd 0.0500 time 0.3950 (0.5133) data time 0.0009 (0.0703) model time 0.0000 (0.0000) loss 5.3078 (6.3626) grad_norm 3.0253 (2.7094) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:04:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][20/625] eta 0:04:37 lr 0.000077 wd 0.0500 time 0.3954 (0.4583) data time 0.0010 (0.0372) model time 0.0000 (0.0000) loss 6.9812 (6.4424) grad_norm 2.6326 (2.6828) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:04:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][30/625] eta 0:04:21 lr 0.000077 wd 0.0500 time 0.3971 (0.4392) data time 0.0008 (0.0255) model time 0.0000 (0.0000) loss 7.2573 (6.4885) grad_norm 3.0403 (2.9054) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:04:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][40/625] eta 0:04:11 lr 0.000077 wd 0.0500 time 0.3996 (0.4300) data time 0.0008 (0.0195) model time 0.0000 (0.0000) loss 6.2755 (6.4896) grad_norm 3.6325 (2.8911) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:04:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][50/625] eta 0:04:03 lr 0.000077 wd 0.0500 time 0.3990 (0.4243) data time 0.0006 (0.0158) model time 0.0000 (0.0000) loss 6.6029 (6.4768) grad_norm 2.0946 (2.8612) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:05:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][60/625] eta 0:03:57 lr 0.000076 wd 0.0500 time 0.3944 (0.4201) data time 0.0009 (0.0134) model time 0.3935 (0.3981) loss 6.7792 (6.4788) grad_norm 5.4071 (2.8869) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:05:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][70/625] eta 0:03:51 lr 0.000076 wd 0.0500 time 0.4029 (0.4175) data time 0.0006 (0.0116) model 
time 0.4024 (0.3995) loss 6.7545 (6.4775) grad_norm 4.1563 (3.0410) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:05:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][80/625] eta 0:03:46 lr 0.000076 wd 0.0500 time 0.4025 (0.4153) data time 0.0007 (0.0103) model time 0.4018 (0.3992) loss 6.9751 (6.4483) grad_norm 6.3830 (3.1036) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:05:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][90/625] eta 0:03:41 lr 0.000076 wd 0.0500 time 0.4033 (0.4137) data time 0.0008 (0.0092) model time 0.4025 (0.3994) loss 7.4437 (6.4462) grad_norm 5.0037 (3.1236) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:05:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][100/625] eta 0:03:36 lr 0.000076 wd 0.0500 time 0.3946 (0.4126) data time 0.0007 (0.0084) model time 0.3939 (0.3997) loss 5.6174 (6.4363) grad_norm 1.9757 (3.0680) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:05:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][110/625] eta 0:03:31 lr 0.000076 wd 0.0500 time 0.4047 (0.4114) data time 0.0006 (0.0077) model time 0.4041 (0.3996) loss 5.5612 (6.4305) grad_norm 2.9649 (3.0469) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:05:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][120/625] eta 0:03:27 lr 0.000076 wd 0.0500 time 0.4020 (0.4105) data time 0.0009 (0.0072) model time 0.4011 (0.3996) loss 7.0056 (6.4661) grad_norm 4.3262 (3.0442) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:05:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][130/625] eta 0:03:22 lr 0.000076 wd 0.0500 time 0.3999 (0.4096) data time 0.0007 (0.0067) model time 0.3992 (0.3994) loss 5.7087 (6.4518) grad_norm 3.0798 (3.0714) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:05:36 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [258/300][140/625] eta 0:03:19 lr 0.000076 wd 0.0500 time 0.4005 (0.4114) data time 0.0007 (0.0063) model time 0.3998 (0.4033) loss 5.7601 (6.4340) grad_norm 5.3354 (3.0846) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:05:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][150/625] eta 0:03:16 lr 0.000076 wd 0.0500 time 0.3984 (0.4131) data time 0.0006 (0.0059) model time 0.3979 (0.4065) loss 5.3782 (6.4233) grad_norm 6.2699 (3.0674) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:05:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][160/625] eta 0:03:13 lr 0.000076 wd 0.0500 time 0.5661 (0.4155) data time 0.0009 (0.0056) model time 0.5652 (0.4106) loss 6.1657 (6.4293) grad_norm 3.1986 (3.0622) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:05:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][170/625] eta 0:03:10 lr 0.000076 wd 0.0500 time 0.3997 (0.4180) data time 0.0008 (0.0053) model time 0.3989 (0.4144) loss 5.2578 (6.4216) grad_norm 3.1165 (3.0379) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:05:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][180/625] eta 0:03:07 lr 0.000076 wd 0.0500 time 0.4133 (0.4218) data time 0.0006 (0.0051) model time 0.4127 (0.4199) loss 5.7786 (6.4167) grad_norm 2.1325 (3.0311) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:05:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][190/625] eta 0:03:04 lr 0.000076 wd 0.0500 time 0.3995 (0.4245) data time 0.0007 (0.0049) model time 0.3988 (0.4237) loss 5.3049 (6.4348) grad_norm 2.7585 (3.1380) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:06:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][200/625] eta 0:03:00 lr 0.000076 wd 0.0500 time 0.3959 (0.4249) data time 0.0006 (0.0047) model time 0.3953 (0.4243) 
loss 6.9083 (6.4486) grad_norm 3.7170 (3.1641) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:06:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][210/625] eta 0:02:56 lr 0.000076 wd 0.0500 time 0.3973 (0.4254) data time 0.0010 (0.0045) model time 0.3964 (0.4249) loss 7.4120 (6.4600) grad_norm 3.3075 (3.1398) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:06:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][220/625] eta 0:02:52 lr 0.000076 wd 0.0500 time 0.3922 (0.4249) data time 0.0009 (0.0043) model time 0.3913 (0.4243) loss 6.9153 (6.4683) grad_norm 10.2161 (3.1684) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:06:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][230/625] eta 0:02:47 lr 0.000076 wd 0.0500 time 0.3980 (0.4238) data time 0.0007 (0.0042) model time 0.3973 (0.4228) loss 6.5540 (6.4626) grad_norm 3.3244 (3.1614) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:06:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][240/625] eta 0:02:42 lr 0.000076 wd 0.0500 time 0.3944 (0.4228) data time 0.0007 (0.0040) model time 0.3937 (0.4215) loss 5.2647 (6.4608) grad_norm 3.9324 (3.2418) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:06:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][250/625] eta 0:02:38 lr 0.000076 wd 0.0500 time 0.3945 (0.4218) data time 0.0006 (0.0039) model time 0.3939 (0.4203) loss 5.7322 (6.4516) grad_norm 4.5434 (3.2472) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:06:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][260/625] eta 0:02:33 lr 0.000075 wd 0.0500 time 0.3946 (0.4209) data time 0.0008 (0.0038) model time 0.3938 (0.4192) loss 6.2146 (6.4434) grad_norm 5.2891 (3.2607) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:06:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [258/300][270/625] eta 0:02:29 lr 0.000075 wd 0.0500 time 0.3962 (0.4201) data time 0.0008 (0.0037) model time 0.3954 (0.4182) loss 6.8792 (6.4512) grad_norm 4.9309 (3.2589) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:06:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][280/625] eta 0:02:24 lr 0.000075 wd 0.0500 time 0.3920 (0.4193) data time 0.0009 (0.0036) model time 0.3910 (0.4173) loss 6.7580 (6.4507) grad_norm 2.9171 (3.2452) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:06:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][290/625] eta 0:02:20 lr 0.000075 wd 0.0500 time 0.4028 (0.4187) data time 0.0007 (0.0035) model time 0.4022 (0.4166) loss 6.1514 (6.4490) grad_norm 3.8176 (3.2524) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:06:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][300/625] eta 0:02:15 lr 0.000075 wd 0.0500 time 0.3953 (0.4179) data time 0.0008 (0.0034) model time 0.3945 (0.4158) loss 6.6261 (6.4391) grad_norm 2.4390 (3.3397) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:06:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][310/625] eta 0:02:11 lr 0.000075 wd 0.0500 time 0.3955 (0.4173) data time 0.0008 (0.0033) model time 0.3947 (0.4151) loss 6.6565 (6.4483) grad_norm 2.4263 (3.3458) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:06:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][320/625] eta 0:02:07 lr 0.000075 wd 0.0500 time 0.4010 (0.4167) data time 0.0007 (0.0032) model time 0.4004 (0.4145) loss 5.6930 (6.4529) grad_norm 3.9406 (3.3625) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:06:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][330/625] eta 0:02:02 lr 0.000075 wd 0.0500 time 0.4001 (0.4162) data time 0.0008 (0.0032) model time 0.3993 (0.4139) loss 6.6850 (6.4491) grad_norm 
2.8154 (3.3583) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:07:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][340/625] eta 0:01:58 lr 0.000075 wd 0.0500 time 0.3960 (0.4158) data time 0.0007 (0.0031) model time 0.3953 (0.4134) loss 7.2372 (6.4560) grad_norm 2.7197 (3.3428) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:07:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][350/625] eta 0:01:54 lr 0.000075 wd 0.0500 time 0.3905 (0.4159) data time 0.0008 (0.0030) model time 0.3897 (0.4136) loss 5.5829 (6.4617) grad_norm 2.2349 (3.3335) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:07:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][360/625] eta 0:01:50 lr 0.000075 wd 0.0500 time 0.4063 (0.4159) data time 0.0010 (0.0030) model time 0.4053 (0.4137) loss 7.0691 (6.4596) grad_norm 1.8571 (3.4097) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:07:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][370/625] eta 0:01:46 lr 0.000075 wd 0.0500 time 0.3997 (0.4164) data time 0.0006 (0.0029) model time 0.3991 (0.4143) loss 5.2212 (6.4485) grad_norm 3.2277 (3.3984) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:07:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][380/625] eta 0:01:42 lr 0.000075 wd 0.0500 time 0.5729 (0.4170) data time 0.0007 (0.0029) model time 0.5721 (0.4150) loss 5.5720 (6.4456) grad_norm 1.8508 (3.3852) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:07:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][390/625] eta 0:01:38 lr 0.000075 wd 0.0500 time 0.5927 (0.4185) data time 0.0007 (0.0028) model time 0.5920 (0.4167) loss 5.5851 (6.4438) grad_norm 3.3348 (3.4410) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:07:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][400/625] eta 
0:01:34 lr 0.000075 wd 0.0500 time 0.3971 (0.4189) data time 0.0008 (0.0028) model time 0.3963 (0.4173) loss 5.7037 (6.4459) grad_norm 2.1664 (3.4211) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:07:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][410/625] eta 0:01:30 lr 0.000075 wd 0.0500 time 0.4079 (0.4204) data time 0.0007 (0.0027) model time 0.4073 (0.4189) loss 6.4681 (6.4484) grad_norm 3.0152 (3.4277) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:07:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][420/625] eta 0:01:26 lr 0.000075 wd 0.0500 time 0.3976 (0.4208) data time 0.0009 (0.0027) model time 0.3967 (0.4194) loss 6.5202 (6.4557) grad_norm 2.4406 (3.4190) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:07:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][430/625] eta 0:01:22 lr 0.000075 wd 0.0500 time 0.5750 (0.4212) data time 0.0007 (0.0026) model time 0.5742 (0.4199) loss 5.3455 (6.4573) grad_norm 2.2618 (3.4307) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:07:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][440/625] eta 0:01:17 lr 0.000075 wd 0.0500 time 0.4008 (0.4211) data time 0.0006 (0.0026) model time 0.4002 (0.4198) loss 7.3689 (6.4566) grad_norm 2.6982 (3.4271) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:07:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][450/625] eta 0:01:13 lr 0.000075 wd 0.0500 time 0.3960 (0.4206) data time 0.0009 (0.0025) model time 0.3950 (0.4193) loss 7.1828 (6.4539) grad_norm 2.8825 (3.4435) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:07:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][460/625] eta 0:01:09 lr 0.000075 wd 0.0500 time 0.3959 (0.4202) data time 0.0010 (0.0025) model time 0.3949 (0.4188) loss 7.5887 (6.4541) grad_norm 2.3183 (3.4260) loss_scale 32.0000 
(32.0000) mem 14939MB [2024-07-25 11:07:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][470/625] eta 0:01:05 lr 0.000074 wd 0.0500 time 0.3951 (0.4211) data time 0.0007 (0.0025) model time 0.3944 (0.4199) loss 5.3946 (6.4501) grad_norm 2.4895 (3.4170) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:08:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][480/625] eta 0:01:00 lr 0.000074 wd 0.0500 time 0.3952 (0.4206) data time 0.0006 (0.0024) model time 0.3946 (0.4194) loss 6.0508 (6.4496) grad_norm 2.0098 (3.3995) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:08:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][490/625] eta 0:00:56 lr 0.000074 wd 0.0500 time 0.3986 (0.4202) data time 0.0006 (0.0024) model time 0.3980 (0.4189) loss 5.9835 (6.4522) grad_norm 1.8980 (3.3925) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:08:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][500/625] eta 0:00:52 lr 0.000074 wd 0.0500 time 0.3966 (0.4198) data time 0.0009 (0.0024) model time 0.3957 (0.4184) loss 6.9590 (6.4532) grad_norm 2.8593 (3.3897) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:08:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][510/625] eta 0:00:48 lr 0.000074 wd 0.0500 time 0.4017 (0.4193) data time 0.0007 (0.0023) model time 0.4010 (0.4179) loss 6.3750 (6.4521) grad_norm 5.3131 (3.3955) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:08:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][520/625] eta 0:00:43 lr 0.000074 wd 0.0500 time 0.4001 (0.4189) data time 0.0006 (0.0023) model time 0.3995 (0.4175) loss 5.7636 (6.4439) grad_norm 3.0148 (3.3984) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:08:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][530/625] eta 0:00:39 lr 0.000074 wd 0.0500 time 
0.3952 (0.4185) data time 0.0009 (0.0023) model time 0.3943 (0.4171) loss 6.7731 (6.4492) grad_norm 3.6064 (3.3975) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:08:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][540/625] eta 0:00:35 lr 0.000074 wd 0.0500 time 0.3987 (0.4182) data time 0.0007 (0.0023) model time 0.3980 (0.4167) loss 5.6391 (6.4481) grad_norm 3.0984 (3.3965) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:08:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][550/625] eta 0:00:31 lr 0.000074 wd 0.0500 time 0.4037 (0.4178) data time 0.0008 (0.0022) model time 0.4029 (0.4163) loss 6.7607 (6.4534) grad_norm 1.8805 (3.3928) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:08:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][560/625] eta 0:00:27 lr 0.000074 wd 0.0500 time 0.4009 (0.4175) data time 0.0007 (0.0022) model time 0.4001 (0.4160) loss 5.8545 (6.4517) grad_norm 1.9960 (3.4042) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:08:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][570/625] eta 0:00:22 lr 0.000074 wd 0.0500 time 0.3965 (0.4172) data time 0.0010 (0.0022) model time 0.3955 (0.4156) loss 7.5745 (6.4559) grad_norm 4.3097 (3.4182) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:08:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][580/625] eta 0:00:18 lr 0.000074 wd 0.0500 time 0.4019 (0.4176) data time 0.0008 (0.0022) model time 0.4011 (0.4161) loss 7.0375 (6.4567) grad_norm 2.2037 (3.4274) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:08:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][590/625] eta 0:00:14 lr 0.000074 wd 0.0500 time 0.3990 (0.4176) data time 0.0008 (0.0021) model time 0.3982 (0.4161) loss 6.4422 (6.4555) grad_norm 4.9408 (3.4302) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 
11:08:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][600/625] eta 0:00:10 lr 0.000074 wd 0.0500 time 0.4065 (0.4175) data time 0.0006 (0.0021) model time 0.4059 (0.4161) loss 6.0269 (6.4554) grad_norm 3.0264 (3.4296) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:08:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][610/625] eta 0:00:06 lr 0.000074 wd 0.0500 time 0.5645 (0.4190) data time 0.0006 (0.0021) model time 0.5639 (0.4177) loss 7.0935 (6.4520) grad_norm 2.6022 (3.4283) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:08:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][620/625] eta 0:00:02 lr 0.000074 wd 0.0500 time 0.3955 (0.4192) data time 0.0004 (0.0021) model time 0.3951 (0.4179) loss 6.9618 (6.4577) grad_norm 3.9891 (3.4277) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:09:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 258 training takes 0:04:22 [2024-07-25 11:09:00 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 11:09:01 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 11:09:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.482 (0.482) Loss 0.5469 (0.5469) Acc@1 90.039 (90.039) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 11:09:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.123) Loss 0.8091 (0.6610) Acc@1 82.910 (87.358) Acc@5 97.119 (97.971) Mem 14939MB [2024-07-25 11:09:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 0.9043 (0.7636) Acc@1 79.346 (84.394) Acc@5 95.996 (97.049) Mem 14939MB [2024-07-25 11:09:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.987 Acc@5 97.003 [2024-07-25 11:09:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-07-25 11:09:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.911 (0.911) Loss 0.5376 (0.5376) Acc@1 90.186 (90.186) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 11:09:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.162) Loss 0.8125 (0.6565) Acc@1 83.154 (87.393) Acc@5 96.973 (97.994) Mem 14939MB [2024-07-25 11:09:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.125) Loss 0.9194 (0.7609) Acc@1 78.467 (84.442) Acc@5 95.557 (97.019) Mem 14939MB [2024-07-25 11:09:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.027 Acc@5 96.983 [2024-07-25 11:09:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-07-25 11:09:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.03% [2024-07-25 11:09:06 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 11:09:07 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 11:09:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][0/625] eta 0:17:50 lr 0.000074 wd 0.0500 time 1.7123 (1.7123) data time 1.3334 (1.3334) model time 0.0000 (0.0000) loss 7.4708 (7.4708) grad_norm 5.0334 (5.0334) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:09:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][10/625] eta 0:05:57 lr 0.000074 wd 0.0500 time 0.5696 (0.5819) data time 0.0009 (0.1221) model time 0.0000 (0.0000) loss 6.0215 (6.6091) grad_norm 5.1736 (3.6481) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:09:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][20/625] eta 0:05:13 lr 0.000074 wd 0.0500 time 0.4053 (0.5176) data time 0.0007 (0.0643) model time 0.0000 (0.0000) loss 5.5653 (6.4968) grad_norm 2.5110 (3.3160) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:09:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][30/625] eta 0:04:50 lr 0.000074 wd 0.0500 time 0.5592 (0.4881) data time 0.0006 (0.0438) model time 0.0000 (0.0000) loss 5.8314 (6.5817) grad_norm 2.3030 (3.1767) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:09:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][40/625] eta 0:04:32 lr 0.000074 wd 0.0500 time 0.4042 (0.4666) data time 0.0006 (0.0334) model time 0.0000 (0.0000) loss 6.1864 (6.5245) grad_norm 2.6371 (3.2184) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:09:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][50/625] eta 0:04:20 lr 0.000074 wd 0.0500 time 0.4038 (0.4539) data time 0.0006 (0.0270) model time 0.0000 (0.0000) loss 5.8791 (6.5325) grad_norm 2.6124 (3.2092) loss_scale 32.0000 (32.0000) mem 14939MB 
[2024-07-25 11:09:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][60/625] eta 0:04:11 lr 0.000073 wd 0.0500 time 0.3959 (0.4450) data time 0.0006 (0.0227) model time 0.3953 (0.3990) loss 5.6018 (6.5120) grad_norm 2.3786 (3.1170) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:09:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][70/625] eta 0:04:03 lr 0.000073 wd 0.0500 time 0.3972 (0.4388) data time 0.0007 (0.0196) model time 0.3965 (0.3993) loss 5.5172 (6.4501) grad_norm 2.1287 (3.0270) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:09:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][80/625] eta 0:03:56 lr 0.000073 wd 0.0500 time 0.3937 (0.4340) data time 0.0009 (0.0173) model time 0.3928 (0.3993) loss 6.8793 (6.4324) grad_norm 2.1977 (2.9933) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:09:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][90/625] eta 0:03:50 lr 0.000073 wd 0.0500 time 0.3943 (0.4303) data time 0.0009 (0.0155) model time 0.3934 (0.3994) loss 7.2977 (6.4315) grad_norm 3.4128 (4.1441) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:09:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][100/625] eta 0:03:45 lr 0.000073 wd 0.0500 time 0.3975 (0.4293) data time 0.0007 (0.0141) model time 0.3968 (0.4034) loss 7.0869 (6.4482) grad_norm 4.7165 (4.0468) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:09:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][110/625] eta 0:03:39 lr 0.000073 wd 0.0500 time 0.4000 (0.4267) data time 0.0009 (0.0129) model time 0.3992 (0.4026) loss 6.4243 (6.4360) grad_norm 2.9289 (3.9295) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:09:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][120/625] eta 0:03:34 lr 0.000073 wd 0.0500 time 0.3950 (0.4244) data time 
0.0008 (0.0119) model time 0.3943 (0.4020) loss 7.0378 (6.4576) grad_norm 3.7516 (3.8749) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:10:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][130/625] eta 0:03:29 lr 0.000073 wd 0.0500 time 0.3956 (0.4227) data time 0.0007 (0.0110) model time 0.3950 (0.4020) loss 6.4784 (6.4654) grad_norm 4.6469 (3.8383) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:10:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][140/625] eta 0:03:24 lr 0.000073 wd 0.0500 time 0.4015 (0.4212) data time 0.0006 (0.0103) model time 0.4008 (0.4018) loss 6.3296 (6.4709) grad_norm 2.2981 (3.7873) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:10:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][150/625] eta 0:03:19 lr 0.000073 wd 0.0500 time 0.4018 (0.4198) data time 0.0006 (0.0097) model time 0.4011 (0.4015) loss 5.9194 (6.4800) grad_norm 4.3062 (3.8064) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:10:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][160/625] eta 0:03:14 lr 0.000073 wd 0.0500 time 0.3947 (0.4185) data time 0.0007 (0.0091) model time 0.3940 (0.4012) loss 6.1683 (6.4760) grad_norm 2.9214 (3.8734) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:10:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][170/625] eta 0:03:09 lr 0.000073 wd 0.0500 time 0.4060 (0.4174) data time 0.0007 (0.0087) model time 0.4053 (0.4011) loss 7.0543 (6.4930) grad_norm 3.7957 (3.9406) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:10:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][180/625] eta 0:03:06 lr 0.000073 wd 0.0500 time 0.5829 (0.4181) data time 0.0008 (0.0082) model time 0.5821 (0.4031) loss 6.7097 (6.4998) grad_norm 2.3126 (3.8850) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:10:27 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][190/625] eta 0:03:01 lr 0.000073 wd 0.0500 time 0.4003 (0.4179) data time 0.0009 (0.0078) model time 0.3994 (0.4039) loss 6.1525 (6.5003) grad_norm 5.9341 (3.8350) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:10:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][200/625] eta 0:02:58 lr 0.000073 wd 0.0500 time 0.4059 (0.4198) data time 0.0009 (0.0075) model time 0.4051 (0.4074) loss 6.1575 (6.5102) grad_norm 6.0774 (3.7909) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:10:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][210/625] eta 0:02:55 lr 0.000073 wd 0.0500 time 0.3974 (0.4230) data time 0.0010 (0.0072) model time 0.3964 (0.4123) loss 6.4741 (6.5219) grad_norm 3.9728 (3.7832) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:10:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][220/625] eta 0:02:51 lr 0.000073 wd 0.0500 time 0.4019 (0.4241) data time 0.0006 (0.0069) model time 0.4012 (0.4143) loss 7.0528 (6.5199) grad_norm 2.2789 (3.7805) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:10:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][230/625] eta 0:02:47 lr 0.000073 wd 0.0500 time 0.3931 (0.4244) data time 0.0008 (0.0066) model time 0.3923 (0.4152) loss 6.3639 (6.5168) grad_norm 2.8518 (3.7277) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:10:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][240/625] eta 0:02:44 lr 0.000073 wd 0.0500 time 0.3993 (0.4263) data time 0.0007 (0.0064) model time 0.3986 (0.4181) loss 6.2882 (6.5058) grad_norm 2.4933 (3.7487) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:10:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][250/625] eta 0:02:39 lr 0.000073 wd 0.0500 time 0.4021 (0.4257) data time 0.0006 (0.0062) 
model time 0.4015 (0.4177) loss 7.4296 (6.5084) grad_norm 3.6035 (3.7509) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:10:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][260/625] eta 0:02:35 lr 0.000073 wd 0.0500 time 0.4059 (0.4254) data time 0.0006 (0.0060) model time 0.4052 (0.4177) loss 6.8379 (6.5120) grad_norm 5.7786 (3.7376) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:11:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][270/625] eta 0:02:30 lr 0.000072 wd 0.0500 time 0.3957 (0.4245) data time 0.0010 (0.0058) model time 0.3947 (0.4168) loss 6.6126 (6.5175) grad_norm 3.7516 (3.9985) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:11:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][280/625] eta 0:02:26 lr 0.000072 wd 0.0500 time 0.3997 (0.4236) data time 0.0009 (0.0056) model time 0.3988 (0.4160) loss 6.6617 (6.5068) grad_norm 3.2957 (3.9688) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:11:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][290/625] eta 0:02:21 lr 0.000072 wd 0.0500 time 0.4063 (0.4228) data time 0.0008 (0.0054) model time 0.4055 (0.4153) loss 5.6530 (6.5057) grad_norm 2.6039 (3.9196) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:11:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][300/625] eta 0:02:17 lr 0.000072 wd 0.0500 time 0.4008 (0.4220) data time 0.0006 (0.0053) model time 0.4002 (0.4146) loss 6.6826 (6.5034) grad_norm 8.6525 (3.9871) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:11:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][310/625] eta 0:02:12 lr 0.000072 wd 0.0500 time 0.4008 (0.4214) data time 0.0008 (0.0052) model time 0.4000 (0.4141) loss 6.1858 (6.5066) grad_norm 2.5971 (4.0118) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:11:22 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [259/300][320/625] eta 0:02:08 lr 0.000072 wd 0.0500 time 0.4004 (0.4212) data time 0.0009 (0.0050) model time 0.3996 (0.4141) loss 6.6652 (6.5072) grad_norm 3.1424 (4.0220) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:11:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][330/625] eta 0:02:04 lr 0.000072 wd 0.0500 time 0.3954 (0.4205) data time 0.0009 (0.0049) model time 0.3945 (0.4136) loss 6.0115 (6.5104) grad_norm 1.8355 (4.0111) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:11:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][340/625] eta 0:01:59 lr 0.000072 wd 0.0500 time 0.3982 (0.4198) data time 0.0008 (0.0048) model time 0.3974 (0.4130) loss 5.5494 (6.4994) grad_norm 3.5146 (3.9793) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:11:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][350/625] eta 0:01:55 lr 0.000072 wd 0.0500 time 0.3950 (0.4192) data time 0.0009 (0.0047) model time 0.3941 (0.4125) loss 7.5329 (6.5014) grad_norm 2.1544 (3.9602) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:11:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][360/625] eta 0:01:50 lr 0.000072 wd 0.0500 time 0.4047 (0.4186) data time 0.0006 (0.0046) model time 0.4040 (0.4120) loss 6.5049 (6.4933) grad_norm 3.0286 (3.9332) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:11:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][370/625] eta 0:01:46 lr 0.000072 wd 0.0500 time 0.4027 (0.4181) data time 0.0006 (0.0045) model time 0.4020 (0.4116) loss 6.4789 (6.4889) grad_norm 2.3169 (3.9000) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:11:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][380/625] eta 0:01:42 lr 0.000072 wd 0.0500 time 0.4025 (0.4177) data time 0.0008 (0.0044) model time 0.4017 (0.4112) 
loss 7.3676 (6.4824) grad_norm 2.0802 (3.8815) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:11:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][390/625] eta 0:01:38 lr 0.000072 wd 0.0500 time 0.3966 (0.4172) data time 0.0006 (0.0043) model time 0.3960 (0.4108) loss 6.1578 (6.4716) grad_norm 3.4029 (3.9129) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:11:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][400/625] eta 0:01:33 lr 0.000072 wd 0.0500 time 0.4074 (0.4174) data time 0.0008 (0.0042) model time 0.4066 (0.4113) loss 6.4162 (6.4798) grad_norm 4.3427 (3.9255) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:11:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][410/625] eta 0:01:29 lr 0.000072 wd 0.0500 time 0.3948 (0.4173) data time 0.0010 (0.0041) model time 0.3938 (0.4113) loss 7.3539 (6.4815) grad_norm 3.4969 (3.9010) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:12:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][420/625] eta 0:01:25 lr 0.000072 wd 0.0500 time 0.3983 (0.4181) data time 0.0010 (0.0040) model time 0.3974 (0.4123) loss 6.8087 (6.4830) grad_norm 2.8734 (3.8885) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:12:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][430/625] eta 0:01:21 lr 0.000072 wd 0.0500 time 0.3964 (0.4194) data time 0.0008 (0.0040) model time 0.3956 (0.4139) loss 6.3992 (6.4940) grad_norm 2.3445 (3.8869) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:12:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][440/625] eta 0:01:17 lr 0.000072 wd 0.0500 time 0.5731 (0.4199) data time 0.0008 (0.0039) model time 0.5723 (0.4146) loss 6.7791 (6.4912) grad_norm 7.1607 (3.8798) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:12:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [259/300][450/625] eta 0:01:13 lr 0.000072 wd 0.0500 time 0.3996 (0.4202) data time 0.0007 (0.0038) model time 0.3988 (0.4150) loss 7.1790 (6.4892) grad_norm 3.9418 (3.8843) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:12:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][460/625] eta 0:01:09 lr 0.000072 wd 0.0500 time 0.5745 (0.4216) data time 0.0008 (0.0038) model time 0.5736 (0.4167) loss 6.6308 (6.4903) grad_norm 4.5751 (3.9671) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:12:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][470/625] eta 0:01:05 lr 0.000072 wd 0.0500 time 0.4117 (0.4215) data time 0.0010 (0.0037) model time 0.4107 (0.4167) loss 5.8450 (6.4850) grad_norm 2.4802 (3.9496) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:12:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][480/625] eta 0:01:01 lr 0.000071 wd 0.0500 time 0.3985 (0.4214) data time 0.0009 (0.0036) model time 0.3977 (0.4167) loss 6.8007 (6.4820) grad_norm 2.6486 (3.9606) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:12:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][490/625] eta 0:00:56 lr 0.000071 wd 0.0500 time 0.4120 (0.4209) data time 0.0007 (0.0036) model time 0.4113 (0.4162) loss 6.0402 (6.4820) grad_norm 2.0183 (3.9609) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:12:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][500/625] eta 0:00:52 lr 0.000071 wd 0.0500 time 0.3973 (0.4205) data time 0.0006 (0.0035) model time 0.3967 (0.4158) loss 5.9051 (6.4787) grad_norm 2.3174 (3.9392) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:12:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][510/625] eta 0:00:48 lr 0.000071 wd 0.0500 time 0.3999 (0.4201) data time 0.0008 (0.0035) model time 0.3991 (0.4155) loss 7.1572 (6.4755) grad_norm 
4.1601 (3.9323) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:12:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][520/625] eta 0:00:44 lr 0.000071 wd 0.0500 time 0.4025 (0.4197) data time 0.0006 (0.0034) model time 0.4019 (0.4151) loss 6.3937 (6.4771) grad_norm 3.1636 (3.9207) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:12:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][530/625] eta 0:00:39 lr 0.000071 wd 0.0500 time 0.4024 (0.4193) data time 0.0006 (0.0034) model time 0.4018 (0.4147) loss 7.3081 (6.4832) grad_norm 3.5299 (3.9067) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:12:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][540/625] eta 0:00:35 lr 0.000071 wd 0.0500 time 0.4057 (0.4192) data time 0.0008 (0.0033) model time 0.4048 (0.4147) loss 7.2084 (6.4859) grad_norm 4.7921 (3.9200) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:12:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][550/625] eta 0:00:31 lr 0.000071 wd 0.0500 time 0.4161 (0.4189) data time 0.0007 (0.0033) model time 0.4153 (0.4145) loss 6.5013 (6.4868) grad_norm 2.4077 (3.9153) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:13:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][560/625] eta 0:00:27 lr 0.000071 wd 0.0500 time 0.4059 (0.4186) data time 0.0009 (0.0032) model time 0.4050 (0.4142) loss 7.4895 (6.4966) grad_norm 4.0237 (3.9258) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:13:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][570/625] eta 0:00:23 lr 0.000071 wd 0.0500 time 0.3950 (0.4182) data time 0.0009 (0.0032) model time 0.3941 (0.4139) loss 5.9972 (6.4950) grad_norm 2.3339 (3.9166) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:13:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][580/625] eta 
0:00:18 lr 0.000071 wd 0.0500 time 0.4071 (0.4180) data time 0.0007 (0.0032) model time 0.4065 (0.4137) loss 6.0760 (6.4913) grad_norm 2.2722 (3.8983) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:13:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][590/625] eta 0:00:14 lr 0.000071 wd 0.0500 time 0.3981 (0.4177) data time 0.0007 (0.0031) model time 0.3974 (0.4134) loss 5.2883 (6.4899) grad_norm 2.1300 (3.8751) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:13:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][600/625] eta 0:00:10 lr 0.000071 wd 0.0500 time 0.3978 (0.4175) data time 0.0008 (0.0031) model time 0.3970 (0.4132) loss 6.5422 (6.4938) grad_norm 2.5808 (3.8562) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:13:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][610/625] eta 0:00:06 lr 0.000071 wd 0.0500 time 0.3969 (0.4172) data time 0.0006 (0.0031) model time 0.3963 (0.4129) loss 5.7165 (6.4931) grad_norm 6.1087 (3.8474) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:13:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][620/625] eta 0:00:02 lr 0.000071 wd 0.0500 time 0.3973 (0.4174) data time 0.0006 (0.0030) model time 0.3967 (0.4133) loss 5.8533 (6.4894) grad_norm 3.4228 (3.8334) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:13:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 259 training takes 0:04:20 [2024-07-25 11:13:28 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 11:13:29 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 11:13:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.526 (0.526) Loss 0.5444 (0.5444) Acc@1 90.430 (90.430) Acc@5 98.730 (98.730) Mem 14939MB [2024-07-25 11:13:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.126) Loss 0.8149 (0.6580) Acc@1 82.617 (87.491) Acc@5 96.924 (97.940) Mem 14939MB [2024-07-25 11:13:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.107) Loss 0.9189 (0.7622) Acc@1 78.467 (84.480) Acc@5 95.752 (96.982) Mem 14939MB [2024-07-25 11:13:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.099 Acc@5 96.951 [2024-07-25 11:13:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-07-25 11:13:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.949 (0.949) Loss 0.5376 (0.5376) Acc@1 90.137 (90.137) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 11:13:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.165) Loss 0.8115 (0.6562) Acc@1 83.105 (87.380) Acc@5 97.021 (97.994) Mem 14939MB [2024-07-25 11:13:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.127) Loss 0.9189 (0.7607) Acc@1 78.418 (84.433) Acc@5 95.605 (97.021) Mem 14939MB [2024-07-25 11:13:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.033 Acc@5 96.985 [2024-07-25 11:13:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-07-25 11:13:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.03% [2024-07-25 11:13:34 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 11:13:35 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 11:13:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][0/625] eta 0:09:44 lr 0.000071 wd 0.0500 time 0.9353 (0.9353) data time 0.5423 (0.5423) model time 0.0000 (0.0000) loss 6.5654 (6.5654) grad_norm 6.2000 (6.2000) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:13:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][10/625] eta 0:04:57 lr 0.000071 wd 0.0500 time 0.5721 (0.4838) data time 0.0007 (0.0501) model time 0.0000 (0.0000) loss 7.0310 (6.7393) grad_norm 2.9454 (3.1386) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:13:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][20/625] eta 0:04:54 lr 0.000071 wd 0.0500 time 0.6146 (0.4874) data time 0.0009 (0.0266) model time 0.0000 (0.0000) loss 6.6549 (6.6602) grad_norm 2.4917 (3.1198) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 11:13:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][30/625] eta 0:04:45 lr 0.000071 wd 0.0500 time 0.3963 (0.4796) data time 0.0009 (0.0183) model time 0.0000 (0.0000) loss 6.0576 (6.5226) grad_norm 1.9493 (3.3427) loss_scale 64.0000 (40.2581) mem 14939MB [2024-07-25 11:13:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][40/625] eta 0:04:37 lr 0.000071 wd 0.0500 time 0.5717 (0.4747) data time 0.0006 (0.0141) model time 0.0000 (0.0000) loss 7.2892 (6.5311) grad_norm 3.6969 (3.5071) loss_scale 64.0000 (46.0488) mem 14939MB [2024-07-25 11:13:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][50/625] eta 0:04:28 lr 0.000071 wd 0.0500 time 0.4025 (0.4667) data time 0.0006 (0.0115) model time 0.0000 (0.0000) loss 6.3192 (6.5050) grad_norm 2.4952 (3.5858) loss_scale 64.0000 (49.5686) mem 14939MB 
[2024-07-25 11:14:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][60/625] eta 0:04:20 lr 0.000071 wd 0.0500 time 0.4027 (0.4611) data time 0.0009 (0.0097) model time 0.4018 (0.4317) loss 6.6948 (6.4469) grad_norm 8.3086 (3.6041) loss_scale 64.0000 (51.9344) mem 14939MB [2024-07-25 11:14:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][70/625] eta 0:04:13 lr 0.000071 wd 0.0500 time 0.3972 (0.4571) data time 0.0010 (0.0085) model time 0.3962 (0.4317) loss 7.1214 (6.4548) grad_norm 2.0549 (3.4781) loss_scale 64.0000 (53.6338) mem 14939MB [2024-07-25 11:14:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][80/625] eta 0:04:05 lr 0.000070 wd 0.0500 time 0.4027 (0.4506) data time 0.0007 (0.0076) model time 0.4020 (0.4221) loss 5.9046 (6.4262) grad_norm 2.4719 (3.6501) loss_scale 64.0000 (54.9136) mem 14939MB [2024-07-25 11:14:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][90/625] eta 0:03:58 lr 0.000070 wd 0.0500 time 0.3967 (0.4450) data time 0.0008 (0.0068) model time 0.3959 (0.4163) loss 6.8982 (6.4547) grad_norm 2.7275 (3.5080) loss_scale 64.0000 (55.9121) mem 14939MB [2024-07-25 11:14:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][100/625] eta 0:03:51 lr 0.000070 wd 0.0500 time 0.3989 (0.4406) data time 0.0009 (0.0063) model time 0.3980 (0.4130) loss 6.5607 (6.4711) grad_norm 3.3875 (3.4254) loss_scale 64.0000 (56.7129) mem 14939MB [2024-07-25 11:14:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][110/625] eta 0:03:45 lr 0.000070 wd 0.0500 time 0.4010 (0.4369) data time 0.0007 (0.0058) model time 0.4003 (0.4107) loss 7.4848 (6.4715) grad_norm 2.8927 (3.3976) loss_scale 64.0000 (57.3694) mem 14939MB [2024-07-25 11:14:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][120/625] eta 0:03:39 lr 0.000070 wd 0.0500 time 0.3936 (0.4339) data time 
0.0008 (0.0054) model time 0.3929 (0.4091) loss 6.9267 (6.4997) grad_norm 2.0905 (3.3741) loss_scale 64.0000 (57.9174) mem 14939MB [2024-07-25 11:14:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][130/625] eta 0:03:33 lr 0.000070 wd 0.0500 time 0.3969 (0.4311) data time 0.0009 (0.0050) model time 0.3960 (0.4075) loss 7.6052 (6.4878) grad_norm 2.7875 (3.4993) loss_scale 64.0000 (58.3817) mem 14939MB [2024-07-25 11:14:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][140/625] eta 0:03:28 lr 0.000070 wd 0.0500 time 0.3980 (0.4289) data time 0.0009 (0.0047) model time 0.3971 (0.4066) loss 6.4751 (6.4838) grad_norm 2.7125 (3.5506) loss_scale 64.0000 (58.7801) mem 14939MB [2024-07-25 11:14:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][150/625] eta 0:03:22 lr 0.000070 wd 0.0500 time 0.4015 (0.4270) data time 0.0008 (0.0045) model time 0.4007 (0.4059) loss 6.4517 (6.4728) grad_norm 2.4920 (3.5105) loss_scale 64.0000 (59.1258) mem 14939MB [2024-07-25 11:14:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][160/625] eta 0:03:17 lr 0.000070 wd 0.0500 time 0.3979 (0.4253) data time 0.0006 (0.0042) model time 0.3972 (0.4052) loss 6.3326 (6.4668) grad_norm 2.6864 (3.4404) loss_scale 64.0000 (59.4286) mem 14939MB [2024-07-25 11:14:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][170/625] eta 0:03:12 lr 0.000070 wd 0.0500 time 0.4005 (0.4239) data time 0.0006 (0.0040) model time 0.3999 (0.4048) loss 5.2686 (6.4658) grad_norm 2.5378 (3.6724) loss_scale 64.0000 (59.6959) mem 14939MB [2024-07-25 11:14:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][180/625] eta 0:03:08 lr 0.000070 wd 0.0500 time 0.3996 (0.4227) data time 0.0009 (0.0039) model time 0.3987 (0.4045) loss 5.2250 (6.4506) grad_norm 3.4871 (3.6729) loss_scale 64.0000 (59.9337) mem 14939MB [2024-07-25 11:14:56 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][190/625] eta 0:03:03 lr 0.000070 wd 0.0500 time 0.3979 (0.4214) data time 0.0008 (0.0037) model time 0.3971 (0.4041) loss 6.6772 (6.4373) grad_norm 5.2888 (3.6571) loss_scale 64.0000 (60.1466) mem 14939MB [2024-07-25 11:15:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][200/625] eta 0:02:58 lr 0.000070 wd 0.0500 time 0.3987 (0.4204) data time 0.0008 (0.0036) model time 0.3979 (0.4038) loss 5.2694 (6.4252) grad_norm 2.2650 (3.8364) loss_scale 64.0000 (60.3383) mem 14939MB [2024-07-25 11:15:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][210/625] eta 0:02:54 lr 0.000070 wd 0.0500 time 0.5894 (0.4204) data time 0.0006 (0.0034) model time 0.5888 (0.4047) loss 6.0364 (6.4294) grad_norm 3.5383 (3.8259) loss_scale 64.0000 (60.5118) mem 14939MB [2024-07-25 11:15:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][220/625] eta 0:02:50 lr 0.000070 wd 0.0500 time 0.4074 (0.4205) data time 0.0008 (0.0033) model time 0.4067 (0.4057) loss 6.4305 (6.4216) grad_norm 2.2929 (3.8068) loss_scale 64.0000 (60.6697) mem 14939MB [2024-07-25 11:15:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][230/625] eta 0:02:46 lr 0.000070 wd 0.0500 time 0.4002 (0.4203) data time 0.0007 (0.0032) model time 0.3994 (0.4063) loss 6.7442 (6.4259) grad_norm 3.3823 (3.7809) loss_scale 64.0000 (60.8139) mem 14939MB [2024-07-25 11:15:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][240/625] eta 0:02:42 lr 0.000070 wd 0.0500 time 0.3948 (0.4226) data time 0.0006 (0.0031) model time 0.3942 (0.4098) loss 6.0522 (6.4337) grad_norm 2.2549 (3.7481) loss_scale 64.0000 (60.9461) mem 14939MB [2024-07-25 11:15:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][250/625] eta 0:02:39 lr 0.000070 wd 0.0500 time 0.3970 (0.4244) data time 0.0007 (0.0030) 
model time 0.3963 (0.4128) loss 7.1474 (6.4360) grad_norm 2.8825 (3.7614) loss_scale 64.0000 (61.0677) mem 14939MB [2024-07-25 11:15:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][260/625] eta 0:02:35 lr 0.000070 wd 0.0500 time 0.3943 (0.4256) data time 0.0007 (0.0029) model time 0.3936 (0.4147) loss 6.0044 (6.4399) grad_norm 3.9910 (3.7276) loss_scale 64.0000 (61.1801) mem 14939MB [2024-07-25 11:15:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][270/625] eta 0:02:31 lr 0.000070 wd 0.0500 time 0.5945 (0.4267) data time 0.0006 (0.0029) model time 0.5939 (0.4166) loss 6.4310 (6.4382) grad_norm 1.9161 (3.7125) loss_scale 64.0000 (61.2841) mem 14939MB [2024-07-25 11:15:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][280/625] eta 0:02:27 lr 0.000070 wd 0.0500 time 0.3832 (0.4268) data time 0.0010 (0.0028) model time 0.3822 (0.4171) loss 6.3814 (6.4373) grad_norm 3.1009 (3.7470) loss_scale 64.0000 (61.3808) mem 14939MB [2024-07-25 11:15:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][290/625] eta 0:02:23 lr 0.000069 wd 0.0500 time 0.4054 (0.4274) data time 0.0009 (0.0027) model time 0.4046 (0.4182) loss 7.8752 (6.4320) grad_norm 3.0906 (4.1821) loss_scale 64.0000 (61.4708) mem 14939MB [2024-07-25 11:15:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][300/625] eta 0:02:18 lr 0.000069 wd 0.0500 time 0.3998 (0.4266) data time 0.0007 (0.0027) model time 0.3991 (0.4175) loss 5.5036 (6.4213) grad_norm 3.1788 (4.1387) loss_scale 64.0000 (61.5548) mem 14939MB [2024-07-25 11:15:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][310/625] eta 0:02:14 lr 0.000069 wd 0.0500 time 0.3961 (0.4257) data time 0.0008 (0.0026) model time 0.3953 (0.4169) loss 5.5560 (6.4174) grad_norm 3.4146 (4.1962) loss_scale 64.0000 (61.6334) mem 14939MB [2024-07-25 11:15:52 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [260/300][320/625] eta 0:02:09 lr 0.000069 wd 0.0500 time 0.4030 (0.4250) data time 0.0008 (0.0025) model time 0.4021 (0.4162) loss 6.7375 (6.4124) grad_norm 3.1933 (4.1722) loss_scale 64.0000 (61.7072) mem 14939MB [2024-07-25 11:15:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][330/625] eta 0:02:05 lr 0.000069 wd 0.0500 time 0.3983 (0.4242) data time 0.0009 (0.0025) model time 0.3974 (0.4156) loss 6.6875 (6.4099) grad_norm 4.9811 (4.1686) loss_scale 64.0000 (61.7764) mem 14939MB [2024-07-25 11:16:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][340/625] eta 0:02:00 lr 0.000069 wd 0.0500 time 0.4051 (0.4235) data time 0.0009 (0.0025) model time 0.4043 (0.4150) loss 6.7274 (6.4118) grad_norm 2.7006 (4.1477) loss_scale 64.0000 (61.8416) mem 14939MB [2024-07-25 11:16:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][350/625] eta 0:01:56 lr 0.000069 wd 0.0500 time 0.3990 (0.4229) data time 0.0008 (0.0024) model time 0.3982 (0.4145) loss 6.2451 (6.4079) grad_norm 3.0601 (4.2462) loss_scale 64.0000 (61.9031) mem 14939MB [2024-07-25 11:16:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][360/625] eta 0:01:51 lr 0.000069 wd 0.0500 time 0.4003 (0.4222) data time 0.0006 (0.0024) model time 0.3997 (0.4140) loss 5.4156 (6.4132) grad_norm 5.3085 (4.2287) loss_scale 64.0000 (61.9612) mem 14939MB [2024-07-25 11:16:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][370/625] eta 0:01:47 lr 0.000069 wd 0.0500 time 0.4012 (0.4216) data time 0.0008 (0.0023) model time 0.4004 (0.4136) loss 6.5899 (6.4089) grad_norm 8.3394 (4.2728) loss_scale 64.0000 (62.0162) mem 14939MB [2024-07-25 11:16:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][380/625] eta 0:01:43 lr 0.000069 wd 0.0500 time 0.4004 (0.4210) data time 0.0009 (0.0023) model time 0.3995 (0.4131) 
loss 6.6255 (6.4039) grad_norm 2.4585 (4.2268) loss_scale 64.0000 (62.0682) mem 14939MB [2024-07-25 11:16:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][390/625] eta 0:01:38 lr 0.000069 wd 0.0500 time 0.3995 (0.4205) data time 0.0007 (0.0023) model time 0.3988 (0.4127) loss 7.0765 (6.4057) grad_norm 2.4023 (4.2095) loss_scale 64.0000 (62.1176) mem 14939MB [2024-07-25 11:16:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][400/625] eta 0:01:34 lr 0.000069 wd 0.0500 time 0.3939 (0.4200) data time 0.0007 (0.0022) model time 0.3931 (0.4123) loss 5.7738 (6.4130) grad_norm 3.1476 (4.2014) loss_scale 64.0000 (62.1646) mem 14939MB [2024-07-25 11:16:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][410/625] eta 0:01:30 lr 0.000069 wd 0.0500 time 0.3967 (0.4195) data time 0.0009 (0.0022) model time 0.3958 (0.4119) loss 6.5582 (6.4044) grad_norm 2.3010 (4.1670) loss_scale 64.0000 (62.2092) mem 14939MB [2024-07-25 11:16:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][420/625] eta 0:01:25 lr 0.000069 wd 0.0500 time 0.4006 (0.4190) data time 0.0009 (0.0022) model time 0.3997 (0.4115) loss 6.4547 (6.4064) grad_norm 2.5532 (4.1371) loss_scale 64.0000 (62.2518) mem 14939MB [2024-07-25 11:16:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][430/625] eta 0:01:21 lr 0.000069 wd 0.0500 time 0.4053 (0.4190) data time 0.0007 (0.0021) model time 0.4047 (0.4117) loss 6.3331 (6.4129) grad_norm 4.5032 (4.1776) loss_scale 64.0000 (62.2923) mem 14939MB [2024-07-25 11:16:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][440/625] eta 0:01:17 lr 0.000069 wd 0.0500 time 0.4195 (0.4186) data time 0.0007 (0.0021) model time 0.4188 (0.4115) loss 6.4775 (6.4099) grad_norm 3.2968 (4.1846) loss_scale 64.0000 (62.3311) mem 14939MB [2024-07-25 11:16:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [260/300][450/625] eta 0:01:13 lr 0.000069 wd 0.0500 time 0.3918 (0.4186) data time 0.0010 (0.0021) model time 0.3908 (0.4116) loss 5.9681 (6.4150) grad_norm 2.4985 (4.1599) loss_scale 64.0000 (62.3681) mem 14939MB [2024-07-25 11:16:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][460/625] eta 0:01:09 lr 0.000069 wd 0.0500 time 0.3948 (0.4197) data time 0.0009 (0.0020) model time 0.3939 (0.4130) loss 5.7521 (6.4119) grad_norm 2.0301 (4.1351) loss_scale 64.0000 (62.4035) mem 14939MB [2024-07-25 11:16:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][470/625] eta 0:01:05 lr 0.000069 wd 0.0500 time 0.4003 (0.4209) data time 0.0009 (0.0020) model time 0.3995 (0.4145) loss 8.0469 (6.4119) grad_norm 2.2206 (4.1055) loss_scale 64.0000 (62.4374) mem 14939MB [2024-07-25 11:16:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][480/625] eta 0:01:01 lr 0.000069 wd 0.0500 time 0.3958 (0.4216) data time 0.0008 (0.0020) model time 0.3951 (0.4154) loss 5.9877 (6.4128) grad_norm 4.5020 (4.0954) loss_scale 64.0000 (62.4699) mem 14939MB [2024-07-25 11:17:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][490/625] eta 0:00:56 lr 0.000069 wd 0.0500 time 0.4044 (0.4218) data time 0.0011 (0.0020) model time 0.4033 (0.4157) loss 5.8100 (6.4098) grad_norm 28.2461 (4.1391) loss_scale 64.0000 (62.5010) mem 14939MB [2024-07-25 11:17:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][500/625] eta 0:00:52 lr 0.000069 wd 0.0500 time 0.4018 (0.4219) data time 0.0007 (0.0020) model time 0.4011 (0.4160) loss 5.4894 (6.4113) grad_norm 3.8708 (4.1246) loss_scale 64.0000 (62.5309) mem 14939MB [2024-07-25 11:17:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][510/625] eta 0:00:48 lr 0.000068 wd 0.0500 time 0.5502 (0.4222) data time 0.0009 (0.0019) model time 0.5493 (0.4164) loss 5.3768 (6.4158) 
grad_norm 5.0686 (4.1152) loss_scale 64.0000 (62.5597) mem 14939MB [2024-07-25 11:17:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][520/625] eta 0:00:44 lr 0.000068 wd 0.0500 time 0.3978 (0.4218) data time 0.0009 (0.0019) model time 0.3969 (0.4161) loss 5.7954 (6.4144) grad_norm 3.1163 (4.1031) loss_scale 64.0000 (62.5873) mem 14939MB [2024-07-25 11:17:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][530/625] eta 0:00:40 lr 0.000068 wd 0.0500 time 0.4007 (0.4215) data time 0.0011 (0.0019) model time 0.3996 (0.4158) loss 5.6279 (6.4222) grad_norm 2.5123 (4.0773) loss_scale 64.0000 (62.6139) mem 14939MB [2024-07-25 11:17:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][540/625] eta 0:00:35 lr 0.000068 wd 0.0500 time 0.4069 (0.4211) data time 0.0010 (0.0019) model time 0.4060 (0.4155) loss 6.3099 (6.4204) grad_norm 2.6743 (4.0484) loss_scale 64.0000 (62.6396) mem 14939MB [2024-07-25 11:17:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][550/625] eta 0:00:31 lr 0.000068 wd 0.0500 time 0.3936 (0.4207) data time 0.0007 (0.0019) model time 0.3929 (0.4151) loss 6.2179 (6.4259) grad_norm 2.6404 (4.0322) loss_scale 64.0000 (62.6642) mem 14939MB [2024-07-25 11:17:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][560/625] eta 0:00:27 lr 0.000068 wd 0.0500 time 0.3970 (0.4203) data time 0.0008 (0.0019) model time 0.3962 (0.4148) loss 6.3206 (6.4222) grad_norm 2.8553 (4.0127) loss_scale 64.0000 (62.6881) mem 14939MB [2024-07-25 11:17:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][570/625] eta 0:00:23 lr 0.000068 wd 0.0500 time 0.3971 (0.4200) data time 0.0007 (0.0018) model time 0.3964 (0.4145) loss 6.4540 (6.4211) grad_norm 3.6864 (4.0138) loss_scale 64.0000 (62.7110) mem 14939MB [2024-07-25 11:17:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[260/300][580/625] eta 0:00:18 lr 0.000068 wd 0.0500 time 0.9560 (0.4206) data time 0.0009 (0.0018) model time 0.9551 (0.4153) loss 6.9001 (6.4235) grad_norm 2.8614 (3.9991) loss_scale 64.0000 (62.7332) mem 14939MB [2024-07-25 11:17:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][590/625] eta 0:00:14 lr 0.000068 wd 0.0500 time 0.4041 (0.4203) data time 0.0009 (0.0018) model time 0.4033 (0.4151) loss 6.6500 (6.4176) grad_norm 2.5706 (3.9753) loss_scale 64.0000 (62.7547) mem 14939MB [2024-07-25 11:17:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][600/625] eta 0:00:10 lr 0.000068 wd 0.0500 time 0.3938 (0.4200) data time 0.0006 (0.0018) model time 0.3931 (0.4148) loss 6.8895 (6.4161) grad_norm 1.8666 (3.9534) loss_scale 64.0000 (62.7754) mem 14939MB [2024-07-25 11:17:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][610/625] eta 0:00:06 lr 0.000068 wd 0.0500 time 0.3959 (0.4197) data time 0.0004 (0.0018) model time 0.3955 (0.4145) loss 6.4280 (6.4211) grad_norm 3.8910 (3.9799) loss_scale 64.0000 (62.7954) mem 14939MB [2024-07-25 11:17:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][620/625] eta 0:00:02 lr 0.000068 wd 0.0500 time 0.3979 (0.4193) data time 0.0004 (0.0018) model time 0.3975 (0.4142) loss 6.4704 (6.4278) grad_norm 15.3083 (3.9867) loss_scale 64.0000 (62.8148) mem 14939MB [2024-07-25 11:17:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 260 training takes 0:04:21 [2024-07-25 11:17:57 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 11:17:58 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 11:17:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.348 (1.348) Loss 0.5439 (0.5439) Acc@1 90.479 (90.479) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 11:18:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.202) Loss 0.8125 (0.6562) Acc@1 82.324 (87.562) Acc@5 97.119 (97.976) Mem 14939MB [2024-07-25 11:18:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.147) Loss 0.9043 (0.7612) Acc@1 79.443 (84.528) Acc@5 96.045 (97.042) Mem 14939MB [2024-07-25 11:18:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.139 Acc@5 97.009 [2024-07-25 11:18:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-07-25 11:18:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 84.14% [2024-07-25 11:18:01 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 11:18:04 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 11:18:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.456 (0.456) Loss 0.5381 (0.5381) Acc@1 90.088 (90.088) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 11:18:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.8110 (0.6561) Acc@1 83.154 (87.380) Acc@5 97.021 (97.994) Mem 14939MB [2024-07-25 11:18:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9185 (0.7603) Acc@1 78.467 (84.447) Acc@5 95.654 (97.015) Mem 14939MB [2024-07-25 11:18:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.043 Acc@5 96.977 [2024-07-25 11:18:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-07-25 11:18:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.04% [2024-07-25 11:18:07 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 11:18:07 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 11:18:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][0/625] eta 0:11:24 lr 0.000068 wd 0.0500 time 1.0944 (1.0944) data time 0.7195 (0.7195) model time 0.0000 (0.0000) loss 7.3772 (7.3772) grad_norm 2.5094 (2.5094) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:18:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][10/625] eta 0:04:43 lr 0.000068 wd 0.0500 time 0.4038 (0.4607) data time 0.0009 (0.0662) model time 0.0000 (0.0000) loss 5.5113 (6.3059) grad_norm 2.6956 (3.7708) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:18:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][20/625] eta 0:04:25 lr 0.000068 wd 0.0500 time 0.4024 (0.4391) data time 0.0008 (0.0351) model time 0.0000 (0.0000) loss 7.3248 (6.6138) grad_norm 2.0908 (3.6906) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:18:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][30/625] eta 0:04:21 lr 0.000068 wd 0.0500 time 0.4017 (0.4387) data time 0.0008 (0.0240) model time 0.0000 (0.0000) loss 5.3349 (6.5405) grad_norm 3.2190 (3.5566) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:18:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][40/625] eta 0:04:13 lr 0.000068 wd 0.0500 time 0.3975 (0.4330) data time 0.0007 (0.0184) model time 0.0000 (0.0000) loss 5.2912 (6.3846) grad_norm 2.8304 (3.4014) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:18:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][50/625] eta 0:04:08 lr 0.000068 wd 0.0500 time 0.5758 (0.4327) data time 0.0006 (0.0149) model time 0.0000 (0.0000) loss 5.6463 (6.3382) grad_norm 3.3104 (3.4095) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:18:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][60/625] eta 0:04:05 lr 0.000068 wd 0.0500 time 0.3940 (0.4349) data time 
0.0006 (0.0126) model time 0.3934 (0.4452) loss 6.1328 (6.3301) grad_norm 3.2590 (3.4180) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:18:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][70/625] eta 0:04:02 lr 0.000068 wd 0.0500 time 0.3964 (0.4370) data time 0.0009 (0.0110) model time 0.3955 (0.4470) loss 6.7601 (6.3078) grad_norm 3.8506 (3.3754) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:18:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][80/625] eta 0:04:01 lr 0.000068 wd 0.0500 time 0.5882 (0.4431) data time 0.0008 (0.0097) model time 0.5874 (0.4599) loss 6.2204 (6.3322) grad_norm 2.5985 (3.2872) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:18:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][90/625] eta 0:03:57 lr 0.000068 wd 0.0500 time 0.3964 (0.4438) data time 0.0009 (0.0087) model time 0.3955 (0.4571) loss 5.2887 (6.3325) grad_norm 4.9908 (3.4399) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:18:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][100/625] eta 0:03:51 lr 0.000068 wd 0.0500 time 0.3975 (0.4409) data time 0.0006 (0.0080) model time 0.3969 (0.4484) loss 5.5493 (6.3104) grad_norm 2.2252 (3.3723) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:18:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][110/625] eta 0:03:46 lr 0.000067 wd 0.0500 time 0.4018 (0.4390) data time 0.0007 (0.0073) model time 0.4011 (0.4436) loss 5.8756 (6.3140) grad_norm 1.9288 (3.2840) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:19:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][120/625] eta 0:03:40 lr 0.000067 wd 0.0500 time 0.3970 (0.4358) data time 0.0008 (0.0068) model time 0.3962 (0.4372) loss 5.7946 (6.3139) grad_norm 3.8410 (3.2451) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:19:04 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][130/625] eta 0:03:34 lr 0.000067 wd 0.0500 time 0.3992 (0.4332) data time 0.0007 (0.0063) model time 0.3985 (0.4327) loss 6.1689 (6.3263) grad_norm 1.9305 (3.3243) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:19:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][140/625] eta 0:03:29 lr 0.000067 wd 0.0500 time 0.4015 (0.4311) data time 0.0007 (0.0059) model time 0.4008 (0.4293) loss 7.3854 (6.3378) grad_norm 12.9837 (3.3679) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:19:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][150/625] eta 0:03:23 lr 0.000067 wd 0.0500 time 0.4018 (0.4290) data time 0.0009 (0.0056) model time 0.4009 (0.4263) loss 5.3144 (6.3472) grad_norm 2.5479 (3.3417) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:19:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][160/625] eta 0:03:18 lr 0.000067 wd 0.0500 time 0.3962 (0.4272) data time 0.0009 (0.0053) model time 0.3953 (0.4238) loss 6.7627 (6.3477) grad_norm 2.1917 (3.3341) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:19:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][170/625] eta 0:03:13 lr 0.000067 wd 0.0500 time 0.3991 (0.4258) data time 0.0007 (0.0051) model time 0.3984 (0.4220) loss 6.1985 (6.3454) grad_norm 2.0937 (3.2915) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:19:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][180/625] eta 0:03:08 lr 0.000067 wd 0.0500 time 0.4172 (0.4245) data time 0.0007 (0.0048) model time 0.4165 (0.4205) loss 6.5703 (6.3326) grad_norm 6.1737 (3.2685) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:19:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][190/625] eta 0:03:04 lr 0.000067 wd 0.0500 time 0.3999 (0.4233) data time 0.0007 (0.0046) 
model time 0.3991 (0.4190) loss 7.0977 (6.3412) grad_norm 3.4828 (3.2719) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:19:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][200/625] eta 0:02:59 lr 0.000067 wd 0.0500 time 0.3948 (0.4221) data time 0.0008 (0.0044) model time 0.3940 (0.4176) loss 6.6986 (6.3393) grad_norm 4.8340 (3.2940) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:19:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][210/625] eta 0:02:54 lr 0.000067 wd 0.0500 time 0.4008 (0.4211) data time 0.0006 (0.0043) model time 0.4002 (0.4165) loss 7.0830 (6.3474) grad_norm 3.2806 (3.3431) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:19:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][220/625] eta 0:02:50 lr 0.000067 wd 0.0500 time 0.3930 (0.4203) data time 0.0009 (0.0041) model time 0.3921 (0.4157) loss 6.1976 (6.3477) grad_norm 7.0551 (3.3961) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:19:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][230/625] eta 0:02:45 lr 0.000067 wd 0.0500 time 0.4018 (0.4195) data time 0.0007 (0.0040) model time 0.4011 (0.4149) loss 5.4486 (6.3453) grad_norm 2.6000 (3.3923) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:19:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][240/625] eta 0:02:41 lr 0.000067 wd 0.0500 time 0.4000 (0.4193) data time 0.0007 (0.0038) model time 0.3993 (0.4149) loss 7.3995 (6.3546) grad_norm 2.0196 (3.4093) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:19:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][250/625] eta 0:02:37 lr 0.000067 wd 0.0500 time 0.3988 (0.4201) data time 0.0007 (0.0037) model time 0.3981 (0.4161) loss 7.1500 (6.3575) grad_norm 5.3808 (3.4134) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:19:57 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [261/300][260/625] eta 0:02:33 lr 0.000067 wd 0.0500 time 0.4017 (0.4202) data time 0.0009 (0.0036) model time 0.4008 (0.4163) loss 7.9558 (6.3637) grad_norm 2.3718 (3.3851) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:20:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][270/625] eta 0:02:29 lr 0.000067 wd 0.0500 time 0.5663 (0.4206) data time 0.0008 (0.0035) model time 0.5655 (0.4170) loss 5.7863 (6.3566) grad_norm 5.4885 (3.3747) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:20:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][280/625] eta 0:02:25 lr 0.000067 wd 0.0500 time 0.5836 (0.4227) data time 0.0010 (0.0034) model time 0.5826 (0.4196) loss 6.7882 (6.3660) grad_norm 7.0303 (3.5642) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:20:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][290/625] eta 0:02:22 lr 0.000067 wd 0.0500 time 0.5787 (0.4242) data time 0.0009 (0.0033) model time 0.5777 (0.4216) loss 6.0950 (6.3695) grad_norm 2.3240 (3.6035) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:20:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][300/625] eta 0:02:18 lr 0.000067 wd 0.0500 time 0.6071 (0.4263) data time 0.0006 (0.0033) model time 0.6065 (0.4241) loss 7.0365 (6.3814) grad_norm 2.9340 (3.6192) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:20:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][310/625] eta 0:02:14 lr 0.000067 wd 0.0500 time 0.3990 (0.4271) data time 0.0006 (0.0032) model time 0.3984 (0.4251) loss 7.3267 (6.3901) grad_norm 3.0496 (3.7599) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:20:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][320/625] eta 0:02:10 lr 0.000067 wd 0.0500 time 0.3992 (0.4266) data time 0.0007 (0.0031) model time 0.3985 (0.4246) 
loss 5.6611 (6.3781) grad_norm 2.4241 (3.7455) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:20:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][330/625] eta 0:02:05 lr 0.000066 wd 0.0500 time 0.3979 (0.4263) data time 0.0010 (0.0030) model time 0.3969 (0.4243) loss 7.3463 (6.3750) grad_norm 4.2339 (3.7306) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:20:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][340/625] eta 0:02:01 lr 0.000066 wd 0.0500 time 0.3944 (0.4255) data time 0.0008 (0.0030) model time 0.3937 (0.4234) loss 5.8497 (6.3728) grad_norm 2.5982 (3.7615) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:20:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][350/625] eta 0:01:56 lr 0.000066 wd 0.0500 time 0.4009 (0.4247) data time 0.0008 (0.0029) model time 0.4001 (0.4225) loss 6.1883 (6.3722) grad_norm 3.4302 (3.7505) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:20:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][360/625] eta 0:01:52 lr 0.000066 wd 0.0500 time 0.3983 (0.4240) data time 0.0009 (0.0029) model time 0.3974 (0.4217) loss 6.2151 (6.3656) grad_norm 3.6018 (3.7449) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:20:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][370/625] eta 0:01:47 lr 0.000066 wd 0.0500 time 0.3928 (0.4233) data time 0.0010 (0.0028) model time 0.3918 (0.4210) loss 6.4654 (6.3692) grad_norm 3.2987 (3.7431) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:20:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][380/625] eta 0:01:43 lr 0.000066 wd 0.0500 time 0.3993 (0.4227) data time 0.0007 (0.0027) model time 0.3986 (0.4203) loss 6.8400 (6.3648) grad_norm 3.2686 (3.8836) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:20:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [261/300][390/625] eta 0:01:39 lr 0.000066 wd 0.0500 time 0.4010 (0.4221) data time 0.0006 (0.0027) model time 0.4003 (0.4197) loss 5.9884 (6.3665) grad_norm 2.3066 (3.8563) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:20:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][400/625] eta 0:01:34 lr 0.000066 wd 0.0500 time 0.3922 (0.4215) data time 0.0007 (0.0027) model time 0.3915 (0.4191) loss 7.0336 (6.3785) grad_norm 2.5287 (3.8298) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:21:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][410/625] eta 0:01:30 lr 0.000066 wd 0.0500 time 0.3975 (0.4210) data time 0.0009 (0.0026) model time 0.3966 (0.4185) loss 6.3780 (6.3805) grad_norm 2.8859 (3.8164) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:21:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][420/625] eta 0:01:26 lr 0.000066 wd 0.0500 time 0.4013 (0.4206) data time 0.0008 (0.0026) model time 0.4005 (0.4181) loss 6.6196 (6.3819) grad_norm 2.2520 (3.8334) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:21:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][430/625] eta 0:01:21 lr 0.000066 wd 0.0500 time 0.4088 (0.4202) data time 0.0008 (0.0025) model time 0.4079 (0.4176) loss 7.2739 (6.3881) grad_norm 2.6428 (3.8117) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:21:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][440/625] eta 0:01:17 lr 0.000066 wd 0.0500 time 0.3989 (0.4197) data time 0.0007 (0.0025) model time 0.3982 (0.4172) loss 6.1424 (6.3824) grad_norm 3.1310 (3.8065) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:21:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][450/625] eta 0:01:13 lr 0.000066 wd 0.0500 time 0.3986 (0.4193) data time 0.0009 (0.0025) model time 0.3977 (0.4167) loss 6.6339 (6.3860) grad_norm 
3.3929 (3.8194) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:21:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][460/625] eta 0:01:09 lr 0.000066 wd 0.0500 time 0.3999 (0.4191) data time 0.0007 (0.0024) model time 0.3992 (0.4166) loss 6.4533 (6.3887) grad_norm 2.5139 (3.8062) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:21:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][470/625] eta 0:01:05 lr 0.000066 wd 0.0500 time 0.3991 (0.4195) data time 0.0007 (0.0024) model time 0.3984 (0.4170) loss 5.8440 (6.3845) grad_norm 2.0377 (3.7880) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:21:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][480/625] eta 0:01:00 lr 0.000066 wd 0.0500 time 0.3990 (0.4194) data time 0.0008 (0.0024) model time 0.3982 (0.4170) loss 6.4410 (6.3911) grad_norm 2.6013 (3.7843) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:21:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][490/625] eta 0:00:56 lr 0.000066 wd 0.0500 time 0.3997 (0.4194) data time 0.0009 (0.0023) model time 0.3988 (0.4170) loss 7.8271 (6.3906) grad_norm 2.7035 (3.7679) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:21:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][500/625] eta 0:00:52 lr 0.000066 wd 0.0500 time 0.5777 (0.4209) data time 0.0006 (0.0023) model time 0.5772 (0.4187) loss 6.4072 (6.3947) grad_norm 2.1918 (3.7454) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:21:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][510/625] eta 0:00:48 lr 0.000066 wd 0.0500 time 0.5935 (0.4214) data time 0.0010 (0.0023) model time 0.5925 (0.4193) loss 6.6956 (6.4032) grad_norm 4.7791 (3.7492) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:21:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][520/625] eta 
0:00:44 lr 0.000066 wd 0.0500 time 0.5904 (0.4224) data time 0.0009 (0.0023) model time 0.5895 (0.4204) loss 5.8745 (6.3989) grad_norm 1.7548 (3.7297) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:21:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][530/625] eta 0:00:40 lr 0.000066 wd 0.0500 time 0.5955 (0.4229) data time 0.0008 (0.0022) model time 0.5947 (0.4209) loss 6.0931 (6.4002) grad_norm 2.1119 (3.7166) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:21:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][540/625] eta 0:00:35 lr 0.000066 wd 0.0500 time 0.3987 (0.4228) data time 0.0007 (0.0022) model time 0.3980 (0.4209) loss 5.7972 (6.4037) grad_norm 4.4391 (3.7129) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:22:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][550/625] eta 0:00:31 lr 0.000066 wd 0.0500 time 0.4037 (0.4230) data time 0.0007 (0.0022) model time 0.4031 (0.4211) loss 5.4209 (6.4050) grad_norm 2.3257 (3.6982) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:22:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][560/625] eta 0:00:27 lr 0.000065 wd 0.0500 time 0.3998 (0.4226) data time 0.0008 (0.0022) model time 0.3990 (0.4207) loss 6.9391 (6.4063) grad_norm 21.4785 (3.7206) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:22:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][570/625] eta 0:00:23 lr 0.000065 wd 0.0500 time 0.4069 (0.4222) data time 0.0009 (0.0021) model time 0.4061 (0.4203) loss 7.4577 (6.4138) grad_norm 2.1370 (3.7173) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:22:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][580/625] eta 0:00:18 lr 0.000065 wd 0.0500 time 0.4013 (0.4219) data time 0.0009 (0.0021) model time 0.4003 (0.4200) loss 5.5119 (6.4098) grad_norm 2.7459 (3.6960) loss_scale 
64.0000 (64.0000) mem 14939MB [2024-07-25 11:22:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][590/625] eta 0:00:14 lr 0.000065 wd 0.0500 time 0.3991 (0.4216) data time 0.0007 (0.0021) model time 0.3984 (0.4196) loss 6.2394 (6.4098) grad_norm 2.1934 (3.6888) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:22:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][600/625] eta 0:00:10 lr 0.000065 wd 0.0500 time 0.4033 (0.4212) data time 0.0007 (0.0021) model time 0.4026 (0.4193) loss 6.6900 (6.4097) grad_norm 3.7532 (3.6804) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:22:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][610/625] eta 0:00:06 lr 0.000065 wd 0.0500 time 0.3947 (0.4209) data time 0.0004 (0.0021) model time 0.3943 (0.4189) loss 5.6777 (6.4087) grad_norm 4.1812 (3.9018) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:22:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][620/625] eta 0:00:02 lr 0.000065 wd 0.0500 time 0.3947 (0.4205) data time 0.0004 (0.0020) model time 0.3944 (0.4186) loss 5.8259 (6.4143) grad_norm 3.3174 (3.9118) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:22:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 261 training takes 0:04:22 [2024-07-25 11:22:30 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 11:22:31 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 11:22:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.473 (0.473) Loss 0.5366 (0.5366) Acc@1 90.479 (90.479) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 11:22:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.122) Loss 0.8052 (0.6513) Acc@1 82.910 (87.527) Acc@5 97.119 (98.002) Mem 14939MB [2024-07-25 11:22:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 0.9053 (0.7574) Acc@1 79.004 (84.570) Acc@5 95.898 (97.028) Mem 14939MB [2024-07-25 11:22:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.175 Acc@5 96.995 [2024-07-25 11:22:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-07-25 11:22:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 84.18% [2024-07-25 11:22:34 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 11:22:35 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 11:22:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 3.013 (3.013) Loss 0.5381 (0.5381) Acc@1 90.137 (90.137) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 11:22:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.352) Loss 0.8110 (0.6559) Acc@1 83.203 (87.358) Acc@5 97.119 (97.994) Mem 14939MB [2024-07-25 11:22:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.225) Loss 0.9175 (0.7600) Acc@1 78.564 (84.452) Acc@5 95.703 (97.019) Mem 14939MB [2024-07-25 11:22:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.043 Acc@5 96.977 [2024-07-25 11:22:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-07-25 11:22:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][0/625] eta 0:27:06 lr 0.000065 wd 0.0500 time 2.6017 (2.6017) data time 0.5810 (0.5810) model time 0.0000 (0.0000) loss 6.7467 (6.7467) grad_norm 2.3352 (2.3352) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:22:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][10/625] eta 0:06:09 lr 0.000065 wd 0.0500 time 0.4064 (0.6015) data time 0.0006 (0.0536) model time 0.0000 (0.0000) loss 6.3683 (6.5982) grad_norm 2.4538 (2.9798) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:22:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][20/625] eta 0:05:06 lr 0.000065 wd 0.0500 time 0.4011 (0.5067) data time 0.0007 (0.0285) model time 0.0000 (0.0000) loss 6.3048 (6.5483) grad_norm 3.1429 (3.7025) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:22:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][30/625] eta 0:04:41 lr 0.000065 wd 0.0500 time 0.3991 (0.4727) data time 0.0007 (0.0196) model time 0.0000 (0.0000) loss 6.4143 (6.5284) grad_norm 2.6788 
(3.7114) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:22:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][40/625] eta 0:04:26 lr 0.000065 wd 0.0500 time 0.3976 (0.4549) data time 0.0007 (0.0150) model time 0.0000 (0.0000) loss 7.4523 (6.5002) grad_norm 3.1534 (3.7147) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:23:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][50/625] eta 0:04:20 lr 0.000065 wd 0.0500 time 0.3958 (0.4536) data time 0.0007 (0.0196) model time 0.0000 (0.0000) loss 7.1131 (6.4604) grad_norm 67.4755 (4.8410) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:23:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][60/625] eta 0:04:11 lr 0.000065 wd 0.0500 time 0.3993 (0.4445) data time 0.0007 (0.0165) model time 0.3986 (0.3973) loss 6.8750 (6.4839) grad_norm 2.5077 (4.9862) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:23:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][70/625] eta 0:04:06 lr 0.000065 wd 0.0500 time 0.4039 (0.4433) data time 0.0008 (0.0143) model time 0.4031 (0.4162) loss 5.3457 (6.4958) grad_norm 2.1468 (4.7202) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:23:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][80/625] eta 0:04:00 lr 0.000065 wd 0.0500 time 0.4017 (0.4409) data time 0.0007 (0.0127) model time 0.4011 (0.4184) loss 5.5813 (6.4913) grad_norm 1.7281 (4.5741) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:23:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][90/625] eta 0:03:57 lr 0.000065 wd 0.0500 time 0.6248 (0.4447) data time 0.0011 (0.0114) model time 0.6237 (0.4323) loss 7.6576 (6.4496) grad_norm 2.1926 (4.3989) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:23:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][100/625] eta 0:03:53 lr 
0.000065 wd 0.0500 time 0.3967 (0.4448) data time 0.0007 (0.0103) model time 0.3960 (0.4350) loss 6.4796 (6.4325) grad_norm 4.5105 (4.3200) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:23:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][110/625] eta 0:03:50 lr 0.000065 wd 0.0500 time 0.5582 (0.4472) data time 0.0007 (0.0095) model time 0.5574 (0.4408) loss 6.7771 (6.4066) grad_norm 3.1558 (4.2170) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:23:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][120/625] eta 0:03:46 lr 0.000065 wd 0.0500 time 0.4027 (0.4476) data time 0.0007 (0.0088) model time 0.4020 (0.4422) loss 6.0167 (6.3752) grad_norm 3.3947 (4.0957) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:23:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][130/625] eta 0:03:40 lr 0.000065 wd 0.0500 time 0.3951 (0.4460) data time 0.0007 (0.0082) model time 0.3943 (0.4403) loss 6.1340 (6.3839) grad_norm 3.0247 (4.1483) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:23:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][140/625] eta 0:03:35 lr 0.000065 wd 0.0500 time 0.3970 (0.4442) data time 0.0010 (0.0077) model time 0.3959 (0.4380) loss 6.7949 (6.3934) grad_norm 6.1369 (4.0992) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:23:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][150/625] eta 0:03:34 lr 0.000065 wd 0.0500 time 0.3975 (0.4515) data time 0.0008 (0.0072) model time 0.3967 (0.4496) loss 6.9992 (6.4083) grad_norm 2.9072 (4.0401) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:23:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][160/625] eta 0:03:28 lr 0.000064 wd 0.0500 time 0.3958 (0.4483) data time 0.0007 (0.0068) model time 0.3951 (0.4450) loss 6.1949 (6.3984) grad_norm 3.2838 (3.9571) loss_scale 64.0000 (64.0000) 
mem 14939MB [2024-07-25 11:23:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][170/625] eta 0:03:22 lr 0.000064 wd 0.0500 time 0.3932 (0.4458) data time 0.0007 (0.0065) model time 0.3925 (0.4416) loss 6.1518 (6.4073) grad_norm 2.9797 (3.8934) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:24:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][180/625] eta 0:03:17 lr 0.000064 wd 0.0500 time 0.3984 (0.4433) data time 0.0009 (0.0062) model time 0.3975 (0.4383) loss 7.6870 (6.4012) grad_norm 2.4558 (3.8286) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:24:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][190/625] eta 0:03:11 lr 0.000064 wd 0.0500 time 0.4021 (0.4410) data time 0.0006 (0.0059) model time 0.4015 (0.4355) loss 7.4244 (6.4177) grad_norm 3.5846 (3.7901) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:24:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][200/625] eta 0:03:06 lr 0.000064 wd 0.0500 time 0.4025 (0.4389) data time 0.0011 (0.0056) model time 0.4014 (0.4330) loss 6.0383 (6.4284) grad_norm 3.6292 (3.7915) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:24:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][210/625] eta 0:03:01 lr 0.000064 wd 0.0500 time 0.4028 (0.4372) data time 0.0008 (0.0054) model time 0.4021 (0.4310) loss 5.6367 (6.4304) grad_norm 2.0365 (3.7678) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:24:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][220/625] eta 0:02:56 lr 0.000064 wd 0.0500 time 0.3993 (0.4364) data time 0.0006 (0.0052) model time 0.3987 (0.4304) loss 5.4687 (6.4190) grad_norm 2.3203 (3.8648) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:24:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][230/625] eta 0:02:51 lr 0.000064 wd 0.0500 time 0.4014 
(0.4348) data time 0.0009 (0.0050) model time 0.4006 (0.4286) loss 6.3604 (6.4211) grad_norm 3.2656 (3.8332) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:24:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][240/625] eta 0:02:46 lr 0.000064 wd 0.0500 time 0.4069 (0.4333) data time 0.0007 (0.0048) model time 0.4063 (0.4269) loss 6.1808 (6.4231) grad_norm 3.3780 (3.8106) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:24:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][250/625] eta 0:02:42 lr 0.000064 wd 0.0500 time 0.4002 (0.4320) data time 0.0006 (0.0047) model time 0.3996 (0.4256) loss 6.9251 (6.4302) grad_norm 3.9659 (3.7726) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:24:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][260/625] eta 0:02:37 lr 0.000064 wd 0.0500 time 0.3973 (0.4308) data time 0.0009 (0.0045) model time 0.3964 (0.4244) loss 6.2502 (6.4282) grad_norm 2.3360 (3.7469) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:24:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][270/625] eta 0:02:32 lr 0.000064 wd 0.0500 time 0.4038 (0.4297) data time 0.0008 (0.0044) model time 0.4030 (0.4233) loss 7.4806 (6.4326) grad_norm 2.7079 (3.7290) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:24:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][280/625] eta 0:02:27 lr 0.000064 wd 0.0500 time 0.3920 (0.4286) data time 0.0008 (0.0043) model time 0.3912 (0.4222) loss 5.8906 (6.4222) grad_norm 1.9306 (3.6982) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:24:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][290/625] eta 0:02:23 lr 0.000064 wd 0.0500 time 0.4023 (0.4281) data time 0.0009 (0.0041) model time 0.4014 (0.4219) loss 7.2245 (6.4212) grad_norm 2.8147 (3.6764) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 
11:24:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][300/625] eta 0:02:19 lr 0.000064 wd 0.0500 time 0.6098 (0.4286) data time 0.0008 (0.0040) model time 0.6090 (0.4227) loss 5.4825 (6.4238) grad_norm 2.5651 (3.6547) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:24:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][310/625] eta 0:02:15 lr 0.000064 wd 0.0500 time 0.5759 (0.4289) data time 0.0009 (0.0039) model time 0.5750 (0.4232) loss 6.9755 (6.4379) grad_norm 2.4204 (3.6354) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:24:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][320/625] eta 0:02:11 lr 0.000064 wd 0.0500 time 0.4036 (0.4296) data time 0.0006 (0.0038) model time 0.4030 (0.4242) loss 5.7910 (6.4334) grad_norm 5.8413 (3.6193) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:25:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][330/625] eta 0:02:06 lr 0.000064 wd 0.0500 time 0.3962 (0.4304) data time 0.0006 (0.0038) model time 0.3956 (0.4253) loss 6.7296 (6.4289) grad_norm 2.8775 (3.6159) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:25:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][340/625] eta 0:02:02 lr 0.000064 wd 0.0500 time 0.3958 (0.4314) data time 0.0007 (0.0037) model time 0.3951 (0.4267) loss 6.7069 (6.4462) grad_norm 5.5568 (3.6189) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:25:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][350/625] eta 0:01:58 lr 0.000064 wd 0.0500 time 0.5843 (0.4319) data time 0.0007 (0.0036) model time 0.5836 (0.4273) loss 6.1681 (6.4467) grad_norm 2.7425 (3.6077) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:25:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][360/625] eta 0:01:54 lr 0.000064 wd 0.0500 time 0.3989 (0.4314) data time 0.0007 
(0.0035) model time 0.3982 (0.4269) loss 6.2330 (6.4421) grad_norm 2.6774 (3.5885) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:25:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][370/625] eta 0:01:49 lr 0.000064 wd 0.0500 time 0.4036 (0.4313) data time 0.0009 (0.0034) model time 0.4027 (0.4269) loss 7.1400 (6.4463) grad_norm 1.9819 (3.5773) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:25:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][380/625] eta 0:01:45 lr 0.000064 wd 0.0500 time 0.4019 (0.4305) data time 0.0007 (0.0034) model time 0.4012 (0.4261) loss 6.8955 (6.4407) grad_norm 2.3929 (3.5594) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:25:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][390/625] eta 0:01:40 lr 0.000063 wd 0.0500 time 0.3973 (0.4298) data time 0.0009 (0.0033) model time 0.3964 (0.4253) loss 7.2569 (6.4471) grad_norm 6.6962 (3.5660) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:25:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][400/625] eta 0:01:36 lr 0.000063 wd 0.0500 time 0.4021 (0.4290) data time 0.0006 (0.0032) model time 0.4015 (0.4246) loss 6.4002 (6.4494) grad_norm 3.1435 (3.5596) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:25:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][410/625] eta 0:01:32 lr 0.000063 wd 0.0500 time 0.3999 (0.4283) data time 0.0007 (0.0032) model time 0.3992 (0.4238) loss 5.8065 (6.4536) grad_norm 1.8939 (3.5488) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:25:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][420/625] eta 0:01:27 lr 0.000063 wd 0.0500 time 0.3960 (0.4276) data time 0.0009 (0.0031) model time 0.3951 (0.4231) loss 7.1568 (6.4555) grad_norm 3.0941 (3.5392) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:25:44 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][430/625] eta 0:01:23 lr 0.000063 wd 0.0500 time 0.3985 (0.4269) data time 0.0008 (0.0031) model time 0.3976 (0.4225) loss 6.5471 (6.4536) grad_norm 5.5337 (3.5740) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:25:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][440/625] eta 0:01:18 lr 0.000063 wd 0.0500 time 0.3973 (0.4267) data time 0.0006 (0.0030) model time 0.3967 (0.4223) loss 6.3411 (6.4514) grad_norm 3.9927 (3.5735) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:25:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][450/625] eta 0:01:14 lr 0.000063 wd 0.0500 time 0.3977 (0.4261) data time 0.0008 (0.0030) model time 0.3969 (0.4217) loss 6.6819 (6.4510) grad_norm 2.5493 (3.5669) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:25:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][460/625] eta 0:01:10 lr 0.000063 wd 0.0500 time 0.3997 (0.4256) data time 0.0006 (0.0029) model time 0.3991 (0.4212) loss 5.7141 (6.4540) grad_norm 3.5370 (3.5559) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:26:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][470/625] eta 0:01:05 lr 0.000063 wd 0.0500 time 0.4124 (0.4251) data time 0.0006 (0.0029) model time 0.4118 (0.4207) loss 6.5562 (6.4550) grad_norm 2.2776 (3.6199) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:26:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][480/625] eta 0:01:01 lr 0.000063 wd 0.0500 time 0.3973 (0.4245) data time 0.0008 (0.0028) model time 0.3965 (0.4202) loss 6.6542 (6.4486) grad_norm 2.4307 (3.5983) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:26:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][490/625] eta 0:00:57 lr 0.000063 wd 0.0500 time 0.3975 (0.4240) data time 0.0007 (0.0028) 
model time 0.3969 (0.4198) loss 5.6369 (6.4467) grad_norm 2.9570 (3.5859) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:26:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][500/625] eta 0:00:52 lr 0.000063 wd 0.0500 time 0.3959 (0.4235) data time 0.0009 (0.0028) model time 0.3950 (0.4193) loss 6.2619 (6.4443) grad_norm 2.6463 (3.6067) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:26:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][510/625] eta 0:00:48 lr 0.000063 wd 0.0500 time 0.3987 (0.4234) data time 0.0006 (0.0027) model time 0.3981 (0.4191) loss 6.3072 (6.4485) grad_norm 4.6561 (3.5969) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:26:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][520/625] eta 0:00:44 lr 0.000063 wd 0.0500 time 0.3979 (0.4232) data time 0.0008 (0.0027) model time 0.3970 (0.4191) loss 6.2821 (6.4414) grad_norm 2.2626 (3.5854) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:26:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][530/625] eta 0:00:40 lr 0.000063 wd 0.0500 time 0.3942 (0.4238) data time 0.0008 (0.0027) model time 0.3933 (0.4198) loss 6.1109 (6.4446) grad_norm 3.0236 (3.5832) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:26:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][540/625] eta 0:00:36 lr 0.000063 wd 0.0500 time 0.3996 (0.4244) data time 0.0007 (0.0026) model time 0.3989 (0.4205) loss 5.5900 (6.4477) grad_norm 2.8198 (3.5816) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:26:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][550/625] eta 0:00:31 lr 0.000063 wd 0.0500 time 0.3925 (0.4252) data time 0.0006 (0.0026) model time 0.3918 (0.4214) loss 5.1650 (6.4397) grad_norm 4.2242 (3.5878) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:26:39 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [262/300][560/625] eta 0:00:27 lr 0.000063 wd 0.0500 time 0.3955 (0.4261) data time 0.0008 (0.0026) model time 0.3947 (0.4225) loss 6.7242 (6.4345) grad_norm 7.9553 (3.5903) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:26:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][570/625] eta 0:00:23 lr 0.000063 wd 0.0500 time 0.3964 (0.4260) data time 0.0006 (0.0025) model time 0.3958 (0.4225) loss 6.3187 (6.4305) grad_norm 3.3741 (3.6019) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:26:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][580/625] eta 0:00:19 lr 0.000063 wd 0.0500 time 0.3997 (0.4262) data time 0.0009 (0.0025) model time 0.3988 (0.4227) loss 6.9011 (6.4275) grad_norm 2.3065 (3.5957) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:26:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][590/625] eta 0:00:14 lr 0.000063 wd 0.0500 time 0.3940 (0.4262) data time 0.0008 (0.0025) model time 0.3931 (0.4228) loss 7.8646 (6.4322) grad_norm 2.5038 (3.5800) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:26:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][600/625] eta 0:00:10 lr 0.000063 wd 0.0500 time 0.3993 (0.4258) data time 0.0007 (0.0024) model time 0.3986 (0.4223) loss 6.1378 (6.4347) grad_norm 2.3694 (3.5671) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:27:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][610/625] eta 0:00:06 lr 0.000063 wd 0.0500 time 0.4022 (0.4253) data time 0.0005 (0.0024) model time 0.4017 (0.4219) loss 6.9054 (6.4354) grad_norm 2.4753 (3.5672) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:27:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][620/625] eta 0:00:02 lr 0.000062 wd 0.0500 time 0.3933 (0.4249) data time 0.0005 (0.0024) model time 0.3929 (0.4215) 
loss 6.7020 (6.4333) grad_norm 3.1738 (3.5691) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:27:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 262 training takes 0:04:25 [2024-07-25 11:27:05 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 11:27:06 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 11:27:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.460 (0.460) Loss 0.5532 (0.5532) Acc@1 90.186 (90.186) Acc@5 98.779 (98.779) Mem 14939MB [2024-07-25 11:27:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.122) Loss 0.8101 (0.6639) Acc@1 83.057 (87.522) Acc@5 97.021 (97.954) Mem 14939MB [2024-07-25 11:27:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 0.9180 (0.7664) Acc@1 79.053 (84.508) Acc@5 95.898 (97.028) Mem 14939MB [2024-07-25 11:27:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.115 Acc@5 96.997 [2024-07-25 11:27:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-07-25 11:27:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.777 (0.777) Loss 0.5381 (0.5381) Acc@1 90.088 (90.088) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 11:27:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.158) Loss 0.8110 (0.6559) Acc@1 83.252 (87.371) Acc@5 97.070 (97.998) Mem 14939MB [2024-07-25 11:27:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 0.9175 (0.7599) Acc@1 78.564 (84.461) Acc@5 95.752 (97.033) Mem 14939MB [2024-07-25 11:27:12 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 445): INFO * Acc@1 84.061 Acc@5 96.991 [2024-07-25 11:27:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-07-25 11:27:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.06% [2024-07-25 11:27:12 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 11:27:13 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 11:27:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][0/625] eta 0:07:24 lr 0.000062 wd 0.0500 time 0.7118 (0.7118) data time 0.3340 (0.3340) model time 0.0000 (0.0000) loss 6.9324 (6.9324) grad_norm 1.9258 (1.9258) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:27:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][10/625] eta 0:04:23 lr 0.000062 wd 0.0500 time 0.3951 (0.4292) data time 0.0007 (0.0311) model time 0.0000 (0.0000) loss 6.4101 (6.7648) grad_norm 2.9324 (3.5901) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:27:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][20/625] eta 0:04:10 lr 0.000062 wd 0.0500 time 0.3958 (0.4142) data time 0.0008 (0.0167) model time 0.0000 (0.0000) loss 5.5620 (6.6044) grad_norm 3.7424 (3.1599) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:27:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][30/625] eta 0:04:03 lr 0.000062 wd 0.0500 time 0.3984 (0.4097) data time 0.0009 (0.0116) model time 0.0000 (0.0000) loss 6.9663 (6.5554) grad_norm 2.2639 (3.1219) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:27:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][40/625] eta 0:03:58 lr 0.000062 wd 
0.0500 time 0.3990 (0.4072) data time 0.0008 (0.0089) model time 0.0000 (0.0000) loss 6.3649 (6.5112) grad_norm 3.1301 (3.8461) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:27:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][50/625] eta 0:03:53 lr 0.000062 wd 0.0500 time 0.3973 (0.4059) data time 0.0007 (0.0073) model time 0.0000 (0.0000) loss 5.2197 (6.4189) grad_norm 2.4689 (3.6785) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:27:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][60/625] eta 0:03:48 lr 0.000062 wd 0.0500 time 0.3987 (0.4050) data time 0.0009 (0.0063) model time 0.3978 (0.3995) loss 7.0239 (6.4305) grad_norm 2.6290 (3.5590) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:27:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][70/625] eta 0:03:44 lr 0.000062 wd 0.0500 time 0.3958 (0.4041) data time 0.0011 (0.0055) model time 0.3948 (0.3985) loss 7.0052 (6.4479) grad_norm 3.9045 (3.6469) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:27:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][80/625] eta 0:03:39 lr 0.000062 wd 0.0500 time 0.3976 (0.4036) data time 0.0007 (0.0049) model time 0.3970 (0.3989) loss 6.6300 (6.4518) grad_norm 2.8658 (3.8239) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:27:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][90/625] eta 0:03:35 lr 0.000062 wd 0.0500 time 0.3981 (0.4032) data time 0.0006 (0.0045) model time 0.3975 (0.3989) loss 6.1583 (6.3734) grad_norm 3.0367 (3.8757) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:27:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][100/625] eta 0:03:32 lr 0.000062 wd 0.0500 time 0.4000 (0.4039) data time 0.0009 (0.0041) model time 0.3991 (0.4010) loss 7.0210 (6.3840) grad_norm 3.6028 (3.9365) loss_scale 64.0000 (64.0000) mem 14939MB 
[2024-07-25 11:27:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][110/625] eta 0:03:29 lr 0.000062 wd 0.0500 time 0.3954 (0.4067) data time 0.0008 (0.0038) model time 0.3946 (0.4066) loss 6.5120 (6.4076) grad_norm 2.8205 (3.9175) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:28:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][120/625] eta 0:03:26 lr 0.000062 wd 0.0500 time 0.5718 (0.4092) data time 0.0009 (0.0036) model time 0.5710 (0.4107) loss 5.9142 (6.4224) grad_norm 3.5910 (3.9244) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:28:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][130/625] eta 0:03:23 lr 0.000062 wd 0.0500 time 0.3973 (0.4119) data time 0.0008 (0.0034) model time 0.3965 (0.4149) loss 6.3638 (6.4291) grad_norm 3.8320 (3.9629) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:28:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][140/625] eta 0:03:20 lr 0.000062 wd 0.0500 time 0.4005 (0.4135) data time 0.0009 (0.0032) model time 0.3996 (0.4169) loss 7.0480 (6.4475) grad_norm 2.9112 (4.0006) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 11:28:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][150/625] eta 0:03:18 lr 0.000062 wd 0.0500 time 0.6069 (0.4174) data time 0.0006 (0.0030) model time 0.6063 (0.4225) loss 6.7480 (6.4508) grad_norm 2.8340 (4.1058) loss_scale 128.0000 (65.2715) mem 14939MB [2024-07-25 11:28:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][160/625] eta 0:03:15 lr 0.000062 wd 0.0500 time 0.3976 (0.4202) data time 0.0006 (0.0029) model time 0.3970 (0.4260) loss 6.4479 (6.4294) grad_norm 3.2364 (4.0574) loss_scale 128.0000 (69.1677) mem 14939MB [2024-07-25 11:28:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][170/625] eta 0:03:11 lr 0.000062 wd 0.0500 time 0.3946 (0.4209) 
data time 0.0009 (0.0028) model time 0.3937 (0.4264) loss 6.7774 (6.4248) grad_norm 3.0061 (3.9845) loss_scale 128.0000 (72.6082) mem 14939MB [2024-07-25 11:28:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][180/625] eta 0:03:08 lr 0.000062 wd 0.0500 time 0.3975 (0.4226) data time 0.0008 (0.0027) model time 0.3966 (0.4283) loss 5.8852 (6.4188) grad_norm 2.9774 (4.0982) loss_scale 128.0000 (75.6685) mem 14939MB [2024-07-25 11:28:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][190/625] eta 0:03:03 lr 0.000062 wd 0.0500 time 0.3995 (0.4213) data time 0.0008 (0.0026) model time 0.3987 (0.4260) loss 6.1708 (6.4234) grad_norm 2.5577 (4.2224) loss_scale 128.0000 (78.4084) mem 14939MB [2024-07-25 11:28:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][200/625] eta 0:02:58 lr 0.000062 wd 0.0500 time 0.3973 (0.4211) data time 0.0008 (0.0025) model time 0.3965 (0.4255) loss 5.7826 (6.4366) grad_norm 3.9880 (4.1814) loss_scale 128.0000 (80.8756) mem 14939MB [2024-07-25 11:28:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][210/625] eta 0:02:54 lr 0.000062 wd 0.0500 time 0.3986 (0.4202) data time 0.0008 (0.0024) model time 0.3978 (0.4239) loss 6.3236 (6.4160) grad_norm 2.9569 (4.1447) loss_scale 128.0000 (83.1090) mem 14939MB [2024-07-25 11:28:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][220/625] eta 0:02:49 lr 0.000062 wd 0.0500 time 0.3967 (0.4191) data time 0.0008 (0.0023) model time 0.3959 (0.4222) loss 6.4993 (6.4045) grad_norm 3.1016 (4.0818) loss_scale 128.0000 (85.1403) mem 14939MB [2024-07-25 11:28:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][230/625] eta 0:02:45 lr 0.000061 wd 0.0500 time 0.3967 (0.4182) data time 0.0008 (0.0023) model time 0.3959 (0.4208) loss 5.6478 (6.3983) grad_norm 2.9563 (4.0252) loss_scale 128.0000 (86.9957) mem 14939MB [2024-07-25 11:28:53 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][240/625] eta 0:02:40 lr 0.000061 wd 0.0500 time 0.3987 (0.4173) data time 0.0009 (0.0022) model time 0.3979 (0.4196) loss 6.0641 (6.3899) grad_norm 3.1943 (3.9639) loss_scale 128.0000 (88.6971) mem 14939MB [2024-07-25 11:28:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][250/625] eta 0:02:36 lr 0.000061 wd 0.0500 time 0.3957 (0.4166) data time 0.0008 (0.0022) model time 0.3949 (0.4185) loss 6.5203 (6.3951) grad_norm 2.2466 (3.9209) loss_scale 128.0000 (90.2629) mem 14939MB [2024-07-25 11:29:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][260/625] eta 0:02:31 lr 0.000061 wd 0.0500 time 0.4009 (0.4160) data time 0.0006 (0.0021) model time 0.4002 (0.4175) loss 7.1772 (6.3950) grad_norm 4.2902 (3.9075) loss_scale 128.0000 (91.7088) mem 14939MB [2024-07-25 11:29:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][270/625] eta 0:02:27 lr 0.000061 wd 0.0500 time 0.3991 (0.4154) data time 0.0009 (0.0021) model time 0.3982 (0.4168) loss 5.9276 (6.3906) grad_norm 3.3659 (3.9053) loss_scale 128.0000 (93.0480) mem 14939MB [2024-07-25 11:29:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][280/625] eta 0:02:23 lr 0.000061 wd 0.0500 time 0.4026 (0.4148) data time 0.0009 (0.0020) model time 0.4017 (0.4160) loss 6.2435 (6.3866) grad_norm 2.4502 (3.9165) loss_scale 128.0000 (94.2918) mem 14939MB [2024-07-25 11:29:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][290/625] eta 0:02:18 lr 0.000061 wd 0.0500 time 0.3994 (0.4143) data time 0.0008 (0.0020) model time 0.3985 (0.4152) loss 5.8066 (6.3905) grad_norm 2.2493 (3.8931) loss_scale 128.0000 (95.4502) mem 14939MB [2024-07-25 11:29:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][300/625] eta 0:02:14 lr 0.000061 wd 0.0500 time 0.4005 (0.4139) data time 0.0007 
(0.0019) model time 0.3998 (0.4147) loss 6.7201 (6.3868) grad_norm 4.6768 (3.8936) loss_scale 128.0000 (96.5316) mem 14939MB [2024-07-25 11:29:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][310/625] eta 0:02:10 lr 0.000061 wd 0.0500 time 0.3978 (0.4134) data time 0.0008 (0.0019) model time 0.3970 (0.4140) loss 6.3399 (6.3911) grad_norm 5.8559 (3.8771) loss_scale 128.0000 (97.5434) mem 14939MB [2024-07-25 11:29:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][320/625] eta 0:02:06 lr 0.000061 wd 0.0500 time 0.5477 (0.4135) data time 0.0009 (0.0019) model time 0.5468 (0.4140) loss 5.8745 (6.3895) grad_norm 3.0805 (3.8579) loss_scale 128.0000 (98.4922) mem 14939MB [2024-07-25 11:29:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][330/625] eta 0:02:02 lr 0.000061 wd 0.0500 time 0.3994 (0.4136) data time 0.0009 (0.0018) model time 0.3985 (0.4141) loss 6.6527 (6.3873) grad_norm 3.6889 (3.8527) loss_scale 128.0000 (99.3837) mem 14939MB [2024-07-25 11:29:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][340/625] eta 0:01:57 lr 0.000061 wd 0.0500 time 0.3957 (0.4136) data time 0.0009 (0.0018) model time 0.3948 (0.4141) loss 6.1971 (6.3928) grad_norm 5.3545 (3.8722) loss_scale 128.0000 (100.2229) mem 14939MB [2024-07-25 11:29:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][350/625] eta 0:01:54 lr 0.000061 wd 0.0500 time 0.5802 (0.4152) data time 0.0009 (0.0018) model time 0.5793 (0.4159) loss 6.7348 (6.3941) grad_norm 2.3925 (3.8709) loss_scale 128.0000 (101.0142) mem 14939MB [2024-07-25 11:29:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][360/625] eta 0:01:50 lr 0.000061 wd 0.0500 time 0.5856 (0.4163) data time 0.0007 (0.0018) model time 0.5850 (0.4172) loss 5.8870 (6.3929) grad_norm 2.6265 (3.8456) loss_scale 128.0000 (101.7618) mem 14939MB [2024-07-25 11:29:48 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][370/625] eta 0:01:46 lr 0.000061 wd 0.0500 time 0.6055 (0.4184) data time 0.0009 (0.0017) model time 0.6046 (0.4195) loss 5.8440 (6.4081) grad_norm 2.9998 (3.8244) loss_scale 128.0000 (102.4690) mem 14939MB [2024-07-25 11:29:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][380/625] eta 0:01:42 lr 0.000061 wd 0.0500 time 0.5118 (0.4195) data time 0.0007 (0.0017) model time 0.5111 (0.4208) loss 5.4161 (6.4057) grad_norm 2.4092 (3.8039) loss_scale 128.0000 (103.1391) mem 14939MB [2024-07-25 11:29:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][390/625] eta 0:01:38 lr 0.000061 wd 0.0500 time 0.3967 (0.4194) data time 0.0009 (0.0017) model time 0.3959 (0.4205) loss 7.5034 (6.4092) grad_norm 3.7403 (3.7873) loss_scale 128.0000 (103.7749) mem 14939MB [2024-07-25 11:30:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][400/625] eta 0:01:34 lr 0.000061 wd 0.0500 time 0.4001 (0.4197) data time 0.0007 (0.0017) model time 0.3994 (0.4209) loss 5.4989 (6.4035) grad_norm 3.8043 (3.7696) loss_scale 128.0000 (104.3791) mem 14939MB [2024-07-25 11:30:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][410/625] eta 0:01:30 lr 0.000061 wd 0.0500 time 0.3986 (0.4192) data time 0.0007 (0.0017) model time 0.3979 (0.4202) loss 5.9344 (6.4072) grad_norm 2.3946 (3.7425) loss_scale 128.0000 (104.9538) mem 14939MB [2024-07-25 11:30:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][420/625] eta 0:01:25 lr 0.000061 wd 0.0500 time 0.3992 (0.4191) data time 0.0009 (0.0016) model time 0.3983 (0.4201) loss 6.5002 (6.4102) grad_norm 2.6300 (3.7214) loss_scale 128.0000 (105.5012) mem 14939MB [2024-07-25 11:30:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][430/625] eta 0:01:21 lr 0.000061 wd 0.0500 time 0.4068 (0.4187) data time 
0.0009 (0.0016) model time 0.4059 (0.4195) loss 6.4030 (6.4039) grad_norm 3.1755 (3.7118) loss_scale 128.0000 (106.0232) mem 14939MB [2024-07-25 11:30:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][440/625] eta 0:01:17 lr 0.000061 wd 0.0500 time 0.4000 (0.4182) data time 0.0006 (0.0016) model time 0.3994 (0.4190) loss 5.5595 (6.4027) grad_norm 3.0097 (3.7009) loss_scale 128.0000 (106.5215) mem 14939MB [2024-07-25 11:30:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][450/625] eta 0:01:13 lr 0.000061 wd 0.0500 time 0.3982 (0.4178) data time 0.0008 (0.0016) model time 0.3974 (0.4184) loss 7.2021 (6.4044) grad_norm 2.8730 (3.7057) loss_scale 128.0000 (106.9978) mem 14939MB [2024-07-25 11:30:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][460/625] eta 0:01:08 lr 0.000060 wd 0.0500 time 0.3943 (0.4174) data time 0.0007 (0.0016) model time 0.3936 (0.4179) loss 5.7953 (6.4061) grad_norm 2.2205 (3.6931) loss_scale 128.0000 (107.4534) mem 14939MB [2024-07-25 11:30:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][470/625] eta 0:01:04 lr 0.000060 wd 0.0500 time 0.3960 (0.4169) data time 0.0007 (0.0016) model time 0.3953 (0.4174) loss 6.6099 (6.4011) grad_norm 2.4052 (3.6800) loss_scale 128.0000 (107.8896) mem 14939MB [2024-07-25 11:30:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][480/625] eta 0:01:00 lr 0.000060 wd 0.0500 time 0.3995 (0.4166) data time 0.0010 (0.0015) model time 0.3985 (0.4170) loss 6.9557 (6.3934) grad_norm 3.3480 (3.7582) loss_scale 128.0000 (108.3077) mem 14939MB [2024-07-25 11:30:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][490/625] eta 0:00:56 lr 0.000060 wd 0.0500 time 0.3991 (0.4162) data time 0.0009 (0.0015) model time 0.3982 (0.4165) loss 5.9871 (6.3918) grad_norm 2.4755 (3.7390) loss_scale 128.0000 (108.7088) mem 14939MB [2024-07-25 11:30:41 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][500/625] eta 0:00:51 lr 0.000060 wd 0.0500 time 0.3959 (0.4158) data time 0.0007 (0.0015) model time 0.3953 (0.4161) loss 7.3189 (6.3981) grad_norm 13.2010 (3.7483) loss_scale 128.0000 (109.0938) mem 14939MB [2024-07-25 11:30:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][510/625] eta 0:00:47 lr 0.000060 wd 0.0500 time 0.3965 (0.4155) data time 0.0007 (0.0015) model time 0.3957 (0.4157) loss 6.5596 (6.4030) grad_norm 2.0735 (3.7285) loss_scale 128.0000 (109.4638) mem 14939MB [2024-07-25 11:30:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][520/625] eta 0:00:43 lr 0.000060 wd 0.0500 time 0.3993 (0.4151) data time 0.0006 (0.0015) model time 0.3987 (0.4153) loss 5.6627 (6.3985) grad_norm 3.4579 (3.7622) loss_scale 128.0000 (109.8196) mem 14939MB [2024-07-25 11:30:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][530/625] eta 0:00:39 lr 0.000060 wd 0.0500 time 0.3965 (0.4148) data time 0.0009 (0.0015) model time 0.3956 (0.4149) loss 6.2585 (6.3994) grad_norm 3.1770 (3.7524) loss_scale 128.0000 (110.1620) mem 14939MB [2024-07-25 11:30:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][540/625] eta 0:00:35 lr 0.000060 wd 0.0500 time 0.5793 (0.4149) data time 0.0008 (0.0015) model time 0.5785 (0.4150) loss 5.6843 (6.3986) grad_norm 3.3884 (3.7407) loss_scale 128.0000 (110.4917) mem 14939MB [2024-07-25 11:31:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][550/625] eta 0:00:31 lr 0.000060 wd 0.0500 time 0.3876 (0.4153) data time 0.0010 (0.0015) model time 0.3867 (0.4154) loss 7.3319 (6.4007) grad_norm 2.7821 (3.7464) loss_scale 128.0000 (110.8094) mem 14939MB [2024-07-25 11:31:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][560/625] eta 0:00:26 lr 0.000060 wd 0.0500 time 0.4063 (0.4152) data time 
0.0008 (0.0015) model time 0.4054 (0.4153) loss 6.4790 (6.3975) grad_norm 2.3389 (3.7364) loss_scale 128.0000 (111.1159) mem 14939MB [2024-07-25 11:31:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][570/625] eta 0:00:22 lr 0.000060 wd 0.0500 time 0.6048 (0.4160) data time 0.0007 (0.0014) model time 0.6042 (0.4161) loss 6.6052 (6.3987) grad_norm 1.7516 (3.7198) loss_scale 128.0000 (111.4116) mem 14939MB [2024-07-25 11:31:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][580/625] eta 0:00:18 lr 0.000060 wd 0.0500 time 0.5839 (0.4170) data time 0.0010 (0.0014) model time 0.5829 (0.4172) loss 5.4852 (6.3932) grad_norm 2.8984 (3.7050) loss_scale 128.0000 (111.6971) mem 14939MB [2024-07-25 11:31:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][590/625] eta 0:00:14 lr 0.000060 wd 0.0500 time 0.3949 (0.4179) data time 0.0009 (0.0014) model time 0.3940 (0.4182) loss 5.2559 (6.3967) grad_norm 2.8668 (3.6951) loss_scale 128.0000 (111.9729) mem 14939MB [2024-07-25 11:31:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][600/625] eta 0:00:10 lr 0.000060 wd 0.0500 time 0.5580 (0.4184) data time 0.0009 (0.0014) model time 0.5571 (0.4187) loss 6.9173 (6.3998) grad_norm 3.4902 (3.6934) loss_scale 128.0000 (112.2396) mem 14939MB [2024-07-25 11:31:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][610/625] eta 0:00:06 lr 0.000060 wd 0.0500 time 0.3944 (0.4185) data time 0.0007 (0.0014) model time 0.3937 (0.4188) loss 6.0795 (6.4027) grad_norm 4.6128 (3.6922) loss_scale 128.0000 (112.4975) mem 14939MB [2024-07-25 11:31:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][620/625] eta 0:00:02 lr 0.000060 wd 0.0500 time 0.3989 (0.4186) data time 0.0004 (0.0014) model time 0.3985 (0.4189) loss 6.1914 (6.4058) grad_norm 3.3532 (3.6787) loss_scale 128.0000 (112.7472) mem 14939MB [2024-07-25 11:31:34 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 263 training takes 0:04:21 [2024-07-25 11:31:34 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 11:31:36 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 11:31:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.449 (0.449) Loss 0.5508 (0.5508) Acc@1 89.990 (89.990) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 11:31:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.087 (0.119) Loss 0.8154 (0.6615) Acc@1 82.812 (87.393) Acc@5 97.168 (98.011) Mem 14939MB [2024-07-25 11:31:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9131 (0.7632) Acc@1 79.248 (84.515) Acc@5 95.996 (97.061) Mem 14939MB [2024-07-25 11:31:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.167 Acc@5 97.031 [2024-07-25 11:31:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-07-25 11:31:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.825 (0.825) Loss 0.5396 (0.5396) Acc@1 89.990 (89.990) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 11:31:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.157) Loss 0.8110 (0.6560) Acc@1 83.154 (87.371) Acc@5 97.168 (97.998) Mem 14939MB [2024-07-25 11:31:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 0.9155 (0.7599) Acc@1 78.564 (84.452) Acc@5 95.752 (97.026) Mem 14939MB [2024-07-25 11:31:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.059 Acc@5 96.985 [2024-07-25 11:31:41 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-07-25 11:31:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][0/625] eta 0:13:25 lr 0.000060 wd 0.0500 time 1.2886 (1.2886) data time 0.4014 (0.4014) model time 0.0000 (0.0000) loss 5.8691 (5.8691) grad_norm 4.6285 (4.6285) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:31:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][10/625] eta 0:04:54 lr 0.000060 wd 0.0500 time 0.3946 (0.4794) data time 0.0007 (0.0374) model time 0.0000 (0.0000) loss 4.9183 (6.3164) grad_norm 2.8443 (4.9933) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:31:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][20/625] eta 0:04:27 lr 0.000060 wd 0.0500 time 0.3967 (0.4417) data time 0.0006 (0.0200) model time 0.0000 (0.0000) loss 7.2159 (6.3378) grad_norm 2.4384 (4.0549) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:31:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][30/625] eta 0:04:14 lr 0.000060 wd 0.0500 time 0.3986 (0.4281) data time 0.0008 (0.0138) model time 0.0000 (0.0000) loss 5.3922 (6.1775) grad_norm 3.8842 (3.7142) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:31:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][40/625] eta 0:04:06 lr 0.000060 wd 0.0500 time 0.3996 (0.4213) data time 0.0007 (0.0106) model time 0.0000 (0.0000) loss 6.5678 (6.1920) grad_norm 2.4766 (3.7132) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:32:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][50/625] eta 0:03:59 lr 0.000060 wd 0.0500 time 0.3983 (0.4170) data time 0.0009 (0.0087) model time 0.0000 (0.0000) loss 5.9915 (6.2370) grad_norm 3.5010 (3.6016) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:32:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 
367): INFO Train: [264/300][60/625] eta 0:03:53 lr 0.000060 wd 0.0500 time 0.3990 (0.4141) data time 0.0006 (0.0074) model time 0.3984 (0.3984) loss 5.2491 (6.2881) grad_norm 2.9427 (3.5123) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:32:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][70/625] eta 0:03:48 lr 0.000060 wd 0.0500 time 0.3956 (0.4119) data time 0.0007 (0.0065) model time 0.3948 (0.3978) loss 7.2748 (6.3228) grad_norm 8.6584 (3.4966) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:32:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][80/625] eta 0:03:43 lr 0.000059 wd 0.0500 time 0.4006 (0.4102) data time 0.0006 (0.0058) model time 0.3999 (0.3977) loss 5.6287 (6.3492) grad_norm 2.2900 (3.4230) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:32:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][90/625] eta 0:03:38 lr 0.000059 wd 0.0500 time 0.4081 (0.4090) data time 0.0009 (0.0053) model time 0.4073 (0.3978) loss 6.1991 (6.3652) grad_norm 3.2839 (3.3988) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:32:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][100/625] eta 0:03:34 lr 0.000059 wd 0.0500 time 0.4002 (0.4078) data time 0.0006 (0.0048) model time 0.3996 (0.3976) loss 6.8813 (6.3896) grad_norm 2.9013 (3.3682) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:32:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][110/625] eta 0:03:29 lr 0.000059 wd 0.0500 time 0.4011 (0.4070) data time 0.0006 (0.0045) model time 0.4005 (0.3975) loss 7.0597 (6.4236) grad_norm 3.1518 (3.3736) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:32:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][120/625] eta 0:03:25 lr 0.000059 wd 0.0500 time 0.4005 (0.4065) data time 0.0007 (0.0042) model time 0.3998 (0.3979) loss 6.4602 
(6.4351) grad_norm 2.1763 (3.3786) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:32:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][130/625] eta 0:03:20 lr 0.000059 wd 0.0500 time 0.3941 (0.4059) data time 0.0008 (0.0039) model time 0.3933 (0.3980) loss 7.2816 (6.4412) grad_norm 5.3969 (3.4522) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:32:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][140/625] eta 0:03:18 lr 0.000059 wd 0.0500 time 0.5753 (0.4095) data time 0.0007 (0.0037) model time 0.5746 (0.4044) loss 7.0401 (6.4434) grad_norm 2.9401 (3.4518) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:32:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][150/625] eta 0:03:14 lr 0.000059 wd 0.0500 time 0.3961 (0.4088) data time 0.0009 (0.0035) model time 0.3952 (0.4037) loss 7.4433 (6.4505) grad_norm 2.9224 (3.4060) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:32:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][160/625] eta 0:03:11 lr 0.000059 wd 0.0500 time 0.5415 (0.4110) data time 0.0007 (0.0034) model time 0.5408 (0.4073) loss 7.1948 (6.4503) grad_norm 18.3490 (3.4807) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:32:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][170/625] eta 0:03:08 lr 0.000059 wd 0.0500 time 0.3970 (0.4135) data time 0.0007 (0.0032) model time 0.3962 (0.4111) loss 7.3502 (6.4438) grad_norm 2.6049 (3.4362) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:32:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][180/625] eta 0:03:06 lr 0.000059 wd 0.0500 time 0.5924 (0.4197) data time 0.0009 (0.0031) model time 0.5915 (0.4199) loss 6.1133 (6.4385) grad_norm 2.9835 (3.4160) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:33:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [264/300][190/625] eta 0:03:04 lr 0.000059 wd 0.0500 time 0.3945 (0.4233) data time 0.0007 (0.0030) model time 0.3937 (0.4248) loss 6.3935 (6.4528) grad_norm 2.8237 (3.3808) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:33:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][200/625] eta 0:03:00 lr 0.000059 wd 0.0500 time 0.4017 (0.4247) data time 0.0009 (0.0029) model time 0.4008 (0.4264) loss 6.4710 (6.4478) grad_norm 2.2638 (3.3547) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:33:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][210/625] eta 0:02:56 lr 0.000059 wd 0.0500 time 0.3977 (0.4247) data time 0.0006 (0.0028) model time 0.3971 (0.4263) loss 5.5529 (6.4353) grad_norm 3.0866 (3.3458) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:33:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][220/625] eta 0:02:52 lr 0.000059 wd 0.0500 time 0.3990 (0.4250) data time 0.0008 (0.0027) model time 0.3982 (0.4266) loss 6.5002 (6.4370) grad_norm 2.6010 (3.3561) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:33:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][230/625] eta 0:02:47 lr 0.000059 wd 0.0500 time 0.3948 (0.4238) data time 0.0008 (0.0026) model time 0.3939 (0.4249) loss 5.6518 (6.4267) grad_norm 2.9113 (3.4855) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:33:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][240/625] eta 0:02:42 lr 0.000059 wd 0.0500 time 0.3965 (0.4228) data time 0.0007 (0.0025) model time 0.3958 (0.4235) loss 6.9331 (6.4344) grad_norm 5.5204 (3.4869) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:33:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][250/625] eta 0:02:38 lr 0.000059 wd 0.0500 time 0.4012 (0.4218) data time 0.0008 (0.0025) model time 0.4003 (0.4222) loss 7.0958 
(6.4410) grad_norm 3.1572 (3.5011) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:33:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][260/625] eta 0:02:33 lr 0.000059 wd 0.0500 time 0.3954 (0.4209) data time 0.0007 (0.0024) model time 0.3947 (0.4210) loss 6.7408 (6.4463) grad_norm 3.2502 (3.5632) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:33:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][270/625] eta 0:02:29 lr 0.000059 wd 0.0500 time 0.4003 (0.4201) data time 0.0008 (0.0023) model time 0.3994 (0.4199) loss 7.6588 (6.4536) grad_norm 2.2756 (3.5503) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:33:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][280/625] eta 0:02:24 lr 0.000059 wd 0.0500 time 0.3995 (0.4194) data time 0.0009 (0.0023) model time 0.3986 (0.4190) loss 7.5502 (6.4688) grad_norm 5.6315 (3.5436) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:33:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][290/625] eta 0:02:20 lr 0.000059 wd 0.0500 time 0.4137 (0.4188) data time 0.0006 (0.0022) model time 0.4130 (0.4183) loss 5.5962 (6.4683) grad_norm 2.3446 (3.5021) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:33:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][300/625] eta 0:02:15 lr 0.000059 wd 0.0500 time 0.3968 (0.4182) data time 0.0008 (0.0022) model time 0.3960 (0.4175) loss 7.1025 (6.4769) grad_norm 3.2841 (3.5185) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:33:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][310/625] eta 0:02:11 lr 0.000059 wd 0.0500 time 0.4030 (0.4176) data time 0.0009 (0.0022) model time 0.4021 (0.4168) loss 6.3237 (6.4667) grad_norm 3.4701 (3.5063) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:33:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [264/300][320/625] eta 0:02:07 lr 0.000058 wd 0.0500 time 0.3980 (0.4170) data time 0.0007 (0.0021) model time 0.3973 (0.4162) loss 6.3156 (6.4761) grad_norm 3.1512 (3.4976) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:33:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][330/625] eta 0:02:02 lr 0.000058 wd 0.0500 time 0.3994 (0.4165) data time 0.0010 (0.0021) model time 0.3984 (0.4155) loss 6.9143 (6.4666) grad_norm 1.9878 (3.4791) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:34:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][340/625] eta 0:01:58 lr 0.000058 wd 0.0500 time 0.3974 (0.4159) data time 0.0006 (0.0020) model time 0.3967 (0.4149) loss 6.9447 (6.4702) grad_norm 3.4509 (3.4714) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:34:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][350/625] eta 0:01:54 lr 0.000058 wd 0.0500 time 0.3965 (0.4154) data time 0.0006 (0.0020) model time 0.3958 (0.4143) loss 6.1943 (6.4705) grad_norm 7.6752 (3.4682) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:34:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][360/625] eta 0:01:50 lr 0.000058 wd 0.0500 time 0.4017 (0.4156) data time 0.0008 (0.0020) model time 0.4009 (0.4145) loss 6.1417 (6.4769) grad_norm 4.7366 (3.4928) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:34:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][370/625] eta 0:01:45 lr 0.000058 wd 0.0500 time 0.4032 (0.4157) data time 0.0009 (0.0019) model time 0.4023 (0.4146) loss 6.6559 (6.4706) grad_norm 2.6068 (3.5171) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:34:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][380/625] eta 0:01:42 lr 0.000058 wd 0.0500 time 0.6074 (0.4163) data time 0.0007 (0.0019) model time 0.6067 (0.4154) loss 7.4955 
(6.4752) grad_norm 3.4987 (3.5503) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:34:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][390/625] eta 0:01:37 lr 0.000058 wd 0.0500 time 0.4006 (0.4169) data time 0.0005 (0.0019) model time 0.4001 (0.4160) loss 6.5434 (6.4776) grad_norm 2.3889 (3.5523) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:34:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][400/625] eta 0:01:34 lr 0.000058 wd 0.0500 time 0.3998 (0.4186) data time 0.0006 (0.0019) model time 0.3992 (0.4180) loss 5.6646 (6.4654) grad_norm 2.8514 (3.5310) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:34:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][410/625] eta 0:01:30 lr 0.000058 wd 0.0500 time 0.3978 (0.4194) data time 0.0009 (0.0018) model time 0.3969 (0.4189) loss 5.7568 (6.4618) grad_norm 2.3531 (3.5235) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:34:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][420/625] eta 0:01:26 lr 0.000058 wd 0.0500 time 0.4002 (0.4206) data time 0.0007 (0.0018) model time 0.3995 (0.4202) loss 6.6571 (6.4601) grad_norm 2.8654 (3.5240) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:34:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][430/625] eta 0:01:22 lr 0.000058 wd 0.0500 time 0.3942 (0.4208) data time 0.0008 (0.0018) model time 0.3934 (0.4204) loss 6.2506 (6.4598) grad_norm 4.0340 (3.5254) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:34:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][440/625] eta 0:01:17 lr 0.000058 wd 0.0500 time 0.3971 (0.4210) data time 0.0008 (0.0018) model time 0.3963 (0.4207) loss 7.4369 (6.4640) grad_norm 3.3722 (3.5550) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:34:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [264/300][450/625] eta 0:01:13 lr 0.000058 wd 0.0500 time 0.3990 (0.4205) data time 0.0008 (0.0017) model time 0.3982 (0.4201) loss 7.4817 (6.4664) grad_norm 2.4344 (3.5803) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:34:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][460/625] eta 0:01:09 lr 0.000058 wd 0.0500 time 0.3986 (0.4201) data time 0.0007 (0.0017) model time 0.3979 (0.4196) loss 6.9001 (6.4678) grad_norm 4.2736 (3.7894) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:34:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][470/625] eta 0:01:05 lr 0.000058 wd 0.0500 time 0.4002 (0.4196) data time 0.0008 (0.0017) model time 0.3993 (0.4191) loss 7.1459 (6.4703) grad_norm 7.6600 (3.8281) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:35:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][480/625] eta 0:01:00 lr 0.000058 wd 0.0500 time 0.3969 (0.4192) data time 0.0008 (0.0017) model time 0.3962 (0.4186) loss 7.2127 (6.4649) grad_norm 2.3237 (3.8171) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:35:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][490/625] eta 0:00:56 lr 0.000058 wd 0.0500 time 0.3995 (0.4187) data time 0.0008 (0.0017) model time 0.3986 (0.4181) loss 6.0875 (6.4630) grad_norm 2.9835 (3.8139) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:35:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][500/625] eta 0:00:52 lr 0.000058 wd 0.0500 time 0.3965 (0.4184) data time 0.0007 (0.0016) model time 0.3958 (0.4177) loss 5.5536 (6.4619) grad_norm 4.6822 (3.8350) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:35:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][510/625] eta 0:00:48 lr 0.000058 wd 0.0500 time 0.3993 (0.4180) data time 0.0009 (0.0016) model time 0.3984 (0.4173) loss 6.6299 
(6.4627) grad_norm 2.9685 (3.8068) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:35:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][520/625] eta 0:00:43 lr 0.000058 wd 0.0500 time 0.3994 (0.4176) data time 0.0006 (0.0016) model time 0.3988 (0.4168) loss 8.1452 (6.4709) grad_norm 3.1227 (3.7945) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:35:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][530/625] eta 0:00:39 lr 0.000058 wd 0.0500 time 0.4002 (0.4173) data time 0.0007 (0.0016) model time 0.3995 (0.4165) loss 6.2704 (6.4672) grad_norm 2.7501 (3.7879) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:35:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][540/625] eta 0:00:35 lr 0.000058 wd 0.0500 time 0.3967 (0.4169) data time 0.0009 (0.0016) model time 0.3958 (0.4161) loss 5.3327 (6.4644) grad_norm 3.1082 (3.7898) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:35:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][550/625] eta 0:00:31 lr 0.000058 wd 0.0500 time 0.3984 (0.4166) data time 0.0009 (0.0016) model time 0.3975 (0.4157) loss 6.7401 (6.4612) grad_norm 2.2038 (3.7862) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:35:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][560/625] eta 0:00:27 lr 0.000057 wd 0.0500 time 0.3990 (0.4163) data time 0.0006 (0.0016) model time 0.3984 (0.4154) loss 6.9971 (6.4566) grad_norm 4.0421 (3.8251) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:35:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][570/625] eta 0:00:22 lr 0.000057 wd 0.0500 time 0.4122 (0.4161) data time 0.0010 (0.0016) model time 0.4112 (0.4151) loss 6.4433 (6.4581) grad_norm 3.8636 (3.8146) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:35:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [264/300][580/625] eta 0:00:18 lr 0.000057 wd 0.0500 time 0.3983 (0.4163) data time 0.0008 (0.0015) model time 0.3975 (0.4154) loss 6.4899 (6.4588) grad_norm 3.6161 (3.7983) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:35:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][590/625] eta 0:00:14 lr 0.000057 wd 0.0500 time 0.4002 (0.4164) data time 0.0006 (0.0015) model time 0.3995 (0.4155) loss 6.8267 (6.4609) grad_norm 2.8801 (3.7829) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:35:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][600/625] eta 0:00:10 lr 0.000057 wd 0.0500 time 0.5765 (0.4170) data time 0.0009 (0.0015) model time 0.5756 (0.4162) loss 6.7669 (6.4572) grad_norm 2.4072 (3.7864) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:35:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][610/625] eta 0:00:06 lr 0.000057 wd 0.0500 time 0.3965 (0.4174) data time 0.0004 (0.0015) model time 0.3961 (0.4165) loss 6.0241 (6.4543) grad_norm 2.5278 (3.7710) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:36:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][620/625] eta 0:00:02 lr 0.000057 wd 0.0500 time 0.5664 (0.4187) data time 0.0004 (0.0015) model time 0.5660 (0.4180) loss 5.7290 (6.4547) grad_norm 3.9804 (3.7669) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:36:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 264 training takes 0:04:21 [2024-07-25 11:36:03 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 11:36:04 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 11:36:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.449 (0.449) Loss 0.5435 (0.5435) Acc@1 90.430 (90.430) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 11:36:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.130) Loss 0.8247 (0.6577) Acc@1 82.471 (87.447) Acc@5 97.168 (98.047) Mem 14939MB [2024-07-25 11:36:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.109) Loss 0.9092 (0.7601) Acc@1 78.906 (84.556) Acc@5 96.045 (97.082) Mem 14939MB [2024-07-25 11:36:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.187 Acc@5 97.055 [2024-07-25 11:36:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-07-25 11:36:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 84.19% [2024-07-25 11:36:06 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 11:36:07 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 11:36:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.466 (0.466) Loss 0.5391 (0.5391) Acc@1 90.039 (90.039) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 11:36:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.8110 (0.6558) Acc@1 83.252 (87.429) Acc@5 97.119 (97.994) Mem 14939MB [2024-07-25 11:36:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9150 (0.7595) Acc@1 78.613 (84.482) Acc@5 95.752 (97.017) Mem 14939MB [2024-07-25 11:36:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.087 Acc@5 96.977 [2024-07-25 11:36:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-07-25 11:36:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.09% [2024-07-25 11:36:10 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 11:36:11 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 11:36:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][0/625] eta 0:08:57 lr 0.000057 wd 0.0500 time 0.8594 (0.8594) data time 0.4803 (0.4803) model time 0.0000 (0.0000) loss 6.2577 (6.2577) grad_norm 3.6499 (3.6499) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:36:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][10/625] eta 0:05:09 lr 0.000057 wd 0.0500 time 0.3977 (0.5025) data time 0.0009 (0.0445) model time 0.0000 (0.0000) loss 5.6142 (6.3050) grad_norm 2.5874 (3.1053) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:36:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][20/625] eta 0:04:43 lr 0.000057 wd 0.0500 time 0.3997 (0.4687) data time 0.0007 (0.0237) model time 0.0000 (0.0000) loss 6.2131 (6.1336) grad_norm 2.4715 (3.0809) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:36:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][30/625] eta 0:04:31 lr 0.000057 wd 0.0500 time 0.3977 (0.4568) data time 0.0009 (0.0164) model time 0.0000 (0.0000) loss 5.4791 (6.1999) grad_norm 2.3644 (3.1634) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:36:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][40/625] eta 0:04:21 lr 0.000057 wd 0.0500 time 0.3964 (0.4470) data time 0.0008 (0.0126) model time 0.0000 (0.0000) loss 7.0378 (6.1666) grad_norm 3.3605 (3.3969) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:36:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][50/625] eta 0:04:11 lr 0.000057 wd 0.0500 time 0.4067 (0.4376) data time 0.0007 (0.0103) model time 0.0000 (0.0000) loss 6.2594 (6.2093) grad_norm 2.6132 (3.3007) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:36:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][60/625] eta 0:04:03 lr 0.000057 wd 0.0500 time 0.3992 (0.4313) 
data time 0.0008 (0.0087) model time 0.3984 (0.3983) loss 5.9205 (6.2140) grad_norm 2.2658 (3.3161) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:36:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][70/625] eta 0:03:56 lr 0.000057 wd 0.0500 time 0.3956 (0.4267) data time 0.0007 (0.0076) model time 0.3949 (0.3980) loss 5.4778 (6.2359) grad_norm 4.9716 (3.5758) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:36:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][80/625] eta 0:03:50 lr 0.000057 wd 0.0500 time 0.4011 (0.4233) data time 0.0007 (0.0068) model time 0.4004 (0.3982) loss 5.4797 (6.2893) grad_norm 2.7001 (3.5204) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:36:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][90/625] eta 0:03:44 lr 0.000057 wd 0.0500 time 0.3963 (0.4204) data time 0.0007 (0.0061) model time 0.3956 (0.3975) loss 5.3476 (6.2567) grad_norm 2.9031 (3.6040) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:36:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][100/625] eta 0:03:39 lr 0.000057 wd 0.0500 time 0.3977 (0.4182) data time 0.0008 (0.0056) model time 0.3968 (0.3975) loss 6.7045 (6.2957) grad_norm 3.5753 (3.8168) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:36:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][110/625] eta 0:03:34 lr 0.000057 wd 0.0500 time 0.3984 (0.4164) data time 0.0007 (0.0052) model time 0.3978 (0.3976) loss 7.1411 (6.3445) grad_norm 3.3602 (3.7740) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:37:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][120/625] eta 0:03:29 lr 0.000057 wd 0.0500 time 0.4011 (0.4151) data time 0.0009 (0.0048) model time 0.4002 (0.3978) loss 5.5821 (6.3503) grad_norm 2.5190 (3.7600) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
11:37:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][130/625] eta 0:03:24 lr 0.000057 wd 0.0500 time 0.3992 (0.4138) data time 0.0009 (0.0045) model time 0.3983 (0.3978) loss 6.8189 (6.3815) grad_norm 2.6365 (3.7203) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:37:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][140/625] eta 0:03:20 lr 0.000057 wd 0.0500 time 0.4034 (0.4126) data time 0.0009 (0.0043) model time 0.4025 (0.3976) loss 7.0187 (6.3952) grad_norm 2.6984 (3.6754) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:37:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][150/625] eta 0:03:15 lr 0.000057 wd 0.0500 time 0.3968 (0.4116) data time 0.0009 (0.0040) model time 0.3959 (0.3975) loss 6.3490 (6.4079) grad_norm 2.8922 (3.7424) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:37:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][160/625] eta 0:03:10 lr 0.000057 wd 0.0500 time 0.3949 (0.4107) data time 0.0009 (0.0038) model time 0.3939 (0.3974) loss 6.2515 (6.4017) grad_norm 2.5951 (3.7242) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:37:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][170/625] eta 0:03:07 lr 0.000057 wd 0.0500 time 0.4041 (0.4125) data time 0.0007 (0.0037) model time 0.4033 (0.4010) loss 6.9800 (6.4095) grad_norm 3.5752 (3.6846) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:37:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][180/625] eta 0:03:03 lr 0.000056 wd 0.0500 time 0.3989 (0.4134) data time 0.0007 (0.0035) model time 0.3982 (0.4030) loss 6.1773 (6.4172) grad_norm 1.9870 (3.6832) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:37:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][190/625] eta 0:02:59 lr 0.000056 wd 0.0500 time 0.4072 (0.4127) data 
time 0.0007 (0.0034) model time 0.4065 (0.4028) loss 5.8317 (6.4077) grad_norm 3.5553 (3.6405) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:37:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][200/625] eta 0:02:57 lr 0.000056 wd 0.0500 time 0.6035 (0.4166) data time 0.0009 (0.0033) model time 0.6026 (0.4086) loss 5.3827 (6.3970) grad_norm 4.3992 (3.6918) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:37:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][210/625] eta 0:02:53 lr 0.000056 wd 0.0500 time 0.4004 (0.4178) data time 0.0006 (0.0031) model time 0.3997 (0.4107) loss 5.9645 (6.4000) grad_norm 2.8778 (3.6779) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:37:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][220/625] eta 0:02:50 lr 0.000056 wd 0.0500 time 0.5530 (0.4199) data time 0.0007 (0.0030) model time 0.5523 (0.4137) loss 6.8628 (6.3986) grad_norm 3.1168 (3.6607) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:37:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][230/625] eta 0:02:45 lr 0.000056 wd 0.0500 time 0.4047 (0.4202) data time 0.0009 (0.0029) model time 0.4039 (0.4145) loss 6.9607 (6.4010) grad_norm 1.9874 (3.6596) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:37:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][240/625] eta 0:02:42 lr 0.000056 wd 0.0500 time 0.4059 (0.4211) data time 0.0008 (0.0029) model time 0.4051 (0.4158) loss 5.8408 (6.3979) grad_norm 4.5756 (3.6642) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:37:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][250/625] eta 0:02:38 lr 0.000056 wd 0.0500 time 0.3984 (0.4218) data time 0.0006 (0.0028) model time 0.3978 (0.4169) loss 5.2532 (6.3895) grad_norm 2.2683 (3.6917) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
11:38:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][260/625] eta 0:02:33 lr 0.000056 wd 0.0500 time 0.3996 (0.4212) data time 0.0006 (0.0027) model time 0.3990 (0.4164) loss 6.1028 (6.3767) grad_norm 2.1885 (3.7192) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:38:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][270/625] eta 0:02:29 lr 0.000056 wd 0.0500 time 0.3965 (0.4204) data time 0.0007 (0.0026) model time 0.3959 (0.4156) loss 6.5733 (6.3789) grad_norm 2.8596 (3.7592) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:38:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][280/625] eta 0:02:24 lr 0.000056 wd 0.0500 time 0.3978 (0.4196) data time 0.0008 (0.0026) model time 0.3969 (0.4148) loss 6.7638 (6.3809) grad_norm 2.4257 (3.7225) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:38:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][290/625] eta 0:02:20 lr 0.000056 wd 0.0500 time 0.3993 (0.4189) data time 0.0008 (0.0025) model time 0.3985 (0.4140) loss 7.2609 (6.3867) grad_norm 5.8052 (3.7295) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:38:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][300/625] eta 0:02:15 lr 0.000056 wd 0.0500 time 0.4065 (0.4182) data time 0.0008 (0.0025) model time 0.4058 (0.4134) loss 6.6422 (6.3841) grad_norm 4.0264 (3.8683) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:38:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][310/625] eta 0:02:11 lr 0.000056 wd 0.0500 time 0.4005 (0.4176) data time 0.0009 (0.0024) model time 0.3996 (0.4129) loss 5.5072 (6.3742) grad_norm 2.8194 (3.8369) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:38:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][320/625] eta 0:02:07 lr 0.000056 wd 0.0500 time 0.3968 (0.4171) data 
time 0.0009 (0.0024) model time 0.3959 (0.4124) loss 6.9877 (6.3721) grad_norm 4.5349 (3.8321) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:38:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][330/625] eta 0:02:02 lr 0.000056 wd 0.0500 time 0.3997 (0.4165) data time 0.0006 (0.0023) model time 0.3991 (0.4118) loss 7.0458 (6.3734) grad_norm 2.3672 (3.8092) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:38:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][340/625] eta 0:01:58 lr 0.000056 wd 0.0500 time 0.3952 (0.4160) data time 0.0008 (0.0023) model time 0.3945 (0.4113) loss 7.4422 (6.3815) grad_norm 2.8249 (3.7969) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:38:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][350/625] eta 0:01:54 lr 0.000056 wd 0.0500 time 0.3978 (0.4155) data time 0.0008 (0.0022) model time 0.3970 (0.4109) loss 5.8807 (6.3792) grad_norm 2.0690 (3.7826) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:38:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][360/625] eta 0:01:49 lr 0.000056 wd 0.0500 time 0.3988 (0.4151) data time 0.0007 (0.0022) model time 0.3981 (0.4105) loss 5.4381 (6.3834) grad_norm 2.2655 (3.7589) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:38:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][370/625] eta 0:01:45 lr 0.000056 wd 0.0500 time 0.3944 (0.4146) data time 0.0007 (0.0022) model time 0.3937 (0.4101) loss 6.6179 (6.3849) grad_norm 2.9406 (3.7519) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:38:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][380/625] eta 0:01:41 lr 0.000056 wd 0.0500 time 0.3991 (0.4141) data time 0.0007 (0.0021) model time 0.3984 (0.4097) loss 6.3270 (6.3853) grad_norm 5.2862 (3.7860) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
11:38:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][390/625] eta 0:01:37 lr 0.000056 wd 0.0500 time 0.4004 (0.4143) data time 0.0007 (0.0021) model time 0.3998 (0.4099) loss 7.2827 (6.3914) grad_norm 6.2227 (3.8054) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:38:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][400/625] eta 0:01:33 lr 0.000056 wd 0.0500 time 0.3968 (0.4147) data time 0.0006 (0.0021) model time 0.3962 (0.4106) loss 6.4763 (6.3864) grad_norm 2.1459 (3.8051) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:39:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][410/625] eta 0:01:29 lr 0.000056 wd 0.0500 time 0.3991 (0.4144) data time 0.0010 (0.0020) model time 0.3982 (0.4102) loss 6.8389 (6.3944) grad_norm 3.3344 (3.8122) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:39:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][420/625] eta 0:01:25 lr 0.000056 wd 0.0500 time 0.3942 (0.4160) data time 0.0009 (0.0020) model time 0.3933 (0.4122) loss 6.8915 (6.3881) grad_norm 3.8280 (3.8400) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:39:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][430/625] eta 0:01:21 lr 0.000055 wd 0.0500 time 0.5903 (0.4169) data time 0.0009 (0.0020) model time 0.5895 (0.4132) loss 6.3037 (6.3890) grad_norm 2.8853 (3.8792) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:39:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][440/625] eta 0:01:17 lr 0.000055 wd 0.0500 time 0.6004 (0.4178) data time 0.0009 (0.0019) model time 0.5995 (0.4143) loss 7.3098 (6.3911) grad_norm 2.1550 (3.8607) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:39:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][450/625] eta 0:01:13 lr 0.000055 wd 0.0500 time 0.3992 (0.4181) data 
time 0.0009 (0.0019) model time 0.3983 (0.4147) loss 6.8245 (6.3888) grad_norm 3.2716 (3.8403) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:39:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][460/625] eta 0:01:09 lr 0.000055 wd 0.0500 time 0.3988 (0.4189) data time 0.0006 (0.0019) model time 0.3982 (0.4157) loss 5.9890 (6.3917) grad_norm 2.9770 (3.8422) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:39:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][470/625] eta 0:01:04 lr 0.000055 wd 0.0500 time 0.4003 (0.4187) data time 0.0007 (0.0019) model time 0.3996 (0.4156) loss 6.1857 (6.3866) grad_norm 2.2493 (3.8268) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:39:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][480/625] eta 0:01:00 lr 0.000055 wd 0.0500 time 0.3982 (0.4185) data time 0.0006 (0.0019) model time 0.3975 (0.4154) loss 5.9818 (6.3885) grad_norm 3.4234 (3.8011) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:39:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][490/625] eta 0:00:56 lr 0.000055 wd 0.0500 time 0.3954 (0.4181) data time 0.0009 (0.0018) model time 0.3945 (0.4150) loss 6.1673 (6.3880) grad_norm 3.1273 (3.7824) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:39:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][500/625] eta 0:00:52 lr 0.000055 wd 0.0500 time 0.3981 (0.4177) data time 0.0007 (0.0018) model time 0.3974 (0.4146) loss 6.3816 (6.3854) grad_norm 2.8740 (3.7601) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:39:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][510/625] eta 0:00:47 lr 0.000055 wd 0.0500 time 0.3954 (0.4173) data time 0.0007 (0.0018) model time 0.3947 (0.4142) loss 6.4849 (6.3877) grad_norm 2.9412 (3.7651) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
11:39:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][520/625] eta 0:00:43 lr 0.000055 wd 0.0500 time 0.4017 (0.4170) data time 0.0007 (0.0018) model time 0.4009 (0.4139) loss 7.1993 (6.3863) grad_norm 2.9386 (3.7425) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:39:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][530/625] eta 0:00:39 lr 0.000055 wd 0.0500 time 0.4002 (0.4166) data time 0.0009 (0.0018) model time 0.3993 (0.4136) loss 6.3829 (6.3892) grad_norm 4.1704 (3.7371) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:39:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][540/625] eta 0:00:35 lr 0.000055 wd 0.0500 time 0.4015 (0.4163) data time 0.0008 (0.0017) model time 0.4007 (0.4132) loss 6.1556 (6.3847) grad_norm 2.6470 (3.7339) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:40:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][550/625] eta 0:00:31 lr 0.000055 wd 0.0500 time 0.3978 (0.4160) data time 0.0008 (0.0017) model time 0.3970 (0.4129) loss 5.5327 (6.3832) grad_norm 2.4177 (3.7218) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:40:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][560/625] eta 0:00:27 lr 0.000055 wd 0.0500 time 0.3988 (0.4157) data time 0.0007 (0.0017) model time 0.3981 (0.4127) loss 6.0736 (6.3809) grad_norm 3.7958 (3.7088) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:40:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][570/625] eta 0:00:22 lr 0.000055 wd 0.0500 time 0.3993 (0.4154) data time 0.0009 (0.0017) model time 0.3985 (0.4124) loss 5.8506 (6.3802) grad_norm 2.7796 (3.7057) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:40:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][580/625] eta 0:00:18 lr 0.000055 wd 0.0500 time 0.3994 (0.4151) data 
time 0.0008 (0.0017) model time 0.3987 (0.4121) loss 7.0077 (6.3755) grad_norm 2.6467 (3.6967) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:40:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][590/625] eta 0:00:14 lr 0.000055 wd 0.0500 time 0.3975 (0.4149) data time 0.0011 (0.0017) model time 0.3964 (0.4119) loss 6.1784 (6.3809) grad_norm 6.8290 (3.7612) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:40:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][600/625] eta 0:00:10 lr 0.000055 wd 0.0500 time 0.4008 (0.4146) data time 0.0007 (0.0017) model time 0.4001 (0.4117) loss 7.0969 (6.3885) grad_norm 3.2912 (3.8000) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:40:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][610/625] eta 0:00:06 lr 0.000055 wd 0.0500 time 0.4035 (0.4148) data time 0.0004 (0.0017) model time 0.4030 (0.4118) loss 6.2629 (6.3885) grad_norm 3.2302 (3.8224) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:40:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][620/625] eta 0:00:02 lr 0.000055 wd 0.0500 time 0.5679 (0.4150) data time 0.0006 (0.0016) model time 0.5672 (0.4121) loss 5.5651 (6.3895) grad_norm 3.3981 (3.8159) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:40:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 265 training takes 0:04:19 [2024-07-25 11:40:30 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 11:40:31 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 11:40:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.457 (0.457) Loss 0.5435 (0.5435) Acc@1 89.990 (89.990) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 11:40:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8149 (0.6535) Acc@1 82.422 (87.509) Acc@5 97.168 (98.038) Mem 14939MB [2024-07-25 11:40:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9004 (0.7569) Acc@1 79.688 (84.642) Acc@5 96.143 (97.101) Mem 14939MB [2024-07-25 11:40:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.237 Acc@5 97.055 [2024-07-25 11:40:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-07-25 11:40:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 84.24% [2024-07-25 11:40:34 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 11:40:34 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 11:40:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.448 (0.448) Loss 0.5391 (0.5391) Acc@1 90.039 (90.039) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 11:40:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.8105 (0.6556) Acc@1 83.301 (87.460) Acc@5 97.119 (98.002) Mem 14939MB [2024-07-25 11:40:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9146 (0.7594) Acc@1 78.662 (84.491) Acc@5 95.752 (97.021) Mem 14939MB [2024-07-25 11:40:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.099 Acc@5 96.987 [2024-07-25 11:40:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-07-25 11:40:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.10% [2024-07-25 11:40:37 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 11:40:38 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 11:40:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][0/625] eta 0:08:34 lr 0.000055 wd 0.0500 time 0.8230 (0.8230) data time 0.4429 (0.4429) model time 0.0000 (0.0000) loss 6.9004 (6.9004) grad_norm 5.7662 (5.7662) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:40:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][10/625] eta 0:04:47 lr 0.000055 wd 0.0500 time 0.3943 (0.4674) data time 0.0009 (0.0411) model time 0.0000 (0.0000) loss 6.0502 (6.4336) grad_norm 3.3210 (3.4530) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:40:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][20/625] eta 0:04:38 lr 0.000055 wd 0.0500 time 0.6206 (0.4609) data time 0.0009 (0.0219) model time 0.0000 (0.0000) loss 6.8857 (6.5621) grad_norm 2.2085 (4.6133) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:40:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][30/625] eta 0:04:36 lr 0.000055 wd 0.0500 time 0.3963 (0.4642) data time 0.0007 (0.0151) model time 0.0000 (0.0000) loss 6.4652 (6.4785) grad_norm 3.4916 (4.1862) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:40:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][40/625] eta 0:04:30 lr 0.000055 wd 0.0500 time 0.5806 (0.4619) data time 0.0006 (0.0116) model time 0.0000 (0.0000) loss 5.0885 (6.3518) grad_norm 2.3664 (4.4011) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:41:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][50/625] eta 0:04:23 lr 0.000055 wd 0.0500 time 0.5642 (0.4590) data time 0.0009 (0.0095) model time 0.0000 (0.0000) loss 6.0753 (6.2899) grad_norm 2.0386 (4.1587) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:41:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][60/625] eta 0:04:14 lr 0.000054 wd 0.0500 time 0.5406 (0.4511) 
data time 0.0006 (0.0081) model time 0.5400 (0.4103) loss 7.6698 (6.3960) grad_norm 3.0614 (4.1123) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:41:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][70/625] eta 0:04:06 lr 0.000054 wd 0.0500 time 0.4020 (0.4436) data time 0.0006 (0.0071) model time 0.4013 (0.4036) loss 6.1350 (6.4263) grad_norm 2.7760 (4.1861) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:41:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][80/625] eta 0:03:59 lr 0.000054 wd 0.0500 time 0.4021 (0.4402) data time 0.0008 (0.0063) model time 0.4012 (0.4075) loss 6.7554 (6.4394) grad_norm 2.9472 (4.0701) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:41:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][90/625] eta 0:03:52 lr 0.000054 wd 0.0500 time 0.3950 (0.4354) data time 0.0006 (0.0057) model time 0.3944 (0.4046) loss 5.4475 (6.4135) grad_norm 2.5366 (3.9693) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:41:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][100/625] eta 0:03:47 lr 0.000054 wd 0.0500 time 0.3978 (0.4335) data time 0.0007 (0.0052) model time 0.3971 (0.4066) loss 7.4148 (6.4104) grad_norm 2.4098 (3.8634) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:41:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][110/625] eta 0:03:41 lr 0.000054 wd 0.0500 time 0.3986 (0.4303) data time 0.0007 (0.0048) model time 0.3979 (0.4051) loss 6.0660 (6.4173) grad_norm 2.3547 (3.7566) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:41:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][120/625] eta 0:03:36 lr 0.000054 wd 0.0500 time 0.3989 (0.4277) data time 0.0008 (0.0045) model time 0.3981 (0.4041) loss 5.9108 (6.4012) grad_norm 1.9551 (3.9635) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
11:41:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][130/625] eta 0:03:30 lr 0.000054 wd 0.0500 time 0.3986 (0.4255) data time 0.0008 (0.0042) model time 0.3978 (0.4034) loss 6.1385 (6.4105) grad_norm 2.5702 (3.8900) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:41:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][140/625] eta 0:03:25 lr 0.000054 wd 0.0500 time 0.3969 (0.4236) data time 0.0007 (0.0040) model time 0.3962 (0.4027) loss 7.5712 (6.3985) grad_norm 7.1483 (3.8408) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:41:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][150/625] eta 0:03:20 lr 0.000054 wd 0.0500 time 0.3967 (0.4220) data time 0.0009 (0.0038) model time 0.3958 (0.4022) loss 6.3770 (6.4173) grad_norm 5.8382 (3.8179) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:41:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][160/625] eta 0:03:15 lr 0.000054 wd 0.0500 time 0.3984 (0.4206) data time 0.0007 (0.0036) model time 0.3978 (0.4019) loss 5.9862 (6.3944) grad_norm 5.4760 (3.7810) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:41:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][170/625] eta 0:03:10 lr 0.000054 wd 0.0500 time 0.3975 (0.4193) data time 0.0008 (0.0034) model time 0.3967 (0.4016) loss 7.1684 (6.4094) grad_norm 2.3447 (3.7839) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:41:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][180/625] eta 0:03:06 lr 0.000054 wd 0.0500 time 0.3986 (0.4182) data time 0.0007 (0.0033) model time 0.3979 (0.4013) loss 6.5256 (6.3957) grad_norm 2.2165 (3.7302) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:41:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][190/625] eta 0:03:01 lr 0.000054 wd 0.0500 time 0.3993 (0.4173) data 
time 0.0008 (0.0032) model time 0.3986 (0.4012) loss 5.9065 (6.3935) grad_norm 2.7910 (3.6955) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:42:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][200/625] eta 0:02:57 lr 0.000054 wd 0.0500 time 0.4049 (0.4175) data time 0.0009 (0.0030) model time 0.4040 (0.4025) loss 6.8418 (6.4072) grad_norm 2.6294 (3.7063) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:42:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][210/625] eta 0:02:53 lr 0.000054 wd 0.0500 time 0.3973 (0.4173) data time 0.0008 (0.0029) model time 0.3965 (0.4031) loss 6.1027 (6.4042) grad_norm 3.7164 (3.6877) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:42:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][220/625] eta 0:02:49 lr 0.000054 wd 0.0500 time 0.3991 (0.4175) data time 0.0007 (0.0029) model time 0.3984 (0.4041) loss 6.8397 (6.4102) grad_norm 3.3659 (3.6724) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:42:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][230/625] eta 0:02:45 lr 0.000054 wd 0.0500 time 0.3975 (0.4182) data time 0.0009 (0.0028) model time 0.3967 (0.4057) loss 6.2473 (6.4138) grad_norm 2.8703 (3.6448) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:42:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][240/625] eta 0:02:41 lr 0.000054 wd 0.0500 time 0.3970 (0.4198) data time 0.0006 (0.0027) model time 0.3963 (0.4084) loss 7.0756 (6.4198) grad_norm 2.3994 (3.7494) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:42:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][250/625] eta 0:02:37 lr 0.000054 wd 0.0500 time 0.3994 (0.4210) data time 0.0007 (0.0026) model time 0.3987 (0.4105) loss 4.9825 (6.4137) grad_norm 2.2009 (3.7174) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
11:42:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][260/625] eta 0:02:33 lr 0.000054 wd 0.0500 time 0.3975 (0.4219) data time 0.0008 (0.0025) model time 0.3967 (0.4121) loss 6.1113 (6.4095) grad_norm 7.3940 (3.7213) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:42:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][270/625] eta 0:02:30 lr 0.000054 wd 0.0500 time 0.3995 (0.4226) data time 0.0009 (0.0025) model time 0.3986 (0.4134) loss 5.4130 (6.4039) grad_norm 3.0128 (3.7029) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:42:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][280/625] eta 0:02:25 lr 0.000054 wd 0.0500 time 0.3986 (0.4224) data time 0.0007 (0.0024) model time 0.3980 (0.4134) loss 5.3202 (6.4064) grad_norm 2.6444 (3.7485) loss_scale 256.0000 (131.6441) mem 14939MB [2024-07-25 11:42:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][290/625] eta 0:02:21 lr 0.000054 wd 0.0500 time 0.5140 (0.4225) data time 0.0007 (0.0024) model time 0.5133 (0.4139) loss 7.2861 (6.4081) grad_norm 2.9290 (3.7912) loss_scale 256.0000 (135.9175) mem 14939MB [2024-07-25 11:42:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][300/625] eta 0:02:17 lr 0.000054 wd 0.0500 time 0.3970 (0.4217) data time 0.0007 (0.0023) model time 0.3963 (0.4132) loss 7.2689 (6.4062) grad_norm 3.1543 (3.7507) loss_scale 256.0000 (139.9070) mem 14939MB [2024-07-25 11:42:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][310/625] eta 0:02:12 lr 0.000053 wd 0.0500 time 0.4013 (0.4210) data time 0.0007 (0.0023) model time 0.4006 (0.4127) loss 4.9303 (6.3954) grad_norm 2.9257 (3.7211) loss_scale 256.0000 (143.6399) mem 14939MB [2024-07-25 11:42:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][320/625] eta 0:02:08 lr 0.000053 wd 0.0500 time 0.3955 (0.4207) data 
time 0.0007 (0.0022) model time 0.3948 (0.4127) loss 5.6537 (6.3888) grad_norm 3.3596 (3.6961) loss_scale 256.0000 (147.1402) mem 14939MB [2024-07-25 11:42:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][330/625] eta 0:02:03 lr 0.000053 wd 0.0500 time 0.4004 (0.4203) data time 0.0009 (0.0022) model time 0.3995 (0.4124) loss 7.1487 (6.3871) grad_norm 2.5046 (3.7678) loss_scale 256.0000 (150.4290) mem 14939MB [2024-07-25 11:43:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][340/625] eta 0:01:59 lr 0.000053 wd 0.0500 time 0.4039 (0.4197) data time 0.0008 (0.0021) model time 0.4031 (0.4120) loss 6.9566 (6.3879) grad_norm 13.0542 (3.8095) loss_scale 256.0000 (153.5249) mem 14939MB [2024-07-25 11:43:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][350/625] eta 0:01:55 lr 0.000053 wd 0.0500 time 0.3983 (0.4192) data time 0.0009 (0.0021) model time 0.3973 (0.4116) loss 6.5874 (6.3858) grad_norm 2.7511 (3.7997) loss_scale 256.0000 (156.4444) mem 14939MB [2024-07-25 11:43:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][360/625] eta 0:01:50 lr 0.000053 wd 0.0500 time 0.3952 (0.4186) data time 0.0008 (0.0021) model time 0.3944 (0.4110) loss 5.7021 (6.3874) grad_norm 6.0789 (3.7879) loss_scale 256.0000 (159.2022) mem 14939MB [2024-07-25 11:43:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][370/625] eta 0:01:46 lr 0.000053 wd 0.0500 time 0.3964 (0.4180) data time 0.0006 (0.0021) model time 0.3958 (0.4106) loss 6.5504 (6.3940) grad_norm 2.9688 (3.8239) loss_scale 256.0000 (161.8113) mem 14939MB [2024-07-25 11:43:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][380/625] eta 0:01:42 lr 0.000053 wd 0.0500 time 0.3985 (0.4176) data time 0.0008 (0.0020) model time 0.3977 (0.4103) loss 6.0715 (6.3877) grad_norm 2.1173 (3.8249) loss_scale 256.0000 (164.2835) mem 14939MB [2024-07-25 
11:43:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][390/625] eta 0:01:38 lr 0.000053 wd 0.0500 time 0.3969 (0.4171) data time 0.0009 (0.0020) model time 0.3960 (0.4099) loss 6.5357 (6.3865) grad_norm 3.3056 (3.8217) loss_scale 256.0000 (166.6292) mem 14939MB [2024-07-25 11:43:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][400/625] eta 0:01:33 lr 0.000053 wd 0.0500 time 0.3999 (0.4166) data time 0.0006 (0.0020) model time 0.3993 (0.4096) loss 5.8967 (6.3891) grad_norm 2.8693 (3.8373) loss_scale 256.0000 (168.8579) mem 14939MB [2024-07-25 11:43:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][410/625] eta 0:01:29 lr 0.000053 wd 0.0500 time 0.3954 (0.4162) data time 0.0007 (0.0019) model time 0.3947 (0.4093) loss 5.6205 (6.3848) grad_norm 3.3676 (3.8899) loss_scale 256.0000 (170.9781) mem 14939MB [2024-07-25 11:43:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][420/625] eta 0:01:25 lr 0.000053 wd 0.0500 time 0.3999 (0.4163) data time 0.0008 (0.0019) model time 0.3991 (0.4095) loss 6.6963 (6.3839) grad_norm 2.4747 (3.8836) loss_scale 256.0000 (172.9976) mem 14939MB [2024-07-25 11:43:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][430/625] eta 0:01:21 lr 0.000053 wd 0.0500 time 0.4003 (0.4162) data time 0.0009 (0.0019) model time 0.3994 (0.4096) loss 6.3259 (6.3834) grad_norm 2.1653 (3.8915) loss_scale 256.0000 (174.9234) mem 14939MB [2024-07-25 11:43:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][440/625] eta 0:01:17 lr 0.000053 wd 0.0500 time 0.3965 (0.4163) data time 0.0006 (0.0019) model time 0.3958 (0.4098) loss 5.1958 (6.3780) grad_norm 4.7466 (3.8929) loss_scale 256.0000 (176.7619) mem 14939MB [2024-07-25 11:43:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][450/625] eta 0:01:12 lr 0.000053 wd 0.0500 time 0.3957 (0.4166) data 
time 0.0007 (0.0018) model time 0.3950 (0.4104) loss 6.4855 (6.3768) grad_norm 4.0357 (3.9003) loss_scale 256.0000 (178.5188) mem 14939MB [2024-07-25 11:43:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][460/625] eta 0:01:08 lr 0.000053 wd 0.0500 time 0.4023 (0.4173) data time 0.0009 (0.0018) model time 0.4014 (0.4113) loss 5.9277 (6.3778) grad_norm 2.8668 (3.8843) loss_scale 256.0000 (180.1996) mem 14939MB [2024-07-25 11:43:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][470/625] eta 0:01:04 lr 0.000053 wd 0.0500 time 0.3984 (0.4184) data time 0.0009 (0.0018) model time 0.3976 (0.4126) loss 6.2912 (6.3732) grad_norm 2.7819 (3.8688) loss_scale 256.0000 (181.8089) mem 14939MB [2024-07-25 11:44:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][480/625] eta 0:01:00 lr 0.000053 wd 0.0500 time 0.4027 (0.4194) data time 0.0008 (0.0018) model time 0.4018 (0.4139) loss 7.6983 (6.3741) grad_norm 2.9833 (3.8497) loss_scale 256.0000 (183.3514) mem 14939MB [2024-07-25 11:44:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][490/625] eta 0:00:56 lr 0.000053 wd 0.0500 time 0.3958 (0.4200) data time 0.0009 (0.0018) model time 0.3949 (0.4146) loss 6.3374 (6.3697) grad_norm 2.4010 (3.8399) loss_scale 256.0000 (184.8310) mem 14939MB [2024-07-25 11:44:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][500/625] eta 0:00:52 lr 0.000053 wd 0.0500 time 0.4062 (0.4199) data time 0.0007 (0.0017) model time 0.4055 (0.4146) loss 7.2546 (6.3747) grad_norm 3.8839 (3.8348) loss_scale 256.0000 (186.2515) mem 14939MB [2024-07-25 11:44:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][510/625] eta 0:00:48 lr 0.000053 wd 0.0500 time 0.5698 (0.4201) data time 0.0006 (0.0017) model time 0.5692 (0.4149) loss 6.5751 (6.3703) grad_norm 2.0747 (3.8188) loss_scale 256.0000 (187.6164) mem 14939MB [2024-07-25 
11:44:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][520/625] eta 0:00:44 lr 0.000053 wd 0.0500 time 0.3947 (0.4196) data time 0.0008 (0.0017) model time 0.3939 (0.4145) loss 5.3027 (6.3731) grad_norm 3.2208 (3.8038) loss_scale 256.0000 (188.9290) mem 14939MB [2024-07-25 11:44:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][530/625] eta 0:00:39 lr 0.000053 wd 0.0500 time 0.3973 (0.4192) data time 0.0009 (0.0017) model time 0.3964 (0.4141) loss 7.5616 (6.3758) grad_norm 1.8721 (3.7837) loss_scale 256.0000 (190.1921) mem 14939MB [2024-07-25 11:44:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][540/625] eta 0:00:35 lr 0.000053 wd 0.0500 time 0.3975 (0.4191) data time 0.0007 (0.0017) model time 0.3968 (0.4141) loss 5.8116 (6.3715) grad_norm 2.3379 (3.7694) loss_scale 256.0000 (191.4085) mem 14939MB [2024-07-25 11:44:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][550/625] eta 0:00:31 lr 0.000053 wd 0.0500 time 0.3964 (0.4188) data time 0.0008 (0.0017) model time 0.3956 (0.4138) loss 6.8011 (6.3660) grad_norm 2.9069 (3.7569) loss_scale 256.0000 (192.5808) mem 14939MB [2024-07-25 11:44:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][560/625] eta 0:00:27 lr 0.000053 wd 0.0500 time 0.3998 (0.4185) data time 0.0010 (0.0017) model time 0.3988 (0.4135) loss 6.5489 (6.3678) grad_norm 2.2224 (3.7465) loss_scale 256.0000 (193.7112) mem 14939MB [2024-07-25 11:44:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][570/625] eta 0:00:22 lr 0.000052 wd 0.0500 time 0.4005 (0.4181) data time 0.0007 (0.0016) model time 0.3998 (0.4133) loss 6.7713 (6.3693) grad_norm 4.0023 (3.7438) loss_scale 256.0000 (194.8021) mem 14939MB [2024-07-25 11:44:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][580/625] eta 0:00:18 lr 0.000052 wd 0.0500 time 0.3978 (0.4178) data 
time 0.0010 (0.0016) model time 0.3968 (0.4130) loss 6.5536 (6.3693) grad_norm 2.5645 (3.7388) loss_scale 256.0000 (195.8554) mem 14939MB [2024-07-25 11:44:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][590/625] eta 0:00:14 lr 0.000052 wd 0.0500 time 0.3996 (0.4176) data time 0.0007 (0.0016) model time 0.3989 (0.4128) loss 5.0444 (6.3656) grad_norm 2.1106 (3.7346) loss_scale 256.0000 (196.8731) mem 14939MB [2024-07-25 11:44:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][600/625] eta 0:00:10 lr 0.000052 wd 0.0500 time 0.4023 (0.4173) data time 0.0009 (0.0016) model time 0.4014 (0.4126) loss 7.6535 (6.3679) grad_norm 2.4501 (3.7246) loss_scale 256.0000 (197.8569) mem 14939MB [2024-07-25 11:44:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][610/625] eta 0:00:06 lr 0.000052 wd 0.0500 time 0.3988 (0.4170) data time 0.0006 (0.0016) model time 0.3982 (0.4123) loss 5.6694 (6.3646) grad_norm 11.0977 (3.7511) loss_scale 256.0000 (198.8085) mem 14939MB [2024-07-25 11:44:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][620/625] eta 0:00:02 lr 0.000052 wd 0.0500 time 0.3984 (0.4168) data time 0.0005 (0.0016) model time 0.3980 (0.4121) loss 6.1952 (6.3649) grad_norm 2.3541 (3.7344) loss_scale 256.0000 (199.7295) mem 14939MB [2024-07-25 11:44:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 266 training takes 0:04:20 [2024-07-25 11:44:58 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 11:44:59 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 11:45:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.465 (0.465) Loss 0.5396 (0.5396) Acc@1 90.283 (90.283) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 11:45:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.121) Loss 0.8047 (0.6507) Acc@1 82.617 (87.575) Acc@5 97.021 (98.060) Mem 14939MB [2024-07-25 11:45:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.8945 (0.7536) Acc@1 79.834 (84.698) Acc@5 96.045 (97.089) Mem 14939MB [2024-07-25 11:45:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.277 Acc@5 97.059 [2024-07-25 11:45:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-07-25 11:45:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 84.28% [2024-07-25 11:45:02 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 11:45:02 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 11:45:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.471 (0.471) Loss 0.5396 (0.5396) Acc@1 89.990 (89.990) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 11:45:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.8105 (0.6556) Acc@1 83.301 (87.451) Acc@5 97.119 (98.007) Mem 14939MB [2024-07-25 11:45:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9136 (0.7593) Acc@1 78.711 (84.510) Acc@5 95.801 (97.028) Mem 14939MB [2024-07-25 11:45:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.115 Acc@5 96.989 [2024-07-25 11:45:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-07-25 11:45:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.12% [2024-07-25 11:45:05 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 11:45:06 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 11:45:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][0/625] eta 0:08:46 lr 0.000052 wd 0.0500 time 0.8422 (0.8422) data time 0.4411 (0.4411) model time 0.0000 (0.0000) loss 7.4014 (7.4014) grad_norm 2.8158 (2.8158) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:45:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][10/625] eta 0:04:29 lr 0.000052 wd 0.0500 time 0.3969 (0.4389) data time 0.0009 (0.0409) model time 0.0000 (0.0000) loss 6.2616 (6.3013) grad_norm 2.8873 (3.8002) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:45:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][20/625] eta 0:04:19 lr 0.000052 wd 0.0500 time 0.3981 (0.4289) data time 0.0007 (0.0218) model time 0.0000 (0.0000) loss 6.0234 (6.3235) grad_norm 2.9644 (3.3617) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:45:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][30/625] eta 0:04:15 lr 0.000052 wd 0.0500 time 0.4127 (0.4296) data time 0.0007 (0.0151) model time 0.0000 (0.0000) loss 7.2629 (6.3345) grad_norm 2.6748 (3.2239) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:45:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][40/625] eta 0:04:09 lr 0.000052 wd 0.0500 time 0.4539 (0.4273) data time 0.0007 (0.0116) model time 0.0000 (0.0000) loss 6.1811 (6.3944) grad_norm 2.9413 (3.3901) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:45:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][50/625] eta 0:04:06 lr 0.000052 wd 0.0500 time 0.5120 (0.4280) data time 0.0009 (0.0095) model time 0.0000 (0.0000) loss 5.6613 (6.3165) grad_norm 3.4431 (3.4142) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:45:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][60/625] eta 0:04:02 lr 0.000052 wd 0.0500 time 0.3941 (0.4299) 
data time 0.0007 (0.0081) model time 0.3935 (0.4388) loss 5.5172 (6.2815) grad_norm 2.3648 (3.5072) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:45:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][70/625] eta 0:04:01 lr 0.000052 wd 0.0500 time 0.3953 (0.4358) data time 0.0006 (0.0071) model time 0.3947 (0.4551) loss 5.5192 (6.2733) grad_norm 2.6947 (4.4328) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:45:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][80/625] eta 0:03:59 lr 0.000052 wd 0.0500 time 0.5951 (0.4401) data time 0.0008 (0.0063) model time 0.5943 (0.4598) loss 6.7422 (6.3048) grad_norm 2.6461 (4.3321) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:45:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][90/625] eta 0:03:55 lr 0.000052 wd 0.0500 time 0.3985 (0.4398) data time 0.0010 (0.0057) model time 0.3975 (0.4541) loss 7.7634 (6.3212) grad_norm 2.3945 (4.3938) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:45:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][100/625] eta 0:03:50 lr 0.000052 wd 0.0500 time 0.5938 (0.4396) data time 0.0009 (0.0053) model time 0.5929 (0.4505) loss 6.1366 (6.3183) grad_norm 3.0490 (4.3765) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:45:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][110/625] eta 0:03:45 lr 0.000052 wd 0.0500 time 0.3957 (0.4370) data time 0.0007 (0.0049) model time 0.3951 (0.4438) loss 6.1247 (6.2717) grad_norm 2.0247 (4.3207) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:45:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][120/625] eta 0:03:39 lr 0.000052 wd 0.0500 time 0.3945 (0.4339) data time 0.0008 (0.0045) model time 0.3937 (0.4373) loss 5.4484 (6.2822) grad_norm 7.3426 (4.2801) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
11:46:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][130/625] eta 0:03:33 lr 0.000052 wd 0.0500 time 0.3954 (0.4314) data time 0.0009 (0.0043) model time 0.3945 (0.4327) loss 6.0372 (6.2842) grad_norm 4.1184 (4.2178) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:46:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][140/625] eta 0:03:28 lr 0.000052 wd 0.0500 time 0.3992 (0.4293) data time 0.0009 (0.0040) model time 0.3983 (0.4291) loss 5.5190 (6.3143) grad_norm 3.4063 (4.1440) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:46:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][150/625] eta 0:03:22 lr 0.000052 wd 0.0500 time 0.3990 (0.4273) data time 0.0008 (0.0038) model time 0.3982 (0.4261) loss 6.5782 (6.3117) grad_norm 2.3983 (4.0969) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:46:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][160/625] eta 0:03:18 lr 0.000052 wd 0.0500 time 0.4021 (0.4260) data time 0.0006 (0.0036) model time 0.4014 (0.4242) loss 6.3930 (6.3105) grad_norm 1.9975 (4.0322) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:46:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][170/625] eta 0:03:13 lr 0.000052 wd 0.0500 time 0.4025 (0.4245) data time 0.0008 (0.0035) model time 0.4017 (0.4221) loss 6.8188 (6.3177) grad_norm 4.2630 (4.1499) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:46:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][180/625] eta 0:03:08 lr 0.000052 wd 0.0500 time 0.4006 (0.4231) data time 0.0007 (0.0033) model time 0.4000 (0.4203) loss 6.6971 (6.3217) grad_norm 3.3243 (4.0832) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:46:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][190/625] eta 0:03:03 lr 0.000052 wd 0.0500 time 0.3993 (0.4219) data 
time 0.0008 (0.0032) model time 0.3985 (0.4189) loss 6.4766 (6.3100) grad_norm 2.8443 (4.0418) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:46:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][200/625] eta 0:02:58 lr 0.000051 wd 0.0500 time 0.4013 (0.4209) data time 0.0010 (0.0031) model time 0.4004 (0.4176) loss 5.3032 (6.3071) grad_norm 2.6502 (4.0361) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:46:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][210/625] eta 0:02:54 lr 0.000051 wd 0.0500 time 0.3984 (0.4200) data time 0.0006 (0.0030) model time 0.3978 (0.4165) loss 7.3410 (6.3235) grad_norm 2.5017 (3.9774) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:46:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][220/625] eta 0:02:49 lr 0.000051 wd 0.0500 time 0.3957 (0.4190) data time 0.0007 (0.0029) model time 0.3950 (0.4155) loss 6.9080 (6.3384) grad_norm 2.7876 (3.9218) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:46:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][230/625] eta 0:02:45 lr 0.000051 wd 0.0500 time 0.4047 (0.4182) data time 0.0006 (0.0028) model time 0.4040 (0.4146) loss 6.0388 (6.3578) grad_norm 7.8392 (3.9177) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:46:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][240/625] eta 0:02:41 lr 0.000051 wd 0.0500 time 0.4007 (0.4182) data time 0.0006 (0.0027) model time 0.4001 (0.4147) loss 5.7180 (6.3625) grad_norm 2.7150 (3.8654) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:46:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][250/625] eta 0:02:37 lr 0.000051 wd 0.0500 time 0.4033 (0.4187) data time 0.0006 (0.0026) model time 0.4027 (0.4155) loss 6.7680 (6.3814) grad_norm 4.3192 (3.8491) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
11:46:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][260/625] eta 0:02:32 lr 0.000051 wd 0.0500 time 0.5958 (0.4187) data time 0.0007 (0.0026) model time 0.5951 (0.4156) loss 6.3489 (6.3805) grad_norm 2.1281 (3.8331) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:47:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][270/625] eta 0:02:28 lr 0.000051 wd 0.0500 time 0.3971 (0.4193) data time 0.0009 (0.0025) model time 0.3962 (0.4164) loss 7.2723 (6.3906) grad_norm 3.7643 (3.8518) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:47:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][280/625] eta 0:02:25 lr 0.000051 wd 0.0500 time 0.5816 (0.4212) data time 0.0008 (0.0024) model time 0.5808 (0.4189) loss 6.7203 (6.3908) grad_norm 4.4573 (3.8342) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:47:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][290/625] eta 0:02:22 lr 0.000051 wd 0.0500 time 0.5967 (0.4240) data time 0.0007 (0.0024) model time 0.5961 (0.4222) loss 6.4373 (6.3947) grad_norm 2.3839 (3.7988) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:47:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][300/625] eta 0:02:18 lr 0.000051 wd 0.0500 time 0.5764 (0.4250) data time 0.0008 (0.0023) model time 0.5756 (0.4235) loss 7.1311 (6.3963) grad_norm 2.9803 (3.7895) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:47:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][310/625] eta 0:02:14 lr 0.000051 wd 0.0500 time 0.5515 (0.4258) data time 0.0006 (0.0023) model time 0.5509 (0.4245) loss 6.4196 (6.3993) grad_norm 3.8029 (3.7649) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:47:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][320/625] eta 0:02:09 lr 0.000051 wd 0.0500 time 0.6246 (0.4261) data 
time 0.0007 (0.0022) model time 0.6239 (0.4249) loss 5.2196 (6.3925) grad_norm 2.4228 (3.7403) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:47:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][330/625] eta 0:02:05 lr 0.000051 wd 0.0500 time 0.3964 (0.4259) data time 0.0007 (0.0022) model time 0.3957 (0.4246) loss 6.1436 (6.3956) grad_norm 32.8748 (3.8286) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:47:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][340/625] eta 0:02:01 lr 0.000051 wd 0.0500 time 0.3989 (0.4251) data time 0.0008 (0.0022) model time 0.3980 (0.4237) loss 6.8593 (6.4027) grad_norm 14.1841 (3.8573) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:47:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][350/625] eta 0:01:56 lr 0.000051 wd 0.0500 time 0.3981 (0.4243) data time 0.0006 (0.0021) model time 0.3975 (0.4228) loss 5.9263 (6.3941) grad_norm 2.8947 (3.8548) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:47:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][360/625] eta 0:01:52 lr 0.000051 wd 0.0500 time 0.3965 (0.4236) data time 0.0009 (0.0021) model time 0.3957 (0.4220) loss 7.0745 (6.4048) grad_norm 2.6174 (3.8238) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:47:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][370/625] eta 0:01:47 lr 0.000051 wd 0.0500 time 0.4053 (0.4229) data time 0.0008 (0.0021) model time 0.4045 (0.4213) loss 6.4505 (6.4122) grad_norm 2.9369 (3.7987) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:47:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][380/625] eta 0:01:43 lr 0.000051 wd 0.0500 time 0.3969 (0.4223) data time 0.0008 (0.0020) model time 0.3961 (0.4205) loss 7.4456 (6.4183) grad_norm 2.1488 (3.7749) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 
11:47:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][390/625] eta 0:01:39 lr 0.000051 wd 0.0500 time 0.3945 (0.4216) data time 0.0009 (0.0020) model time 0.3936 (0.4198) loss 6.6104 (6.4190) grad_norm 2.7563 (3.7500) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:47:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][400/625] eta 0:01:34 lr 0.000051 wd 0.0500 time 0.3997 (0.4211) data time 0.0006 (0.0020) model time 0.3990 (0.4192) loss 6.0494 (6.4314) grad_norm 3.4194 (3.7668) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:47:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][410/625] eta 0:01:30 lr 0.000051 wd 0.0500 time 0.3934 (0.4206) data time 0.0007 (0.0020) model time 0.3926 (0.4187) loss 6.5562 (6.4269) grad_norm 4.9768 (3.7630) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:48:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][420/625] eta 0:01:26 lr 0.000051 wd 0.0500 time 0.3953 (0.4201) data time 0.0007 (0.0019) model time 0.3946 (0.4181) loss 5.9576 (6.4127) grad_norm 2.4332 (3.7632) loss_scale 256.0000 (256.0000) mem 14939MB [2024-07-25 11:48:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][430/625] eta 0:01:21 lr 0.000051 wd 0.0500 time 0.3993 (0.4196) data time 0.0007 (0.0019) model time 0.3987 (0.4176) loss 5.8401 (6.4145) grad_norm 2.2125 (inf) loss_scale 128.0000 (255.1090) mem 14939MB [2024-07-25 11:48:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][440/625] eta 0:01:17 lr 0.000051 wd 0.0500 time 0.3988 (0.4191) data time 0.0007 (0.0019) model time 0.3982 (0.4171) loss 6.7728 (6.4158) grad_norm 2.8862 (inf) loss_scale 128.0000 (252.2268) mem 14939MB [2024-07-25 11:48:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][450/625] eta 0:01:13 lr 0.000051 wd 0.0500 time 0.3970 (0.4187) data time 
0.0009 (0.0019) model time 0.3961 (0.4166) loss 6.0786 (6.4249) grad_norm 3.4918 (inf) loss_scale 128.0000 (249.4723) mem 14939MB [2024-07-25 11:48:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][460/625] eta 0:01:09 lr 0.000050 wd 0.0500 time 0.4000 (0.4186) data time 0.0006 (0.0018) model time 0.3994 (0.4165) loss 5.0692 (6.4243) grad_norm 2.5223 (inf) loss_scale 128.0000 (246.8373) mem 14939MB [2024-07-25 11:48:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][470/625] eta 0:01:04 lr 0.000050 wd 0.0500 time 0.3998 (0.4186) data time 0.0008 (0.0018) model time 0.3990 (0.4165) loss 6.9681 (6.4222) grad_norm 2.8608 (inf) loss_scale 128.0000 (244.3142) mem 14939MB [2024-07-25 11:48:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][480/625] eta 0:01:00 lr 0.000050 wd 0.0500 time 0.3967 (0.4189) data time 0.0006 (0.0018) model time 0.3961 (0.4169) loss 6.0435 (6.4140) grad_norm 3.2347 (inf) loss_scale 128.0000 (241.8960) mem 14939MB [2024-07-25 11:48:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][490/625] eta 0:00:56 lr 0.000050 wd 0.0500 time 0.3964 (0.4191) data time 0.0007 (0.0018) model time 0.3958 (0.4172) loss 6.5238 (6.4154) grad_norm 4.6731 (inf) loss_scale 128.0000 (239.5764) mem 14939MB [2024-07-25 11:48:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][500/625] eta 0:00:52 lr 0.000050 wd 0.0500 time 0.5719 (0.4201) data time 0.0008 (0.0018) model time 0.5711 (0.4183) loss 5.6969 (6.4179) grad_norm 2.5275 (inf) loss_scale 128.0000 (237.3493) mem 14939MB [2024-07-25 11:48:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][510/625] eta 0:00:48 lr 0.000050 wd 0.0500 time 0.3925 (0.4210) data time 0.0006 (0.0017) model time 0.3919 (0.4193) loss 5.4840 (6.4130) grad_norm 2.9930 (inf) loss_scale 128.0000 (235.2094) mem 14939MB [2024-07-25 11:48:46 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][520/625] eta 0:00:44 lr 0.000050 wd 0.0500 time 0.3975 (0.4217) data time 0.0010 (0.0017) model time 0.3965 (0.4201) loss 6.8699 (6.4146) grad_norm 2.2353 (inf) loss_scale 128.0000 (233.1516) mem 14939MB [2024-07-25 11:48:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][530/625] eta 0:00:40 lr 0.000050 wd 0.0500 time 0.5984 (0.4227) data time 0.0007 (0.0017) model time 0.5977 (0.4212) loss 6.7654 (6.4189) grad_norm 2.1767 (inf) loss_scale 128.0000 (231.1714) mem 14939MB [2024-07-25 11:48:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][540/625] eta 0:00:35 lr 0.000050 wd 0.0500 time 0.3965 (0.4226) data time 0.0007 (0.0017) model time 0.3958 (0.4212) loss 6.2342 (6.4181) grad_norm 1.9344 (inf) loss_scale 128.0000 (229.2643) mem 14939MB [2024-07-25 11:48:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][550/625] eta 0:00:31 lr 0.000050 wd 0.0500 time 0.4008 (0.4225) data time 0.0008 (0.0017) model time 0.3999 (0.4210) loss 6.2576 (6.4191) grad_norm 2.6503 (inf) loss_scale 128.0000 (227.4265) mem 14939MB [2024-07-25 11:49:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][560/625] eta 0:00:27 lr 0.000050 wd 0.0500 time 0.3947 (0.4221) data time 0.0007 (0.0017) model time 0.3940 (0.4206) loss 6.4313 (6.4167) grad_norm 4.6760 (inf) loss_scale 128.0000 (225.6542) mem 14939MB [2024-07-25 11:49:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][570/625] eta 0:00:23 lr 0.000050 wd 0.0500 time 0.3957 (0.4216) data time 0.0009 (0.0017) model time 0.3949 (0.4201) loss 7.0389 (6.4169) grad_norm 2.2847 (inf) loss_scale 128.0000 (223.9440) mem 14939MB [2024-07-25 11:49:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][580/625] eta 0:00:18 lr 0.000050 wd 0.0500 time 0.3964 (0.4212) data time 0.0008 (0.0016) model 
time 0.3956 (0.4197) loss 5.4101 (6.4170) grad_norm 2.2201 (inf) loss_scale 128.0000 (222.2926) mem 14939MB [2024-07-25 11:49:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][590/625] eta 0:00:14 lr 0.000050 wd 0.0500 time 0.4013 (0.4208) data time 0.0008 (0.0016) model time 0.4004 (0.4193) loss 6.4118 (6.4221) grad_norm 4.2994 (inf) loss_scale 128.0000 (220.6971) mem 14939MB [2024-07-25 11:49:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][600/625] eta 0:00:10 lr 0.000050 wd 0.0500 time 0.3989 (0.4204) data time 0.0009 (0.0016) model time 0.3980 (0.4189) loss 5.8663 (6.4257) grad_norm 2.1148 (inf) loss_scale 128.0000 (219.1547) mem 14939MB [2024-07-25 11:49:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][610/625] eta 0:00:06 lr 0.000050 wd 0.0500 time 0.3974 (0.4201) data time 0.0004 (0.0016) model time 0.3970 (0.4185) loss 6.1503 (6.4199) grad_norm 3.5200 (inf) loss_scale 128.0000 (217.6628) mem 14939MB [2024-07-25 11:49:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][620/625] eta 0:00:02 lr 0.000050 wd 0.0500 time 0.4000 (0.4197) data time 0.0004 (0.0016) model time 0.3996 (0.4181) loss 6.6114 (6.4177) grad_norm 5.3984 (inf) loss_scale 128.0000 (216.2190) mem 14939MB [2024-07-25 11:49:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 267 training takes 0:04:22 [2024-07-25 11:49:28 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 11:49:29 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 11:49:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.482 (0.482) Loss 0.5474 (0.5474) Acc@1 90.039 (90.039) Acc@5 98.828 (98.828) Mem 14939MB [2024-07-25 11:49:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.122) Loss 0.8208 (0.6578) Acc@1 82.373 (87.589) Acc@5 96.973 (97.989) Mem 14939MB [2024-07-25 11:49:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.105) Loss 0.9009 (0.7602) Acc@1 79.395 (84.717) Acc@5 96.191 (97.094) Mem 14939MB [2024-07-25 11:49:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.351 Acc@5 97.057 [2024-07-25 11:49:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 11:49:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 84.35% [2024-07-25 11:49:32 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 11:49:32 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 11:49:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.465 (0.465) Loss 0.5391 (0.5391) Acc@1 89.990 (89.990) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 11:49:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.8101 (0.6552) Acc@1 83.301 (87.460) Acc@5 97.119 (98.025) Mem 14939MB [2024-07-25 11:49:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9131 (0.7588) Acc@1 78.809 (84.526) Acc@5 95.801 (97.042) Mem 14939MB [2024-07-25 11:49:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.127 Acc@5 97.005 [2024-07-25 11:49:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-07-25 11:49:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.13% [2024-07-25 11:49:35 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 11:49:36 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 11:49:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][0/625] eta 0:07:59 lr 0.000050 wd 0.0500 time 0.7669 (0.7669) data time 0.3890 (0.3890) model time 0.0000 (0.0000) loss 6.0402 (6.0402) grad_norm 4.6342 (4.6342) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:49:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][10/625] eta 0:04:26 lr 0.000050 wd 0.0500 time 0.4022 (0.4337) data time 0.0008 (0.0361) model time 0.0000 (0.0000) loss 6.1138 (6.1282) grad_norm 5.5901 (3.8549) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:49:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][20/625] eta 0:04:12 lr 0.000050 wd 0.0500 time 0.3994 (0.4174) data time 0.0006 (0.0193) model time 0.0000 (0.0000) loss 6.8640 (6.2559) grad_norm 3.4219 (3.9279) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:49:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][30/625] eta 0:04:04 lr 0.000050 wd 0.0500 time 0.4075 (0.4115) data time 0.0006 (0.0133) model time 0.0000 (0.0000) loss 6.6153 (6.2953) grad_norm 2.6017 (3.5636) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:49:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][40/625] eta 0:03:58 lr 0.000050 wd 0.0500 time 0.3998 (0.4082) data time 0.0007 (0.0103) model time 0.0000 (0.0000) loss 5.2101 (6.2890) grad_norm 5.5169 (3.5941) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:49:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][50/625] eta 0:03:55 lr 0.000050 wd 0.0500 time 0.5437 (0.4093) data time 0.0008 (0.0084) model time 0.0000 (0.0000) loss 6.1811 (6.3140) grad_norm 2.8213 (3.5157) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:50:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][60/625] eta 0:03:50 lr 0.000050 wd 0.0500 time 0.4022 (0.4079) 
data time 0.0009 (0.0072) model time 0.4013 (0.4002) loss 6.9820 (6.3187) grad_norm 6.9074 (3.5294) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:50:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][70/625] eta 0:03:48 lr 0.000050 wd 0.0500 time 0.4115 (0.4114) data time 0.0007 (0.0063) model time 0.4108 (0.4160) loss 5.4078 (6.3148) grad_norm 2.7657 (3.5391) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:50:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][80/625] eta 0:03:44 lr 0.000050 wd 0.0500 time 0.3952 (0.4118) data time 0.0009 (0.0056) model time 0.3944 (0.4152) loss 6.5796 (6.2902) grad_norm 2.2508 (3.4479) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:50:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][90/625] eta 0:03:43 lr 0.000050 wd 0.0500 time 0.5734 (0.4173) data time 0.0006 (0.0051) model time 0.5728 (0.4266) loss 6.9278 (6.2769) grad_norm 1.9000 (3.3849) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:50:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][100/625] eta 0:03:40 lr 0.000050 wd 0.0500 time 0.3995 (0.4204) data time 0.0008 (0.0047) model time 0.3987 (0.4308) loss 6.3723 (6.2994) grad_norm 3.1308 (3.9154) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:50:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][110/625] eta 0:03:38 lr 0.000049 wd 0.0500 time 0.6022 (0.4252) data time 0.0007 (0.0043) model time 0.6015 (0.4379) loss 6.5706 (6.3141) grad_norm 6.3044 (3.9006) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:50:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][120/625] eta 0:03:34 lr 0.000049 wd 0.0500 time 0.3959 (0.4255) data time 0.0009 (0.0040) model time 0.3951 (0.4364) loss 6.3644 (6.3053) grad_norm 5.1055 (3.9387) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
11:50:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][130/625] eta 0:03:30 lr 0.000049 wd 0.0500 time 0.3957 (0.4257) data time 0.0008 (0.0038) model time 0.3949 (0.4353) loss 6.4194 (6.3298) grad_norm 2.5343 (3.9376) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:50:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][140/625] eta 0:03:26 lr 0.000049 wd 0.0500 time 0.3990 (0.4261) data time 0.0009 (0.0036) model time 0.3981 (0.4347) loss 6.1249 (6.3438) grad_norm 3.4648 (3.8506) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:50:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][150/625] eta 0:03:21 lr 0.000049 wd 0.0500 time 0.3920 (0.4249) data time 0.0009 (0.0034) model time 0.3911 (0.4320) loss 5.3843 (6.3311) grad_norm 2.2444 (3.7779) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:50:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][160/625] eta 0:03:16 lr 0.000049 wd 0.0500 time 0.3952 (0.4232) data time 0.0008 (0.0032) model time 0.3944 (0.4289) loss 5.6534 (6.3220) grad_norm 5.6655 (3.7725) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:50:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][170/625] eta 0:03:11 lr 0.000049 wd 0.0500 time 0.3998 (0.4218) data time 0.0007 (0.0031) model time 0.3991 (0.4262) loss 6.6574 (6.3223) grad_norm 2.9812 (3.7536) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:50:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][180/625] eta 0:03:07 lr 0.000049 wd 0.0500 time 0.4000 (0.4206) data time 0.0008 (0.0030) model time 0.3992 (0.4242) loss 6.7597 (6.3253) grad_norm 1.9144 (3.8354) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:50:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][190/625] eta 0:03:02 lr 0.000049 wd 0.0500 time 0.3975 (0.4195) data 
time 0.0007 (0.0029) model time 0.3968 (0.4224) loss 6.4661 (6.3235) grad_norm 4.6285 (3.8310) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:51:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][200/625] eta 0:02:57 lr 0.000049 wd 0.0500 time 0.3984 (0.4185) data time 0.0010 (0.0028) model time 0.3974 (0.4208) loss 7.0047 (6.3425) grad_norm 2.2720 (3.8617) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:51:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][210/625] eta 0:02:53 lr 0.000049 wd 0.0500 time 0.3962 (0.4177) data time 0.0009 (0.0027) model time 0.3954 (0.4195) loss 6.6565 (6.3480) grad_norm 2.9191 (3.8308) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:51:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][220/625] eta 0:02:49 lr 0.000049 wd 0.0500 time 0.3987 (0.4176) data time 0.0007 (0.0026) model time 0.3980 (0.4193) loss 6.2437 (6.3354) grad_norm 3.7210 (3.8276) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:51:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][230/625] eta 0:02:44 lr 0.000049 wd 0.0500 time 0.4002 (0.4169) data time 0.0007 (0.0025) model time 0.3996 (0.4182) loss 6.9250 (6.3386) grad_norm 2.6690 (3.8297) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:51:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][240/625] eta 0:02:40 lr 0.000049 wd 0.0500 time 0.4019 (0.4163) data time 0.0008 (0.0025) model time 0.4011 (0.4174) loss 5.5753 (6.3439) grad_norm 3.2923 (3.7876) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:51:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][250/625] eta 0:02:35 lr 0.000049 wd 0.0500 time 0.4068 (0.4157) data time 0.0009 (0.0024) model time 0.4059 (0.4165) loss 6.6556 (6.3489) grad_norm 3.3057 (3.7754) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
11:51:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][260/625] eta 0:02:31 lr 0.000049 wd 0.0500 time 0.4007 (0.4152) data time 0.0008 (0.0023) model time 0.3999 (0.4158) loss 7.9268 (6.3472) grad_norm 2.3495 (3.7394) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:51:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][270/625] eta 0:02:27 lr 0.000049 wd 0.0500 time 0.3969 (0.4145) data time 0.0007 (0.0023) model time 0.3962 (0.4149) loss 6.8251 (6.3511) grad_norm 2.1025 (3.7333) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:51:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][280/625] eta 0:02:23 lr 0.000049 wd 0.0500 time 0.3958 (0.4146) data time 0.0009 (0.0022) model time 0.3949 (0.4149) loss 7.2700 (6.3479) grad_norm 3.3445 (3.7322) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:51:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][290/625] eta 0:02:18 lr 0.000049 wd 0.0500 time 0.3972 (0.4147) data time 0.0007 (0.0022) model time 0.3965 (0.4150) loss 6.1553 (6.3491) grad_norm 5.1594 (3.7857) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:51:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][300/625] eta 0:02:14 lr 0.000049 wd 0.0500 time 0.3994 (0.4147) data time 0.0007 (0.0021) model time 0.3988 (0.4150) loss 7.1410 (6.3532) grad_norm 3.4750 (3.8012) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:51:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][310/625] eta 0:02:10 lr 0.000049 wd 0.0500 time 0.5614 (0.4157) data time 0.0007 (0.0021) model time 0.5607 (0.4161) loss 6.4273 (6.3451) grad_norm 2.4260 (3.8004) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:51:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][320/625] eta 0:02:07 lr 0.000049 wd 0.0500 time 0.3957 (0.4169) data 
time 0.0007 (0.0021) model time 0.3950 (0.4175) loss 6.3926 (6.3524) grad_norm 2.3771 (3.7723) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:51:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][330/625] eta 0:02:03 lr 0.000049 wd 0.0500 time 0.3972 (0.4185) data time 0.0009 (0.0020) model time 0.3963 (0.4193) loss 6.6984 (6.3409) grad_norm 20.9261 (3.8167) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:51:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][340/625] eta 0:01:59 lr 0.000049 wd 0.0500 time 0.3994 (0.4192) data time 0.0007 (0.0020) model time 0.3987 (0.4200) loss 7.1455 (6.3470) grad_norm 2.9739 (3.7964) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:52:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][350/625] eta 0:01:55 lr 0.000049 wd 0.0500 time 0.5816 (0.4201) data time 0.0007 (0.0019) model time 0.5808 (0.4211) loss 5.9058 (6.3439) grad_norm 3.0540 (3.7833) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:52:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][360/625] eta 0:01:51 lr 0.000049 wd 0.0500 time 0.3988 (0.4198) data time 0.0008 (0.0019) model time 0.3980 (0.4207) loss 6.0911 (6.3480) grad_norm 2.1138 (3.7482) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:52:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][370/625] eta 0:01:46 lr 0.000049 wd 0.0500 time 0.3968 (0.4196) data time 0.0009 (0.0019) model time 0.3959 (0.4204) loss 6.8094 (6.3566) grad_norm 3.0475 (3.7400) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:52:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][380/625] eta 0:01:42 lr 0.000048 wd 0.0500 time 0.3978 (0.4191) data time 0.0007 (0.0019) model time 0.3971 (0.4197) loss 6.1962 (6.3562) grad_norm 2.2799 (3.7362) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
11:52:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][390/625] eta 0:01:38 lr 0.000048 wd 0.0500 time 0.3994 (0.4186) data time 0.0007 (0.0018) model time 0.3987 (0.4191) loss 6.5481 (6.3624) grad_norm 3.8824 (3.7561) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:52:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][400/625] eta 0:01:34 lr 0.000048 wd 0.0500 time 0.4028 (0.4181) data time 0.0007 (0.0018) model time 0.4021 (0.4185) loss 5.8473 (6.3564) grad_norm 2.3575 (3.7475) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:52:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][410/625] eta 0:01:29 lr 0.000048 wd 0.0500 time 0.3958 (0.4176) data time 0.0008 (0.0018) model time 0.3950 (0.4179) loss 6.2216 (6.3613) grad_norm 2.1096 (3.7705) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:52:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][420/625] eta 0:01:25 lr 0.000048 wd 0.0500 time 0.4030 (0.4172) data time 0.0008 (0.0018) model time 0.4022 (0.4174) loss 5.2577 (6.3547) grad_norm 23.7466 (3.8303) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:52:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][430/625] eta 0:01:21 lr 0.000048 wd 0.0500 time 0.3996 (0.4168) data time 0.0009 (0.0017) model time 0.3987 (0.4169) loss 6.5674 (6.3468) grad_norm 2.1722 (3.8378) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:52:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][440/625] eta 0:01:17 lr 0.000048 wd 0.0500 time 0.4002 (0.4168) data time 0.0007 (0.0017) model time 0.3996 (0.4170) loss 6.1773 (6.3464) grad_norm 2.7078 (3.8117) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:52:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][450/625] eta 0:01:12 lr 0.000048 wd 0.0500 time 0.4009 (0.4165) data 
time 0.0009 (0.0017) model time 0.4000 (0.4165) loss 6.4894 (6.3482) grad_norm 2.3657 (3.8047) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:52:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][460/625] eta 0:01:08 lr 0.000048 wd 0.0500 time 0.4012 (0.4161) data time 0.0009 (0.0017) model time 0.4003 (0.4161) loss 6.3281 (6.3501) grad_norm 2.0749 (3.8049) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:52:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][470/625] eta 0:01:04 lr 0.000048 wd 0.0500 time 0.4033 (0.4158) data time 0.0008 (0.0017) model time 0.4025 (0.4157) loss 6.4026 (6.3501) grad_norm 2.2525 (3.7876) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:52:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][480/625] eta 0:01:00 lr 0.000048 wd 0.0500 time 0.3968 (0.4154) data time 0.0006 (0.0017) model time 0.3962 (0.4153) loss 6.7021 (6.3504) grad_norm 1.9907 (3.7747) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:53:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][490/625] eta 0:00:56 lr 0.000048 wd 0.0500 time 0.3997 (0.4151) data time 0.0007 (0.0016) model time 0.3990 (0.4149) loss 6.5726 (6.3510) grad_norm 4.4251 (3.7685) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:53:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][500/625] eta 0:00:51 lr 0.000048 wd 0.0500 time 0.3979 (0.4151) data time 0.0007 (0.0016) model time 0.3972 (0.4149) loss 6.9440 (6.3520) grad_norm 4.1882 (3.7647) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:53:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][510/625] eta 0:00:47 lr 0.000048 wd 0.0500 time 0.3999 (0.4154) data time 0.0007 (0.0016) model time 0.3991 (0.4153) loss 5.4826 (6.3508) grad_norm 5.8028 (3.7599) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
11:53:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][520/625] eta 0:00:43 lr 0.000048 wd 0.0500 time 0.5925 (0.4155) data time 0.0009 (0.0016) model time 0.5916 (0.4153) loss 5.5317 (6.3526) grad_norm 4.3150 (3.7422) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:53:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][530/625] eta 0:00:39 lr 0.000048 wd 0.0500 time 0.4015 (0.4161) data time 0.0008 (0.0016) model time 0.4007 (0.4160) loss 6.7751 (6.3515) grad_norm 2.5187 (3.7904) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:53:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][540/625] eta 0:00:35 lr 0.000048 wd 0.0500 time 0.5929 (0.4175) data time 0.0006 (0.0016) model time 0.5923 (0.4175) loss 6.3500 (6.3495) grad_norm 2.7984 (3.7762) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:53:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][550/625] eta 0:00:31 lr 0.000048 wd 0.0500 time 0.5876 (0.4181) data time 0.0007 (0.0016) model time 0.5869 (0.4181) loss 5.2931 (6.3437) grad_norm 2.0423 (3.7735) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:53:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][560/625] eta 0:00:27 lr 0.000048 wd 0.0500 time 0.5768 (0.4188) data time 0.0009 (0.0015) model time 0.5759 (0.4189) loss 6.4957 (6.3412) grad_norm 2.7642 (3.7698) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:53:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][570/625] eta 0:00:23 lr 0.000048 wd 0.0500 time 0.3964 (0.4188) data time 0.0008 (0.0015) model time 0.3956 (0.4189) loss 6.4598 (6.3406) grad_norm 2.4667 (3.7739) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:53:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][580/625] eta 0:00:18 lr 0.000048 wd 0.0500 time 0.5753 (0.4194) data 
time 0.0006 (0.0015) model time 0.5747 (0.4195) loss 7.8271 (6.3458) grad_norm 3.3449 (3.7783) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:53:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][590/625] eta 0:00:14 lr 0.000048 wd 0.0500 time 0.3955 (0.4191) data time 0.0006 (0.0015) model time 0.3949 (0.4191) loss 5.3285 (6.3423) grad_norm 1.6344 (3.7820) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:53:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][600/625] eta 0:00:10 lr 0.000048 wd 0.0500 time 0.3978 (0.4187) data time 0.0009 (0.0015) model time 0.3969 (0.4187) loss 6.5059 (6.3483) grad_norm 2.3648 (3.7679) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:53:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][610/625] eta 0:00:06 lr 0.000048 wd 0.0500 time 0.3975 (0.4184) data time 0.0006 (0.0015) model time 0.3969 (0.4184) loss 6.6231 (6.3530) grad_norm 2.5624 (3.7628) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:53:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][620/625] eta 0:00:02 lr 0.000048 wd 0.0500 time 0.4019 (0.4181) data time 0.0004 (0.0015) model time 0.4014 (0.4180) loss 6.8444 (6.3555) grad_norm 2.1604 (3.7654) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 11:53:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 268 training takes 0:04:21 [2024-07-25 11:53:57 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 11:53:59 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 11:54:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.446 (0.446) Loss 0.5542 (0.5542) Acc@1 89.795 (89.795) Acc@5 98.828 (98.828) Mem 14939MB
[2024-07-25 11:54:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.8145 (0.6594) Acc@1 83.154 (87.540) Acc@5 97.266 (97.985) Mem 14939MB
[2024-07-25 11:54:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9150 (0.7630) Acc@1 79.053 (84.610) Acc@5 96.094 (97.098) Mem 14939MB
[2024-07-25 11:54:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.251 Acc@5 97.061
[2024-07-25 11:54:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.3%
[2024-07-25 11:54:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.938 (0.938) Loss 0.5391 (0.5391) Acc@1 90.039 (90.039) Acc@5 98.926 (98.926) Mem 14939MB
[2024-07-25 11:54:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.164) Loss 0.8091 (0.6549) Acc@1 83.301 (87.491) Acc@5 97.119 (98.020) Mem 14939MB
[2024-07-25 11:54:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.127) Loss 0.9126 (0.7585) Acc@1 78.955 (84.556) Acc@5 95.850 (97.040) Mem 14939MB
[2024-07-25 11:54:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.161 Acc@5 97.003
[2024-07-25 11:54:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2%
[2024-07-25 11:54:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.16%
[2024-07-25 11:54:05 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-25 11:54:06 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!!
[2024-07-25 11:54:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][0/625] eta 0:08:32 lr 0.000048 wd 0.0500 time 0.8197 (0.8197) data time 0.4237 (0.4237) model time 0.0000 (0.0000) loss 6.3628 (6.3628) grad_norm 5.1124 (5.1124) loss_scale 128.0000 (128.0000) mem 14939MB
[2024-07-25 11:54:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][10/625] eta 0:04:29 lr 0.000048 wd 0.0500 time 0.3999 (0.4383) data time 0.0007 (0.0393) model time 0.0000 (0.0000) loss 5.4780 (6.4859) grad_norm 2.8927 (4.0447) loss_scale 128.0000 (128.0000) mem 14939MB
[2024-07-25 11:54:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][20/625] eta 0:04:13 lr 0.000047 wd 0.0500 time 0.3968 (0.4198) data time 0.0009 (0.0210) model time 0.0000 (0.0000) loss 5.5848 (6.3520) grad_norm 3.6208 (3.3915) loss_scale 128.0000 (128.0000) mem 14939MB
[2024-07-25 11:54:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][30/625] eta 0:04:05 lr 0.000047 wd 0.0500 time 0.4015 (0.4134) data time 0.0008 (0.0145) model time 0.0000 (0.0000) loss 6.8024 (6.3085) grad_norm 2.3516 (3.4879) loss_scale 128.0000 (128.0000) mem 14939MB
[2024-07-25 11:54:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][40/625] eta 0:03:59 lr 0.000047 wd 0.0500 time 0.4034 (0.4100) data time 0.0008 (0.0112) model time 0.0000 (0.0000) loss 6.1996 (6.3504) grad_norm 1.8394 (3.3266) loss_scale 128.0000 (128.0000) mem 14939MB
[2024-07-25 11:54:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][50/625] eta 0:03:54 lr 0.000047 wd 0.0500 time 0.3997 (0.4080) data time 0.0009 (0.0092) model time 0.0000 (0.0000) loss 6.4238 (6.3440) grad_norm 2.4843 (3.2550) loss_scale 128.0000 (128.0000) mem 14939MB
[2024-07-25 11:54:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-07-25 11:54:27 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving......
[2024-07-25 11:54:29 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!!
[2024-07-25 12:32:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json
[2024-07-25 12:32:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300
[2024-07-25 12:33:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json
[2024-07-25 12:33:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300
[2024-07-25 12:33:56 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-25 12:34:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth
[2024-07-25 12:34:16 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth....................
[2024-07-25 12:34:16 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model:
[2024-07-25 12:34:16 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema:
[2024-07-25 12:34:16 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 269)
[2024-07-25 12:34:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-07-25 12:34:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][60/625] eta 0:12:04 lr 0.000047 wd 0.0500 time 0.3959 (1.2831) data time 0.0006 (0.0700) model time 0.3953 (1.2131) loss 7.1378 (6.6817) grad_norm 13.6965 (5.4139) loss_scale 128.0000 (128.0000) mem 14931MB
[2024-07-25 12:34:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][70/625] eta 0:07:32 lr 0.000047 wd 0.0500 time 0.3920 (0.8148) data time 0.0008 (0.0336) model time 0.3911 (0.7812) loss 6.6218 (6.5935) grad_norm 2.5909 (4.0971) loss_scale 128.0000 (128.0000) mem 14931MB
[2024-07-25 12:34:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][80/625] eta 0:06:04 lr 0.000047 wd 0.0500 time 0.3914 (0.6696) data time 0.0007 (0.0223) model time 0.3907 (0.6473) loss 7.3238 (6.6071) grad_norm 3.2269 (3.6749) loss_scale 128.0000 (128.0000) mem 14931MB
[2024-07-25 12:34:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][90/625] eta 0:05:20 lr 0.000047 wd 0.0500 time 0.3908 (0.5986) data time 0.0008 (0.0168) model time 0.3900 (0.5818) loss 7.0586 (6.6019) grad_norm 2.2083 (3.6014) loss_scale 128.0000 (128.0000) mem 14931MB
[2024-07-25 12:34:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][100/625] eta 0:04:52 lr 0.000047 wd 0.0500 time 0.3944 (0.5568) data time 0.0008 (0.0135) model time 0.3935 (0.5433) loss 5.9515 (6.5912) grad_norm 3.5021 (3.5262) loss_scale 128.0000 (128.0000) mem 14931MB
[2024-07-25 12:34:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][110/625] eta 0:04:36 lr 0.000047 wd 0.0500 time 0.3982 (0.5371) data time 0.0006 (0.0114) model time 0.3976 (0.5257) loss 5.7829 (6.5421) grad_norm 2.3101 (3.6425) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:35:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][120/625] eta 0:04:20 lr 0.000047 wd 0.0500 time 0.3932 (0.5164) data time 0.0009 (0.0099) model time 0.3923 (0.5066) loss 7.0031 (6.5428) grad_norm 1.9796 (3.6058) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:35:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][130/625] eta 0:04:08 lr 0.000047 wd 0.0500 time 0.4034 (0.5014) data time 0.0009 (0.0087) model time 0.4025 (0.4926) loss 5.8726 (6.5262) grad_norm 3.5641 (3.7084) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:35:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][140/625] eta 0:03:57 lr 0.000047 wd 0.0500 time 0.3912 (0.4893) data time 0.0006 (0.0078) model time 0.3906 (0.4815) loss 6.2783 (6.4946) grad_norm 3.0048 (3.7175) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:35:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][150/625] eta 0:03:47 lr 0.000047 wd 0.0500 time 0.3951 (0.4800) data time 0.0008 (0.0071) model time 0.3943 (0.4729) loss 6.2942 (6.4886) grad_norm 1.9682 (3.6929) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:35:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][160/625] eta 0:03:39 lr 0.000047 wd 0.0500 time 0.3903 (0.4722) data time 0.0007 (0.0065) model time 0.3896 (0.4657) loss 7.5723 (6.4976) grad_norm 2.2554 (3.6701) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:35:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][170/625] eta 0:03:31 lr 0.000047 wd 0.0500 time 0.3924 
(0.4659) data time 0.0008 (0.0061) model time 0.3916 (0.4598) loss 6.8096 (6.5248) grad_norm 2.7052 (3.6152) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:35:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][180/625] eta 0:03:24 lr 0.000047 wd 0.0500 time 0.4036 (0.4605) data time 0.0007 (0.0057) model time 0.4029 (0.4548) loss 6.0790 (6.5027) grad_norm 2.3230 (3.5526) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:35:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][190/625] eta 0:03:18 lr 0.000047 wd 0.0500 time 0.3966 (0.4560) data time 0.0006 (0.0053) model time 0.3959 (0.4507) loss 7.0676 (6.5035) grad_norm 2.7010 (3.5382) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:35:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][200/625] eta 0:03:12 lr 0.000047 wd 0.0500 time 0.3958 (0.4520) data time 0.0007 (0.0050) model time 0.3950 (0.4470) loss 6.5914 (6.4848) grad_norm 2.5747 (3.4861) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:35:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][210/625] eta 0:03:06 lr 0.000047 wd 0.0500 time 0.3950 (0.4485) data time 0.0006 (0.0047) model time 0.3943 (0.4438) loss 7.0484 (6.4768) grad_norm 3.1427 (3.5018) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:35:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][220/625] eta 0:03:00 lr 0.000047 wd 0.0500 time 0.3977 (0.4454) data time 0.0008 (0.0045) model time 0.3969 (0.4409) loss 6.7125 (6.4994) grad_norm 4.3274 (3.5167) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:35:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][230/625] eta 0:02:54 lr 0.000047 wd 0.0500 time 0.3951 (0.4427) data time 0.0006 (0.0043) model time 0.3945 (0.4384) loss 6.1102 (6.4847) grad_norm 4.0882 (3.6891) loss_scale 128.0000 (128.0000) mem 14931MB 
[2024-07-25 12:35:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][240/625] eta 0:02:49 lr 0.000047 wd 0.0500 time 0.3974 (0.4403) data time 0.0006 (0.0041) model time 0.3968 (0.4362) loss 6.8989 (6.4828) grad_norm 3.4074 (3.7231) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:35:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][250/625] eta 0:02:44 lr 0.000047 wd 0.0500 time 0.3962 (0.4382) data time 0.0009 (0.0040) model time 0.3953 (0.4342) loss 5.5020 (6.4635) grad_norm 2.6057 (3.6753) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:35:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][260/625] eta 0:02:39 lr 0.000047 wd 0.0500 time 0.3969 (0.4363) data time 0.0007 (0.0038) model time 0.3962 (0.4325) loss 6.9691 (6.4482) grad_norm 2.5865 (3.6403) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:36:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][270/625] eta 0:02:34 lr 0.000047 wd 0.0500 time 0.3976 (0.4345) data time 0.0006 (0.0037) model time 0.3969 (0.4309) loss 6.1012 (6.4305) grad_norm 2.4752 (3.6162) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:36:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][280/625] eta 0:02:29 lr 0.000047 wd 0.0500 time 0.3989 (0.4329) data time 0.0006 (0.0035) model time 0.3982 (0.4294) loss 5.8298 (6.4289) grad_norm 3.8105 (3.6082) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:36:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][290/625] eta 0:02:24 lr 0.000047 wd 0.0500 time 0.3954 (0.4314) data time 0.0007 (0.0034) model time 0.3948 (0.4280) loss 5.4358 (6.4215) grad_norm 5.1224 (3.6121) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:36:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][300/625] eta 0:02:19 lr 0.000046 wd 0.0500 time 0.3958 
(0.4300) data time 0.0007 (0.0033) model time 0.3951 (0.4267) loss 5.8160 (6.4185) grad_norm 2.1302 (3.5789) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:36:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][310/625] eta 0:02:15 lr 0.000046 wd 0.0500 time 0.4010 (0.4288) data time 0.0008 (0.0032) model time 0.4002 (0.4256) loss 7.1412 (6.4158) grad_norm 3.2594 (3.5524) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:36:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][320/625] eta 0:02:10 lr 0.000046 wd 0.0500 time 0.3963 (0.4277) data time 0.0006 (0.0032) model time 0.3957 (0.4245) loss 4.8467 (6.3984) grad_norm 2.7674 (3.5610) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:36:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][330/625] eta 0:02:06 lr 0.000046 wd 0.0500 time 0.3994 (0.4279) data time 0.0009 (0.0031) model time 0.3985 (0.4249) loss 7.0370 (6.4065) grad_norm 3.3704 (3.5374) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:36:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][340/625] eta 0:02:01 lr 0.000046 wd 0.0500 time 0.3964 (0.4269) data time 0.0009 (0.0030) model time 0.3955 (0.4239) loss 6.1575 (6.4074) grad_norm 3.4717 (3.5711) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:36:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][350/625] eta 0:01:57 lr 0.000046 wd 0.0500 time 0.3965 (0.4261) data time 0.0008 (0.0029) model time 0.3957 (0.4231) loss 7.1378 (6.3922) grad_norm 3.4682 (3.5651) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:36:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][360/625] eta 0:01:52 lr 0.000046 wd 0.0500 time 0.4021 (0.4252) data time 0.0008 (0.0029) model time 0.4012 (0.4223) loss 7.5500 (6.3885) grad_norm 2.5341 (3.5724) loss_scale 128.0000 (128.0000) mem 14931MB 
[2024-07-25 12:36:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][370/625] eta 0:01:48 lr 0.000046 wd 0.0500 time 0.3986 (0.4244) data time 0.0007 (0.0028) model time 0.3979 (0.4216) loss 6.7587 (6.4041) grad_norm 3.0118 (3.5609) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:36:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][380/625] eta 0:01:43 lr 0.000046 wd 0.0500 time 0.3997 (0.4237) data time 0.0008 (0.0027) model time 0.3988 (0.4209) loss 5.3436 (6.4051) grad_norm 3.7454 (3.5350) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:36:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][390/625] eta 0:01:39 lr 0.000046 wd 0.0500 time 0.3973 (0.4229) data time 0.0009 (0.0027) model time 0.3964 (0.4203) loss 6.5589 (6.4068) grad_norm 2.0562 (3.5420) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:36:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][400/625] eta 0:01:35 lr 0.000046 wd 0.0500 time 0.4125 (0.4223) data time 0.0007 (0.0026) model time 0.4118 (0.4197) loss 6.2483 (6.4061) grad_norm 2.1761 (3.5231) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:36:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][410/625] eta 0:01:30 lr 0.000046 wd 0.0500 time 0.3962 (0.4218) data time 0.0007 (0.0026) model time 0.3956 (0.4192) loss 5.8228 (6.3983) grad_norm 3.3085 (3.5230) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:37:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][420/625] eta 0:01:26 lr 0.000046 wd 0.0500 time 0.4005 (0.4212) data time 0.0008 (0.0026) model time 0.3997 (0.4187) loss 7.3862 (6.3965) grad_norm 4.7919 (3.5049) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:37:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][430/625] eta 0:01:22 lr 0.000046 wd 0.0500 time 0.3947 
(0.4207) data time 0.0007 (0.0025) model time 0.3939 (0.4182) loss 7.0326 (6.3975) grad_norm 15.3531 (3.5171) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:37:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][440/625] eta 0:01:17 lr 0.000046 wd 0.0500 time 0.4208 (0.4202) data time 0.0006 (0.0025) model time 0.4202 (0.4177) loss 6.7123 (6.3921) grad_norm 2.1133 (3.5145) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:37:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][450/625] eta 0:01:13 lr 0.000046 wd 0.0500 time 0.3950 (0.4196) data time 0.0007 (0.0025) model time 0.3943 (0.4171) loss 6.9704 (6.3975) grad_norm 2.9767 (3.5190) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:37:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][460/625] eta 0:01:09 lr 0.000046 wd 0.0500 time 0.3945 (0.4191) data time 0.0009 (0.0024) model time 0.3936 (0.4167) loss 7.2195 (6.4015) grad_norm 5.4206 (3.5362) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:37:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][470/625] eta 0:01:04 lr 0.000046 wd 0.0500 time 0.3955 (0.4186) data time 0.0007 (0.0024) model time 0.3947 (0.4162) loss 5.3151 (6.3940) grad_norm 13.8400 (3.5570) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:37:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][480/625] eta 0:01:00 lr 0.000046 wd 0.0500 time 0.4089 (0.4182) data time 0.0006 (0.0024) model time 0.4083 (0.4158) loss 6.3867 (6.3945) grad_norm 2.4572 (3.5363) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:37:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][490/625] eta 0:00:56 lr 0.000046 wd 0.0500 time 0.4007 (0.4177) data time 0.0008 (0.0023) model time 0.3999 (0.4154) loss 6.7698 (6.4004) grad_norm 1.9937 (3.5162) loss_scale 128.0000 (128.0000) mem 14931MB 
[2024-07-25 12:37:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][500/625] eta 0:00:52 lr 0.000046 wd 0.0500 time 0.4260 (0.4174) data time 0.0009 (0.0023) model time 0.4251 (0.4151) loss 7.2896 (6.3982) grad_norm 6.0483 (3.5438) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:37:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][510/625] eta 0:00:47 lr 0.000046 wd 0.0500 time 0.3951 (0.4170) data time 0.0009 (0.0023) model time 0.3942 (0.4147) loss 6.0142 (6.3929) grad_norm 4.1797 (3.5472) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:37:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][520/625] eta 0:00:43 lr 0.000046 wd 0.0500 time 0.4106 (0.4166) data time 0.0006 (0.0022) model time 0.4100 (0.4144) loss 5.4901 (6.3855) grad_norm 4.8271 (3.5529) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:37:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][530/625] eta 0:00:39 lr 0.000046 wd 0.0500 time 0.3982 (0.4162) data time 0.0009 (0.0022) model time 0.3973 (0.4140) loss 7.2095 (6.3859) grad_norm 2.8745 (3.5520) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:37:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][540/625] eta 0:00:35 lr 0.000046 wd 0.0500 time 0.3976 (0.4158) data time 0.0009 (0.0022) model time 0.3967 (0.4136) loss 6.3604 (6.3910) grad_norm 2.9933 (3.5343) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:37:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][550/625] eta 0:00:31 lr 0.000046 wd 0.0500 time 0.3993 (0.4158) data time 0.0007 (0.0022) model time 0.3987 (0.4136) loss 5.9882 (6.3896) grad_norm 2.9860 (3.5342) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:37:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][560/625] eta 0:00:27 lr 0.000046 wd 0.0500 time 0.3973 
(0.4154) data time 0.0006 (0.0021) model time 0.3967 (0.4133) loss 7.0761 (6.3943) grad_norm 4.4336 (3.5183) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:38:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][570/625] eta 0:00:22 lr 0.000046 wd 0.0500 time 0.3930 (0.4151) data time 0.0009 (0.0021) model time 0.3921 (0.4130) loss 6.1868 (6.3996) grad_norm 2.9087 (3.5154) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:38:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][580/625] eta 0:00:18 lr 0.000045 wd 0.0500 time 0.4288 (0.4149) data time 0.0007 (0.0021) model time 0.4282 (0.4128) loss 6.8621 (6.3913) grad_norm 10.8692 (3.5197) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:38:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][590/625] eta 0:00:14 lr 0.000045 wd 0.0500 time 0.3952 (0.4150) data time 0.0009 (0.0025) model time 0.3942 (0.4125) loss 7.6615 (6.3908) grad_norm 2.2146 (3.5074) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:38:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][600/625] eta 0:00:10 lr 0.000045 wd 0.0500 time 0.4201 (0.4147) data time 0.0006 (0.0024) model time 0.4195 (0.4122) loss 7.6383 (6.3921) grad_norm 2.2924 (3.5040) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:38:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][610/625] eta 0:00:06 lr 0.000045 wd 0.0500 time 0.3963 (0.4144) data time 0.0004 (0.0024) model time 0.3959 (0.4120) loss 6.7083 (6.3937) grad_norm 3.8573 (3.4923) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 12:38:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][620/625] eta 0:00:02 lr 0.000045 wd 0.0500 time 0.3963 (0.4141) data time 0.0006 (0.0024) model time 0.3957 (0.4117) loss 6.2854 (6.4022) grad_norm 2.2767 (3.5085) loss_scale 128.0000 (128.0000) mem 14931MB 
[2024-07-25 12:38:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 269 training takes 0:03:57 [2024-07-25 12:38:24 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 12:38:26 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 12:38:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.426 (0.426) Loss 0.5386 (0.5386) Acc@1 90.430 (90.430) Acc@5 98.926 (98.926) Mem 14931MB [2024-07-25 12:38:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.116) Loss 0.8062 (0.6526) Acc@1 82.715 (87.615) Acc@5 96.924 (97.998) Mem 14931MB [2024-07-25 12:38:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.102) Loss 0.9038 (0.7573) Acc@1 79.492 (84.645) Acc@5 96.045 (97.066) Mem 14931MB [2024-07-25 12:38:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.281 Acc@5 97.039 [2024-07-25 12:38:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-07-25 12:38:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.881 (0.881) Loss 0.5396 (0.5396) Acc@1 90.039 (90.039) Acc@5 98.926 (98.926) Mem 14931MB [2024-07-25 12:38:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.160) Loss 0.8086 (0.6544) Acc@1 83.301 (87.513) Acc@5 97.119 (98.025) Mem 14931MB [2024-07-25 12:38:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.125) Loss 0.9116 (0.7580) Acc@1 78.906 (84.559) Acc@5 95.850 (97.049) Mem 14931MB [2024-07-25 12:38:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.165 Acc@5 97.013 [2024-07-25 12:38:33 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-07-25 12:38:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.17% [2024-07-25 12:38:33 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 12:38:36 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 12:38:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][0/625] eta 0:10:00 lr 0.000045 wd 0.0500 time 0.9603 (0.9603) data time 0.3809 (0.3809) model time 0.0000 (0.0000) loss 6.9501 (6.9501) grad_norm 2.2521 (2.2521) loss_scale 128.0000 (128.0000) mem 14938MB [2024-07-25 12:38:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][10/625] eta 0:04:36 lr 0.000045 wd 0.0500 time 0.3963 (0.4488) data time 0.0006 (0.0354) model time 0.0000 (0.0000) loss 5.0263 (6.5335) grad_norm 3.0832 (2.9088) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:38:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][20/625] eta 0:04:16 lr 0.000045 wd 0.0500 time 0.3921 (0.4239) data time 0.0007 (0.0190) model time 0.0000 (0.0000) loss 6.7078 (6.6275) grad_norm 3.3761 (2.8972) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:38:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][30/625] eta 0:04:07 lr 0.000045 wd 0.0500 time 0.3932 (0.4152) data time 0.0008 (0.0131) model time 0.0000 (0.0000) loss 7.1476 (6.4988) grad_norm 2.9110 (2.8284) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:38:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][40/625] eta 0:04:00 lr 0.000045 wd 0.0500 time 0.3994 (0.4109) data time 0.0006 (0.0101) model time 0.0000 
(0.0000) loss 6.2085 (6.4822) grad_norm 4.7331 (4.0910) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:38:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][50/625] eta 0:03:54 lr 0.000045 wd 0.0500 time 0.3954 (0.4084) data time 0.0006 (0.0083) model time 0.0000 (0.0000) loss 6.8131 (6.4994) grad_norm 3.6168 (3.9391) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:39:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][60/625] eta 0:03:49 lr 0.000045 wd 0.0500 time 0.3967 (0.4063) data time 0.0008 (0.0071) model time 0.3959 (0.3943) loss 6.2427 (6.4805) grad_norm 4.0061 (3.7688) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:39:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][70/625] eta 0:03:44 lr 0.000045 wd 0.0500 time 0.4091 (0.4053) data time 0.0006 (0.0062) model time 0.4085 (0.3964) loss 5.2587 (6.4071) grad_norm 2.8472 (3.6892) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:39:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][80/625] eta 0:03:41 lr 0.000045 wd 0.0500 time 0.3748 (0.4055) data time 0.0007 (0.0056) model time 0.3741 (0.3999) loss 6.5066 (6.4046) grad_norm 2.5197 (inf) loss_scale 64.0000 (120.0988) mem 14939MB [2024-07-25 12:39:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][90/625] eta 0:03:36 lr 0.000045 wd 0.0500 time 0.3974 (0.4046) data time 0.0008 (0.0050) model time 0.3966 (0.3989) loss 6.6910 (6.3795) grad_norm 2.2804 (inf) loss_scale 64.0000 (113.9341) mem 14939MB [2024-07-25 12:39:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][100/625] eta 0:03:32 lr 0.000045 wd 0.0500 time 0.4006 (0.4045) data time 0.0006 (0.0046) model time 0.4000 (0.3997) loss 6.5384 (6.3846) grad_norm 3.4606 (inf) loss_scale 64.0000 (108.9901) mem 14939MB [2024-07-25 12:39:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 
367): INFO Train: [270/300][110/625] eta 0:03:28 lr 0.000045 wd 0.0500 time 0.3996 (0.4039) data time 0.0006 (0.0043) model time 0.3989 (0.3992) loss 6.5932 (6.3642) grad_norm 3.9627 (inf) loss_scale 64.0000 (104.9369) mem 14939MB [2024-07-25 12:39:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][120/625] eta 0:03:23 lr 0.000045 wd 0.0500 time 0.3977 (0.4033) data time 0.0009 (0.0040) model time 0.3968 (0.3988) loss 7.4293 (6.3549) grad_norm 2.6356 (inf) loss_scale 64.0000 (101.5537) mem 14939MB [2024-07-25 12:39:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][130/625] eta 0:03:19 lr 0.000045 wd 0.0500 time 0.4026 (0.4030) data time 0.0008 (0.0038) model time 0.4018 (0.3988) loss 7.4890 (6.3289) grad_norm 3.0622 (inf) loss_scale 64.0000 (98.6870) mem 14939MB [2024-07-25 12:39:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][140/625] eta 0:03:15 lr 0.000045 wd 0.0500 time 0.3933 (0.4027) data time 0.0007 (0.0035) model time 0.3926 (0.3986) loss 5.5168 (6.3017) grad_norm 2.7999 (inf) loss_scale 64.0000 (96.2270) mem 14939MB [2024-07-25 12:39:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][150/625] eta 0:03:11 lr 0.000045 wd 0.0500 time 0.4038 (0.4036) data time 0.0006 (0.0034) model time 0.4032 (0.4003) loss 6.7320 (6.3052) grad_norm 4.0458 (inf) loss_scale 64.0000 (94.0927) mem 14939MB [2024-07-25 12:39:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][160/625] eta 0:03:10 lr 0.000045 wd 0.0500 time 0.3993 (0.4106) data time 0.0009 (0.0032) model time 0.3984 (0.4108) loss 7.0228 (6.3189) grad_norm 2.8638 (inf) loss_scale 64.0000 (92.2236) mem 14939MB [2024-07-25 12:39:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][170/625] eta 0:03:06 lr 0.000045 wd 0.0500 time 0.3964 (0.4099) data time 0.0006 (0.0031) model time 0.3958 (0.4097) loss 5.2288 (6.3313) grad_norm 3.2011 
(inf) loss_scale 64.0000 (90.5731) mem 14939MB [2024-07-25 12:39:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][180/625] eta 0:03:02 lr 0.000045 wd 0.0500 time 0.3986 (0.4092) data time 0.0008 (0.0030) model time 0.3978 (0.4086) loss 5.6821 (6.3264) grad_norm 3.4988 (inf) loss_scale 64.0000 (89.1050) mem 14939MB [2024-07-25 12:39:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][190/625] eta 0:02:57 lr 0.000045 wd 0.0500 time 0.3953 (0.4085) data time 0.0009 (0.0028) model time 0.3944 (0.4077) loss 5.7913 (6.3320) grad_norm 2.1912 (inf) loss_scale 64.0000 (87.7906) mem 14939MB [2024-07-25 12:39:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][200/625] eta 0:02:53 lr 0.000045 wd 0.0500 time 0.4011 (0.4079) data time 0.0009 (0.0027) model time 0.4003 (0.4069) loss 6.9315 (6.3513) grad_norm 6.0989 (inf) loss_scale 64.0000 (86.6070) mem 14939MB [2024-07-25 12:40:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][210/625] eta 0:02:49 lr 0.000045 wd 0.0500 time 0.3976 (0.4074) data time 0.0006 (0.0027) model time 0.3970 (0.4063) loss 5.3667 (6.3567) grad_norm 2.5452 (inf) loss_scale 64.0000 (85.5355) mem 14939MB [2024-07-25 12:40:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][220/625] eta 0:02:44 lr 0.000045 wd 0.0500 time 0.4010 (0.4071) data time 0.0008 (0.0026) model time 0.4002 (0.4058) loss 7.0770 (6.3703) grad_norm 2.7890 (inf) loss_scale 64.0000 (84.5611) mem 14939MB [2024-07-25 12:40:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][230/625] eta 0:02:40 lr 0.000045 wd 0.0500 time 0.4001 (0.4067) data time 0.0008 (0.0025) model time 0.3993 (0.4054) loss 6.1828 (6.3622) grad_norm 3.4405 (inf) loss_scale 64.0000 (83.6710) mem 14939MB [2024-07-25 12:40:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][240/625] eta 0:02:36 lr 0.000044 wd 
0.0500 time 0.3992 (0.4064) data time 0.0008 (0.0024) model time 0.3985 (0.4050) loss 6.7801 (6.3523) grad_norm 2.2069 (inf) loss_scale 64.0000 (82.8548) mem 14939MB [2024-07-25 12:40:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][250/625] eta 0:02:32 lr 0.000044 wd 0.0500 time 0.3972 (0.4060) data time 0.0008 (0.0024) model time 0.3963 (0.4045) loss 6.7985 (6.3622) grad_norm 2.1659 (inf) loss_scale 64.0000 (82.1036) mem 14939MB [2024-07-25 12:40:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][260/625] eta 0:02:28 lr 0.000044 wd 0.0500 time 0.3949 (0.4057) data time 0.0008 (0.0023) model time 0.3942 (0.4042) loss 5.7126 (6.3464) grad_norm 6.2013 (inf) loss_scale 64.0000 (81.4100) mem 14939MB [2024-07-25 12:40:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][270/625] eta 0:02:23 lr 0.000044 wd 0.0500 time 0.4003 (0.4054) data time 0.0006 (0.0023) model time 0.3997 (0.4039) loss 5.3810 (6.3419) grad_norm 3.9951 (inf) loss_scale 64.0000 (80.7675) mem 14939MB [2024-07-25 12:40:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][280/625] eta 0:02:19 lr 0.000044 wd 0.0500 time 0.3982 (0.4051) data time 0.0009 (0.0022) model time 0.3973 (0.4035) loss 6.1493 (6.3393) grad_norm 3.6217 (inf) loss_scale 64.0000 (80.1708) mem 14939MB [2024-07-25 12:40:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][290/625] eta 0:02:15 lr 0.000044 wd 0.0500 time 0.4019 (0.4048) data time 0.0008 (0.0022) model time 0.4011 (0.4032) loss 7.4194 (6.3414) grad_norm 3.4491 (inf) loss_scale 64.0000 (79.6151) mem 14939MB [2024-07-25 12:40:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][300/625] eta 0:02:11 lr 0.000044 wd 0.0500 time 0.6009 (0.4053) data time 0.0008 (0.0021) model time 0.6001 (0.4038) loss 7.1441 (6.3421) grad_norm 3.1780 (inf) loss_scale 64.0000 (79.0963) mem 14939MB [2024-07-25 12:40:42 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][310/625] eta 0:02:07 lr 0.000044 wd 0.0500 time 0.3996 (0.4051) data time 0.0006 (0.0021) model time 0.3990 (0.4036) loss 6.8550 (6.3377) grad_norm 7.2248 (inf) loss_scale 64.0000 (78.6109) mem 14939MB [2024-07-25 12:40:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][320/625] eta 0:02:03 lr 0.000044 wd 0.0500 time 0.3953 (0.4048) data time 0.0007 (0.0020) model time 0.3946 (0.4033) loss 6.2632 (6.3355) grad_norm 2.8013 (inf) loss_scale 64.0000 (78.1558) mem 14939MB [2024-07-25 12:40:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][330/625] eta 0:01:59 lr 0.000044 wd 0.0500 time 0.3954 (0.4045) data time 0.0006 (0.0020) model time 0.3948 (0.4030) loss 6.2714 (6.3340) grad_norm 2.4848 (inf) loss_scale 64.0000 (77.7281) mem 14939MB [2024-07-25 12:40:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][340/625] eta 0:01:55 lr 0.000044 wd 0.0500 time 0.3967 (0.4043) data time 0.0007 (0.0020) model time 0.3961 (0.4028) loss 6.0004 (6.3322) grad_norm 2.5116 (inf) loss_scale 64.0000 (77.3255) mem 14939MB [2024-07-25 12:40:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][350/625] eta 0:01:51 lr 0.000044 wd 0.0500 time 0.3960 (0.4043) data time 0.0006 (0.0019) model time 0.3953 (0.4028) loss 6.3326 (6.3399) grad_norm 3.3663 (inf) loss_scale 64.0000 (76.9459) mem 14939MB [2024-07-25 12:41:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][360/625] eta 0:01:47 lr 0.000044 wd 0.0500 time 0.3962 (0.4041) data time 0.0008 (0.0019) model time 0.3954 (0.4025) loss 5.8198 (6.3398) grad_norm 2.3580 (inf) loss_scale 64.0000 (76.5873) mem 14939MB [2024-07-25 12:41:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][370/625] eta 0:01:43 lr 0.000044 wd 0.0500 time 0.3966 (0.4045) data time 0.0009 (0.0019) model time 0.3957 
(0.4031) loss 5.2880 (6.3269) grad_norm 5.8826 (inf) loss_scale 64.0000 (76.2480) mem 14939MB [2024-07-25 12:41:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][380/625] eta 0:01:39 lr 0.000044 wd 0.0500 time 0.3970 (0.4043) data time 0.0006 (0.0018) model time 0.3964 (0.4028) loss 5.4271 (6.3208) grad_norm 2.7198 (inf) loss_scale 64.0000 (75.9265) mem 14939MB [2024-07-25 12:41:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][390/625] eta 0:01:34 lr 0.000044 wd 0.0500 time 0.3967 (0.4041) data time 0.0007 (0.0018) model time 0.3961 (0.4026) loss 6.5586 (6.3251) grad_norm 5.4204 (inf) loss_scale 64.0000 (75.6215) mem 14939MB [2024-07-25 12:41:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][400/625] eta 0:01:30 lr 0.000044 wd 0.0500 time 0.4035 (0.4040) data time 0.0008 (0.0018) model time 0.4027 (0.4025) loss 7.7602 (6.3377) grad_norm 2.3183 (inf) loss_scale 64.0000 (75.3317) mem 14939MB [2024-07-25 12:41:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][410/625] eta 0:01:26 lr 0.000044 wd 0.0500 time 0.3933 (0.4038) data time 0.0011 (0.0018) model time 0.3922 (0.4023) loss 7.3790 (6.3390) grad_norm 2.7336 (inf) loss_scale 64.0000 (75.0560) mem 14939MB [2024-07-25 12:41:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][420/625] eta 0:01:22 lr 0.000044 wd 0.0500 time 0.3948 (0.4036) data time 0.0009 (0.0017) model time 0.3939 (0.4021) loss 6.2430 (6.3377) grad_norm 3.6830 (inf) loss_scale 64.0000 (74.7933) mem 14939MB [2024-07-25 12:41:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][430/625] eta 0:01:18 lr 0.000044 wd 0.0500 time 0.3978 (0.4035) data time 0.0006 (0.0017) model time 0.3972 (0.4020) loss 6.0690 (6.3409) grad_norm 3.0414 (inf) loss_scale 64.0000 (74.5429) mem 14939MB [2024-07-25 12:41:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[270/300][440/625] eta 0:01:14 lr 0.000044 wd 0.0500 time 0.3990 (0.4034) data time 0.0007 (0.0017) model time 0.3983 (0.4018) loss 6.0546 (6.3486) grad_norm 3.0154 (inf) loss_scale 64.0000 (74.3039) mem 14939MB [2024-07-25 12:41:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][450/625] eta 0:01:10 lr 0.000044 wd 0.0500 time 0.3949 (0.4032) data time 0.0009 (0.0017) model time 0.3940 (0.4017) loss 5.9296 (6.3404) grad_norm 2.3606 (inf) loss_scale 64.0000 (74.0754) mem 14939MB [2024-07-25 12:41:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][460/625] eta 0:01:06 lr 0.000044 wd 0.0500 time 0.3927 (0.4030) data time 0.0007 (0.0017) model time 0.3921 (0.4015) loss 6.2306 (6.3354) grad_norm 3.0941 (inf) loss_scale 64.0000 (73.8568) mem 14939MB [2024-07-25 12:41:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][470/625] eta 0:01:02 lr 0.000044 wd 0.0500 time 0.4018 (0.4029) data time 0.0008 (0.0017) model time 0.4009 (0.4014) loss 4.7945 (6.3438) grad_norm 3.3437 (inf) loss_scale 64.0000 (73.6476) mem 14939MB [2024-07-25 12:41:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][480/625] eta 0:00:58 lr 0.000044 wd 0.0500 time 0.3960 (0.4028) data time 0.0008 (0.0016) model time 0.3952 (0.4013) loss 5.5797 (6.3428) grad_norm 2.3735 (inf) loss_scale 64.0000 (73.4470) mem 14939MB [2024-07-25 12:41:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][490/625] eta 0:00:54 lr 0.000044 wd 0.0500 time 0.3947 (0.4027) data time 0.0008 (0.0016) model time 0.3939 (0.4012) loss 7.6780 (6.3494) grad_norm 2.4205 (inf) loss_scale 64.0000 (73.2546) mem 14939MB [2024-07-25 12:41:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][500/625] eta 0:00:50 lr 0.000044 wd 0.0500 time 0.3982 (0.4026) data time 0.0008 (0.0016) model time 0.3974 (0.4011) loss 6.9298 (6.3451) grad_norm 3.8757 (inf) loss_scale 
64.0000 (73.0699) mem 14939MB [2024-07-25 12:42:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][510/625] eta 0:00:46 lr 0.000044 wd 0.0500 time 0.4003 (0.4025) data time 0.0007 (0.0016) model time 0.3995 (0.4010) loss 5.9072 (6.3486) grad_norm 2.6335 (inf) loss_scale 64.0000 (72.8924) mem 14939MB [2024-07-25 12:42:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][520/625] eta 0:00:42 lr 0.000044 wd 0.0500 time 0.5566 (0.4027) data time 0.0008 (0.0016) model time 0.5558 (0.4013) loss 5.5844 (6.3417) grad_norm 5.6575 (inf) loss_scale 64.0000 (72.7217) mem 14939MB [2024-07-25 12:42:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][530/625] eta 0:00:38 lr 0.000043 wd 0.0500 time 0.3982 (0.4026) data time 0.0008 (0.0016) model time 0.3974 (0.4011) loss 7.4817 (6.3450) grad_norm 7.8097 (inf) loss_scale 64.0000 (72.5574) mem 14939MB [2024-07-25 12:42:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][540/625] eta 0:00:34 lr 0.000043 wd 0.0500 time 0.3956 (0.4025) data time 0.0008 (0.0015) model time 0.3948 (0.4010) loss 6.4427 (6.3476) grad_norm 2.8580 (inf) loss_scale 64.0000 (72.3993) mem 14939MB [2024-07-25 12:42:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][550/625] eta 0:00:30 lr 0.000043 wd 0.0500 time 0.3958 (0.4024) data time 0.0008 (0.0015) model time 0.3951 (0.4009) loss 5.8137 (6.3510) grad_norm 3.3453 (inf) loss_scale 64.0000 (72.2468) mem 14939MB [2024-07-25 12:42:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][560/625] eta 0:00:26 lr 0.000043 wd 0.0500 time 0.3953 (0.4023) data time 0.0007 (0.0015) model time 0.3946 (0.4008) loss 6.2699 (6.3544) grad_norm 4.6164 (inf) loss_scale 64.0000 (72.0998) mem 14939MB [2024-07-25 12:42:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][570/625] eta 0:00:22 lr 0.000043 wd 0.0500 time 0.3908 
(0.4022) data time 0.0008 (0.0015) model time 0.3900 (0.4007) loss 6.5776 (6.3558) grad_norm 2.2337 (inf) loss_scale 64.0000 (71.9580) mem 14939MB [2024-07-25 12:42:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][580/625] eta 0:00:18 lr 0.000043 wd 0.0500 time 0.3947 (0.4021) data time 0.0006 (0.0015) model time 0.3941 (0.4007) loss 5.9402 (6.3540) grad_norm 4.5649 (inf) loss_scale 64.0000 (71.8210) mem 14939MB [2024-07-25 12:42:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][590/625] eta 0:00:14 lr 0.000043 wd 0.0500 time 0.3960 (0.4029) data time 0.0007 (0.0015) model time 0.3954 (0.4015) loss 5.5599 (6.3512) grad_norm 3.2825 (inf) loss_scale 64.0000 (71.6887) mem 14939MB [2024-07-25 12:42:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][600/625] eta 0:00:10 lr 0.000043 wd 0.0500 time 0.3965 (0.4028) data time 0.0006 (0.0015) model time 0.3958 (0.4014) loss 7.4600 (6.3559) grad_norm 2.2458 (inf) loss_scale 64.0000 (71.5607) mem 14939MB [2024-07-25 12:42:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][610/625] eta 0:00:06 lr 0.000043 wd 0.0500 time 0.3984 (0.4027) data time 0.0006 (0.0015) model time 0.3978 (0.4013) loss 6.7985 (6.3611) grad_norm 2.1736 (inf) loss_scale 64.0000 (71.4370) mem 14939MB [2024-07-25 12:42:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][620/625] eta 0:00:02 lr 0.000043 wd 0.0500 time 0.3932 (0.4026) data time 0.0006 (0.0015) model time 0.3926 (0.4012) loss 6.6254 (6.3603) grad_norm 4.4332 (inf) loss_scale 64.0000 (71.3172) mem 14939MB [2024-07-25 12:42:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 270 training takes 0:04:11 [2024-07-25 12:42:48 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 12:42:49 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 12:42:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.426 (0.426) Loss 0.5400 (0.5400) Acc@1 90.381 (90.381) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 12:42:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.117) Loss 0.8091 (0.6526) Acc@1 82.422 (87.584) Acc@5 97.266 (97.989) Mem 14939MB [2024-07-25 12:42:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.102) Loss 0.9038 (0.7583) Acc@1 78.906 (84.617) Acc@5 96.045 (97.042) Mem 14939MB [2024-07-25 12:42:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.261 Acc@5 97.007 [2024-07-25 12:42:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-07-25 12:42:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.829 (0.829) Loss 0.5400 (0.5400) Acc@1 90.088 (90.088) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 12:42:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.158) Loss 0.8081 (0.6543) Acc@1 83.301 (87.549) Acc@5 97.119 (98.025) Mem 14939MB [2024-07-25 12:42:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 0.9102 (0.7578) Acc@1 78.906 (84.577) Acc@5 95.850 (97.049) Mem 14939MB [2024-07-25 12:42:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.175 Acc@5 97.013 [2024-07-25 12:42:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-07-25 12:42:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.18% [2024-07-25 12:42:54 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 12:42:55 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 12:42:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][0/625] eta 0:07:32 lr 0.000043 wd 0.0500 time 0.7234 (0.7234) data time 0.3490 (0.3490) model time 0.0000 (0.0000) loss 6.6646 (6.6646) grad_norm 2.9024 (2.9024) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:43:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][10/625] eta 0:04:21 lr 0.000043 wd 0.0500 time 0.3951 (0.4258) data time 0.0006 (0.0324) model time 0.0000 (0.0000) loss 5.5211 (6.4634) grad_norm 3.0336 (3.1302) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:43:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][20/625] eta 0:04:09 lr 0.000043 wd 0.0500 time 0.3934 (0.4119) data time 0.0008 (0.0174) model time 0.0000 (0.0000) loss 5.2269 (6.4744) grad_norm 1.6439 (3.1547) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:43:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][30/625] eta 0:04:02 lr 0.000043 wd 0.0500 time 0.3986 (0.4072) data time 0.0006 (0.0121) model time 0.0000 (0.0000) loss 6.3582 (6.4455) grad_norm 6.4143 (3.5324) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:43:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][40/625] eta 0:03:56 lr 0.000043 wd 0.0500 time 0.3959 (0.4048) data time 0.0008 (0.0093) model time 0.0000 (0.0000) loss 6.3462 (6.4777) grad_norm 3.7522 (3.4513) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:43:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][50/625] eta 0:03:54 lr 0.000043 wd 0.0500 time 0.3937 (0.4071) data time 0.0009 (0.0077) model time 0.0000 
(0.0000) loss 5.6422 (6.4592) grad_norm 2.3610 (3.3757) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:43:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][60/625] eta 0:03:49 lr 0.000043 wd 0.0500 time 0.4133 (0.4057) data time 0.0008 (0.0065) model time 0.4125 (0.3974) loss 7.1049 (6.4581) grad_norm 3.3766 (3.5618) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:43:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][70/625] eta 0:03:44 lr 0.000043 wd 0.0500 time 0.3948 (0.4044) data time 0.0008 (0.0057) model time 0.3939 (0.3964) loss 6.7004 (6.4446) grad_norm 2.5211 (3.8021) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:43:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][80/625] eta 0:03:39 lr 0.000043 wd 0.0500 time 0.4001 (0.4035) data time 0.0007 (0.0052) model time 0.3994 (0.3963) loss 5.6332 (6.4552) grad_norm 10.5727 (3.8358) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:43:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][90/625] eta 0:03:35 lr 0.000043 wd 0.0500 time 0.3967 (0.4026) data time 0.0008 (0.0047) model time 0.3958 (0.3959) loss 5.7311 (6.4886) grad_norm 2.0578 (3.8111) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:43:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][100/625] eta 0:03:31 lr 0.000043 wd 0.0500 time 0.4020 (0.4023) data time 0.0009 (0.0043) model time 0.4011 (0.3965) loss 7.4404 (6.4682) grad_norm 3.1288 (3.8059) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:43:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][110/625] eta 0:03:26 lr 0.000043 wd 0.0500 time 0.3973 (0.4019) data time 0.0006 (0.0040) model time 0.3966 (0.3965) loss 5.5947 (6.4636) grad_norm 2.0440 (3.8390) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:43:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 
367): INFO Train: [271/300][120/625] eta 0:03:22 lr 0.000043 wd 0.0500 time 0.3944 (0.4014) data time 0.0009 (0.0038) model time 0.3935 (0.3963) loss 6.2634 (6.4246) grad_norm 2.4413 (3.8281) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:43:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][130/625] eta 0:03:18 lr 0.000043 wd 0.0500 time 0.3960 (0.4010) data time 0.0009 (0.0035) model time 0.3951 (0.3962) loss 6.1860 (6.4056) grad_norm 4.3190 (3.8176) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:43:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][140/625] eta 0:03:14 lr 0.000043 wd 0.0500 time 0.3960 (0.4007) data time 0.0007 (0.0034) model time 0.3953 (0.3961) loss 7.0674 (6.4115) grad_norm 2.6003 (3.7345) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:43:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][150/625] eta 0:03:10 lr 0.000043 wd 0.0500 time 0.3973 (0.4004) data time 0.0006 (0.0032) model time 0.3967 (0.3960) loss 6.3352 (6.4072) grad_norm 3.0278 (3.6538) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:43:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][160/625] eta 0:03:06 lr 0.000043 wd 0.0500 time 0.3966 (0.4001) data time 0.0009 (0.0030) model time 0.3958 (0.3960) loss 6.1203 (6.4262) grad_norm 3.2485 (3.6055) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:44:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][170/625] eta 0:03:01 lr 0.000043 wd 0.0500 time 0.3937 (0.3999) data time 0.0009 (0.0029) model time 0.3928 (0.3960) loss 6.0380 (6.4435) grad_norm 4.0867 (3.7538) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:44:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][180/625] eta 0:02:58 lr 0.000043 wd 0.0500 time 0.6182 (0.4018) data time 0.0009 (0.0029) model time 0.6173 (0.3988) loss 6.3114 (6.4526) 
grad_norm 3.7471 (3.7737) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:44:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][190/625] eta 0:02:55 lr 0.000043 wd 0.0500 time 0.3940 (0.4024) data time 0.0006 (0.0028) model time 0.3934 (0.3997) loss 6.4525 (6.4402) grad_norm 2.2555 (4.1296) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:44:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][200/625] eta 0:02:50 lr 0.000042 wd 0.0500 time 0.3960 (0.4021) data time 0.0007 (0.0027) model time 0.3953 (0.3994) loss 6.2925 (6.4374) grad_norm 3.3821 (4.0652) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:44:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][210/625] eta 0:02:46 lr 0.000042 wd 0.0500 time 0.4025 (0.4019) data time 0.0007 (0.0026) model time 0.4018 (0.3993) loss 6.0942 (6.4384) grad_norm 2.5695 (4.0286) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:44:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][220/625] eta 0:02:42 lr 0.000042 wd 0.0500 time 0.3953 (0.4016) data time 0.0009 (0.0025) model time 0.3944 (0.3991) loss 6.8014 (6.4320) grad_norm 2.7104 (4.0701) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:44:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][230/625] eta 0:02:38 lr 0.000042 wd 0.0500 time 0.3939 (0.4014) data time 0.0006 (0.0024) model time 0.3933 (0.3988) loss 6.7170 (6.4199) grad_norm 2.3431 (3.9993) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:44:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][240/625] eta 0:02:34 lr 0.000042 wd 0.0500 time 0.3944 (0.4012) data time 0.0009 (0.0024) model time 0.3935 (0.3987) loss 5.2527 (6.4148) grad_norm 2.3375 (3.9567) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:44:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[271/300][250/625] eta 0:02:30 lr 0.000042 wd 0.0500 time 0.3935 (0.4010) data time 0.0007 (0.0023) model time 0.3928 (0.3985) loss 6.3393 (6.4183) grad_norm 3.7804 (4.2747) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:44:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][260/625] eta 0:02:26 lr 0.000042 wd 0.0500 time 0.3975 (0.4008) data time 0.0009 (0.0022) model time 0.3967 (0.3984) loss 5.2695 (6.4273) grad_norm 3.3389 (4.2417) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:44:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][270/625] eta 0:02:22 lr 0.000042 wd 0.0500 time 0.3957 (0.4013) data time 0.0008 (0.0022) model time 0.3949 (0.3991) loss 6.7709 (6.4369) grad_norm 3.4010 (4.2299) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:44:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][280/625] eta 0:02:18 lr 0.000042 wd 0.0500 time 0.3963 (0.4011) data time 0.0007 (0.0021) model time 0.3956 (0.3989) loss 5.7971 (6.4205) grad_norm 2.9351 (4.1852) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:44:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][290/625] eta 0:02:14 lr 0.000042 wd 0.0500 time 0.3984 (0.4010) data time 0.0007 (0.0021) model time 0.3978 (0.3988) loss 6.6847 (6.4046) grad_norm 1.7391 (4.1776) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:44:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][300/625] eta 0:02:10 lr 0.000042 wd 0.0500 time 0.3996 (0.4008) data time 0.0006 (0.0021) model time 0.3990 (0.3986) loss 6.0336 (6.3900) grad_norm 4.7694 (4.1570) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:45:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][310/625] eta 0:02:06 lr 0.000042 wd 0.0500 time 0.3937 (0.4007) data time 0.0009 (0.0020) model time 0.3929 (0.3985) loss 5.4650 (6.3998) grad_norm 1.7789 
(4.1245) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:45:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][320/625] eta 0:02:02 lr 0.000042 wd 0.0500 time 0.3952 (0.4006) data time 0.0009 (0.0020) model time 0.3943 (0.3984) loss 7.0358 (6.4029) grad_norm 2.5166 (4.0771) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:45:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][330/625] eta 0:01:58 lr 0.000042 wd 0.0500 time 0.3972 (0.4005) data time 0.0008 (0.0019) model time 0.3963 (0.3984) loss 6.7825 (6.3987) grad_norm 3.8941 (4.1237) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:45:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][340/625] eta 0:01:54 lr 0.000042 wd 0.0500 time 0.3952 (0.4005) data time 0.0009 (0.0019) model time 0.3942 (0.3984) loss 6.2913 (6.4002) grad_norm 2.1090 (4.1155) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:45:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][350/625] eta 0:01:50 lr 0.000042 wd 0.0500 time 0.3949 (0.4004) data time 0.0008 (0.0019) model time 0.3941 (0.3983) loss 5.5449 (6.3884) grad_norm 2.5617 (4.1199) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:45:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][360/625] eta 0:01:46 lr 0.000042 wd 0.0500 time 0.3995 (0.4003) data time 0.0007 (0.0019) model time 0.3988 (0.3983) loss 6.8583 (6.3943) grad_norm 4.3187 (4.1036) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:45:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][370/625] eta 0:01:42 lr 0.000042 wd 0.0500 time 0.3946 (0.4002) data time 0.0008 (0.0018) model time 0.3937 (0.3982) loss 6.1311 (6.3806) grad_norm 2.3892 (4.0883) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:45:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][380/625] eta 
0:01:38 lr 0.000042 wd 0.0500 time 0.3964 (0.4001) data time 0.0008 (0.0018) model time 0.3956 (0.3981) loss 6.0417 (6.3753) grad_norm 3.9318 (4.0669) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:45:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][390/625] eta 0:01:33 lr 0.000042 wd 0.0500 time 0.3997 (0.4000) data time 0.0006 (0.0018) model time 0.3991 (0.3980) loss 7.1070 (6.3765) grad_norm 2.7175 (4.0614) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:45:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][400/625] eta 0:01:30 lr 0.000042 wd 0.0500 time 0.3951 (0.4007) data time 0.0008 (0.0018) model time 0.3944 (0.3989) loss 7.3775 (6.3837) grad_norm 2.1907 (4.0863) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:45:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][410/625] eta 0:01:26 lr 0.000042 wd 0.0500 time 0.3924 (0.4010) data time 0.0008 (0.0017) model time 0.3916 (0.3993) loss 6.2126 (6.3882) grad_norm 4.7263 (4.0634) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:45:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][420/625] eta 0:01:22 lr 0.000042 wd 0.0500 time 0.3941 (0.4009) data time 0.0008 (0.0017) model time 0.3934 (0.3992) loss 5.9479 (6.3875) grad_norm 2.3885 (4.0480) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:45:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][430/625] eta 0:01:18 lr 0.000042 wd 0.0500 time 0.3973 (0.4008) data time 0.0008 (0.0017) model time 0.3964 (0.3991) loss 7.1421 (6.3860) grad_norm 2.6020 (4.0539) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:45:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][440/625] eta 0:01:14 lr 0.000042 wd 0.0500 time 0.3958 (0.4007) data time 0.0007 (0.0017) model time 0.3951 (0.3990) loss 6.2242 (6.3922) grad_norm 3.0698 (4.1231) loss_scale 64.0000 
(64.0000) mem 14939MB [2024-07-25 12:45:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][450/625] eta 0:01:10 lr 0.000042 wd 0.0500 time 0.3922 (0.4006) data time 0.0006 (0.0016) model time 0.3916 (0.3989) loss 7.1581 (6.3810) grad_norm 2.7137 (4.1187) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:46:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][460/625] eta 0:01:06 lr 0.000042 wd 0.0500 time 0.3940 (0.4006) data time 0.0006 (0.0016) model time 0.3934 (0.3989) loss 6.1970 (6.3691) grad_norm 20.5198 (4.1259) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:46:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][470/625] eta 0:01:02 lr 0.000042 wd 0.0500 time 0.3937 (0.4005) data time 0.0007 (0.0016) model time 0.3930 (0.3988) loss 7.0424 (6.3689) grad_norm 2.2836 (4.1090) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:46:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][480/625] eta 0:00:58 lr 0.000042 wd 0.0500 time 0.3959 (0.4004) data time 0.0007 (0.0016) model time 0.3952 (0.3988) loss 6.5511 (6.3657) grad_norm 2.4511 (4.0957) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:46:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][490/625] eta 0:00:54 lr 0.000042 wd 0.0500 time 0.3930 (0.4007) data time 0.0009 (0.0016) model time 0.3921 (0.3991) loss 7.0182 (6.3706) grad_norm 112.0633 (4.3193) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:46:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][500/625] eta 0:00:50 lr 0.000041 wd 0.0500 time 0.3955 (0.4006) data time 0.0008 (0.0016) model time 0.3946 (0.3990) loss 7.1156 (6.3739) grad_norm 3.1235 (4.2882) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:46:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][510/625] eta 0:00:46 lr 0.000041 wd 0.0500 
time 0.3941 (0.4005) data time 0.0007 (0.0016) model time 0.3935 (0.3989) loss 5.5553 (6.3749) grad_norm 2.7747 (4.2646) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:46:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][520/625] eta 0:00:42 lr 0.000041 wd 0.0500 time 0.3943 (0.4005) data time 0.0007 (0.0015) model time 0.3937 (0.3989) loss 6.3140 (6.3751) grad_norm 2.1827 (4.2435) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:46:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][530/625] eta 0:00:38 lr 0.000041 wd 0.0500 time 0.3946 (0.4004) data time 0.0008 (0.0015) model time 0.3938 (0.3988) loss 6.8805 (6.3805) grad_norm 2.4764 (4.2154) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:46:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][540/625] eta 0:00:34 lr 0.000041 wd 0.0500 time 0.3951 (0.4003) data time 0.0009 (0.0015) model time 0.3942 (0.3988) loss 6.4969 (6.3802) grad_norm 2.7753 (4.1913) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:46:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][550/625] eta 0:00:30 lr 0.000041 wd 0.0500 time 0.3945 (0.4003) data time 0.0006 (0.0015) model time 0.3938 (0.3987) loss 5.3466 (6.3775) grad_norm 2.4262 (4.1911) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:46:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][560/625] eta 0:00:26 lr 0.000041 wd 0.0500 time 0.3957 (0.4002) data time 0.0008 (0.0015) model time 0.3949 (0.3986) loss 7.6618 (6.3737) grad_norm 2.9785 (4.1630) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:46:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][570/625] eta 0:00:22 lr 0.000041 wd 0.0500 time 0.3931 (0.4001) data time 0.0008 (0.0015) model time 0.3923 (0.3986) loss 6.8878 (6.3774) grad_norm 2.3481 (4.2945) loss_scale 64.0000 (64.0000) mem 14939MB 
[2024-07-25 12:46:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][580/625] eta 0:00:18 lr 0.000041 wd 0.0500 time 0.3931 (0.4001) data time 0.0008 (0.0015) model time 0.3923 (0.3985) loss 5.8457 (6.3770) grad_norm 2.7956 (4.2981) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:46:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][590/625] eta 0:00:14 lr 0.000041 wd 0.0500 time 0.3973 (0.4000) data time 0.0007 (0.0015) model time 0.3966 (0.3985) loss 5.5290 (6.3734) grad_norm 4.4916 (4.2921) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:46:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][600/625] eta 0:00:09 lr 0.000041 wd 0.0500 time 0.3959 (0.3999) data time 0.0007 (0.0014) model time 0.3953 (0.3984) loss 4.9786 (6.3756) grad_norm 3.0670 (4.2691) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:46:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][610/625] eta 0:00:05 lr 0.000041 wd 0.0500 time 0.3951 (0.3999) data time 0.0006 (0.0014) model time 0.3945 (0.3984) loss 6.1521 (6.3748) grad_norm 4.5292 (4.2606) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:47:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][620/625] eta 0:00:02 lr 0.000041 wd 0.0500 time 0.3959 (0.4002) data time 0.0006 (0.0014) model time 0.3953 (0.3987) loss 5.7620 (6.3689) grad_norm 2.6521 (4.2443) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:47:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 271 training takes 0:04:10 [2024-07-25 12:47:05 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 12:47:06 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 12:47:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.5469 (0.5469) Acc@1 90.186 (90.186) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 12:47:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.8125 (0.6582) Acc@1 82.764 (87.575) Acc@5 97.119 (98.038) Mem 14939MB [2024-07-25 12:47:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 0.9229 (0.7617) Acc@1 79.053 (84.649) Acc@5 95.898 (97.075) Mem 14939MB [2024-07-25 12:47:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.287 Acc@5 97.055 [2024-07-25 12:47:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-07-25 12:47:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.799 (0.799) Loss 0.5391 (0.5391) Acc@1 90.039 (90.039) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 12:47:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.155) Loss 0.8081 (0.6543) Acc@1 83.252 (87.558) Acc@5 97.119 (98.020) Mem 14939MB [2024-07-25 12:47:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 0.9106 (0.7578) Acc@1 78.906 (84.587) Acc@5 95.850 (97.045) Mem 14939MB [2024-07-25 12:47:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.187 Acc@5 97.013 [2024-07-25 12:47:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-07-25 12:47:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.19% [2024-07-25 12:47:11 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 12:47:12 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 12:47:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][0/625] eta 0:07:29 lr 0.000041 wd 0.0500 time 0.7194 (0.7194) data time 0.3367 (0.3367) model time 0.0000 (0.0000) loss 6.4192 (6.4192) grad_norm 3.9568 (3.9568) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:47:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][10/625] eta 0:04:21 lr 0.000041 wd 0.0500 time 0.3941 (0.4255) data time 0.0008 (0.0314) model time 0.0000 (0.0000) loss 6.5939 (6.7923) grad_norm 3.2964 (2.9162) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:47:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][20/625] eta 0:04:12 lr 0.000041 wd 0.0500 time 0.3915 (0.4174) data time 0.0009 (0.0168) model time 0.0000 (0.0000) loss 6.6396 (6.4952) grad_norm 3.3406 (3.1661) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:47:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][30/625] eta 0:04:04 lr 0.000041 wd 0.0500 time 0.3952 (0.4104) data time 0.0006 (0.0117) model time 0.0000 (0.0000) loss 5.0633 (6.3293) grad_norm 2.0517 (3.2138) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:47:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][40/625] eta 0:03:58 lr 0.000041 wd 0.0500 time 0.3964 (0.4071) data time 0.0008 (0.0090) model time 0.0000 (0.0000) loss 5.1974 (6.2909) grad_norm 3.3972 (3.1838) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:47:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][50/625] eta 0:03:52 lr 0.000041 wd 0.0500 time 0.3950 (0.4048) data time 0.0007 (0.0074) model time 0.0000 (0.0000) loss 6.7027 (6.2764) grad_norm 1.8742 (3.3047) loss_scale 64.0000 (64.0000) mem 14939MB 
[2024-07-25 12:47:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][60/625] eta 0:03:47 lr 0.000041 wd 0.0500 time 0.3965 (0.4035) data time 0.0006 (0.0063) model time 0.3959 (0.3961) loss 6.6224 (6.3508) grad_norm 2.3294 (3.2536) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:47:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][70/625] eta 0:03:43 lr 0.000041 wd 0.0500 time 0.3942 (0.4026) data time 0.0008 (0.0055) model time 0.3934 (0.3962) loss 5.5735 (6.3023) grad_norm 1.9263 (3.2266) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:47:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][80/625] eta 0:03:38 lr 0.000041 wd 0.0500 time 0.3974 (0.4018) data time 0.0008 (0.0050) model time 0.3966 (0.3959) loss 7.0945 (6.2993) grad_norm 2.7051 (3.3646) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:47:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][90/625] eta 0:03:34 lr 0.000041 wd 0.0500 time 0.3977 (0.4012) data time 0.0007 (0.0045) model time 0.3971 (0.3959) loss 5.4107 (6.2962) grad_norm 2.0250 (3.3522) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:47:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][100/625] eta 0:03:30 lr 0.000041 wd 0.0500 time 0.3969 (0.4008) data time 0.0009 (0.0041) model time 0.3960 (0.3959) loss 6.2819 (6.2540) grad_norm 3.1207 (3.2863) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:47:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][110/625] eta 0:03:26 lr 0.000041 wd 0.0500 time 0.3978 (0.4004) data time 0.0007 (0.0038) model time 0.3970 (0.3959) loss 6.7305 (6.2505) grad_norm 2.4140 (3.3465) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:48:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][120/625] eta 0:03:22 lr 0.000041 wd 0.0500 time 0.3973 (0.4001) data time 
0.0006 (0.0036) model time 0.3967 (0.3959) loss 5.9826 (6.2425) grad_norm 5.3366 (3.3945) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:48:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][130/625] eta 0:03:17 lr 0.000041 wd 0.0500 time 0.3975 (0.4000) data time 0.0009 (0.0034) model time 0.3966 (0.3960) loss 5.0007 (6.2221) grad_norm 2.9074 (3.3946) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:48:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][140/625] eta 0:03:13 lr 0.000041 wd 0.0500 time 0.3963 (0.3998) data time 0.0006 (0.0032) model time 0.3956 (0.3961) loss 4.8806 (6.2136) grad_norm 2.2040 (3.4277) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:48:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][150/625] eta 0:03:09 lr 0.000041 wd 0.0500 time 0.3930 (0.3996) data time 0.0007 (0.0030) model time 0.3923 (0.3962) loss 7.0591 (6.2170) grad_norm 3.3807 (3.4036) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:48:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][160/625] eta 0:03:05 lr 0.000041 wd 0.0500 time 0.3949 (0.3995) data time 0.0007 (0.0029) model time 0.3942 (0.3961) loss 5.2937 (6.2122) grad_norm 2.3027 (3.4102) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:48:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][170/625] eta 0:03:01 lr 0.000041 wd 0.0500 time 0.3968 (0.3992) data time 0.0008 (0.0028) model time 0.3960 (0.3960) loss 6.1768 (6.2142) grad_norm 1.9651 (3.3634) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:48:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][180/625] eta 0:02:57 lr 0.000040 wd 0.0500 time 0.3957 (0.3990) data time 0.0006 (0.0027) model time 0.3951 (0.3959) loss 6.3692 (6.2171) grad_norm 2.5583 (3.3902) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:48:28 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][190/625] eta 0:02:53 lr 0.000040 wd 0.0500 time 0.3955 (0.3989) data time 0.0006 (0.0026) model time 0.3948 (0.3960) loss 5.5111 (6.2254) grad_norm 2.2943 (3.4624) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:48:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][200/625] eta 0:02:49 lr 0.000040 wd 0.0500 time 0.3920 (0.3988) data time 0.0007 (0.0025) model time 0.3913 (0.3959) loss 5.2715 (6.2293) grad_norm 2.8507 (3.4619) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:48:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][210/625] eta 0:02:45 lr 0.000040 wd 0.0500 time 0.3973 (0.3986) data time 0.0008 (0.0024) model time 0.3965 (0.3958) loss 6.1606 (6.2582) grad_norm 4.7275 (3.4870) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:48:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][220/625] eta 0:02:42 lr 0.000040 wd 0.0500 time 0.5624 (0.4010) data time 0.0006 (0.0024) model time 0.5617 (0.3991) loss 5.3986 (6.2624) grad_norm 3.4418 (3.4964) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:48:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][230/625] eta 0:02:38 lr 0.000040 wd 0.0500 time 0.3966 (0.4015) data time 0.0006 (0.0023) model time 0.3960 (0.3997) loss 7.1190 (6.2718) grad_norm 2.1962 (3.5009) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:48:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][240/625] eta 0:02:34 lr 0.000040 wd 0.0500 time 0.3932 (0.4021) data time 0.0007 (0.0022) model time 0.3925 (0.4005) loss 6.7784 (6.2597) grad_norm 2.1658 (3.5383) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:48:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][250/625] eta 0:02:30 lr 0.000040 wd 0.0500 time 0.3935 (0.4019) data time 0.0008 (0.0022) 
model time 0.3927 (0.4003) loss 5.6601 (6.2647) grad_norm 2.6582 (3.5386) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:48:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][260/625] eta 0:02:26 lr 0.000040 wd 0.0500 time 0.3955 (0.4017) data time 0.0007 (0.0021) model time 0.3948 (0.4002) loss 7.0656 (6.2596) grad_norm 2.3245 (3.5055) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:49:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][270/625] eta 0:02:22 lr 0.000040 wd 0.0500 time 0.4003 (0.4016) data time 0.0006 (0.0021) model time 0.3997 (0.4000) loss 4.7362 (6.2693) grad_norm 2.0859 (3.7525) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:49:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][280/625] eta 0:02:18 lr 0.000040 wd 0.0500 time 0.3968 (0.4014) data time 0.0008 (0.0020) model time 0.3960 (0.3998) loss 5.7158 (6.2716) grad_norm 2.1336 (3.7613) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:49:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][290/625] eta 0:02:14 lr 0.000040 wd 0.0500 time 0.4047 (0.4014) data time 0.0007 (0.0020) model time 0.4040 (0.3999) loss 6.1397 (6.2709) grad_norm 3.8853 (3.7762) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:49:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][300/625] eta 0:02:10 lr 0.000040 wd 0.0500 time 0.3954 (0.4013) data time 0.0006 (0.0020) model time 0.3947 (0.3997) loss 6.5743 (6.2789) grad_norm 2.3384 (3.8465) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:49:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][310/625] eta 0:02:06 lr 0.000040 wd 0.0500 time 0.3990 (0.4011) data time 0.0006 (0.0019) model time 0.3985 (0.3996) loss 6.4585 (6.2905) grad_norm 3.7380 (3.8359) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:49:21 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [272/300][320/625] eta 0:02:02 lr 0.000040 wd 0.0500 time 0.4046 (0.4010) data time 0.0007 (0.0019) model time 0.4039 (0.3995) loss 6.2947 (6.2933) grad_norm 5.0558 (3.8309) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:49:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][330/625] eta 0:01:58 lr 0.000040 wd 0.0500 time 0.3979 (0.4009) data time 0.0007 (0.0019) model time 0.3973 (0.3994) loss 6.9916 (6.3011) grad_norm 2.2759 (3.8045) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:49:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][340/625] eta 0:01:54 lr 0.000040 wd 0.0500 time 0.3968 (0.4009) data time 0.0008 (0.0018) model time 0.3960 (0.3994) loss 6.9509 (6.3043) grad_norm 3.2397 (3.8044) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:49:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][350/625] eta 0:01:50 lr 0.000040 wd 0.0500 time 0.3998 (0.4009) data time 0.0008 (0.0018) model time 0.3990 (0.3994) loss 7.4088 (6.3091) grad_norm 3.6769 (3.8083) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:49:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][360/625] eta 0:01:46 lr 0.000040 wd 0.0500 time 0.4043 (0.4008) data time 0.0006 (0.0018) model time 0.4037 (0.3993) loss 5.6986 (6.3057) grad_norm 2.8233 (3.8060) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:49:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][370/625] eta 0:01:42 lr 0.000040 wd 0.0500 time 0.3993 (0.4008) data time 0.0009 (0.0018) model time 0.3984 (0.3993) loss 6.5039 (6.3173) grad_norm 2.3296 (3.7970) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:49:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][380/625] eta 0:01:38 lr 0.000040 wd 0.0500 time 0.4182 (0.4008) data time 0.0006 (0.0017) model time 0.4176 (0.3993) 
loss 5.9210 (6.3105) grad_norm 3.8051 (3.7948) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:49:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][390/625] eta 0:01:34 lr 0.000040 wd 0.0500 time 0.3974 (0.4008) data time 0.0008 (0.0018) model time 0.3966 (0.3993) loss 6.0742 (6.3025) grad_norm 5.1149 (3.8054) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:49:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][400/625] eta 0:01:30 lr 0.000040 wd 0.0500 time 0.3986 (0.4007) data time 0.0010 (0.0017) model time 0.3977 (0.3992) loss 6.0862 (6.2988) grad_norm 2.6350 (3.7877) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:49:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][410/625] eta 0:01:26 lr 0.000040 wd 0.0500 time 0.3970 (0.4006) data time 0.0007 (0.0017) model time 0.3963 (0.3991) loss 5.3278 (6.2864) grad_norm 3.9363 (3.7757) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:50:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][420/625] eta 0:01:22 lr 0.000040 wd 0.0500 time 0.3973 (0.4007) data time 0.0008 (0.0017) model time 0.3964 (0.3992) loss 5.8435 (6.2924) grad_norm 6.2678 (3.7816) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:50:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][430/625] eta 0:01:18 lr 0.000040 wd 0.0500 time 0.5456 (0.4010) data time 0.0009 (0.0017) model time 0.5447 (0.3996) loss 6.5667 (6.2929) grad_norm 2.4161 (3.7855) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:50:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][440/625] eta 0:01:14 lr 0.000040 wd 0.0500 time 0.6013 (0.4024) data time 0.0006 (0.0017) model time 0.6007 (0.4011) loss 7.8218 (6.2933) grad_norm 2.2974 (3.7579) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:50:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [272/300][450/625] eta 0:01:10 lr 0.000040 wd 0.0500 time 0.3994 (0.4028) data time 0.0006 (0.0017) model time 0.3987 (0.4015) loss 6.8221 (6.2953) grad_norm 2.8627 (3.7460) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:50:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][460/625] eta 0:01:06 lr 0.000040 wd 0.0500 time 0.3952 (0.4031) data time 0.0008 (0.0017) model time 0.3944 (0.4019) loss 4.9967 (6.2958) grad_norm 4.1988 (3.7288) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:50:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][470/625] eta 0:01:02 lr 0.000040 wd 0.0500 time 0.3954 (0.4030) data time 0.0008 (0.0017) model time 0.3946 (0.4018) loss 6.2753 (6.2960) grad_norm 4.6693 (3.7370) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:50:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][480/625] eta 0:00:58 lr 0.000040 wd 0.0500 time 0.3973 (0.4029) data time 0.0008 (0.0017) model time 0.3965 (0.4017) loss 6.2422 (6.2976) grad_norm 4.9208 (3.7758) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:50:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][490/625] eta 0:00:54 lr 0.000039 wd 0.0500 time 0.3963 (0.4028) data time 0.0007 (0.0017) model time 0.3956 (0.4016) loss 6.6710 (6.3015) grad_norm 4.5604 (3.7824) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:50:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][500/625] eta 0:00:50 lr 0.000039 wd 0.0500 time 0.3954 (0.4027) data time 0.0006 (0.0016) model time 0.3948 (0.4014) loss 6.6477 (6.3057) grad_norm 4.7256 (3.7700) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:50:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][510/625] eta 0:00:46 lr 0.000039 wd 0.0500 time 0.4073 (0.4026) data time 0.0007 (0.0016) model time 0.4067 (0.4014) loss 6.9089 (6.3125) grad_norm 
2.4396 (3.7492) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:50:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][520/625] eta 0:00:42 lr 0.000039 wd 0.0500 time 0.3966 (0.4026) data time 0.0006 (0.0016) model time 0.3960 (0.4014) loss 5.6697 (6.3124) grad_norm 2.1628 (3.7313) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:50:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][530/625] eta 0:00:38 lr 0.000039 wd 0.0500 time 0.3967 (0.4025) data time 0.0009 (0.0016) model time 0.3958 (0.4013) loss 5.7139 (6.3114) grad_norm 2.1484 (3.7514) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:50:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][540/625] eta 0:00:34 lr 0.000039 wd 0.0500 time 0.3974 (0.4025) data time 0.0006 (0.0016) model time 0.3968 (0.4013) loss 6.6551 (6.3090) grad_norm 3.1328 (3.7473) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:50:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][550/625] eta 0:00:30 lr 0.000039 wd 0.0500 time 0.3946 (0.4024) data time 0.0006 (0.0016) model time 0.3940 (0.4012) loss 6.8278 (6.3083) grad_norm 3.5709 (3.7983) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:50:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][560/625] eta 0:00:26 lr 0.000039 wd 0.0500 time 0.4038 (0.4024) data time 0.0008 (0.0016) model time 0.4029 (0.4011) loss 5.4509 (6.3067) grad_norm 2.4806 (3.7899) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:51:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][570/625] eta 0:00:22 lr 0.000039 wd 0.0500 time 0.3990 (0.4023) data time 0.0009 (0.0015) model time 0.3981 (0.4010) loss 6.7540 (6.3059) grad_norm 2.3649 (3.7835) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:51:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][580/625] eta 
0:00:18 lr 0.000039 wd 0.0500 time 0.3954 (0.4022) data time 0.0007 (0.0015) model time 0.3947 (0.4010) loss 7.3044 (6.3108) grad_norm 1.8881 (3.7780) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:51:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][590/625] eta 0:00:14 lr 0.000039 wd 0.0500 time 0.3982 (0.4021) data time 0.0008 (0.0015) model time 0.3975 (0.4009) loss 5.4886 (6.3108) grad_norm 4.9007 (3.9396) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:51:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][600/625] eta 0:00:10 lr 0.000039 wd 0.0500 time 0.3985 (0.4020) data time 0.0007 (0.0015) model time 0.3979 (0.4008) loss 6.6666 (6.3083) grad_norm 2.7363 (3.9212) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:51:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][610/625] eta 0:00:06 lr 0.000039 wd 0.0500 time 0.3956 (0.4019) data time 0.0006 (0.0015) model time 0.3949 (0.4007) loss 6.7117 (6.3112) grad_norm 3.4119 (3.9253) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:51:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][620/625] eta 0:00:02 lr 0.000039 wd 0.0500 time 0.3940 (0.4018) data time 0.0006 (0.0015) model time 0.3934 (0.4006) loss 7.4939 (6.3147) grad_norm 2.9613 (3.9187) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:51:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 272 training takes 0:04:11 [2024-07-25 12:51:23 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 12:51:24 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 12:51:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.496 (0.496) Loss 0.5425 (0.5425) Acc@1 90.186 (90.186) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 12:51:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.123) Loss 0.8130 (0.6555) Acc@1 82.617 (87.638) Acc@5 96.973 (98.025) Mem 14939MB [2024-07-25 12:51:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.105) Loss 0.9033 (0.7604) Acc@1 79.199 (84.668) Acc@5 95.898 (97.094) Mem 14939MB [2024-07-25 12:51:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.279 Acc@5 97.061 [2024-07-25 12:51:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-07-25 12:51:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.935 (0.935) Loss 0.5400 (0.5400) Acc@1 90.137 (90.137) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 12:51:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.087 (0.163) Loss 0.8076 (0.6542) Acc@1 83.252 (87.571) Acc@5 97.070 (98.011) Mem 14939MB [2024-07-25 12:51:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.126) Loss 0.9097 (0.7576) Acc@1 78.857 (84.598) Acc@5 95.850 (97.045) Mem 14939MB [2024-07-25 12:51:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.201 Acc@5 97.011 [2024-07-25 12:51:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-07-25 12:51:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.20% [2024-07-25 12:51:30 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 12:51:32 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 12:51:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][0/625] eta 0:07:34 lr 0.000039 wd 0.0500 time 0.7277 (0.7277) data time 0.3542 (0.3542) model time 0.0000 (0.0000) loss 6.8492 (6.8492) grad_norm 2.3849 (2.3849) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:51:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][10/625] eta 0:04:22 lr 0.000039 wd 0.0500 time 0.3946 (0.4266) data time 0.0009 (0.0330) model time 0.0000 (0.0000) loss 5.6069 (6.4853) grad_norm 2.8740 (3.3025) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:51:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][20/625] eta 0:04:09 lr 0.000039 wd 0.0500 time 0.3972 (0.4122) data time 0.0007 (0.0176) model time 0.0000 (0.0000) loss 6.5742 (6.3624) grad_norm 3.5285 (3.2750) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:51:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][30/625] eta 0:04:10 lr 0.000039 wd 0.0500 time 0.3969 (0.4206) data time 0.0008 (0.0122) model time 0.0000 (0.0000) loss 7.3337 (6.3658) grad_norm 2.1322 (3.1214) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:51:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][40/625] eta 0:04:09 lr 0.000039 wd 0.0500 time 0.5316 (0.4264) data time 0.0007 (0.0094) model time 0.0000 (0.0000) loss 6.8788 (6.3721) grad_norm 3.0552 (3.5179) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:51:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][50/625] eta 0:04:04 lr 0.000039 wd 0.0500 time 0.3988 (0.4244) data time 0.0008 (0.0078) model time 0.0000 (0.0000) loss 6.6816 (6.3369) grad_norm 3.2694 (3.4957) loss_scale 64.0000 (64.0000) mem 14939MB 
[2024-07-25 12:51:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][60/625] eta 0:03:57 lr 0.000039 wd 0.0500 time 0.3925 (0.4198) data time 0.0009 (0.0066) model time 0.3916 (0.3954) loss 6.4282 (6.2964) grad_norm 7.4114 (3.4704) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:52:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][70/625] eta 0:03:51 lr 0.000039 wd 0.0500 time 0.3987 (0.4165) data time 0.0008 (0.0059) model time 0.3978 (0.3951) loss 6.4023 (6.2974) grad_norm 10.4065 (3.4632) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:52:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][80/625] eta 0:03:45 lr 0.000039 wd 0.0500 time 0.4025 (0.4142) data time 0.0009 (0.0053) model time 0.4016 (0.3957) loss 6.9552 (6.3386) grad_norm 2.6819 (3.4165) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:52:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][90/625] eta 0:03:40 lr 0.000039 wd 0.0500 time 0.3946 (0.4122) data time 0.0007 (0.0048) model time 0.3939 (0.3957) loss 5.7171 (6.3430) grad_norm 3.4294 (3.3777) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:52:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][100/625] eta 0:03:35 lr 0.000039 wd 0.0500 time 0.3945 (0.4106) data time 0.0010 (0.0044) model time 0.3935 (0.3955) loss 7.2282 (6.3642) grad_norm 2.6339 (3.3930) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:52:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][110/625] eta 0:03:30 lr 0.000039 wd 0.0500 time 0.3974 (0.4095) data time 0.0006 (0.0041) model time 0.3968 (0.3959) loss 6.4457 (6.4071) grad_norm 2.5198 (3.3480) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:52:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][120/625] eta 0:03:26 lr 0.000039 wd 0.0500 time 0.3940 (0.4085) data 
time 0.0008 (0.0038) model time 0.3932 (0.3959) loss 6.2605 (6.4124) grad_norm 2.5020 (3.3113) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:52:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][130/625] eta 0:03:21 lr 0.000039 wd 0.0500 time 0.3964 (0.4077) data time 0.0006 (0.0036) model time 0.3957 (0.3961) loss 7.3936 (6.4155) grad_norm 2.0609 (3.4103) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:52:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][140/625] eta 0:03:17 lr 0.000039 wd 0.0500 time 0.3969 (0.4069) data time 0.0007 (0.0034) model time 0.3962 (0.3960) loss 6.1009 (6.3958) grad_norm 3.5392 (3.4513) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:52:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][150/625] eta 0:03:12 lr 0.000039 wd 0.0500 time 0.3951 (0.4062) data time 0.0006 (0.0033) model time 0.3944 (0.3959) loss 7.2155 (6.3936) grad_norm 2.5641 (3.4308) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:52:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][160/625] eta 0:03:08 lr 0.000039 wd 0.0500 time 0.3970 (0.4056) data time 0.0006 (0.0031) model time 0.3964 (0.3959) loss 6.0602 (6.3689) grad_norm 4.4201 (3.4202) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:52:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][170/625] eta 0:03:04 lr 0.000039 wd 0.0500 time 0.3985 (0.4050) data time 0.0008 (0.0030) model time 0.3976 (0.3958) loss 6.3155 (6.3665) grad_norm 7.9974 (3.6476) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:52:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][180/625] eta 0:03:00 lr 0.000038 wd 0.0500 time 0.3945 (0.4047) data time 0.0007 (0.0029) model time 0.3938 (0.3960) loss 6.0376 (6.3790) grad_norm 3.7947 (3.7630) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:52:49 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][190/625] eta 0:02:55 lr 0.000038 wd 0.0500 time 0.3956 (0.4043) data time 0.0009 (0.0028) model time 0.3947 (0.3960) loss 5.5404 (6.3713) grad_norm 2.0379 (3.7283) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 12:52:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][200/625] eta 0:02:51 lr 0.000038 wd 0.0500 time 0.3943 (0.4039) data time 0.0007 (0.0027) model time 0.3936 (0.3960) loss 6.9129 (6.3821) grad_norm 2.8963 (3.7297) loss_scale 128.0000 (65.5920) mem 14939MB [2024-07-25 12:52:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][210/625] eta 0:02:47 lr 0.000038 wd 0.0500 time 0.3956 (0.4036) data time 0.0008 (0.0026) model time 0.3948 (0.3960) loss 5.7173 (6.3841) grad_norm 2.6209 (3.7104) loss_scale 128.0000 (68.5498) mem 14939MB [2024-07-25 12:53:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][220/625] eta 0:02:43 lr 0.000038 wd 0.0500 time 0.3966 (0.4034) data time 0.0008 (0.0025) model time 0.3959 (0.3961) loss 6.9848 (6.3865) grad_norm 2.3107 (3.6576) loss_scale 128.0000 (71.2398) mem 14939MB [2024-07-25 12:53:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][230/625] eta 0:02:39 lr 0.000038 wd 0.0500 time 0.3976 (0.4039) data time 0.0006 (0.0024) model time 0.3970 (0.3972) loss 7.1942 (6.3916) grad_norm 2.7821 (3.6221) loss_scale 128.0000 (73.6970) mem 14939MB [2024-07-25 12:53:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][240/625] eta 0:02:35 lr 0.000038 wd 0.0500 time 0.3963 (0.4036) data time 0.0008 (0.0024) model time 0.3955 (0.3972) loss 7.2676 (6.4047) grad_norm 5.7549 (3.6905) loss_scale 128.0000 (75.9502) mem 14939MB [2024-07-25 12:53:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][250/625] eta 0:02:32 lr 0.000038 wd 0.0500 time 0.3980 (0.4057) data time 0.0007 
(0.0023) model time 0.3973 (0.4000) loss 7.0635 (6.4048) grad_norm 3.4611 (3.7448) loss_scale 128.0000 (78.0239) mem 14939MB [2024-07-25 12:53:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][260/625] eta 0:02:28 lr 0.000038 wd 0.0500 time 0.5974 (0.4068) data time 0.0009 (0.0023) model time 0.5965 (0.4016) loss 6.4593 (6.4134) grad_norm 2.7163 (3.7302) loss_scale 128.0000 (79.9387) mem 14939MB [2024-07-25 12:53:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][270/625] eta 0:02:24 lr 0.000038 wd 0.0500 time 0.3939 (0.4069) data time 0.0008 (0.0022) model time 0.3931 (0.4020) loss 6.2803 (6.4152) grad_norm 2.6705 (3.7207) loss_scale 128.0000 (81.7122) mem 14939MB [2024-07-25 12:53:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][280/625] eta 0:02:20 lr 0.000038 wd 0.0500 time 0.4014 (0.4066) data time 0.0007 (0.0022) model time 0.4007 (0.4017) loss 7.3398 (6.4172) grad_norm 3.9516 (3.7037) loss_scale 128.0000 (83.3594) mem 14939MB [2024-07-25 12:53:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][290/625] eta 0:02:16 lr 0.000038 wd 0.0500 time 0.3948 (0.4063) data time 0.0008 (0.0021) model time 0.3939 (0.4015) loss 6.4270 (6.4067) grad_norm 2.8398 (3.6900) loss_scale 128.0000 (84.8935) mem 14939MB [2024-07-25 12:53:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][300/625] eta 0:02:11 lr 0.000038 wd 0.0500 time 0.3971 (0.4059) data time 0.0008 (0.0021) model time 0.3963 (0.4013) loss 6.9960 (6.4065) grad_norm 2.8467 (3.6592) loss_scale 128.0000 (86.3256) mem 14939MB [2024-07-25 12:53:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][310/625] eta 0:02:07 lr 0.000038 wd 0.0500 time 0.3990 (0.4056) data time 0.0008 (0.0020) model time 0.3982 (0.4011) loss 6.1929 (6.3956) grad_norm 2.5620 (3.6588) loss_scale 128.0000 (87.6656) mem 14939MB [2024-07-25 12:53:42 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][320/625] eta 0:02:03 lr 0.000038 wd 0.0500 time 0.4052 (0.4054) data time 0.0007 (0.0020) model time 0.4045 (0.4009) loss 5.6962 (6.4039) grad_norm 2.6081 (3.6329) loss_scale 128.0000 (88.9221) mem 14939MB [2024-07-25 12:53:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][330/625] eta 0:01:59 lr 0.000038 wd 0.0500 time 0.3935 (0.4051) data time 0.0009 (0.0020) model time 0.3927 (0.4007) loss 5.9900 (6.4080) grad_norm 2.5288 (3.6104) loss_scale 128.0000 (90.1027) mem 14939MB [2024-07-25 12:53:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][340/625] eta 0:01:55 lr 0.000038 wd 0.0500 time 0.3980 (0.4049) data time 0.0008 (0.0019) model time 0.3972 (0.4005) loss 5.3969 (6.3985) grad_norm 4.2722 (3.6069) loss_scale 128.0000 (91.2141) mem 14939MB [2024-07-25 12:53:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][350/625] eta 0:01:51 lr 0.000038 wd 0.0500 time 0.3985 (0.4046) data time 0.0008 (0.0019) model time 0.3978 (0.4004) loss 5.5578 (6.3892) grad_norm 2.5582 (3.6344) loss_scale 128.0000 (92.2621) mem 14939MB [2024-07-25 12:53:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][360/625] eta 0:01:47 lr 0.000038 wd 0.0500 time 0.3966 (0.4044) data time 0.0009 (0.0019) model time 0.3957 (0.4002) loss 6.7205 (6.3892) grad_norm 2.6569 (3.6594) loss_scale 128.0000 (93.2521) mem 14939MB [2024-07-25 12:54:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][370/625] eta 0:01:43 lr 0.000038 wd 0.0500 time 0.3954 (0.4042) data time 0.0008 (0.0018) model time 0.3946 (0.4001) loss 5.8738 (6.3868) grad_norm 2.8867 (3.6495) loss_scale 128.0000 (94.1887) mem 14939MB [2024-07-25 12:54:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][380/625] eta 0:01:38 lr 0.000038 wd 0.0500 time 0.4011 (0.4040) data time 0.0009 
(0.0018) model time 0.4002 (0.4000) loss 6.4058 (6.3899) grad_norm 3.2664 (3.6754) loss_scale 128.0000 (95.0761) mem 14939MB [2024-07-25 12:54:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][390/625] eta 0:01:34 lr 0.000038 wd 0.0500 time 0.3962 (0.4039) data time 0.0006 (0.0018) model time 0.3955 (0.3999) loss 5.4787 (6.3871) grad_norm 2.9621 (3.6670) loss_scale 128.0000 (95.9182) mem 14939MB [2024-07-25 12:54:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][400/625] eta 0:01:30 lr 0.000038 wd 0.0500 time 0.3966 (0.4037) data time 0.0006 (0.0018) model time 0.3959 (0.3998) loss 6.5837 (6.3818) grad_norm 2.1892 (3.6590) loss_scale 128.0000 (96.7182) mem 14939MB [2024-07-25 12:54:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][410/625] eta 0:01:26 lr 0.000038 wd 0.0500 time 0.3958 (0.4036) data time 0.0007 (0.0017) model time 0.3952 (0.3997) loss 5.0613 (6.3680) grad_norm 6.7898 (3.6673) loss_scale 128.0000 (97.4793) mem 14939MB [2024-07-25 12:54:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][420/625] eta 0:01:22 lr 0.000038 wd 0.0500 time 0.3959 (0.4035) data time 0.0007 (0.0017) model time 0.3952 (0.3997) loss 6.7019 (6.3617) grad_norm 2.6156 (3.6559) loss_scale 128.0000 (98.2043) mem 14939MB [2024-07-25 12:54:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][430/625] eta 0:01:18 lr 0.000038 wd 0.0500 time 0.3961 (0.4033) data time 0.0008 (0.0017) model time 0.3953 (0.3996) loss 5.5134 (6.3603) grad_norm 2.3057 (3.6919) loss_scale 128.0000 (98.8956) mem 14939MB [2024-07-25 12:54:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][440/625] eta 0:01:14 lr 0.000038 wd 0.0500 time 0.3977 (0.4032) data time 0.0008 (0.0017) model time 0.3969 (0.3995) loss 6.7134 (6.3533) grad_norm 11.2002 (3.7295) loss_scale 128.0000 (99.5556) mem 14939MB [2024-07-25 12:54:34 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][450/625] eta 0:01:10 lr 0.000038 wd 0.0500 time 0.3884 (0.4035) data time 0.0009 (0.0017) model time 0.3875 (0.3999) loss 6.5269 (6.3560) grad_norm 2.5014 (3.7254) loss_scale 128.0000 (100.1863) mem 14939MB [2024-07-25 12:54:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][460/625] eta 0:01:06 lr 0.000038 wd 0.0500 time 0.3960 (0.4033) data time 0.0006 (0.0017) model time 0.3954 (0.3998) loss 5.5380 (6.3514) grad_norm 5.0792 (3.7234) loss_scale 128.0000 (100.7896) mem 14939MB [2024-07-25 12:54:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][470/625] eta 0:01:02 lr 0.000038 wd 0.0500 time 0.3948 (0.4043) data time 0.0007 (0.0016) model time 0.3941 (0.4009) loss 5.6992 (6.3494) grad_norm 3.3649 (3.7105) loss_scale 128.0000 (101.3673) mem 14939MB [2024-07-25 12:54:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][480/625] eta 0:00:58 lr 0.000038 wd 0.0500 time 0.3974 (0.4049) data time 0.0006 (0.0016) model time 0.3967 (0.4018) loss 7.8577 (6.3551) grad_norm 3.4583 (3.6927) loss_scale 128.0000 (101.9210) mem 14939MB [2024-07-25 12:54:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][490/625] eta 0:00:54 lr 0.000038 wd 0.0500 time 0.3963 (0.4052) data time 0.0009 (0.0016) model time 0.3955 (0.4021) loss 7.0936 (6.3625) grad_norm 3.3607 (3.6752) loss_scale 128.0000 (102.4521) mem 14939MB [2024-07-25 12:54:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][500/625] eta 0:00:50 lr 0.000037 wd 0.0500 time 0.3965 (0.4050) data time 0.0008 (0.0016) model time 0.3957 (0.4020) loss 5.6135 (6.3573) grad_norm 2.9431 (3.6722) loss_scale 128.0000 (102.9621) mem 14939MB [2024-07-25 12:54:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][510/625] eta 0:00:46 lr 0.000037 wd 0.0500 time 0.3964 (0.4049) data time 
0.0007 (0.0016) model time 0.3958 (0.4018) loss 5.9571 (6.3516) grad_norm 2.4142 (3.6750) loss_scale 128.0000 (103.4521) mem 14939MB [2024-07-25 12:55:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][520/625] eta 0:00:42 lr 0.000037 wd 0.0500 time 0.4017 (0.4047) data time 0.0006 (0.0016) model time 0.4011 (0.4017) loss 6.5294 (6.3520) grad_norm 4.6570 (3.6722) loss_scale 128.0000 (103.9232) mem 14939MB [2024-07-25 12:55:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][530/625] eta 0:00:38 lr 0.000037 wd 0.0500 time 0.3940 (0.4046) data time 0.0009 (0.0015) model time 0.3931 (0.4016) loss 6.4693 (6.3546) grad_norm 2.0786 (3.6495) loss_scale 128.0000 (104.3766) mem 14939MB [2024-07-25 12:55:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][540/625] eta 0:00:34 lr 0.000037 wd 0.0500 time 0.3966 (0.4045) data time 0.0006 (0.0015) model time 0.3960 (0.4015) loss 5.7381 (6.3484) grad_norm 2.5311 (3.6274) loss_scale 128.0000 (104.8133) mem 14939MB [2024-07-25 12:55:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][550/625] eta 0:00:30 lr 0.000037 wd 0.0500 time 0.3953 (0.4043) data time 0.0007 (0.0015) model time 0.3946 (0.4014) loss 5.0138 (6.3499) grad_norm 4.0279 (3.6145) loss_scale 128.0000 (105.2341) mem 14939MB [2024-07-25 12:55:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][560/625] eta 0:00:26 lr 0.000037 wd 0.0500 time 0.3940 (0.4042) data time 0.0009 (0.0015) model time 0.3931 (0.4013) loss 5.6281 (6.3485) grad_norm 3.3383 (3.6205) loss_scale 128.0000 (105.6399) mem 14939MB [2024-07-25 12:55:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][570/625] eta 0:00:22 lr 0.000037 wd 0.0500 time 0.3947 (0.4041) data time 0.0009 (0.0015) model time 0.3938 (0.4012) loss 6.2315 (6.3507) grad_norm 2.0239 (3.6059) loss_scale 128.0000 (106.0315) mem 14939MB [2024-07-25 12:55:26 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][580/625] eta 0:00:18 lr 0.000037 wd 0.0500 time 0.3949 (0.4039) data time 0.0006 (0.0015) model time 0.3942 (0.4011) loss 5.9668 (6.3457) grad_norm 7.5316 (3.6005) loss_scale 128.0000 (106.4096) mem 14939MB [2024-07-25 12:55:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][590/625] eta 0:00:14 lr 0.000037 wd 0.0500 time 0.3954 (0.4038) data time 0.0008 (0.0015) model time 0.3946 (0.4010) loss 6.6117 (6.3484) grad_norm 3.0725 (3.6124) loss_scale 128.0000 (106.7750) mem 14939MB [2024-07-25 12:55:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][600/625] eta 0:00:10 lr 0.000037 wd 0.0500 time 0.3979 (0.4037) data time 0.0006 (0.0015) model time 0.3973 (0.4009) loss 6.7003 (6.3475) grad_norm 1.9209 (3.6133) loss_scale 128.0000 (107.1281) mem 14939MB [2024-07-25 12:55:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][610/625] eta 0:00:06 lr 0.000037 wd 0.0500 time 0.3965 (0.4036) data time 0.0006 (0.0015) model time 0.3959 (0.4009) loss 6.1354 (6.3486) grad_norm 2.6368 (3.6082) loss_scale 128.0000 (107.4697) mem 14939MB [2024-07-25 12:55:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][620/625] eta 0:00:02 lr 0.000037 wd 0.0500 time 0.3967 (0.4035) data time 0.0006 (0.0014) model time 0.3961 (0.4008) loss 7.8489 (6.3490) grad_norm 2.3849 (3.6050) loss_scale 128.0000 (107.8003) mem 14939MB [2024-07-25 12:55:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 273 training takes 0:04:12 [2024-07-25 12:55:44 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 12:55:45 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 12:55:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.456 (0.456) Loss 0.5435 (0.5435) Acc@1 90.430 (90.430) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 12:55:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8096 (0.6544) Acc@1 82.910 (87.655) Acc@5 97.217 (98.069) Mem 14939MB [2024-07-25 12:55:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 0.9077 (0.7599) Acc@1 79.346 (84.675) Acc@5 96.191 (97.114) Mem 14939MB [2024-07-25 12:55:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.293 Acc@5 97.093 [2024-07-25 12:55:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-07-25 12:55:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.789 (0.789) Loss 0.5396 (0.5396) Acc@1 90.186 (90.186) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 12:55:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.156) Loss 0.8076 (0.6540) Acc@1 83.105 (87.522) Acc@5 97.070 (98.020) Mem 14939MB [2024-07-25 12:55:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.123) Loss 0.9092 (0.7574) Acc@1 78.809 (84.580) Acc@5 95.898 (97.061) Mem 14939MB [2024-07-25 12:55:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.189 Acc@5 97.029 [2024-07-25 12:55:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-07-25 12:55:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][0/625] eta 0:13:10 lr 0.000037 wd 0.0500 time 1.2650 (1.2650) data time 0.6059 (0.6059) model time 0.0000 (0.0000) loss 6.2099 (6.2099) grad_norm 2.5633 (2.5633) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:55:56 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][10/625] eta 0:05:02 lr 0.000037 wd 0.0500 time 0.3933 (0.4926) data time 0.0007 (0.0559) model time 0.0000 (0.0000) loss 7.1002 (6.2679) grad_norm 2.8981 (3.9159) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:56:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][20/625] eta 0:04:30 lr 0.000037 wd 0.0500 time 0.3958 (0.4470) data time 0.0006 (0.0296) model time 0.0000 (0.0000) loss 6.7615 (6.3686) grad_norm 3.7657 (3.5727) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:56:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][30/625] eta 0:04:16 lr 0.000037 wd 0.0500 time 0.3960 (0.4308) data time 0.0007 (0.0204) model time 0.0000 (0.0000) loss 5.5919 (6.3164) grad_norm 3.4440 (4.1386) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:56:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][40/625] eta 0:04:07 lr 0.000037 wd 0.0500 time 0.3962 (0.4226) data time 0.0008 (0.0156) model time 0.0000 (0.0000) loss 5.6855 (6.3285) grad_norm 3.0417 (3.8738) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:56:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][50/625] eta 0:04:00 lr 0.000037 wd 0.0500 time 0.3965 (0.4177) data time 0.0008 (0.0127) model time 0.0000 (0.0000) loss 6.2003 (6.3718) grad_norm 2.7063 (3.8582) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:56:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][60/625] eta 0:03:55 lr 0.000037 wd 0.0500 time 0.5729 (0.4170) data time 0.0006 (0.0108) model time 0.5723 (0.4124) loss 5.8992 (6.3486) grad_norm 4.2738 (3.8792) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:56:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][70/625] eta 0:03:52 lr 0.000037 wd 0.0500 time 0.3945 (0.4188) data time 0.0006 
(0.0094) model time 0.3939 (0.4207) loss 6.1269 (6.2953) grad_norm 2.8657 (3.8240) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:56:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][80/625] eta 0:03:52 lr 0.000037 wd 0.0500 time 0.5782 (0.4274) data time 0.0006 (0.0083) model time 0.5776 (0.4431) loss 7.1254 (6.2996) grad_norm 2.1466 (3.6749) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:56:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][90/625] eta 0:03:46 lr 0.000037 wd 0.0500 time 0.3961 (0.4240) data time 0.0008 (0.0075) model time 0.3953 (0.4311) loss 6.7236 (6.2992) grad_norm 2.2880 (3.5907) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:56:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][100/625] eta 0:03:41 lr 0.000037 wd 0.0500 time 0.3975 (0.4212) data time 0.0008 (0.0068) model time 0.3967 (0.4240) loss 6.0729 (6.3068) grad_norm 3.3251 (3.5852) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:56:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][110/625] eta 0:03:35 lr 0.000037 wd 0.0500 time 0.3996 (0.4191) data time 0.0006 (0.0063) model time 0.3990 (0.4195) loss 6.1008 (6.3323) grad_norm 3.8757 (3.6585) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:56:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][120/625] eta 0:03:30 lr 0.000037 wd 0.0500 time 0.3956 (0.4172) data time 0.0007 (0.0058) model time 0.3949 (0.4160) loss 7.3715 (6.3140) grad_norm 2.7443 (3.5882) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:56:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][130/625] eta 0:03:25 lr 0.000037 wd 0.0500 time 0.3958 (0.4156) data time 0.0008 (0.0055) model time 0.3950 (0.4135) loss 7.2877 (6.3411) grad_norm 8.7828 (3.6048) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:56:49 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][140/625] eta 0:03:20 lr 0.000037 wd 0.0500 time 0.3965 (0.4144) data time 0.0007 (0.0051) model time 0.3958 (0.4117) loss 6.9792 (6.3450) grad_norm 2.5492 (3.6116) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:56:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][150/625] eta 0:03:16 lr 0.000037 wd 0.0500 time 0.3989 (0.4133) data time 0.0008 (0.0048) model time 0.3981 (0.4102) loss 6.7365 (6.3402) grad_norm 3.2378 (3.6305) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:56:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][160/625] eta 0:03:11 lr 0.000037 wd 0.0500 time 0.3989 (0.4123) data time 0.0006 (0.0046) model time 0.3983 (0.4089) loss 6.8199 (6.3543) grad_norm 4.1518 (3.6294) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:57:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][170/625] eta 0:03:07 lr 0.000037 wd 0.0500 time 0.3950 (0.4114) data time 0.0006 (0.0044) model time 0.3944 (0.4079) loss 7.2339 (6.3735) grad_norm 5.2103 (3.5913) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:57:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][180/625] eta 0:03:02 lr 0.000037 wd 0.0500 time 0.4003 (0.4106) data time 0.0006 (0.0042) model time 0.3997 (0.4070) loss 6.9886 (6.3758) grad_norm 4.8538 (3.6302) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:57:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][190/625] eta 0:02:58 lr 0.000037 wd 0.0500 time 0.3959 (0.4099) data time 0.0006 (0.0040) model time 0.3953 (0.4062) loss 6.4239 (6.3886) grad_norm 2.1904 (3.6906) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:57:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][200/625] eta 0:02:53 lr 0.000036 wd 0.0500 time 0.3954 (0.4092) data time 
0.0008 (0.0038) model time 0.3946 (0.4055) loss 6.6759 (6.3815) grad_norm 2.1083 (3.6714) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:57:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][210/625] eta 0:02:49 lr 0.000036 wd 0.0500 time 0.3966 (0.4086) data time 0.0007 (0.0037) model time 0.3960 (0.4049) loss 6.3728 (6.3793) grad_norm 2.9780 (3.6454) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:57:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][220/625] eta 0:02:45 lr 0.000036 wd 0.0500 time 0.3967 (0.4081) data time 0.0008 (0.0036) model time 0.3959 (0.4044) loss 6.9764 (6.3769) grad_norm 3.1638 (3.6153) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:57:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][230/625] eta 0:02:41 lr 0.000036 wd 0.0500 time 0.3998 (0.4084) data time 0.0006 (0.0034) model time 0.3993 (0.4050) loss 5.8569 (6.3823) grad_norm 3.0205 (3.5874) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:57:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][240/625] eta 0:02:37 lr 0.000036 wd 0.0500 time 0.3975 (0.4079) data time 0.0006 (0.0033) model time 0.3968 (0.4045) loss 6.9043 (6.3867) grad_norm 3.1903 (3.6353) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:57:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][250/625] eta 0:02:32 lr 0.000036 wd 0.0500 time 0.3968 (0.4075) data time 0.0006 (0.0032) model time 0.3962 (0.4041) loss 5.2061 (6.3755) grad_norm 3.2533 (3.6345) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:57:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][260/625] eta 0:02:28 lr 0.000036 wd 0.0500 time 0.4012 (0.4072) data time 0.0006 (0.0031) model time 0.4006 (0.4039) loss 6.3788 (6.3671) grad_norm 3.4726 (3.6226) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:57:40 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][270/625] eta 0:02:24 lr 0.000036 wd 0.0500 time 0.3980 (0.4069) data time 0.0006 (0.0030) model time 0.3974 (0.4035) loss 6.2551 (6.3644) grad_norm 2.9255 (3.6030) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:57:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][280/625] eta 0:02:20 lr 0.000036 wd 0.0500 time 0.3959 (0.4065) data time 0.0008 (0.0030) model time 0.3951 (0.4032) loss 5.4800 (6.3659) grad_norm 2.9505 (3.5724) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:57:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][290/625] eta 0:02:16 lr 0.000036 wd 0.0500 time 0.3975 (0.4079) data time 0.0008 (0.0029) model time 0.3967 (0.4050) loss 5.6697 (6.3645) grad_norm 4.8653 (3.6481) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:57:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][300/625] eta 0:02:13 lr 0.000036 wd 0.0500 time 0.5881 (0.4104) data time 0.0008 (0.0028) model time 0.5874 (0.4081) loss 6.7552 (6.3605) grad_norm 3.1060 (3.6336) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:57:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][310/625] eta 0:02:09 lr 0.000036 wd 0.0500 time 0.3986 (0.4099) data time 0.0006 (0.0028) model time 0.3980 (0.4076) loss 6.9353 (6.3657) grad_norm 3.9703 (3.6184) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:58:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][320/625] eta 0:02:04 lr 0.000036 wd 0.0500 time 0.4023 (0.4095) data time 0.0008 (0.0027) model time 0.4015 (0.4071) loss 6.4063 (6.3620) grad_norm 3.2194 (3.6036) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:58:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][330/625] eta 0:02:00 lr 0.000036 wd 0.0500 time 0.3995 (0.4092) data time 
0.0007 (0.0026) model time 0.3989 (0.4068) loss 6.4103 (6.3517) grad_norm 2.9278 (3.5777) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:58:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][340/625] eta 0:01:56 lr 0.000036 wd 0.0500 time 0.3963 (0.4088) data time 0.0006 (0.0026) model time 0.3957 (0.4064) loss 7.6018 (6.3533) grad_norm 2.9118 (3.5698) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:58:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][350/625] eta 0:01:52 lr 0.000036 wd 0.0500 time 0.3959 (0.4086) data time 0.0008 (0.0026) model time 0.3952 (0.4062) loss 6.4039 (6.3592) grad_norm 2.5190 (3.5559) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:58:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][360/625] eta 0:01:48 lr 0.000036 wd 0.0500 time 0.4588 (0.4084) data time 0.0007 (0.0025) model time 0.4581 (0.4061) loss 5.7997 (6.3535) grad_norm 3.6176 (3.5803) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:58:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][370/625] eta 0:01:44 lr 0.000036 wd 0.0500 time 0.3935 (0.4082) data time 0.0008 (0.0025) model time 0.3927 (0.4058) loss 5.3728 (6.3456) grad_norm 2.9212 (3.5821) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:58:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][380/625] eta 0:01:39 lr 0.000036 wd 0.0500 time 0.3966 (0.4078) data time 0.0006 (0.0024) model time 0.3960 (0.4055) loss 5.2392 (6.3429) grad_norm 2.7698 (3.5647) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:58:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][390/625] eta 0:01:35 lr 0.000036 wd 0.0500 time 0.3971 (0.4075) data time 0.0008 (0.0024) model time 0.3963 (0.4051) loss 5.1998 (6.3423) grad_norm 3.4572 (3.5582) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:58:34 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][400/625] eta 0:01:31 lr 0.000036 wd 0.0500 time 0.3923 (0.4073) data time 0.0006 (0.0024) model time 0.3917 (0.4050) loss 6.6199 (6.3385) grad_norm 2.5295 (3.5519) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:58:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][410/625] eta 0:01:27 lr 0.000036 wd 0.0500 time 0.3956 (0.4071) data time 0.0007 (0.0023) model time 0.3948 (0.4048) loss 6.0267 (6.3361) grad_norm 2.8518 (3.5801) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:58:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][420/625] eta 0:01:23 lr 0.000036 wd 0.0500 time 0.4030 (0.4070) data time 0.0006 (0.0023) model time 0.4024 (0.4046) loss 5.9043 (6.3340) grad_norm 3.0377 (3.5702) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:58:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][430/625] eta 0:01:19 lr 0.000036 wd 0.0500 time 0.3965 (0.4068) data time 0.0006 (0.0023) model time 0.3959 (0.4044) loss 7.5184 (6.3379) grad_norm 3.4007 (3.5635) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:58:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][440/625] eta 0:01:15 lr 0.000036 wd 0.0500 time 0.3971 (0.4067) data time 0.0008 (0.0022) model time 0.3963 (0.4044) loss 5.4034 (6.3314) grad_norm 2.3595 (3.5433) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:58:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][450/625] eta 0:01:11 lr 0.000036 wd 0.0500 time 0.4189 (0.4069) data time 0.0006 (0.0022) model time 0.4183 (0.4046) loss 7.3653 (6.3338) grad_norm 3.9873 (3.5737) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:58:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][460/625] eta 0:01:07 lr 0.000036 wd 0.0500 time 0.3954 (0.4067) data time 
0.0006 (0.0022) model time 0.3948 (0.4044) loss 5.5589 (6.3308) grad_norm 2.4678 (3.5705) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:59:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][470/625] eta 0:01:03 lr 0.000036 wd 0.0500 time 0.3950 (0.4065) data time 0.0006 (0.0021) model time 0.3944 (0.4042) loss 6.5802 (6.3272) grad_norm 4.4106 (3.5850) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:59:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][480/625] eta 0:00:58 lr 0.000036 wd 0.0500 time 0.3982 (0.4063) data time 0.0008 (0.0021) model time 0.3974 (0.4041) loss 6.0868 (6.3304) grad_norm 2.6529 (3.5978) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 12:59:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][490/625] eta 0:00:54 lr 0.000036 wd 0.0500 time 0.3955 (0.4061) data time 0.0006 (0.0021) model time 0.3949 (0.4039) loss 7.3201 (6.3355) grad_norm 2.4686 (inf) loss_scale 64.0000 (126.9572) mem 14939MB [2024-07-25 12:59:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][500/625] eta 0:00:50 lr 0.000036 wd 0.0500 time 0.3964 (0.4059) data time 0.0007 (0.0021) model time 0.3958 (0.4037) loss 5.6470 (6.3327) grad_norm 2.9132 (inf) loss_scale 64.0000 (125.7006) mem 14939MB [2024-07-25 12:59:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][510/625] eta 0:00:46 lr 0.000036 wd 0.0500 time 0.4019 (0.4068) data time 0.0008 (0.0020) model time 0.4011 (0.4048) loss 6.3410 (6.3365) grad_norm 2.6460 (inf) loss_scale 64.0000 (124.4932) mem 14939MB [2024-07-25 12:59:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][520/625] eta 0:00:42 lr 0.000036 wd 0.0500 time 0.3949 (0.4076) data time 0.0008 (0.0020) model time 0.3941 (0.4057) loss 6.7216 (6.3390) grad_norm 2.4997 (inf) loss_scale 64.0000 (123.3321) mem 14939MB [2024-07-25 12:59:27 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][530/625] eta 0:00:38 lr 0.000035 wd 0.0500 time 0.3989 (0.4074) data time 0.0008 (0.0020) model time 0.3981 (0.4055) loss 7.1356 (6.3376) grad_norm 2.6428 (inf) loss_scale 64.0000 (122.2147) mem 14939MB [2024-07-25 12:59:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][540/625] eta 0:00:34 lr 0.000035 wd 0.0500 time 0.4020 (0.4072) data time 0.0009 (0.0020) model time 0.4011 (0.4053) loss 7.2852 (6.3448) grad_norm 3.7949 (inf) loss_scale 64.0000 (121.1386) mem 14939MB [2024-07-25 12:59:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][550/625] eta 0:00:30 lr 0.000035 wd 0.0500 time 0.3953 (0.4070) data time 0.0008 (0.0019) model time 0.3945 (0.4051) loss 7.4877 (6.3458) grad_norm 8.0893 (inf) loss_scale 64.0000 (120.1016) mem 14939MB [2024-07-25 12:59:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][560/625] eta 0:00:26 lr 0.000035 wd 0.0500 time 0.3967 (0.4068) data time 0.0007 (0.0019) model time 0.3961 (0.4049) loss 7.1065 (6.3464) grad_norm 5.9065 (inf) loss_scale 64.0000 (119.1016) mem 14939MB [2024-07-25 12:59:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][570/625] eta 0:00:22 lr 0.000035 wd 0.0500 time 0.3998 (0.4067) data time 0.0008 (0.0019) model time 0.3990 (0.4048) loss 6.2125 (6.3443) grad_norm 2.1717 (inf) loss_scale 64.0000 (118.1366) mem 14939MB [2024-07-25 12:59:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][580/625] eta 0:00:18 lr 0.000035 wd 0.0500 time 0.3974 (0.4065) data time 0.0006 (0.0019) model time 0.3967 (0.4046) loss 5.6284 (6.3430) grad_norm 4.4343 (inf) loss_scale 64.0000 (117.2048) mem 14939MB [2024-07-25 12:59:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][590/625] eta 0:00:14 lr 0.000035 wd 0.0500 time 0.3956 (0.4064) data time 0.0006 (0.0019) model time 
0.3949 (0.4045) loss 6.0813 (6.3441) grad_norm 2.8259 (inf) loss_scale 64.0000 (116.3046) mem 14939MB [2024-07-25 12:59:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][600/625] eta 0:00:10 lr 0.000035 wd 0.0500 time 0.3952 (0.4063) data time 0.0009 (0.0018) model time 0.3943 (0.4044) loss 7.1662 (6.3475) grad_norm 4.8148 (inf) loss_scale 64.0000 (115.4343) mem 14939MB [2024-07-25 12:59:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][610/625] eta 0:00:06 lr 0.000035 wd 0.0500 time 0.3941 (0.4061) data time 0.0006 (0.0018) model time 0.3935 (0.4042) loss 5.7948 (6.3477) grad_norm 2.6258 (inf) loss_scale 64.0000 (114.5925) mem 14939MB [2024-07-25 13:00:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][620/625] eta 0:00:02 lr 0.000035 wd 0.0500 time 0.3959 (0.4060) data time 0.0006 (0.0018) model time 0.3954 (0.4041) loss 6.8008 (6.3492) grad_norm 2.3363 (inf) loss_scale 64.0000 (113.7778) mem 14939MB [2024-07-25 13:00:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 274 training takes 0:04:13 [2024-07-25 13:00:04 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 13:00:05 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 13:00:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.460 (0.460) Loss 0.5356 (0.5356) Acc@1 90.039 (90.039) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 13:00:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.8081 (0.6493) Acc@1 82.764 (87.700) Acc@5 97.168 (98.100) Mem 14939MB [2024-07-25 13:00:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 0.9082 (0.7545) Acc@1 78.955 (84.724) Acc@5 96.094 (97.119) Mem 14939MB [2024-07-25 13:00:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.331 Acc@5 97.087 [2024-07-25 13:00:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-07-25 13:00:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.759 (0.759) Loss 0.5400 (0.5400) Acc@1 90.234 (90.234) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 13:00:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.152) Loss 0.8081 (0.6542) Acc@1 83.154 (87.558) Acc@5 97.070 (98.016) Mem 14939MB [2024-07-25 13:00:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.120) Loss 0.9097 (0.7576) Acc@1 78.809 (84.601) Acc@5 95.898 (97.059) Mem 14939MB [2024-07-25 13:00:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.207 Acc@5 97.027 [2024-07-25 13:00:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-07-25 13:00:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.21% [2024-07-25 13:00:10 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 13:00:11 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 13:00:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][0/625] eta 0:08:13 lr 0.000035 wd 0.0500 time 0.7898 (0.7898) data time 0.4085 (0.4085) model time 0.0000 (0.0000) loss 6.9758 (6.9758) grad_norm 2.7055 (2.7055) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:00:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][10/625] eta 0:04:25 lr 0.000035 wd 0.0500 time 0.3958 (0.4315) data time 0.0006 (0.0378) model time 0.0000 (0.0000) loss 6.1962 (6.3801) grad_norm 2.3435 (3.8583) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:00:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][20/625] eta 0:04:10 lr 0.000035 wd 0.0500 time 0.3943 (0.4144) data time 0.0006 (0.0202) model time 0.0000 (0.0000) loss 6.6336 (6.4379) grad_norm 2.8677 (3.6252) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:00:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][30/625] eta 0:04:03 lr 0.000035 wd 0.0500 time 0.3956 (0.4085) data time 0.0006 (0.0140) model time 0.0000 (0.0000) loss 6.5485 (6.4529) grad_norm 2.0616 (3.6785) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:00:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][40/625] eta 0:03:57 lr 0.000035 wd 0.0500 time 0.3970 (0.4055) data time 0.0006 (0.0108) model time 0.0000 (0.0000) loss 6.9977 (6.4405) grad_norm 2.7985 (3.7716) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:00:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][50/625] eta 0:03:52 lr 0.000035 wd 0.0500 time 0.3926 (0.4038) data time 0.0008 (0.0088) model time 0.0000 (0.0000) loss 5.4570 (6.4383) grad_norm 4.2614 (3.8118) loss_scale 64.0000 (64.0000) mem 14939MB 
[2024-07-25 13:00:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][60/625] eta 0:03:47 lr 0.000035 wd 0.0500 time 0.3957 (0.4025) data time 0.0008 (0.0075) model time 0.3949 (0.3948) loss 6.0972 (6.3737) grad_norm 2.7337 (3.7589) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:00:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][70/625] eta 0:03:42 lr 0.000035 wd 0.0500 time 0.4014 (0.4016) data time 0.0008 (0.0066) model time 0.4007 (0.3952) loss 6.3041 (6.4115) grad_norm 3.1329 (3.7993) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:00:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][80/625] eta 0:03:38 lr 0.000035 wd 0.0500 time 0.3932 (0.4008) data time 0.0008 (0.0059) model time 0.3924 (0.3949) loss 6.4281 (6.3879) grad_norm 3.1291 (3.6861) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:00:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][90/625] eta 0:03:34 lr 0.000035 wd 0.0500 time 0.3950 (0.4004) data time 0.0007 (0.0053) model time 0.3943 (0.3952) loss 7.6227 (6.3877) grad_norm 3.4692 (3.5896) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:00:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][100/625] eta 0:03:31 lr 0.000035 wd 0.0500 time 0.3969 (0.4032) data time 0.0008 (0.0049) model time 0.3961 (0.4016) loss 7.0746 (6.3899) grad_norm 2.3466 (3.7230) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:00:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][110/625] eta 0:03:30 lr 0.000035 wd 0.0500 time 0.5460 (0.4094) data time 0.0007 (0.0045) model time 0.5453 (0.4132) loss 6.1465 (6.3838) grad_norm 2.6778 (3.6994) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:01:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][120/625] eta 0:03:28 lr 0.000035 wd 0.0500 time 0.3949 (0.4126) data time 
0.0008 (0.0042) model time 0.3941 (0.4182) loss 6.3431 (6.3506) grad_norm 2.9867 (3.6464) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:01:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][130/625] eta 0:03:23 lr 0.000035 wd 0.0500 time 0.3966 (0.4114) data time 0.0007 (0.0039) model time 0.3959 (0.4154) loss 6.8150 (6.3738) grad_norm 2.7610 (3.6377) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:01:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][140/625] eta 0:03:18 lr 0.000035 wd 0.0500 time 0.3974 (0.4103) data time 0.0008 (0.0037) model time 0.3966 (0.4131) loss 6.3206 (6.3309) grad_norm 2.0064 (3.5974) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:01:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][150/625] eta 0:03:14 lr 0.000035 wd 0.0500 time 0.3967 (0.4096) data time 0.0008 (0.0035) model time 0.3959 (0.4117) loss 7.3080 (6.3469) grad_norm 3.3765 (3.6152) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:01:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][160/625] eta 0:03:10 lr 0.000035 wd 0.0500 time 0.3961 (0.4088) data time 0.0008 (0.0033) model time 0.3953 (0.4103) loss 7.2672 (6.3551) grad_norm 2.9379 (3.5965) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:01:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][170/625] eta 0:03:05 lr 0.000035 wd 0.0500 time 0.3970 (0.4081) data time 0.0010 (0.0032) model time 0.3960 (0.4090) loss 7.2468 (6.3367) grad_norm 8.4463 (3.6667) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:01:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][180/625] eta 0:03:01 lr 0.000035 wd 0.0500 time 0.3948 (0.4074) data time 0.0007 (0.0031) model time 0.3941 (0.4079) loss 6.3152 (6.3307) grad_norm 2.8890 (3.6292) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:01:29 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][190/625] eta 0:02:56 lr 0.000035 wd 0.0500 time 0.3964 (0.4069) data time 0.0009 (0.0030) model time 0.3956 (0.4071) loss 7.1200 (6.3317) grad_norm 2.5706 (3.6191) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:01:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][200/625] eta 0:02:52 lr 0.000035 wd 0.0500 time 0.4080 (0.4070) data time 0.0008 (0.0029) model time 0.4072 (0.4072) loss 5.0404 (6.3239) grad_norm 2.6762 (3.6683) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:01:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][210/625] eta 0:02:48 lr 0.000035 wd 0.0500 time 0.3941 (0.4064) data time 0.0009 (0.0028) model time 0.3932 (0.4064) loss 6.0749 (6.3168) grad_norm 4.3629 (3.8066) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:01:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][220/625] eta 0:02:44 lr 0.000035 wd 0.0500 time 0.3968 (0.4059) data time 0.0006 (0.0027) model time 0.3961 (0.4057) loss 5.4413 (6.3145) grad_norm 2.9577 (3.7995) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:01:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][230/625] eta 0:02:40 lr 0.000035 wd 0.0500 time 0.3947 (0.4055) data time 0.0007 (0.0026) model time 0.3940 (0.4051) loss 6.8933 (6.3045) grad_norm 2.0113 (3.7965) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:01:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][240/625] eta 0:02:35 lr 0.000035 wd 0.0500 time 0.3947 (0.4051) data time 0.0008 (0.0025) model time 0.3939 (0.4046) loss 6.0225 (6.2900) grad_norm 2.0627 (3.7653) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:01:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][250/625] eta 0:02:31 lr 0.000034 wd 0.0500 time 0.3960 (0.4048) data time 0.0007 (0.0025) 
model time 0.3953 (0.4042) loss 5.7841 (6.2753) grad_norm 2.4181 (3.7356) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:01:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][260/625] eta 0:02:27 lr 0.000034 wd 0.0500 time 0.3986 (0.4045) data time 0.0008 (0.0024) model time 0.3977 (0.4038) loss 5.5189 (6.2872) grad_norm 2.3777 (3.7773) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:02:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][270/625] eta 0:02:23 lr 0.000034 wd 0.0500 time 0.3850 (0.4042) data time 0.0008 (0.0024) model time 0.3842 (0.4034) loss 6.9918 (6.3005) grad_norm 2.5450 (3.7840) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:02:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][280/625] eta 0:02:19 lr 0.000034 wd 0.0500 time 0.3943 (0.4039) data time 0.0006 (0.0023) model time 0.3936 (0.4031) loss 6.7786 (6.3010) grad_norm 2.1136 (3.7512) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:02:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][290/625] eta 0:02:15 lr 0.000034 wd 0.0500 time 0.3948 (0.4036) data time 0.0008 (0.0022) model time 0.3940 (0.4027) loss 6.8557 (6.2961) grad_norm 2.5989 (3.7235) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:02:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][300/625] eta 0:02:11 lr 0.000034 wd 0.0500 time 0.3948 (0.4034) data time 0.0008 (0.0022) model time 0.3939 (0.4024) loss 6.4769 (6.2998) grad_norm 4.7867 (3.7298) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:02:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][310/625] eta 0:02:06 lr 0.000034 wd 0.0500 time 0.4010 (0.4031) data time 0.0008 (0.0022) model time 0.4002 (0.4021) loss 5.9288 (6.2963) grad_norm 4.4883 (3.7350) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:02:21 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [275/300][320/625] eta 0:02:03 lr 0.000034 wd 0.0500 time 0.5604 (0.4043) data time 0.0008 (0.0021) model time 0.5595 (0.4035) loss 6.5286 (6.3006) grad_norm 3.6673 (3.7190) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:02:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][330/625] eta 0:01:59 lr 0.000034 wd 0.0500 time 0.6034 (0.4057) data time 0.0007 (0.0021) model time 0.6027 (0.4051) loss 5.7532 (6.3023) grad_norm 4.5136 (3.7428) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:02:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][340/625] eta 0:01:55 lr 0.000034 wd 0.0500 time 0.3938 (0.4068) data time 0.0007 (0.0020) model time 0.3931 (0.4065) loss 6.4135 (6.3059) grad_norm 2.1968 (3.7362) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:02:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][350/625] eta 0:01:51 lr 0.000034 wd 0.0500 time 0.3954 (0.4065) data time 0.0006 (0.0020) model time 0.3948 (0.4061) loss 6.4016 (6.3165) grad_norm 2.1322 (3.7275) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:02:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][360/625] eta 0:01:47 lr 0.000034 wd 0.0500 time 0.3979 (0.4062) data time 0.0009 (0.0020) model time 0.3970 (0.4058) loss 5.8747 (6.3278) grad_norm 2.2546 (3.7276) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:02:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][370/625] eta 0:01:43 lr 0.000034 wd 0.0500 time 0.3954 (0.4060) data time 0.0007 (0.0019) model time 0.3947 (0.4055) loss 5.9828 (6.3201) grad_norm 1.9587 (3.7160) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:02:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][380/625] eta 0:01:39 lr 0.000034 wd 0.0500 time 0.3940 (0.4058) data time 0.0009 (0.0019) model time 0.3931 (0.4052) 
loss 5.5084 (6.3115) grad_norm 2.7714 (3.7019) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:02:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][390/625] eta 0:01:35 lr 0.000034 wd 0.0500 time 0.3968 (0.4055) data time 0.0006 (0.0019) model time 0.3962 (0.4049) loss 6.0584 (6.3040) grad_norm 2.7504 (3.7007) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:02:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][400/625] eta 0:01:31 lr 0.000034 wd 0.0500 time 0.3969 (0.4053) data time 0.0006 (0.0019) model time 0.3963 (0.4046) loss 5.5944 (6.2985) grad_norm 4.7633 (3.6928) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:02:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][410/625] eta 0:01:27 lr 0.000034 wd 0.0500 time 0.3954 (0.4050) data time 0.0009 (0.0018) model time 0.3945 (0.4044) loss 7.1133 (6.2950) grad_norm 2.7918 (3.7197) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:03:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][420/625] eta 0:01:23 lr 0.000034 wd 0.0500 time 0.4007 (0.4053) data time 0.0006 (0.0018) model time 0.4000 (0.4046) loss 6.6097 (6.2975) grad_norm 2.4993 (3.7100) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:03:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][430/625] eta 0:01:18 lr 0.000034 wd 0.0500 time 0.3951 (0.4051) data time 0.0006 (0.0018) model time 0.3945 (0.4044) loss 6.6698 (6.3017) grad_norm 2.2467 (3.7113) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:03:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][440/625] eta 0:01:14 lr 0.000034 wd 0.0500 time 0.3962 (0.4049) data time 0.0006 (0.0018) model time 0.3956 (0.4042) loss 6.7058 (6.3002) grad_norm 6.5167 (3.7473) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:03:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [275/300][450/625] eta 0:01:10 lr 0.000034 wd 0.0500 time 0.3949 (0.4047) data time 0.0007 (0.0017) model time 0.3942 (0.4040) loss 6.1510 (6.2958) grad_norm 1.9405 (3.7440) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:03:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][460/625] eta 0:01:06 lr 0.000034 wd 0.0500 time 0.3950 (0.4045) data time 0.0009 (0.0017) model time 0.3941 (0.4038) loss 6.9383 (6.3040) grad_norm 2.8779 (3.7308) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:03:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][470/625] eta 0:01:02 lr 0.000034 wd 0.0500 time 0.3967 (0.4044) data time 0.0006 (0.0017) model time 0.3960 (0.4036) loss 6.8316 (6.3058) grad_norm 2.4928 (3.7162) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:03:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][480/625] eta 0:00:58 lr 0.000034 wd 0.0500 time 0.4002 (0.4043) data time 0.0007 (0.0017) model time 0.3995 (0.4035) loss 5.3169 (6.3106) grad_norm 6.3421 (3.7179) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:03:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][490/625] eta 0:00:54 lr 0.000034 wd 0.0500 time 0.3959 (0.4041) data time 0.0008 (0.0017) model time 0.3952 (0.4033) loss 5.7555 (6.3131) grad_norm 2.8421 (3.8266) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:03:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][500/625] eta 0:00:50 lr 0.000034 wd 0.0500 time 0.3979 (0.4040) data time 0.0008 (0.0017) model time 0.3971 (0.4032) loss 6.3132 (6.3111) grad_norm 2.7735 (3.8071) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:03:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][510/625] eta 0:00:46 lr 0.000034 wd 0.0500 time 0.3967 (0.4039) data time 0.0005 (0.0016) model time 0.3962 (0.4030) loss 6.1638 (6.3141) grad_norm 
3.4509 (3.8592) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:03:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][520/625] eta 0:00:42 lr 0.000034 wd 0.0500 time 0.3952 (0.4037) data time 0.0008 (0.0016) model time 0.3945 (0.4029) loss 7.1967 (6.3207) grad_norm 2.5426 (3.8475) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:03:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][530/625] eta 0:00:38 lr 0.000034 wd 0.0500 time 0.3959 (0.4036) data time 0.0008 (0.0016) model time 0.3952 (0.4027) loss 5.6705 (6.3170) grad_norm 2.6753 (3.8356) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:03:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][540/625] eta 0:00:34 lr 0.000034 wd 0.0500 time 0.5994 (0.4041) data time 0.0006 (0.0016) model time 0.5988 (0.4033) loss 7.5823 (6.3215) grad_norm 3.5340 (3.8154) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:03:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][550/625] eta 0:00:30 lr 0.000034 wd 0.0500 time 0.3934 (0.4046) data time 0.0007 (0.0016) model time 0.3927 (0.4038) loss 6.4814 (6.3192) grad_norm 2.6693 (3.8025) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:03:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][560/625] eta 0:00:26 lr 0.000034 wd 0.0500 time 0.3942 (0.4053) data time 0.0006 (0.0016) model time 0.3936 (0.4046) loss 6.5895 (6.3248) grad_norm 3.2751 (3.8026) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:04:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][570/625] eta 0:00:22 lr 0.000034 wd 0.0500 time 0.4084 (0.4052) data time 0.0006 (0.0016) model time 0.4078 (0.4045) loss 5.7910 (6.3235) grad_norm 18.5249 (3.8193) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:04:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][580/625] 
eta 0:00:18 lr 0.000034 wd 0.0500 time 0.3947 (0.4050) data time 0.0006 (0.0015) model time 0.3941 (0.4043) loss 5.4954 (6.3234) grad_norm 2.3865 (3.8071) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:04:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][590/625] eta 0:00:14 lr 0.000034 wd 0.0500 time 0.3955 (0.4049) data time 0.0009 (0.0015) model time 0.3947 (0.4041) loss 7.0674 (6.3257) grad_norm 2.1934 (3.8113) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:04:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][600/625] eta 0:00:10 lr 0.000033 wd 0.0500 time 0.3919 (0.4047) data time 0.0010 (0.0015) model time 0.3909 (0.4040) loss 5.9966 (6.3283) grad_norm 2.8271 (3.7945) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:04:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][610/625] eta 0:00:06 lr 0.000033 wd 0.0500 time 0.3945 (0.4046) data time 0.0004 (0.0015) model time 0.3941 (0.4038) loss 6.1890 (6.3255) grad_norm 2.7357 (3.7867) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:04:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][620/625] eta 0:00:02 lr 0.000033 wd 0.0500 time 0.3953 (0.4044) data time 0.0006 (0.0015) model time 0.3946 (0.4036) loss 6.9607 (6.3295) grad_norm 4.0994 (3.7918) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:04:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 275 training takes 0:04:12 [2024-07-25 13:04:24 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 13:04:25 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 13:04:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.447 (0.447) Loss 0.5439 (0.5439) Acc@1 90.137 (90.137) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 13:04:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.8130 (0.6531) Acc@1 82.715 (87.580) Acc@5 97.119 (98.038) Mem 14939MB [2024-07-25 13:04:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.103) Loss 0.9009 (0.7567) Acc@1 79.053 (84.656) Acc@5 96.094 (97.087) Mem 14939MB [2024-07-25 13:04:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.291 Acc@5 97.055 [2024-07-25 13:04:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-07-25 13:04:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.875 (0.875) Loss 0.5400 (0.5400) Acc@1 90.186 (90.186) Acc@5 98.877 (98.877) Mem 14939MB [2024-07-25 13:04:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.158) Loss 0.8081 (0.6539) Acc@1 83.154 (87.571) Acc@5 97.070 (98.029) Mem 14939MB [2024-07-25 13:04:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.124) Loss 0.9092 (0.7573) Acc@1 78.906 (84.635) Acc@5 95.898 (97.070) Mem 14939MB [2024-07-25 13:04:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.235 Acc@5 97.037 [2024-07-25 13:04:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-07-25 13:04:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.24% [2024-07-25 13:04:31 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 13:04:32 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 13:04:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][0/625] eta 0:07:50 lr 0.000033 wd 0.0500 time 0.7524 (0.7524) data time 0.3692 (0.3692) model time 0.0000 (0.0000) loss 6.6628 (6.6628) grad_norm 3.5816 (3.5816) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:04:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][10/625] eta 0:04:23 lr 0.000033 wd 0.0500 time 0.3935 (0.4291) data time 0.0006 (0.0343) model time 0.0000 (0.0000) loss 4.9581 (6.1646) grad_norm 2.3557 (3.0899) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:04:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][20/625] eta 0:04:09 lr 0.000033 wd 0.0500 time 0.3965 (0.4132) data time 0.0006 (0.0183) model time 0.0000 (0.0000) loss 5.5963 (6.2153) grad_norm 6.4899 (3.2033) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:04:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][30/625] eta 0:04:02 lr 0.000033 wd 0.0500 time 0.3970 (0.4079) data time 0.0008 (0.0127) model time 0.0000 (0.0000) loss 6.6443 (6.2915) grad_norm 2.4448 (3.6500) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:04:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][40/625] eta 0:03:56 lr 0.000033 wd 0.0500 time 0.3925 (0.4048) data time 0.0009 (0.0098) model time 0.0000 (0.0000) loss 5.9236 (6.2827) grad_norm 1.9496 (3.5001) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:04:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][50/625] eta 0:03:51 lr 0.000033 wd 0.0500 time 0.3977 (0.4031) data time 0.0008 (0.0081) model time 0.0000 (0.0000) loss 6.8710 (6.3585) grad_norm 2.5296 (4.7980) loss_scale 64.0000 (64.0000) mem 14939MB 
[2024-07-25 13:04:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][60/625] eta 0:03:47 lr 0.000033 wd 0.0500 time 0.3955 (0.4021) data time 0.0006 (0.0069) model time 0.3949 (0.3961) loss 5.8863 (6.3009) grad_norm 2.3345 (4.5124) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:05:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][70/625] eta 0:03:42 lr 0.000033 wd 0.0500 time 0.3942 (0.4011) data time 0.0008 (0.0060) model time 0.3934 (0.3953) loss 5.8476 (6.3102) grad_norm 2.4061 (4.3765) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:05:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][80/625] eta 0:03:38 lr 0.000033 wd 0.0500 time 0.3975 (0.4006) data time 0.0008 (0.0054) model time 0.3966 (0.3954) loss 6.6491 (6.3500) grad_norm 2.2920 (4.2805) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:05:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][90/625] eta 0:03:34 lr 0.000033 wd 0.0500 time 0.3970 (0.4002) data time 0.0008 (0.0049) model time 0.3962 (0.3956) loss 6.4275 (6.3519) grad_norm 3.0173 (4.1952) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:05:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][100/625] eta 0:03:29 lr 0.000033 wd 0.0500 time 0.3953 (0.3998) data time 0.0006 (0.0045) model time 0.3947 (0.3957) loss 5.6445 (6.3343) grad_norm 3.7249 (4.0952) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:05:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][110/625] eta 0:03:25 lr 0.000033 wd 0.0500 time 0.3966 (0.3997) data time 0.0006 (0.0041) model time 0.3960 (0.3959) loss 6.9296 (6.3281) grad_norm 4.1155 (4.0808) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:05:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][120/625] eta 0:03:21 lr 0.000033 wd 0.0500 time 0.3968 (0.3994) data time 
0.0008 (0.0039) model time 0.3960 (0.3960) loss 6.2703 (6.3424) grad_norm 3.2326 (4.0186) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:05:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][130/625] eta 0:03:17 lr 0.000033 wd 0.0500 time 0.3963 (0.3992) data time 0.0008 (0.0037) model time 0.3955 (0.3959) loss 5.3998 (6.3323) grad_norm 3.3490 (3.9854) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:05:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][140/625] eta 0:03:15 lr 0.000033 wd 0.0500 time 0.5703 (0.4028) data time 0.0007 (0.0035) model time 0.5696 (0.4017) loss 7.0415 (6.3348) grad_norm 4.6238 (3.9520) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:05:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][150/625] eta 0:03:13 lr 0.000033 wd 0.0500 time 0.6130 (0.4075) data time 0.0006 (0.0033) model time 0.6123 (0.4088) loss 6.3169 (6.3560) grad_norm 2.0188 (3.8677) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:05:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][160/625] eta 0:03:10 lr 0.000033 wd 0.0500 time 0.3921 (0.4090) data time 0.0006 (0.0031) model time 0.3915 (0.4110) loss 6.6322 (6.3604) grad_norm 3.0844 (3.8108) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:05:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][170/625] eta 0:03:05 lr 0.000033 wd 0.0500 time 0.3940 (0.4083) data time 0.0007 (0.0030) model time 0.3934 (0.4097) loss 7.2749 (6.3348) grad_norm 3.3169 (3.7954) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:05:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][180/625] eta 0:03:01 lr 0.000033 wd 0.0500 time 0.3964 (0.4089) data time 0.0008 (0.0029) model time 0.3956 (0.4104) loss 5.6065 (6.3453) grad_norm 3.1908 (3.7500) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:05:50 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][190/625] eta 0:02:57 lr 0.000033 wd 0.0500 time 0.3957 (0.4082) data time 0.0006 (0.0028) model time 0.3951 (0.4093) loss 6.0203 (6.3652) grad_norm 2.3596 (3.7759) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:05:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][200/625] eta 0:02:53 lr 0.000033 wd 0.0500 time 0.3937 (0.4076) data time 0.0009 (0.0027) model time 0.3928 (0.4083) loss 6.4671 (6.3613) grad_norm 2.7273 (3.7565) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:05:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][210/625] eta 0:02:48 lr 0.000033 wd 0.0500 time 0.3955 (0.4070) data time 0.0007 (0.0026) model time 0.3948 (0.4075) loss 6.2130 (6.3579) grad_norm 3.0741 (3.7020) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:06:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][220/625] eta 0:02:44 lr 0.000033 wd 0.0500 time 0.4048 (0.4066) data time 0.0009 (0.0025) model time 0.4039 (0.4068) loss 6.8083 (6.3513) grad_norm 3.6320 (3.6949) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:06:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][230/625] eta 0:02:40 lr 0.000033 wd 0.0500 time 0.3946 (0.4061) data time 0.0008 (0.0024) model time 0.3938 (0.4062) loss 5.8806 (6.3504) grad_norm 2.9475 (3.6622) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:06:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][240/625] eta 0:02:36 lr 0.000033 wd 0.0500 time 0.3936 (0.4058) data time 0.0007 (0.0024) model time 0.3929 (0.4057) loss 7.0225 (6.3607) grad_norm 2.4110 (3.6671) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:06:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][250/625] eta 0:02:32 lr 0.000033 wd 0.0500 time 0.3946 (0.4054) data time 0.0008 (0.0023) 
model time 0.3938 (0.4051) loss 5.5336 (6.3609) grad_norm 2.3016 (3.6742) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:06:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][260/625] eta 0:02:27 lr 0.000033 wd 0.0500 time 0.3961 (0.4050) data time 0.0009 (0.0023) model time 0.3952 (0.4046) loss 6.0178 (6.3701) grad_norm 3.7948 (3.6511) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:06:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][270/625] eta 0:02:23 lr 0.000033 wd 0.0500 time 0.3968 (0.4047) data time 0.0008 (0.0022) model time 0.3960 (0.4042) loss 6.0857 (6.3801) grad_norm 2.7803 (3.6608) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:06:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][280/625] eta 0:02:19 lr 0.000033 wd 0.0500 time 0.3960 (0.4044) data time 0.0008 (0.0022) model time 0.3952 (0.4039) loss 7.0224 (6.3779) grad_norm 3.6679 (3.6320) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:06:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][290/625] eta 0:02:15 lr 0.000033 wd 0.0500 time 0.3962 (0.4041) data time 0.0006 (0.0021) model time 0.3956 (0.4035) loss 6.6196 (6.3724) grad_norm 2.5829 (3.6437) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:06:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][300/625] eta 0:02:11 lr 0.000033 wd 0.0500 time 0.3924 (0.4039) data time 0.0008 (0.0021) model time 0.3916 (0.4032) loss 5.9361 (6.3734) grad_norm 2.0610 (3.6314) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:06:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][310/625] eta 0:02:07 lr 0.000033 wd 0.0500 time 0.3963 (0.4036) data time 0.0008 (0.0020) model time 0.3955 (0.4029) loss 8.2701 (6.3773) grad_norm 2.4429 (3.6619) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:06:41 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [276/300][320/625] eta 0:02:03 lr 0.000033 wd 0.0500 time 0.3951 (0.4035) data time 0.0006 (0.0020) model time 0.3946 (0.4027) loss 6.0745 (6.3790) grad_norm 3.2057 (3.7059) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:06:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][330/625] eta 0:01:58 lr 0.000032 wd 0.0500 time 0.3975 (0.4032) data time 0.0006 (0.0020) model time 0.3969 (0.4024) loss 6.4095 (6.3757) grad_norm 2.8133 (3.7872) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:06:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][340/625] eta 0:01:54 lr 0.000032 wd 0.0500 time 0.3932 (0.4030) data time 0.0007 (0.0019) model time 0.3925 (0.4022) loss 5.6958 (6.3870) grad_norm 4.3380 (3.7817) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:06:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][350/625] eta 0:01:50 lr 0.000032 wd 0.0500 time 0.3966 (0.4028) data time 0.0007 (0.0019) model time 0.3958 (0.4020) loss 7.6253 (6.3900) grad_norm 4.9840 (3.7704) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:06:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][360/625] eta 0:01:47 lr 0.000032 wd 0.0500 time 0.5745 (0.4046) data time 0.0006 (0.0019) model time 0.5738 (0.4040) loss 6.7050 (6.3947) grad_norm 3.5389 (3.7568) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:07:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][370/625] eta 0:01:43 lr 0.000032 wd 0.0500 time 0.6008 (0.4061) data time 0.0008 (0.0018) model time 0.6000 (0.4058) loss 5.8937 (6.4041) grad_norm 3.1854 (3.7914) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:07:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][380/625] eta 0:01:39 lr 0.000032 wd 0.0500 time 0.3916 (0.4069) data time 0.0009 (0.0018) model time 0.3907 (0.4066) 
loss 6.7759 (6.4057) grad_norm 4.2328 (3.7815) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:07:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][390/625] eta 0:01:35 lr 0.000032 wd 0.0500 time 0.4193 (0.4066) data time 0.0008 (0.0018) model time 0.4185 (0.4063) loss 7.6166 (6.4019) grad_norm 3.0085 (3.7560) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:07:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][400/625] eta 0:01:31 lr 0.000032 wd 0.0500 time 0.4007 (0.4069) data time 0.0006 (0.0018) model time 0.4001 (0.4066) loss 5.7052 (6.4030) grad_norm 2.6198 (3.8334) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:07:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][410/625] eta 0:01:27 lr 0.000032 wd 0.0500 time 0.3913 (0.4067) data time 0.0008 (0.0017) model time 0.3904 (0.4063) loss 6.2968 (6.3998) grad_norm 2.6149 (3.8238) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:07:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][420/625] eta 0:01:23 lr 0.000032 wd 0.0500 time 0.3948 (0.4065) data time 0.0006 (0.0017) model time 0.3942 (0.4061) loss 5.2109 (6.4016) grad_norm 3.3378 (3.8127) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:07:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][430/625] eta 0:01:19 lr 0.000032 wd 0.0500 time 0.3978 (0.4063) data time 0.0008 (0.0017) model time 0.3970 (0.4059) loss 7.1866 (6.3991) grad_norm 5.4131 (3.7901) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:07:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][440/625] eta 0:01:15 lr 0.000032 wd 0.0500 time 0.3956 (0.4061) data time 0.0006 (0.0017) model time 0.3950 (0.4057) loss 6.6869 (6.4018) grad_norm 2.6537 (3.7939) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:07:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [276/300][450/625] eta 0:01:11 lr 0.000032 wd 0.0500 time 0.3972 (0.4059) data time 0.0008 (0.0017) model time 0.3964 (0.4054) loss 7.0378 (6.3962) grad_norm 3.1579 (3.7756) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:07:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][460/625] eta 0:01:06 lr 0.000032 wd 0.0500 time 0.3957 (0.4057) data time 0.0008 (0.0016) model time 0.3949 (0.4052) loss 6.3257 (6.3958) grad_norm 2.6796 (3.7669) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:07:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][470/625] eta 0:01:02 lr 0.000032 wd 0.0500 time 0.3962 (0.4055) data time 0.0007 (0.0016) model time 0.3955 (0.4050) loss 6.7646 (6.3959) grad_norm 2.3671 (3.7458) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:07:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][480/625] eta 0:00:58 lr 0.000032 wd 0.0500 time 0.3971 (0.4054) data time 0.0008 (0.0016) model time 0.3963 (0.4048) loss 6.5836 (6.3934) grad_norm 2.1837 (3.7266) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:07:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][490/625] eta 0:00:54 lr 0.000032 wd 0.0500 time 0.3968 (0.4052) data time 0.0008 (0.0016) model time 0.3960 (0.4046) loss 6.8247 (6.3936) grad_norm 2.5165 (3.7057) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:07:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][500/625] eta 0:00:50 lr 0.000032 wd 0.0500 time 0.4015 (0.4051) data time 0.0006 (0.0016) model time 0.4009 (0.4045) loss 5.3390 (6.3875) grad_norm 3.5489 (3.7122) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:07:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][510/625] eta 0:00:46 lr 0.000032 wd 0.0500 time 0.3954 (0.4050) data time 0.0008 (0.0016) model time 0.3946 (0.4044) loss 7.0799 (6.3903) grad_norm 
3.0406 (3.6919) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:08:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][520/625] eta 0:00:42 lr 0.000032 wd 0.0500 time 0.3936 (0.4049) data time 0.0008 (0.0015) model time 0.3928 (0.4043) loss 6.1442 (6.3930) grad_norm 2.7409 (3.6813) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:08:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][530/625] eta 0:00:38 lr 0.000032 wd 0.0500 time 0.3945 (0.4047) data time 0.0006 (0.0015) model time 0.3939 (0.4041) loss 6.2450 (6.3908) grad_norm 2.5422 (3.6791) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:08:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][540/625] eta 0:00:34 lr 0.000032 wd 0.0500 time 0.3975 (0.4046) data time 0.0007 (0.0015) model time 0.3968 (0.4039) loss 6.8822 (6.3805) grad_norm 2.3550 (3.6792) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:08:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][550/625] eta 0:00:30 lr 0.000032 wd 0.0500 time 0.3997 (0.4045) data time 0.0006 (0.0015) model time 0.3992 (0.4038) loss 6.7535 (6.3752) grad_norm 3.9622 (3.6704) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:08:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][560/625] eta 0:00:26 lr 0.000032 wd 0.0500 time 0.3958 (0.4044) data time 0.0006 (0.0015) model time 0.3952 (0.4037) loss 5.5517 (6.3780) grad_norm 2.3351 (3.6551) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:08:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][570/625] eta 0:00:22 lr 0.000032 wd 0.0500 time 0.4023 (0.4042) data time 0.0008 (0.0015) model time 0.4015 (0.4035) loss 5.8711 (6.3802) grad_norm 2.7498 (3.7043) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:08:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][580/625] eta 
0:00:18 lr 0.000032 wd 0.0500 time 0.4125 (0.4050) data time 0.0006 (0.0015) model time 0.4119 (0.4044) loss 5.4721 (6.3777) grad_norm 2.8983 (3.6920) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:08:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][590/625] eta 0:00:14 lr 0.000032 wd 0.0500 time 0.3950 (0.4057) data time 0.0007 (0.0015) model time 0.3944 (0.4051) loss 5.2906 (6.3748) grad_norm 3.4615 (3.6845) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:08:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][600/625] eta 0:00:10 lr 0.000032 wd 0.0500 time 0.3971 (0.4064) data time 0.0006 (0.0014) model time 0.3965 (0.4059) loss 6.2567 (6.3755) grad_norm 2.4456 (3.7104) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:08:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][610/625] eta 0:00:06 lr 0.000032 wd 0.0500 time 0.3964 (0.4062) data time 0.0006 (0.0014) model time 0.3959 (0.4057) loss 6.5961 (6.3753) grad_norm 2.6826 (3.7205) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:08:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][620/625] eta 0:00:02 lr 0.000032 wd 0.0500 time 0.3952 (0.4063) data time 0.0004 (0.0014) model time 0.3948 (0.4058) loss 6.0505 (6.3753) grad_norm 2.7755 (3.7043) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:08:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 276 training takes 0:04:13 [2024-07-25 13:08:45 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 13:08:46 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 13:08:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.462 (0.462) Loss 0.5449 (0.5449) Acc@1 90.381 (90.381) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 13:08:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.121) Loss 0.8057 (0.6560) Acc@1 82.715 (87.620) Acc@5 97.412 (98.065) Mem 14939MB [2024-07-25 13:08:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 0.9087 (0.7590) Acc@1 79.639 (84.775) Acc@5 96.094 (97.098) Mem 14939MB [2024-07-25 13:08:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.413 Acc@5 97.055 [2024-07-25 13:08:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 13:08:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 84.41% [2024-07-25 13:08:49 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 13:08:50 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 13:08:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.464 (0.464) Loss 0.5405 (0.5405) Acc@1 90.283 (90.283) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 13:08:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.8071 (0.6538) Acc@1 83.154 (87.607) Acc@5 97.070 (98.025) Mem 14939MB [2024-07-25 13:08:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9092 (0.7572) Acc@1 78.955 (84.663) Acc@5 95.996 (97.070) Mem 14939MB [2024-07-25 13:08:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.259 Acc@5 97.035 [2024-07-25 13:08:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-07-25 13:08:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.26% [2024-07-25 13:08:52 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 13:08:53 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 13:08:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][0/625] eta 0:08:12 lr 0.000032 wd 0.0500 time 0.7887 (0.7887) data time 0.4159 (0.4159) model time 0.0000 (0.0000) loss 5.5486 (5.5486) grad_norm 2.6974 (2.6974) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:08:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][10/625] eta 0:04:25 lr 0.000032 wd 0.0500 time 0.3941 (0.4315) data time 0.0006 (0.0385) model time 0.0000 (0.0000) loss 6.6250 (6.3176) grad_norm 1.7266 (3.0246) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:09:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][20/625] eta 0:04:11 lr 0.000032 wd 0.0500 time 0.3934 (0.4151) data time 0.0008 (0.0206) model time 0.0000 (0.0000) loss 6.8433 (6.2344) grad_norm 3.3974 (3.1509) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:09:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][30/625] eta 0:04:03 lr 0.000032 wd 0.0500 time 0.3929 (0.4087) data time 0.0008 (0.0142) model time 0.0000 (0.0000) loss 6.2007 (6.2709) grad_norm 2.6804 (3.2812) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:09:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][40/625] eta 0:03:57 lr 0.000032 wd 0.0500 time 0.3957 (0.4060) data time 0.0008 (0.0110) model time 0.0000 (0.0000) loss 5.3767 (6.3189) grad_norm 7.3685 (3.3828) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:09:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][50/625] eta 0:03:52 lr 0.000032 wd 0.0500 time 0.3953 (0.4040) data time 0.0008 (0.0090) model time 0.0000 (0.0000) loss 7.1560 (6.3107) grad_norm 3.2485 (3.3414) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:09:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][60/625] eta 0:03:47 lr 0.000032 wd 0.0500 time 0.4038 (0.4029) data time 
0.0008 (0.0076) model time 0.4031 (0.3964) loss 7.3058 (6.3377) grad_norm 2.5143 (3.2568) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:09:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][70/625] eta 0:03:43 lr 0.000031 wd 0.0500 time 0.3980 (0.4019) data time 0.0006 (0.0067) model time 0.3974 (0.3959) loss 7.0740 (6.3732) grad_norm 4.5959 (3.3367) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:09:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][80/625] eta 0:03:38 lr 0.000031 wd 0.0500 time 0.3949 (0.4012) data time 0.0006 (0.0060) model time 0.3942 (0.3956) loss 5.2964 (6.3696) grad_norm 2.3753 (3.5882) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:09:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][90/625] eta 0:03:34 lr 0.000031 wd 0.0500 time 0.3937 (0.4007) data time 0.0006 (0.0054) model time 0.3931 (0.3956) loss 6.4369 (6.3609) grad_norm 3.5582 (3.7048) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:09:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][100/625] eta 0:03:30 lr 0.000031 wd 0.0500 time 0.3953 (0.4002) data time 0.0007 (0.0049) model time 0.3946 (0.3956) loss 5.9526 (6.3520) grad_norm 2.0850 (3.6323) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:09:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][110/625] eta 0:03:25 lr 0.000031 wd 0.0500 time 0.3953 (0.3998) data time 0.0008 (0.0046) model time 0.3944 (0.3955) loss 5.9224 (6.3408) grad_norm 2.9501 (3.7651) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:09:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][120/625] eta 0:03:22 lr 0.000031 wd 0.0500 time 0.3957 (0.4012) data time 0.0009 (0.0042) model time 0.3948 (0.3984) loss 6.6283 (6.3414) grad_norm 3.9487 (3.7585) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:09:45 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][130/625] eta 0:03:18 lr 0.000031 wd 0.0500 time 0.3965 (0.4009) data time 0.0006 (0.0040) model time 0.3959 (0.3981) loss 5.8310 (6.3114) grad_norm 2.4207 (3.6982) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:09:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][140/625] eta 0:03:14 lr 0.000031 wd 0.0500 time 0.3957 (0.4005) data time 0.0009 (0.0038) model time 0.3948 (0.3977) loss 6.9141 (6.3300) grad_norm 3.0215 (3.6561) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:09:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][150/625] eta 0:03:10 lr 0.000031 wd 0.0500 time 0.3948 (0.4002) data time 0.0008 (0.0036) model time 0.3939 (0.3975) loss 5.3542 (6.3292) grad_norm 1.8317 (3.6235) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:09:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][160/625] eta 0:03:06 lr 0.000031 wd 0.0500 time 0.3952 (0.4000) data time 0.0008 (0.0034) model time 0.3945 (0.3974) loss 6.3455 (6.3462) grad_norm 3.7979 (3.6236) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:10:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][170/625] eta 0:03:01 lr 0.000031 wd 0.0500 time 0.3935 (0.3999) data time 0.0009 (0.0032) model time 0.3926 (0.3973) loss 5.8263 (6.3343) grad_norm 2.6635 (3.5941) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:10:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][180/625] eta 0:03:00 lr 0.000031 wd 0.0500 time 0.3961 (0.4047) data time 0.0006 (0.0031) model time 0.3955 (0.4042) loss 6.7975 (6.3333) grad_norm 3.2834 (3.6130) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:10:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][190/625] eta 0:02:57 lr 0.000031 wd 0.0500 time 0.3965 (0.4086) data time 0.0006 (0.0030) 
model time 0.3959 (0.4094) loss 6.5972 (6.3496) grad_norm 2.8116 (4.3014) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:10:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][200/625] eta 0:02:53 lr 0.000031 wd 0.0500 time 0.3983 (0.4089) data time 0.0006 (0.0029) model time 0.3977 (0.4098) loss 6.5650 (6.3436) grad_norm 2.6387 (4.2498) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:10:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][210/625] eta 0:02:49 lr 0.000031 wd 0.0500 time 0.3991 (0.4084) data time 0.0008 (0.0028) model time 0.3983 (0.4090) loss 5.8727 (6.3442) grad_norm 2.5227 (4.1861) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:10:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][220/625] eta 0:02:45 lr 0.000031 wd 0.0500 time 0.3954 (0.4079) data time 0.0009 (0.0027) model time 0.3946 (0.4083) loss 6.5798 (6.3391) grad_norm 21.7693 (4.2261) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:10:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][230/625] eta 0:02:41 lr 0.000031 wd 0.0500 time 0.3903 (0.4077) data time 0.0008 (0.0026) model time 0.3895 (0.4079) loss 7.3078 (6.3368) grad_norm 2.4026 (4.2006) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:10:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][240/625] eta 0:02:36 lr 0.000031 wd 0.0500 time 0.3987 (0.4072) data time 0.0008 (0.0025) model time 0.3979 (0.4072) loss 6.5934 (6.3248) grad_norm 3.4041 (4.1542) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:10:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][250/625] eta 0:02:32 lr 0.000031 wd 0.0500 time 0.4001 (0.4068) data time 0.0006 (0.0025) model time 0.3995 (0.4067) loss 5.4453 (6.3234) grad_norm 3.3449 (4.1479) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:10:39 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [277/300][260/625] eta 0:02:28 lr 0.000031 wd 0.0500 time 0.3934 (0.4064) data time 0.0008 (0.0024) model time 0.3926 (0.4062) loss 7.1073 (6.3334) grad_norm 2.4604 (4.1151) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:10:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][270/625] eta 0:02:24 lr 0.000031 wd 0.0500 time 0.3955 (0.4060) data time 0.0006 (0.0024) model time 0.3949 (0.4057) loss 5.9275 (6.3367) grad_norm 2.4525 (4.0846) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:10:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][280/625] eta 0:02:19 lr 0.000031 wd 0.0500 time 0.3960 (0.4057) data time 0.0008 (0.0023) model time 0.3952 (0.4052) loss 5.9674 (6.3350) grad_norm 3.1867 (4.0446) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:10:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][290/625] eta 0:02:15 lr 0.000031 wd 0.0500 time 0.3959 (0.4054) data time 0.0007 (0.0023) model time 0.3953 (0.4049) loss 7.8862 (6.3479) grad_norm 2.4327 (4.0984) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:10:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][300/625] eta 0:02:11 lr 0.000031 wd 0.0500 time 0.3918 (0.4052) data time 0.0009 (0.0022) model time 0.3909 (0.4046) loss 5.9385 (6.3409) grad_norm 5.9559 (4.0778) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:10:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][310/625] eta 0:02:07 lr 0.000031 wd 0.0500 time 0.3988 (0.4050) data time 0.0008 (0.0022) model time 0.3980 (0.4043) loss 6.9262 (6.3464) grad_norm 2.9602 (4.0442) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:11:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][320/625] eta 0:02:03 lr 0.000031 wd 0.0500 time 0.3949 (0.4047) data time 0.0007 (0.0021) model time 0.3942 (0.4040) 
loss 5.3123 (6.3402) grad_norm 5.3709 (4.0208) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:11:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][330/625] eta 0:01:59 lr 0.000031 wd 0.0500 time 0.3948 (0.4048) data time 0.0008 (0.0021) model time 0.3939 (0.4041) loss 7.2011 (6.3465) grad_norm 3.9417 (3.9940) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:11:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][340/625] eta 0:01:55 lr 0.000031 wd 0.0500 time 0.3965 (0.4049) data time 0.0007 (0.0021) model time 0.3959 (0.4042) loss 5.3846 (6.3441) grad_norm 3.3886 (3.9568) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:11:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][350/625] eta 0:01:51 lr 0.000031 wd 0.0500 time 0.3947 (0.4047) data time 0.0008 (0.0021) model time 0.3939 (0.4040) loss 5.9732 (6.3462) grad_norm 5.5175 (3.9386) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:11:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][360/625] eta 0:01:47 lr 0.000031 wd 0.0500 time 0.3961 (0.4046) data time 0.0008 (0.0020) model time 0.3953 (0.4037) loss 6.7440 (6.3550) grad_norm 3.4545 (4.0130) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:11:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][370/625] eta 0:01:43 lr 0.000031 wd 0.0500 time 0.3973 (0.4045) data time 0.0009 (0.0020) model time 0.3964 (0.4037) loss 7.3013 (6.3517) grad_norm 3.3411 (3.9957) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:11:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][380/625] eta 0:01:39 lr 0.000031 wd 0.0500 time 0.3947 (0.4043) data time 0.0007 (0.0020) model time 0.3940 (0.4035) loss 5.2158 (6.3437) grad_norm 4.6034 (3.9848) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:11:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [277/300][390/625] eta 0:01:35 lr 0.000031 wd 0.0500 time 0.5413 (0.4047) data time 0.0006 (0.0019) model time 0.5407 (0.4039) loss 6.3155 (6.3420) grad_norm 4.0544 (3.9785) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:11:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][400/625] eta 0:01:31 lr 0.000031 wd 0.0500 time 0.5934 (0.4074) data time 0.0006 (0.0020) model time 0.5927 (0.4069) loss 6.2859 (6.3465) grad_norm 3.7246 (3.9592) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:11:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][410/625] eta 0:01:27 lr 0.000031 wd 0.0500 time 0.3951 (0.4087) data time 0.0007 (0.0020) model time 0.3944 (0.4085) loss 5.8531 (6.3439) grad_norm 2.2012 (3.9536) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:11:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][420/625] eta 0:01:23 lr 0.000031 wd 0.0500 time 0.3959 (0.4092) data time 0.0005 (0.0019) model time 0.3954 (0.4090) loss 6.1549 (6.3408) grad_norm 2.6358 (3.9464) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:11:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][430/625] eta 0:01:19 lr 0.000031 wd 0.0500 time 0.3953 (0.4089) data time 0.0008 (0.0019) model time 0.3945 (0.4087) loss 6.9742 (6.3409) grad_norm 2.9556 (3.9225) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:11:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][440/625] eta 0:01:15 lr 0.000030 wd 0.0500 time 0.3967 (0.4087) data time 0.0006 (0.0019) model time 0.3962 (0.4083) loss 7.4320 (6.3412) grad_norm 2.2699 (3.9250) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:11:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][450/625] eta 0:01:11 lr 0.000030 wd 0.0500 time 0.3946 (0.4084) data time 0.0008 (0.0019) model time 0.3938 (0.4080) loss 6.7505 (6.3467) grad_norm 
3.0724 (3.9084) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:12:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][460/625] eta 0:01:07 lr 0.000030 wd 0.0500 time 0.3945 (0.4082) data time 0.0009 (0.0018) model time 0.3936 (0.4077) loss 6.3792 (6.3503) grad_norm 2.8013 (3.9410) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:12:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][470/625] eta 0:01:03 lr 0.000030 wd 0.0500 time 0.4020 (0.4079) data time 0.0009 (0.0018) model time 0.4011 (0.4075) loss 6.9045 (6.3511) grad_norm 2.4216 (3.9193) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:12:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][480/625] eta 0:00:59 lr 0.000030 wd 0.0500 time 0.3933 (0.4077) data time 0.0005 (0.0018) model time 0.3927 (0.4072) loss 6.1763 (6.3407) grad_norm 4.2368 (3.9445) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:12:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][490/625] eta 0:00:55 lr 0.000030 wd 0.0500 time 0.3953 (0.4075) data time 0.0007 (0.0018) model time 0.3946 (0.4069) loss 5.5350 (6.3427) grad_norm 2.3495 (3.9280) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:12:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][500/625] eta 0:00:50 lr 0.000030 wd 0.0500 time 0.3964 (0.4072) data time 0.0008 (0.0018) model time 0.3956 (0.4067) loss 6.9758 (6.3473) grad_norm 3.4311 (3.9089) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:12:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][510/625] eta 0:00:46 lr 0.000030 wd 0.0500 time 0.4016 (0.4070) data time 0.0009 (0.0017) model time 0.4007 (0.4065) loss 5.4846 (6.3450) grad_norm 3.0543 (3.9073) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:12:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][520/625] eta 
0:00:42 lr 0.000030 wd 0.0500 time 0.3982 (0.4069) data time 0.0009 (0.0017) model time 0.3974 (0.4062) loss 7.7287 (6.3501) grad_norm 2.8268 (3.8919) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:12:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][530/625] eta 0:00:38 lr 0.000030 wd 0.0500 time 0.3976 (0.4067) data time 0.0008 (0.0017) model time 0.3967 (0.4061) loss 6.6609 (6.3494) grad_norm 2.9300 (3.9050) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:12:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][540/625] eta 0:00:34 lr 0.000030 wd 0.0500 time 0.3975 (0.4066) data time 0.0007 (0.0017) model time 0.3968 (0.4059) loss 6.7481 (6.3478) grad_norm 15.1459 (3.9054) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:12:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][550/625] eta 0:00:30 lr 0.000030 wd 0.0500 time 0.3969 (0.4064) data time 0.0008 (0.0017) model time 0.3961 (0.4057) loss 6.1390 (6.3459) grad_norm 3.7835 (3.9013) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:12:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][560/625] eta 0:00:26 lr 0.000030 wd 0.0500 time 0.3963 (0.4065) data time 0.0007 (0.0017) model time 0.3956 (0.4058) loss 6.4616 (6.3438) grad_norm 2.0917 (3.8790) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:12:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][570/625] eta 0:00:22 lr 0.000030 wd 0.0500 time 0.3961 (0.4064) data time 0.0006 (0.0016) model time 0.3955 (0.4057) loss 6.2118 (6.3381) grad_norm 3.3428 (3.8699) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:12:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][580/625] eta 0:00:18 lr 0.000030 wd 0.0500 time 0.3937 (0.4062) data time 0.0007 (0.0016) model time 0.3930 (0.4055) loss 6.2612 (6.3345) grad_norm 2.6065 (3.8926) loss_scale 
64.0000 (64.0000) mem 14939MB [2024-07-25 13:12:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][590/625] eta 0:00:14 lr 0.000030 wd 0.0500 time 0.3963 (0.4060) data time 0.0006 (0.0016) model time 0.3957 (0.4053) loss 6.5866 (6.3361) grad_norm 3.3688 (3.8785) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:12:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][600/625] eta 0:00:10 lr 0.000030 wd 0.0500 time 0.3948 (0.4059) data time 0.0007 (0.0016) model time 0.3940 (0.4052) loss 4.8240 (6.3331) grad_norm 2.6129 (3.8884) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:13:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][610/625] eta 0:00:06 lr 0.000030 wd 0.0500 time 0.5703 (0.4061) data time 0.0004 (0.0016) model time 0.5699 (0.4053) loss 7.0825 (6.3345) grad_norm 2.4520 (3.8774) loss_scale 128.0000 (64.3142) mem 14939MB [2024-07-25 13:13:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][620/625] eta 0:00:02 lr 0.000030 wd 0.0500 time 0.3949 (0.4066) data time 0.0004 (0.0016) model time 0.3944 (0.4059) loss 7.3494 (6.3351) grad_norm 3.5126 (3.9126) loss_scale 128.0000 (65.3398) mem 14939MB [2024-07-25 13:13:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 277 training takes 0:04:14 [2024-07-25 13:13:08 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 13:13:08 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 13:13:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.474 (0.474) Loss 0.5454 (0.5454) Acc@1 90.332 (90.332) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 13:13:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.8066 (0.6547) Acc@1 82.715 (87.646) Acc@5 97.217 (98.069) Mem 14939MB [2024-07-25 13:13:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 0.9072 (0.7583) Acc@1 79.492 (84.703) Acc@5 96.143 (97.152) Mem 14939MB [2024-07-25 13:13:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.335 Acc@5 97.119 [2024-07-25 13:13:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-07-25 13:13:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.800 (0.800) Loss 0.5405 (0.5405) Acc@1 90.234 (90.234) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 13:13:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.156) Loss 0.8076 (0.6538) Acc@1 83.057 (87.598) Acc@5 97.070 (98.016) Mem 14939MB [2024-07-25 13:13:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 0.9092 (0.7571) Acc@1 78.906 (84.659) Acc@5 95.996 (97.077) Mem 14939MB [2024-07-25 13:13:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.257 Acc@5 97.043 [2024-07-25 13:13:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-07-25 13:13:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][0/625] eta 0:14:04 lr 0.000030 wd 0.0500 time 1.3512 (1.3512) data time 0.6069 (0.6069) model time 0.0000 (0.0000) loss 5.8625 (5.8625) grad_norm 3.1164 (3.1164) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:13:20 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][10/625] eta 0:05:42 lr 0.000030 wd 0.0500 time 0.3962 (0.5569) data time 0.0009 (0.0559) model time 0.0000 (0.0000) loss 6.6921 (6.4325) grad_norm 3.4806 (3.5982) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:13:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][20/625] eta 0:04:50 lr 0.000030 wd 0.0500 time 0.3952 (0.4803) data time 0.0008 (0.0297) model time 0.0000 (0.0000) loss 6.4974 (6.3776) grad_norm 2.1643 (3.4381) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:13:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][30/625] eta 0:04:29 lr 0.000030 wd 0.0500 time 0.3957 (0.4530) data time 0.0008 (0.0204) model time 0.0000 (0.0000) loss 4.7692 (6.3602) grad_norm 2.5633 (3.4649) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:13:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][40/625] eta 0:04:16 lr 0.000030 wd 0.0500 time 0.3968 (0.4390) data time 0.0007 (0.0156) model time 0.0000 (0.0000) loss 5.6443 (6.3731) grad_norm 2.2831 (3.3595) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:13:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][50/625] eta 0:04:07 lr 0.000030 wd 0.0500 time 0.4005 (0.4307) data time 0.0007 (0.0127) model time 0.0000 (0.0000) loss 5.9460 (6.3938) grad_norm 2.2602 (3.5363) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:13:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][60/625] eta 0:04:00 lr 0.000030 wd 0.0500 time 0.4015 (0.4255) data time 0.0008 (0.0107) model time 0.4006 (0.3979) loss 7.0524 (6.4494) grad_norm 2.5191 (3.4770) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:13:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][70/625] eta 0:03:53 lr 0.000030 wd 0.0500 time 0.3936 (0.4213) data time 0.0009 
(0.0094) model time 0.3927 (0.3965) loss 5.1991 (6.4370) grad_norm 3.5605 (4.6744) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:13:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][80/625] eta 0:03:48 lr 0.000030 wd 0.0500 time 0.3941 (0.4185) data time 0.0009 (0.0083) model time 0.3931 (0.3968) loss 7.1619 (6.4067) grad_norm 2.8666 (4.4439) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:13:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][90/625] eta 0:03:42 lr 0.000030 wd 0.0500 time 0.3964 (0.4162) data time 0.0009 (0.0075) model time 0.3955 (0.3968) loss 5.3365 (6.3721) grad_norm 3.0061 (4.2878) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:13:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][100/625] eta 0:03:37 lr 0.000030 wd 0.0500 time 0.3977 (0.4142) data time 0.0006 (0.0069) model time 0.3970 (0.3964) loss 6.0938 (6.3557) grad_norm 2.3214 (4.2909) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:14:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][110/625] eta 0:03:32 lr 0.000030 wd 0.0500 time 0.4011 (0.4126) data time 0.0009 (0.0063) model time 0.4003 (0.3963) loss 7.4966 (6.3758) grad_norm 2.2632 (4.2284) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:14:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][120/625] eta 0:03:28 lr 0.000030 wd 0.0500 time 0.3966 (0.4131) data time 0.0009 (0.0059) model time 0.3957 (0.3994) loss 6.4770 (6.3680) grad_norm 21.1490 (4.2720) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:14:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][130/625] eta 0:03:23 lr 0.000030 wd 0.0500 time 0.3959 (0.4118) data time 0.0008 (0.0055) model time 0.3950 (0.3989) loss 7.0642 (6.3841) grad_norm 2.5233 (4.1849) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:14:12 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][140/625] eta 0:03:19 lr 0.000030 wd 0.0500 time 0.3979 (0.4108) data time 0.0009 (0.0052) model time 0.3970 (0.3987) loss 5.1324 (6.3614) grad_norm 2.7738 (4.9059) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:14:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][150/625] eta 0:03:14 lr 0.000030 wd 0.0500 time 0.3950 (0.4099) data time 0.0006 (0.0049) model time 0.3944 (0.3983) loss 5.4306 (6.3557) grad_norm 3.7138 (5.0898) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:14:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][160/625] eta 0:03:10 lr 0.000030 wd 0.0500 time 0.3982 (0.4093) data time 0.0008 (0.0046) model time 0.3974 (0.3985) loss 7.0421 (6.3483) grad_norm 5.2523 (4.9779) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:14:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][170/625] eta 0:03:05 lr 0.000030 wd 0.0500 time 0.3985 (0.4086) data time 0.0007 (0.0044) model time 0.3979 (0.3983) loss 5.2578 (6.3442) grad_norm 2.8968 (4.8667) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:14:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][180/625] eta 0:03:01 lr 0.000030 wd 0.0500 time 0.3976 (0.4081) data time 0.0008 (0.0042) model time 0.3967 (0.3983) loss 5.8062 (6.3361) grad_norm 4.0272 (4.7951) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:14:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][190/625] eta 0:02:57 lr 0.000030 wd 0.0500 time 0.3960 (0.4077) data time 0.0007 (0.0040) model time 0.3952 (0.3985) loss 5.7803 (6.3309) grad_norm 3.6025 (4.7513) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:14:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][200/625] eta 0:02:53 lr 0.000029 wd 0.0500 time 0.3976 (0.4072) data time 
0.0007 (0.0039) model time 0.3970 (0.3984) loss 7.7501 (6.3327) grad_norm 2.3972 (4.6733) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:14:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][210/625] eta 0:02:49 lr 0.000029 wd 0.0500 time 0.3920 (0.4092) data time 0.0009 (0.0038) model time 0.3912 (0.4014) loss 7.6099 (6.3267) grad_norm 2.4962 (4.6183) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:14:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][220/625] eta 0:02:47 lr 0.000029 wd 0.0500 time 0.3994 (0.4140) data time 0.0009 (0.0036) model time 0.3986 (0.4081) loss 5.5842 (6.3195) grad_norm 2.3841 (4.5349) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:14:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][230/625] eta 0:02:44 lr 0.000029 wd 0.0500 time 0.3968 (0.4160) data time 0.0006 (0.0035) model time 0.3962 (0.4109) loss 5.7741 (6.3135) grad_norm 1.8325 (4.4653) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:14:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][240/625] eta 0:02:40 lr 0.000029 wd 0.0500 time 0.4061 (0.4158) data time 0.0008 (0.0034) model time 0.4052 (0.4109) loss 6.1005 (6.3178) grad_norm 2.3072 (4.4095) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:14:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][250/625] eta 0:02:35 lr 0.000029 wd 0.0500 time 0.4012 (0.4151) data time 0.0007 (0.0033) model time 0.4006 (0.4102) loss 6.2781 (6.3274) grad_norm 2.6776 (4.3459) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:15:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][260/625] eta 0:02:31 lr 0.000029 wd 0.0500 time 0.3979 (0.4145) data time 0.0006 (0.0032) model time 0.3973 (0.4097) loss 5.6369 (6.3280) grad_norm 3.4718 (4.2950) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:15:06 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][270/625] eta 0:02:26 lr 0.000029 wd 0.0500 time 0.3983 (0.4139) data time 0.0007 (0.0031) model time 0.3976 (0.4091) loss 5.8981 (6.3266) grad_norm 2.3072 (4.2796) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:15:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][280/625] eta 0:02:22 lr 0.000029 wd 0.0500 time 0.4014 (0.4133) data time 0.0009 (0.0030) model time 0.4005 (0.4086) loss 6.3031 (6.3241) grad_norm 3.4954 (4.2439) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:15:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][290/625] eta 0:02:18 lr 0.000029 wd 0.0500 time 0.4188 (0.4129) data time 0.0009 (0.0030) model time 0.4179 (0.4082) loss 7.6214 (6.3394) grad_norm 2.9123 (4.2015) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:15:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][300/625] eta 0:02:14 lr 0.000029 wd 0.0500 time 0.3952 (0.4125) data time 0.0007 (0.0029) model time 0.3946 (0.4079) loss 7.2636 (6.3493) grad_norm 2.9821 (4.1587) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:15:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][310/625] eta 0:02:09 lr 0.000029 wd 0.0500 time 0.3992 (0.4121) data time 0.0010 (0.0028) model time 0.3983 (0.4076) loss 7.2531 (6.3519) grad_norm 2.2915 (4.1176) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:15:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][320/625] eta 0:02:05 lr 0.000029 wd 0.0500 time 0.3960 (0.4117) data time 0.0007 (0.0028) model time 0.3954 (0.4072) loss 4.9913 (6.3468) grad_norm 2.9031 (4.0784) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:15:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][330/625] eta 0:02:01 lr 0.000029 wd 0.0500 time 0.3972 (0.4112) data time 
0.0008 (0.0027) model time 0.3963 (0.4068) loss 6.3441 (6.3386) grad_norm 12.0031 (4.1605) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:15:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][340/625] eta 0:01:57 lr 0.000029 wd 0.0500 time 0.5845 (0.4113) data time 0.0008 (0.0026) model time 0.5837 (0.4071) loss 7.2911 (6.3500) grad_norm 3.5965 (4.1572) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:15:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][350/625] eta 0:01:53 lr 0.000029 wd 0.0500 time 0.3929 (0.4109) data time 0.0008 (0.0026) model time 0.3921 (0.4067) loss 5.3300 (6.3439) grad_norm 3.1188 (4.1319) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:15:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][360/625] eta 0:01:48 lr 0.000029 wd 0.0500 time 0.4007 (0.4106) data time 0.0006 (0.0025) model time 0.4001 (0.4064) loss 5.2146 (6.3482) grad_norm 2.6578 (4.0937) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:15:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][370/625] eta 0:01:44 lr 0.000029 wd 0.0500 time 0.3977 (0.4103) data time 0.0009 (0.0025) model time 0.3969 (0.4061) loss 6.9495 (6.3557) grad_norm 3.4932 (4.1991) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:15:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][380/625] eta 0:01:40 lr 0.000029 wd 0.0500 time 0.3951 (0.4099) data time 0.0008 (0.0025) model time 0.3943 (0.4058) loss 6.9933 (6.3557) grad_norm 3.4065 (4.1867) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:15:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][390/625] eta 0:01:36 lr 0.000029 wd 0.0500 time 0.3969 (0.4096) data time 0.0008 (0.0024) model time 0.3962 (0.4056) loss 6.9947 (6.3527) grad_norm 2.5093 (4.1899) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:15:58 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][400/625] eta 0:01:32 lr 0.000029 wd 0.0500 time 0.3980 (0.4093) data time 0.0009 (0.0024) model time 0.3971 (0.4053) loss 6.7963 (6.3556) grad_norm 3.7445 (4.2224) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:16:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][410/625] eta 0:01:27 lr 0.000029 wd 0.0500 time 0.3971 (0.4090) data time 0.0009 (0.0023) model time 0.3963 (0.4050) loss 6.0615 (6.3569) grad_norm 2.4813 (4.2002) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:16:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][420/625] eta 0:01:23 lr 0.000029 wd 0.0500 time 0.3968 (0.4087) data time 0.0008 (0.0023) model time 0.3960 (0.4048) loss 6.0836 (6.3530) grad_norm 2.8412 (4.2052) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:16:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][430/625] eta 0:01:19 lr 0.000029 wd 0.0500 time 0.3932 (0.4097) data time 0.0007 (0.0023) model time 0.3925 (0.4060) loss 6.1625 (6.3514) grad_norm 2.4153 (4.1867) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:16:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][440/625] eta 0:01:16 lr 0.000029 wd 0.0500 time 0.3935 (0.4117) data time 0.0008 (0.0022) model time 0.3927 (0.4083) loss 7.1790 (6.3575) grad_norm 2.6457 (4.1819) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:16:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][450/625] eta 0:01:12 lr 0.000029 wd 0.0500 time 0.3958 (0.4127) data time 0.0007 (0.0022) model time 0.3951 (0.4095) loss 6.6597 (6.3574) grad_norm 2.6241 (4.1662) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:16:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][460/625] eta 0:01:08 lr 0.000029 wd 0.0500 time 0.3972 (0.4128) data time 
0.0010 (0.0022) model time 0.3961 (0.4097) loss 6.6240 (6.3526) grad_norm 3.0419 (4.2145) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:16:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][470/625] eta 0:01:03 lr 0.000029 wd 0.0500 time 0.4072 (0.4125) data time 0.0008 (0.0022) model time 0.4064 (0.4094) loss 6.3865 (6.3532) grad_norm 2.5629 (4.2261) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:16:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][480/625] eta 0:00:59 lr 0.000029 wd 0.0500 time 0.3963 (0.4122) data time 0.0007 (0.0021) model time 0.3956 (0.4091) loss 6.2611 (6.3529) grad_norm 4.1438 (4.2794) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:16:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][490/625] eta 0:00:55 lr 0.000029 wd 0.0500 time 0.3945 (0.4119) data time 0.0008 (0.0021) model time 0.3936 (0.4089) loss 7.3896 (6.3519) grad_norm 2.0303 (4.2511) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:16:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][500/625] eta 0:00:51 lr 0.000029 wd 0.0500 time 0.3999 (0.4130) data time 0.0007 (0.0021) model time 0.3992 (0.4101) loss 5.9126 (6.3519) grad_norm 5.2876 (4.2302) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:16:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][510/625] eta 0:00:47 lr 0.000029 wd 0.0500 time 0.4007 (0.4127) data time 0.0008 (0.0021) model time 0.3999 (0.4098) loss 7.3361 (6.3598) grad_norm 3.9097 (4.2272) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:16:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][520/625] eta 0:00:43 lr 0.000029 wd 0.0500 time 0.4010 (0.4124) data time 0.0007 (0.0020) model time 0.4004 (0.4095) loss 5.9329 (6.3642) grad_norm 2.4276 (4.2102) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:16:53 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][530/625] eta 0:00:39 lr 0.000029 wd 0.0500 time 0.3985 (0.4121) data time 0.0006 (0.0020) model time 0.3979 (0.4092) loss 6.8259 (6.3638) grad_norm 24.5600 (4.2353) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:16:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][540/625] eta 0:00:35 lr 0.000029 wd 0.0500 time 0.3993 (0.4119) data time 0.0008 (0.0020) model time 0.3985 (0.4090) loss 6.1470 (6.3590) grad_norm 2.1202 (4.2717) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:17:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][550/625] eta 0:00:30 lr 0.000029 wd 0.0500 time 0.3949 (0.4116) data time 0.0009 (0.0020) model time 0.3940 (0.4088) loss 6.7106 (6.3604) grad_norm 4.6592 (4.2789) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:17:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][560/625] eta 0:00:26 lr 0.000029 wd 0.0500 time 0.5582 (0.4116) data time 0.0007 (0.0020) model time 0.5575 (0.4089) loss 5.5383 (6.3608) grad_norm 2.8865 (4.2829) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:17:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][570/625] eta 0:00:22 lr 0.000029 wd 0.0500 time 0.3970 (0.4115) data time 0.0006 (0.0019) model time 0.3964 (0.4087) loss 6.1430 (6.3541) grad_norm 2.2576 (4.2541) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:17:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][580/625] eta 0:00:18 lr 0.000029 wd 0.0500 time 0.3958 (0.4112) data time 0.0007 (0.0019) model time 0.3951 (0.4085) loss 5.5712 (6.3531) grad_norm 2.6074 (4.2300) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:17:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][590/625] eta 0:00:14 lr 0.000028 wd 0.0500 time 0.3959 (0.4111) data time 
0.0009 (0.0019) model time 0.3949 (0.4083) loss 6.3968 (6.3531) grad_norm 2.4082 (4.2087) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:17:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][600/625] eta 0:00:10 lr 0.000028 wd 0.0500 time 0.3970 (0.4108) data time 0.0008 (0.0019) model time 0.3961 (0.4081) loss 7.3804 (6.3554) grad_norm 2.6303 (4.1849) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:17:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][610/625] eta 0:00:06 lr 0.000028 wd 0.0500 time 0.3932 (0.4106) data time 0.0004 (0.0019) model time 0.3927 (0.4079) loss 5.1859 (6.3549) grad_norm 2.7258 (4.1686) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:17:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][620/625] eta 0:00:02 lr 0.000028 wd 0.0500 time 0.3973 (0.4104) data time 0.0004 (0.0019) model time 0.3969 (0.4076) loss 6.4262 (6.3564) grad_norm 3.0097 (4.1629) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:17:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 278 training takes 0:04:16 [2024-07-25 13:17:30 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 13:17:31 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 13:17:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.5508 (0.5508) Acc@1 90.332 (90.332) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 13:17:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.121) Loss 0.8154 (0.6588) Acc@1 82.764 (87.669) Acc@5 97.119 (98.065) Mem 14939MB [2024-07-25 13:17:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 0.9111 (0.7620) Acc@1 79.443 (84.724) Acc@5 95.898 (97.103) Mem 14939MB [2024-07-25 13:17:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.331 Acc@5 97.063 [2024-07-25 13:17:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-07-25 13:17:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.768 (0.768) Loss 0.5405 (0.5405) Acc@1 90.186 (90.186) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 13:17:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.153) Loss 0.8076 (0.6538) Acc@1 82.910 (87.593) Acc@5 97.119 (98.020) Mem 14939MB [2024-07-25 13:17:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.121) Loss 0.9087 (0.7571) Acc@1 78.906 (84.663) Acc@5 96.045 (97.084) Mem 14939MB [2024-07-25 13:17:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.261 Acc@5 97.055 [2024-07-25 13:17:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-07-25 13:17:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.26% [2024-07-25 13:17:36 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 13:17:37 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 13:17:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][0/625] eta 0:08:17 lr 0.000028 wd 0.0500 time 0.7956 (0.7956) data time 0.4118 (0.4118) model time 0.0000 (0.0000) loss 6.3485 (6.3485) grad_norm 2.5346 (2.5346) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:17:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][10/625] eta 0:04:29 lr 0.000028 wd 0.0500 time 0.3910 (0.4375) data time 0.0007 (0.0383) model time 0.0000 (0.0000) loss 5.4699 (6.1042) grad_norm 2.3686 (4.5953) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:17:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][20/625] eta 0:04:12 lr 0.000028 wd 0.0500 time 0.3976 (0.4180) data time 0.0007 (0.0205) model time 0.0000 (0.0000) loss 5.6146 (6.0417) grad_norm 2.6945 (4.8612) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:17:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][30/625] eta 0:04:24 lr 0.000028 wd 0.0500 time 0.3971 (0.4447) data time 0.0010 (0.0141) model time 0.0000 (0.0000) loss 7.0799 (6.1337) grad_norm 2.2471 (4.2311) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:17:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][40/625] eta 0:04:25 lr 0.000028 wd 0.0500 time 0.5305 (0.4546) data time 0.0006 (0.0109) model time 0.0000 (0.0000) loss 5.9352 (6.1692) grad_norm 2.5253 (4.1504) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:18:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][50/625] eta 0:04:21 lr 0.000028 wd 0.0500 time 0.3987 (0.4545) data time 0.0006 (0.0089) model time 0.0000 (0.0000) loss 6.2261 (6.2165) grad_norm 2.7075 (4.2846) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 13:18:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][60/625] eta 0:04:11 lr 0.000028 wd 0.0500 time 0.3948 (0.4449) data time 0.0009 (0.0076) model time 0.3940 (0.3953) loss 5.9829 (6.2787) grad_norm 3.3912 (4.2109) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:18:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][70/625] eta 0:04:03 lr 0.000028 wd 0.0500 time 0.4105 (0.4383) data time 0.0009 (0.0067) model time 0.4096 (0.3961) loss 6.9456 (6.3486) grad_norm 1.9951 (4.1594) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:18:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][80/625] eta 0:03:56 lr 0.000028 wd 0.0500 time 0.3955 (0.4332) data time 0.0006 (0.0059) model time 0.3949 (0.3963) loss 5.6642 (6.3503) grad_norm 2.4508 (3.9924) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:18:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][90/625] eta 0:03:49 lr 0.000028 wd 0.0500 time 0.3957 (0.4292) data time 0.0008 (0.0054) model time 0.3949 (0.3963) loss 6.0060 (6.3441) grad_norm 2.6023 (3.8676) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:18:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][100/625] eta 0:03:49 lr 0.000028 wd 0.0500 time 0.4316 (0.4366) data time 0.0008 (0.0049) model time 0.4308 (0.4175) loss 7.5581 (6.3963) grad_norm 2.2585 (3.8329) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:18:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][110/625] eta 0:03:43 lr 0.000028 wd 0.0500 time 0.3936 (0.4331) data time 0.0009 (0.0047) model time 0.3928 (0.4138) loss 6.6785 (6.4083) grad_norm 2.5467 (3.7703) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:18:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][120/625] eta 0:03:37 lr 0.000028 wd 0.0500 time 
0.3949 (0.4302) data time 0.0008 (0.0044) model time 0.3941 (0.4114) loss 7.1498 (6.4070) grad_norm 14.5372 (4.0374) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:18:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][130/625] eta 0:03:31 lr 0.000028 wd 0.0500 time 0.3949 (0.4276) data time 0.0007 (0.0041) model time 0.3942 (0.4094) loss 5.9892 (6.3864) grad_norm 2.3355 (3.9739) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:18:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][140/625] eta 0:03:26 lr 0.000028 wd 0.0500 time 0.3982 (0.4254) data time 0.0007 (0.0039) model time 0.3975 (0.4079) loss 6.3792 (6.3941) grad_norm 2.1775 (3.8848) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:18:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][150/625] eta 0:03:21 lr 0.000028 wd 0.0500 time 0.3980 (0.4235) data time 0.0007 (0.0037) model time 0.3973 (0.4067) loss 5.9493 (6.3961) grad_norm 3.0567 (3.8293) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:18:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][160/625] eta 0:03:16 lr 0.000028 wd 0.0500 time 0.3980 (0.4218) data time 0.0007 (0.0035) model time 0.3972 (0.4058) loss 6.9669 (6.3740) grad_norm 2.1978 (3.7952) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:18:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][170/625] eta 0:03:11 lr 0.000028 wd 0.0500 time 0.3991 (0.4206) data time 0.0008 (0.0033) model time 0.3983 (0.4053) loss 5.4587 (6.3669) grad_norm 5.8463 (3.8160) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:18:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][180/625] eta 0:03:06 lr 0.000028 wd 0.0500 time 0.3906 (0.4193) data time 0.0007 (0.0032) model time 0.3899 (0.4046) loss 6.1921 (6.3829) grad_norm 2.7227 (3.8451) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 13:18:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][190/625] eta 0:03:01 lr 0.000028 wd 0.0500 time 0.3930 (0.4183) data time 0.0009 (0.0031) model time 0.3921 (0.4042) loss 6.2027 (6.3714) grad_norm 4.7052 (3.8643) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:19:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][200/625] eta 0:02:57 lr 0.000028 wd 0.0500 time 0.4000 (0.4173) data time 0.0007 (0.0029) model time 0.3993 (0.4037) loss 6.0381 (6.3760) grad_norm 3.6599 (3.8369) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:19:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][210/625] eta 0:02:52 lr 0.000028 wd 0.0500 time 0.3969 (0.4164) data time 0.0006 (0.0028) model time 0.3963 (0.4034) loss 5.7341 (6.3675) grad_norm 2.7832 (3.8376) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:19:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][220/625] eta 0:02:48 lr 0.000028 wd 0.0500 time 0.3962 (0.4155) data time 0.0010 (0.0028) model time 0.3951 (0.4029) loss 6.4777 (6.3801) grad_norm 1.9199 (3.8352) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:19:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][230/625] eta 0:02:43 lr 0.000028 wd 0.0500 time 0.3985 (0.4148) data time 0.0008 (0.0027) model time 0.3977 (0.4026) loss 5.6001 (6.3873) grad_norm 2.0795 (3.8042) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:19:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][240/625] eta 0:02:39 lr 0.000028 wd 0.0500 time 0.3938 (0.4141) data time 0.0010 (0.0026) model time 0.3928 (0.4024) loss 6.5574 (6.3799) grad_norm 3.3878 (3.7774) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:19:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][250/625] eta 0:02:35 lr 0.000028 wd 0.0500 time 
0.3927 (0.4157) data time 0.0009 (0.0025) model time 0.3918 (0.4050) loss 6.4404 (6.3648) grad_norm 4.0355 (3.7665) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:19:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][260/625] eta 0:02:33 lr 0.000028 wd 0.0500 time 0.5669 (0.4199) data time 0.0008 (0.0025) model time 0.5661 (0.4106) loss 5.9194 (6.3591) grad_norm 2.9917 (3.7265) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:19:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][270/625] eta 0:02:29 lr 0.000028 wd 0.0500 time 0.3968 (0.4216) data time 0.0007 (0.0024) model time 0.3961 (0.4130) loss 6.8692 (6.3590) grad_norm 2.8916 (3.9104) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:19:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][280/625] eta 0:02:25 lr 0.000028 wd 0.0500 time 0.4026 (0.4208) data time 0.0008 (0.0024) model time 0.4018 (0.4124) loss 5.8994 (6.3476) grad_norm 2.7623 (3.8834) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:19:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][290/625] eta 0:02:20 lr 0.000028 wd 0.0500 time 0.3976 (0.4200) data time 0.0006 (0.0023) model time 0.3970 (0.4118) loss 6.8430 (6.3492) grad_norm 4.2995 (3.8592) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:19:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][300/625] eta 0:02:16 lr 0.000028 wd 0.0500 time 0.4000 (0.4194) data time 0.0006 (0.0023) model time 0.3994 (0.4113) loss 5.6146 (6.3497) grad_norm 2.7340 (3.8281) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:19:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][310/625] eta 0:02:11 lr 0.000028 wd 0.0500 time 0.4002 (0.4187) data time 0.0008 (0.0022) model time 0.3994 (0.4108) loss 6.9002 (6.3549) grad_norm 2.0146 (3.8055) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 13:19:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][320/625] eta 0:02:07 lr 0.000028 wd 0.0500 time 0.3949 (0.4185) data time 0.0007 (0.0022) model time 0.3943 (0.4109) loss 6.0266 (6.3492) grad_norm 4.8883 (3.7822) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:19:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][330/625] eta 0:02:03 lr 0.000028 wd 0.0500 time 0.3984 (0.4180) data time 0.0008 (0.0021) model time 0.3976 (0.4105) loss 5.9096 (6.3463) grad_norm 5.1028 (3.8485) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:20:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][340/625] eta 0:01:58 lr 0.000028 wd 0.0500 time 0.3967 (0.4174) data time 0.0008 (0.0021) model time 0.3960 (0.4100) loss 5.5709 (6.3465) grad_norm 4.0422 (3.8338) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:20:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][350/625] eta 0:01:54 lr 0.000028 wd 0.0500 time 0.3990 (0.4168) data time 0.0008 (0.0021) model time 0.3981 (0.4095) loss 6.6522 (6.3475) grad_norm 17.5426 (3.8531) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:20:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][360/625] eta 0:01:50 lr 0.000028 wd 0.0500 time 0.3932 (0.4163) data time 0.0009 (0.0020) model time 0.3924 (0.4091) loss 5.3432 (6.3437) grad_norm 2.6845 (3.8273) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:20:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][370/625] eta 0:01:46 lr 0.000028 wd 0.0500 time 0.4000 (0.4158) data time 0.0008 (0.0020) model time 0.3992 (0.4088) loss 6.8489 (6.3476) grad_norm 3.1061 (3.8006) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:20:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][380/625] eta 0:01:41 lr 0.000027 wd 0.0500 time 
0.3958 (0.4153) data time 0.0008 (0.0020) model time 0.3950 (0.4084) loss 6.5675 (6.3447) grad_norm 3.0260 (3.7836) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:20:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][390/625] eta 0:01:37 lr 0.000027 wd 0.0500 time 0.3998 (0.4149) data time 0.0009 (0.0019) model time 0.3989 (0.4080) loss 6.6460 (6.3509) grad_norm 3.0430 (3.7831) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:20:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][400/625] eta 0:01:33 lr 0.000027 wd 0.0500 time 0.3988 (0.4144) data time 0.0008 (0.0019) model time 0.3980 (0.4077) loss 6.3694 (6.3501) grad_norm 5.0364 (3.8096) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:20:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][410/625] eta 0:01:29 lr 0.000027 wd 0.0500 time 0.3996 (0.4140) data time 0.0006 (0.0019) model time 0.3990 (0.4074) loss 5.0840 (6.3440) grad_norm 5.3041 (3.8036) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:20:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][420/625] eta 0:01:24 lr 0.000027 wd 0.0500 time 0.3973 (0.4136) data time 0.0008 (0.0019) model time 0.3965 (0.4071) loss 6.0960 (6.3504) grad_norm 4.0699 (3.8590) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:20:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][430/625] eta 0:01:20 lr 0.000027 wd 0.0500 time 0.4042 (0.4133) data time 0.0008 (0.0018) model time 0.4033 (0.4068) loss 4.9823 (6.3566) grad_norm 4.2524 (3.8545) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:20:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][440/625] eta 0:01:16 lr 0.000027 wd 0.0500 time 0.3948 (0.4129) data time 0.0008 (0.0018) model time 0.3940 (0.4066) loss 7.4053 (6.3543) grad_norm 2.7157 (3.9179) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 13:20:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][450/625] eta 0:01:12 lr 0.000027 wd 0.0500 time 0.3995 (0.4125) data time 0.0006 (0.0018) model time 0.3989 (0.4063) loss 5.5810 (6.3553) grad_norm 2.6160 (3.9180) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:20:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][460/625] eta 0:01:08 lr 0.000027 wd 0.0500 time 0.3960 (0.4123) data time 0.0008 (0.0018) model time 0.3951 (0.4061) loss 6.3276 (6.3580) grad_norm 2.5123 (3.8957) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:20:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][470/625] eta 0:01:04 lr 0.000027 wd 0.0500 time 0.3957 (0.4132) data time 0.0009 (0.0018) model time 0.3949 (0.4073) loss 5.2511 (6.3532) grad_norm 4.4173 (3.9220) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:20:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][480/625] eta 0:01:00 lr 0.000027 wd 0.0500 time 0.3964 (0.4155) data time 0.0006 (0.0017) model time 0.3958 (0.4100) loss 5.6789 (6.3465) grad_norm 2.8822 (3.9215) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:21:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][490/625] eta 0:00:56 lr 0.000027 wd 0.0500 time 0.4063 (0.4165) data time 0.0006 (0.0017) model time 0.4057 (0.4112) loss 5.1051 (6.3421) grad_norm 3.9436 (4.0119) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:21:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][500/625] eta 0:00:52 lr 0.000027 wd 0.0500 time 0.3963 (0.4161) data time 0.0006 (0.0017) model time 0.3956 (0.4109) loss 6.5634 (6.3396) grad_norm 4.9539 (4.0129) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:21:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][510/625] eta 0:00:47 lr 0.000027 wd 0.0500 time 
0.3964 (0.4157) data time 0.0006 (0.0017) model time 0.3958 (0.4106) loss 6.5188 (6.3454) grad_norm 2.7638 (4.1088) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:21:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][520/625] eta 0:00:43 lr 0.000027 wd 0.0500 time 0.3893 (0.4154) data time 0.0007 (0.0017) model time 0.3886 (0.4103) loss 7.1919 (6.3497) grad_norm 4.2934 (4.1001) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:21:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][530/625] eta 0:00:39 lr 0.000027 wd 0.0500 time 0.3975 (0.4150) data time 0.0008 (0.0017) model time 0.3967 (0.4100) loss 7.4959 (6.3563) grad_norm 6.7306 (4.0888) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:21:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][540/625] eta 0:00:35 lr 0.000027 wd 0.0500 time 0.3989 (0.4151) data time 0.0008 (0.0016) model time 0.3981 (0.4101) loss 5.8805 (6.3591) grad_norm 3.0840 (4.0971) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:21:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][550/625] eta 0:00:31 lr 0.000027 wd 0.0500 time 0.3942 (0.4148) data time 0.0007 (0.0016) model time 0.3936 (0.4099) loss 5.4883 (6.3477) grad_norm 3.5864 (4.0883) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:21:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][560/625] eta 0:00:26 lr 0.000027 wd 0.0500 time 0.3976 (0.4145) data time 0.0007 (0.0016) model time 0.3970 (0.4096) loss 5.5160 (6.3467) grad_norm 2.0257 (4.0627) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:21:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][570/625] eta 0:00:22 lr 0.000027 wd 0.0500 time 0.3991 (0.4142) data time 0.0007 (0.0016) model time 0.3985 (0.4094) loss 5.3926 (6.3412) grad_norm 3.8104 (4.0946) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 13:21:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][580/625] eta 0:00:18 lr 0.000027 wd 0.0500 time 0.3963 (0.4139) data time 0.0008 (0.0016) model time 0.3955 (0.4091) loss 5.4849 (6.3447) grad_norm 2.5423 (4.1225) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:21:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][590/625] eta 0:00:14 lr 0.000027 wd 0.0500 time 0.3953 (0.4136) data time 0.0008 (0.0016) model time 0.3946 (0.4089) loss 6.4945 (6.3504) grad_norm 2.5096 (4.1122) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:21:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][600/625] eta 0:00:10 lr 0.000027 wd 0.0500 time 0.4012 (0.4133) data time 0.0006 (0.0016) model time 0.4006 (0.4086) loss 5.2057 (6.3505) grad_norm 2.4709 (4.1053) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:21:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][610/625] eta 0:00:06 lr 0.000027 wd 0.0500 time 0.3945 (0.4131) data time 0.0006 (0.0016) model time 0.3939 (0.4084) loss 6.0535 (6.3476) grad_norm 2.2586 (4.0841) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:21:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][620/625] eta 0:00:02 lr 0.000027 wd 0.0500 time 0.3939 (0.4128) data time 0.0004 (0.0015) model time 0.3935 (0.4082) loss 4.9101 (6.3460) grad_norm 3.7924 (4.0683) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:21:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 279 training takes 0:04:17 [2024-07-25 13:21:55 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 13:21:56 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 13:21:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.462 (0.462) Loss 0.5444 (0.5444) Acc@1 90.576 (90.576) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 13:21:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.121) Loss 0.8101 (0.6549) Acc@1 83.105 (87.726) Acc@5 97.314 (98.025) Mem 14939MB [2024-07-25 13:21:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 0.9033 (0.7577) Acc@1 79.248 (84.766) Acc@5 96.143 (97.080) Mem 14939MB [2024-07-25 13:21:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.383 Acc@5 97.047 [2024-07-25 13:21:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 13:21:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.860 (0.860) Loss 0.5405 (0.5405) Acc@1 90.186 (90.186) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 13:22:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.158) Loss 0.8076 (0.6539) Acc@1 82.959 (87.584) Acc@5 97.119 (98.029) Mem 14939MB [2024-07-25 13:22:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.123) Loss 0.9077 (0.7569) Acc@1 79.053 (84.680) Acc@5 96.094 (97.098) Mem 14939MB [2024-07-25 13:22:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.275 Acc@5 97.069 [2024-07-25 13:22:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-07-25 13:22:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.28% [2024-07-25 13:22:01 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 13:22:03 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 13:22:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][0/625] eta 0:07:46 lr 0.000027 wd 0.0500 time 0.7468 (0.7468) data time 0.3729 (0.3729) model time 0.0000 (0.0000) loss 6.1664 (6.1664) grad_norm 6.4505 (6.4505) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:22:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][10/625] eta 0:04:23 lr 0.000027 wd 0.0500 time 0.3993 (0.4287) data time 0.0008 (0.0347) model time 0.0000 (0.0000) loss 6.9818 (6.3170) grad_norm 3.3734 (3.1704) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:22:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][20/625] eta 0:04:10 lr 0.000027 wd 0.0500 time 0.3993 (0.4136) data time 0.0007 (0.0185) model time 0.0000 (0.0000) loss 7.0568 (6.3343) grad_norm 3.5853 (3.3341) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:22:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][30/625] eta 0:04:02 lr 0.000027 wd 0.0500 time 0.3984 (0.4082) data time 0.0006 (0.0128) model time 0.0000 (0.0000) loss 6.2430 (6.1437) grad_norm 2.9154 (3.0850) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:22:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][40/625] eta 0:03:57 lr 0.000027 wd 0.0500 time 0.3967 (0.4052) data time 0.0008 (0.0099) model time 0.0000 (0.0000) loss 6.0926 (6.1302) grad_norm 2.0037 (3.2111) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:22:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][50/625] eta 0:03:52 lr 0.000027 wd 0.0500 time 0.3954 (0.4035) data time 0.0006 (0.0081) model time 0.0000 (0.0000) loss 7.3238 (6.2027) grad_norm 2.8435 (3.2193) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 13:22:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][60/625] eta 0:03:49 lr 0.000027 wd 0.0500 time 0.5814 (0.4055) data time 0.0008 (0.0069) model time 0.5806 (0.4147) loss 7.3110 (6.2169) grad_norm 3.9528 (3.1763) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:22:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][70/625] eta 0:03:49 lr 0.000027 wd 0.0500 time 0.3960 (0.4135) data time 0.0006 (0.0060) model time 0.3954 (0.4381) loss 5.5126 (6.2045) grad_norm 2.9999 (3.1331) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:22:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][80/625] eta 0:03:50 lr 0.000027 wd 0.0500 time 0.6010 (0.4231) data time 0.0008 (0.0054) model time 0.6002 (0.4554) loss 5.2178 (6.2273) grad_norm 2.8002 (3.1221) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:22:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][90/625] eta 0:03:45 lr 0.000027 wd 0.0500 time 0.3953 (0.4212) data time 0.0006 (0.0049) model time 0.3947 (0.4430) loss 6.4095 (6.2236) grad_norm 9.5842 (3.2742) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:22:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][100/625] eta 0:03:39 lr 0.000027 wd 0.0500 time 0.3952 (0.4187) data time 0.0007 (0.0045) model time 0.3945 (0.4333) loss 6.2573 (6.2327) grad_norm 3.6327 (3.3912) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:22:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][110/625] eta 0:03:34 lr 0.000027 wd 0.0500 time 0.3960 (0.4168) data time 0.0008 (0.0042) model time 0.3951 (0.4273) loss 6.4656 (6.2386) grad_norm 3.1093 (3.3686) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:22:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][120/625] eta 0:03:29 lr 0.000027 wd 0.0500 time 
0.3970 (0.4152) data time 0.0007 (0.0039) model time 0.3962 (0.4228) loss 6.2043 (6.2314) grad_norm 2.2269 (3.6590) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:22:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][130/625] eta 0:03:24 lr 0.000027 wd 0.0500 time 0.3944 (0.4138) data time 0.0007 (0.0037) model time 0.3937 (0.4194) loss 5.6941 (6.2247) grad_norm 44.1546 (4.0452) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:23:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][140/625] eta 0:03:20 lr 0.000027 wd 0.0500 time 0.3945 (0.4125) data time 0.0009 (0.0035) model time 0.3936 (0.4168) loss 6.4502 (6.2213) grad_norm 2.1196 (4.1740) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:23:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][150/625] eta 0:03:15 lr 0.000027 wd 0.0500 time 0.3978 (0.4114) data time 0.0006 (0.0033) model time 0.3972 (0.4146) loss 5.5985 (6.2263) grad_norm 2.4044 (inf) loss_scale 64.0000 (126.3046) mem 14939MB [2024-07-25 13:23:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][160/625] eta 0:03:10 lr 0.000027 wd 0.0500 time 0.4055 (0.4105) data time 0.0008 (0.0031) model time 0.4046 (0.4129) loss 5.4992 (6.2271) grad_norm 2.5995 (inf) loss_scale 64.0000 (122.4348) mem 14939MB [2024-07-25 13:23:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][170/625] eta 0:03:06 lr 0.000026 wd 0.0500 time 0.4017 (0.4098) data time 0.0009 (0.0030) model time 0.4008 (0.4116) loss 6.7186 (6.2336) grad_norm 3.2620 (inf) loss_scale 64.0000 (119.0175) mem 14939MB [2024-07-25 13:23:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][180/625] eta 0:03:02 lr 0.000026 wd 0.0500 time 0.3914 (0.4091) data time 0.0006 (0.0029) model time 0.3908 (0.4105) loss 5.2121 (6.2187) grad_norm 2.5949 (inf) loss_scale 64.0000 (115.9779) mem 14939MB [2024-07-25 
13:23:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][190/625] eta 0:02:57 lr 0.000026 wd 0.0500 time 0.3946 (0.4084) data time 0.0008 (0.0028) model time 0.3938 (0.4094) loss 6.3689 (6.2170) grad_norm 2.5348 (inf) loss_scale 64.0000 (113.2565) mem 14939MB [2024-07-25 13:23:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][200/625] eta 0:02:53 lr 0.000026 wd 0.0500 time 0.3975 (0.4079) data time 0.0007 (0.0027) model time 0.3968 (0.4085) loss 5.3570 (6.2257) grad_norm 2.7163 (inf) loss_scale 64.0000 (110.8060) mem 14939MB [2024-07-25 13:23:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][210/625] eta 0:02:49 lr 0.000026 wd 0.0500 time 0.3928 (0.4073) data time 0.0009 (0.0026) model time 0.3919 (0.4077) loss 6.2495 (6.2246) grad_norm 2.6631 (inf) loss_scale 64.0000 (108.5877) mem 14939MB [2024-07-25 13:23:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][220/625] eta 0:02:44 lr 0.000026 wd 0.0500 time 0.3959 (0.4068) data time 0.0009 (0.0025) model time 0.3950 (0.4070) loss 6.3441 (6.2209) grad_norm 4.9307 (inf) loss_scale 64.0000 (106.5701) mem 14939MB [2024-07-25 13:23:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][230/625] eta 0:02:40 lr 0.000026 wd 0.0500 time 0.3963 (0.4064) data time 0.0008 (0.0024) model time 0.3955 (0.4063) loss 7.3628 (6.2342) grad_norm 4.0440 (inf) loss_scale 64.0000 (104.7273) mem 14939MB [2024-07-25 13:23:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][240/625] eta 0:02:36 lr 0.000026 wd 0.0500 time 0.3956 (0.4060) data time 0.0007 (0.0024) model time 0.3949 (0.4058) loss 5.7355 (6.2459) grad_norm 2.1079 (inf) loss_scale 64.0000 (103.0373) mem 14939MB [2024-07-25 13:23:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][250/625] eta 0:02:32 lr 0.000026 wd 0.0500 time 0.3949 (0.4057) data time 0.0008 (0.0023) 
model time 0.3941 (0.4054) loss 5.8351 (6.2383) grad_norm 2.7529 (inf) loss_scale 64.0000 (101.4821) mem 14939MB [2024-07-25 13:23:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][260/625] eta 0:02:27 lr 0.000026 wd 0.0500 time 0.3984 (0.4053) data time 0.0008 (0.0023) model time 0.3976 (0.4049) loss 6.4528 (6.2320) grad_norm 2.7180 (inf) loss_scale 64.0000 (100.0460) mem 14939MB [2024-07-25 13:23:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][270/625] eta 0:02:23 lr 0.000026 wd 0.0500 time 0.3944 (0.4050) data time 0.0006 (0.0022) model time 0.3938 (0.4045) loss 7.1836 (6.2460) grad_norm 3.3850 (inf) loss_scale 64.0000 (98.7159) mem 14939MB [2024-07-25 13:23:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][280/625] eta 0:02:19 lr 0.000026 wd 0.0500 time 0.5648 (0.4054) data time 0.0008 (0.0022) model time 0.5639 (0.4050) loss 5.9114 (6.2498) grad_norm 7.5081 (inf) loss_scale 64.0000 (97.4804) mem 14939MB [2024-07-25 13:24:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][290/625] eta 0:02:16 lr 0.000026 wd 0.0500 time 0.5996 (0.4084) data time 0.0008 (0.0021) model time 0.5988 (0.4086) loss 6.3699 (6.2510) grad_norm 3.5573 (inf) loss_scale 64.0000 (96.3299) mem 14939MB [2024-07-25 13:24:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][300/625] eta 0:02:13 lr 0.000026 wd 0.0500 time 0.3938 (0.4111) data time 0.0007 (0.0021) model time 0.3931 (0.4118) loss 6.0734 (6.2418) grad_norm 5.4594 (inf) loss_scale 64.0000 (95.2558) mem 14939MB [2024-07-25 13:24:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][310/625] eta 0:02:09 lr 0.000026 wd 0.0500 time 0.3963 (0.4121) data time 0.0010 (0.0020) model time 0.3953 (0.4130) loss 6.9716 (6.2531) grad_norm 2.9474 (inf) loss_scale 64.0000 (94.2508) mem 14939MB [2024-07-25 13:24:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 
367): INFO Train: [280/300][320/625] eta 0:02:05 lr 0.000026 wd 0.0500 time 0.4041 (0.4117) data time 0.0007 (0.0020) model time 0.4034 (0.4124) loss 6.0985 (6.2410) grad_norm 3.5200 (inf) loss_scale 64.0000 (93.3084) mem 14939MB [2024-07-25 13:24:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][330/625] eta 0:02:01 lr 0.000026 wd 0.0500 time 0.4030 (0.4113) data time 0.0006 (0.0020) model time 0.4024 (0.4118) loss 7.1283 (6.2617) grad_norm 2.2329 (inf) loss_scale 64.0000 (92.4230) mem 14939MB [2024-07-25 13:24:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][340/625] eta 0:01:57 lr 0.000026 wd 0.0500 time 0.3951 (0.4108) data time 0.0011 (0.0019) model time 0.3940 (0.4113) loss 5.9450 (6.2772) grad_norm 4.6776 (inf) loss_scale 64.0000 (91.5894) mem 14939MB [2024-07-25 13:24:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][350/625] eta 0:01:52 lr 0.000026 wd 0.0500 time 0.3969 (0.4105) data time 0.0006 (0.0019) model time 0.3963 (0.4108) loss 6.6628 (6.2784) grad_norm 2.6495 (inf) loss_scale 64.0000 (90.8034) mem 14939MB [2024-07-25 13:24:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][360/625] eta 0:01:48 lr 0.000026 wd 0.0500 time 0.3927 (0.4101) data time 0.0007 (0.0019) model time 0.3920 (0.4103) loss 5.5212 (6.2746) grad_norm 2.6623 (inf) loss_scale 64.0000 (90.0609) mem 14939MB [2024-07-25 13:24:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][370/625] eta 0:01:44 lr 0.000026 wd 0.0500 time 0.3936 (0.4097) data time 0.0008 (0.0018) model time 0.3928 (0.4098) loss 6.0642 (6.2690) grad_norm 2.2948 (inf) loss_scale 64.0000 (89.3585) mem 14939MB [2024-07-25 13:24:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][380/625] eta 0:01:40 lr 0.000026 wd 0.0500 time 0.3961 (0.4093) data time 0.0008 (0.0018) model time 0.3953 (0.4093) loss 5.5843 (6.2723) grad_norm 3.1500 
(inf) loss_scale 64.0000 (88.6929) mem 14939MB [2024-07-25 13:24:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][390/625] eta 0:01:36 lr 0.000026 wd 0.0500 time 0.3932 (0.4090) data time 0.0007 (0.0018) model time 0.3925 (0.4089) loss 5.9411 (6.2693) grad_norm 4.2178 (inf) loss_scale 64.0000 (88.0614) mem 14939MB [2024-07-25 13:24:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][400/625] eta 0:01:31 lr 0.000026 wd 0.0500 time 0.3968 (0.4087) data time 0.0008 (0.0018) model time 0.3960 (0.4086) loss 6.7840 (6.2726) grad_norm 6.5754 (inf) loss_scale 64.0000 (87.4613) mem 14939MB [2024-07-25 13:24:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][410/625] eta 0:01:27 lr 0.000026 wd 0.0500 time 0.4033 (0.4084) data time 0.0007 (0.0018) model time 0.4027 (0.4082) loss 7.0605 (6.2694) grad_norm 7.4777 (inf) loss_scale 64.0000 (86.8905) mem 14939MB [2024-07-25 13:24:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][420/625] eta 0:01:23 lr 0.000026 wd 0.0500 time 0.3936 (0.4081) data time 0.0009 (0.0017) model time 0.3928 (0.4079) loss 6.9517 (6.2728) grad_norm 2.8903 (inf) loss_scale 64.0000 (86.3468) mem 14939MB [2024-07-25 13:24:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][430/625] eta 0:01:19 lr 0.000026 wd 0.0500 time 0.3933 (0.4079) data time 0.0008 (0.0017) model time 0.3925 (0.4076) loss 6.1085 (6.2674) grad_norm 2.0433 (inf) loss_scale 64.0000 (85.8283) mem 14939MB [2024-07-25 13:25:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][440/625] eta 0:01:15 lr 0.000026 wd 0.0500 time 0.3955 (0.4077) data time 0.0007 (0.0017) model time 0.3948 (0.4074) loss 6.4467 (6.2691) grad_norm 3.8178 (inf) loss_scale 64.0000 (85.3333) mem 14939MB [2024-07-25 13:25:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][450/625] eta 0:01:11 lr 0.000026 wd 
0.0500 time 0.3978 (0.4075) data time 0.0008 (0.0017) model time 0.3970 (0.4071) loss 6.0793 (6.2724) grad_norm 2.7038 (inf) loss_scale 64.0000 (84.8603) mem 14939MB [2024-07-25 13:25:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][460/625] eta 0:01:07 lr 0.000026 wd 0.0500 time 0.3963 (0.4072) data time 0.0008 (0.0017) model time 0.3954 (0.4068) loss 5.5980 (6.2673) grad_norm 9.8289 (inf) loss_scale 64.0000 (84.4078) mem 14939MB [2024-07-25 13:25:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][470/625] eta 0:01:03 lr 0.000026 wd 0.0500 time 0.3971 (0.4070) data time 0.0008 (0.0017) model time 0.3963 (0.4065) loss 6.9058 (6.2748) grad_norm 2.5336 (inf) loss_scale 64.0000 (83.9745) mem 14939MB [2024-07-25 13:25:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][480/625] eta 0:00:58 lr 0.000026 wd 0.0500 time 0.3951 (0.4068) data time 0.0007 (0.0017) model time 0.3944 (0.4063) loss 7.2472 (6.2745) grad_norm 2.6761 (inf) loss_scale 64.0000 (83.5593) mem 14939MB [2024-07-25 13:25:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][490/625] eta 0:00:54 lr 0.000026 wd 0.0500 time 0.3937 (0.4066) data time 0.0006 (0.0016) model time 0.3931 (0.4061) loss 7.8596 (6.2771) grad_norm 3.9260 (inf) loss_scale 64.0000 (83.1609) mem 14939MB [2024-07-25 13:25:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][500/625] eta 0:00:50 lr 0.000026 wd 0.0500 time 0.3953 (0.4068) data time 0.0009 (0.0016) model time 0.3944 (0.4062) loss 7.4432 (6.2745) grad_norm 4.8018 (inf) loss_scale 64.0000 (82.7784) mem 14939MB [2024-07-25 13:25:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][510/625] eta 0:00:46 lr 0.000026 wd 0.0500 time 0.3936 (0.4084) data time 0.0008 (0.0016) model time 0.3928 (0.4080) loss 5.3622 (6.2701) grad_norm 2.2368 (inf) loss_scale 64.0000 (82.4110) mem 14939MB [2024-07-25 13:25:37 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][520/625] eta 0:00:43 lr 0.000026 wd 0.0500 time 0.3922 (0.4104) data time 0.0009 (0.0016) model time 0.3913 (0.4102) loss 6.3284 (6.2744) grad_norm 2.4713 (inf) loss_scale 64.0000 (82.0576) mem 14939MB [2024-07-25 13:25:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][530/625] eta 0:00:39 lr 0.000026 wd 0.0500 time 0.4034 (0.4108) data time 0.0008 (0.0016) model time 0.4025 (0.4107) loss 7.8531 (6.2826) grad_norm 2.5633 (inf) loss_scale 64.0000 (81.7175) mem 14939MB [2024-07-25 13:25:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][540/625] eta 0:00:34 lr 0.000026 wd 0.0500 time 0.3886 (0.4105) data time 0.0006 (0.0016) model time 0.3880 (0.4103) loss 5.5130 (6.2816) grad_norm 3.3137 (inf) loss_scale 64.0000 (81.3900) mem 14939MB [2024-07-25 13:25:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][550/625] eta 0:00:30 lr 0.000026 wd 0.0500 time 0.3960 (0.4104) data time 0.0006 (0.0016) model time 0.3954 (0.4102) loss 7.5320 (6.2824) grad_norm 3.4255 (inf) loss_scale 64.0000 (81.0744) mem 14939MB [2024-07-25 13:25:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][560/625] eta 0:00:26 lr 0.000026 wd 0.0500 time 0.4261 (0.4101) data time 0.0006 (0.0015) model time 0.4255 (0.4099) loss 5.1060 (6.2810) grad_norm 12.0924 (inf) loss_scale 64.0000 (80.7701) mem 14939MB [2024-07-25 13:25:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][570/625] eta 0:00:22 lr 0.000026 wd 0.0500 time 0.3927 (0.4099) data time 0.0007 (0.0015) model time 0.3920 (0.4096) loss 6.1780 (6.2901) grad_norm 3.7819 (inf) loss_scale 64.0000 (80.4764) mem 14939MB [2024-07-25 13:26:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][580/625] eta 0:00:18 lr 0.000026 wd 0.0500 time 0.3959 (0.4096) data time 0.0008 (0.0015) model time 0.3952 
(0.4093) loss 6.8032 (6.2877) grad_norm 37.6081 (inf) loss_scale 64.0000 (80.1928) mem 14939MB [2024-07-25 13:26:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][590/625] eta 0:00:14 lr 0.000026 wd 0.0500 time 0.3953 (0.4094) data time 0.0008 (0.0015) model time 0.3946 (0.4091) loss 5.6032 (6.2929) grad_norm 5.1574 (inf) loss_scale 64.0000 (79.9188) mem 14939MB [2024-07-25 13:26:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][600/625] eta 0:00:10 lr 0.000026 wd 0.0500 time 0.3945 (0.4092) data time 0.0008 (0.0015) model time 0.3937 (0.4089) loss 5.6810 (6.2936) grad_norm 5.2258 (inf) loss_scale 64.0000 (79.6539) mem 14939MB [2024-07-25 13:26:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][610/625] eta 0:00:06 lr 0.000025 wd 0.0500 time 0.3969 (0.4091) data time 0.0004 (0.0015) model time 0.3966 (0.4087) loss 5.5935 (6.2944) grad_norm 4.4379 (inf) loss_scale 64.0000 (79.3977) mem 14939MB [2024-07-25 13:26:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][620/625] eta 0:00:02 lr 0.000025 wd 0.0500 time 0.3969 (0.4089) data time 0.0005 (0.0015) model time 0.3964 (0.4085) loss 6.4457 (6.2924) grad_norm 4.1328 (inf) loss_scale 64.0000 (79.1498) mem 14939MB [2024-07-25 13:26:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 280 training takes 0:04:15 [2024-07-25 13:26:19 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 13:26:20 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 13:26:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.448 (0.448) Loss 0.5488 (0.5488) Acc@1 90.674 (90.674) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 13:26:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.8149 (0.6599) Acc@1 82.568 (87.722) Acc@5 97.314 (98.051) Mem 14939MB [2024-07-25 13:26:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.103) Loss 0.9053 (0.7625) Acc@1 79.785 (84.770) Acc@5 95.996 (97.114) Mem 14939MB [2024-07-25 13:26:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.387 Acc@5 97.081 [2024-07-25 13:26:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 13:26:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.800 (0.800) Loss 0.5410 (0.5410) Acc@1 90.283 (90.283) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 13:26:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.155) Loss 0.8076 (0.6536) Acc@1 82.959 (87.611) Acc@5 97.168 (98.029) Mem 14939MB [2024-07-25 13:26:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.122) Loss 0.9077 (0.7567) Acc@1 79.199 (84.694) Acc@5 96.191 (97.098) Mem 14939MB [2024-07-25 13:26:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.299 Acc@5 97.071 [2024-07-25 13:26:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-07-25 13:26:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.30% [2024-07-25 13:26:26 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 13:26:27 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 13:26:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][0/625] eta 0:08:35 lr 0.000025 wd 0.0500 time 0.8245 (0.8245) data time 0.4415 (0.4415) model time 0.0000 (0.0000) loss 5.6906 (5.6906) grad_norm 3.9902 (3.9902) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:26:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][10/625] eta 0:04:28 lr 0.000025 wd 0.0500 time 0.3966 (0.4366) data time 0.0008 (0.0409) model time 0.0000 (0.0000) loss 6.8856 (6.0703) grad_norm 2.3403 (4.0252) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:26:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][20/625] eta 0:04:12 lr 0.000025 wd 0.0500 time 0.3935 (0.4178) data time 0.0006 (0.0218) model time 0.0000 (0.0000) loss 6.5650 (6.3275) grad_norm 2.6461 (3.9524) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:26:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][30/625] eta 0:04:04 lr 0.000025 wd 0.0500 time 0.3955 (0.4111) data time 0.0009 (0.0150) model time 0.0000 (0.0000) loss 7.0753 (6.2723) grad_norm 2.3688 (5.2260) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:26:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][40/625] eta 0:04:01 lr 0.000025 wd 0.0500 time 0.3978 (0.4121) data time 0.0008 (0.0115) model time 0.0000 (0.0000) loss 6.9012 (6.2342) grad_norm 3.2969 (4.9400) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:26:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][50/625] eta 0:03:55 lr 0.000025 wd 0.0500 time 0.3953 (0.4090) data time 0.0008 (0.0094) model time 0.0000 (0.0000) loss 6.5170 (6.2665) grad_norm 4.2662 (4.5729) loss_scale 64.0000 (64.0000) mem 14939MB 
[2024-07-25 13:26:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][60/625] eta 0:03:49 lr 0.000025 wd 0.0500 time 0.3947 (0.4068) data time 0.0008 (0.0080) model time 0.3939 (0.3951) loss 6.9688 (6.2546) grad_norm 2.2827 (4.4361) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:26:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][70/625] eta 0:03:45 lr 0.000025 wd 0.0500 time 0.3949 (0.4055) data time 0.0006 (0.0070) model time 0.3943 (0.3957) loss 6.0506 (6.2929) grad_norm 12.1353 (4.4771) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:27:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][80/625] eta 0:03:40 lr 0.000025 wd 0.0500 time 0.3885 (0.4044) data time 0.0009 (0.0063) model time 0.3877 (0.3959) loss 7.0541 (6.3111) grad_norm 3.0136 (4.3556) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:27:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][90/625] eta 0:03:35 lr 0.000025 wd 0.0500 time 0.3963 (0.4035) data time 0.0008 (0.0057) model time 0.3956 (0.3957) loss 6.0650 (6.2842) grad_norm 2.2877 (4.2564) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:27:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][100/625] eta 0:03:33 lr 0.000025 wd 0.0500 time 0.5854 (0.4072) data time 0.0007 (0.0052) model time 0.5847 (0.4045) loss 6.2674 (6.2880) grad_norm 7.3761 (4.3280) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:27:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][110/625] eta 0:03:33 lr 0.000025 wd 0.0500 time 0.5820 (0.4154) data time 0.0008 (0.0048) model time 0.5811 (0.4201) loss 6.9024 (6.2497) grad_norm 2.6030 (4.2236) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:27:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][120/625] eta 0:03:32 lr 0.000025 wd 0.0500 time 0.4035 (0.4214) data 
time 0.0008 (0.0045) model time 0.4027 (0.4297) loss 6.7173 (6.2349) grad_norm 2.8418 (4.3318) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:27:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][130/625] eta 0:03:28 lr 0.000025 wd 0.0500 time 0.3988 (0.4209) data time 0.0008 (0.0042) model time 0.3980 (0.4277) loss 6.7256 (6.2545) grad_norm 2.9097 (4.2806) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:27:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][140/625] eta 0:03:23 lr 0.000025 wd 0.0500 time 0.3977 (0.4194) data time 0.0009 (0.0039) model time 0.3968 (0.4245) loss 5.2385 (6.2605) grad_norm 3.2418 (4.2677) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:27:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][150/625] eta 0:03:18 lr 0.000025 wd 0.0500 time 0.3999 (0.4179) data time 0.0006 (0.0037) model time 0.3993 (0.4216) loss 6.5652 (6.2553) grad_norm 2.9926 (4.1821) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:27:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][160/625] eta 0:03:13 lr 0.000025 wd 0.0500 time 0.3957 (0.4165) data time 0.0006 (0.0036) model time 0.3950 (0.4192) loss 6.1501 (6.2615) grad_norm 2.9614 (4.1214) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:27:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][170/625] eta 0:03:09 lr 0.000025 wd 0.0500 time 0.3963 (0.4154) data time 0.0006 (0.0034) model time 0.3957 (0.4173) loss 6.4777 (6.2757) grad_norm 2.2442 (4.1167) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:27:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][180/625] eta 0:03:04 lr 0.000025 wd 0.0500 time 0.3998 (0.4144) data time 0.0009 (0.0033) model time 0.3989 (0.4157) loss 7.3391 (6.2788) grad_norm 2.5232 (4.1365) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:27:46 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][190/625] eta 0:02:59 lr 0.000025 wd 0.0500 time 0.3951 (0.4134) data time 0.0008 (0.0031) model time 0.3943 (0.4142) loss 6.6265 (6.2732) grad_norm 2.5142 (4.0963) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:27:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][200/625] eta 0:02:55 lr 0.000025 wd 0.0500 time 0.4098 (0.4126) data time 0.0008 (0.0030) model time 0.4091 (0.4131) loss 5.7682 (6.2687) grad_norm 3.4041 (4.0532) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:27:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][210/625] eta 0:02:50 lr 0.000025 wd 0.0500 time 0.3979 (0.4119) data time 0.0008 (0.0029) model time 0.3972 (0.4120) loss 6.2695 (6.2716) grad_norm 2.8649 (3.9872) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:27:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][220/625] eta 0:02:46 lr 0.000025 wd 0.0500 time 0.3979 (0.4116) data time 0.0008 (0.0028) model time 0.3972 (0.4116) loss 6.5762 (6.2781) grad_norm 3.2441 (3.9403) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:28:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][230/625] eta 0:02:42 lr 0.000025 wd 0.0500 time 0.3950 (0.4110) data time 0.0008 (0.0028) model time 0.3942 (0.4107) loss 6.7500 (6.2661) grad_norm 3.2939 (3.8793) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:28:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][240/625] eta 0:02:38 lr 0.000025 wd 0.0500 time 0.3944 (0.4104) data time 0.0006 (0.0027) model time 0.3939 (0.4099) loss 6.8654 (6.2701) grad_norm 4.9551 (3.8814) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:28:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][250/625] eta 0:02:33 lr 0.000025 wd 0.0500 time 0.3987 (0.4098) data time 0.0006 (0.0026) 
model time 0.3981 (0.4092) loss 5.9161 (6.2725) grad_norm 2.9860 (3.8673) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:28:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][260/625] eta 0:02:29 lr 0.000025 wd 0.0500 time 0.3992 (0.4099) data time 0.0008 (0.0025) model time 0.3984 (0.4092) loss 7.0107 (6.2799) grad_norm 3.6157 (3.8693) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:28:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][270/625] eta 0:02:25 lr 0.000025 wd 0.0500 time 0.4361 (0.4095) data time 0.0008 (0.0025) model time 0.4353 (0.4088) loss 5.8437 (6.2658) grad_norm 1.9382 (3.8859) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:28:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][280/625] eta 0:02:21 lr 0.000025 wd 0.0500 time 0.3950 (0.4092) data time 0.0008 (0.0024) model time 0.3942 (0.4084) loss 6.4660 (6.2601) grad_norm 3.4836 (3.8492) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:28:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][290/625] eta 0:02:16 lr 0.000025 wd 0.0500 time 0.3953 (0.4088) data time 0.0008 (0.0023) model time 0.3945 (0.4079) loss 7.1718 (6.2637) grad_norm 2.4916 (3.8359) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:28:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][300/625] eta 0:02:12 lr 0.000025 wd 0.0500 time 0.3985 (0.4087) data time 0.0008 (0.0024) model time 0.3976 (0.4077) loss 6.2206 (6.2578) grad_norm 3.6182 (3.8363) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:28:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][310/625] eta 0:02:08 lr 0.000025 wd 0.0500 time 0.3949 (0.4084) data time 0.0006 (0.0023) model time 0.3943 (0.4073) loss 5.8671 (6.2531) grad_norm 6.1169 (3.8096) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:28:39 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [281/300][320/625] eta 0:02:04 lr 0.000025 wd 0.0500 time 0.6009 (0.4094) data time 0.0008 (0.0023) model time 0.6002 (0.4086) loss 6.0306 (6.2540) grad_norm 2.4981 (3.7742) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:28:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][330/625] eta 0:02:01 lr 0.000025 wd 0.0500 time 0.5947 (0.4119) data time 0.0008 (0.0023) model time 0.5940 (0.4114) loss 5.6759 (6.2543) grad_norm 2.2180 (3.8075) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:28:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][340/625] eta 0:01:58 lr 0.000025 wd 0.0500 time 0.3936 (0.4140) data time 0.0008 (0.0022) model time 0.3928 (0.4140) loss 6.9129 (6.2567) grad_norm 2.6987 (3.7862) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:28:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][350/625] eta 0:01:53 lr 0.000025 wd 0.0500 time 0.3948 (0.4140) data time 0.0006 (0.0022) model time 0.3943 (0.4139) loss 5.4087 (6.2557) grad_norm 2.4686 (3.7719) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:28:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][360/625] eta 0:01:49 lr 0.000025 wd 0.0500 time 0.3960 (0.4135) data time 0.0008 (0.0021) model time 0.3952 (0.4133) loss 5.2686 (6.2608) grad_norm 6.0423 (3.7665) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:29:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][370/625] eta 0:01:45 lr 0.000025 wd 0.0500 time 0.3949 (0.4131) data time 0.0008 (0.0021) model time 0.3941 (0.4128) loss 7.3574 (6.2688) grad_norm 2.5169 (3.8166) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:29:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][380/625] eta 0:01:41 lr 0.000025 wd 0.0500 time 0.3977 (0.4126) data time 0.0009 (0.0021) model time 0.3969 (0.4123) 
loss 7.1124 (6.2699) grad_norm 2.2341 (3.8110) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:29:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][390/625] eta 0:01:36 lr 0.000025 wd 0.0500 time 0.3983 (0.4123) data time 0.0006 (0.0021) model time 0.3977 (0.4118) loss 5.5483 (6.2747) grad_norm 2.6629 (3.7941) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:29:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][400/625] eta 0:01:32 lr 0.000025 wd 0.0500 time 0.3957 (0.4119) data time 0.0008 (0.0020) model time 0.3949 (0.4113) loss 6.8221 (6.2767) grad_norm 2.4482 (3.7692) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:29:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][410/625] eta 0:01:28 lr 0.000025 wd 0.0500 time 0.4019 (0.4115) data time 0.0008 (0.0020) model time 0.4011 (0.4110) loss 7.0155 (6.2715) grad_norm 4.5132 (3.7754) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:29:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][420/625] eta 0:01:24 lr 0.000025 wd 0.0500 time 0.3948 (0.4112) data time 0.0006 (0.0020) model time 0.3942 (0.4105) loss 6.7957 (6.2764) grad_norm 2.6966 (3.7646) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:29:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][430/625] eta 0:01:20 lr 0.000024 wd 0.0500 time 0.3953 (0.4108) data time 0.0008 (0.0019) model time 0.3946 (0.4101) loss 6.6344 (6.2800) grad_norm 2.6793 (3.7445) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:29:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][440/625] eta 0:01:15 lr 0.000024 wd 0.0500 time 0.3956 (0.4105) data time 0.0008 (0.0019) model time 0.3948 (0.4098) loss 6.7792 (6.2803) grad_norm 3.7419 (3.7435) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:29:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [281/300][450/625] eta 0:01:11 lr 0.000024 wd 0.0500 time 0.3942 (0.4102) data time 0.0006 (0.0019) model time 0.3936 (0.4094) loss 6.6701 (6.2871) grad_norm 2.7906 (3.7290) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:29:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][460/625] eta 0:01:07 lr 0.000024 wd 0.0500 time 0.3991 (0.4099) data time 0.0008 (0.0019) model time 0.3983 (0.4091) loss 5.3360 (6.2907) grad_norm 3.2907 (3.7105) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:29:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][470/625] eta 0:01:03 lr 0.000024 wd 0.0500 time 0.3974 (0.4096) data time 0.0009 (0.0018) model time 0.3965 (0.4088) loss 6.7018 (6.2867) grad_norm 2.7707 (3.6981) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:29:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][480/625] eta 0:00:59 lr 0.000024 wd 0.0500 time 0.3974 (0.4096) data time 0.0006 (0.0018) model time 0.3968 (0.4088) loss 7.2791 (6.2864) grad_norm 3.0461 (3.6763) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:29:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][490/625] eta 0:00:55 lr 0.000024 wd 0.0500 time 0.3905 (0.4094) data time 0.0010 (0.0018) model time 0.3895 (0.4085) loss 6.4111 (6.2860) grad_norm 2.8722 (3.6859) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:29:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][500/625] eta 0:00:51 lr 0.000024 wd 0.0500 time 0.3947 (0.4091) data time 0.0006 (0.0018) model time 0.3941 (0.4082) loss 6.6924 (6.2918) grad_norm 2.8165 (3.6889) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:29:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][510/625] eta 0:00:47 lr 0.000024 wd 0.0500 time 0.3943 (0.4089) data time 0.0009 (0.0018) model time 0.3934 (0.4080) loss 6.4457 (6.2900) grad_norm 
3.5934 (3.7098) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:30:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][520/625] eta 0:00:42 lr 0.000024 wd 0.0500 time 0.3979 (0.4087) data time 0.0008 (0.0017) model time 0.3972 (0.4077) loss 6.4366 (6.2873) grad_norm 4.0947 (3.7535) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:30:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][530/625] eta 0:00:38 lr 0.000024 wd 0.0500 time 0.3963 (0.4084) data time 0.0007 (0.0017) model time 0.3956 (0.4075) loss 5.7621 (6.2851) grad_norm 2.2703 (3.7668) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:30:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][540/625] eta 0:00:34 lr 0.000024 wd 0.0500 time 0.5849 (0.4091) data time 0.0008 (0.0017) model time 0.5841 (0.4082) loss 6.3552 (6.2856) grad_norm 2.6085 (3.7615) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:30:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][550/625] eta 0:00:30 lr 0.000024 wd 0.0500 time 0.5858 (0.4105) data time 0.0006 (0.0017) model time 0.5851 (0.4098) loss 5.9530 (6.2821) grad_norm 2.7005 (3.7566) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:30:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][560/625] eta 0:00:26 lr 0.000024 wd 0.0500 time 0.6051 (0.4120) data time 0.0006 (0.0017) model time 0.6045 (0.4114) loss 5.8207 (6.2846) grad_norm 2.9171 (3.7426) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:30:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][570/625] eta 0:00:22 lr 0.000024 wd 0.0500 time 0.3971 (0.4119) data time 0.0007 (0.0017) model time 0.3964 (0.4113) loss 6.2900 (6.2824) grad_norm 2.1738 (3.7996) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:30:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][580/625] eta 
0:00:18 lr 0.000024 wd 0.0500 time 0.3969 (0.4116) data time 0.0007 (0.0017) model time 0.3962 (0.4110) loss 6.3969 (6.2783) grad_norm 3.0263 (3.8032) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:30:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][590/625] eta 0:00:14 lr 0.000024 wd 0.0500 time 0.3942 (0.4114) data time 0.0008 (0.0016) model time 0.3934 (0.4107) loss 7.0522 (6.2771) grad_norm 3.1283 (3.7838) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:30:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][600/625] eta 0:00:10 lr 0.000024 wd 0.0500 time 0.3973 (0.4111) data time 0.0008 (0.0016) model time 0.3966 (0.4104) loss 5.1728 (6.2744) grad_norm 2.7086 (3.7863) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:30:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][610/625] eta 0:00:06 lr 0.000024 wd 0.0500 time 0.3939 (0.4109) data time 0.0005 (0.0016) model time 0.3934 (0.4102) loss 7.0748 (6.2749) grad_norm 3.1637 (3.7806) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:30:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][620/625] eta 0:00:02 lr 0.000024 wd 0.0500 time 0.3948 (0.4107) data time 0.0005 (0.0016) model time 0.3943 (0.4099) loss 7.0910 (6.2798) grad_norm 7.9625 (3.7753) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:30:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 281 training takes 0:04:16 [2024-07-25 13:30:44 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 13:30:45 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 13:30:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.458 (0.458) Loss 0.5420 (0.5420) Acc@1 90.576 (90.576) Acc@5 98.926 (98.926) Mem 14939MB [2024-07-25 13:30:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.122) Loss 0.8101 (0.6530) Acc@1 82.812 (87.691) Acc@5 97.168 (98.065) Mem 14939MB [2024-07-25 13:30:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.105) Loss 0.8989 (0.7551) Acc@1 80.029 (84.807) Acc@5 96.143 (97.149) Mem 14939MB [2024-07-25 13:30:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.391 Acc@5 97.117 [2024-07-25 13:30:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 13:30:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.728 (0.728) Loss 0.5410 (0.5410) Acc@1 90.283 (90.283) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 13:30:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.152) Loss 0.8076 (0.6534) Acc@1 82.910 (87.624) Acc@5 97.266 (98.042) Mem 14939MB [2024-07-25 13:30:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.120) Loss 0.9067 (0.7563) Acc@1 79.150 (84.712) Acc@5 96.191 (97.105) Mem 14939MB [2024-07-25 13:30:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.317 Acc@5 97.073 [2024-07-25 13:30:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-07-25 13:30:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.32% [2024-07-25 13:30:50 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 13:30:52 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 13:30:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][0/625] eta 0:07:40 lr 0.000024 wd 0.0500 time 0.7367 (0.7367) data time 0.3608 (0.3608) model time 0.0000 (0.0000) loss 6.9849 (6.9849) grad_norm 2.2904 (2.2904) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:30:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][10/625] eta 0:04:33 lr 0.000024 wd 0.0500 time 0.3963 (0.4442) data time 0.0010 (0.0336) model time 0.0000 (0.0000) loss 7.0091 (6.2375) grad_norm 4.8409 (3.3956) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:31:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][20/625] eta 0:04:14 lr 0.000024 wd 0.0500 time 0.3970 (0.4214) data time 0.0006 (0.0180) model time 0.0000 (0.0000) loss 5.9029 (6.3049) grad_norm 3.0548 (2.9912) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:31:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][30/625] eta 0:04:05 lr 0.000024 wd 0.0500 time 0.3951 (0.4130) data time 0.0008 (0.0124) model time 0.0000 (0.0000) loss 6.7661 (6.2268) grad_norm 3.2027 (3.3647) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:31:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][40/625] eta 0:03:59 lr 0.000024 wd 0.0500 time 0.3974 (0.4089) data time 0.0006 (0.0097) model time 0.0000 (0.0000) loss 5.2767 (6.2640) grad_norm 2.6845 (3.3442) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:31:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][50/625] eta 0:03:53 lr 0.000024 wd 0.0500 time 0.3996 (0.4064) data time 0.0006 (0.0079) model time 0.0000 (0.0000) loss 5.6079 (6.2518) grad_norm 2.8439 (3.4554) loss_scale 64.0000 (64.0000) mem 14939MB 
[2024-07-25 13:31:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][60/625] eta 0:03:48 lr 0.000024 wd 0.0500 time 0.3958 (0.4051) data time 0.0008 (0.0068) model time 0.3950 (0.3976) loss 6.5631 (6.2362) grad_norm 2.9453 (3.4722) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:31:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][70/625] eta 0:03:44 lr 0.000024 wd 0.0500 time 0.4036 (0.4040) data time 0.0008 (0.0059) model time 0.4028 (0.3969) loss 6.0247 (6.2272) grad_norm 4.9750 (3.4637) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:31:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][80/625] eta 0:03:39 lr 0.000024 wd 0.0500 time 0.3957 (0.4032) data time 0.0008 (0.0053) model time 0.3949 (0.3969) loss 5.3394 (6.2312) grad_norm 2.1541 (3.4262) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:31:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][90/625] eta 0:03:35 lr 0.000024 wd 0.0500 time 0.3974 (0.4025) data time 0.0009 (0.0048) model time 0.3965 (0.3965) loss 5.5764 (6.1952) grad_norm 2.0631 (3.4341) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:31:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][100/625] eta 0:03:30 lr 0.000024 wd 0.0500 time 0.3975 (0.4018) data time 0.0006 (0.0044) model time 0.3969 (0.3962) loss 6.6808 (6.2146) grad_norm 3.0594 (3.6313) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:31:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][110/625] eta 0:03:26 lr 0.000024 wd 0.0500 time 0.3961 (0.4014) data time 0.0006 (0.0041) model time 0.3955 (0.3962) loss 7.1156 (6.2251) grad_norm 2.2912 (4.7865) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:31:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][120/625] eta 0:03:22 lr 0.000024 wd 0.0500 time 0.3959 (0.4010) data time 
0.0007 (0.0038) model time 0.3952 (0.3962) loss 6.2798 (6.2332) grad_norm 5.9008 (4.6577) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:31:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][130/625] eta 0:03:18 lr 0.000024 wd 0.0500 time 0.4012 (0.4006) data time 0.0006 (0.0036) model time 0.4006 (0.3961) loss 6.7341 (6.2468) grad_norm 2.4896 (4.5592) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:31:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][140/625] eta 0:03:16 lr 0.000024 wd 0.0500 time 0.5239 (0.4057) data time 0.0007 (0.0034) model time 0.5233 (0.4045) loss 5.0456 (6.2251) grad_norm 4.1700 (4.5212) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:31:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][150/625] eta 0:03:16 lr 0.000024 wd 0.0500 time 0.6038 (0.4136) data time 0.0007 (0.0032) model time 0.6031 (0.4164) loss 7.1913 (6.2319) grad_norm 14.8967 (4.5349) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:31:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][160/625] eta 0:03:13 lr 0.000024 wd 0.0500 time 0.5170 (0.4167) data time 0.0008 (0.0031) model time 0.5162 (0.4206) loss 6.2829 (6.2149) grad_norm 3.9713 (4.5206) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:32:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][170/625] eta 0:03:09 lr 0.000024 wd 0.0500 time 0.3955 (0.4155) data time 0.0008 (0.0029) model time 0.3947 (0.4185) loss 6.0020 (6.2055) grad_norm 2.7085 (4.4679) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:32:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][180/625] eta 0:03:04 lr 0.000024 wd 0.0500 time 0.3936 (0.4144) data time 0.0007 (0.0028) model time 0.3930 (0.4167) loss 5.5793 (6.2132) grad_norm 2.6963 (4.3841) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:32:11 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][190/625] eta 0:02:59 lr 0.000024 wd 0.0500 time 0.3964 (0.4135) data time 0.0006 (0.0027) model time 0.3957 (0.4153) loss 6.3326 (6.2373) grad_norm 2.9274 (4.3170) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:32:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][200/625] eta 0:02:55 lr 0.000024 wd 0.0500 time 0.3950 (0.4126) data time 0.0006 (0.0026) model time 0.3944 (0.4139) loss 4.8275 (6.2625) grad_norm 2.4236 (4.2692) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:32:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][210/625] eta 0:02:50 lr 0.000024 wd 0.0500 time 0.3963 (0.4119) data time 0.0006 (0.0025) model time 0.3957 (0.4128) loss 6.4430 (6.2782) grad_norm 4.5636 (4.5657) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:32:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][220/625] eta 0:02:46 lr 0.000024 wd 0.0500 time 0.3969 (0.4112) data time 0.0009 (0.0025) model time 0.3960 (0.4118) loss 6.4573 (6.2799) grad_norm 2.3714 (4.4849) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:32:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][230/625] eta 0:02:42 lr 0.000024 wd 0.0500 time 0.3964 (0.4115) data time 0.0008 (0.0024) model time 0.3956 (0.4121) loss 6.4406 (6.2666) grad_norm 3.7193 (4.4382) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:32:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][240/625] eta 0:02:38 lr 0.000024 wd 0.0500 time 0.3964 (0.4110) data time 0.0008 (0.0023) model time 0.3956 (0.4113) loss 5.9488 (6.2630) grad_norm 2.3730 (4.4677) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:32:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][250/625] eta 0:02:33 lr 0.000024 wd 0.0500 time 0.3958 (0.4104) data time 0.0008 (0.0023) 
model time 0.3950 (0.4106) loss 6.8640 (6.2756) grad_norm 2.0333 (4.4097) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:32:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][260/625] eta 0:02:29 lr 0.000024 wd 0.0500 time 0.3955 (0.4099) data time 0.0007 (0.0022) model time 0.3948 (0.4099) loss 6.9115 (6.2685) grad_norm 3.4440 (4.3805) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:32:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][270/625] eta 0:02:25 lr 0.000024 wd 0.0500 time 0.3949 (0.4094) data time 0.0009 (0.0022) model time 0.3939 (0.4092) loss 6.0833 (6.2684) grad_norm 4.8445 (4.4762) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:32:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][280/625] eta 0:02:21 lr 0.000023 wd 0.0500 time 0.3940 (0.4089) data time 0.0008 (0.0021) model time 0.3931 (0.4086) loss 7.3244 (6.2673) grad_norm 2.5944 (4.4147) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:32:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][290/625] eta 0:02:16 lr 0.000023 wd 0.0500 time 0.3963 (0.4085) data time 0.0009 (0.0021) model time 0.3955 (0.4081) loss 6.2433 (6.2740) grad_norm 3.2958 (4.3776) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:32:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][300/625] eta 0:02:12 lr 0.000023 wd 0.0500 time 0.3963 (0.4081) data time 0.0009 (0.0020) model time 0.3954 (0.4076) loss 6.5467 (6.2767) grad_norm 2.4753 (4.3373) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:32:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][310/625] eta 0:02:08 lr 0.000023 wd 0.0500 time 0.3956 (0.4077) data time 0.0006 (0.0020) model time 0.3949 (0.4072) loss 6.3118 (6.2780) grad_norm 4.3205 (4.3136) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:33:02 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [282/300][320/625] eta 0:02:04 lr 0.000023 wd 0.0500 time 0.3982 (0.4074) data time 0.0009 (0.0020) model time 0.3973 (0.4068) loss 6.5391 (6.2812) grad_norm 3.8048 (4.2819) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:33:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][330/625] eta 0:02:00 lr 0.000023 wd 0.0500 time 0.3947 (0.4071) data time 0.0009 (0.0019) model time 0.3938 (0.4064) loss 6.7316 (6.2871) grad_norm 6.0174 (4.3396) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:33:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][340/625] eta 0:01:55 lr 0.000023 wd 0.0500 time 0.3940 (0.4068) data time 0.0008 (0.0019) model time 0.3931 (0.4060) loss 4.8181 (6.2858) grad_norm 4.6049 (4.3296) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:33:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][350/625] eta 0:01:51 lr 0.000023 wd 0.0500 time 0.3962 (0.4065) data time 0.0008 (0.0019) model time 0.3954 (0.4057) loss 7.3400 (6.2830) grad_norm 4.1882 (4.3818) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:33:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][360/625] eta 0:01:48 lr 0.000023 wd 0.0500 time 0.4063 (0.4082) data time 0.0006 (0.0018) model time 0.4057 (0.4077) loss 5.9177 (6.2802) grad_norm 2.0822 (4.3639) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:33:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][370/625] eta 0:01:44 lr 0.000023 wd 0.0500 time 0.6082 (0.4105) data time 0.0008 (0.0018) model time 0.6074 (0.4104) loss 6.3867 (6.2854) grad_norm 5.1658 (4.3313) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:33:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][380/625] eta 0:01:40 lr 0.000023 wd 0.0500 time 0.3956 (0.4115) data time 0.0007 (0.0018) model time 0.3949 (0.4114) 
loss 6.1619 (6.2855) grad_norm 3.6119 (4.3524) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:33:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][390/625] eta 0:01:36 lr 0.000023 wd 0.0500 time 0.3968 (0.4114) data time 0.0007 (0.0017) model time 0.3961 (0.4113) loss 6.9281 (6.2904) grad_norm 13.3393 (4.3572) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:33:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][400/625] eta 0:01:32 lr 0.000023 wd 0.0500 time 0.3973 (0.4111) data time 0.0008 (0.0017) model time 0.3965 (0.4109) loss 5.1177 (6.2895) grad_norm 2.4249 (4.3207) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:33:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][410/625] eta 0:01:28 lr 0.000023 wd 0.0500 time 0.3959 (0.4107) data time 0.0008 (0.0017) model time 0.3951 (0.4105) loss 5.5799 (6.2911) grad_norm 2.9042 (4.2773) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:33:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][420/625] eta 0:01:24 lr 0.000023 wd 0.0500 time 0.3945 (0.4104) data time 0.0007 (0.0017) model time 0.3938 (0.4102) loss 6.0663 (6.2919) grad_norm 2.6276 (4.2465) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:33:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][430/625] eta 0:01:19 lr 0.000023 wd 0.0500 time 0.3996 (0.4101) data time 0.0006 (0.0017) model time 0.3990 (0.4098) loss 6.2566 (6.2971) grad_norm 5.0462 (4.2470) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:33:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][440/625] eta 0:01:15 lr 0.000023 wd 0.0500 time 0.3951 (0.4098) data time 0.0006 (0.0016) model time 0.3945 (0.4094) loss 5.7653 (6.2984) grad_norm 3.4418 (4.2352) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:33:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [282/300][450/625] eta 0:01:11 lr 0.000023 wd 0.0500 time 0.4016 (0.4099) data time 0.0006 (0.0016) model time 0.4010 (0.4095) loss 6.7282 (6.2971) grad_norm 2.0983 (4.2170) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:34:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][460/625] eta 0:01:07 lr 0.000023 wd 0.0500 time 0.3966 (0.4096) data time 0.0007 (0.0016) model time 0.3959 (0.4092) loss 5.9618 (6.2974) grad_norm 2.2513 (4.2335) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:34:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][470/625] eta 0:01:03 lr 0.000023 wd 0.0500 time 0.3944 (0.4093) data time 0.0006 (0.0016) model time 0.3938 (0.4088) loss 6.5157 (6.3008) grad_norm 4.6613 (4.2256) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:34:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][480/625] eta 0:00:59 lr 0.000023 wd 0.0500 time 0.3973 (0.4090) data time 0.0007 (0.0016) model time 0.3967 (0.4085) loss 6.2406 (6.3091) grad_norm 2.9088 (4.2036) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:34:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][490/625] eta 0:00:55 lr 0.000023 wd 0.0500 time 0.3974 (0.4088) data time 0.0006 (0.0016) model time 0.3968 (0.4082) loss 7.7018 (6.3088) grad_norm 3.5657 (4.1829) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:34:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][500/625] eta 0:00:51 lr 0.000023 wd 0.0500 time 0.3986 (0.4086) data time 0.0008 (0.0016) model time 0.3978 (0.4080) loss 5.7232 (6.3101) grad_norm 2.7279 (4.1691) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:34:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][510/625] eta 0:00:46 lr 0.000023 wd 0.0500 time 0.3976 (0.4084) data time 0.0008 (0.0015) model time 0.3967 (0.4077) loss 6.7063 (6.3077) grad_norm 
38.5501 (4.2212) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:34:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][520/625] eta 0:00:42 lr 0.000023 wd 0.0500 time 0.3958 (0.4081) data time 0.0008 (0.0015) model time 0.3950 (0.4075) loss 7.0759 (6.3062) grad_norm 2.1245 (4.2032) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:34:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][530/625] eta 0:00:38 lr 0.000023 wd 0.0500 time 0.3960 (0.4079) data time 0.0007 (0.0015) model time 0.3953 (0.4073) loss 6.6885 (6.3053) grad_norm 2.9695 (4.1970) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:34:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][540/625] eta 0:00:34 lr 0.000023 wd 0.0500 time 0.3960 (0.4077) data time 0.0009 (0.0015) model time 0.3951 (0.4070) loss 6.0938 (6.3055) grad_norm 2.9148 (4.1959) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:34:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][550/625] eta 0:00:30 lr 0.000023 wd 0.0500 time 0.3962 (0.4075) data time 0.0010 (0.0015) model time 0.3952 (0.4068) loss 6.3906 (6.3112) grad_norm 3.0077 (4.1944) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:34:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][560/625] eta 0:00:26 lr 0.000023 wd 0.0500 time 0.3964 (0.4073) data time 0.0008 (0.0015) model time 0.3956 (0.4066) loss 6.1917 (6.3106) grad_norm 4.2419 (4.1866) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:34:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][570/625] eta 0:00:22 lr 0.000023 wd 0.0500 time 0.3957 (0.4071) data time 0.0008 (0.0015) model time 0.3950 (0.4064) loss 6.7274 (6.3135) grad_norm 3.4800 (4.1856) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:34:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][580/625] 
eta 0:00:18 lr 0.000023 wd 0.0500 time 0.5885 (0.4088) data time 0.0006 (0.0015) model time 0.5879 (0.4082) loss 6.1653 (6.3094) grad_norm 2.2157 (4.1694) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:34:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][590/625] eta 0:00:14 lr 0.000023 wd 0.0500 time 0.4047 (0.4098) data time 0.0008 (0.0015) model time 0.4040 (0.4093) loss 6.0476 (6.3029) grad_norm 2.6540 (4.1549) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:34:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][600/625] eta 0:00:10 lr 0.000023 wd 0.0500 time 0.3953 (0.4105) data time 0.0008 (0.0014) model time 0.3945 (0.4100) loss 6.3356 (6.3029) grad_norm 2.8074 (4.1377) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:35:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][610/625] eta 0:00:06 lr 0.000023 wd 0.0500 time 0.3960 (0.4105) data time 0.0003 (0.0014) model time 0.3957 (0.4100) loss 4.9969 (6.2998) grad_norm 2.4161 (4.1556) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:35:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][620/625] eta 0:00:02 lr 0.000023 wd 0.0500 time 0.3959 (0.4102) data time 0.0003 (0.0014) model time 0.3955 (0.4097) loss 7.4024 (6.2994) grad_norm 3.2217 (4.2364) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:35:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 282 training takes 0:04:16 [2024-07-25 13:35:08 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 13:35:09 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 13:35:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.451 (0.451) Loss 0.5425 (0.5425) Acc@1 90.576 (90.576) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 13:35:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.121) Loss 0.8076 (0.6545) Acc@1 82.812 (87.726) Acc@5 97.266 (98.047) Mem 14939MB [2024-07-25 13:35:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 0.8999 (0.7567) Acc@1 79.248 (84.801) Acc@5 96.094 (97.128) Mem 14939MB [2024-07-25 13:35:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.429 Acc@5 97.099 [2024-07-25 13:35:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 13:35:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 84.43% [2024-07-25 13:35:11 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 13:35:12 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 13:35:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.462 (0.462) Loss 0.5410 (0.5410) Acc@1 90.381 (90.381) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 13:35:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8076 (0.6535) Acc@1 82.910 (87.633) Acc@5 97.266 (98.056) Mem 14939MB [2024-07-25 13:35:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 0.9062 (0.7563) Acc@1 79.150 (84.717) Acc@5 96.191 (97.117) Mem 14939MB [2024-07-25 13:35:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.321 Acc@5 97.085 [2024-07-25 13:35:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-07-25 13:35:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.32% [2024-07-25 13:35:15 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 13:35:16 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 13:35:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][0/625] eta 0:07:45 lr 0.000023 wd 0.0500 time 0.7444 (0.7444) data time 0.3634 (0.3634) model time 0.0000 (0.0000) loss 5.5295 (5.5295) grad_norm 2.6780 (2.6780) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:35:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][10/625] eta 0:04:23 lr 0.000023 wd 0.0500 time 0.3945 (0.4281) data time 0.0006 (0.0338) model time 0.0000 (0.0000) loss 5.8272 (6.3643) grad_norm 2.9070 (2.7624) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:35:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][20/625] eta 0:04:09 lr 0.000023 wd 0.0500 time 0.3956 (0.4130) data time 0.0008 (0.0181) model time 0.0000 (0.0000) loss 6.2603 (6.3564) grad_norm 3.1124 (3.7946) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:35:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][30/625] eta 0:04:03 lr 0.000023 wd 0.0500 time 0.3976 (0.4092) data time 0.0006 (0.0125) model time 0.0000 (0.0000) loss 5.5213 (6.2727) grad_norm 2.2800 (3.6374) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:35:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][40/625] eta 0:03:57 lr 0.000023 wd 0.0500 time 0.4089 (0.4065) data time 0.0009 (0.0097) model time 0.0000 (0.0000) loss 6.1619 (6.2754) grad_norm 2.8914 (3.4984) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:35:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][50/625] eta 0:03:52 lr 0.000023 wd 0.0500 time 0.4056 (0.4049) data time 0.0006 (0.0079) model time 0.0000 (0.0000) loss 5.6603 (6.2914) grad_norm 5.2000 (3.5732) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:35:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][60/625] eta 0:03:47 lr 0.000023 wd 0.0500 time 0.3967 (0.4035) data time 
0.0006 (0.0068) model time 0.3961 (0.3950) loss 6.7497 (6.3540) grad_norm 3.8060 (3.4983) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:35:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][70/625] eta 0:03:43 lr 0.000023 wd 0.0500 time 0.3939 (0.4026) data time 0.0008 (0.0059) model time 0.3931 (0.3960) loss 6.0082 (6.3563) grad_norm 2.2717 (3.5137) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:35:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][80/625] eta 0:03:38 lr 0.000023 wd 0.0500 time 0.3945 (0.4018) data time 0.0008 (0.0053) model time 0.3937 (0.3956) loss 6.2644 (6.3527) grad_norm 3.7720 (3.5640) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:35:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][90/625] eta 0:03:34 lr 0.000023 wd 0.0500 time 0.3950 (0.4015) data time 0.0009 (0.0048) model time 0.3941 (0.3964) loss 7.1735 (6.3707) grad_norm 3.9608 (3.5155) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:35:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][100/625] eta 0:03:30 lr 0.000023 wd 0.0500 time 0.3958 (0.4011) data time 0.0006 (0.0044) model time 0.3952 (0.3964) loss 5.4577 (6.3468) grad_norm 2.0695 (3.5542) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:36:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][110/625] eta 0:03:26 lr 0.000023 wd 0.0500 time 0.3958 (0.4010) data time 0.0006 (0.0041) model time 0.3952 (0.3969) loss 7.0640 (6.3114) grad_norm 2.7850 (3.4724) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:36:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][120/625] eta 0:03:22 lr 0.000023 wd 0.0500 time 0.4389 (0.4010) data time 0.0008 (0.0038) model time 0.4381 (0.3973) loss 6.1720 (6.3009) grad_norm 4.7194 (3.5468) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:36:08 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][130/625] eta 0:03:18 lr 0.000023 wd 0.0500 time 0.3963 (0.4010) data time 0.0008 (0.0038) model time 0.3955 (0.3973) loss 5.1089 (6.3142) grad_norm 2.3218 (3.5166) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:36:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][140/625] eta 0:03:14 lr 0.000022 wd 0.0500 time 0.3938 (0.4009) data time 0.0009 (0.0036) model time 0.3929 (0.3974) loss 6.0369 (6.3085) grad_norm 2.8942 (3.5058) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:36:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][150/625] eta 0:03:10 lr 0.000022 wd 0.0500 time 0.3956 (0.4006) data time 0.0006 (0.0035) model time 0.3950 (0.3972) loss 5.7686 (6.3128) grad_norm 2.8937 (3.4950) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:36:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][160/625] eta 0:03:06 lr 0.000022 wd 0.0500 time 0.3930 (0.4004) data time 0.0008 (0.0033) model time 0.3921 (0.3971) loss 6.9592 (6.3240) grad_norm 3.1845 (3.4704) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:36:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][170/625] eta 0:03:02 lr 0.000022 wd 0.0500 time 0.3769 (0.4011) data time 0.0007 (0.0031) model time 0.3762 (0.3984) loss 6.8090 (6.3336) grad_norm 5.5407 (3.4900) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:36:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][180/625] eta 0:03:00 lr 0.000022 wd 0.0500 time 0.5831 (0.4063) data time 0.0008 (0.0030) model time 0.5823 (0.4057) loss 7.3232 (6.3359) grad_norm 2.9124 (3.4686) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:36:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][190/625] eta 0:02:59 lr 0.000022 wd 0.0500 time 0.5930 (0.4128) data time 0.0006 (0.0029) 
model time 0.5924 (0.4146) loss 5.2919 (6.3440) grad_norm 3.5755 (3.4859) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:36:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][200/625] eta 0:02:55 lr 0.000022 wd 0.0500 time 0.3937 (0.4139) data time 0.0007 (0.0028) model time 0.3930 (0.4159) loss 5.3353 (6.3409) grad_norm 6.6036 (3.5243) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:36:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][210/625] eta 0:02:51 lr 0.000022 wd 0.0500 time 0.3955 (0.4131) data time 0.0006 (0.0027) model time 0.3949 (0.4146) loss 6.4390 (6.3308) grad_norm 3.1162 (3.5308) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:36:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][220/625] eta 0:02:47 lr 0.000022 wd 0.0500 time 0.3976 (0.4124) data time 0.0008 (0.0026) model time 0.3968 (0.4136) loss 6.1840 (6.3204) grad_norm 4.4310 (3.5303) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:36:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][230/625] eta 0:02:42 lr 0.000022 wd 0.0500 time 0.3947 (0.4117) data time 0.0007 (0.0025) model time 0.3940 (0.4126) loss 6.7367 (6.3289) grad_norm 2.7174 (3.5803) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:36:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][240/625] eta 0:02:38 lr 0.000022 wd 0.0500 time 0.3988 (0.4111) data time 0.0006 (0.0025) model time 0.3982 (0.4118) loss 6.3998 (6.3414) grad_norm 4.1648 (3.5899) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:36:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][250/625] eta 0:02:33 lr 0.000022 wd 0.0500 time 0.3949 (0.4105) data time 0.0008 (0.0024) model time 0.3941 (0.4109) loss 5.7571 (6.3234) grad_norm 2.8362 (3.5746) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:37:03 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [283/300][260/625] eta 0:02:29 lr 0.000022 wd 0.0500 time 0.3925 (0.4100) data time 0.0006 (0.0023) model time 0.3919 (0.4102) loss 6.6188 (6.3219) grad_norm 5.6621 (3.5611) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:37:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][270/625] eta 0:02:25 lr 0.000022 wd 0.0500 time 0.3953 (0.4096) data time 0.0008 (0.0023) model time 0.3945 (0.4097) loss 6.8187 (6.3390) grad_norm 2.8471 (3.5725) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:37:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][280/625] eta 0:02:21 lr 0.000022 wd 0.0500 time 0.3975 (0.4092) data time 0.0008 (0.0022) model time 0.3966 (0.4091) loss 5.1530 (6.3295) grad_norm 2.8633 (3.6682) loss_scale 128.0000 (66.0498) mem 14939MB [2024-07-25 13:37:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][290/625] eta 0:02:16 lr 0.000022 wd 0.0500 time 0.3942 (0.4088) data time 0.0007 (0.0022) model time 0.3935 (0.4086) loss 6.3200 (6.3312) grad_norm 2.2629 (3.6547) loss_scale 128.0000 (68.1787) mem 14939MB [2024-07-25 13:37:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][300/625] eta 0:02:12 lr 0.000022 wd 0.0500 time 0.4105 (0.4085) data time 0.0006 (0.0021) model time 0.4100 (0.4082) loss 6.2538 (6.3227) grad_norm 3.3905 (3.6583) loss_scale 128.0000 (70.1661) mem 14939MB [2024-07-25 13:37:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][310/625] eta 0:02:08 lr 0.000022 wd 0.0500 time 0.3939 (0.4082) data time 0.0008 (0.0021) model time 0.3931 (0.4079) loss 5.6106 (6.3096) grad_norm 3.7935 (3.6434) loss_scale 128.0000 (72.0257) mem 14939MB [2024-07-25 13:37:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][320/625] eta 0:02:04 lr 0.000022 wd 0.0500 time 0.3959 (0.4078) data time 0.0006 (0.0021) model time 0.3954 (0.4074) 
loss 6.2643 (6.3101) grad_norm 5.3235 (3.6397) loss_scale 128.0000 (73.7695) mem 14939MB [2024-07-25 13:37:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][330/625] eta 0:02:00 lr 0.000022 wd 0.0500 time 0.3974 (0.4075) data time 0.0006 (0.0020) model time 0.3968 (0.4070) loss 5.6459 (6.3166) grad_norm 2.8017 (3.6231) loss_scale 128.0000 (75.4079) mem 14939MB [2024-07-25 13:37:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][340/625] eta 0:01:56 lr 0.000022 wd 0.0500 time 0.4020 (0.4072) data time 0.0008 (0.0020) model time 0.4012 (0.4067) loss 6.8698 (6.3207) grad_norm 4.3490 (3.6154) loss_scale 128.0000 (76.9501) mem 14939MB [2024-07-25 13:37:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][350/625] eta 0:01:51 lr 0.000022 wd 0.0500 time 0.3936 (0.4071) data time 0.0006 (0.0020) model time 0.3930 (0.4066) loss 5.6974 (6.3109) grad_norm 3.7697 (3.6100) loss_scale 128.0000 (78.4046) mem 14939MB [2024-07-25 13:37:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][360/625] eta 0:01:47 lr 0.000022 wd 0.0500 time 0.3978 (0.4069) data time 0.0006 (0.0019) model time 0.3972 (0.4062) loss 5.6780 (6.3078) grad_norm 6.3618 (3.6196) loss_scale 128.0000 (79.7784) mem 14939MB [2024-07-25 13:37:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][370/625] eta 0:01:43 lr 0.000022 wd 0.0500 time 0.3957 (0.4066) data time 0.0008 (0.0019) model time 0.3949 (0.4059) loss 4.7930 (6.3058) grad_norm 3.4027 (3.7303) loss_scale 128.0000 (81.0782) mem 14939MB [2024-07-25 13:37:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][380/625] eta 0:01:39 lr 0.000022 wd 0.0500 time 0.3997 (0.4063) data time 0.0006 (0.0019) model time 0.3991 (0.4056) loss 5.3081 (6.3066) grad_norm 3.3185 (3.7289) loss_scale 128.0000 (82.3097) mem 14939MB [2024-07-25 13:37:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 
367): INFO Train: [283/300][390/625] eta 0:01:35 lr 0.000022 wd 0.0500 time 0.5779 (0.4065) data time 0.0008 (0.0019) model time 0.5771 (0.4058) loss 6.3245 (6.3038) grad_norm 3.1774 (3.7247) loss_scale 128.0000 (83.4783) mem 14939MB [2024-07-25 13:38:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][400/625] eta 0:01:32 lr 0.000022 wd 0.0500 time 0.5695 (0.4089) data time 0.0008 (0.0018) model time 0.5688 (0.4086) loss 6.6669 (6.3049) grad_norm 4.4395 (3.7252) loss_scale 128.0000 (84.5885) mem 14939MB [2024-07-25 13:38:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][410/625] eta 0:01:28 lr 0.000022 wd 0.0500 time 0.5766 (0.4117) data time 0.0009 (0.0018) model time 0.5757 (0.4117) loss 6.4686 (6.3101) grad_norm 2.7344 (3.7448) loss_scale 128.0000 (85.6448) mem 14939MB [2024-07-25 13:38:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][420/625] eta 0:01:24 lr 0.000022 wd 0.0500 time 0.3961 (0.4122) data time 0.0009 (0.0018) model time 0.3953 (0.4122) loss 6.1083 (6.3069) grad_norm 2.1982 (3.7571) loss_scale 128.0000 (86.6508) mem 14939MB [2024-07-25 13:38:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][430/625] eta 0:01:20 lr 0.000022 wd 0.0500 time 0.3942 (0.4118) data time 0.0006 (0.0018) model time 0.3935 (0.4118) loss 5.6536 (6.3035) grad_norm 3.0566 (3.7401) loss_scale 128.0000 (87.6102) mem 14939MB [2024-07-25 13:38:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][440/625] eta 0:01:16 lr 0.000022 wd 0.0500 time 0.3942 (0.4114) data time 0.0008 (0.0017) model time 0.3934 (0.4114) loss 6.0397 (6.3024) grad_norm 2.5737 (3.7322) loss_scale 128.0000 (88.5261) mem 14939MB [2024-07-25 13:38:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][450/625] eta 0:01:11 lr 0.000022 wd 0.0500 time 0.3945 (0.4111) data time 0.0007 (0.0017) model time 0.3939 (0.4110) loss 6.1152 
(6.3002) grad_norm 4.1790 (3.7662) loss_scale 128.0000 (89.4013) mem 14939MB [2024-07-25 13:38:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][460/625] eta 0:01:07 lr 0.000022 wd 0.0500 time 0.3958 (0.4108) data time 0.0006 (0.0017) model time 0.3951 (0.4106) loss 5.6455 (6.2896) grad_norm 2.8596 (3.8949) loss_scale 128.0000 (90.2386) mem 14939MB [2024-07-25 13:38:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][470/625] eta 0:01:03 lr 0.000022 wd 0.0500 time 0.3861 (0.4105) data time 0.0009 (0.0017) model time 0.3852 (0.4102) loss 5.5403 (6.2888) grad_norm 2.2224 (3.8970) loss_scale 128.0000 (91.0403) mem 14939MB [2024-07-25 13:38:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][480/625] eta 0:00:59 lr 0.000022 wd 0.0500 time 0.3987 (0.4102) data time 0.0008 (0.0017) model time 0.3979 (0.4099) loss 6.5239 (6.2867) grad_norm 2.4164 (3.9053) loss_scale 128.0000 (91.8087) mem 14939MB [2024-07-25 13:38:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][490/625] eta 0:00:55 lr 0.000022 wd 0.0500 time 0.3933 (0.4099) data time 0.0009 (0.0016) model time 0.3924 (0.4096) loss 6.1665 (6.2911) grad_norm 3.1077 (3.9023) loss_scale 128.0000 (92.5458) mem 14939MB [2024-07-25 13:38:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][500/625] eta 0:00:51 lr 0.000022 wd 0.0500 time 0.3957 (0.4097) data time 0.0006 (0.0016) model time 0.3951 (0.4093) loss 6.2699 (6.2927) grad_norm 2.3528 (3.8951) loss_scale 128.0000 (93.2535) mem 14939MB [2024-07-25 13:38:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][510/625] eta 0:00:47 lr 0.000022 wd 0.0500 time 0.3963 (0.4094) data time 0.0006 (0.0016) model time 0.3957 (0.4090) loss 6.3096 (6.2916) grad_norm 2.7074 (3.8776) loss_scale 128.0000 (93.9335) mem 14939MB [2024-07-25 13:38:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO 
Train: [283/300][520/625] eta 0:00:42 lr 0.000022 wd 0.0500 time 0.3959 (0.4091) data time 0.0006 (0.0016) model time 0.3953 (0.4087) loss 6.0828 (6.3013) grad_norm 3.4680 (3.8625) loss_scale 128.0000 (94.5873) mem 14939MB [2024-07-25 13:38:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][530/625] eta 0:00:38 lr 0.000022 wd 0.0500 time 0.3941 (0.4089) data time 0.0008 (0.0016) model time 0.3933 (0.4084) loss 5.7614 (6.2983) grad_norm 5.0425 (3.8534) loss_scale 128.0000 (95.2166) mem 14939MB [2024-07-25 13:38:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][540/625] eta 0:00:34 lr 0.000022 wd 0.0500 time 0.3957 (0.4087) data time 0.0008 (0.0016) model time 0.3949 (0.4082) loss 7.3584 (6.2966) grad_norm 4.2149 (3.8430) loss_scale 128.0000 (95.8226) mem 14939MB [2024-07-25 13:39:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][550/625] eta 0:00:30 lr 0.000022 wd 0.0500 time 0.3945 (0.4085) data time 0.0009 (0.0016) model time 0.3936 (0.4080) loss 7.1917 (6.2961) grad_norm 2.4353 (3.8241) loss_scale 128.0000 (96.4065) mem 14939MB [2024-07-25 13:39:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][560/625] eta 0:00:26 lr 0.000022 wd 0.0500 time 0.3964 (0.4083) data time 0.0007 (0.0015) model time 0.3957 (0.4077) loss 6.2719 (6.2945) grad_norm 2.7263 (3.8052) loss_scale 128.0000 (96.9697) mem 14939MB [2024-07-25 13:39:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][570/625] eta 0:00:22 lr 0.000022 wd 0.0500 time 0.3979 (0.4081) data time 0.0006 (0.0015) model time 0.3973 (0.4075) loss 5.8607 (6.2964) grad_norm 3.2829 (3.7851) loss_scale 128.0000 (97.5131) mem 14939MB [2024-07-25 13:39:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][580/625] eta 0:00:18 lr 0.000022 wd 0.0500 time 0.3980 (0.4079) data time 0.0007 (0.0015) model time 0.3973 (0.4073) loss 7.1695 (6.2974) 
grad_norm 2.8787 (3.8374) loss_scale 128.0000 (98.0379) mem 14939MB [2024-07-25 13:39:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][590/625] eta 0:00:14 lr 0.000022 wd 0.0500 time 0.3951 (0.4077) data time 0.0006 (0.0015) model time 0.3945 (0.4071) loss 5.6510 (6.2949) grad_norm 4.0488 (3.8330) loss_scale 128.0000 (98.5448) mem 14939MB [2024-07-25 13:39:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][600/625] eta 0:00:10 lr 0.000022 wd 0.0500 time 0.3990 (0.4075) data time 0.0006 (0.0015) model time 0.3984 (0.4069) loss 6.7387 (6.2905) grad_norm 4.5972 (3.8378) loss_scale 128.0000 (99.0349) mem 14939MB [2024-07-25 13:39:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][610/625] eta 0:00:06 lr 0.000022 wd 0.0500 time 0.5848 (0.4080) data time 0.0006 (0.0015) model time 0.5842 (0.4073) loss 6.6392 (6.2898) grad_norm 2.3724 (3.8318) loss_scale 128.0000 (99.5090) mem 14939MB [2024-07-25 13:39:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][620/625] eta 0:00:02 lr 0.000022 wd 0.0500 time 0.5712 (0.4093) data time 0.0004 (0.0015) model time 0.5708 (0.4088) loss 6.9841 (6.2895) grad_norm 2.7974 (3.8242) loss_scale 128.0000 (99.9678) mem 14939MB [2024-07-25 13:39:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 283 training takes 0:04:16 [2024-07-25 13:39:32 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 13:39:33 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 13:39:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.439 (0.439) Loss 0.5479 (0.5479) Acc@1 90.234 (90.234) Acc@5 99.072 (99.072) Mem 14939MB [2024-07-25 13:39:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.118) Loss 0.8101 (0.6569) Acc@1 82.910 (87.646) Acc@5 97.119 (98.034) Mem 14939MB [2024-07-25 13:39:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.103) Loss 0.9033 (0.7590) Acc@1 79.395 (84.789) Acc@5 96.045 (97.117) Mem 14939MB [2024-07-25 13:39:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.425 Acc@5 97.077 [2024-07-25 13:39:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 13:39:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.746 (0.746) Loss 0.5410 (0.5410) Acc@1 90.430 (90.430) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 13:39:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.149) Loss 0.8081 (0.6534) Acc@1 82.861 (87.646) Acc@5 97.266 (98.056) Mem 14939MB [2024-07-25 13:39:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.119) Loss 0.9062 (0.7563) Acc@1 79.150 (84.733) Acc@5 96.191 (97.114) Mem 14939MB [2024-07-25 13:39:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.339 Acc@5 97.083 [2024-07-25 13:39:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-07-25 13:39:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.34% [2024-07-25 13:39:38 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-25 13:39:39 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 13:39:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][0/625] eta 0:07:54 lr 0.000022 wd 0.0500 time 0.7594 (0.7594) data time 0.3858 (0.3858) model time 0.0000 (0.0000) loss 6.2678 (6.2678) grad_norm 2.2398 (2.2398) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:39:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][10/625] eta 0:05:15 lr 0.000022 wd 0.0500 time 0.3951 (0.5122) data time 0.0007 (0.0359) model time 0.0000 (0.0000) loss 5.1255 (5.9294) grad_norm 2.7696 (2.7161) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:39:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][20/625] eta 0:04:39 lr 0.000022 wd 0.0500 time 0.4000 (0.4619) data time 0.0006 (0.0192) model time 0.0000 (0.0000) loss 6.1907 (6.1904) grad_norm 4.8713 (2.9414) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:39:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][30/625] eta 0:04:22 lr 0.000021 wd 0.0500 time 0.3960 (0.4408) data time 0.0009 (0.0133) model time 0.0000 (0.0000) loss 5.7620 (6.1826) grad_norm 2.9617 (3.0587) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:39:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][40/625] eta 0:04:11 lr 0.000021 wd 0.0500 time 0.3924 (0.4301) data time 0.0006 (0.0103) model time 0.0000 (0.0000) loss 6.2986 (6.2222) grad_norm 3.2628 (3.0636) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:40:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][50/625] eta 0:04:03 lr 0.000021 wd 0.0500 time 0.3985 (0.4236) data time 0.0007 (0.0084) model time 0.0000 (0.0000) loss 6.7869 (6.2377) grad_norm 2.9536 (3.1574) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 13:40:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][60/625] eta 0:03:56 lr 0.000021 wd 0.0500 time 0.3963 (0.4191) data time 0.0006 (0.0072) model time 0.3957 (0.3949) loss 5.7953 (6.2436) grad_norm 3.3693 (3.1959) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:40:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][70/625] eta 0:03:50 lr 0.000021 wd 0.0500 time 0.3966 (0.4159) data time 0.0008 (0.0063) model time 0.3958 (0.3954) loss 6.5556 (6.2451) grad_norm 3.6235 (3.1949) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:40:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][80/625] eta 0:03:45 lr 0.000021 wd 0.0500 time 0.4005 (0.4137) data time 0.0008 (0.0056) model time 0.3997 (0.3960) loss 5.0407 (6.2260) grad_norm 2.1433 (3.2898) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:40:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][90/625] eta 0:03:40 lr 0.000021 wd 0.0500 time 0.3954 (0.4119) data time 0.0008 (0.0051) model time 0.3946 (0.3962) loss 7.3988 (6.2632) grad_norm 2.9400 (3.5678) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:40:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][100/625] eta 0:03:35 lr 0.000021 wd 0.0500 time 0.3965 (0.4104) data time 0.0007 (0.0046) model time 0.3958 (0.3961) loss 6.2500 (6.3024) grad_norm 4.2793 (3.5415) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:40:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][110/625] eta 0:03:30 lr 0.000021 wd 0.0500 time 0.3977 (0.4091) data time 0.0008 (0.0043) model time 0.3968 (0.3959) loss 5.4490 (6.2988) grad_norm 2.5641 (3.4630) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:40:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][120/625] eta 0:03:26 lr 0.000021 wd 0.0500 time 
0.3943 (0.4081) data time 0.0009 (0.0040) model time 0.3935 (0.3960) loss 6.3426 (6.2990) grad_norm 3.4355 (3.4055) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:40:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][130/625] eta 0:03:21 lr 0.000021 wd 0.0500 time 0.3898 (0.4073) data time 0.0008 (0.0038) model time 0.3890 (0.3960) loss 5.6480 (6.2855) grad_norm 5.7297 (3.5273) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:40:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][140/625] eta 0:03:17 lr 0.000021 wd 0.0500 time 0.3961 (0.4065) data time 0.0009 (0.0036) model time 0.3952 (0.3959) loss 7.8170 (6.2797) grad_norm 3.0383 (3.5035) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:40:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][150/625] eta 0:03:13 lr 0.000021 wd 0.0500 time 0.3938 (0.4069) data time 0.0007 (0.0034) model time 0.3931 (0.3975) loss 5.8937 (6.2825) grad_norm 3.4971 (3.5150) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:40:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][160/625] eta 0:03:08 lr 0.000021 wd 0.0500 time 0.3943 (0.4062) data time 0.0007 (0.0032) model time 0.3936 (0.3973) loss 6.7771 (6.2918) grad_norm 2.7795 (3.6271) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:40:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][170/625] eta 0:03:04 lr 0.000021 wd 0.0500 time 0.3937 (0.4056) data time 0.0009 (0.0031) model time 0.3929 (0.3971) loss 6.1751 (6.3039) grad_norm 3.0436 (3.6620) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:40:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][180/625] eta 0:03:00 lr 0.000021 wd 0.0500 time 0.3975 (0.4051) data time 0.0007 (0.0030) model time 0.3968 (0.3970) loss 5.8755 (6.3064) grad_norm 2.7613 (3.6637) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 13:40:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][190/625] eta 0:02:56 lr 0.000021 wd 0.0500 time 0.3974 (0.4046) data time 0.0009 (0.0029) model time 0.3965 (0.3969) loss 6.0949 (6.2739) grad_norm 3.2299 (3.6308) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:41:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][200/625] eta 0:02:51 lr 0.000021 wd 0.0500 time 0.3985 (0.4042) data time 0.0007 (0.0028) model time 0.3979 (0.3968) loss 5.1119 (6.2611) grad_norm 2.6863 (3.5990) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:41:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][210/625] eta 0:02:49 lr 0.000021 wd 0.0500 time 0.5818 (0.4074) data time 0.0007 (0.0027) model time 0.5811 (0.4014) loss 6.5722 (6.2649) grad_norm 2.7991 (3.5921) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:41:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][220/625] eta 0:02:47 lr 0.000021 wd 0.0500 time 0.5631 (0.4128) data time 0.0007 (0.0026) model time 0.5624 (0.4087) loss 5.8920 (6.2760) grad_norm 17.5352 (3.6287) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:41:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][230/625] eta 0:02:44 lr 0.000021 wd 0.0500 time 0.3964 (0.4161) data time 0.0008 (0.0025) model time 0.3956 (0.4131) loss 5.2781 (6.2738) grad_norm 2.7990 (3.6186) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:41:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][240/625] eta 0:02:40 lr 0.000021 wd 0.0500 time 0.3949 (0.4159) data time 0.0009 (0.0024) model time 0.3940 (0.4130) loss 5.5959 (6.2684) grad_norm 3.6066 (3.6013) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:41:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][250/625] eta 0:02:35 lr 0.000021 wd 0.0500 time 
0.3963 (0.4151) data time 0.0006 (0.0024) model time 0.3957 (0.4121) loss 7.2005 (6.2680) grad_norm 2.8149 (3.5746) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:41:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][260/625] eta 0:02:31 lr 0.000021 wd 0.0500 time 0.3967 (0.4144) data time 0.0009 (0.0023) model time 0.3959 (0.4113) loss 6.9839 (6.2781) grad_norm 3.9251 (3.6332) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:41:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][270/625] eta 0:02:26 lr 0.000021 wd 0.0500 time 0.4023 (0.4138) data time 0.0007 (0.0023) model time 0.4016 (0.4107) loss 6.1689 (6.2722) grad_norm 4.1876 (3.6244) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:41:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][280/625] eta 0:02:22 lr 0.000021 wd 0.0500 time 0.3964 (0.4132) data time 0.0007 (0.0022) model time 0.3957 (0.4100) loss 7.2898 (6.2617) grad_norm 2.0201 (3.6636) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:41:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][290/625] eta 0:02:18 lr 0.000021 wd 0.0500 time 0.3974 (0.4126) data time 0.0008 (0.0022) model time 0.3966 (0.4094) loss 7.3469 (6.2666) grad_norm 2.6562 (3.6906) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:41:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][300/625] eta 0:02:13 lr 0.000021 wd 0.0500 time 0.3940 (0.4121) data time 0.0008 (0.0021) model time 0.3933 (0.4089) loss 6.2131 (6.2569) grad_norm 4.6768 (3.7128) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:41:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][310/625] eta 0:02:09 lr 0.000021 wd 0.0500 time 0.3969 (0.4116) data time 0.0008 (0.0021) model time 0.3960 (0.4084) loss 5.9091 (6.2554) grad_norm 2.5511 (3.7049) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 13:41:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][320/625] eta 0:02:05 lr 0.000021 wd 0.0500 time 0.4028 (0.4111) data time 0.0007 (0.0020) model time 0.4021 (0.4079) loss 5.6519 (6.2452) grad_norm 3.9341 (3.8398) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:41:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][330/625] eta 0:02:01 lr 0.000021 wd 0.0500 time 0.3962 (0.4107) data time 0.0006 (0.0020) model time 0.3956 (0.4075) loss 6.5193 (6.2489) grad_norm 5.1476 (3.8415) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:41:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][340/625] eta 0:01:56 lr 0.000021 wd 0.0500 time 0.3979 (0.4103) data time 0.0006 (0.0020) model time 0.3973 (0.4071) loss 5.9810 (6.2488) grad_norm 2.2143 (3.8215) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:42:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][350/625] eta 0:01:52 lr 0.000021 wd 0.0500 time 0.3951 (0.4099) data time 0.0007 (0.0019) model time 0.3944 (0.4067) loss 6.2943 (6.2578) grad_norm 3.8021 (3.8289) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:42:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][360/625] eta 0:01:48 lr 0.000021 wd 0.0500 time 0.3956 (0.4095) data time 0.0009 (0.0019) model time 0.3947 (0.4064) loss 6.3569 (6.2596) grad_norm 2.6676 (3.8071) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:42:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][370/625] eta 0:01:44 lr 0.000021 wd 0.0500 time 0.3973 (0.4097) data time 0.0008 (0.0019) model time 0.3966 (0.4067) loss 6.4032 (6.2708) grad_norm 4.0907 (3.9812) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:42:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][380/625] eta 0:01:40 lr 0.000021 wd 0.0500 time 
0.3958 (0.4094) data time 0.0008 (0.0019) model time 0.3950 (0.4064) loss 7.3193 (6.2753) grad_norm 3.1484 (4.0330) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:42:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][390/625] eta 0:01:36 lr 0.000021 wd 0.0500 time 0.3975 (0.4091) data time 0.0008 (0.0018) model time 0.3966 (0.4061) loss 6.7172 (6.2823) grad_norm 3.1266 (4.0221) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:42:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][400/625] eta 0:01:31 lr 0.000021 wd 0.0500 time 0.3968 (0.4088) data time 0.0008 (0.0018) model time 0.3960 (0.4058) loss 6.3954 (6.2810) grad_norm 3.0732 (3.9994) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:42:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][410/625] eta 0:01:27 lr 0.000021 wd 0.0500 time 0.4001 (0.4086) data time 0.0008 (0.0018) model time 0.3993 (0.4056) loss 5.4992 (6.2774) grad_norm 3.6080 (3.9932) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:42:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][420/625] eta 0:01:23 lr 0.000021 wd 0.0500 time 0.3971 (0.4083) data time 0.0008 (0.0018) model time 0.3963 (0.4053) loss 5.6608 (6.2841) grad_norm 2.7888 (4.0149) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:42:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][430/625] eta 0:01:19 lr 0.000021 wd 0.0500 time 0.3945 (0.4089) data time 0.0009 (0.0017) model time 0.3936 (0.4060) loss 5.5919 (6.2788) grad_norm 3.2468 (4.0361) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:42:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][440/625] eta 0:01:16 lr 0.000021 wd 0.0500 time 0.5827 (0.4113) data time 0.0008 (0.0017) model time 0.5819 (0.4089) loss 6.1086 (6.2824) grad_norm 2.5473 (4.0806) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 13:42:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][450/625] eta 0:01:12 lr 0.000021 wd 0.0500 time 0.5783 (0.4128) data time 0.0008 (0.0017) model time 0.5775 (0.4106) loss 6.9308 (6.2817) grad_norm 2.7082 (4.0608) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:42:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][460/625] eta 0:01:08 lr 0.000021 wd 0.0500 time 0.3954 (0.4128) data time 0.0006 (0.0017) model time 0.3948 (0.4107) loss 6.1847 (6.2851) grad_norm 2.3801 (4.1229) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:42:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][470/625] eta 0:01:03 lr 0.000021 wd 0.0500 time 0.4005 (0.4125) data time 0.0007 (0.0017) model time 0.3998 (0.4103) loss 7.9189 (6.2858) grad_norm 3.2164 (4.1010) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:42:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][480/625] eta 0:00:59 lr 0.000021 wd 0.0500 time 0.3978 (0.4122) data time 0.0007 (0.0016) model time 0.3971 (0.4100) loss 7.0667 (6.2871) grad_norm 2.4569 (4.0858) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:43:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][490/625] eta 0:00:55 lr 0.000021 wd 0.0500 time 0.3934 (0.4118) data time 0.0011 (0.0016) model time 0.3923 (0.4096) loss 6.3384 (6.2839) grad_norm 2.2888 (4.0610) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:43:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][500/625] eta 0:00:51 lr 0.000021 wd 0.0500 time 0.3963 (0.4115) data time 0.0008 (0.0016) model time 0.3955 (0.4093) loss 6.5414 (6.2860) grad_norm 4.5174 (4.0424) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:43:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][510/625] eta 0:00:47 lr 0.000021 wd 0.0500 time 
0.3959 (0.4112) data time 0.0006 (0.0016) model time 0.3953 (0.4090) loss 6.6776 (6.2873) grad_norm 2.3187 (4.0298) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:43:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][520/625] eta 0:00:43 lr 0.000021 wd 0.0500 time 0.3960 (0.4109) data time 0.0008 (0.0016) model time 0.3952 (0.4087) loss 7.0716 (6.2903) grad_norm 4.7663 (4.0156) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:43:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][530/625] eta 0:00:39 lr 0.000021 wd 0.0500 time 0.3950 (0.4107) data time 0.0006 (0.0016) model time 0.3944 (0.4084) loss 5.3395 (6.2842) grad_norm 4.4122 (4.0174) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:43:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][540/625] eta 0:00:34 lr 0.000021 wd 0.0500 time 0.3957 (0.4104) data time 0.0006 (0.0016) model time 0.3951 (0.4082) loss 7.0443 (6.2841) grad_norm 2.8942 (4.0299) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:43:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][550/625] eta 0:00:30 lr 0.000021 wd 0.0500 time 0.4013 (0.4102) data time 0.0008 (0.0015) model time 0.4004 (0.4080) loss 6.8135 (6.2835) grad_norm 2.5046 (4.0458) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:43:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][560/625] eta 0:00:26 lr 0.000021 wd 0.0500 time 0.3965 (0.4099) data time 0.0009 (0.0015) model time 0.3957 (0.4077) loss 6.2616 (6.2872) grad_norm 4.0805 (4.0240) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:43:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][570/625] eta 0:00:22 lr 0.000020 wd 0.0500 time 0.3957 (0.4097) data time 0.0008 (0.0015) model time 0.3949 (0.4075) loss 5.8384 (6.2879) grad_norm 4.7469 (4.0060) loss_scale 128.0000 (128.0000) mem 
14939MB [2024-07-25 13:43:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][580/625] eta 0:00:18 lr 0.000020 wd 0.0500 time 0.3946 (0.4095) data time 0.0007 (0.0015) model time 0.3939 (0.4073) loss 6.1911 (6.2814) grad_norm 3.0061 (4.0093) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:43:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][590/625] eta 0:00:14 lr 0.000020 wd 0.0500 time 0.3964 (0.4096) data time 0.0008 (0.0015) model time 0.3956 (0.4074) loss 6.9161 (6.2829) grad_norm 2.9097 (3.9919) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:43:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][600/625] eta 0:00:10 lr 0.000020 wd 0.0500 time 0.3954 (0.4094) data time 0.0007 (0.0015) model time 0.3947 (0.4072) loss 6.5160 (6.2847) grad_norm 78.1755 (4.1008) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:43:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][610/625] eta 0:00:06 lr 0.000020 wd 0.0500 time 0.3950 (0.4092) data time 0.0004 (0.0015) model time 0.3946 (0.4070) loss 6.1637 (6.2851) grad_norm 3.3962 (4.0934) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:43:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][620/625] eta 0:00:02 lr 0.000020 wd 0.0500 time 0.3956 (0.4089) data time 0.0006 (0.0015) model time 0.3950 (0.4068) loss 6.9453 (6.2840) grad_norm 2.6202 (4.1091) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:43:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 284 training takes 0:04:15 [2024-07-25 13:43:55 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 13:43:56 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 13:43:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.441 (0.441) Loss 0.5469 (0.5469) Acc@1 90.381 (90.381) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 13:43:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.8110 (0.6558) Acc@1 83.008 (87.682) Acc@5 97.314 (98.078) Mem 14939MB [2024-07-25 13:43:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.103) Loss 0.9048 (0.7582) Acc@1 80.127 (84.796) Acc@5 96.045 (97.121) Mem 14939MB [2024-07-25 13:43:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.443 Acc@5 97.081 [2024-07-25 13:43:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 13:43:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 84.44% [2024-07-25 13:43:58 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 13:43:59 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 13:44:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.447 (0.447) Loss 0.5410 (0.5410) Acc@1 90.479 (90.479) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 13:44:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.119) Loss 0.8081 (0.6533) Acc@1 82.764 (87.624) Acc@5 97.217 (98.047) Mem 14939MB [2024-07-25 13:44:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.103) Loss 0.9043 (0.7560) Acc@1 79.297 (84.719) Acc@5 96.143 (97.112) Mem 14939MB [2024-07-25 13:44:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.327 Acc@5 97.079 [2024-07-25 13:44:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-07-25 13:44:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][0/625] eta 0:13:11 lr 0.000020 wd 0.0500 time 1.2657 (1.2657) data time 0.6511 (0.6511) model time 0.0000 (0.0000) loss 6.7263 (6.7263) grad_norm 19.0319 (19.0319) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:44:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][10/625] eta 0:04:52 lr 0.000020 wd 0.0500 time 0.3957 (0.4750) data time 0.0007 (0.0599) model time 0.0000 (0.0000) loss 6.7987 (6.1706) grad_norm 2.7644 (4.5015) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:44:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][20/625] eta 0:04:24 lr 0.000020 wd 0.0500 time 0.3949 (0.4373) data time 0.0008 (0.0317) model time 0.0000 (0.0000) loss 6.2680 (6.3475) grad_norm 3.0513 (3.9500) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:44:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][30/625] eta 0:04:25 lr 0.000020 wd 0.0500 time 0.3961 (0.4470) data time 0.0008 (0.0218) model time 0.0000 (0.0000) loss 6.6013 (6.3383) grad_norm 
2.8724 (5.6127) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:44:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][40/625] eta 0:04:29 lr 0.000020 wd 0.0500 time 0.3967 (0.4609) data time 0.0006 (0.0166) model time 0.0000 (0.0000) loss 6.7272 (6.3605) grad_norm 2.0889 (5.1287) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:44:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][50/625] eta 0:04:23 lr 0.000020 wd 0.0500 time 0.3989 (0.4576) data time 0.0008 (0.0135) model time 0.0000 (0.0000) loss 6.0479 (6.3205) grad_norm 3.8186 (4.8215) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:44:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][60/625] eta 0:04:14 lr 0.000020 wd 0.0500 time 0.4039 (0.4507) data time 0.0008 (0.0115) model time 0.4031 (0.4149) loss 6.6827 (6.3527) grad_norm 2.6821 (4.5490) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:44:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][70/625] eta 0:04:05 lr 0.000020 wd 0.0500 time 0.3983 (0.4430) data time 0.0009 (0.0100) model time 0.3975 (0.4051) loss 5.9006 (6.3842) grad_norm 3.3916 (4.4246) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:44:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][80/625] eta 0:03:58 lr 0.000020 wd 0.0500 time 0.3949 (0.4374) data time 0.0006 (0.0088) model time 0.3943 (0.4023) loss 7.3199 (6.4162) grad_norm 2.6988 (4.3104) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:44:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][90/625] eta 0:03:51 lr 0.000020 wd 0.0500 time 0.3972 (0.4330) data time 0.0007 (0.0080) model time 0.3965 (0.4007) loss 7.5905 (6.4252) grad_norm 2.2458 (4.2422) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:44:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[285/300][100/625] eta 0:03:45 lr 0.000020 wd 0.0500 time 0.3973 (0.4294) data time 0.0009 (0.0073) model time 0.3965 (0.3997) loss 5.8525 (6.3717) grad_norm 2.0849 (4.1217) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:44:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][110/625] eta 0:03:39 lr 0.000020 wd 0.0500 time 0.3964 (0.4265) data time 0.0006 (0.0067) model time 0.3958 (0.3992) loss 6.4629 (6.3821) grad_norm 2.6723 (4.1459) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:44:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][120/625] eta 0:03:34 lr 0.000020 wd 0.0500 time 0.4028 (0.4254) data time 0.0009 (0.0062) model time 0.4020 (0.4011) loss 6.0913 (6.3936) grad_norm 2.4465 (4.0798) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:44:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][130/625] eta 0:03:29 lr 0.000020 wd 0.0500 time 0.3953 (0.4232) data time 0.0007 (0.0058) model time 0.3946 (0.4004) loss 5.9931 (6.3987) grad_norm 3.4441 (4.0410) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:45:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][140/625] eta 0:03:24 lr 0.000020 wd 0.0500 time 0.3944 (0.4212) data time 0.0009 (0.0054) model time 0.3935 (0.3998) loss 5.6226 (6.4022) grad_norm 2.8518 (3.9788) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:45:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][150/625] eta 0:03:19 lr 0.000020 wd 0.0500 time 0.3978 (0.4197) data time 0.0008 (0.0051) model time 0.3970 (0.3996) loss 6.7871 (6.4009) grad_norm 2.5770 (4.0085) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:45:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][160/625] eta 0:03:14 lr 0.000020 wd 0.0500 time 0.3965 (0.4183) data time 0.0009 (0.0049) model time 0.3956 (0.3993) loss 6.1239 (6.3919) grad_norm 
2.5036 (3.9743) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:45:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][170/625] eta 0:03:09 lr 0.000020 wd 0.0500 time 0.3959 (0.4170) data time 0.0010 (0.0046) model time 0.3949 (0.3989) loss 6.8617 (6.3782) grad_norm 2.5712 (3.9384) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:45:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][180/625] eta 0:03:05 lr 0.000020 wd 0.0500 time 0.3965 (0.4159) data time 0.0006 (0.0044) model time 0.3959 (0.3987) loss 6.2050 (6.3722) grad_norm 3.6788 (3.9248) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:45:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][190/625] eta 0:03:00 lr 0.000020 wd 0.0500 time 0.3975 (0.4150) data time 0.0008 (0.0043) model time 0.3967 (0.3986) loss 6.6827 (6.3676) grad_norm 2.1079 (3.9191) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:45:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][200/625] eta 0:02:56 lr 0.000020 wd 0.0500 time 0.3973 (0.4143) data time 0.0007 (0.0041) model time 0.3966 (0.3988) loss 4.6956 (6.3414) grad_norm 2.4574 (4.5479) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:45:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][210/625] eta 0:02:51 lr 0.000020 wd 0.0500 time 0.3981 (0.4136) data time 0.0009 (0.0039) model time 0.3972 (0.3987) loss 5.5500 (6.3375) grad_norm 1.8215 (4.4752) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:45:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][220/625] eta 0:02:47 lr 0.000020 wd 0.0500 time 0.4005 (0.4129) data time 0.0006 (0.0038) model time 0.3999 (0.3987) loss 7.2180 (6.3483) grad_norm 2.6543 (4.3947) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:45:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[285/300][230/625] eta 0:02:42 lr 0.000020 wd 0.0500 time 0.3952 (0.4125) data time 0.0008 (0.0037) model time 0.3944 (0.3989) loss 7.3253 (6.3486) grad_norm 2.5953 (4.3407) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:45:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][240/625] eta 0:02:38 lr 0.000020 wd 0.0500 time 0.5933 (0.4129) data time 0.0006 (0.0035) model time 0.5926 (0.4001) loss 5.7156 (6.3344) grad_norm 5.6840 (4.3378) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:45:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][250/625] eta 0:02:35 lr 0.000020 wd 0.0500 time 0.5739 (0.4156) data time 0.0006 (0.0034) model time 0.5733 (0.4040) loss 7.1485 (6.3437) grad_norm 4.1708 (4.3019) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:45:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][260/625] eta 0:02:32 lr 0.000020 wd 0.0500 time 0.3934 (0.4176) data time 0.0007 (0.0033) model time 0.3928 (0.4070) loss 5.3308 (6.3477) grad_norm 2.6673 (4.2830) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:45:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][270/625] eta 0:02:28 lr 0.000020 wd 0.0500 time 0.3989 (0.4189) data time 0.0007 (0.0032) model time 0.3983 (0.4091) loss 5.2755 (6.3472) grad_norm 7.4794 (4.2512) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:45:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][280/625] eta 0:02:24 lr 0.000020 wd 0.0500 time 0.3957 (0.4186) data time 0.0008 (0.0032) model time 0.3948 (0.4091) loss 5.5166 (6.3398) grad_norm 3.1631 (4.2140) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:46:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][290/625] eta 0:02:19 lr 0.000020 wd 0.0500 time 0.3956 (0.4178) data time 0.0006 (0.0031) model time 0.3951 (0.4085) loss 5.5695 (6.3465) grad_norm 
2.4584 (4.1838) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:46:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][300/625] eta 0:02:15 lr 0.000020 wd 0.0500 time 0.3958 (0.4171) data time 0.0007 (0.0030) model time 0.3951 (0.4080) loss 7.3347 (6.3562) grad_norm 2.2162 (4.1387) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:46:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][310/625] eta 0:02:11 lr 0.000020 wd 0.0500 time 0.4004 (0.4164) data time 0.0007 (0.0029) model time 0.3997 (0.4075) loss 6.9872 (6.3704) grad_norm 19.9642 (4.2751) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:46:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][320/625] eta 0:02:06 lr 0.000020 wd 0.0500 time 0.3956 (0.4158) data time 0.0009 (0.0029) model time 0.3947 (0.4071) loss 6.6643 (6.3772) grad_norm 3.5994 (4.2414) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:46:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][330/625] eta 0:02:02 lr 0.000020 wd 0.0500 time 0.3754 (0.4158) data time 0.0007 (0.0028) model time 0.3747 (0.4073) loss 6.9999 (6.3700) grad_norm 2.7241 (4.2083) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:46:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][340/625] eta 0:01:58 lr 0.000020 wd 0.0500 time 0.3959 (0.4153) data time 0.0007 (0.0028) model time 0.3952 (0.4070) loss 6.9158 (6.3715) grad_norm 2.2556 (4.2528) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:46:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][350/625] eta 0:01:54 lr 0.000020 wd 0.0500 time 0.3934 (0.4148) data time 0.0007 (0.0027) model time 0.3928 (0.4067) loss 6.4126 (6.3670) grad_norm 2.6692 (4.2180) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:46:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[285/300][360/625] eta 0:01:49 lr 0.000020 wd 0.0500 time 0.3932 (0.4143) data time 0.0010 (0.0027) model time 0.3922 (0.4063) loss 5.5528 (6.3736) grad_norm 1.9746 (4.1932) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:46:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][370/625] eta 0:01:45 lr 0.000020 wd 0.0500 time 0.3962 (0.4138) data time 0.0006 (0.0026) model time 0.3956 (0.4060) loss 5.4845 (6.3643) grad_norm 4.7625 (4.1829) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:46:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][380/625] eta 0:01:41 lr 0.000020 wd 0.0500 time 0.3953 (0.4134) data time 0.0008 (0.0026) model time 0.3944 (0.4057) loss 5.6959 (6.3623) grad_norm 2.6846 (4.1804) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:46:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][390/625] eta 0:01:37 lr 0.000020 wd 0.0500 time 0.3948 (0.4129) data time 0.0007 (0.0025) model time 0.3942 (0.4054) loss 6.9468 (6.3626) grad_norm 3.1004 (4.1563) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:46:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][400/625] eta 0:01:32 lr 0.000020 wd 0.0500 time 0.3956 (0.4126) data time 0.0007 (0.0025) model time 0.3949 (0.4051) loss 5.3009 (6.3633) grad_norm 3.0662 (4.1467) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:46:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][410/625] eta 0:01:28 lr 0.000020 wd 0.0500 time 0.3947 (0.4122) data time 0.0008 (0.0025) model time 0.3939 (0.4049) loss 6.3877 (6.3635) grad_norm 3.2200 (4.2088) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:46:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][420/625] eta 0:01:24 lr 0.000020 wd 0.0500 time 0.3956 (0.4119) data time 0.0009 (0.0024) model time 0.3947 (0.4047) loss 6.4592 (6.3645) grad_norm 
3.2583 (4.2054) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:46:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][430/625] eta 0:01:20 lr 0.000020 wd 0.0500 time 0.3966 (0.4115) data time 0.0006 (0.0024) model time 0.3960 (0.4044) loss 5.5446 (6.3550) grad_norm 15.7077 (4.2052) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:47:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][440/625] eta 0:01:16 lr 0.000020 wd 0.0500 time 0.3963 (0.4112) data time 0.0009 (0.0023) model time 0.3954 (0.4042) loss 6.8298 (6.3552) grad_norm 3.7790 (4.1909) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:47:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][450/625] eta 0:01:11 lr 0.000020 wd 0.0500 time 0.3955 (0.4108) data time 0.0007 (0.0023) model time 0.3948 (0.4040) loss 5.9342 (6.3565) grad_norm 2.6908 (4.1667) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:47:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][460/625] eta 0:01:07 lr 0.000020 wd 0.0500 time 0.3975 (0.4105) data time 0.0009 (0.0023) model time 0.3967 (0.4038) loss 6.7737 (6.3492) grad_norm 22.0067 (4.1828) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:47:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][470/625] eta 0:01:03 lr 0.000020 wd 0.0500 time 0.3934 (0.4122) data time 0.0009 (0.0022) model time 0.3925 (0.4058) loss 6.2783 (6.3508) grad_norm 2.8950 (4.1570) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:47:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][480/625] eta 0:01:00 lr 0.000020 wd 0.0500 time 0.5933 (0.4142) data time 0.0008 (0.0022) model time 0.5925 (0.4082) loss 5.9121 (6.3476) grad_norm 2.5395 (4.1565) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:47:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[285/300][490/625] eta 0:00:55 lr 0.000020 wd 0.0500 time 0.3936 (0.4147) data time 0.0008 (0.0022) model time 0.3928 (0.4089) loss 5.5159 (6.3417) grad_norm 2.4373 (4.1336) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:47:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][500/625] eta 0:00:51 lr 0.000020 wd 0.0500 time 0.3972 (0.4146) data time 0.0008 (0.0022) model time 0.3963 (0.4088) loss 5.4904 (6.3409) grad_norm 4.1207 (4.1242) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:47:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][510/625] eta 0:00:47 lr 0.000020 wd 0.0500 time 0.3971 (0.4142) data time 0.0008 (0.0021) model time 0.3962 (0.4085) loss 6.0745 (6.3360) grad_norm 2.3727 (4.1066) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:47:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][520/625] eta 0:00:43 lr 0.000019 wd 0.0500 time 0.3948 (0.4139) data time 0.0008 (0.0021) model time 0.3940 (0.4083) loss 7.1380 (6.3381) grad_norm 3.2420 (4.1067) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:47:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][530/625] eta 0:00:39 lr 0.000019 wd 0.0500 time 0.3949 (0.4135) data time 0.0008 (0.0021) model time 0.3941 (0.4080) loss 6.4141 (6.3418) grad_norm 3.6135 (4.1010) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:47:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][540/625] eta 0:00:35 lr 0.000019 wd 0.0500 time 0.3965 (0.4132) data time 0.0009 (0.0021) model time 0.3956 (0.4077) loss 5.7417 (6.3395) grad_norm 2.8992 (4.0992) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:47:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][550/625] eta 0:00:30 lr 0.000019 wd 0.0500 time 0.3946 (0.4129) data time 0.0006 (0.0020) model time 0.3939 (0.4075) loss 6.9314 (6.3400) grad_norm 
6.2142 (4.0861) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 13:47:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][560/625] eta 0:00:26 lr 0.000019 wd 0.0500 time 0.3921 (0.4129) data time 0.0008 (0.0020) model time 0.3913 (0.4076) loss 6.9858 (6.3409) grad_norm 2.3315 (inf) loss_scale 64.0000 (127.7718) mem 14939MB [2024-07-25 13:47:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][570/625] eta 0:00:22 lr 0.000019 wd 0.0500 time 0.3915 (0.4126) data time 0.0010 (0.0020) model time 0.3905 (0.4074) loss 6.3561 (6.3420) grad_norm 2.5653 (inf) loss_scale 64.0000 (126.6550) mem 14939MB [2024-07-25 13:48:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][580/625] eta 0:00:18 lr 0.000019 wd 0.0500 time 0.3969 (0.4124) data time 0.0007 (0.0020) model time 0.3962 (0.4072) loss 7.2852 (6.3436) grad_norm 2.1405 (inf) loss_scale 64.0000 (125.5766) mem 14939MB [2024-07-25 13:48:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][590/625] eta 0:00:14 lr 0.000019 wd 0.0500 time 0.3957 (0.4121) data time 0.0007 (0.0020) model time 0.3951 (0.4070) loss 5.4295 (6.3358) grad_norm 3.0481 (inf) loss_scale 64.0000 (124.5347) mem 14939MB [2024-07-25 13:48:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][600/625] eta 0:00:10 lr 0.000019 wd 0.0500 time 0.3969 (0.4119) data time 0.0006 (0.0019) model time 0.3963 (0.4068) loss 6.2519 (6.3352) grad_norm 2.5824 (inf) loss_scale 64.0000 (123.5275) mem 14939MB [2024-07-25 13:48:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][610/625] eta 0:00:06 lr 0.000019 wd 0.0500 time 0.3932 (0.4116) data time 0.0004 (0.0019) model time 0.3928 (0.4066) loss 4.9692 (6.3285) grad_norm 2.3970 (inf) loss_scale 64.0000 (122.5532) mem 14939MB [2024-07-25 13:48:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][620/625] eta 0:00:02 
lr 0.000019 wd 0.0500 time 0.3961 (0.4114) data time 0.0006 (0.0019) model time 0.3955 (0.4064) loss 6.3185 (6.3216) grad_norm 3.4053 (inf) loss_scale 64.0000 (121.6103) mem 14939MB [2024-07-25 13:48:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 285 training takes 0:04:17 [2024-07-25 13:48:19 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 13:48:19 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 13:48:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.457 (0.457) Loss 0.5435 (0.5435) Acc@1 90.381 (90.381) Acc@5 99.072 (99.072) Mem 14939MB [2024-07-25 13:48:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.8052 (0.6516) Acc@1 82.666 (87.753) Acc@5 97.168 (98.042) Mem 14939MB [2024-07-25 13:48:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 0.8950 (0.7538) Acc@1 79.297 (84.814) Acc@5 95.947 (97.108) Mem 14939MB [2024-07-25 13:48:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.433 Acc@5 97.079 [2024-07-25 13:48:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 13:48:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.891 (0.891) Loss 0.5410 (0.5410) Acc@1 90.479 (90.479) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 13:48:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.160) Loss 0.8071 (0.6533) Acc@1 82.715 (87.642) Acc@5 97.266 (98.047) Mem 14939MB [2024-07-25 13:48:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.125) Loss 0.9043 (0.7559) Acc@1 79.248 
(84.745) Acc@5 96.094 (97.117) Mem 14939MB [2024-07-25 13:48:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.353 Acc@5 97.083 [2024-07-25 13:48:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 13:48:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.35% [2024-07-25 13:48:25 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 13:48:27 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 13:48:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][0/625] eta 0:08:15 lr 0.000019 wd 0.0500 time 0.7928 (0.7928) data time 0.4164 (0.4164) model time 0.0000 (0.0000) loss 6.6747 (6.6747) grad_norm 3.6982 (3.6982) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:48:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][10/625] eta 0:04:25 lr 0.000019 wd 0.0500 time 0.3955 (0.4320) data time 0.0008 (0.0386) model time 0.0000 (0.0000) loss 5.2816 (6.2834) grad_norm 3.8130 (4.2264) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:48:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][20/625] eta 0:04:11 lr 0.000019 wd 0.0500 time 0.3913 (0.4149) data time 0.0007 (0.0206) model time 0.0000 (0.0000) loss 6.6151 (6.2329) grad_norm 7.2807 (3.9744) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:48:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][30/625] eta 0:04:03 lr 0.000019 wd 0.0500 time 0.3997 (0.4090) data time 0.0009 (0.0142) model time 0.0000 (0.0000) loss 6.6100 (6.1443) grad_norm 6.4910 (4.0246) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:48:43 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][40/625] eta 0:03:57 lr 0.000019 wd 0.0500 time 0.3958 (0.4058) data time 0.0008 (0.0110) model time 0.0000 (0.0000) loss 5.7890 (6.1327) grad_norm 2.0025 (3.9687) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:48:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][50/625] eta 0:03:52 lr 0.000019 wd 0.0500 time 0.3873 (0.4042) data time 0.0008 (0.0090) model time 0.0000 (0.0000) loss 6.2813 (6.1526) grad_norm 27.9599 (4.3634) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:48:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][60/625] eta 0:03:51 lr 0.000019 wd 0.0500 time 0.5407 (0.4093) data time 0.0006 (0.0077) model time 0.5401 (0.4345) loss 6.4597 (6.2371) grad_norm 2.2887 (4.2359) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:48:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][70/625] eta 0:03:53 lr 0.000019 wd 0.0500 time 0.6099 (0.4207) data time 0.0010 (0.0067) model time 0.6089 (0.4619) loss 6.6020 (6.2427) grad_norm 2.5673 (4.1392) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:49:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][80/625] eta 0:03:54 lr 0.000019 wd 0.0500 time 0.5989 (0.4308) data time 0.0007 (0.0060) model time 0.5982 (0.4752) loss 5.5634 (6.2375) grad_norm 2.2210 (4.1046) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:49:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][90/625] eta 0:03:50 lr 0.000019 wd 0.0500 time 0.5557 (0.4305) data time 0.0009 (0.0055) model time 0.5548 (0.4631) loss 6.3358 (6.2146) grad_norm 2.6138 (4.0308) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:49:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][100/625] eta 0:03:44 lr 0.000019 wd 0.0500 time 0.3961 (0.4271) data time 0.0007 (0.0050) model 
time 0.3954 (0.4497) loss 6.7635 (6.2407) grad_norm 4.8564 (4.0032) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:49:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][110/625] eta 0:03:38 lr 0.000019 wd 0.0500 time 0.3999 (0.4244) data time 0.0008 (0.0046) model time 0.3990 (0.4406) loss 6.3238 (6.2533) grad_norm 3.7078 (4.2505) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:49:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][120/625] eta 0:03:33 lr 0.000019 wd 0.0500 time 0.3911 (0.4220) data time 0.0009 (0.0043) model time 0.3902 (0.4341) loss 6.8951 (6.2652) grad_norm 1.9893 (4.2520) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:49:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][130/625] eta 0:03:27 lr 0.000019 wd 0.0500 time 0.3959 (0.4200) data time 0.0007 (0.0040) model time 0.3952 (0.4293) loss 7.0384 (6.2826) grad_norm 4.7594 (4.1973) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:49:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][140/625] eta 0:03:22 lr 0.000019 wd 0.0500 time 0.4025 (0.4184) data time 0.0009 (0.0038) model time 0.4016 (0.4256) loss 6.8243 (6.2893) grad_norm 3.1712 (4.2411) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:49:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][150/625] eta 0:03:18 lr 0.000019 wd 0.0500 time 0.3937 (0.4171) data time 0.0009 (0.0036) model time 0.3929 (0.4228) loss 4.9642 (6.2779) grad_norm 2.2360 (4.2758) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:49:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][160/625] eta 0:03:13 lr 0.000019 wd 0.0500 time 0.3962 (0.4158) data time 0.0007 (0.0034) model time 0.3955 (0.4203) loss 5.9120 (6.2813) grad_norm 5.0864 (4.2224) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:49:38 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [286/300][170/625] eta 0:03:08 lr 0.000019 wd 0.0500 time 0.3981 (0.4147) data time 0.0008 (0.0033) model time 0.3973 (0.4183) loss 6.5482 (6.2772) grad_norm 4.3198 (4.2056) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:49:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][180/625] eta 0:03:04 lr 0.000019 wd 0.0500 time 0.3933 (0.4137) data time 0.0008 (0.0032) model time 0.3924 (0.4166) loss 6.4935 (6.2800) grad_norm 3.2274 (4.1706) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:49:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][190/625] eta 0:02:59 lr 0.000019 wd 0.0500 time 0.3954 (0.4128) data time 0.0008 (0.0030) model time 0.3945 (0.4151) loss 5.4313 (6.2815) grad_norm 3.1544 (4.2079) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:49:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][200/625] eta 0:02:55 lr 0.000019 wd 0.0500 time 0.3954 (0.4120) data time 0.0007 (0.0029) model time 0.3946 (0.4138) loss 6.5843 (6.2752) grad_norm 2.3432 (4.2103) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:49:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][210/625] eta 0:02:50 lr 0.000019 wd 0.0500 time 0.3947 (0.4112) data time 0.0009 (0.0028) model time 0.3938 (0.4126) loss 5.9295 (6.2726) grad_norm 2.9794 (4.2148) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:49:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][220/625] eta 0:02:46 lr 0.000019 wd 0.0500 time 0.3962 (0.4105) data time 0.0009 (0.0027) model time 0.3953 (0.4115) loss 6.0131 (6.2767) grad_norm 4.0757 (4.1916) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:50:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][230/625] eta 0:02:41 lr 0.000019 wd 0.0500 time 0.3968 (0.4099) data time 0.0008 (0.0027) model time 0.3959 (0.4106) 
loss 5.0305 (6.2632) grad_norm 2.2908 (4.1765) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:50:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][240/625] eta 0:02:37 lr 0.000019 wd 0.0500 time 0.3944 (0.4093) data time 0.0006 (0.0026) model time 0.3938 (0.4098) loss 5.9291 (6.2617) grad_norm 3.0896 (4.1976) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:50:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][250/625] eta 0:02:33 lr 0.000019 wd 0.0500 time 0.3951 (0.4088) data time 0.0008 (0.0025) model time 0.3943 (0.4091) loss 5.0628 (6.2505) grad_norm 3.8753 (4.1696) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:50:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][260/625] eta 0:02:29 lr 0.000019 wd 0.0500 time 0.3971 (0.4084) data time 0.0009 (0.0025) model time 0.3962 (0.4085) loss 6.3065 (6.2439) grad_norm 3.0746 (4.1649) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:50:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][270/625] eta 0:02:24 lr 0.000019 wd 0.0500 time 0.3946 (0.4080) data time 0.0008 (0.0024) model time 0.3938 (0.4080) loss 6.1857 (6.2463) grad_norm 3.6195 (4.3364) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:50:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][280/625] eta 0:02:21 lr 0.000019 wd 0.0500 time 0.6085 (0.4088) data time 0.0008 (0.0023) model time 0.6077 (0.4089) loss 5.3272 (6.2474) grad_norm 4.1937 (4.2894) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:50:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][290/625] eta 0:02:17 lr 0.000019 wd 0.0500 time 0.3974 (0.4118) data time 0.0007 (0.0023) model time 0.3967 (0.4125) loss 6.1039 (6.2508) grad_norm 3.6825 (4.2540) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:50:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [286/300][300/625] eta 0:02:14 lr 0.000019 wd 0.0500 time 0.4018 (0.4137) data time 0.0009 (0.0022) model time 0.4009 (0.4148) loss 7.7618 (6.2609) grad_norm 1.8751 (4.2762) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:50:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][310/625] eta 0:02:10 lr 0.000019 wd 0.0500 time 0.5406 (0.4150) data time 0.0007 (0.0022) model time 0.5399 (0.4163) loss 7.1584 (6.2640) grad_norm 2.7905 (4.2545) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:50:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][320/625] eta 0:02:06 lr 0.000019 wd 0.0500 time 0.4048 (0.4144) data time 0.0006 (0.0022) model time 0.4042 (0.4155) loss 5.2968 (6.2654) grad_norm 3.5569 (4.2420) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:50:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][330/625] eta 0:02:02 lr 0.000019 wd 0.0500 time 0.3962 (0.4139) data time 0.0009 (0.0021) model time 0.3954 (0.4148) loss 6.0006 (6.2601) grad_norm 3.2672 (4.1970) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:50:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][340/625] eta 0:01:57 lr 0.000019 wd 0.0500 time 0.3973 (0.4134) data time 0.0008 (0.0021) model time 0.3965 (0.4141) loss 6.8577 (6.2564) grad_norm 4.5997 (4.1818) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:50:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][350/625] eta 0:01:53 lr 0.000019 wd 0.0500 time 0.3948 (0.4129) data time 0.0007 (0.0020) model time 0.3941 (0.4135) loss 6.2128 (6.2517) grad_norm 4.0565 (4.1540) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:50:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][360/625] eta 0:01:49 lr 0.000019 wd 0.0500 time 0.3957 (0.4125) data time 0.0006 (0.0020) model time 0.3951 (0.4130) loss 5.8811 (6.2572) grad_norm 
4.9541 (4.1254) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:51:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][370/625] eta 0:01:45 lr 0.000019 wd 0.0500 time 0.3956 (0.4120) data time 0.0006 (0.0020) model time 0.3950 (0.4124) loss 6.7508 (6.2548) grad_norm 4.3860 (4.2821) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:51:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][380/625] eta 0:01:40 lr 0.000019 wd 0.0500 time 0.3975 (0.4116) data time 0.0006 (0.0020) model time 0.3969 (0.4119) loss 5.5093 (6.2536) grad_norm 2.6934 (4.2879) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:51:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][390/625] eta 0:01:36 lr 0.000019 wd 0.0500 time 0.3962 (0.4112) data time 0.0008 (0.0019) model time 0.3954 (0.4114) loss 6.8566 (6.2566) grad_norm 2.1840 (4.3682) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:51:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][400/625] eta 0:01:32 lr 0.000019 wd 0.0500 time 0.3945 (0.4109) data time 0.0007 (0.0019) model time 0.3938 (0.4110) loss 5.5958 (6.2559) grad_norm 2.7852 (4.3303) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:51:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][410/625] eta 0:01:28 lr 0.000019 wd 0.0500 time 0.3966 (0.4105) data time 0.0008 (0.0019) model time 0.3958 (0.4105) loss 6.2216 (6.2555) grad_norm 3.3667 (4.3997) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:51:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][420/625] eta 0:01:24 lr 0.000019 wd 0.0500 time 0.3957 (0.4102) data time 0.0008 (0.0018) model time 0.3949 (0.4101) loss 5.7108 (6.2500) grad_norm 3.0411 (4.3812) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:51:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][430/625] eta 
0:01:19 lr 0.000019 wd 0.0500 time 0.3995 (0.4099) data time 0.0007 (0.0018) model time 0.3989 (0.4098) loss 6.3688 (6.2541) grad_norm 2.0594 (4.3612) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:51:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][440/625] eta 0:01:15 lr 0.000019 wd 0.0500 time 0.3955 (0.4096) data time 0.0007 (0.0018) model time 0.3948 (0.4094) loss 5.0993 (6.2550) grad_norm 3.9591 (4.3455) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:51:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][450/625] eta 0:01:11 lr 0.000019 wd 0.0500 time 0.3945 (0.4093) data time 0.0006 (0.0018) model time 0.3938 (0.4091) loss 6.0733 (6.2625) grad_norm 3.6291 (4.3457) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:51:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][460/625] eta 0:01:07 lr 0.000019 wd 0.0500 time 0.3956 (0.4090) data time 0.0009 (0.0018) model time 0.3947 (0.4087) loss 5.6887 (6.2643) grad_norm 2.1650 (4.3391) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:51:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][470/625] eta 0:01:03 lr 0.000019 wd 0.0500 time 0.3946 (0.4087) data time 0.0006 (0.0017) model time 0.3940 (0.4084) loss 6.7053 (6.2611) grad_norm 4.4062 (4.3190) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:51:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][480/625] eta 0:00:59 lr 0.000019 wd 0.0500 time 0.3989 (0.4084) data time 0.0006 (0.0017) model time 0.3983 (0.4081) loss 5.5586 (6.2600) grad_norm 4.4450 (4.2946) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:51:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][490/625] eta 0:00:55 lr 0.000019 wd 0.0500 time 0.3960 (0.4082) data time 0.0008 (0.0017) model time 0.3952 (0.4078) loss 5.9800 (6.2647) grad_norm 4.7719 (4.2817) loss_scale 64.0000 
(64.0000) mem 14939MB [2024-07-25 13:51:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][500/625] eta 0:00:51 lr 0.000019 wd 0.0500 time 0.3959 (0.4083) data time 0.0006 (0.0017) model time 0.3953 (0.4079) loss 5.7628 (6.2609) grad_norm 3.4835 (4.2543) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:51:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][510/625] eta 0:00:47 lr 0.000018 wd 0.0500 time 0.5717 (0.4101) data time 0.0008 (0.0017) model time 0.5709 (0.4100) loss 6.9739 (6.2630) grad_norm 4.0436 (4.2310) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:52:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][520/625] eta 0:00:43 lr 0.000018 wd 0.0500 time 0.3993 (0.4116) data time 0.0009 (0.0017) model time 0.3984 (0.4115) loss 5.7831 (6.2624) grad_norm 3.2498 (4.2656) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:52:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][530/625] eta 0:00:39 lr 0.000018 wd 0.0500 time 0.3937 (0.4126) data time 0.0009 (0.0016) model time 0.3927 (0.4127) loss 5.8909 (6.2661) grad_norm 5.5915 (4.2627) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:52:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][540/625] eta 0:00:35 lr 0.000018 wd 0.0500 time 0.3957 (0.4123) data time 0.0006 (0.0016) model time 0.3951 (0.4123) loss 6.2189 (6.2650) grad_norm 4.4137 (4.2550) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:52:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][550/625] eta 0:00:30 lr 0.000018 wd 0.0500 time 0.4022 (0.4120) data time 0.0007 (0.0016) model time 0.4015 (0.4120) loss 6.8064 (6.2637) grad_norm 3.3941 (4.2827) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:52:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][560/625] eta 0:00:26 lr 0.000018 wd 0.0500 time 
0.3963 (0.4117) data time 0.0008 (0.0016) model time 0.3955 (0.4116) loss 6.7743 (6.2671) grad_norm 3.5302 (4.2611) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:52:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][570/625] eta 0:00:22 lr 0.000018 wd 0.0500 time 0.3956 (0.4115) data time 0.0008 (0.0016) model time 0.3948 (0.4114) loss 7.0690 (6.2652) grad_norm 2.6553 (4.2473) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:52:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][580/625] eta 0:00:18 lr 0.000018 wd 0.0500 time 0.3959 (0.4112) data time 0.0008 (0.0016) model time 0.3951 (0.4110) loss 5.8628 (6.2675) grad_norm 5.7961 (4.2338) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:52:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][590/625] eta 0:00:14 lr 0.000018 wd 0.0500 time 0.4029 (0.4110) data time 0.0008 (0.0016) model time 0.4021 (0.4108) loss 6.1829 (6.2669) grad_norm 4.9372 (4.2856) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:52:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][600/625] eta 0:00:10 lr 0.000018 wd 0.0500 time 0.3963 (0.4107) data time 0.0009 (0.0015) model time 0.3954 (0.4105) loss 6.1272 (6.2642) grad_norm 4.3708 (4.2987) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:52:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][610/625] eta 0:00:06 lr 0.000018 wd 0.0500 time 0.3930 (0.4105) data time 0.0004 (0.0015) model time 0.3926 (0.4102) loss 5.7103 (6.2607) grad_norm 5.9963 (4.2958) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:52:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][620/625] eta 0:00:02 lr 0.000018 wd 0.0500 time 0.3941 (0.4102) data time 0.0005 (0.0015) model time 0.3935 (0.4099) loss 5.5908 (6.2582) grad_norm 3.6730 (4.3413) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 
13:52:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 286 training takes 0:04:16 [2024-07-25 13:52:43 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 13:52:44 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 13:52:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.5464 (0.5464) Acc@1 90.381 (90.381) Acc@5 98.975 (98.975) Mem 14939MB [2024-07-25 13:52:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8125 (0.6573) Acc@1 82.520 (87.651) Acc@5 97.168 (98.047) Mem 14939MB [2024-07-25 13:52:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 0.8989 (0.7593) Acc@1 79.736 (84.803) Acc@5 96.094 (97.138) Mem 14939MB [2024-07-25 13:52:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.405 Acc@5 97.101 [2024-07-25 13:52:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 13:52:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.915 (0.915) Loss 0.5415 (0.5415) Acc@1 90.576 (90.576) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 13:52:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.162) Loss 0.8076 (0.6532) Acc@1 82.715 (87.669) Acc@5 97.217 (98.056) Mem 14939MB [2024-07-25 13:52:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.126) Loss 0.9038 (0.7558) Acc@1 79.199 (84.749) Acc@5 96.094 (97.121) Mem 14939MB [2024-07-25 13:52:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.355 Acc@5 97.085 [2024-07-25 13:52:49 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 13:52:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.36% [2024-07-25 13:52:49 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 13:52:51 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 13:52:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][0/625] eta 0:08:07 lr 0.000018 wd 0.0500 time 0.7797 (0.7797) data time 0.4086 (0.4086) model time 0.0000 (0.0000) loss 6.1499 (6.1499) grad_norm 32.8339 (32.8339) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:52:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][10/625] eta 0:04:25 lr 0.000018 wd 0.0500 time 0.4018 (0.4315) data time 0.0007 (0.0379) model time 0.0000 (0.0000) loss 5.3874 (6.2367) grad_norm 2.4300 (6.4235) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:53:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][20/625] eta 0:04:10 lr 0.000018 wd 0.0500 time 0.3927 (0.4140) data time 0.0007 (0.0202) model time 0.0000 (0.0000) loss 6.7470 (6.3160) grad_norm 2.7789 (5.5710) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:53:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][30/625] eta 0:04:02 lr 0.000018 wd 0.0500 time 0.3958 (0.4081) data time 0.0006 (0.0139) model time 0.0000 (0.0000) loss 5.6705 (6.2466) grad_norm 2.7247 (4.9230) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:53:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][40/625] eta 0:03:57 lr 0.000018 wd 0.0500 time 0.3966 (0.4054) data time 0.0006 (0.0107) model time 0.0000 (0.0000) 
loss 5.6632 (6.1878) grad_norm 7.1999 (4.9756) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:53:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][50/625] eta 0:03:52 lr 0.000018 wd 0.0500 time 0.3939 (0.4035) data time 0.0007 (0.0088) model time 0.0000 (0.0000) loss 5.3346 (6.1858) grad_norm 3.1172 (4.5214) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:53:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][60/625] eta 0:03:48 lr 0.000018 wd 0.0500 time 0.3944 (0.4049) data time 0.0008 (0.0075) model time 0.3936 (0.4111) loss 7.0590 (6.2359) grad_norm 2.8875 (4.3995) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:53:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][70/625] eta 0:03:44 lr 0.000018 wd 0.0500 time 0.4029 (0.4037) data time 0.0006 (0.0066) model time 0.4024 (0.4034) loss 6.4041 (6.1769) grad_norm 4.2991 (4.2477) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:53:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][80/625] eta 0:03:39 lr 0.000018 wd 0.0500 time 0.3913 (0.4026) data time 0.0008 (0.0059) model time 0.3905 (0.4004) loss 6.9042 (6.1956) grad_norm 7.9391 (4.2730) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:53:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][90/625] eta 0:03:35 lr 0.000018 wd 0.0500 time 0.3947 (0.4019) data time 0.0008 (0.0053) model time 0.3939 (0.3991) loss 7.1385 (6.2355) grad_norm 5.7951 (4.2374) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:53:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][100/625] eta 0:03:34 lr 0.000018 wd 0.0500 time 0.5773 (0.4084) data time 0.0006 (0.0049) model time 0.5767 (0.4124) loss 6.6498 (6.2375) grad_norm 3.1498 (4.0880) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:53:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO 
Train: [287/300][110/625] eta 0:03:33 lr 0.000018 wd 0.0500 time 0.3953 (0.4148) data time 0.0008 (0.0045) model time 0.3945 (0.4234) loss 6.9528 (6.2569) grad_norm 2.6244 (4.0016) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:53:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][120/625] eta 0:03:31 lr 0.000018 wd 0.0500 time 0.3955 (0.4188) data time 0.0007 (0.0042) model time 0.3948 (0.4292) loss 6.3884 (6.2740) grad_norm 3.7888 (4.0245) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:53:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][130/625] eta 0:03:27 lr 0.000018 wd 0.0500 time 0.3950 (0.4193) data time 0.0008 (0.0040) model time 0.3942 (0.4284) loss 6.1745 (6.2887) grad_norm 2.5709 (3.9894) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:53:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][140/625] eta 0:03:22 lr 0.000018 wd 0.0500 time 0.3953 (0.4175) data time 0.0007 (0.0037) model time 0.3946 (0.4246) loss 6.0963 (6.2843) grad_norm 5.0592 (4.0177) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:53:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][150/625] eta 0:03:17 lr 0.000018 wd 0.0500 time 0.3944 (0.4161) data time 0.0006 (0.0035) model time 0.3938 (0.4217) loss 6.3252 (6.2967) grad_norm 3.5397 (3.9710) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:53:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][160/625] eta 0:03:12 lr 0.000018 wd 0.0500 time 0.3954 (0.4149) data time 0.0007 (0.0034) model time 0.3947 (0.4193) loss 6.4512 (6.2940) grad_norm 3.4299 (4.0608) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:54:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][170/625] eta 0:03:08 lr 0.000018 wd 0.0500 time 0.4002 (0.4138) data time 0.0009 (0.0032) model time 0.3994 (0.4173) loss 6.3101 (6.2954) grad_norm 
2.6025 (4.0373) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:54:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][180/625] eta 0:03:03 lr 0.000018 wd 0.0500 time 0.3946 (0.4129) data time 0.0007 (0.0031) model time 0.3940 (0.4157) loss 5.9714 (6.3092) grad_norm 3.8883 (3.9778) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:54:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][190/625] eta 0:02:59 lr 0.000018 wd 0.0500 time 0.3930 (0.4121) data time 0.0009 (0.0030) model time 0.3922 (0.4144) loss 6.5514 (6.3075) grad_norm 2.7966 (3.9305) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:54:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][200/625] eta 0:02:54 lr 0.000018 wd 0.0500 time 0.4047 (0.4114) data time 0.0007 (0.0029) model time 0.4041 (0.4132) loss 5.9728 (6.3118) grad_norm 2.8159 (3.8806) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:54:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][210/625] eta 0:02:50 lr 0.000018 wd 0.0500 time 0.3967 (0.4108) data time 0.0008 (0.0028) model time 0.3958 (0.4123) loss 6.2812 (6.2956) grad_norm 2.4185 (3.8369) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:54:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][220/625] eta 0:02:46 lr 0.000018 wd 0.0500 time 0.3955 (0.4102) data time 0.0008 (0.0027) model time 0.3947 (0.4113) loss 6.3160 (6.3015) grad_norm 7.0293 (3.8334) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:54:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][230/625] eta 0:02:41 lr 0.000018 wd 0.0500 time 0.3998 (0.4096) data time 0.0008 (0.0026) model time 0.3990 (0.4105) loss 5.6507 (6.2998) grad_norm 2.0323 (3.8230) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:54:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][240/625] eta 
0:02:37 lr 0.000018 wd 0.0500 time 0.3939 (0.4093) data time 0.0009 (0.0026) model time 0.3930 (0.4100) loss 7.1113 (6.3002) grad_norm 27.0798 (3.8826) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:54:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][250/625] eta 0:02:33 lr 0.000018 wd 0.0500 time 0.3950 (0.4088) data time 0.0008 (0.0025) model time 0.3942 (0.4092) loss 6.8043 (6.3041) grad_norm 5.1740 (3.8702) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:54:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][260/625] eta 0:02:29 lr 0.000018 wd 0.0500 time 0.4046 (0.4085) data time 0.0007 (0.0024) model time 0.4040 (0.4088) loss 5.9942 (6.3028) grad_norm 4.6987 (3.9415) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:54:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][270/625] eta 0:02:24 lr 0.000018 wd 0.0500 time 0.3945 (0.4081) data time 0.0008 (0.0024) model time 0.3937 (0.4082) loss 5.7648 (6.3186) grad_norm 2.2270 (3.9193) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:54:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][280/625] eta 0:02:20 lr 0.000018 wd 0.0500 time 0.3956 (0.4087) data time 0.0006 (0.0023) model time 0.3950 (0.4089) loss 5.2388 (6.3159) grad_norm 2.5624 (3.9044) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:54:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][290/625] eta 0:02:16 lr 0.000018 wd 0.0500 time 0.3971 (0.4082) data time 0.0008 (0.0023) model time 0.3963 (0.4084) loss 6.2631 (6.3234) grad_norm 2.7183 (3.9002) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:54:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][300/625] eta 0:02:12 lr 0.000018 wd 0.0500 time 0.3972 (0.4079) data time 0.0006 (0.0022) model time 0.3966 (0.4079) loss 6.1052 (6.3282) grad_norm 2.8848 (3.9015) loss_scale 
64.0000 (64.0000) mem 14939MB [2024-07-25 13:54:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][310/625] eta 0:02:08 lr 0.000018 wd 0.0500 time 0.3985 (0.4075) data time 0.0008 (0.0022) model time 0.3977 (0.4074) loss 5.5775 (6.3346) grad_norm 3.9880 (3.9202) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:55:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][320/625] eta 0:02:04 lr 0.000018 wd 0.0500 time 0.6007 (0.4088) data time 0.0009 (0.0021) model time 0.5998 (0.4089) loss 6.9432 (6.3383) grad_norm 3.1883 (3.8985) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:55:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][330/625] eta 0:02:01 lr 0.000018 wd 0.0500 time 0.3945 (0.4115) data time 0.0007 (0.0021) model time 0.3939 (0.4121) loss 5.7114 (6.3333) grad_norm 2.3596 (3.9033) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 13:55:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-25 13:55:11 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 13:55:12 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 13:57:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 13:57:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 13:57:46 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-07-25 13:58:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 13:58:13 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... [2024-07-25 13:58:13 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 13:58:13 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 13:58:13 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 287) [2024-07-25 13:58:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-25 13:58:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][340/625] eta 0:18:55 lr 0.000018 wd 0.0500 time 0.4071 (3.9843) data time 0.0009 (0.2390) model time 0.4062 (3.7453) loss 6.1493 (6.8681) grad_norm 4.9511 (4.3848) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:58:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][350/625] eta 0:05:41 lr 0.000018 wd 0.0500 time 0.4252 (1.2432) data time 0.0011 (0.0560) model time 0.4241 (1.1872) loss 6.0877 (6.3896) grad_norm 3.4766 (14.3925) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:58:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][360/625] eta 0:03:54 lr 0.000018 wd 0.0500 time 0.4148 (0.8838) data time 0.0009 (0.0321) model time 0.4139 (0.8517) loss 6.7813 (6.3942) grad_norm 2.7334 (9.4564) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:58:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][370/625] eta 0:03:09 lr 0.000018 wd 0.0500 time 0.4209 (0.7423) data time 0.0009 (0.0227) model time 0.4200 (0.7196) loss 6.6502 
(6.4496) grad_norm 6.5341 (8.2930) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:58:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][380/625] eta 0:02:43 lr 0.000018 wd 0.0500 time 0.4109 (0.6667) data time 0.0011 (0.0177) model time 0.4098 (0.6490) loss 5.9577 (6.4060) grad_norm 3.3018 (7.0557) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:58:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][390/625] eta 0:02:26 lr 0.000018 wd 0.0500 time 0.4134 (0.6235) data time 0.0010 (0.0146) model time 0.4123 (0.6090) loss 6.1798 (6.3642) grad_norm 4.3260 (6.3674) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:58:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][400/625] eta 0:02:13 lr 0.000018 wd 0.0500 time 0.4267 (0.5953) data time 0.0010 (0.0124) model time 0.4257 (0.5829) loss 6.2796 (6.3248) grad_norm 3.7493 (5.8821) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:58:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][410/625] eta 0:02:02 lr 0.000018 wd 0.0500 time 0.4195 (0.5707) data time 0.0011 (0.0109) model time 0.4185 (0.5598) loss 7.0651 (6.3151) grad_norm 2.1471 (5.6442) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:59:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][420/625] eta 0:01:53 lr 0.000018 wd 0.0500 time 0.4100 (0.5523) data time 0.0009 (0.0097) model time 0.4091 (0.5426) loss 4.9035 (6.2897) grad_norm 3.2255 (5.5571) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:59:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][430/625] eta 0:01:44 lr 0.000018 wd 0.0500 time 0.4113 (0.5376) data time 0.0009 (0.0087) model time 0.4105 (0.5288) loss 6.5924 (6.2919) grad_norm 2.5191 (5.3349) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:59:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[287/300][440/625] eta 0:01:37 lr 0.000018 wd 0.0500 time 0.4164 (0.5259) data time 0.0009 (0.0080) model time 0.4154 (0.5179) loss 7.3443 (6.3388) grad_norm 3.5727 (5.1245) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:59:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][450/625] eta 0:01:30 lr 0.000018 wd 0.0500 time 0.4125 (0.5161) data time 0.0008 (0.0074) model time 0.4117 (0.5087) loss 5.9923 (6.3297) grad_norm 4.3184 (5.0456) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:59:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][460/625] eta 0:01:23 lr 0.000018 wd 0.0500 time 0.4154 (0.5079) data time 0.0011 (0.0069) model time 0.4143 (0.5010) loss 5.8036 (6.3454) grad_norm 1.9509 (4.9127) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:59:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][470/625] eta 0:01:17 lr 0.000018 wd 0.0500 time 0.4106 (0.5007) data time 0.0011 (0.0064) model time 0.4095 (0.4943) loss 6.7144 (6.3365) grad_norm 3.1849 (4.8015) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:59:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][480/625] eta 0:01:11 lr 0.000018 wd 0.0500 time 0.4195 (0.4946) data time 0.0011 (0.0061) model time 0.4184 (0.4886) loss 6.9834 (6.3205) grad_norm 4.0128 (4.7024) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:59:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][490/625] eta 0:01:06 lr 0.000018 wd 0.0500 time 0.4195 (0.4895) data time 0.0009 (0.0057) model time 0.4187 (0.4838) loss 6.2044 (6.3132) grad_norm 3.5677 (4.6270) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:59:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][500/625] eta 0:01:00 lr 0.000018 wd 0.0500 time 0.4217 (0.4852) data time 0.0008 (0.0054) model time 0.4209 (0.4797) loss 4.9479 (6.3230) grad_norm 2.1904 
(4.5701) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:59:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][510/625] eta 0:00:55 lr 0.000018 wd 0.0500 time 0.4102 (0.4812) data time 0.0011 (0.0052) model time 0.4092 (0.4760) loss 7.1468 (6.3219) grad_norm 5.0548 (4.5268) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:59:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][520/625] eta 0:00:50 lr 0.000018 wd 0.0500 time 0.4182 (0.4776) data time 0.0011 (0.0050) model time 0.4171 (0.4727) loss 7.0414 (6.3128) grad_norm 3.1025 (4.4733) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:59:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][530/625] eta 0:00:45 lr 0.000018 wd 0.0500 time 0.4167 (0.4744) data time 0.0011 (0.0048) model time 0.4157 (0.4696) loss 5.7795 (6.3089) grad_norm 2.8739 (4.4673) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:59:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][540/625] eta 0:00:40 lr 0.000017 wd 0.0500 time 0.4157 (0.4715) data time 0.0009 (0.0046) model time 0.4148 (0.4669) loss 5.9296 (6.2909) grad_norm 2.1657 (4.3867) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 13:59:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][550/625] eta 0:00:35 lr 0.000017 wd 0.0500 time 0.4071 (0.4689) data time 0.0011 (0.0044) model time 0.4060 (0.4644) loss 5.1438 (6.2824) grad_norm 4.4689 (4.3544) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 14:00:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][560/625] eta 0:00:30 lr 0.000017 wd 0.0500 time 0.4107 (0.4665) data time 0.0009 (0.0043) model time 0.4098 (0.4622) loss 5.2922 (6.2787) grad_norm 2.5060 (4.3470) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 14:00:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][570/625] eta 
0:00:25 lr 0.000017 wd 0.0500 time 0.4153 (0.4644) data time 0.0011 (0.0041) model time 0.4142 (0.4602) loss 6.2358 (6.2779) grad_norm 3.0556 (4.3247) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 14:00:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][580/625] eta 0:00:20 lr 0.000017 wd 0.0500 time 0.4159 (0.4623) data time 0.0011 (0.0040) model time 0.4148 (0.4583) loss 7.5621 (6.2842) grad_norm 3.3272 (4.3026) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 14:00:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][590/625] eta 0:00:16 lr 0.000017 wd 0.0500 time 0.4093 (0.4605) data time 0.0010 (0.0039) model time 0.4082 (0.4566) loss 6.6993 (6.2782) grad_norm 6.1001 (4.2648) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 14:00:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][600/625] eta 0:00:11 lr 0.000017 wd 0.0500 time 0.4287 (0.4588) data time 0.0008 (0.0038) model time 0.4279 (0.4550) loss 5.2972 (6.2659) grad_norm 2.8845 (4.2384) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 14:00:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][610/625] eta 0:00:06 lr 0.000017 wd 0.0500 time 0.4136 (0.4580) data time 0.0006 (0.0037) model time 0.4131 (0.4543) loss 7.5447 (6.2696) grad_norm 2.4552 (4.1889) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 14:00:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][620/625] eta 0:00:02 lr 0.000017 wd 0.0500 time 0.4140 (0.4579) data time 0.0008 (0.0036) model time 0.4131 (0.4543) loss 7.1035 (6.2783) grad_norm 2.9572 (4.2544) loss_scale 64.0000 (64.0000) mem 14934MB [2024-07-25 14:00:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 287 training takes 0:02:11 [2024-07-25 14:00:29 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 14:00:33 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 14:00:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.484 (0.484) Loss 0.5420 (0.5420) Acc@1 90.381 (90.381) Acc@5 98.975 (98.975) Mem 14934MB [2024-07-25 14:00:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.127) Loss 0.8071 (0.6543) Acc@1 82.861 (87.749) Acc@5 97.021 (98.011) Mem 14934MB [2024-07-25 14:00:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.108) Loss 0.9014 (0.7559) Acc@1 79.541 (84.782) Acc@5 96.240 (97.119) Mem 14934MB [2024-07-25 14:00:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.421 Acc@5 97.085 [2024-07-25 14:00:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 14:00:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.815 (0.815) Loss 0.5415 (0.5415) Acc@1 90.527 (90.527) Acc@5 98.975 (98.975) Mem 14934MB [2024-07-25 14:00:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.163) Loss 0.8081 (0.6533) Acc@1 82.861 (87.700) Acc@5 97.266 (98.065) Mem 14934MB [2024-07-25 14:00:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.126) Loss 0.9028 (0.7558) Acc@1 79.150 (84.756) Acc@5 96.045 (97.121) Mem 14934MB [2024-07-25 14:00:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.357 Acc@5 97.087 [2024-07-25 14:00:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 14:00:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.36% [2024-07-25 14:00:41 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 14:00:42 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 14:00:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][0/625] eta 0:15:23 lr 0.000017 wd 0.0500 time 1.4779 (1.4779) data time 0.4055 (0.4055) model time 0.0000 (0.0000) loss 6.7265 (6.7265) grad_norm 2.4321 (2.4321) loss_scale 64.0000 (64.0000) mem 14943MB [2024-07-25 14:00:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][10/625] eta 0:05:14 lr 0.000017 wd 0.0500 time 0.4117 (0.5120) data time 0.0011 (0.0379) model time 0.0000 (0.0000) loss 6.6517 (5.9661) grad_norm 4.2669 (2.9401) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:00:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][20/625] eta 0:04:43 lr 0.000017 wd 0.0500 time 0.4113 (0.4680) data time 0.0008 (0.0204) model time 0.0000 (0.0000) loss 6.4446 (6.1232) grad_norm 3.5276 (3.1956) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:00:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][30/625] eta 0:04:29 lr 0.000017 wd 0.0500 time 0.4380 (0.4535) data time 0.0011 (0.0142) model time 0.0000 (0.0000) loss 5.9664 (6.3289) grad_norm 2.9069 (3.1645) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:01:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][40/625] eta 0:04:20 lr 0.000017 wd 0.0500 time 0.4162 (0.4452) data time 0.0008 (0.0110) model time 0.0000 (0.0000) loss 6.4833 (6.3608) grad_norm 2.8173 (3.6618) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:01:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][50/625] eta 0:04:13 lr 0.000017 wd 0.0500 time 0.4216 (0.4406) data time 0.0011 (0.0091) model time 0.0000 
(0.0000) loss 7.2959 (6.3066) grad_norm 2.4863 (3.4884) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:01:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][60/625] eta 0:04:07 lr 0.000017 wd 0.0500 time 0.4186 (0.4372) data time 0.0008 (0.0078) model time 0.4178 (0.4186) loss 6.1213 (6.3071) grad_norm 2.7712 (3.4347) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:01:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][70/625] eta 0:04:01 lr 0.000017 wd 0.0500 time 0.4127 (0.4343) data time 0.0010 (0.0068) model time 0.4117 (0.4171) loss 6.7874 (6.2894) grad_norm 2.7864 (3.4663) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:01:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][80/625] eta 0:03:55 lr 0.000017 wd 0.0500 time 0.4113 (0.4321) data time 0.0008 (0.0061) model time 0.4105 (0.4166) loss 5.9401 (6.2824) grad_norm 2.4702 (3.4254) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:01:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][90/625] eta 0:03:50 lr 0.000017 wd 0.0500 time 0.4231 (0.4307) data time 0.0010 (0.0056) model time 0.4221 (0.4170) loss 6.3010 (6.2942) grad_norm 4.6336 (3.3820) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:01:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][100/625] eta 0:03:45 lr 0.000017 wd 0.0500 time 0.4149 (0.4294) data time 0.0009 (0.0051) model time 0.4140 (0.4168) loss 5.8574 (6.2737) grad_norm 2.7314 (3.3973) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:01:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][110/625] eta 0:03:40 lr 0.000017 wd 0.0500 time 0.4144 (0.4283) data time 0.0008 (0.0047) model time 0.4136 (0.4167) loss 6.6942 (6.2755) grad_norm 2.3998 (3.3650) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:01:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 
367): INFO Train: [288/300][120/625] eta 0:03:35 lr 0.000017 wd 0.0500 time 0.4137 (0.4273) data time 0.0010 (0.0044) model time 0.4126 (0.4166) loss 6.6753 (6.2875) grad_norm 2.6784 (3.3567) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:01:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][130/625] eta 0:03:30 lr 0.000017 wd 0.0500 time 0.4128 (0.4262) data time 0.0008 (0.0042) model time 0.4120 (0.4160) loss 6.7049 (6.2686) grad_norm 3.0784 (3.3502) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:01:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][140/625] eta 0:03:27 lr 0.000017 wd 0.0500 time 0.4166 (0.4270) data time 0.0008 (0.0040) model time 0.4158 (0.4182) loss 6.6492 (6.2810) grad_norm 2.8859 (3.3614) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:01:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][150/625] eta 0:03:22 lr 0.000017 wd 0.0500 time 0.4156 (0.4264) data time 0.0011 (0.0038) model time 0.4146 (0.4180) loss 6.6824 (6.3068) grad_norm 3.7925 (3.4788) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:01:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][160/625] eta 0:03:17 lr 0.000017 wd 0.0500 time 0.4137 (0.4256) data time 0.0010 (0.0036) model time 0.4126 (0.4175) loss 5.8342 (6.2897) grad_norm 2.5631 (3.4304) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:01:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][170/625] eta 0:03:13 lr 0.000017 wd 0.0500 time 0.4132 (0.4250) data time 0.0008 (0.0035) model time 0.4123 (0.4173) loss 5.5069 (6.2591) grad_norm 2.2593 (3.4325) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:01:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][180/625] eta 0:03:08 lr 0.000017 wd 0.0500 time 0.4130 (0.4245) data time 0.0011 (0.0033) model time 0.4119 (0.4171) loss 6.5053 (6.2525) 
grad_norm 3.4967 (3.4825) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:02:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][190/625] eta 0:03:04 lr 0.000017 wd 0.0500 time 0.4133 (0.4240) data time 0.0010 (0.0032) model time 0.4123 (0.4169) loss 5.6208 (6.2369) grad_norm 2.9551 (3.4551) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:02:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][200/625] eta 0:03:00 lr 0.000017 wd 0.0500 time 0.4157 (0.4237) data time 0.0011 (0.0031) model time 0.4146 (0.4169) loss 6.5210 (6.2564) grad_norm 4.3771 (3.6258) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:02:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][210/625] eta 0:02:56 lr 0.000017 wd 0.0500 time 0.4126 (0.4255) data time 0.0008 (0.0030) model time 0.4118 (0.4196) loss 5.6417 (6.2553) grad_norm 2.7512 (3.6457) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:02:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][220/625] eta 0:02:52 lr 0.000017 wd 0.0500 time 0.4113 (0.4251) data time 0.0010 (0.0029) model time 0.4104 (0.4193) loss 6.1395 (6.2542) grad_norm 3.7725 (3.6221) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:02:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][230/625] eta 0:02:47 lr 0.000017 wd 0.0500 time 0.4103 (0.4247) data time 0.0010 (0.0029) model time 0.4093 (0.4190) loss 6.3486 (6.2595) grad_norm 4.0831 (3.6142) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:02:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][240/625] eta 0:02:43 lr 0.000017 wd 0.0500 time 0.4119 (0.4243) data time 0.0009 (0.0028) model time 0.4111 (0.4188) loss 6.0397 (6.2468) grad_norm 2.7514 (3.6588) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:02:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[288/300][250/625] eta 0:02:39 lr 0.000017 wd 0.0500 time 0.4113 (0.4240) data time 0.0008 (0.0027) model time 0.4105 (0.4187) loss 5.8036 (6.2361) grad_norm 2.8758 (3.6579) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:02:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][260/625] eta 0:02:34 lr 0.000017 wd 0.0500 time 0.4146 (0.4237) data time 0.0008 (0.0026) model time 0.4137 (0.4185) loss 6.9372 (6.2338) grad_norm 2.6889 (3.6378) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:02:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][270/625] eta 0:02:30 lr 0.000017 wd 0.0500 time 0.4142 (0.4234) data time 0.0011 (0.0026) model time 0.4131 (0.4184) loss 6.4400 (6.2484) grad_norm 3.0550 (3.6087) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:02:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][280/625] eta 0:02:26 lr 0.000017 wd 0.0500 time 0.4227 (0.4232) data time 0.0009 (0.0025) model time 0.4219 (0.4183) loss 6.8683 (6.2540) grad_norm 2.7545 (3.6102) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:02:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][290/625] eta 0:02:21 lr 0.000017 wd 0.0500 time 0.4155 (0.4229) data time 0.0011 (0.0025) model time 0.4144 (0.4181) loss 6.0269 (6.2491) grad_norm 3.6791 (3.6376) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:02:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][300/625] eta 0:02:17 lr 0.000017 wd 0.0500 time 0.4126 (0.4228) data time 0.0010 (0.0024) model time 0.4116 (0.4181) loss 7.0547 (6.2624) grad_norm 2.5521 (3.6098) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:02:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][310/625] eta 0:02:13 lr 0.000017 wd 0.0500 time 0.4143 (0.4226) data time 0.0011 (0.0024) model time 0.4133 (0.4179) loss 6.3590 (6.2643) grad_norm 2.8224 
(3.7510) loss_scale 64.0000 (64.0000) mem 14941MB [2024-07-25 14:02:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-25 14:02:57 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 14:02:58 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 14:04:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 14:04:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 14:04:58 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-25 14:05:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 14:05:08 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... 
[2024-07-25 14:05:08 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 14:05:08 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 14:05:08 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 288) [2024-07-25 14:05:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-25 14:05:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][320/625] eta 0:17:02 lr 0.000017 wd 0.0500 time 0.3990 (3.3537) data time 0.0007 (0.1981) model time 0.3983 (3.1556) loss 5.6713 (6.3382) grad_norm 2.0724 (3.1443) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 14:05:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][330/625] eta 0:05:17 lr 0.000017 wd 0.0500 time 0.3950 (1.0773) data time 0.0008 (0.0464) model time 0.3942 (1.0309) loss 6.3277 (6.3440) grad_norm 9.5153 (4.2025) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 14:05:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][340/625] eta 0:03:42 lr 0.000017 wd 0.0500 time 0.3966 (0.7812) data time 0.0006 (0.0266) model time 0.3960 (0.7547) loss 6.9516 (6.4814) grad_norm 3.3203 (3.9021) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 14:05:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][350/625] eta 0:03:02 lr 0.000017 wd 0.0500 time 0.4023 (0.6648) data time 0.0007 (0.0188) model time 0.4016 (0.6460) loss 6.4533 (6.4775) grad_norm 2.9608 (4.1998) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 14:05:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][360/625] eta 0:02:39 lr 0.000017 wd 0.0500 time 0.3980 (0.6027) data time 0.0008 (0.0146) model time 0.3971 (0.5881) loss 6.5385 (6.4073) grad_norm 3.5562 (4.5089) loss_scale 64.0000 (64.0000) mem 14931MB 
[2024-07-25 14:05:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][370/625] eta 0:02:24 lr 0.000017 wd 0.0500 time 0.3927 (0.5678) data time 0.0009 (0.0120) model time 0.3918 (0.5558) loss 6.4951 (6.4132) grad_norm 5.4963 (4.4408) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 14:05:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][380/625] eta 0:02:13 lr 0.000017 wd 0.0500 time 0.3970 (0.5451) data time 0.0009 (0.0102) model time 0.3961 (0.5348) loss 6.1098 (6.3621) grad_norm 2.6437 (4.3071) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 14:05:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][390/625] eta 0:02:03 lr 0.000017 wd 0.0500 time 0.3976 (0.5252) data time 0.0008 (0.0090) model time 0.3968 (0.5162) loss 7.2596 (6.3641) grad_norm 2.1201 (4.2511) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 14:05:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][400/625] eta 0:01:54 lr 0.000017 wd 0.0500 time 0.3984 (0.5099) data time 0.0006 (0.0080) model time 0.3978 (0.5019) loss 5.4376 (6.3526) grad_norm 2.6423 (4.1410) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 14:05:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][410/625] eta 0:01:47 lr 0.000017 wd 0.0500 time 0.3989 (0.4979) data time 0.0006 (0.0072) model time 0.3983 (0.4907) loss 6.6947 (6.3425) grad_norm 2.2161 (4.4570) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 14:06:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][420/625] eta 0:01:40 lr 0.000017 wd 0.0500 time 0.4007 (0.4883) data time 0.0006 (0.0066) model time 0.4001 (0.4817) loss 7.0561 (6.3691) grad_norm 4.1445 (4.3492) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 14:06:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][430/625] eta 0:01:33 lr 0.000017 wd 0.0500 time 0.3970 (0.4803) data 
time 0.0007 (0.0061) model time 0.3963 (0.4742) loss 5.4490 (6.3460) grad_norm 33.2605 (4.4918) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 14:06:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][440/625] eta 0:01:27 lr 0.000017 wd 0.0500 time 0.3966 (0.4738) data time 0.0008 (0.0057) model time 0.3959 (0.4682) loss 6.5970 (6.3488) grad_norm 2.1139 (4.3465) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 14:06:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-25 14:06:12 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 14:06:16 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 14:10:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 14:10:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 14:10:55 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-25 14:11:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 14:11:09 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... 
[2024-07-25 14:11:09 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 14:11:09 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 14:11:09 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 288) [2024-07-25 14:11:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-25 14:11:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][450/625] eta 0:06:46 lr 0.000017 wd 0.0500 time 0.3902 (2.3248) data time 0.0012 (0.1457) model time 0.3890 (2.1791) loss 7.1825 (7.0609) grad_norm 2.3956 (4.0480) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 14:11:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][460/625] eta 0:03:16 lr 0.000017 wd 0.0500 time 0.3908 (1.1926) data time 0.0009 (0.0606) model time 0.3899 (1.1321) loss 6.5764 (6.7924) grad_norm 3.8751 (4.1251) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 14:11:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][470/625] eta 0:02:19 lr 0.000017 wd 0.0500 time 0.3833 (0.8972) data time 0.0007 (0.0385) model time 0.3826 (0.8587) loss 6.8467 (6.7205) grad_norm 3.8297 (inf) loss_scale 32.0000 (58.0741) mem 14931MB [2024-07-25 14:11:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][480/625] eta 0:01:50 lr 0.000017 wd 0.0500 time 0.3869 (0.7627) data time 0.0010 (0.0286) model time 0.3859 (0.7341) loss 5.7573 (6.6228) grad_norm 6.8366 (inf) loss_scale 32.0000 (51.0270) mem 14931MB [2024-07-25 14:11:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][490/625] eta 0:01:32 lr 0.000017 wd 0.0500 time 0.4499 (0.6846) data time 0.0008 (0.0227) model time 0.4491 (0.6619) loss 6.1351 (6.5412) grad_norm 3.3417 (inf) loss_scale 32.0000 (46.9787) mem 14931MB [2024-07-25 
14:11:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][500/625] eta 0:01:20 lr 0.000017 wd 0.0500 time 0.3897 (0.6439) data time 0.0010 (0.0189) model time 0.3887 (0.6250) loss 6.1813 (6.5102) grad_norm 3.9868 (inf) loss_scale 32.0000 (44.3509) mem 14931MB [2024-07-25 14:11:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-25 14:11:55 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 14:11:58 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 17:57:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 17:57:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 17:58:02 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-25 17:58:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 17:58:14 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... 
[2024-07-25 17:58:15 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 17:58:15 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 17:59:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 17:59:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 18:00:10 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-25 18:00:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 18:00:23 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... 
[2024-07-25 18:00:23 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 18:00:23 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 18:00:23 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 288) [2024-07-25 18:00:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-25 18:00:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][510/625] eta 0:08:14 lr 0.000017 wd 0.0500 time 1.3060 (4.2997) data time 0.0009 (0.3190) model time 1.3051 (3.9807) loss 6.7720 (6.6328) grad_norm 8.0941 (5.9792) loss_scale 32.0000 (32.0000) mem 14934MB [2024-07-25 18:00:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][520/625] eta 0:01:49 lr 0.000017 wd 0.0500 time 0.3940 (1.0460) data time 0.0007 (0.0540) model time 0.3934 (0.9920) loss 5.3405 (6.5362) grad_norm 6.8594 (4.8686) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:00:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][530/625] eta 0:01:11 lr 0.000017 wd 0.0500 time 0.3995 (0.7516) data time 0.0009 (0.0299) model time 0.3986 (0.7217) loss 6.7585 (6.5654) grad_norm 4.7425 (4.8810) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:00:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][540/625] eta 0:00:54 lr 0.000017 wd 0.0500 time 0.3978 (0.6411) data time 0.0007 (0.0209) model time 0.3971 (0.6202) loss 6.4866 (6.5165) grad_norm 2.7236 (4.6182) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:00:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][550/625] eta 0:00:43 lr 0.000017 wd 0.0500 time 0.3931 (0.5829) data time 0.0010 (0.0162) model time 0.3921 (0.5668) loss 6.3900 (6.4637) grad_norm 2.8266 (4.4107) loss_scale 32.0000 (32.0000) mem 14931MB 
[2024-07-25 18:00:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][560/625] eta 0:00:35 lr 0.000017 wd 0.0500 time 0.3720 (0.5510) data time 0.0007 (0.0132) model time 0.3713 (0.5378) loss 5.7524 (6.4297) grad_norm 5.2205 (4.1185) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:01:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][570/625] eta 0:00:29 lr 0.000017 wd 0.0500 time 0.4053 (0.5307) data time 0.0006 (0.0113) model time 0.4047 (0.5194) loss 6.8152 (6.3981) grad_norm 3.6205 (4.2526) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:01:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][580/625] eta 0:00:23 lr 0.000017 wd 0.0500 time 0.3967 (0.5120) data time 0.0008 (0.0098) model time 0.3959 (0.5022) loss 6.3708 (6.3590) grad_norm 2.6127 (4.7481) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:01:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][590/625] eta 0:00:17 lr 0.000017 wd 0.0500 time 0.3979 (0.4982) data time 0.0010 (0.0089) model time 0.3968 (0.4893) loss 6.6279 (6.3407) grad_norm 9.3718 (4.6736) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:01:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][600/625] eta 0:00:12 lr 0.000017 wd 0.0500 time 0.3970 (0.4871) data time 0.0028 (0.0080) model time 0.3942 (0.4791) loss 5.8860 (6.3352) grad_norm 2.2046 (4.4751) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:01:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][610/625] eta 0:00:07 lr 0.000017 wd 0.0500 time 0.3960 (0.4786) data time 0.0004 (0.0073) model time 0.3956 (0.4713) loss 6.3982 (6.3683) grad_norm 3.2062 (4.4844) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:01:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][620/625] eta 0:00:02 lr 0.000017 wd 0.0500 time 0.3959 (0.4713) data 
time 0.0006 (0.0068) model time 0.3953 (0.4644) loss 6.8189 (6.3743) grad_norm 3.5340 (4.5813) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:01:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 288 training takes 0:00:54 [2024-07-25 18:01:22 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 18:01:25 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 18:01:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.440 (0.440) Loss 0.5386 (0.5386) Acc@1 90.479 (90.479) Acc@5 99.023 (99.023) Mem 14931MB [2024-07-25 18:01:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.8154 (0.6540) Acc@1 82.617 (87.744) Acc@5 97.021 (98.042) Mem 14931MB [2024-07-25 18:01:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9004 (0.7565) Acc@1 79.541 (84.842) Acc@5 96.143 (97.108) Mem 14931MB [2024-07-25 18:01:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.459 Acc@5 97.073 [2024-07-25 18:01:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.5% [2024-07-25 18:01:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 84.46% [2024-07-25 18:01:29 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 18:01:30 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! 
[2024-07-25 18:01:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.470 (0.470) Loss 0.5420 (0.5420) Acc@1 90.527 (90.527) Acc@5 98.975 (98.975) Mem 14931MB [2024-07-25 18:01:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.121) Loss 0.8086 (0.6538) Acc@1 82.764 (87.700) Acc@5 97.266 (98.074) Mem 14931MB [2024-07-25 18:01:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.9028 (0.7561) Acc@1 79.248 (84.766) Acc@5 96.094 (97.135) Mem 14931MB [2024-07-25 18:01:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.369 Acc@5 97.099 [2024-07-25 18:01:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 18:01:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.37% [2024-07-25 18:01:32 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 18:01:33 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 18:01:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][0/625] eta 0:14:51 lr 0.000017 wd 0.0500 time 1.4263 (1.4263) data time 0.4954 (0.4954) model time 0.0000 (0.0000) loss 6.7658 (6.7658) grad_norm 3.0752 (3.0752) loss_scale 32.0000 (32.0000) mem 14938MB [2024-07-25 18:01:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][10/625] eta 0:05:03 lr 0.000017 wd 0.0500 time 0.3965 (0.4938) data time 0.0008 (0.0458) model time 0.0000 (0.0000) loss 6.5747 (6.3645) grad_norm 3.9207 (3.1612) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 18:01:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][20/625] eta 0:04:31 lr 0.000016 wd 0.0500 time 0.3958 (0.4490) data time 0.0007 (0.0245) model time 0.0000 (0.0000) loss 5.3873 (6.2928) grad_norm 3.9322 (3.3499) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 18:01:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][30/625] eta 0:04:17 lr 0.000016 wd 0.0500 time 0.4030 (0.4330) data time 0.0009 (0.0169) model time 0.0000 (0.0000) loss 6.5142 (6.2350) grad_norm 2.9348 (3.2149) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 18:01:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][40/625] eta 0:04:08 lr 0.000016 wd 0.0500 time 0.3964 (0.4247) data time 0.0009 (0.0130) model time 0.0000 (0.0000) loss 6.0807 (6.3104) grad_norm 2.7912 (3.0750) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 18:01:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][50/625] eta 0:04:01 lr 0.000016 wd 0.0500 time 0.3970 (0.4198) data time 0.0009 (0.0106) model time 0.0000 (0.0000) loss 6.1097 (6.3481) grad_norm 2.9781 (3.0001) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 18:01:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][60/625] eta 0:03:55 lr 0.000016 wd 0.0500 time 0.4008 (0.4166) data time 
0.0009 (0.0091) model time 0.3999 (0.3992) loss 6.7032 (6.3271) grad_norm 2.5836 (2.9782) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 18:02:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][70/625] eta 0:03:50 lr 0.000016 wd 0.0500 time 0.3967 (0.4156) data time 0.0007 (0.0079) model time 0.3959 (0.4040) loss 5.8403 (6.2946) grad_norm 4.9290 (3.2009) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 18:02:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][80/625] eta 0:03:45 lr 0.000016 wd 0.0500 time 0.3971 (0.4135) data time 0.0006 (0.0070) model time 0.3964 (0.4019) loss 6.3834 (6.2917) grad_norm 2.9919 (3.4796) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 18:02:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][90/625] eta 0:03:40 lr 0.000016 wd 0.0500 time 0.4027 (0.4119) data time 0.0006 (0.0064) model time 0.4021 (0.4008) loss 6.0289 (6.2433) grad_norm 4.7459 (3.4949) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 18:02:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][100/625] eta 0:03:35 lr 0.000016 wd 0.0500 time 0.3978 (0.4105) data time 0.0006 (0.0058) model time 0.3972 (0.4001) loss 6.5605 (6.2238) grad_norm 2.7081 (3.4308) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 18:02:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][110/625] eta 0:03:30 lr 0.000016 wd 0.0500 time 0.3952 (0.4093) data time 0.0009 (0.0054) model time 0.3943 (0.3995) loss 6.5937 (6.2406) grad_norm 3.6372 (3.4154) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 18:02:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][120/625] eta 0:03:26 lr 0.000016 wd 0.0500 time 0.3983 (0.4085) data time 0.0007 (0.0050) model time 0.3976 (0.3993) loss 6.0598 (6.2259) grad_norm 2.6108 (3.4760) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 18:02:26 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][130/625] eta 0:03:21 lr 0.000016 wd 0.0500 time 0.3968 (0.4079) data time 0.0008 (0.0047) model time 0.3960 (0.3993) loss 6.9359 (6.2169) grad_norm 5.3460 (3.4509) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 18:02:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][140/625] eta 0:03:17 lr 0.000016 wd 0.0500 time 0.3959 (0.4077) data time 0.0007 (0.0044) model time 0.3952 (0.3998) loss 5.5950 (6.2039) grad_norm 4.9457 (3.6627) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 18:02:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][150/625] eta 0:03:13 lr 0.000016 wd 0.0500 time 0.4015 (0.4073) data time 0.0007 (0.0042) model time 0.4008 (0.3999) loss 5.7319 (6.2003) grad_norm 5.1620 (3.6832) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 18:02:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][160/625] eta 0:03:09 lr 0.000016 wd 0.0500 time 0.4039 (0.4080) data time 0.0009 (0.0040) model time 0.4030 (0.4017) loss 6.2344 (6.2122) grad_norm 3.8016 (3.6940) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 18:02:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][170/625] eta 0:03:05 lr 0.000016 wd 0.0500 time 0.3995 (0.4075) data time 0.0006 (0.0038) model time 0.3989 (0.4013) loss 6.9714 (6.2203) grad_norm 2.7181 (3.6678) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 18:02:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-25 18:02:46 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 18:02:48 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 18:06:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 18:06:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 18:07:16 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-25 18:07:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 18:07:29 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... [2024-07-25 18:07:29 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 18:07:29 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 18:07:29 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 289) [2024-07-25 18:07:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-25 18:07:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][180/625] eta 0:25:07 lr 0.000016 wd 0.0500 time 0.3897 (3.3871) data time 0.0007 (0.2018) model time 0.3890 (3.1852) loss 5.8922 (6.3607) grad_norm 5.8136 (5.1317) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:07:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-25 18:07:45 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 18:07:49 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 18:18:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 18:18:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 18:19:00 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-25 18:19:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 18:19:13 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... 
[2024-07-25 18:19:13 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 18:19:13 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 18:19:13 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 289) [2024-07-25 18:19:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-25 18:19:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][190/625] eta 0:15:53 lr 0.000016 wd 0.0500 time 0.4056 (2.1926) data time 0.0008 (0.1147) model time 0.4048 (2.0779) loss 6.7714 (6.6990) grad_norm 3.4707 (2.6757) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:19:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][200/625] eta 0:07:38 lr 0.000016 wd 0.0500 time 0.4148 (1.0793) data time 0.0010 (0.0437) model time 0.4138 (1.0357) loss 6.9506 (6.5644) grad_norm 5.1054 (3.2873) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:19:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][210/625] eta 0:05:41 lr 0.000016 wd 0.0500 time 0.4120 (0.8225) data time 0.0008 (0.0273) model time 0.4112 (0.7952) loss 6.9068 (6.4995) grad_norm 4.6363 (4.0931) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:19:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][220/625] eta 0:04:47 lr 0.000016 wd 0.0500 time 0.4102 (0.7088) data time 0.0010 (0.0200) model time 0.4092 (0.6888) loss 6.3515 (6.4548) grad_norm 5.0743 (5.6339) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:19:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][230/625] eta 0:04:14 lr 0.000016 wd 0.0500 time 0.4052 (0.6441) data time 0.0011 (0.0159) model time 0.4041 (0.6282) loss 6.6517 (6.3971) grad_norm 2.8121 (5.2945) loss_scale 32.0000 (32.0000) mem 14931MB 
[2024-07-25 18:19:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][240/625] eta 0:03:55 lr 0.000016 wd 0.0500 time 0.4078 (0.6120) data time 0.0008 (0.0132) model time 0.4070 (0.5987) loss 6.7465 (6.3811) grad_norm 2.7472 (4.9586) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:19:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][250/625] eta 0:03:38 lr 0.000016 wd 0.0500 time 0.4071 (0.5817) data time 0.0008 (0.0114) model time 0.4063 (0.5703) loss 5.2650 (6.3566) grad_norm 3.9449 (4.7237) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:20:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][260/625] eta 0:03:24 lr 0.000016 wd 0.0500 time 0.4160 (0.5597) data time 0.0010 (0.0100) model time 0.4150 (0.5497) loss 6.6288 (6.3527) grad_norm 3.3483 (4.8185) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:20:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][270/625] eta 0:03:12 lr 0.000016 wd 0.0500 time 0.4118 (0.5428) data time 0.0008 (0.0089) model time 0.4111 (0.5338) loss 5.6159 (6.3336) grad_norm 4.6433 (4.6507) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:20:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][280/625] eta 0:03:02 lr 0.000016 wd 0.0500 time 0.4158 (0.5294) data time 0.0008 (0.0081) model time 0.4150 (0.5213) loss 6.2797 (6.3376) grad_norm 4.6404 (4.5144) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:20:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][290/625] eta 0:02:53 lr 0.000016 wd 0.0500 time 0.4145 (0.5186) data time 0.0010 (0.0074) model time 0.4135 (0.5111) loss 7.1553 (6.3593) grad_norm 2.2412 (4.5016) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:20:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][300/625] eta 0:02:45 lr 0.000016 wd 0.0500 time 0.4192 (0.5098) data 
time 0.0008 (0.0069) model time 0.4184 (0.5029) loss 6.9759 (6.3696) grad_norm 2.2513 (4.4363) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:20:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][310/625] eta 0:02:38 lr 0.000016 wd 0.0500 time 0.4102 (0.5024) data time 0.0008 (0.0064) model time 0.4094 (0.4960) loss 5.7670 (6.3607) grad_norm 2.8428 (4.3916) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:20:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][320/625] eta 0:02:31 lr 0.000016 wd 0.0500 time 0.4054 (0.4960) data time 0.0010 (0.0060) model time 0.4044 (0.4899) loss 5.6080 (6.3817) grad_norm 3.0518 (4.3705) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:20:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][330/625] eta 0:02:24 lr 0.000016 wd 0.0500 time 0.4090 (0.4903) data time 0.0009 (0.0057) model time 0.4082 (0.4846) loss 5.8547 (6.3804) grad_norm 3.4200 (4.3574) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:20:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][340/625] eta 0:02:18 lr 0.000016 wd 0.0500 time 0.4121 (0.4855) data time 0.0010 (0.0054) model time 0.4111 (0.4801) loss 6.4378 (6.3842) grad_norm 2.8856 (4.2796) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:20:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][350/625] eta 0:02:12 lr 0.000016 wd 0.0500 time 0.4196 (0.4814) data time 0.0010 (0.0051) model time 0.4186 (0.4762) loss 7.2054 (6.3885) grad_norm 2.4798 (4.1832) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:20:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][360/625] eta 0:02:06 lr 0.000016 wd 0.0500 time 0.4136 (0.4776) data time 0.0008 (0.0049) model time 0.4127 (0.4727) loss 5.9188 (6.3767) grad_norm 2.7376 (4.1476) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:20:46 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][370/625] eta 0:02:00 lr 0.000016 wd 0.0500 time 0.4151 (0.4743) data time 0.0009 (0.0047) model time 0.4142 (0.4696) loss 5.5405 (6.3633) grad_norm 3.2802 (4.0939) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:20:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][380/625] eta 0:01:55 lr 0.000016 wd 0.0500 time 0.4191 (0.4713) data time 0.0008 (0.0045) model time 0.4183 (0.4668) loss 5.3015 (6.3456) grad_norm 3.7095 (4.0938) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:20:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][390/625] eta 0:01:50 lr 0.000016 wd 0.0500 time 0.4180 (0.4685) data time 0.0008 (0.0043) model time 0.4172 (0.4642) loss 4.8511 (6.3296) grad_norm 4.7898 (4.0607) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:20:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][400/625] eta 0:01:44 lr 0.000016 wd 0.0500 time 0.4203 (0.4661) data time 0.0009 (0.0042) model time 0.4194 (0.4619) loss 5.2052 (6.3086) grad_norm 4.0803 (3.9995) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:21:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][410/625] eta 0:01:39 lr 0.000016 wd 0.0500 time 0.4396 (0.4640) data time 0.0008 (0.0040) model time 0.4388 (0.4600) loss 6.5400 (6.3089) grad_norm 5.0328 (3.9536) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:21:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][420/625] eta 0:01:34 lr 0.000016 wd 0.0500 time 0.4216 (0.4620) data time 0.0008 (0.0039) model time 0.4208 (0.4581) loss 6.2740 (6.3021) grad_norm 2.5896 (3.9108) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:21:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][430/625] eta 0:01:29 lr 0.000016 wd 0.0500 time 0.4149 (0.4601) data time 0.0009 (0.0038) 
model time 0.4140 (0.4563) loss 5.5939 (6.2946) grad_norm 5.4028 (3.9128) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:21:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-25 18:21:13 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 18:21:17 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 18:26:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 18:26:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 18:26:46 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-25 18:46:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 18:46:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 18:46:59 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-25 18:47:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 18:47:12 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... 
[2024-07-25 18:47:13 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 18:47:13 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 18:47:13 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 289) [2024-07-25 18:47:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-25 18:47:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][440/625] eta 0:07:29 lr 0.000016 wd 0.0500 time 0.4182 (2.4273) data time 0.0009 (0.1737) model time 0.4172 (2.2535) loss 7.1944 (6.5991) grad_norm 3.9915 (2.9659) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:47:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][450/625] eta 0:03:09 lr 0.000016 wd 0.0500 time 0.4169 (1.0850) data time 0.0010 (0.0587) model time 0.4159 (1.0263) loss 7.4905 (6.4615) grad_norm 2.5530 (3.5606) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:47:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][460/625] eta 0:02:14 lr 0.000016 wd 0.0500 time 0.4183 (0.8168) data time 0.0010 (0.0357) model time 0.4173 (0.7811) loss 6.7866 (6.4756) grad_norm 2.8005 (5.2837) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:47:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][470/625] eta 0:01:48 lr 0.000016 wd 0.0500 time 0.4230 (0.7015) data time 0.0011 (0.0258) model time 0.4219 (0.6756) loss 6.4672 (6.4602) grad_norm 4.5857 (4.9786) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:47:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][480/625] eta 0:01:32 lr 0.000016 wd 0.0500 time 0.4076 (0.6376) data time 0.0012 (0.0204) model time 0.4064 (0.6173) loss 6.1460 (6.3855) grad_norm 2.3107 (4.5100) loss_scale 32.0000 (32.0000) mem 14931MB 
[2024-07-25 18:47:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][490/625] eta 0:01:21 lr 0.000016 wd 0.0500 time 0.6957 (0.6061) data time 0.0007 (0.0168) model time 0.6950 (0.5893) loss 5.5120 (6.3953) grad_norm 5.5840 (4.2783) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:47:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][500/625] eta 0:01:12 lr 0.000016 wd 0.0500 time 0.4077 (0.5767) data time 0.0010 (0.0144) model time 0.4067 (0.5623) loss 6.7154 (6.3753) grad_norm 2.5551 (4.1273) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:47:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][510/625] eta 0:01:03 lr 0.000016 wd 0.0500 time 0.4244 (0.5550) data time 0.0011 (0.0126) model time 0.4233 (0.5423) loss 5.4668 (6.3008) grad_norm 2.2277 (4.5084) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:48:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][520/625] eta 0:00:56 lr 0.000016 wd 0.0500 time 0.4078 (0.5383) data time 0.0008 (0.0113) model time 0.4070 (0.5270) loss 5.7513 (6.2864) grad_norm 5.2591 (4.4051) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:48:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][530/625] eta 0:00:49 lr 0.000016 wd 0.0500 time 0.4096 (0.5250) data time 0.0011 (0.0102) model time 0.4085 (0.5148) loss 7.3278 (6.2846) grad_norm 4.9021 (4.3196) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:48:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][540/625] eta 0:00:43 lr 0.000016 wd 0.0500 time 0.4128 (0.5142) data time 0.0011 (0.0094) model time 0.4118 (0.5048) loss 5.9595 (6.3024) grad_norm 1.9517 (4.1986) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:48:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][550/625] eta 0:00:37 lr 0.000016 wd 0.0500 time 0.4254 (0.5056) data 
time 0.0008 (0.0086) model time 0.4246 (0.4970) loss 5.2936 (6.2916) grad_norm 2.3142 (4.1539) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:48:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][560/625] eta 0:00:32 lr 0.000016 wd 0.0500 time 0.4159 (0.4985) data time 0.0008 (0.0080) model time 0.4151 (0.4904) loss 5.4034 (6.2996) grad_norm 2.8885 (4.0419) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:48:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][570/625] eta 0:00:27 lr 0.000016 wd 0.0500 time 0.4214 (0.4925) data time 0.0008 (0.0075) model time 0.4206 (0.4850) loss 6.1215 (6.3037) grad_norm 2.3902 (3.9694) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:48:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][580/625] eta 0:00:21 lr 0.000016 wd 0.0500 time 0.4149 (0.4874) data time 0.0011 (0.0071) model time 0.4138 (0.4803) loss 6.2347 (6.2864) grad_norm 4.5279 (3.9616) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:48:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][590/625] eta 0:00:16 lr 0.000016 wd 0.0500 time 0.4182 (0.4827) data time 0.0014 (0.0067) model time 0.4168 (0.4760) loss 6.4338 (6.2891) grad_norm 2.8369 (3.9000) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:48:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][600/625] eta 0:00:11 lr 0.000016 wd 0.0500 time 0.4284 (0.4787) data time 0.0010 (0.0064) model time 0.4274 (0.4723) loss 6.9394 (6.2954) grad_norm 3.3503 (3.8663) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:48:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][610/625] eta 0:00:07 lr 0.000016 wd 0.0500 time 0.4170 (0.4750) data time 0.0005 (0.0061) model time 0.4164 (0.4689) loss 5.9350 (6.2921) grad_norm 6.5744 (3.8757) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:48:44 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][620/625] eta 0:00:02 lr 0.000016 wd 0.0500 time 0.4143 (0.4718) data time 0.0008 (0.0058) model time 0.4135 (0.4660) loss 6.2878 (6.2847) grad_norm 3.4500 (3.9967) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 18:48:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-25 18:48:45 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 18:48:49 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 19:01:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 19:01:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 19:02:08 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-25 19:02:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 19:02:27 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... 
[2024-07-25 19:02:27 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 19:02:28 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 19:02:28 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 289) [2024-07-25 19:02:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-25 19:02:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 289 training takes 0:00:10 [2024-07-25 19:02:43 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 19:02:46 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 19:02:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.477 (0.477) Loss 0.5415 (0.5415) Acc@1 90.283 (90.283) Acc@5 99.023 (99.023) Mem 14931MB [2024-07-25 19:02:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.084 (0.127) Loss 0.8115 (0.6517) Acc@1 82.373 (87.740) Acc@5 97.070 (98.042) Mem 14931MB [2024-07-25 19:02:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.084 (0.107) Loss 0.9009 (0.7546) Acc@1 79.590 (84.775) Acc@5 95.996 (97.138) Mem 14931MB [2024-07-25 19:02:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.411 Acc@5 97.099 [2024-07-25 19:02:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 19:02:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.848 (0.848) Loss 0.5425 (0.5425) Acc@1 90.527 (90.527) Acc@5 99.023 (99.023) Mem 14931MB [2024-07-25 19:02:53 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.084 (0.163) Loss 0.8086 (0.6535) Acc@1 82.764 (87.700) Acc@5 97.266 (98.078) Mem 14931MB [2024-07-25 19:02:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.084 (0.125) Loss 0.9028 (0.7560) Acc@1 79.346 (84.780) Acc@5 96.045 (97.138) Mem 14931MB [2024-07-25 19:02:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.391 Acc@5 97.101 [2024-07-25 19:02:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 19:02:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.39% [2024-07-25 19:02:54 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 19:02:55 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 19:02:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][0/625] eta 0:15:00 lr 0.000016 wd 0.0500 time 1.4407 (1.4407) data time 0.4003 (0.4003) model time 0.0000 (0.0000) loss 7.0515 (7.0515) grad_norm 3.1070 (3.1070) loss_scale 32.0000 (32.0000) mem 14938MB [2024-07-25 19:03:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][10/625] eta 0:05:12 lr 0.000016 wd 0.0500 time 0.4145 (0.5080) data time 0.0011 (0.0374) model time 0.0000 (0.0000) loss 7.3833 (6.6218) grad_norm 6.3023 (3.4707) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 19:03:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][20/625] eta 0:04:39 lr 0.000016 wd 0.0500 time 0.4148 (0.4622) data time 0.0011 (0.0201) model time 0.0000 (0.0000) loss 6.8918 (6.5866) grad_norm 4.4811 (3.8646) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 19:03:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][30/625] eta 0:04:26 lr 0.000016 wd 0.0500 time 0.4051 (0.4475) data time 0.0011 (0.0139) model time 0.0000 (0.0000) loss 6.3371 (6.5003) grad_norm 2.4206 (3.9077) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 19:03:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][40/625] eta 0:04:17 lr 0.000016 wd 0.0500 time 0.4159 (0.4398) data time 0.0011 (0.0108) model time 0.0000 (0.0000) loss 6.1547 (6.4416) grad_norm 5.9908 (3.8818) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 19:03:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][50/625] eta 0:04:10 lr 0.000016 wd 0.0500 time 0.4346 (0.4361) data time 0.0008 (0.0089) model time 0.0000 (0.0000) loss 5.7553 (6.4067) grad_norm 3.4019 (4.1088) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 19:03:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][60/625] eta 0:04:07 lr 0.000016 wd 0.0500 time 0.4065 (0.4379) data time 
0.0010 (0.0076) model time 0.4054 (0.4460) loss 6.2837 (6.3515) grad_norm 2.5117 (3.9713) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 19:03:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-25 19:03:22 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 19:03:24 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 19:10:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 19:10:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 19:16:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 19:16:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 19:17:17 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-25 19:34:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 19:34:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 19:34:39 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-07-25 19:34:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 19:34:51 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... [2024-07-25 19:34:52 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 19:34:52 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 19:34:52 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 290) [2024-07-25 19:34:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-25 19:35:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][70/625] eta 0:14:52 lr 0.000016 wd 0.0500 time 0.4157 (1.6080) data time 0.0011 (0.0775) model time 0.4146 (1.5305) loss 6.6653 (6.5864) grad_norm 2.2356 (3.8510) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 19:35:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-25 19:35:13 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 19:35:17 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 19:38:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 19:38:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 19:40:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 19:40:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 19:40:24 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-25 19:40:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 19:40:36 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... 
[2024-07-25 19:40:36 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 19:40:36 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 19:40:36 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 290) [2024-07-25 19:40:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-25 19:40:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][80/625] eta 0:14:02 lr 0.000016 wd 0.0500 time 0.3921 (1.5464) data time 0.0006 (0.1183) model time 0.3915 (1.4282) loss 6.6827 (6.6960) grad_norm 5.5598 (3.1662) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 19:40:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][90/625] eta 0:08:04 lr 0.000016 wd 0.0500 time 0.3926 (0.9059) data time 0.0007 (0.0531) model time 0.3919 (0.8527) loss 6.6906 (6.5471) grad_norm 3.1264 (3.2786) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 19:41:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][100/625] eta 0:06:19 lr 0.000016 wd 0.0500 time 0.3940 (0.7229) data time 0.0011 (0.0345) model time 0.3930 (0.6884) loss 6.3286 (6.4558) grad_norm 5.4138 (3.4153) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 19:41:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][110/625] eta 0:05:27 lr 0.000016 wd 0.0500 time 0.3922 (0.6365) data time 0.0010 (0.0257) model time 0.3911 (0.6108) loss 6.9531 (6.4350) grad_norm 4.7412 (3.6356) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 19:41:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][120/625] eta 0:04:55 lr 0.000016 wd 0.0500 time 0.3932 (0.5860) data time 0.0008 (0.0206) model time 0.3925 (0.5654) loss 6.2246 (6.3979) grad_norm 3.1426 (3.5336) loss_scale 32.0000 (32.0000) mem 14931MB 
[2024-07-25 19:41:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-25 19:41:09 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 19:41:12 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 19:54:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 19:55:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 20:20:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 20:20:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 20:20:45 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-25 20:20:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 20:20:59 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... 
[2024-07-25 20:20:59 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 20:20:59 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 20:20:59 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 290) [2024-07-25 20:20:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-25 20:21:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][130/625] eta 0:13:49 lr 0.000016 wd 0.0500 time 0.3913 (1.6752) data time 0.0006 (0.0756) model time 0.3907 (1.5996) loss 5.7720 (6.3011) grad_norm 20.7960 (6.8456) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:21:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][140/625] eta 0:07:47 lr 0.000016 wd 0.0500 time 0.3959 (0.9636) data time 0.0007 (0.0341) model time 0.3952 (0.9295) loss 6.6733 (6.3990) grad_norm 2.5700 (5.2126) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:21:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][150/625] eta 0:06:00 lr 0.000016 wd 0.0500 time 0.3923 (0.7599) data time 0.0008 (0.0223) model time 0.3915 (0.7376) loss 6.9450 (6.4081) grad_norm 3.1597 (5.3991) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:21:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][160/625] eta 0:05:08 lr 0.000016 wd 0.0500 time 0.3946 (0.6635) data time 0.0008 (0.0166) model time 0.3938 (0.6469) loss 6.4005 (6.4146) grad_norm 5.9641 (4.8380) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:21:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][170/625] eta 0:04:36 lr 0.000016 wd 0.0500 time 0.3983 (0.6074) data time 0.0006 (0.0134) model time 0.3976 (0.5940) loss 6.5547 (6.4372) grad_norm 3.9142 (4.8615) loss_scale 32.0000 (32.0000) mem 14931MB 
[2024-07-25 20:21:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][180/625] eta 0:04:17 lr 0.000016 wd 0.0500 time 0.3940 (0.5784) data time 0.0006 (0.0112) model time 0.3934 (0.5672) loss 5.6648 (6.3784) grad_norm 3.0229 (4.6757) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:21:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][190/625] eta 0:04:00 lr 0.000016 wd 0.0500 time 0.3933 (0.5524) data time 0.0008 (0.0097) model time 0.3925 (0.5427) loss 5.9201 (6.3722) grad_norm 1.9534 (4.3931) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:21:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][200/625] eta 0:03:46 lr 0.000015 wd 0.0500 time 0.3981 (0.5324) data time 0.0006 (0.0086) model time 0.3975 (0.5238) loss 5.2271 (6.3386) grad_norm 3.2948 (4.8343) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:21:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][210/625] eta 0:03:34 lr 0.000015 wd 0.0500 time 0.4003 (0.5171) data time 0.0009 (0.0077) model time 0.3994 (0.5094) loss 5.9271 (6.3096) grad_norm 2.8600 (4.7263) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:21:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][220/625] eta 0:03:24 lr 0.000015 wd 0.0500 time 0.3954 (0.5048) data time 0.0006 (0.0070) model time 0.3948 (0.4978) loss 7.0067 (6.3155) grad_norm 4.4407 (4.6112) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:21:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][230/625] eta 0:03:15 lr 0.000015 wd 0.0500 time 0.4030 (0.4950) data time 0.0006 (0.0064) model time 0.4023 (0.4886) loss 6.0863 (6.3429) grad_norm 2.6836 (4.5446) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:22:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][240/625] eta 0:03:07 lr 0.000015 wd 0.0500 time 0.3950 (0.4868) data 
time 0.0009 (0.0060) model time 0.3941 (0.4809) loss 6.7690 (6.3419) grad_norm 3.3582 (4.4547) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:22:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][250/625] eta 0:03:00 lr 0.000015 wd 0.0500 time 0.3979 (0.4801) data time 0.0006 (0.0056) model time 0.3973 (0.4745) loss 6.1163 (6.3427) grad_norm 2.8809 (4.4203) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:22:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][260/625] eta 0:02:52 lr 0.000015 wd 0.0500 time 0.3922 (0.4739) data time 0.0009 (0.0052) model time 0.3913 (0.4687) loss 5.9123 (6.3349) grad_norm 2.7869 (4.3438) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:22:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][270/625] eta 0:02:46 lr 0.000015 wd 0.0500 time 0.4091 (0.4688) data time 0.0007 (0.0049) model time 0.4084 (0.4639) loss 6.6838 (6.3391) grad_norm 2.3495 (4.2364) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:22:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][280/625] eta 0:02:40 lr 0.000015 wd 0.0500 time 0.3939 (0.4644) data time 0.0007 (0.0047) model time 0.3932 (0.4598) loss 5.7674 (6.3405) grad_norm 3.2703 (4.1909) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:22:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][290/625] eta 0:02:34 lr 0.000015 wd 0.0500 time 0.3993 (0.4606) data time 0.0009 (0.0045) model time 0.3984 (0.4561) loss 6.8503 (6.3460) grad_norm 5.0417 (4.1620) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:22:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][300/625] eta 0:02:28 lr 0.000015 wd 0.0500 time 0.3953 (0.4572) data time 0.0006 (0.0043) model time 0.3946 (0.4529) loss 5.3739 (6.3311) grad_norm 5.0060 (4.1093) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:22:29 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][310/625] eta 0:02:23 lr 0.000015 wd 0.0500 time 0.3954 (0.4540) data time 0.0007 (0.0041) model time 0.3947 (0.4499) loss 6.9738 (6.3327) grad_norm 3.9686 (4.1053) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:22:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][320/625] eta 0:02:17 lr 0.000015 wd 0.0500 time 0.3978 (0.4512) data time 0.0008 (0.0039) model time 0.3970 (0.4473) loss 5.4139 (6.3151) grad_norm 5.7358 (4.0747) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:22:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][330/625] eta 0:02:12 lr 0.000015 wd 0.0500 time 0.3988 (0.4488) data time 0.0008 (0.0038) model time 0.3980 (0.4450) loss 6.9905 (6.3188) grad_norm 5.1798 (4.0318) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:22:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][340/625] eta 0:02:07 lr 0.000015 wd 0.0500 time 0.3945 (0.4463) data time 0.0008 (0.0036) model time 0.3937 (0.4427) loss 6.2802 (6.3050) grad_norm 4.9336 (4.0290) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:22:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][350/625] eta 0:02:02 lr 0.000015 wd 0.0500 time 0.3990 (0.4444) data time 0.0009 (0.0035) model time 0.3982 (0.4409) loss 7.2005 (6.2962) grad_norm 3.3282 (4.0033) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:22:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][360/625] eta 0:01:57 lr 0.000015 wd 0.0500 time 0.4030 (0.4423) data time 0.0006 (0.0034) model time 0.4024 (0.4389) loss 6.4384 (6.2913) grad_norm 3.7181 (3.9931) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:22:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][370/625] eta 0:01:52 lr 0.000015 wd 0.0500 time 0.3971 (0.4406) data time 0.0008 (0.0033) 
model time 0.3964 (0.4373) loss 5.9269 (6.2909) grad_norm 2.8705 (4.1182) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:22:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][380/625] eta 0:01:47 lr 0.000015 wd 0.0500 time 0.3972 (0.4389) data time 0.0008 (0.0032) model time 0.3964 (0.4357) loss 5.9081 (6.2749) grad_norm 2.1170 (4.0850) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:23:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][390/625] eta 0:01:42 lr 0.000015 wd 0.0500 time 0.3999 (0.4375) data time 0.0007 (0.0031) model time 0.3992 (0.4344) loss 6.6433 (6.2701) grad_norm 4.1318 (4.0748) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:23:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][400/625] eta 0:01:38 lr 0.000015 wd 0.0500 time 0.3960 (0.4373) data time 0.0009 (0.0030) model time 0.3951 (0.4343) loss 5.6361 (6.2723) grad_norm 3.3413 (4.1149) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:23:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][410/625] eta 0:01:33 lr 0.000015 wd 0.0500 time 0.3999 (0.4360) data time 0.0006 (0.0030) model time 0.3993 (0.4330) loss 6.5046 (6.2797) grad_norm 4.3287 (4.1268) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:23:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][420/625] eta 0:01:29 lr 0.000015 wd 0.0500 time 0.3959 (0.4347) data time 0.0008 (0.0029) model time 0.3951 (0.4318) loss 6.0343 (6.2682) grad_norm 2.8588 (4.1042) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:23:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][430/625] eta 0:01:24 lr 0.000015 wd 0.0500 time 0.3949 (0.4337) data time 0.0007 (0.0028) model time 0.3942 (0.4309) loss 5.6270 (6.2654) grad_norm 3.0009 (4.1405) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:23:21 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [290/300][440/625] eta 0:01:20 lr 0.000015 wd 0.0500 time 0.3949 (0.4326) data time 0.0008 (0.0028) model time 0.3941 (0.4298) loss 5.8416 (6.2786) grad_norm 4.4874 (4.1584) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:23:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][450/625] eta 0:01:15 lr 0.000015 wd 0.0500 time 0.3983 (0.4314) data time 0.0006 (0.0027) model time 0.3977 (0.4287) loss 6.7234 (6.2886) grad_norm 2.2951 (4.1374) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:23:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][460/625] eta 0:01:11 lr 0.000015 wd 0.0500 time 0.3918 (0.4304) data time 0.0008 (0.0027) model time 0.3910 (0.4277) loss 6.5145 (6.2837) grad_norm 5.0516 (4.1718) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:23:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][470/625] eta 0:01:06 lr 0.000015 wd 0.0500 time 0.4022 (0.4294) data time 0.0007 (0.0026) model time 0.4016 (0.4268) loss 5.8819 (6.2835) grad_norm 2.7793 (4.1772) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:23:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][480/625] eta 0:01:02 lr 0.000015 wd 0.0500 time 0.3927 (0.4285) data time 0.0009 (0.0026) model time 0.3918 (0.4259) loss 6.7878 (6.2852) grad_norm 3.5359 (4.1481) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:23:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][490/625] eta 0:00:57 lr 0.000015 wd 0.0500 time 0.3950 (0.4276) data time 0.0006 (0.0025) model time 0.3944 (0.4251) loss 6.1517 (6.2814) grad_norm 3.0788 (4.1249) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:23:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][500/625] eta 0:00:53 lr 0.000015 wd 0.0500 time 0.3962 (0.4268) data time 0.0008 (0.0025) model time 0.3954 (0.4243) 
loss 6.5482 (6.2861) grad_norm 3.8743 (4.1028) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:23:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][510/625] eta 0:00:48 lr 0.000015 wd 0.0500 time 0.4001 (0.4260) data time 0.0006 (0.0024) model time 0.3995 (0.4236) loss 5.8338 (6.2794) grad_norm 4.6917 (4.0845) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:23:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][520/625] eta 0:00:44 lr 0.000015 wd 0.0500 time 0.3948 (0.4252) data time 0.0007 (0.0024) model time 0.3941 (0.4228) loss 5.9509 (6.2803) grad_norm 3.7776 (4.0932) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:23:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][530/625] eta 0:00:40 lr 0.000015 wd 0.0500 time 0.3969 (0.4248) data time 0.0008 (0.0023) model time 0.3961 (0.4225) loss 6.5580 (6.2815) grad_norm 4.5437 (4.0776) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:24:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][540/625] eta 0:00:36 lr 0.000015 wd 0.0500 time 0.3938 (0.4242) data time 0.0007 (0.0023) model time 0.3931 (0.4219) loss 6.3992 (6.2823) grad_norm 2.7545 (4.0718) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:24:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][550/625] eta 0:00:31 lr 0.000015 wd 0.0500 time 0.3932 (0.4235) data time 0.0007 (0.0023) model time 0.3924 (0.4213) loss 7.7158 (6.2862) grad_norm 3.9661 (4.0794) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:24:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][560/625] eta 0:00:27 lr 0.000015 wd 0.0500 time 0.3936 (0.4229) data time 0.0009 (0.0022) model time 0.3927 (0.4207) loss 6.6454 (6.2842) grad_norm 2.6710 (4.0658) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:24:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [290/300][570/625] eta 0:00:23 lr 0.000015 wd 0.0500 time 0.3981 (0.4223) data time 0.0009 (0.0022) model time 0.3972 (0.4201) loss 5.3983 (6.2778) grad_norm 2.4233 (4.0329) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:24:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][580/625] eta 0:00:18 lr 0.000015 wd 0.0500 time 0.3912 (0.4218) data time 0.0007 (0.0022) model time 0.3905 (0.4196) loss 6.2899 (6.2762) grad_norm 4.2044 (4.0297) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:24:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][590/625] eta 0:00:14 lr 0.000015 wd 0.0500 time 0.3963 (0.4212) data time 0.0008 (0.0022) model time 0.3955 (0.4191) loss 6.1681 (6.2682) grad_norm 2.4126 (4.0145) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:24:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][600/625] eta 0:00:10 lr 0.000015 wd 0.0500 time 0.3977 (0.4207) data time 0.0008 (0.0021) model time 0.3968 (0.4186) loss 5.3114 (6.2657) grad_norm 3.6520 (4.0114) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:24:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][610/625] eta 0:00:06 lr 0.000015 wd 0.0500 time 0.3964 (0.4202) data time 0.0006 (0.0021) model time 0.3958 (0.4181) loss 6.5096 (6.2724) grad_norm 2.5092 (3.9930) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:24:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][620/625] eta 0:00:02 lr 0.000015 wd 0.0500 time 0.3954 (0.4201) data time 0.0004 (0.0021) model time 0.3950 (0.4180) loss 6.7253 (6.2724) grad_norm 2.7571 (3.9785) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-25 20:24:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 290 training takes 0:03:30 [2024-07-25 20:24:34 vssd_mesa_retrain_small_e300] (utils.py 118): INFO 
./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 20:24:39 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 20:24:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.446 (0.446) Loss 0.5464 (0.5464) Acc@1 90.186 (90.186) Acc@5 99.023 (99.023) Mem 14931MB [2024-07-25 20:24:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.8145 (0.6566) Acc@1 82.520 (87.682) Acc@5 97.217 (98.065) Mem 14931MB [2024-07-25 20:24:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.8984 (0.7574) Acc@1 79.443 (84.784) Acc@5 96.045 (97.131) Mem 14931MB [2024-07-25 20:24:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.411 Acc@5 97.095 [2024-07-25 20:24:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 20:24:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.830 (0.830) Loss 0.5425 (0.5425) Acc@1 90.479 (90.479) Acc@5 99.023 (99.023) Mem 14931MB [2024-07-25 20:24:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.155) Loss 0.8086 (0.6538) Acc@1 82.812 (87.722) Acc@5 97.266 (98.078) Mem 14931MB [2024-07-25 20:24:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.122) Loss 0.9028 (0.7561) Acc@1 79.297 (84.789) Acc@5 96.045 (97.133) Mem 14931MB [2024-07-25 20:24:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.393 Acc@5 97.097 [2024-07-25 20:24:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 20:24:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO 
New max accuracy ema: 84.39% [2024-07-25 20:24:46 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 20:24:46 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 20:24:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][0/625] eta 0:16:41 lr 0.000015 wd 0.0500 time 1.6026 (1.6026) data time 0.4109 (0.4109) model time 0.0000 (0.0000) loss 5.5918 (5.5918) grad_norm 2.7919 (2.7919) loss_scale 32.0000 (32.0000) mem 14942MB [2024-07-25 20:24:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][10/625] eta 0:05:10 lr 0.000015 wd 0.0500 time 0.3916 (0.5049) data time 0.0006 (0.0381) model time 0.0000 (0.0000) loss 6.1399 (6.5967) grad_norm 4.9566 (3.7860) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:24:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][20/625] eta 0:04:33 lr 0.000015 wd 0.0500 time 0.3959 (0.4528) data time 0.0009 (0.0204) model time 0.0000 (0.0000) loss 5.9667 (6.3782) grad_norm 2.7782 (3.5766) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:25:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][30/625] eta 0:04:18 lr 0.000015 wd 0.0500 time 0.3969 (0.4344) data time 0.0006 (0.0140) model time 0.0000 (0.0000) loss 6.7167 (6.3855) grad_norm 2.7315 (5.6520) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:25:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][40/625] eta 0:04:08 lr 0.000015 wd 0.0500 time 0.3965 (0.4249) data time 0.0008 (0.0108) model time 0.0000 (0.0000) loss 6.5910 (6.3684) grad_norm 3.7107 (5.1127) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:25:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][50/625] eta 
0:04:01 lr 0.000015 wd 0.0500 time 0.3945 (0.4192) data time 0.0007 (0.0089) model time 0.0000 (0.0000) loss 6.7074 (6.3492) grad_norm 5.3179 (4.9569) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:25:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][60/625] eta 0:03:54 lr 0.000015 wd 0.0500 time 0.3958 (0.4153) data time 0.0008 (0.0076) model time 0.3950 (0.3944) loss 6.0786 (6.3795) grad_norm 3.8384 (4.7347) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:25:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][70/625] eta 0:03:49 lr 0.000015 wd 0.0500 time 0.4041 (0.4127) data time 0.0006 (0.0066) model time 0.4035 (0.3954) loss 5.8944 (6.3784) grad_norm 8.6423 (4.8674) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:25:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][80/625] eta 0:03:43 lr 0.000015 wd 0.0500 time 0.3905 (0.4106) data time 0.0010 (0.0059) model time 0.3895 (0.3952) loss 6.4602 (6.3688) grad_norm 3.6738 (4.6977) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:25:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][90/625] eta 0:03:38 lr 0.000015 wd 0.0500 time 0.4015 (0.4091) data time 0.0009 (0.0054) model time 0.4006 (0.3953) loss 5.9949 (6.3561) grad_norm 7.3699 (4.5743) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:25:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][100/625] eta 0:03:34 lr 0.000015 wd 0.0500 time 0.3963 (0.4078) data time 0.0009 (0.0049) model time 0.3955 (0.3953) loss 5.4191 (6.3238) grad_norm 3.2197 (4.7872) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:25:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][110/625] eta 0:03:29 lr 0.000015 wd 0.0500 time 0.3954 (0.4067) data time 0.0009 (0.0045) model time 0.3945 (0.3953) loss 6.9649 (6.3196) grad_norm 2.5643 (4.6221) loss_scale 32.0000 
(32.0000) mem 14939MB [2024-07-25 20:25:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][120/625] eta 0:03:24 lr 0.000015 wd 0.0500 time 0.3953 (0.4059) data time 0.0007 (0.0042) model time 0.3946 (0.3954) loss 6.6041 (6.3131) grad_norm 2.8729 (4.8630) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:25:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][130/625] eta 0:03:20 lr 0.000015 wd 0.0500 time 0.3902 (0.4051) data time 0.0007 (0.0040) model time 0.3895 (0.3952) loss 6.6824 (6.3270) grad_norm 2.3702 (4.7382) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:25:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][140/625] eta 0:03:16 lr 0.000015 wd 0.0500 time 0.3937 (0.4044) data time 0.0007 (0.0038) model time 0.3930 (0.3952) loss 6.9039 (6.2978) grad_norm 5.6389 (4.9097) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:25:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][150/625] eta 0:03:12 lr 0.000015 wd 0.0500 time 0.3958 (0.4052) data time 0.0006 (0.0036) model time 0.3953 (0.3972) loss 6.3346 (6.2849) grad_norm 2.5955 (4.8607) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:25:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][160/625] eta 0:03:08 lr 0.000015 wd 0.0500 time 0.3947 (0.4048) data time 0.0008 (0.0034) model time 0.3939 (0.3972) loss 6.3312 (6.2665) grad_norm 3.5795 (5.1717) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:25:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][170/625] eta 0:03:03 lr 0.000015 wd 0.0500 time 0.3995 (0.4043) data time 0.0006 (0.0032) model time 0.3989 (0.3971) loss 6.7775 (6.2746) grad_norm 2.5679 (5.0414) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:26:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][180/625] eta 0:02:59 lr 0.000015 wd 0.0500 time 
0.3952 (0.4039) data time 0.0007 (0.0031) model time 0.3945 (0.3971) loss 6.1509 (6.2682) grad_norm 2.4191 (5.0214) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:26:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][190/625] eta 0:02:55 lr 0.000015 wd 0.0500 time 0.3936 (0.4036) data time 0.0009 (0.0030) model time 0.3927 (0.3970) loss 6.7563 (6.2652) grad_norm 2.6366 (5.0175) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:26:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][200/625] eta 0:02:51 lr 0.000015 wd 0.0500 time 0.3957 (0.4033) data time 0.0007 (0.0029) model time 0.3950 (0.3971) loss 7.0453 (6.2580) grad_norm 2.4879 (4.9261) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:26:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][210/625] eta 0:02:47 lr 0.000015 wd 0.0500 time 0.3990 (0.4030) data time 0.0006 (0.0028) model time 0.3984 (0.3969) loss 5.8523 (6.2490) grad_norm 7.2354 (4.9385) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:26:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][220/625] eta 0:02:43 lr 0.000015 wd 0.0500 time 0.3952 (0.4035) data time 0.0007 (0.0027) model time 0.3945 (0.3980) loss 6.0809 (6.2481) grad_norm 1.9865 (4.9186) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:26:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][230/625] eta 0:02:39 lr 0.000015 wd 0.0500 time 0.3958 (0.4032) data time 0.0009 (0.0026) model time 0.3949 (0.3978) loss 6.8197 (6.2425) grad_norm 4.2550 (4.8378) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:26:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][240/625] eta 0:02:35 lr 0.000015 wd 0.0500 time 0.3952 (0.4029) data time 0.0008 (0.0025) model time 0.3944 (0.3976) loss 6.5564 (6.2619) grad_norm 9.4622 (4.8412) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 
20:26:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][250/625] eta 0:02:30 lr 0.000015 wd 0.0500 time 0.3979 (0.4026) data time 0.0008 (0.0025) model time 0.3971 (0.3975) loss 5.1822 (6.2594) grad_norm 2.4631 (4.7610) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:26:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][260/625] eta 0:02:26 lr 0.000015 wd 0.0500 time 0.3964 (0.4024) data time 0.0007 (0.0024) model time 0.3957 (0.3974) loss 5.8355 (6.2691) grad_norm 3.1188 (4.7094) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:26:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][270/625] eta 0:02:22 lr 0.000015 wd 0.0500 time 0.3962 (0.4021) data time 0.0006 (0.0024) model time 0.3956 (0.3973) loss 6.6365 (6.2793) grad_norm 2.9434 (4.6326) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:26:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][280/625] eta 0:02:18 lr 0.000015 wd 0.0500 time 0.3984 (0.4020) data time 0.0009 (0.0023) model time 0.3975 (0.3973) loss 6.0399 (6.2755) grad_norm 3.1591 (4.6007) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:26:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][290/625] eta 0:02:14 lr 0.000015 wd 0.0500 time 0.3967 (0.4018) data time 0.0008 (0.0023) model time 0.3959 (0.3972) loss 6.1225 (6.2755) grad_norm 6.2766 (4.5795) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:26:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][300/625] eta 0:02:10 lr 0.000015 wd 0.0500 time 0.3993 (0.4016) data time 0.0009 (0.0022) model time 0.3984 (0.3972) loss 5.7766 (6.2725) grad_norm 2.4821 (4.5279) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:26:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][310/625] eta 0:02:06 lr 0.000015 wd 0.0500 time 0.3970 (0.4015) data time 0.0006 
(0.0022) model time 0.3964 (0.3972) loss 4.9385 (6.2500) grad_norm 2.2353 (4.5293) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:26:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][320/625] eta 0:02:02 lr 0.000015 wd 0.0500 time 0.3956 (0.4014) data time 0.0009 (0.0021) model time 0.3947 (0.3971) loss 5.7667 (6.2563) grad_norm 3.1293 (4.5211) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:26:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][330/625] eta 0:01:58 lr 0.000015 wd 0.0500 time 0.4017 (0.4012) data time 0.0006 (0.0021) model time 0.4011 (0.3971) loss 6.0014 (6.2523) grad_norm 4.1516 (4.5409) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:27:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][340/625] eta 0:01:54 lr 0.000015 wd 0.0500 time 0.3954 (0.4011) data time 0.0007 (0.0020) model time 0.3947 (0.3970) loss 6.8305 (6.2531) grad_norm 3.7703 (4.7681) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:27:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][350/625] eta 0:01:50 lr 0.000015 wd 0.0500 time 0.3947 (0.4009) data time 0.0006 (0.0020) model time 0.3941 (0.3970) loss 5.6087 (6.2504) grad_norm 2.9178 (4.7195) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:27:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][360/625] eta 0:01:46 lr 0.000015 wd 0.0500 time 0.3960 (0.4008) data time 0.0008 (0.0020) model time 0.3952 (0.3969) loss 4.8120 (6.2502) grad_norm 4.0378 (4.6964) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:27:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][370/625] eta 0:01:42 lr 0.000015 wd 0.0500 time 0.3970 (0.4010) data time 0.0006 (0.0019) model time 0.3964 (0.3973) loss 6.1037 (6.2583) grad_norm 3.1992 (4.6534) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:27:19 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][380/625] eta 0:01:38 lr 0.000015 wd 0.0500 time 0.3960 (0.4009) data time 0.0009 (0.0019) model time 0.3951 (0.3972) loss 6.2401 (6.2524) grad_norm 3.0076 (4.6102) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:27:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][390/625] eta 0:01:34 lr 0.000015 wd 0.0500 time 0.3955 (0.4007) data time 0.0006 (0.0019) model time 0.3949 (0.3971) loss 6.0029 (6.2535) grad_norm 2.8296 (4.6068) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:27:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][400/625] eta 0:01:30 lr 0.000015 wd 0.0500 time 0.3926 (0.4006) data time 0.0008 (0.0019) model time 0.3918 (0.3970) loss 5.9311 (6.2528) grad_norm 3.2946 (4.5701) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:27:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][410/625] eta 0:01:26 lr 0.000015 wd 0.0500 time 0.3965 (0.4005) data time 0.0007 (0.0018) model time 0.3958 (0.3970) loss 6.9560 (6.2502) grad_norm 2.3903 (4.5362) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:27:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][420/625] eta 0:01:22 lr 0.000015 wd 0.0500 time 0.3953 (0.4004) data time 0.0008 (0.0018) model time 0.3945 (0.3969) loss 6.3200 (6.2524) grad_norm 3.8244 (4.5985) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:27:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][430/625] eta 0:01:18 lr 0.000015 wd 0.0500 time 0.3940 (0.4003) data time 0.0008 (0.0018) model time 0.3932 (0.3969) loss 5.8953 (6.2551) grad_norm 2.3987 (4.5537) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:27:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][440/625] eta 0:01:14 lr 0.000015 wd 0.0500 time 0.3957 (0.4005) data time 0.0009 (0.0018) 
model time 0.3948 (0.3973) loss 5.2181 (6.2463) grad_norm 2.3439 (4.5082) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:27:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][450/625] eta 0:01:10 lr 0.000015 wd 0.0500 time 0.3983 (0.4004) data time 0.0006 (0.0018) model time 0.3977 (0.3972) loss 5.8371 (6.2379) grad_norm 3.3374 (4.4852) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:27:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][460/625] eta 0:01:06 lr 0.000015 wd 0.0500 time 0.3955 (0.4003) data time 0.0010 (0.0017) model time 0.3944 (0.3971) loss 6.4493 (6.2359) grad_norm 2.0185 (4.4500) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:27:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][470/625] eta 0:01:02 lr 0.000015 wd 0.0500 time 0.3961 (0.4003) data time 0.0007 (0.0017) model time 0.3954 (0.3971) loss 6.2803 (6.2401) grad_norm 4.3158 (4.4237) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:27:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][480/625] eta 0:00:58 lr 0.000015 wd 0.0500 time 0.3971 (0.4002) data time 0.0006 (0.0017) model time 0.3965 (0.3971) loss 5.3395 (6.2421) grad_norm 3.1107 (4.4457) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:28:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][490/625] eta 0:00:54 lr 0.000015 wd 0.0500 time 0.3945 (0.4001) data time 0.0007 (0.0017) model time 0.3938 (0.3970) loss 6.0396 (6.2454) grad_norm 6.0203 (4.4281) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:28:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][500/625] eta 0:00:50 lr 0.000015 wd 0.0500 time 0.3929 (0.4000) data time 0.0007 (0.0017) model time 0.3922 (0.3970) loss 5.9766 (6.2449) grad_norm 2.1139 (4.4347) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:28:11 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [291/300][510/625] eta 0:00:46 lr 0.000015 wd 0.0500 time 0.3975 (0.4000) data time 0.0008 (0.0017) model time 0.3967 (0.3970) loss 6.8414 (6.2507) grad_norm 11.1898 (4.4442) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:28:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][520/625] eta 0:00:41 lr 0.000014 wd 0.0500 time 0.3977 (0.3999) data time 0.0006 (0.0016) model time 0.3972 (0.3970) loss 5.2166 (6.2433) grad_norm 3.1117 (4.4243) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:28:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][530/625] eta 0:00:37 lr 0.000014 wd 0.0500 time 0.4005 (0.3999) data time 0.0006 (0.0016) model time 0.3999 (0.3970) loss 5.6059 (6.2390) grad_norm 2.8050 (4.4037) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:28:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][540/625] eta 0:00:33 lr 0.000014 wd 0.0500 time 0.3944 (0.3998) data time 0.0006 (0.0016) model time 0.3938 (0.3970) loss 6.3779 (6.2442) grad_norm 3.7086 (4.3831) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:28:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][550/625] eta 0:00:29 lr 0.000014 wd 0.0500 time 0.3952 (0.3998) data time 0.0009 (0.0016) model time 0.3943 (0.3969) loss 6.3706 (6.2436) grad_norm 6.9890 (4.3781) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:28:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][560/625] eta 0:00:25 lr 0.000014 wd 0.0500 time 0.3952 (0.3998) data time 0.0006 (0.0016) model time 0.3945 (0.3970) loss 6.5857 (6.2447) grad_norm 3.1248 (4.3595) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:28:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][570/625] eta 0:00:21 lr 0.000014 wd 0.0500 time 0.3957 (0.3997) data time 0.0009 (0.0016) model time 0.3948 (0.3969) 
loss 6.0895 (6.2425) grad_norm 3.1032 (4.3370) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:28:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][580/625] eta 0:00:17 lr 0.000014 wd 0.0500 time 0.3972 (0.3997) data time 0.0008 (0.0016) model time 0.3964 (0.3969) loss 7.4865 (6.2474) grad_norm 2.7687 (4.3797) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:28:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][590/625] eta 0:00:13 lr 0.000014 wd 0.0500 time 0.3745 (0.4000) data time 0.0009 (0.0016) model time 0.3736 (0.3973) loss 6.8112 (6.2437) grad_norm 3.3281 (4.4337) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-25 20:28:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][600/625] eta 0:00:09 lr 0.000014 wd 0.0500 time 0.3955 (0.3999) data time 0.0009 (0.0015) model time 0.3946 (0.3973) loss 7.1431 (6.2422) grad_norm 3.6555 (4.4231) loss_scale 64.0000 (32.5324) mem 14939MB [2024-07-25 20:28:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][610/625] eta 0:00:05 lr 0.000014 wd 0.0500 time 0.3981 (0.3999) data time 0.0004 (0.0015) model time 0.3976 (0.3972) loss 6.1270 (6.2428) grad_norm 4.3176 (4.5391) loss_scale 64.0000 (33.0475) mem 14939MB [2024-07-25 20:28:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][620/625] eta 0:00:01 lr 0.000014 wd 0.0500 time 0.3962 (0.3998) data time 0.0007 (0.0015) model time 0.3955 (0.3972) loss 5.9680 (6.2445) grad_norm 8.3009 (4.5697) loss_scale 64.0000 (33.5459) mem 14939MB [2024-07-25 20:28:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 291 training takes 0:04:09 [2024-07-25 20:28:56 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 20:28:57 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!!
[2024-07-25 20:28:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.456 (0.456) Loss 0.5396 (0.5396) Acc@1 90.332 (90.332) Acc@5 98.975 (98.975) Mem 14939MB
[2024-07-25 20:28:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.121) Loss 0.8096 (0.6522) Acc@1 82.422 (87.766) Acc@5 97.168 (98.051) Mem 14939MB
[2024-07-25 20:28:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 0.8970 (0.7544) Acc@1 79.004 (84.849) Acc@5 96.094 (97.138) Mem 14939MB
[2024-07-25 20:29:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.489 Acc@5 97.115
[2024-07-25 20:29:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.5%
[2024-07-25 20:29:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 84.49%
[2024-07-25 20:29:00 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving......
[2024-07-25 20:29:01 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!!
[2024-07-25 20:29:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.445 (0.445) Loss 0.5425 (0.5425) Acc@1 90.430 (90.430) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 20:29:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.119) Loss 0.8081 (0.6535) Acc@1 82.812 (87.722) Acc@5 97.314 (98.091) Mem 14939MB [2024-07-25 20:29:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.103) Loss 0.9028 (0.7559) Acc@1 79.346 (84.791) Acc@5 96.045 (97.135) Mem 14939MB [2024-07-25 20:29:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.391 Acc@5 97.097 [2024-07-25 20:29:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 20:29:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][0/625] eta 0:13:13 lr 0.000014 wd 0.0500 time 1.2691 (1.2691) data time 0.4948 (0.4948) model time 0.0000 (0.0000) loss 6.2568 (6.2568) grad_norm 3.2420 (3.2420) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:29:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][10/625] eta 0:04:52 lr 0.000014 wd 0.0500 time 0.3962 (0.4758) data time 0.0009 (0.0458) model time 0.0000 (0.0000) loss 6.0450 (6.1491) grad_norm 2.9914 (3.8177) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:29:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][20/625] eta 0:04:27 lr 0.000014 wd 0.0500 time 0.3960 (0.4413) data time 0.0009 (0.0244) model time 0.0000 (0.0000) loss 6.0231 (6.2083) grad_norm 2.2216 (3.3953) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:29:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][30/625] eta 0:04:17 lr 0.000014 wd 0.0500 time 0.4008 (0.4335) data time 0.0009 (0.0168) model time 0.0000 (0.0000) loss 5.5060 (6.2190) grad_norm 3.3652 
(3.5854) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:29:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][40/625] eta 0:04:08 lr 0.000014 wd 0.0500 time 0.3966 (0.4247) data time 0.0006 (0.0129) model time 0.0000 (0.0000) loss 6.7176 (6.2041) grad_norm 3.1201 (3.7296) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:29:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][50/625] eta 0:04:01 lr 0.000014 wd 0.0500 time 0.3987 (0.4191) data time 0.0008 (0.0105) model time 0.0000 (0.0000) loss 6.8253 (6.3075) grad_norm 2.7863 (3.6824) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:29:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][60/625] eta 0:03:54 lr 0.000014 wd 0.0500 time 0.4046 (0.4157) data time 0.0006 (0.0090) model time 0.4040 (0.3973) loss 5.8439 (6.3486) grad_norm 2.8256 (6.0382) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:29:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][70/625] eta 0:03:49 lr 0.000014 wd 0.0500 time 0.3934 (0.4130) data time 0.0007 (0.0078) model time 0.3928 (0.3963) loss 6.1199 (6.3287) grad_norm 2.2303 (6.4927) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:29:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][80/625] eta 0:03:44 lr 0.000014 wd 0.0500 time 0.3958 (0.4114) data time 0.0009 (0.0070) model time 0.3949 (0.3973) loss 6.4726 (6.3213) grad_norm 2.9483 (6.0372) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:29:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][90/625] eta 0:03:39 lr 0.000014 wd 0.0500 time 0.4020 (0.4097) data time 0.0006 (0.0063) model time 0.4014 (0.3967) loss 5.2437 (6.2908) grad_norm 4.1135 (5.7995) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:29:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][100/625] eta 0:03:34 lr 
0.000014 wd 0.0500 time 0.3933 (0.4083) data time 0.0006 (0.0058) model time 0.3927 (0.3964) loss 5.3969 (6.2839) grad_norm 2.8891 (5.5309) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:29:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][110/625] eta 0:03:29 lr 0.000014 wd 0.0500 time 0.3935 (0.4072) data time 0.0008 (0.0053) model time 0.3927 (0.3963) loss 6.4912 (6.2856) grad_norm 3.1728 (5.2902) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:29:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][120/625] eta 0:03:25 lr 0.000014 wd 0.0500 time 0.3986 (0.4063) data time 0.0008 (0.0049) model time 0.3978 (0.3961) loss 5.6199 (6.2754) grad_norm 4.4374 (5.1481) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:29:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][130/625] eta 0:03:21 lr 0.000014 wd 0.0500 time 0.3932 (0.4070) data time 0.0006 (0.0046) model time 0.3926 (0.3984) loss 7.0877 (6.2651) grad_norm 9.2388 (5.1981) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:30:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][140/625] eta 0:03:17 lr 0.000014 wd 0.0500 time 0.3940 (0.4062) data time 0.0007 (0.0044) model time 0.3933 (0.3980) loss 6.0460 (6.2899) grad_norm 16.4358 (5.1710) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:30:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][150/625] eta 0:03:12 lr 0.000014 wd 0.0500 time 0.4267 (0.4058) data time 0.0007 (0.0041) model time 0.4260 (0.3981) loss 7.2947 (6.2922) grad_norm 5.5324 (5.1086) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:30:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][160/625] eta 0:03:08 lr 0.000014 wd 0.0500 time 0.3943 (0.4052) data time 0.0009 (0.0039) model time 0.3934 (0.3979) loss 6.4636 (6.3031) grad_norm 2.6586 (4.9814) loss_scale 64.0000 (64.0000) 
mem 14939MB [2024-07-25 20:30:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][170/625] eta 0:03:04 lr 0.000014 wd 0.0500 time 0.3969 (0.4047) data time 0.0006 (0.0038) model time 0.3963 (0.3977) loss 5.4426 (6.3001) grad_norm 2.6218 (4.9183) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:30:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][180/625] eta 0:02:59 lr 0.000014 wd 0.0500 time 0.3952 (0.4042) data time 0.0008 (0.0036) model time 0.3944 (0.3975) loss 6.3071 (6.3098) grad_norm 3.9980 (4.8489) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:30:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][190/625] eta 0:02:55 lr 0.000014 wd 0.0500 time 0.3956 (0.4038) data time 0.0008 (0.0035) model time 0.3947 (0.3973) loss 5.9403 (6.2909) grad_norm 2.7133 (4.7797) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:30:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][200/625] eta 0:02:51 lr 0.000014 wd 0.0500 time 0.3979 (0.4035) data time 0.0006 (0.0033) model time 0.3972 (0.3973) loss 6.8619 (6.2796) grad_norm 2.6655 (4.7346) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:30:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][210/625] eta 0:02:47 lr 0.000014 wd 0.0500 time 0.3939 (0.4032) data time 0.0009 (0.0032) model time 0.3930 (0.3972) loss 7.1168 (6.2843) grad_norm 2.8271 (4.6627) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:30:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][220/625] eta 0:02:43 lr 0.000014 wd 0.0500 time 0.3949 (0.4029) data time 0.0008 (0.0031) model time 0.3941 (0.3972) loss 7.7194 (6.2859) grad_norm 3.0050 (4.5776) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:30:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][230/625] eta 0:02:39 lr 0.000014 wd 0.0500 time 0.3929 
(0.4027) data time 0.0008 (0.0030) model time 0.3921 (0.3971) loss 6.6224 (6.3053) grad_norm 3.4859 (4.5597) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:30:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][240/625] eta 0:02:34 lr 0.000014 wd 0.0500 time 0.3939 (0.4024) data time 0.0007 (0.0029) model time 0.3932 (0.3971) loss 6.4469 (6.3225) grad_norm 3.9154 (4.5109) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:30:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][250/625] eta 0:02:30 lr 0.000014 wd 0.0500 time 0.4038 (0.4023) data time 0.0009 (0.0028) model time 0.4030 (0.3972) loss 5.6307 (6.3166) grad_norm 2.7879 (4.4452) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:30:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][260/625] eta 0:02:27 lr 0.000014 wd 0.0500 time 0.3976 (0.4028) data time 0.0006 (0.0028) model time 0.3970 (0.3979) loss 5.5751 (6.3190) grad_norm 4.2309 (4.4003) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:30:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][270/625] eta 0:02:22 lr 0.000014 wd 0.0500 time 0.3966 (0.4026) data time 0.0009 (0.0027) model time 0.3957 (0.3979) loss 6.7149 (6.3144) grad_norm 4.0290 (4.3808) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:30:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][280/625] eta 0:02:18 lr 0.000014 wd 0.0500 time 0.3993 (0.4024) data time 0.0007 (0.0026) model time 0.3987 (0.3978) loss 5.6900 (6.3204) grad_norm 4.5139 (4.3470) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:31:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][290/625] eta 0:02:14 lr 0.000014 wd 0.0500 time 0.3963 (0.4022) data time 0.0008 (0.0026) model time 0.3955 (0.3977) loss 6.2364 (6.3139) grad_norm 2.5595 (4.3387) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 
20:31:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][300/625] eta 0:02:10 lr 0.000014 wd 0.0500 time 0.3984 (0.4020) data time 0.0009 (0.0025) model time 0.3975 (0.3977) loss 5.3121 (6.3146) grad_norm 2.2601 (4.3023) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:31:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][310/625] eta 0:02:06 lr 0.000014 wd 0.0500 time 0.3971 (0.4019) data time 0.0009 (0.0025) model time 0.3962 (0.3976) loss 5.4653 (6.3187) grad_norm 3.4014 (4.2749) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:31:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][320/625] eta 0:02:02 lr 0.000014 wd 0.0500 time 0.3950 (0.4019) data time 0.0008 (0.0024) model time 0.3941 (0.3977) loss 5.7112 (6.3138) grad_norm 2.3866 (4.2340) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:31:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][330/625] eta 0:01:58 lr 0.000014 wd 0.0500 time 0.4061 (0.4017) data time 0.0006 (0.0024) model time 0.4055 (0.3976) loss 7.2234 (6.3197) grad_norm 3.0292 (4.2000) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:31:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][340/625] eta 0:01:54 lr 0.000014 wd 0.0500 time 0.3947 (0.4016) data time 0.0008 (0.0023) model time 0.3939 (0.3976) loss 6.5546 (6.3203) grad_norm 3.0773 (4.1864) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:31:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][350/625] eta 0:01:50 lr 0.000014 wd 0.0500 time 0.4002 (0.4023) data time 0.0008 (0.0023) model time 0.3994 (0.3985) loss 7.6033 (6.3172) grad_norm 3.3577 (4.1975) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:31:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][360/625] eta 0:01:46 lr 0.000014 wd 0.0500 time 0.3980 (0.4021) data time 0.0007 
(0.0023) model time 0.3973 (0.3984) loss 5.9346 (6.3081) grad_norm 3.1753 (4.1873) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:31:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][370/625] eta 0:01:42 lr 0.000014 wd 0.0500 time 0.3949 (0.4022) data time 0.0006 (0.0022) model time 0.3943 (0.3986) loss 5.4518 (6.2991) grad_norm 2.2433 (4.1737) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:31:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][380/625] eta 0:01:38 lr 0.000014 wd 0.0500 time 0.3971 (0.4021) data time 0.0008 (0.0022) model time 0.3963 (0.3986) loss 6.6617 (6.3067) grad_norm 3.4609 (4.1679) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:31:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][390/625] eta 0:01:34 lr 0.000014 wd 0.0500 time 0.3999 (0.4020) data time 0.0008 (0.0022) model time 0.3991 (0.3985) loss 6.3729 (6.3031) grad_norm 2.5888 (4.2873) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:31:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][400/625] eta 0:01:30 lr 0.000014 wd 0.0500 time 0.3969 (0.4020) data time 0.0007 (0.0022) model time 0.3962 (0.3985) loss 5.3751 (6.2997) grad_norm 2.3570 (4.2566) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:31:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][410/625] eta 0:01:26 lr 0.000014 wd 0.0500 time 0.3960 (0.4019) data time 0.0008 (0.0021) model time 0.3951 (0.3985) loss 7.2737 (6.2986) grad_norm 3.0735 (4.2326) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:31:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][420/625] eta 0:01:22 lr 0.000014 wd 0.0500 time 0.3980 (0.4018) data time 0.0006 (0.0021) model time 0.3974 (0.3985) loss 6.7443 (6.2944) grad_norm 4.2834 (4.2102) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:31:57 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][430/625] eta 0:01:18 lr 0.000014 wd 0.0500 time 0.3967 (0.4022) data time 0.0006 (0.0021) model time 0.3961 (0.3990) loss 5.5019 (6.2902) grad_norm 9.5484 (4.2041) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:32:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][440/625] eta 0:01:14 lr 0.000014 wd 0.0500 time 0.3956 (0.4021) data time 0.0007 (0.0021) model time 0.3949 (0.3989) loss 5.4906 (6.2808) grad_norm 3.7029 (4.2211) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:32:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][450/625] eta 0:01:10 lr 0.000014 wd 0.0500 time 0.4030 (0.4022) data time 0.0010 (0.0020) model time 0.4020 (0.3991) loss 7.0094 (6.2784) grad_norm 6.0673 (4.2623) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:32:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][460/625] eta 0:01:06 lr 0.000014 wd 0.0500 time 0.3981 (0.4021) data time 0.0006 (0.0020) model time 0.3974 (0.3990) loss 6.8468 (6.2804) grad_norm 2.3505 (4.3583) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:32:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][470/625] eta 0:01:02 lr 0.000014 wd 0.0500 time 0.3983 (0.4022) data time 0.0008 (0.0020) model time 0.3974 (0.3992) loss 6.7265 (6.2887) grad_norm 4.2745 (4.3393) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:32:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][480/625] eta 0:00:58 lr 0.000014 wd 0.0500 time 0.3976 (0.4025) data time 0.0006 (0.0020) model time 0.3971 (0.3996) loss 5.1999 (6.2976) grad_norm 6.8897 (4.3332) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:32:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][490/625] eta 0:00:54 lr 0.000014 wd 0.0500 time 0.3973 (0.4024) data time 0.0007 (0.0019) 
model time 0.3966 (0.3995) loss 6.1798 (6.2940) grad_norm 2.3918 (4.3054) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:32:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][500/625] eta 0:00:50 lr 0.000014 wd 0.0500 time 0.3949 (0.4023) data time 0.0007 (0.0019) model time 0.3943 (0.3995) loss 6.1532 (6.2900) grad_norm 3.0330 (4.4338) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:32:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][510/625] eta 0:00:46 lr 0.000014 wd 0.0500 time 0.3959 (0.4022) data time 0.0006 (0.0019) model time 0.3953 (0.3994) loss 6.0317 (6.2943) grad_norm 3.0453 (4.4268) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:32:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][520/625] eta 0:00:42 lr 0.000014 wd 0.0500 time 0.3953 (0.4024) data time 0.0009 (0.0019) model time 0.3944 (0.3996) loss 5.3896 (6.2820) grad_norm 2.3922 (4.4048) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:32:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][530/625] eta 0:00:38 lr 0.000014 wd 0.0500 time 0.3947 (0.4023) data time 0.0007 (0.0019) model time 0.3940 (0.3995) loss 5.3454 (6.2715) grad_norm 2.1916 (4.3802) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:32:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][540/625] eta 0:00:34 lr 0.000014 wd 0.0500 time 0.3955 (0.4023) data time 0.0007 (0.0019) model time 0.3948 (0.3996) loss 5.3910 (6.2691) grad_norm 2.9727 (4.3533) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:32:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][550/625] eta 0:00:30 lr 0.000014 wd 0.0500 time 0.3951 (0.4022) data time 0.0009 (0.0018) model time 0.3943 (0.3995) loss 7.1598 (6.2713) grad_norm 2.2564 (4.3349) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:32:49 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [292/300][560/625] eta 0:00:26 lr 0.000014 wd 0.0500 time 0.4094 (0.4023) data time 0.0007 (0.0018) model time 0.4087 (0.3996) loss 6.5269 (6.2757) grad_norm 2.9691 (4.3200) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:32:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][570/625] eta 0:00:22 lr 0.000014 wd 0.0500 time 0.3976 (0.4025) data time 0.0007 (0.0018) model time 0.3970 (0.3999) loss 6.9365 (6.2763) grad_norm 4.0565 (4.3310) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:32:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][580/625] eta 0:00:18 lr 0.000014 wd 0.0500 time 0.3926 (0.4025) data time 0.0007 (0.0018) model time 0.3920 (0.3999) loss 5.8603 (6.2829) grad_norm 65.5575 (4.4188) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:33:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][590/625] eta 0:00:14 lr 0.000014 wd 0.0500 time 0.4039 (0.4024) data time 0.0007 (0.0018) model time 0.4032 (0.3999) loss 5.9074 (6.2830) grad_norm 3.2942 (4.4171) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:33:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][600/625] eta 0:00:10 lr 0.000014 wd 0.0500 time 0.4066 (0.4023) data time 0.0009 (0.0018) model time 0.4057 (0.3998) loss 7.5352 (6.2821) grad_norm 4.7063 (4.3951) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:33:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][610/625] eta 0:00:06 lr 0.000014 wd 0.0500 time 0.3938 (0.4022) data time 0.0007 (0.0018) model time 0.3931 (0.3997) loss 6.8742 (6.2826) grad_norm 13.2326 (4.5395) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:33:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][620/625] eta 0:00:02 lr 0.000014 wd 0.0500 time 0.3969 (0.4021) data time 0.0005 (0.0017) model time 0.3965 (0.3996) 
loss 6.3886 (6.2812) grad_norm 2.5676 (4.5125) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:33:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 292 training takes 0:04:11 [2024-07-25 20:33:15 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 20:33:15 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 20:33:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.457 (0.457) Loss 0.5454 (0.5454) Acc@1 90.283 (90.283) Acc@5 99.121 (99.121) Mem 14939MB [2024-07-25 20:33:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.120) Loss 0.8154 (0.6571) Acc@1 82.471 (87.664) Acc@5 97.217 (98.078) Mem 14939MB [2024-07-25 20:33:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 0.9023 (0.7594) Acc@1 79.834 (84.775) Acc@5 96.143 (97.163) Mem 14939MB [2024-07-25 20:33:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.441 Acc@5 97.131 [2024-07-25 20:33:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 20:33:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.843 (0.843) Loss 0.5425 (0.5425) Acc@1 90.381 (90.381) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 20:33:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.157) Loss 0.8081 (0.6536) Acc@1 82.861 (87.709) Acc@5 97.314 (98.087) Mem 14939MB [2024-07-25 20:33:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.123) Loss 0.9028 (0.7559) Acc@1 79.346 (84.787) Acc@5 96.045 (97.133) Mem 14939MB [2024-07-25 20:33:21 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 445): INFO * Acc@1 84.395 Acc@5 97.097 [2024-07-25 20:33:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 20:33:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.39% [2024-07-25 20:33:21 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 20:33:22 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 20:33:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][0/625] eta 0:08:09 lr 0.000014 wd 0.0500 time 0.7830 (0.7830) data time 0.4104 (0.4104) model time 0.0000 (0.0000) loss 5.4738 (5.4738) grad_norm 4.2502 (4.2502) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:33:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][10/625] eta 0:04:25 lr 0.000014 wd 0.0500 time 0.3948 (0.4313) data time 0.0007 (0.0381) model time 0.0000 (0.0000) loss 6.7949 (6.3190) grad_norm 3.5223 (4.5426) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:33:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][20/625] eta 0:04:10 lr 0.000014 wd 0.0500 time 0.3948 (0.4146) data time 0.0008 (0.0203) model time 0.0000 (0.0000) loss 6.5576 (6.2082) grad_norm 3.6309 (5.3023) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:33:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][30/625] eta 0:04:03 lr 0.000014 wd 0.0500 time 0.3987 (0.4089) data time 0.0006 (0.0140) model time 0.0000 (0.0000) loss 5.9116 (6.2438) grad_norm 2.4591 (4.5496) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:33:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][40/625] eta 0:03:57 lr 0.000014 wd 
0.0500 time 0.3954 (0.4057) data time 0.0008 (0.0108) model time 0.0000 (0.0000) loss 6.6843 (6.2419) grad_norm 3.0200 (4.0748) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:33:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][50/625] eta 0:03:52 lr 0.000014 wd 0.0500 time 0.3961 (0.4038) data time 0.0006 (0.0089) model time 0.0000 (0.0000) loss 5.5160 (6.2791) grad_norm 2.8580 (3.8900) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:33:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][60/625] eta 0:03:47 lr 0.000014 wd 0.0500 time 0.4053 (0.4027) data time 0.0007 (0.0075) model time 0.4047 (0.3962) loss 6.6878 (6.2668) grad_norm 6.5288 (3.8059) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:33:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][70/625] eta 0:03:46 lr 0.000014 wd 0.0500 time 0.4004 (0.4074) data time 0.0008 (0.0066) model time 0.3996 (0.4158) loss 5.2491 (6.2290) grad_norm 2.9452 (3.8521) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:33:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][80/625] eta 0:03:41 lr 0.000014 wd 0.0500 time 0.3961 (0.4062) data time 0.0009 (0.0059) model time 0.3952 (0.4094) loss 5.8324 (6.2468) grad_norm 2.1290 (3.7675) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:33:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][90/625] eta 0:03:36 lr 0.000014 wd 0.0500 time 0.3988 (0.4052) data time 0.0008 (0.0053) model time 0.3981 (0.4060) loss 5.7647 (6.2034) grad_norm 4.9792 (3.8023) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:34:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][100/625] eta 0:03:33 lr 0.000014 wd 0.0500 time 0.3930 (0.4058) data time 0.0007 (0.0049) model time 0.3923 (0.4070) loss 6.9722 (6.1922) grad_norm 5.8889 (3.8007) loss_scale 64.0000 (64.0000) mem 14939MB 
[2024-07-25 20:34:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][110/625] eta 0:03:28 lr 0.000014 wd 0.0500 time 0.3976 (0.4051) data time 0.0007 (0.0045) model time 0.3969 (0.4053) loss 5.1730 (6.1862) grad_norm 3.9238 (3.7611) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:34:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][120/625] eta 0:03:24 lr 0.000014 wd 0.0500 time 0.3964 (0.4044) data time 0.0006 (0.0042) model time 0.3958 (0.4039) loss 6.2336 (6.1764) grad_norm 2.7860 (3.7108) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:34:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][130/625] eta 0:03:19 lr 0.000014 wd 0.0500 time 0.3934 (0.4038) data time 0.0006 (0.0039) model time 0.3928 (0.4029) loss 5.5219 (6.2042) grad_norm 3.2589 (3.6806) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:34:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][140/625] eta 0:03:15 lr 0.000014 wd 0.0500 time 0.3957 (0.4032) data time 0.0007 (0.0037) model time 0.3950 (0.4021) loss 6.5019 (6.1950) grad_norm 3.4098 (3.6929) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:34:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][150/625] eta 0:03:11 lr 0.000014 wd 0.0500 time 0.4025 (0.4028) data time 0.0006 (0.0035) model time 0.4019 (0.4015) loss 6.3367 (6.1964) grad_norm 2.2894 (3.6635) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:34:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][160/625] eta 0:03:07 lr 0.000014 wd 0.0500 time 0.3964 (0.4024) data time 0.0006 (0.0034) model time 0.3958 (0.4010) loss 5.8874 (6.2103) grad_norm 5.1699 (3.7002) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:34:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][170/625] eta 0:03:02 lr 0.000014 wd 0.0500 time 0.3966 (0.4020) data 
time 0.0008 (0.0032) model time 0.3957 (0.4005) loss 5.7441 (6.1907) grad_norm 3.0806 (3.9488) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:34:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][180/625] eta 0:02:58 lr 0.000014 wd 0.0500 time 0.3980 (0.4017) data time 0.0006 (0.0031) model time 0.3974 (0.4001) loss 6.7202 (6.1784) grad_norm 2.2130 (3.9879) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:34:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][190/625] eta 0:02:54 lr 0.000014 wd 0.0500 time 0.3964 (0.4015) data time 0.0008 (0.0030) model time 0.3956 (0.3998) loss 6.1380 (6.1825) grad_norm 2.6316 (3.9407) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:34:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][200/625] eta 0:02:50 lr 0.000014 wd 0.0500 time 0.3972 (0.4012) data time 0.0007 (0.0029) model time 0.3965 (0.3995) loss 4.9680 (6.1838) grad_norm 1.9479 (3.8861) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:34:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][210/625] eta 0:02:46 lr 0.000014 wd 0.0500 time 0.3980 (0.4010) data time 0.0008 (0.0028) model time 0.3972 (0.3993) loss 5.7794 (6.1770) grad_norm 2.2540 (3.8868) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:34:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][220/625] eta 0:02:42 lr 0.000014 wd 0.0500 time 0.3951 (0.4008) data time 0.0008 (0.0027) model time 0.3943 (0.3991) loss 6.8169 (6.1705) grad_norm 4.6806 (3.8828) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:34:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][230/625] eta 0:02:38 lr 0.000014 wd 0.0500 time 0.3974 (0.4006) data time 0.0006 (0.0026) model time 0.3968 (0.3989) loss 6.6657 (6.1762) grad_norm 2.9771 (3.9523) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:34:58 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][240/625] eta 0:02:34 lr 0.000014 wd 0.0500 time 0.3946 (0.4005) data time 0.0009 (0.0025) model time 0.3937 (0.3988) loss 5.4016 (6.1704) grad_norm 2.6826 (3.9403) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:35:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][250/625] eta 0:02:30 lr 0.000014 wd 0.0500 time 0.3985 (0.4004) data time 0.0008 (0.0025) model time 0.3977 (0.3987) loss 7.1804 (6.1676) grad_norm 3.2716 (3.9290) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:35:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][260/625] eta 0:02:26 lr 0.000014 wd 0.0500 time 0.3966 (0.4002) data time 0.0008 (0.0024) model time 0.3958 (0.3985) loss 5.5932 (6.1789) grad_norm 2.7375 (3.9125) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:35:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][270/625] eta 0:02:22 lr 0.000014 wd 0.0500 time 0.3947 (0.4001) data time 0.0009 (0.0024) model time 0.3938 (0.3984) loss 7.2701 (6.1820) grad_norm 2.2000 (3.8936) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:35:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][280/625] eta 0:02:17 lr 0.000014 wd 0.0500 time 0.3963 (0.4000) data time 0.0009 (0.0023) model time 0.3954 (0.3983) loss 6.5107 (6.1971) grad_norm 2.9655 (3.8710) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:35:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][290/625] eta 0:02:14 lr 0.000014 wd 0.0500 time 0.3970 (0.4015) data time 0.0007 (0.0023) model time 0.3964 (0.4001) loss 6.8157 (6.2057) grad_norm 4.3733 (3.8639) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:35:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][300/625] eta 0:02:10 lr 0.000014 wd 0.0500 time 0.3935 (0.4013) data time 0.0007 (0.0022) 
model time 0.3928 (0.3999) loss 6.3996 (6.2025) grad_norm 2.1308 (3.8691) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:35:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][310/625] eta 0:02:06 lr 0.000014 wd 0.0500 time 0.4030 (0.4011) data time 0.0009 (0.0022) model time 0.4021 (0.3998) loss 5.6823 (6.1864) grad_norm 2.4165 (3.9247) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:35:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][320/625] eta 0:02:02 lr 0.000014 wd 0.0500 time 0.3964 (0.4016) data time 0.0007 (0.0021) model time 0.3958 (0.4004) loss 6.4795 (6.1928) grad_norm 2.2647 (3.9107) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:35:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][330/625] eta 0:01:58 lr 0.000014 wd 0.0500 time 0.3980 (0.4015) data time 0.0006 (0.0021) model time 0.3975 (0.4002) loss 5.9372 (6.1877) grad_norm 2.5373 (3.8790) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:35:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][340/625] eta 0:01:54 lr 0.000014 wd 0.0500 time 0.3976 (0.4014) data time 0.0006 (0.0020) model time 0.3970 (0.4001) loss 6.7620 (6.1943) grad_norm 4.1985 (3.8687) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:35:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][350/625] eta 0:01:50 lr 0.000014 wd 0.0500 time 0.4008 (0.4013) data time 0.0009 (0.0020) model time 0.3999 (0.4000) loss 6.1877 (6.1936) grad_norm 3.1091 (3.8543) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:35:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][360/625] eta 0:01:46 lr 0.000014 wd 0.0500 time 0.3964 (0.4011) data time 0.0008 (0.0020) model time 0.3956 (0.3998) loss 5.5489 (6.1957) grad_norm 2.6252 (3.8478) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:35:51 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [293/300][370/625] eta 0:01:42 lr 0.000014 wd 0.0500 time 0.3968 (0.4011) data time 0.0009 (0.0020) model time 0.3959 (0.3998) loss 6.0034 (6.2003) grad_norm 3.1994 (3.8352) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:35:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][380/625] eta 0:01:38 lr 0.000014 wd 0.0500 time 0.3941 (0.4010) data time 0.0010 (0.0019) model time 0.3932 (0.3997) loss 7.3128 (6.2087) grad_norm 3.9430 (3.8151) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:35:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][390/625] eta 0:01:34 lr 0.000014 wd 0.0500 time 0.3922 (0.4009) data time 0.0007 (0.0019) model time 0.3916 (0.3996) loss 5.4419 (6.2139) grad_norm 2.3711 (3.8047) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:36:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][400/625] eta 0:01:30 lr 0.000014 wd 0.0500 time 0.3973 (0.4008) data time 0.0009 (0.0019) model time 0.3964 (0.3995) loss 6.1671 (6.2187) grad_norm 2.7893 (3.8095) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:36:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][410/625] eta 0:01:26 lr 0.000014 wd 0.0500 time 0.3980 (0.4007) data time 0.0007 (0.0018) model time 0.3973 (0.3994) loss 6.1920 (6.2194) grad_norm 3.3633 (3.8091) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:36:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][420/625] eta 0:01:22 lr 0.000013 wd 0.0500 time 0.3925 (0.4006) data time 0.0007 (0.0018) model time 0.3919 (0.3993) loss 5.8815 (6.2160) grad_norm 2.5139 (3.8198) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:36:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][430/625] eta 0:01:18 lr 0.000013 wd 0.0500 time 0.3950 (0.4005) data time 0.0008 (0.0018) model time 0.3941 (0.3992) 
loss 5.7754 (6.2214) grad_norm 2.5733 (3.8593) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:36:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][440/625] eta 0:01:14 lr 0.000013 wd 0.0500 time 0.3961 (0.4006) data time 0.0006 (0.0018) model time 0.3955 (0.3993) loss 6.9231 (6.2271) grad_norm 2.6017 (3.8387) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:36:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][450/625] eta 0:01:10 lr 0.000013 wd 0.0500 time 0.3981 (0.4005) data time 0.0008 (0.0018) model time 0.3974 (0.3992) loss 5.4370 (6.2171) grad_norm 2.5874 (3.8227) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:36:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][460/625] eta 0:01:06 lr 0.000013 wd 0.0500 time 0.3975 (0.4004) data time 0.0009 (0.0017) model time 0.3966 (0.3991) loss 5.8051 (6.2153) grad_norm 3.0101 (3.8123) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:36:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][470/625] eta 0:01:02 lr 0.000013 wd 0.0500 time 0.3964 (0.4003) data time 0.0008 (0.0017) model time 0.3956 (0.3990) loss 5.2200 (6.2111) grad_norm 4.0547 (3.8249) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:36:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][480/625] eta 0:00:58 lr 0.000013 wd 0.0500 time 0.3945 (0.4002) data time 0.0006 (0.0017) model time 0.3939 (0.3990) loss 6.3643 (6.2095) grad_norm 4.5484 (3.8263) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:36:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][490/625] eta 0:00:54 lr 0.000013 wd 0.0500 time 0.3970 (0.4002) data time 0.0007 (0.0017) model time 0.3963 (0.3989) loss 6.6613 (6.2142) grad_norm 2.6859 (3.8132) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:36:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [293/300][500/625] eta 0:00:50 lr 0.000013 wd 0.0500 time 0.3955 (0.4001) data time 0.0007 (0.0017) model time 0.3948 (0.3988) loss 5.6329 (6.2138) grad_norm 3.0020 (3.8279) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:36:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][510/625] eta 0:00:46 lr 0.000013 wd 0.0500 time 0.5665 (0.4008) data time 0.0006 (0.0016) model time 0.5659 (0.3996) loss 5.9357 (6.2107) grad_norm 3.1903 (3.8086) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:36:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][520/625] eta 0:00:42 lr 0.000013 wd 0.0500 time 0.3957 (0.4007) data time 0.0006 (0.0016) model time 0.3951 (0.3995) loss 6.3379 (6.2131) grad_norm 4.7522 (3.8018) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:36:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][530/625] eta 0:00:38 lr 0.000013 wd 0.0500 time 0.3954 (0.4006) data time 0.0006 (0.0016) model time 0.3948 (0.3994) loss 6.3036 (6.2169) grad_norm 5.4905 (3.7991) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:36:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][540/625] eta 0:00:34 lr 0.000013 wd 0.0500 time 0.3926 (0.4009) data time 0.0007 (0.0016) model time 0.3918 (0.3997) loss 5.1911 (6.2130) grad_norm 2.8729 (3.8215) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:37:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][550/625] eta 0:00:30 lr 0.000013 wd 0.0500 time 0.3968 (0.4008) data time 0.0006 (0.0016) model time 0.3962 (0.3997) loss 5.7941 (6.2088) grad_norm 2.6455 (3.8080) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:37:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][560/625] eta 0:00:26 lr 0.000013 wd 0.0500 time 0.3960 (0.4007) data time 0.0009 (0.0016) model time 0.3951 (0.3996) loss 6.6830 (6.2085) grad_norm 
3.7968 (3.7937) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:37:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][570/625] eta 0:00:22 lr 0.000013 wd 0.0500 time 0.3965 (0.4008) data time 0.0008 (0.0016) model time 0.3957 (0.3997) loss 6.3566 (6.2093) grad_norm 3.1142 (3.7845) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:37:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][580/625] eta 0:00:18 lr 0.000013 wd 0.0500 time 0.3971 (0.4007) data time 0.0009 (0.0015) model time 0.3962 (0.3996) loss 5.6321 (6.2091) grad_norm 3.4101 (3.7949) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:37:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][590/625] eta 0:00:14 lr 0.000013 wd 0.0500 time 0.3964 (0.4006) data time 0.0009 (0.0015) model time 0.3955 (0.3995) loss 6.5941 (6.2098) grad_norm 5.5948 (3.8305) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:37:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][600/625] eta 0:00:10 lr 0.000013 wd 0.0500 time 0.3977 (0.4006) data time 0.0008 (0.0015) model time 0.3968 (0.3995) loss 6.3277 (6.2107) grad_norm 2.0736 (3.8212) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:37:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][610/625] eta 0:00:06 lr 0.000013 wd 0.0500 time 0.3954 (0.4005) data time 0.0006 (0.0015) model time 0.3948 (0.3994) loss 6.4275 (6.2081) grad_norm 3.3225 (3.8527) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:37:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][620/625] eta 0:00:02 lr 0.000013 wd 0.0500 time 0.3962 (0.4004) data time 0.0006 (0.0015) model time 0.3956 (0.3993) loss 6.5925 (6.2055) grad_norm 2.7562 (3.8485) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:37:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 293 training takes 
0:04:10
[2024-07-25 20:37:32 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving......
[2024-07-25 20:37:33 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!!
[2024-07-25 20:37:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.5479 (0.5479) Acc@1 90.283 (90.283) Acc@5 99.023 (99.023) Mem 14939MB
[2024-07-25 20:37:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8145 (0.6578) Acc@1 82.715 (87.722) Acc@5 97.217 (98.047) Mem 14939MB
[2024-07-25 20:37:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.104) Loss 0.9009 (0.7611) Acc@1 79.590 (84.752) Acc@5 96.094 (97.117) Mem 14939MB
[2024-07-25 20:37:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.397 Acc@5 97.079
[2024-07-25 20:37:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.4%
[2024-07-25 20:37:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.797 (0.797) Loss 0.5425 (0.5425) Acc@1 90.381 (90.381) Acc@5 99.023 (99.023) Mem 14939MB
[2024-07-25 20:37:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.154) Loss 0.8081 (0.6536) Acc@1 82.861 (87.722) Acc@5 97.314 (98.096) Mem 14939MB
[2024-07-25 20:37:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.121) Loss 0.9028 (0.7560) Acc@1 79.248 (84.796) Acc@5 96.045 (97.140) Mem 14939MB
[2024-07-25 20:37:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.401 Acc@5 97.103
[2024-07-25 20:37:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4%
[2024-07-25 20:37:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.40% [2024-07-25 20:37:38 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 20:37:40 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 20:37:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][0/625] eta 0:07:29 lr 0.000013 wd 0.0500 time 0.7197 (0.7197) data time 0.3472 (0.3472) model time 0.0000 (0.0000) loss 5.6936 (5.6936) grad_norm 3.4637 (3.4637) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:37:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][10/625] eta 0:04:21 lr 0.000013 wd 0.0500 time 0.4021 (0.4259) data time 0.0006 (0.0324) model time 0.0000 (0.0000) loss 5.4945 (6.2212) grad_norm 2.4068 (4.6045) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:37:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][20/625] eta 0:04:08 lr 0.000013 wd 0.0500 time 0.3940 (0.4112) data time 0.0009 (0.0174) model time 0.0000 (0.0000) loss 5.7120 (6.1481) grad_norm 3.1133 (3.8991) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:37:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][30/625] eta 0:04:01 lr 0.000013 wd 0.0500 time 0.4003 (0.4064) data time 0.0008 (0.0120) model time 0.0000 (0.0000) loss 6.1184 (6.3489) grad_norm 3.5621 (3.7063) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:37:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][40/625] eta 0:03:56 lr 0.000013 wd 0.0500 time 0.3965 (0.4038) data time 0.0008 (0.0093) model time 0.0000 (0.0000) loss 5.6322 (6.1942) grad_norm 3.2457 (3.7764) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:38:00 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][50/625] eta 0:03:51 lr 0.000013 wd 0.0500 time 0.3945 (0.4023) data time 0.0006 (0.0076) model time 0.0000 (0.0000) loss 6.9242 (6.2035) grad_norm 3.5220 (3.7049) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:38:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][60/625] eta 0:03:46 lr 0.000013 wd 0.0500 time 0.3983 (0.4013) data time 0.0008 (0.0065) model time 0.3975 (0.3954) loss 6.6052 (6.1817) grad_norm 5.0700 (3.9661) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:38:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][70/625] eta 0:03:42 lr 0.000013 wd 0.0500 time 0.3996 (0.4007) data time 0.0008 (0.0057) model time 0.3988 (0.3959) loss 6.5697 (6.2093) grad_norm 4.9151 (3.9012) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:38:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][80/625] eta 0:03:39 lr 0.000013 wd 0.0500 time 0.3936 (0.4023) data time 0.0008 (0.0051) model time 0.3928 (0.4014) loss 5.5080 (6.2469) grad_norm 2.1518 (3.7585) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:38:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][90/625] eta 0:03:35 lr 0.000013 wd 0.0500 time 0.3950 (0.4019) data time 0.0008 (0.0046) model time 0.3942 (0.4005) loss 6.9058 (6.2379) grad_norm 2.5728 (3.7440) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:38:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][100/625] eta 0:03:31 lr 0.000013 wd 0.0500 time 0.5706 (0.4034) data time 0.0009 (0.0043) model time 0.5697 (0.4037) loss 6.6508 (6.2302) grad_norm 2.7462 (3.7326) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:38:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][110/625] eta 0:03:29 lr 0.000013 wd 0.0500 time 0.3890 (0.4062) data time 0.0008 (0.0040) model 
time 0.3882 (0.4086) loss 6.6564 (6.2594) grad_norm 2.0490 (3.7189) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:38:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][120/625] eta 0:03:24 lr 0.000013 wd 0.0500 time 0.3955 (0.4053) data time 0.0006 (0.0037) model time 0.3949 (0.4066) loss 5.2432 (6.2355) grad_norm 2.6493 (3.6991) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:38:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][130/625] eta 0:03:20 lr 0.000013 wd 0.0500 time 0.3955 (0.4045) data time 0.0006 (0.0035) model time 0.3949 (0.4051) loss 5.6460 (6.2264) grad_norm 2.2297 (3.7351) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:38:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][140/625] eta 0:03:15 lr 0.000013 wd 0.0500 time 0.4022 (0.4040) data time 0.0009 (0.0033) model time 0.4013 (0.4042) loss 6.2711 (6.2491) grad_norm 6.8443 (3.7606) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:38:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][150/625] eta 0:03:11 lr 0.000013 wd 0.0500 time 0.4017 (0.4035) data time 0.0008 (0.0031) model time 0.4009 (0.4033) loss 6.4732 (6.2675) grad_norm 2.5077 (3.7890) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:38:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][160/625] eta 0:03:07 lr 0.000013 wd 0.0500 time 0.3952 (0.4031) data time 0.0007 (0.0030) model time 0.3944 (0.4026) loss 5.4951 (6.2644) grad_norm 2.7624 (3.9558) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:38:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][170/625] eta 0:03:03 lr 0.000013 wd 0.0500 time 0.3955 (0.4028) data time 0.0008 (0.0029) model time 0.3947 (0.4021) loss 6.6175 (6.2558) grad_norm 2.5638 (3.8768) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:38:53 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [294/300][180/625] eta 0:02:59 lr 0.000013 wd 0.0500 time 0.4055 (0.4025) data time 0.0006 (0.0028) model time 0.4048 (0.4017) loss 6.4788 (6.2838) grad_norm 2.5464 (3.8329) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:38:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][190/625] eta 0:02:54 lr 0.000013 wd 0.0500 time 0.3930 (0.4021) data time 0.0006 (0.0027) model time 0.3924 (0.4012) loss 6.7585 (6.2941) grad_norm 2.4252 (3.8229) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:39:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][200/625] eta 0:02:50 lr 0.000013 wd 0.0500 time 0.3955 (0.4019) data time 0.0008 (0.0026) model time 0.3947 (0.4009) loss 6.1141 (6.2873) grad_norm 3.2408 (3.9255) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:39:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][210/625] eta 0:02:46 lr 0.000013 wd 0.0500 time 0.3974 (0.4016) data time 0.0008 (0.0025) model time 0.3966 (0.4006) loss 6.8509 (6.2939) grad_norm 2.5478 (3.8962) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:39:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][220/625] eta 0:02:42 lr 0.000013 wd 0.0500 time 0.3942 (0.4015) data time 0.0008 (0.0024) model time 0.3934 (0.4004) loss 7.0776 (6.2921) grad_norm 10.3598 (3.9041) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:39:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][230/625] eta 0:02:38 lr 0.000013 wd 0.0500 time 0.3956 (0.4013) data time 0.0007 (0.0023) model time 0.3949 (0.4001) loss 5.4840 (6.2838) grad_norm 2.5789 (3.9206) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:39:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][240/625] eta 0:02:34 lr 0.000013 wd 0.0500 time 0.3964 (0.4011) data time 0.0008 (0.0023) model time 0.3956 (0.3999) 
loss 5.4570 (6.2813) grad_norm 5.1179 (4.0450) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:39:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][250/625] eta 0:02:30 lr 0.000013 wd 0.0500 time 0.3987 (0.4009) data time 0.0008 (0.0022) model time 0.3979 (0.3997) loss 6.8098 (6.2757) grad_norm 3.1823 (4.0734) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:39:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][260/625] eta 0:02:26 lr 0.000013 wd 0.0500 time 0.3961 (0.4007) data time 0.0008 (0.0022) model time 0.3953 (0.3995) loss 7.2752 (6.2749) grad_norm 2.9547 (4.0682) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:39:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][270/625] eta 0:02:22 lr 0.000013 wd 0.0500 time 0.3966 (0.4006) data time 0.0009 (0.0021) model time 0.3958 (0.3994) loss 6.6390 (6.2828) grad_norm 2.6684 (4.0623) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:39:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][280/625] eta 0:02:18 lr 0.000013 wd 0.0500 time 0.3980 (0.4005) data time 0.0008 (0.0021) model time 0.3972 (0.3992) loss 7.1319 (6.2791) grad_norm 2.6087 (4.0596) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:39:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][290/625] eta 0:02:14 lr 0.000013 wd 0.0500 time 0.3971 (0.4003) data time 0.0006 (0.0020) model time 0.3965 (0.3991) loss 5.8024 (6.2759) grad_norm 2.9701 (4.0275) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:39:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][300/625] eta 0:02:10 lr 0.000013 wd 0.0500 time 0.3972 (0.4007) data time 0.0008 (0.0020) model time 0.3964 (0.3995) loss 6.6180 (6.2797) grad_norm 2.9213 (4.0363) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:39:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [294/300][310/625] eta 0:02:06 lr 0.000013 wd 0.0500 time 0.3939 (0.4005) data time 0.0008 (0.0020) model time 0.3931 (0.3994) loss 6.5761 (6.2811) grad_norm 3.0475 (4.0083) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:39:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][320/625] eta 0:02:02 lr 0.000013 wd 0.0500 time 0.3957 (0.4004) data time 0.0009 (0.0019) model time 0.3948 (0.3992) loss 5.9820 (6.2772) grad_norm 3.5650 (4.1440) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:39:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][330/625] eta 0:01:58 lr 0.000013 wd 0.0500 time 0.3960 (0.4021) data time 0.0006 (0.0019) model time 0.3954 (0.4012) loss 6.0973 (6.2883) grad_norm 4.6627 (4.1186) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:39:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][340/625] eta 0:01:54 lr 0.000013 wd 0.0500 time 0.3952 (0.4019) data time 0.0006 (0.0019) model time 0.3946 (0.4010) loss 6.8792 (6.2893) grad_norm 29.5703 (4.1788) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:40:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][350/625] eta 0:01:50 lr 0.000013 wd 0.0500 time 0.3959 (0.4017) data time 0.0009 (0.0018) model time 0.3950 (0.4008) loss 5.9583 (6.2964) grad_norm 3.4701 (4.1672) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:40:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][360/625] eta 0:01:46 lr 0.000013 wd 0.0500 time 0.3962 (0.4016) data time 0.0008 (0.0018) model time 0.3954 (0.4006) loss 5.4802 (6.2918) grad_norm 29.1092 (4.2683) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:40:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][370/625] eta 0:01:42 lr 0.000013 wd 0.0500 time 0.3960 (0.4015) data time 0.0008 (0.0018) model time 0.3952 (0.4005) loss 6.0344 (6.2916) 
grad_norm 3.3640 (4.2724) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:40:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][380/625] eta 0:01:38 lr 0.000013 wd 0.0500 time 0.3957 (0.4013) data time 0.0006 (0.0017) model time 0.3951 (0.4003) loss 6.5508 (6.2823) grad_norm 2.3256 (4.2798) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:40:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][390/625] eta 0:01:34 lr 0.000013 wd 0.0500 time 0.4015 (0.4012) data time 0.0009 (0.0017) model time 0.4006 (0.4002) loss 6.3514 (6.2861) grad_norm 2.1285 (4.2685) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:40:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][400/625] eta 0:01:30 lr 0.000013 wd 0.0500 time 0.3953 (0.4011) data time 0.0006 (0.0017) model time 0.3947 (0.4001) loss 5.7983 (6.2853) grad_norm 4.1251 (4.2702) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:40:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][410/625] eta 0:01:26 lr 0.000013 wd 0.0500 time 0.3958 (0.4009) data time 0.0007 (0.0017) model time 0.3951 (0.3999) loss 6.5476 (6.2831) grad_norm 3.4642 (4.2749) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:40:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][420/625] eta 0:01:22 lr 0.000013 wd 0.0500 time 0.3946 (0.4008) data time 0.0006 (0.0017) model time 0.3939 (0.3998) loss 6.8002 (6.2767) grad_norm 5.2513 (4.2499) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:40:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][430/625] eta 0:01:18 lr 0.000013 wd 0.0500 time 0.3934 (0.4007) data time 0.0008 (0.0016) model time 0.3926 (0.3996) loss 6.6630 (6.2758) grad_norm 2.4415 (4.2347) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:40:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[294/300][440/625] eta 0:01:14 lr 0.000013 wd 0.0500 time 0.4015 (0.4006) data time 0.0008 (0.0016) model time 0.4007 (0.3995) loss 5.7728 (6.2768) grad_norm 3.4264 (4.2112) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:40:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][450/625] eta 0:01:10 lr 0.000013 wd 0.0500 time 0.3967 (0.4005) data time 0.0006 (0.0016) model time 0.3962 (0.3995) loss 7.3963 (6.2749) grad_norm 2.4954 (4.1803) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:40:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][460/625] eta 0:01:06 lr 0.000013 wd 0.0500 time 0.3952 (0.4005) data time 0.0008 (0.0016) model time 0.3944 (0.3994) loss 6.0182 (6.2717) grad_norm 2.7377 (4.1759) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:40:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][470/625] eta 0:01:02 lr 0.000013 wd 0.0500 time 0.3969 (0.4004) data time 0.0009 (0.0016) model time 0.3960 (0.3993) loss 5.6092 (6.2622) grad_norm 6.4540 (4.1740) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:40:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][480/625] eta 0:00:58 lr 0.000013 wd 0.0500 time 0.3976 (0.4003) data time 0.0008 (0.0016) model time 0.3968 (0.3992) loss 5.8511 (6.2588) grad_norm 2.3037 (4.1422) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:40:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][490/625] eta 0:00:54 lr 0.000013 wd 0.0500 time 0.3965 (0.4002) data time 0.0006 (0.0015) model time 0.3959 (0.3992) loss 5.5639 (6.2494) grad_norm 3.8501 (4.1375) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:41:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][500/625] eta 0:00:50 lr 0.000013 wd 0.0500 time 0.3954 (0.4002) data time 0.0006 (0.0015) model time 0.3948 (0.3991) loss 6.0291 (6.2533) grad_norm 26.8741 
(4.1652) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:41:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][510/625] eta 0:00:46 lr 0.000013 wd 0.0500 time 0.3973 (0.4001) data time 0.0008 (0.0015) model time 0.3965 (0.3990) loss 5.5693 (6.2468) grad_norm 1.9179 (4.1506) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:41:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][520/625] eta 0:00:42 lr 0.000013 wd 0.0500 time 0.3957 (0.4003) data time 0.0006 (0.0015) model time 0.3950 (0.3992) loss 6.2011 (6.2525) grad_norm 3.7020 (4.1534) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:41:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][530/625] eta 0:00:38 lr 0.000013 wd 0.0500 time 0.3963 (0.4002) data time 0.0008 (0.0015) model time 0.3955 (0.3992) loss 6.9678 (6.2492) grad_norm 4.2335 (4.1559) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:41:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][540/625] eta 0:00:34 lr 0.000013 wd 0.0500 time 0.4027 (0.4002) data time 0.0006 (0.0015) model time 0.4022 (0.3991) loss 5.8507 (6.2518) grad_norm 2.7102 (4.1330) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:41:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][550/625] eta 0:00:30 lr 0.000013 wd 0.0500 time 0.3954 (0.4011) data time 0.0009 (0.0015) model time 0.3945 (0.4001) loss 6.2650 (6.2495) grad_norm 3.0180 (4.1249) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:41:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][560/625] eta 0:00:26 lr 0.000013 wd 0.0500 time 0.3957 (0.4010) data time 0.0008 (0.0015) model time 0.3949 (0.4000) loss 6.4813 (6.2538) grad_norm 7.1046 (4.1244) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:41:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][570/625] eta 
0:00:22 lr 0.000013 wd 0.0500 time 0.3961 (0.4009) data time 0.0008 (0.0014) model time 0.3953 (0.4000) loss 6.2681 (6.2498) grad_norm 2.5418 (4.1049) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:41:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][580/625] eta 0:00:18 lr 0.000013 wd 0.0500 time 0.3974 (0.4009) data time 0.0006 (0.0014) model time 0.3967 (0.3999) loss 6.3836 (6.2497) grad_norm 2.6880 (4.0900) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:41:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][590/625] eta 0:00:14 lr 0.000013 wd 0.0500 time 0.3955 (0.4008) data time 0.0008 (0.0014) model time 0.3946 (0.3998) loss 6.9967 (6.2471) grad_norm 2.6597 (4.2066) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:41:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][600/625] eta 0:00:10 lr 0.000013 wd 0.0500 time 0.3954 (0.4008) data time 0.0008 (0.0014) model time 0.3946 (0.3998) loss 6.6014 (6.2492) grad_norm 2.4861 (4.1851) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:41:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][610/625] eta 0:00:06 lr 0.000013 wd 0.0500 time 0.3971 (0.4007) data time 0.0006 (0.0014) model time 0.3965 (0.3997) loss 6.2261 (6.2452) grad_norm 2.5904 (4.1912) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:41:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][620/625] eta 0:00:02 lr 0.000013 wd 0.0500 time 0.4002 (0.4007) data time 0.0006 (0.0014) model time 0.3996 (0.3997) loss 6.3801 (6.2509) grad_norm 2.7868 (4.1848) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:41:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 294 training takes 0:04:10 [2024-07-25 20:41:50 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 20:41:55 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 20:41:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.467 (0.467) Loss 0.5410 (0.5410) Acc@1 90.527 (90.527) Acc@5 99.072 (99.072) Mem 14939MB [2024-07-25 20:41:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.120) Loss 0.8130 (0.6537) Acc@1 82.666 (87.775) Acc@5 97.070 (98.034) Mem 14939MB [2024-07-25 20:41:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.085 (0.104) Loss 0.8926 (0.7557) Acc@1 79.736 (84.838) Acc@5 96.240 (97.114) Mem 14939MB [2024-07-25 20:41:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.473 Acc@5 97.089 [2024-07-25 20:41:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.5% [2024-07-25 20:41:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.743 (0.743) Loss 0.5425 (0.5425) Acc@1 90.430 (90.430) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-25 20:41:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.153) Loss 0.8081 (0.6535) Acc@1 82.861 (87.740) Acc@5 97.266 (98.082) Mem 14939MB [2024-07-25 20:42:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.121) Loss 0.9019 (0.7557) Acc@1 79.297 (84.807) Acc@5 96.045 (97.128) Mem 14939MB [2024-07-25 20:42:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.409 Acc@5 97.093 [2024-07-25 20:42:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 20:42:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.41% [2024-07-25 20:42:00 vssd_mesa_retrain_small_e300] (utils.py 118): 
INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 20:42:01 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-25 20:42:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][0/625] eta 0:08:26 lr 0.000013 wd 0.0500 time 0.8106 (0.8106) data time 0.4391 (0.4391) model time 0.0000 (0.0000) loss 6.7662 (6.7662) grad_norm 2.6836 (2.6836) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:42:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][10/625] eta 0:04:29 lr 0.000013 wd 0.0500 time 0.3971 (0.4379) data time 0.0008 (0.0407) model time 0.0000 (0.0000) loss 6.8633 (6.2858) grad_norm 3.0316 (3.1415) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:42:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][20/625] eta 0:04:13 lr 0.000013 wd 0.0500 time 0.3946 (0.4183) data time 0.0009 (0.0217) model time 0.0000 (0.0000) loss 6.3111 (6.1267) grad_norm 2.6525 (6.1878) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:42:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][30/625] eta 0:04:06 lr 0.000013 wd 0.0500 time 0.3960 (0.4145) data time 0.0007 (0.0150) model time 0.0000 (0.0000) loss 6.8083 (6.1673) grad_norm 6.0858 (5.4316) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:42:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][40/625] eta 0:03:59 lr 0.000013 wd 0.0500 time 0.3962 (0.4100) data time 0.0008 (0.0115) model time 0.0000 (0.0000) loss 6.5359 (6.1996) grad_norm 2.6163 (5.0581) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:42:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][50/625] eta 0:03:55 lr 0.000013 wd 0.0500 time 0.3949 (0.4092) data time 0.0009 (0.0095) model time 0.0000 
(0.0000) loss 6.0430 (6.2192) grad_norm 2.9213 (4.6374) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:42:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][60/625] eta 0:03:51 lr 0.000013 wd 0.0500 time 0.4097 (0.4101) data time 0.0006 (0.0081) model time 0.4092 (0.4138) loss 6.0502 (6.2184) grad_norm 5.9033 (4.7735) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:42:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][70/625] eta 0:03:46 lr 0.000013 wd 0.0500 time 0.3922 (0.4087) data time 0.0008 (0.0070) model time 0.3914 (0.4064) loss 5.7281 (6.1891) grad_norm 3.6291 (4.6068) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:42:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][80/625] eta 0:03:41 lr 0.000013 wd 0.0500 time 0.3913 (0.4072) data time 0.0007 (0.0063) model time 0.3906 (0.4029) loss 5.0206 (6.2052) grad_norm 4.4827 (4.6582) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:42:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][90/625] eta 0:03:37 lr 0.000013 wd 0.0500 time 0.4271 (0.4070) data time 0.0009 (0.0057) model time 0.4262 (0.4033) loss 7.0815 (6.2037) grad_norm 4.5165 (4.5982) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-25 20:42:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][100/625] eta 0:03:33 lr 0.000013 wd 0.0500 time 0.3995 (0.4063) data time 0.0009 (0.0053) model time 0.3986 (0.4022) loss 5.8227 (6.2116) grad_norm 3.2218 (4.4683) loss_scale 128.0000 (70.3366) mem 14939MB [2024-07-25 20:42:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][110/625] eta 0:03:28 lr 0.000013 wd 0.0500 time 0.3969 (0.4056) data time 0.0009 (0.0049) model time 0.3960 (0.4014) loss 5.7771 (6.2330) grad_norm 3.4470 (4.3977) loss_scale 128.0000 (75.5315) mem 14939MB [2024-07-25 20:42:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 
367): INFO Train: [295/300][120/625] eta 0:03:24 lr 0.000013 wd 0.0500 time 0.3998 (0.4049) data time 0.0007 (0.0046) model time 0.3991 (0.4007) loss 6.8757 (6.2357) grad_norm 3.3329 (4.4188) loss_scale 128.0000 (79.8678) mem 14939MB [2024-07-25 20:42:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][130/625] eta 0:03:20 lr 0.000013 wd 0.0500 time 0.3958 (0.4046) data time 0.0007 (0.0043) model time 0.3952 (0.4007) loss 6.8536 (6.2265) grad_norm 3.6870 (4.4474) loss_scale 128.0000 (83.5420) mem 14939MB [2024-07-25 20:42:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][140/625] eta 0:03:16 lr 0.000013 wd 0.0500 time 0.3962 (0.4050) data time 0.0007 (0.0041) model time 0.3955 (0.4016) loss 6.2512 (6.2078) grad_norm 4.6356 (4.5541) loss_scale 128.0000 (86.6950) mem 14939MB [2024-07-25 20:43:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][150/625] eta 0:03:12 lr 0.000013 wd 0.0500 time 0.3955 (0.4056) data time 0.0009 (0.0039) model time 0.3946 (0.4028) loss 6.5043 (6.2007) grad_norm 2.8177 (4.5353) loss_scale 128.0000 (89.4305) mem 14939MB [2024-07-25 20:43:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][160/625] eta 0:03:08 lr 0.000013 wd 0.0500 time 0.3931 (0.4051) data time 0.0007 (0.0037) model time 0.3925 (0.4022) loss 6.8240 (6.1927) grad_norm 2.4687 (4.4439) loss_scale 128.0000 (91.8261) mem 14939MB [2024-07-25 20:43:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][170/625] eta 0:03:04 lr 0.000013 wd 0.0500 time 0.3998 (0.4047) data time 0.0007 (0.0035) model time 0.3992 (0.4018) loss 6.3192 (6.1948) grad_norm 34.0764 (4.5448) loss_scale 128.0000 (93.9415) mem 14939MB [2024-07-25 20:43:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][180/625] eta 0:02:59 lr 0.000013 wd 0.0500 time 0.3971 (0.4043) data time 0.0009 (0.0034) model time 0.3962 (0.4014) loss 5.6794 
(6.1958) grad_norm 4.6254 (4.4708) loss_scale 128.0000 (95.8232) mem 14939MB [2024-07-25 20:43:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][190/625] eta 0:02:55 lr 0.000013 wd 0.0500 time 0.3948 (0.4040) data time 0.0007 (0.0032) model time 0.3941 (0.4012) loss 6.3408 (6.1726) grad_norm 2.9726 (4.3720) loss_scale 128.0000 (97.5079) mem 14939MB [2024-07-25 20:43:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][200/625] eta 0:02:51 lr 0.000013 wd 0.0500 time 0.3965 (0.4039) data time 0.0007 (0.0032) model time 0.3959 (0.4010) loss 7.2035 (6.1937) grad_norm 3.1976 (4.3051) loss_scale 128.0000 (99.0249) mem 14939MB [2024-07-25 20:43:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][210/625] eta 0:02:47 lr 0.000013 wd 0.0500 time 0.3972 (0.4037) data time 0.0007 (0.0031) model time 0.3965 (0.4008) loss 6.2337 (6.1980) grad_norm 2.2428 (4.2551) loss_scale 128.0000 (100.3981) mem 14939MB [2024-07-25 20:43:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][220/625] eta 0:02:43 lr 0.000013 wd 0.0500 time 0.3940 (0.4034) data time 0.0006 (0.0030) model time 0.3934 (0.4006) loss 5.8790 (6.2011) grad_norm 2.3884 (4.3339) loss_scale 128.0000 (101.6471) mem 14939MB [2024-07-25 20:43:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-25 20:43:32 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 20:43:32 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 21:17:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 21:17:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 21:17:49 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-25 21:18:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 21:18:00 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... [2024-07-25 21:18:00 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 21:18:00 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 21:18:00 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 295) [2024-07-25 21:18:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-25 21:18:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][230/625] eta 0:13:51 lr 0.000013 wd 0.0500 time 0.3960 (2.1051) data time 0.0007 (0.0861) model time 0.3953 (2.0190) loss 7.5937 (6.6406) grad_norm 3.1722 (3.2228) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:18:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][240/625] eta 0:06:39 lr 0.000013 wd 0.0500 time 0.3913 (1.0367) data time 0.0009 (0.0330) model time 0.3905 (1.0037) loss 6.8679 (6.5423) grad_norm 4.1106 (3.5206) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:18:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [295/300][250/625] eta 0:04:56 lr 0.000013 wd 0.0500 time 0.4074 (0.7909) data time 0.0006 (0.0206) model time 0.4067 (0.7703) loss 6.3870 (6.4399) grad_norm 4.7067 (7.1676) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:18:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][260/625] eta 0:04:08 lr 0.000013 wd 0.0500 time 0.3962 (0.6820) data time 0.0010 (0.0152) model time 0.3952 (0.6668) loss 7.0513 (6.5089) grad_norm 2.8604 (6.1441) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:18:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][270/625] eta 0:03:40 lr 0.000013 wd 0.0500 time 0.4078 (0.6203) data time 0.0010 (0.0121) model time 0.4068 (0.6082) loss 5.9798 (6.4791) grad_norm 3.7740 (5.6614) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:18:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][280/625] eta 0:03:23 lr 0.000013 wd 0.0500 time 0.3897 (0.5889) data time 0.0007 (0.0101) model time 0.3890 (0.5788) loss 7.3642 (6.4744) grad_norm 2.5714 (5.2675) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:18:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][290/625] eta 0:03:07 lr 0.000013 wd 0.0500 time 0.3986 (0.5596) data time 0.0008 (0.0087) model time 0.3979 (0.5509) loss 5.5746 (6.4526) grad_norm 2.4680 (5.0131) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:18:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][300/625] eta 0:02:55 lr 0.000013 wd 0.0500 time 0.4092 (0.5386) data time 0.0008 (0.0077) model time 0.4084 (0.5309) loss 5.7595 (6.4074) grad_norm 2.2905 (4.8141) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:18:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][310/625] eta 0:02:44 lr 0.000013 wd 0.0500 time 0.3978 (0.5222) data time 0.0007 (0.0069) model time 0.3971 (0.5153) loss 5.4275 
(6.3641) grad_norm 2.4609 (5.2814) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:18:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][320/625] eta 0:02:35 lr 0.000013 wd 0.0500 time 0.3968 (0.5093) data time 0.0007 (0.0063) model time 0.3961 (0.5030) loss 6.3681 (6.3773) grad_norm 4.2073 (5.0559) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:18:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][330/625] eta 0:02:27 lr 0.000013 wd 0.0500 time 0.3986 (0.4988) data time 0.0008 (0.0058) model time 0.3978 (0.4930) loss 6.6001 (6.4170) grad_norm 2.9474 (5.0367) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:19:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][340/625] eta 0:02:19 lr 0.000013 wd 0.0500 time 0.4076 (0.4906) data time 0.0006 (0.0054) model time 0.4070 (0.4852) loss 7.0131 (6.3988) grad_norm 3.7842 (4.8461) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:19:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][350/625] eta 0:02:12 lr 0.000013 wd 0.0500 time 0.4080 (0.4833) data time 0.0006 (0.0050) model time 0.4074 (0.4783) loss 5.7294 (6.4001) grad_norm 34.6202 (4.9573) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:19:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][360/625] eta 0:02:06 lr 0.000013 wd 0.0500 time 0.3963 (0.4776) data time 0.0009 (0.0048) model time 0.3954 (0.4729) loss 5.1022 (6.3902) grad_norm 2.2460 (4.8010) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:19:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][370/625] eta 0:02:00 lr 0.000013 wd 0.0500 time 0.3985 (0.4723) data time 0.0007 (0.0045) model time 0.3979 (0.4678) loss 6.0246 (6.3847) grad_norm 2.7201 (4.6816) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:19:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [295/300][380/625] eta 0:01:54 lr 0.000013 wd 0.0500 time 0.3968 (0.4676) data time 0.0009 (0.0043) model time 0.3959 (0.4633) loss 6.9493 (6.3855) grad_norm 2.6317 (4.6326) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:19:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][390/625] eta 0:01:48 lr 0.000013 wd 0.0500 time 0.3994 (0.4637) data time 0.0008 (0.0041) model time 0.3985 (0.4596) loss 6.7161 (6.3786) grad_norm 2.5493 (4.6573) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:19:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][400/625] eta 0:01:43 lr 0.000013 wd 0.0500 time 0.3968 (0.4601) data time 0.0007 (0.0039) model time 0.3961 (0.4561) loss 6.4170 (6.3641) grad_norm 3.8417 (4.5750) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:19:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][410/625] eta 0:01:38 lr 0.000013 wd 0.0500 time 0.4042 (0.4568) data time 0.0009 (0.0038) model time 0.4033 (0.4531) loss 5.4614 (6.3512) grad_norm 2.0568 (4.5439) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:19:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][420/625] eta 0:01:33 lr 0.000013 wd 0.0500 time 0.4008 (0.4539) data time 0.0006 (0.0036) model time 0.4002 (0.4503) loss 5.7286 (6.3471) grad_norm 8.7915 (4.5383) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:19:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][430/625] eta 0:01:27 lr 0.000013 wd 0.0500 time 0.3951 (0.4512) data time 0.0007 (0.0035) model time 0.3944 (0.4477) loss 5.4317 (6.3294) grad_norm 2.6596 (4.6937) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:19:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][440/625] eta 0:01:23 lr 0.000013 wd 0.0500 time 0.3976 (0.4487) data time 0.0007 (0.0034) model time 0.3968 (0.4453) loss 5.5470 
(6.3272) grad_norm 2.9563 (4.6712) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:19:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][450/625] eta 0:01:18 lr 0.000013 wd 0.0500 time 0.3973 (0.4465) data time 0.0006 (0.0033) model time 0.3966 (0.4432) loss 6.0838 (6.3293) grad_norm 3.4214 (4.6348) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:19:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][460/625] eta 0:01:13 lr 0.000013 wd 0.0500 time 0.3953 (0.4445) data time 0.0006 (0.0032) model time 0.3947 (0.4413) loss 6.1667 (6.3193) grad_norm 4.5219 (4.5858) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:19:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][470/625] eta 0:01:08 lr 0.000013 wd 0.0500 time 0.3991 (0.4427) data time 0.0007 (0.0031) model time 0.3984 (0.4396) loss 5.6301 (6.3137) grad_norm 2.1478 (4.5166) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:19:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][480/625] eta 0:01:03 lr 0.000013 wd 0.0500 time 0.3996 (0.4410) data time 0.0008 (0.0030) model time 0.3988 (0.4381) loss 5.4269 (6.3095) grad_norm 3.5587 (4.4871) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:20:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-25 21:20:01 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 21:20:05 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 21:27:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 21:27:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 21:27:42 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-25 21:28:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 21:28:17 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... [2024-07-25 21:28:17 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 21:28:17 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 21:28:17 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 295) [2024-07-25 21:28:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-25 21:28:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][490/625] eta 0:05:42 lr 0.000013 wd 0.0500 time 0.4164 (2.5388) data time 0.0009 (0.1705) model time 0.4155 (2.3683) loss 6.8450 (6.4662) grad_norm 2.2650 (3.0870) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:28:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][500/625] eta 0:02:20 lr 0.000013 wd 0.0500 time 0.4117 (1.1244) data time 0.0010 (0.0576) model time 0.4108 (1.0668) loss 6.9241 (6.4394) grad_norm 4.9325 (7.3242) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:28:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [295/300][510/625] eta 0:01:36 lr 0.000013 wd 0.0500 time 0.4175 (0.8420) data time 0.0011 (0.0351) model time 0.4163 (0.8069) loss 7.4831 (6.4370) grad_norm 3.8936 (6.0781) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 21:28:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-25 21:28:43 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 21:28:47 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 21:35:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 21:35:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 21:44:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 21:44:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 21:45:02 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-07-25 21:50:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 21:50:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 22:08:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 22:08:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 22:08:40 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-25 22:08:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 22:08:53 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... [2024-07-25 22:08:53 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 22:08:53 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 22:08:53 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 295) [2024-07-25 22:08:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-25 22:09:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-25 22:09:15 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... 
[2024-07-25 22:09:17 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 22:22:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 22:22:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 22:23:11 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-25 22:23:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 22:23:24 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... 
[2024-07-25 22:23:24 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 22:23:24 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 22:23:24 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 295) [2024-07-25 22:23:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-25 22:23:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][520/625] eta 0:06:11 lr 0.000013 wd 0.0500 time 0.4126 (3.5334) data time 0.0008 (0.2248) model time 0.4118 (3.3085) loss 5.6105 (6.5613) grad_norm 3.6060 (3.1066) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 22:23:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][530/625] eta 0:01:48 lr 0.000013 wd 0.0500 time 0.4167 (1.1369) data time 0.0011 (0.0527) model time 0.4157 (1.0842) loss 6.1781 (6.5380) grad_norm 4.9763 (7.9735) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 22:23:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][540/625] eta 0:01:10 lr 0.000013 wd 0.0500 time 0.4223 (0.8258) data time 0.0008 (0.0303) model time 0.4215 (0.7955) loss 7.0726 (6.5132) grad_norm 13.8984 (6.6695) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 22:23:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][550/625] eta 0:00:52 lr 0.000013 wd 0.0500 time 0.4156 (0.7025) data time 0.0010 (0.0215) model time 0.4147 (0.6810) loss 6.5573 (6.5280) grad_norm 2.7232 (5.6656) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 22:23:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][560/625] eta 0:00:41 lr 0.000013 wd 0.0500 time 0.4231 (0.6365) data time 0.0012 (0.0168) model time 0.4219 (0.6198) loss 6.6709 (6.4700) grad_norm 3.6132 (5.1913) loss_scale 128.0000 (128.0000) mem 
14931MB [2024-07-25 22:24:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][570/625] eta 0:00:33 lr 0.000013 wd 0.0500 time 0.4102 (0.6007) data time 0.0011 (0.0139) model time 0.4091 (0.5868) loss 6.3076 (6.4633) grad_norm 3.6109 (4.8821) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 22:24:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][580/625] eta 0:00:25 lr 0.000013 wd 0.0500 time 0.4225 (0.5768) data time 0.0011 (0.0118) model time 0.4215 (0.5650) loss 5.8448 (6.4099) grad_norm 4.1897 (4.7676) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 22:24:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][590/625] eta 0:00:19 lr 0.000013 wd 0.0500 time 0.4224 (0.5557) data time 0.0010 (0.0104) model time 0.4214 (0.5454) loss 6.6504 (6.3615) grad_norm 3.9547 (4.6090) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 22:24:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][600/625] eta 0:00:13 lr 0.000013 wd 0.0500 time 0.4232 (0.5394) data time 0.0010 (0.0093) model time 0.4222 (0.5301) loss 5.0609 (6.3170) grad_norm 3.4353 (4.9865) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 22:24:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][610/625] eta 0:00:07 lr 0.000013 wd 0.0500 time 0.4222 (0.5266) data time 0.0005 (0.0084) model time 0.4217 (0.5182) loss 7.2146 (6.3447) grad_norm 3.8440 (6.0516) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 22:24:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][620/625] eta 0:00:02 lr 0.000013 wd 0.0500 time 0.4239 (0.5163) data time 0.0006 (0.0077) model time 0.4234 (0.5086) loss 6.4737 (6.3600) grad_norm 3.0606 (5.7623) loss_scale 128.0000 (128.0000) mem 14931MB [2024-07-25 22:24:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting 
[2024-07-25 22:24:23 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 22:24:26 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 22:40:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 22:40:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 22:40:54 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-25 22:41:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 22:41:10 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... 
[2024-07-25 22:41:10 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 22:41:10 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 22:41:10 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 295) [2024-07-25 22:41:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-25 22:50:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 22:50:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 22:51:30 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-25 22:51:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 22:51:46 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... 
[2024-07-25 22:51:46 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 22:51:46 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 22:51:46 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 295) [2024-07-25 22:51:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-25 22:52:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 295 training takes 0:00:14 [2024-07-25 22:52:05 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 22:52:09 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 22:52:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.560 (0.560) Loss 0.5586 (0.5586) Acc@1 90.332 (90.332) Acc@5 98.926 (98.926) Mem 14934MB [2024-07-25 22:52:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.081 (0.131) Loss 0.8237 (0.6658) Acc@1 82.471 (87.797) Acc@5 97.217 (98.087) Mem 14934MB [2024-07-25 22:52:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.081 (0.107) Loss 0.9072 (0.7682) Acc@1 79.590 (84.891) Acc@5 96.045 (97.131) Mem 14934MB [2024-07-25 22:52:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.517 Acc@5 97.105 [2024-07-25 22:52:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.5% [2024-07-25 22:52:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 84.52% [2024-07-25 22:52:14 vssd_mesa_retrain_small_e300] (utils.py 118): INFO 
./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saving...... [2024-07-25 22:52:16 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt.pth saved !!! [2024-07-25 22:52:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.550 (0.550) Loss 0.5430 (0.5430) Acc@1 90.430 (90.430) Acc@5 99.023 (99.023) Mem 14934MB [2024-07-25 22:52:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.082 (0.128) Loss 0.8086 (0.6538) Acc@1 82.861 (87.762) Acc@5 97.266 (98.082) Mem 14934MB [2024-07-25 22:52:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.081 (0.106) Loss 0.9019 (0.7560) Acc@1 79.346 (84.814) Acc@5 96.045 (97.121) Mem 14934MB [2024-07-25 22:52:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.415 Acc@5 97.087 [2024-07-25 22:52:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-25 22:52:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.41% [2024-07-25 22:52:18 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... [2024-07-25 22:52:19 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! 
[2024-07-25 22:52:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][0/625] eta 0:22:31 lr 0.000013 wd 0.0500 time 2.1627 (2.1627) data time 0.4467 (0.4467) model time 0.0000 (0.0000) loss 5.7589 (5.7589) grad_norm 3.8448 (3.8448) loss_scale 128.0000 (128.0000) mem 14938MB [2024-07-25 22:52:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][10/625] eta 0:05:45 lr 0.000013 wd 0.0500 time 0.3886 (0.5615) data time 0.0010 (0.0415) model time 0.0000 (0.0000) loss 7.1210 (6.1869) grad_norm 2.9830 (3.6576) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:52:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][20/625] eta 0:04:57 lr 0.000013 wd 0.0500 time 0.3936 (0.4911) data time 0.0008 (0.0222) model time 0.0000 (0.0000) loss 6.5719 (6.2618) grad_norm 3.4358 (3.6063) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:52:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][30/625] eta 0:04:32 lr 0.000013 wd 0.0500 time 0.3869 (0.4584) data time 0.0008 (0.0153) model time 0.0000 (0.0000) loss 6.7016 (6.3532) grad_norm 2.9726 (3.4044) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:52:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][40/625] eta 0:04:19 lr 0.000013 wd 0.0500 time 0.3828 (0.4428) data time 0.0011 (0.0119) model time 0.0000 (0.0000) loss 6.8016 (6.3176) grad_norm 3.6649 (4.0716) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:52:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][50/625] eta 0:04:13 lr 0.000013 wd 0.0500 time 0.3808 (0.4400) data time 0.0010 (0.0097) model time 0.0000 (0.0000) loss 6.9240 (6.2941) grad_norm 4.3809 (3.9695) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:52:46 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][60/625] eta 0:04:07 lr 0.000013 wd 0.0500 time 0.3895 (0.4384) 
data time 0.0010 (0.0083) model time 0.3884 (0.4290) loss 5.6699 (6.2850) grad_norm 3.4108 (3.8628) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:52:50 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][70/625] eta 0:04:01 lr 0.000013 wd 0.0500 time 0.3843 (0.4345) data time 0.0009 (0.0073) model time 0.3834 (0.4195) loss 6.4571 (6.2605) grad_norm 3.6636 (3.7483) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:52:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][80/625] eta 0:03:53 lr 0.000013 wd 0.0500 time 0.3844 (0.4290) data time 0.0008 (0.0065) model time 0.3836 (0.4092) loss 5.4732 (6.2595) grad_norm 6.7241 (3.7635) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:52:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][90/625] eta 0:03:48 lr 0.000013 wd 0.0500 time 0.3915 (0.4267) data time 0.0008 (0.0059) model time 0.3907 (0.4087) loss 7.1526 (6.2546) grad_norm 3.0905 (3.6978) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:53:02 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][100/625] eta 0:03:43 lr 0.000013 wd 0.0500 time 0.3879 (0.4251) data time 0.0007 (0.0054) model time 0.3871 (0.4090) loss 7.2355 (6.2624) grad_norm 2.5965 (3.7266) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:53:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][110/625] eta 0:03:37 lr 0.000013 wd 0.0500 time 0.3890 (0.4228) data time 0.0010 (0.0050) model time 0.3880 (0.4072) loss 5.8168 (6.2218) grad_norm 2.2693 (3.7149) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:53:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][120/625] eta 0:03:33 lr 0.000013 wd 0.0500 time 0.3939 (0.4227) data time 0.0011 (0.0047) model time 0.3928 (0.4091) loss 6.5977 (6.2332) grad_norm 2.4428 (3.7413) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 
22:53:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][130/625] eta 0:03:28 lr 0.000013 wd 0.0500 time 0.3854 (0.4202) data time 0.0010 (0.0044) model time 0.3844 (0.4066) loss 6.4630 (6.2586) grad_norm 3.7279 (3.6724) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:53:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][140/625] eta 0:03:23 lr 0.000013 wd 0.0500 time 0.4502 (0.4192) data time 0.0012 (0.0042) model time 0.4490 (0.4064) loss 7.2062 (6.2472) grad_norm 1.9038 (3.6678) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:53:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][150/625] eta 0:03:18 lr 0.000013 wd 0.0500 time 0.4549 (0.4184) data time 0.0007 (0.0039) model time 0.4542 (0.4063) loss 5.8963 (6.2466) grad_norm 3.1101 (3.6815) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:53:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][160/625] eta 0:03:14 lr 0.000013 wd 0.0500 time 0.3844 (0.4176) data time 0.0008 (0.0038) model time 0.3836 (0.4062) loss 5.2825 (6.2404) grad_norm 2.4956 (3.7098) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:53:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][170/625] eta 0:03:09 lr 0.000013 wd 0.0500 time 0.3880 (0.4175) data time 0.0010 (0.0036) model time 0.3870 (0.4069) loss 6.9758 (6.2501) grad_norm 2.0514 (3.7131) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:53:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][180/625] eta 0:03:05 lr 0.000013 wd 0.0500 time 0.3859 (0.4161) data time 0.0010 (0.0035) model time 0.3849 (0.4057) loss 6.5432 (6.2444) grad_norm 2.2718 (3.7424) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:53:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][190/625] eta 0:03:01 lr 0.000013 wd 0.0500 time 0.4895 (0.4170) data 
time 0.0008 (0.0033) model time 0.4887 (0.4076) loss 6.9588 (6.2397) grad_norm 3.3030 (3.7472) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:53:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][200/625] eta 0:02:56 lr 0.000013 wd 0.0500 time 0.4876 (0.4162) data time 0.0008 (0.0032) model time 0.4869 (0.4071) loss 5.0273 (6.2189) grad_norm 4.0546 (3.7608) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:53:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][210/625] eta 0:02:52 lr 0.000013 wd 0.0500 time 0.3890 (0.4151) data time 0.0010 (0.0031) model time 0.3880 (0.4061) loss 5.9085 (6.2087) grad_norm 5.8470 (3.7497) loss_scale 128.0000 (128.0000) mem 14939MB [2024-07-25 22:53:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][220/625] eta 0:02:48 lr 0.000012 wd 0.0500 time 0.3917 (0.4151) data time 0.0009 (0.0030) model time 0.3908 (0.4067) loss 6.2679 (6.2064) grad_norm 2.9750 (inf) loss_scale 64.0000 (127.1312) mem 14939MB [2024-07-25 22:53:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-25 22:53:52 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 22:53:53 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-25 23:19:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 23:19:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 23:20:01 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-07-25 23:35:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 23:35:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 23:36:03 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-25 23:36:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 23:36:20 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... [2024-07-25 23:36:21 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 23:36:21 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 23:36:21 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 296) [2024-07-25 23:36:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-25 23:38:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-25 23:38:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-25 23:38:42 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-07-25 23:38:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-25 23:38:54 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... [2024-07-25 23:38:55 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-25 23:38:55 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-25 23:38:55 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 296) [2024-07-25 23:38:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-25 23:39:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][230/625] eta 0:15:17 lr 0.000012 wd 0.0500 time 0.4148 (2.3237) data time 0.0009 (0.1357) model time 0.4140 (2.1880) loss 7.1392 (6.6751) grad_norm 3.1011 (10.6464) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:39:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][240/625] eta 0:08:06 lr 0.000012 wd 0.0500 time 0.4193 (1.2626) data time 0.0009 (0.0610) model time 0.4184 (1.2016) loss 5.5645 (6.4289) grad_norm 2.4736 (7.0617) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:39:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][250/625] eta 0:05:59 lr 0.000012 wd 0.0500 time 0.4129 (0.9597) data time 0.0011 (0.0396) model time 0.4118 (0.9201) loss 6.8287 (6.5070) grad_norm 4.7888 (5.6392) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:39:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][260/625] eta 0:05:02 lr 0.000012 wd 0.0500 time 0.4079 (0.8276) data time 0.0013 (0.0295) model time 0.4066 (0.7980) loss 6.2915 
(6.4653) grad_norm 3.3876 (5.2966) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:39:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][270/625] eta 0:04:23 lr 0.000012 wd 0.0500 time 0.4138 (0.7418) data time 0.0009 (0.0236) model time 0.4129 (0.7182) loss 6.5759 (6.4315) grad_norm 5.4965 (4.8747) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:39:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][280/625] eta 0:03:59 lr 0.000012 wd 0.0500 time 0.4048 (0.6938) data time 0.0009 (0.0197) model time 0.4040 (0.6741) loss 5.4520 (6.4072) grad_norm 2.4449 (5.5904) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:39:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][290/625] eta 0:03:38 lr 0.000012 wd 0.0500 time 0.4100 (0.6524) data time 0.0009 (0.0170) model time 0.4090 (0.6355) loss 5.6018 (6.3793) grad_norm 2.6681 (5.2617) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:39:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][300/625] eta 0:03:22 lr 0.000012 wd 0.0500 time 0.4178 (0.6221) data time 0.0009 (0.0150) model time 0.4169 (0.6071) loss 5.4892 (6.3695) grad_norm 3.1703 (5.0060) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:39:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][310/625] eta 0:03:08 lr 0.000012 wd 0.0500 time 0.4079 (0.5985) data time 0.0011 (0.0134) model time 0.4068 (0.5850) loss 6.7267 (6.3313) grad_norm 2.6926 (4.9267) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:39:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][320/625] eta 0:03:04 lr 0.000012 wd 0.0500 time 0.4145 (0.6057) data time 0.0008 (0.0122) model time 0.4136 (0.5936) loss 6.6381 (6.3274) grad_norm 2.2819 (4.7282) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:40:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[296/300][330/625] eta 0:02:53 lr 0.000012 wd 0.0500 time 0.4122 (0.5883) data time 0.0010 (0.0111) model time 0.4112 (0.5772) loss 5.7730 (6.3242) grad_norm 2.0261 (4.5639) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:40:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][340/625] eta 0:02:43 lr 0.000012 wd 0.0500 time 0.4109 (0.5735) data time 0.0011 (0.0103) model time 0.4098 (0.5633) loss 6.3556 (6.3167) grad_norm 4.0414 (4.4435) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:40:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][350/625] eta 0:02:35 lr 0.000012 wd 0.0500 time 0.9837 (0.5657) data time 0.0009 (0.0096) model time 0.9828 (0.5561) loss 6.2082 (6.2863) grad_norm 4.2070 (4.3937) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:40:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][360/625] eta 0:02:27 lr 0.000012 wd 0.0500 time 0.4260 (0.5551) data time 0.0012 (0.0090) model time 0.4247 (0.5462) loss 5.9206 (6.2908) grad_norm 2.3525 (4.3248) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:40:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][370/625] eta 0:02:19 lr 0.000012 wd 0.0500 time 0.4122 (0.5458) data time 0.0008 (0.0084) model time 0.4114 (0.5373) loss 6.1948 (6.2829) grad_norm 3.1199 (4.2344) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:40:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][380/625] eta 0:02:11 lr 0.000012 wd 0.0500 time 0.4146 (0.5374) data time 0.0011 (0.0080) model time 0.4135 (0.5294) loss 5.1633 (6.2827) grad_norm 4.3306 (4.3009) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:40:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][390/625] eta 0:02:04 lr 0.000012 wd 0.0500 time 0.4166 (0.5304) data time 0.0011 (0.0076) model time 0.4154 (0.5228) loss 7.2740 (6.3187) grad_norm 3.0406 
(4.2435) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:40:33 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][400/625] eta 0:01:57 lr 0.000012 wd 0.0500 time 0.4138 (0.5240) data time 0.0008 (0.0072) model time 0.4130 (0.5168) loss 5.1927 (6.2999) grad_norm 3.1158 (4.2902) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:40:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][410/625] eta 0:01:51 lr 0.000012 wd 0.0500 time 0.4171 (0.5184) data time 0.0009 (0.0069) model time 0.4162 (0.5115) loss 6.5944 (6.2966) grad_norm 2.5959 (4.3107) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:40:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][420/625] eta 0:01:45 lr 0.000012 wd 0.0500 time 0.4120 (0.5133) data time 0.0011 (0.0066) model time 0.4108 (0.5066) loss 4.8777 (6.2846) grad_norm 3.6257 (4.2741) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:40:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][430/625] eta 0:01:39 lr 0.000012 wd 0.0500 time 0.4129 (0.5086) data time 0.0012 (0.0064) model time 0.4117 (0.5022) loss 6.6959 (6.2764) grad_norm 2.7506 (4.2232) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:40:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][440/625] eta 0:01:35 lr 0.000012 wd 0.0500 time 0.4140 (0.5149) data time 0.0011 (0.0162) model time 0.4128 (0.4987) loss 6.4853 (6.2709) grad_norm 3.0944 (4.2036) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:40:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][450/625] eta 0:01:30 lr 0.000012 wd 0.0500 time 0.4243 (0.5148) data time 0.0011 (0.0156) model time 0.4232 (0.4992) loss 6.6384 (6.2783) grad_norm 3.9177 (4.1870) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:41:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][460/625] eta 
0:01:25 lr 0.000012 wd 0.0500 time 0.4147 (0.5175) data time 0.0009 (0.0149) model time 0.4137 (0.5026) loss 6.4477 (6.2657) grad_norm 4.1323 (4.1759) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:41:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][470/625] eta 0:01:19 lr 0.000012 wd 0.0500 time 0.4113 (0.5136) data time 0.0011 (0.0144) model time 0.4103 (0.4992) loss 5.4240 (6.2601) grad_norm 2.8044 (4.1572) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:41:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][480/625] eta 0:01:15 lr 0.000012 wd 0.0500 time 0.4299 (0.5177) data time 0.0014 (0.0139) model time 0.4285 (0.5038) loss 5.9515 (6.2438) grad_norm 2.9301 (4.1231) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:41:17 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][490/625] eta 0:01:09 lr 0.000012 wd 0.0500 time 0.4198 (0.5140) data time 0.0008 (0.0134) model time 0.4190 (0.5006) loss 7.0604 (6.2319) grad_norm 3.1487 (4.1022) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:41:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][500/625] eta 0:01:03 lr 0.000012 wd 0.0500 time 0.4056 (0.5119) data time 0.0011 (0.0130) model time 0.4045 (0.4990) loss 5.5501 (6.2379) grad_norm 3.0924 (4.0758) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:41:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][510/625] eta 0:00:58 lr 0.000012 wd 0.0500 time 0.4091 (0.5086) data time 0.0009 (0.0126) model time 0.4082 (0.4961) loss 6.6484 (6.2485) grad_norm 2.7395 (4.0501) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:41:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][520/625] eta 0:00:53 lr 0.000012 wd 0.0500 time 0.4150 (0.5056) data time 0.0010 (0.0122) model time 0.4140 (0.4934) loss 5.8778 (6.2263) grad_norm 2.6850 (4.1147) loss_scale 64.0000 
(64.0000) mem 14931MB [2024-07-25 23:41:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][530/625] eta 0:00:47 lr 0.000012 wd 0.0500 time 0.4139 (0.5027) data time 0.0008 (0.0118) model time 0.4131 (0.4909) loss 5.7524 (6.2204) grad_norm 4.7851 (4.0931) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:41:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][540/625] eta 0:00:42 lr 0.000012 wd 0.0500 time 0.4253 (0.5000) data time 0.0011 (0.0115) model time 0.4242 (0.4885) loss 5.9296 (6.2376) grad_norm 3.2438 (4.0621) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:41:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][550/625] eta 0:00:37 lr 0.000012 wd 0.0500 time 0.4219 (0.5016) data time 0.0008 (0.0112) model time 0.4211 (0.4905) loss 6.8866 (6.2521) grad_norm 5.7703 (4.0867) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:41:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][560/625] eta 0:00:32 lr 0.000012 wd 0.0500 time 0.4107 (0.4991) data time 0.0011 (0.0109) model time 0.4096 (0.4882) loss 6.4015 (6.2529) grad_norm 3.2559 (4.0648) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:41:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][570/625] eta 0:00:27 lr 0.000012 wd 0.0500 time 0.4149 (0.4968) data time 0.0008 (0.0106) model time 0.4141 (0.4863) loss 6.1975 (6.2501) grad_norm 3.4479 (4.0691) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:41:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][580/625] eta 0:00:22 lr 0.000012 wd 0.0500 time 0.4098 (0.4946) data time 0.0013 (0.0103) model time 0.4086 (0.4843) loss 6.2847 (6.2458) grad_norm 5.3435 (4.0697) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:42:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][590/625] eta 0:00:17 lr 0.000012 wd 0.0500 time 
0.4179 (0.4925) data time 0.0009 (0.0101) model time 0.4170 (0.4824) loss 5.5021 (6.2401) grad_norm 3.7664 (4.0594) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:42:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][600/625] eta 0:00:12 lr 0.000012 wd 0.0500 time 0.4370 (0.4906) data time 0.0010 (0.0098) model time 0.4360 (0.4807) loss 6.4000 (6.2447) grad_norm 3.0634 (4.2694) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:42:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][610/625] eta 0:00:07 lr 0.000012 wd 0.0500 time 0.4149 (0.4886) data time 0.0006 (0.0096) model time 0.4143 (0.4790) loss 5.3528 (6.2398) grad_norm 3.4907 (4.2549) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:42:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][620/625] eta 0:00:02 lr 0.000012 wd 0.0500 time 0.4108 (0.4868) data time 0.0006 (0.0094) model time 0.4103 (0.4773) loss 6.8277 (6.2471) grad_norm 4.0351 (4.2421) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-25 23:42:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 296 training takes 0:03:15 [2024-07-25 23:42:15 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-25 23:42:18 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-25 23:42:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.310 (1.310) Loss 0.5483 (0.5483) Acc@1 90.479 (90.479) Acc@5 99.023 (99.023) Mem 14931MB [2024-07-25 23:42:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.203) Loss 0.8086 (0.6572) Acc@1 82.959 (87.811) Acc@5 97.217 (98.047) Mem 14931MB [2024-07-25 23:42:21 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.147) Loss 0.9038 (0.7600) Acc@1 79.541 (84.852) Acc@5 96.143 (97.133) Mem 14931MB [2024-07-25 23:42:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.463 Acc@5 97.097 [2024-07-25 23:42:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.5% [2024-07-25 23:42:25 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.966 (0.966) Loss 0.5430 (0.5430) Acc@1 90.479 (90.479) Acc@5 99.023 (99.023) Mem 14931MB [2024-07-25 23:42:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.174) Loss 0.8091 (0.6539) Acc@1 82.861 (87.762) Acc@5 97.217 (98.078) Mem 14931MB [2024-07-27 10:54:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-27 10:54:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-27 10:55:18 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-07-27 10:55:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-27 10:55:27 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... [2024-07-27 10:55:28 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-27 10:55:28 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-27 10:55:28 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 296) [2024-07-27 10:55:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-27 10:55:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][0/625] eta 1:21:14 lr 0.000012 wd 0.0500 time 7.7993 (7.7993) data time 0.6852 (0.6852) model time 0.0000 (0.0000) loss 6.9122 (6.9122) grad_norm 7.1027 (7.1027) loss_scale 64.0000 (64.0000) mem 16181MB [2024-07-27 10:55:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-27 10:55:40 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-27 10:55:45 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-27 16:25:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-27 16:25:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-27 16:25:08 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-27 16:25:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-27 16:25:43 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... [2024-07-27 16:25:43 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-27 16:25:43 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-27 16:25:43 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 297) [2024-07-27 16:25:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-27 16:25:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][10/625] eta 0:13:18 lr 0.000012 wd 0.0500 time 0.3925 (1.2990) data time 0.0009 (0.0533) model time 0.0000 (0.0000) loss 6.7312 (6.6787) grad_norm 2.2325 (3.7127) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 16:26:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][20/625] eta 0:08:32 lr 0.000012 wd 0.0500 time 0.3929 (0.8467) data time 0.0006 (0.0271) model time 0.0000 (0.0000) loss 6.1681 (6.4989) grad_norm 2.3935 (4.3327) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 16:26:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO 
Train: [297/300][30/625] eta 0:06:53 lr 0.000012 wd 0.0500 time 0.3936 (0.6956) data time 0.0008 (0.0183) model time 0.0000 (0.0000) loss 6.5863 (6.5805) grad_norm 2.9042 (4.7530) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 16:26:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][40/625] eta 0:06:02 lr 0.000012 wd 0.0500 time 0.3933 (0.6201) data time 0.0006 (0.0139) model time 0.0000 (0.0000) loss 5.8785 (6.4897) grad_norm 3.5292 (4.7689) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 16:26:13 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-27 16:26:14 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-27 16:26:18 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-27 18:02:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-27 18:02:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-27 18:03:09 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-27 18:03:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-27 18:03:29 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... 
[2024-07-27 18:03:29 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-27 18:03:29 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-27 18:03:29 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 297) [2024-07-27 18:03:29 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-27 18:03:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][50/625] eta 0:25:35 lr 0.000012 wd 0.0500 time 0.3920 (2.6705) data time 0.0007 (0.1434) model time 0.0000 (0.0000) loss 6.4368 (6.4570) grad_norm 3.8492 (3.1459) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:03:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][60/625] eta 0:09:50 lr 0.000012 wd 0.0500 time 0.3946 (1.0447) data time 0.0006 (0.0416) model time 0.3940 (0.3936) loss 6.4524 (6.4929) grad_norm 3.0387 (18.2794) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:03:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][70/625] eta 0:07:09 lr 0.000012 wd 0.0500 time 0.3904 (0.7738) data time 0.0007 (0.0246) model time 0.3897 (0.3936) loss 6.6000 (6.4991) grad_norm 2.7000 (12.0390) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:03:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][80/625] eta 0:06:01 lr 0.000012 wd 0.0500 time 0.3910 (0.6624) data time 0.0007 (0.0176) model time 0.3903 (0.3939) loss 6.1244 (6.4620) grad_norm 3.9703 (9.4369) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:04:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][90/625] eta 0:05:21 lr 0.000012 wd 0.0500 time 0.3945 (0.6011) data time 0.0006 (0.0138) model time 0.3939 (0.3933) loss 6.2381 (6.4147) grad_norm 3.6151 (7.9704) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 
18:04:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][100/625] eta 0:04:57 lr 0.000012 wd 0.0500 time 0.3934 (0.5660) data time 0.0006 (0.0114) model time 0.3928 (0.3968) loss 7.0661 (6.4155) grad_norm 2.3547 (7.0886) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:04:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][110/625] eta 0:04:39 lr 0.000012 wd 0.0500 time 0.3904 (0.5432) data time 0.0006 (0.0097) model time 0.3897 (0.4006) loss 5.8815 (6.3527) grad_norm 4.0847 (6.5081) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:04:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][120/625] eta 0:04:24 lr 0.000012 wd 0.0500 time 0.3923 (0.5232) data time 0.0007 (0.0085) model time 0.3916 (0.3996) loss 6.4654 (6.3336) grad_norm 5.3737 (6.2255) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:04:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][130/625] eta 0:04:11 lr 0.000012 wd 0.0500 time 0.3939 (0.5078) data time 0.0008 (0.0076) model time 0.3932 (0.3988) loss 6.4984 (6.3079) grad_norm 2.4996 (6.1261) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:04:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][140/625] eta 0:04:00 lr 0.000012 wd 0.0500 time 0.3935 (0.4957) data time 0.0008 (0.0069) model time 0.3927 (0.3982) loss 5.9650 (6.2803) grad_norm 4.7584 (5.9299) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:04:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][150/625] eta 0:03:50 lr 0.000012 wd 0.0500 time 0.3927 (0.4862) data time 0.0008 (0.0063) model time 0.3919 (0.3980) loss 5.6645 (6.3031) grad_norm 3.6910 (5.6912) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:04:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][160/625] eta 0:03:42 lr 0.000012 wd 0.0500 time 0.3943 (0.4783) data time 0.0008 
(0.0058) model time 0.3935 (0.3977) loss 6.7273 (6.2827) grad_norm 2.5402 (5.4681) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:04:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][170/625] eta 0:03:34 lr 0.000012 wd 0.0500 time 0.3961 (0.4715) data time 0.0008 (0.0054) model time 0.3954 (0.3974) loss 6.3175 (6.2990) grad_norm 10.0253 (5.3659) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:04:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][180/625] eta 0:03:27 lr 0.000012 wd 0.0500 time 0.3955 (0.4660) data time 0.0008 (0.0051) model time 0.3947 (0.3973) loss 6.8515 (6.3000) grad_norm 2.7646 (5.2632) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:04:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][190/625] eta 0:03:20 lr 0.000012 wd 0.0500 time 0.3946 (0.4611) data time 0.0006 (0.0048) model time 0.3940 (0.3971) loss 5.8448 (6.2832) grad_norm 3.0572 (5.1367) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:04:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][200/625] eta 0:03:14 lr 0.000012 wd 0.0500 time 0.3984 (0.4570) data time 0.0006 (0.0045) model time 0.3977 (0.3971) loss 5.8403 (6.2717) grad_norm 4.5286 (5.0508) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:04:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][210/625] eta 0:03:08 lr 0.000012 wd 0.0500 time 0.3933 (0.4533) data time 0.0007 (0.0043) model time 0.3926 (0.3970) loss 5.6820 (6.2705) grad_norm 3.8537 (4.9805) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:04:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][220/625] eta 0:03:02 lr 0.000012 wd 0.0500 time 0.3926 (0.4502) data time 0.0008 (0.0041) model time 0.3918 (0.3971) loss 5.4711 (6.2713) grad_norm 2.9890 (4.9026) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:04:55 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][230/625] eta 0:02:56 lr 0.000012 wd 0.0500 time 0.3984 (0.4474) data time 0.0006 (0.0040) model time 0.3978 (0.3972) loss 6.0755 (6.2694) grad_norm 2.3996 (4.8586) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:04:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][240/625] eta 0:02:51 lr 0.000012 wd 0.0500 time 0.3947 (0.4448) data time 0.0006 (0.0038) model time 0.3941 (0.3971) loss 5.9726 (6.2677) grad_norm 4.5896 (4.8382) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:05:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][250/625] eta 0:02:45 lr 0.000012 wd 0.0500 time 0.3984 (0.4425) data time 0.0008 (0.0037) model time 0.3976 (0.3971) loss 5.8958 (6.2427) grad_norm 5.1733 (4.7761) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:05:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][260/625] eta 0:02:40 lr 0.000012 wd 0.0500 time 0.3969 (0.4408) data time 0.0006 (0.0035) model time 0.3963 (0.3974) loss 6.0835 (6.2303) grad_norm 3.3969 (4.7580) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:05:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][270/625] eta 0:02:35 lr 0.000012 wd 0.0500 time 0.4215 (0.4391) data time 0.0008 (0.0034) model time 0.4207 (0.3976) loss 6.9023 (6.2275) grad_norm 3.2888 (4.7231) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:05:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][280/625] eta 0:02:30 lr 0.000012 wd 0.0500 time 0.3948 (0.4373) data time 0.0006 (0.0033) model time 0.3942 (0.3975) loss 4.9290 (6.2173) grad_norm 5.5390 (4.6713) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:05:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][290/625] eta 0:02:25 lr 0.000012 wd 0.0500 time 0.4009 (0.4357) data time 0.0007 (0.0032) 
model time 0.4003 (0.3975) loss 4.6774 (6.2097) grad_norm 3.2927 (4.6604) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:05:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][300/625] eta 0:02:21 lr 0.000012 wd 0.0500 time 0.3955 (0.4341) data time 0.0006 (0.0031) model time 0.3949 (0.3975) loss 5.2973 (6.2051) grad_norm 3.4451 (4.6110) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:05:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][310/625] eta 0:02:16 lr 0.000012 wd 0.0500 time 0.3964 (0.4328) data time 0.0006 (0.0030) model time 0.3958 (0.3975) loss 6.4721 (6.1975) grad_norm 3.9067 (4.5754) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:05:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][320/625] eta 0:02:11 lr 0.000012 wd 0.0500 time 0.6113 (0.4328) data time 0.0008 (0.0030) model time 0.6105 (0.3987) loss 6.6249 (6.1928) grad_norm 4.2289 (4.5415) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:05:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][330/625] eta 0:02:07 lr 0.000012 wd 0.0500 time 0.3931 (0.4315) data time 0.0007 (0.0029) model time 0.3924 (0.3986) loss 5.3559 (6.1954) grad_norm 2.8725 (4.4866) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:05:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][340/625] eta 0:02:02 lr 0.000012 wd 0.0500 time 0.3944 (0.4303) data time 0.0007 (0.0028) model time 0.3938 (0.3985) loss 6.1621 (6.1926) grad_norm 2.9519 (4.4453) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:05:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][350/625] eta 0:01:58 lr 0.000012 wd 0.0500 time 0.3972 (0.4291) data time 0.0008 (0.0027) model time 0.3964 (0.3984) loss 7.0477 (6.1897) grad_norm 2.7760 (4.4444) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:05:47 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [297/300][360/625] eta 0:01:53 lr 0.000012 wd 0.0500 time 0.3952 (0.4281) data time 0.0009 (0.0027) model time 0.3944 (0.3983) loss 7.7101 (6.2006) grad_norm 2.3435 (4.4189) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:05:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][370/625] eta 0:01:48 lr 0.000012 wd 0.0500 time 0.3950 (0.4271) data time 0.0006 (0.0026) model time 0.3944 (0.3982) loss 6.7500 (6.2151) grad_norm 4.1350 (4.3844) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:05:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][380/625] eta 0:01:44 lr 0.000012 wd 0.0500 time 0.3968 (0.4262) data time 0.0007 (0.0026) model time 0.3961 (0.3981) loss 6.1012 (6.2154) grad_norm 158.5879 (4.8241) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:05:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][390/625] eta 0:01:39 lr 0.000012 wd 0.0500 time 0.3941 (0.4254) data time 0.0008 (0.0025) model time 0.3932 (0.3981) loss 6.5517 (6.2238) grad_norm 2.1127 (4.8361) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:06:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][400/625] eta 0:01:35 lr 0.000012 wd 0.0500 time 0.3947 (0.4246) data time 0.0009 (0.0025) model time 0.3938 (0.3980) loss 6.4450 (6.2234) grad_norm 4.6651 (4.8012) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:06:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][410/625] eta 0:01:31 lr 0.000012 wd 0.0500 time 0.3971 (0.4239) data time 0.0006 (0.0025) model time 0.3964 (0.3980) loss 5.7520 (6.2211) grad_norm 2.6789 (4.7672) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:06:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][420/625] eta 0:01:26 lr 0.000012 wd 0.0500 time 0.3947 (0.4231) data time 0.0008 (0.0024) model time 0.3939 (0.3980) 
loss 7.1507 (6.2251) grad_norm 2.4730 (4.7251) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:06:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][430/625] eta 0:01:22 lr 0.000012 wd 0.0500 time 0.4000 (0.4224) data time 0.0006 (0.0024) model time 0.3994 (0.3979) loss 5.8328 (6.2166) grad_norm 4.9535 (4.7007) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:06:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][440/625] eta 0:01:18 lr 0.000012 wd 0.0500 time 0.3955 (0.4218) data time 0.0009 (0.0023) model time 0.3946 (0.3978) loss 7.0401 (6.2151) grad_norm 11.5665 (4.7313) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:06:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][450/625] eta 0:01:13 lr 0.000012 wd 0.0500 time 0.3937 (0.4211) data time 0.0007 (0.0023) model time 0.3931 (0.3978) loss 6.5133 (6.2126) grad_norm 2.6867 (4.7014) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:06:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][460/625] eta 0:01:09 lr 0.000012 wd 0.0500 time 0.3943 (0.4205) data time 0.0008 (0.0023) model time 0.3934 (0.3977) loss 5.9326 (6.2199) grad_norm 8.0943 (4.6678) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:06:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][470/625] eta 0:01:05 lr 0.000012 wd 0.0500 time 0.3975 (0.4200) data time 0.0006 (0.0022) model time 0.3969 (0.3977) loss 7.3782 (6.2160) grad_norm 2.7129 (4.6360) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:06:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][480/625] eta 0:01:00 lr 0.000012 wd 0.0500 time 0.3938 (0.4195) data time 0.0008 (0.0022) model time 0.3930 (0.3977) loss 6.8897 (6.2205) grad_norm 2.9176 (4.6193) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:06:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): 
INFO Train: [297/300][490/625] eta 0:00:56 lr 0.000012 wd 0.0500 time 0.4032 (0.4190) data time 0.0008 (0.0022) model time 0.4025 (0.3976) loss 6.5661 (6.2216) grad_norm 3.9035 (4.5959) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:06:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][500/625] eta 0:00:52 lr 0.000012 wd 0.0500 time 0.3962 (0.4185) data time 0.0006 (0.0021) model time 0.3956 (0.3976) loss 7.5238 (6.2190) grad_norm 3.5428 (4.5981) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:06:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][510/625] eta 0:00:48 lr 0.000012 wd 0.0500 time 0.3987 (0.4181) data time 0.0006 (0.0021) model time 0.3981 (0.3976) loss 6.7508 (6.2144) grad_norm 16.9503 (4.6033) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:06:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][520/625] eta 0:00:43 lr 0.000012 wd 0.0500 time 0.3965 (0.4176) data time 0.0009 (0.0021) model time 0.3955 (0.3976) loss 6.4023 (6.2063) grad_norm 5.8463 (4.5777) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:06:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][530/625] eta 0:00:39 lr 0.000012 wd 0.0500 time 0.3983 (0.4172) data time 0.0007 (0.0021) model time 0.3976 (0.3975) loss 5.9003 (6.2057) grad_norm 3.6150 (4.5564) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:06:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][540/625] eta 0:00:35 lr 0.000012 wd 0.0500 time 0.5658 (0.4171) data time 0.0009 (0.0020) model time 0.5649 (0.3979) loss 5.9428 (6.2040) grad_norm 2.6899 (4.5251) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:07:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][550/625] eta 0:00:31 lr 0.000012 wd 0.0500 time 0.3949 (0.4171) data time 0.0008 (0.0020) model time 0.3941 (0.3982) loss 6.2563 (6.1985) 
grad_norm 2.9333 (4.5183) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:07:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][560/625] eta 0:00:27 lr 0.000012 wd 0.0500 time 0.3941 (0.4167) data time 0.0008 (0.0020) model time 0.3932 (0.3982) loss 6.6912 (6.2093) grad_norm 3.1636 (4.4928) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:07:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][570/625] eta 0:00:22 lr 0.000012 wd 0.0500 time 0.3942 (0.4163) data time 0.0006 (0.0020) model time 0.3936 (0.3981) loss 5.2127 (6.2023) grad_norm 3.0734 (4.4777) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:07:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][580/625] eta 0:00:18 lr 0.000012 wd 0.0500 time 0.3987 (0.4160) data time 0.0006 (0.0019) model time 0.3981 (0.3981) loss 5.3820 (6.2037) grad_norm 6.3555 (4.4784) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:07:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][590/625] eta 0:00:14 lr 0.000012 wd 0.0500 time 0.4038 (0.4156) data time 0.0006 (0.0019) model time 0.4032 (0.3980) loss 6.1663 (6.2034) grad_norm 2.9114 (4.5147) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:07:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][600/625] eta 0:00:10 lr 0.000012 wd 0.0500 time 0.4004 (0.4152) data time 0.0006 (0.0019) model time 0.3997 (0.3980) loss 6.6103 (6.2084) grad_norm 3.1537 (4.5525) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:07:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][610/625] eta 0:00:06 lr 0.000012 wd 0.0500 time 0.3959 (0.4149) data time 0.0006 (0.0019) model time 0.3953 (0.3979) loss 6.9603 (6.2117) grad_norm 2.4080 (4.5573) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:07:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command 
received, saving checkpoint and exiting [2024-07-27 18:07:30 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-27 18:07:31 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-27 18:39:42 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-27 18:39:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-27 18:50:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-27 18:50:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-27 18:57:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-27 18:57:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-27 18:58:02 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-27 18:58:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-27 18:58:14 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... 
[2024-07-27 18:58:14 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-27 18:58:14 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-27 18:58:14 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 297) [2024-07-27 18:58:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-27 18:58:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][620/625] eta 0:00:12 lr 0.000012 wd 0.0500 time 0.3938 (2.4664) data time 0.0005 (0.1439) model time 0.3933 (2.3225) loss 6.4747 (6.3283) grad_norm 3.4815 (4.2482) loss_scale 64.0000 (64.0000) mem 14931MB [2024-07-27 18:58:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 297 training takes 0:00:11 [2024-07-27 18:58:30 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-27 18:58:34 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-27 18:58:34 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.416 (0.416) Loss 0.5527 (0.5527) Acc@1 90.576 (90.576) Acc@5 99.072 (99.072) Mem 14931MB [2024-07-27 18:58:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.084 (0.114) Loss 0.8208 (0.6624) Acc@1 82.812 (87.775) Acc@5 97.119 (98.051) Mem 14931MB [2024-07-27 18:58:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.084 (0.100) Loss 0.9043 (0.7644) Acc@1 79.297 (84.807) Acc@5 96.289 (97.135) Mem 14931MB [2024-07-27 18:58:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.441 Acc@5 97.109 [2024-07-27 18:58:38 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-27 18:58:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.738 (0.738) Loss 0.5430 (0.5430) Acc@1 90.430 (90.430) Acc@5 99.023 (99.023) Mem 14931MB [2024-07-27 18:58:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.084 (0.151) Loss 0.8091 (0.6539) Acc@1 82.764 (87.762) Acc@5 97.217 (98.078) Mem 14931MB [2024-07-27 18:58:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.084 (0.119) Loss 0.9019 (0.7559) Acc@1 79.395 (84.833) Acc@5 96.045 (97.126) Mem 14931MB [2024-07-27 18:58:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.435 Acc@5 97.093 [2024-07-27 18:58:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-27 18:58:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.43% [2024-07-27 18:58:41 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving...... 
[2024-07-27 18:58:42 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!! [2024-07-27 18:58:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][0/625] eta 0:10:44 lr 0.000012 wd 0.0500 time 1.0318 (1.0318) data time 0.3260 (0.3260) model time 0.0000 (0.0000) loss 7.0947 (7.0947) grad_norm 11.5964 (11.5964) loss_scale 64.0000 (64.0000) mem 14938MB [2024-07-27 18:58:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][10/625] eta 0:04:37 lr 0.000012 wd 0.0500 time 0.3913 (0.4509) data time 0.0009 (0.0305) model time 0.0000 (0.0000) loss 7.0000 (6.4105) grad_norm 2.8915 (3.9401) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-27 18:58:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][20/625] eta 0:04:16 lr 0.000012 wd 0.0500 time 0.3918 (0.4247) data time 0.0007 (0.0164) model time 0.0000 (0.0000) loss 6.1007 (6.3956) grad_norm 2.6986 (3.4509) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-27 18:58:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][30/625] eta 0:04:06 lr 0.000012 wd 0.0500 time 0.3908 (0.4147) data time 0.0008 (0.0114) model time 0.0000 (0.0000) loss 6.2621 (6.3523) grad_norm 3.3539 (3.4344) loss_scale 64.0000 (64.0000) mem 14939MB [2024-07-27 18:58:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][40/625] eta 0:03:59 lr 0.000012 wd 0.0500 time 0.3939 (0.4095) data time 0.0009 (0.0088) model time 0.0000 (0.0000) loss 6.3373 (6.3245) grad_norm 2.6833 (inf) loss_scale 32.0000 (60.8780) mem 14939MB [2024-07-27 18:59:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][50/625] eta 0:03:56 lr 0.000012 wd 0.0500 time 0.3938 (0.4120) data time 0.0007 (0.0073) model time 0.0000 (0.0000) loss 5.9833 (6.2960) grad_norm 4.6447 (inf) loss_scale 32.0000 (55.2157) mem 14939MB [2024-07-27 
18:59:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][60/625] eta 0:03:51 lr 0.000012 wd 0.0500 time 0.3960 (0.4095) data time 0.0009 (0.0062) model time 0.3951 (0.3958) loss 6.4637 (6.2772) grad_norm 2.9021 (inf) loss_scale 32.0000 (51.4098) mem 14939MB [2024-07-27 18:59:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][70/625] eta 0:03:46 lr 0.000012 wd 0.0500 time 0.4003 (0.4078) data time 0.0008 (0.0056) model time 0.3995 (0.3958) loss 5.5034 (6.2210) grad_norm 2.6368 (inf) loss_scale 32.0000 (48.6761) mem 14939MB [2024-07-27 18:59:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][80/625] eta 0:03:41 lr 0.000012 wd 0.0500 time 0.3943 (0.4068) data time 0.0007 (0.0050) model time 0.3936 (0.3968) loss 6.2102 (6.2311) grad_norm 4.9541 (inf) loss_scale 32.0000 (46.6173) mem 14939MB [2024-07-27 18:59:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][90/625] eta 0:03:37 lr 0.000012 wd 0.0500 time 0.3973 (0.4061) data time 0.0009 (0.0046) model time 0.3964 (0.3973) loss 5.8060 (6.2563) grad_norm 2.6063 (inf) loss_scale 32.0000 (45.0110) mem 14939MB [2024-07-27 18:59:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][100/625] eta 0:03:32 lr 0.000012 wd 0.0500 time 0.3966 (0.4051) data time 0.0007 (0.0042) model time 0.3959 (0.3970) loss 6.8505 (6.2791) grad_norm 3.2516 (inf) loss_scale 32.0000 (43.7228) mem 14939MB [2024-07-27 18:59:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][110/625] eta 0:03:28 lr 0.000012 wd 0.0500 time 0.4078 (0.4046) data time 0.0008 (0.0039) model time 0.4070 (0.3972) loss 7.5853 (6.2894) grad_norm 2.8259 (inf) loss_scale 32.0000 (42.6667) mem 14939MB [2024-07-27 18:59:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][120/625] eta 0:03:23 lr 0.000012 wd 0.0500 time 0.3951 (0.4038) data time 0.0006 (0.0037) model time 
0.3945 (0.3968) loss 5.4469 (6.2666) grad_norm 2.3133 (inf) loss_scale 32.0000 (41.7851) mem 14939MB [2024-07-27 18:59:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][130/625] eta 0:03:19 lr 0.000012 wd 0.0500 time 0.3933 (0.4034) data time 0.0007 (0.0035) model time 0.3926 (0.3968) loss 6.5111 (6.2645) grad_norm 3.1421 (inf) loss_scale 32.0000 (41.0382) mem 14939MB [2024-07-27 18:59:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][140/625] eta 0:03:15 lr 0.000012 wd 0.0500 time 0.3944 (0.4029) data time 0.0007 (0.0033) model time 0.3936 (0.3967) loss 5.8505 (6.2541) grad_norm 10.4284 (inf) loss_scale 32.0000 (40.3972) mem 14939MB [2024-07-27 18:59:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][150/625] eta 0:03:11 lr 0.000012 wd 0.0500 time 0.3955 (0.4023) data time 0.0006 (0.0031) model time 0.3949 (0.3964) loss 7.0378 (6.2650) grad_norm 2.6008 (inf) loss_scale 32.0000 (39.8411) mem 14939MB [2024-07-27 18:59:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][160/625] eta 0:03:06 lr 0.000012 wd 0.0500 time 0.3942 (0.4019) data time 0.0009 (0.0030) model time 0.3934 (0.3962) loss 6.2689 (6.2812) grad_norm 4.2041 (inf) loss_scale 32.0000 (39.3540) mem 14939MB [2024-07-27 18:59:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][170/625] eta 0:03:02 lr 0.000012 wd 0.0500 time 0.3947 (0.4016) data time 0.0007 (0.0029) model time 0.3939 (0.3961) loss 6.6032 (6.2697) grad_norm 3.0512 (inf) loss_scale 32.0000 (38.9240) mem 14939MB [2024-07-27 18:59:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][180/625] eta 0:02:58 lr 0.000012 wd 0.0500 time 0.3982 (0.4013) data time 0.0007 (0.0028) model time 0.3975 (0.3960) loss 6.1637 (6.2711) grad_norm 2.2520 (inf) loss_scale 32.0000 (38.5414) mem 14939MB [2024-07-27 18:59:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO 
Train: [298/300][190/625] eta 0:02:54 lr 0.000012 wd 0.0500 time 0.3929 (0.4010) data time 0.0008 (0.0027) model time 0.3921 (0.3960) loss 5.7674 (6.2570) grad_norm 2.6787 (inf) loss_scale 32.0000 (38.1990) mem 14939MB [2024-07-27 19:00:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][200/625] eta 0:02:50 lr 0.000012 wd 0.0500 time 0.3950 (0.4009) data time 0.0006 (0.0026) model time 0.3944 (0.3962) loss 6.4067 (6.2550) grad_norm 4.2422 (inf) loss_scale 32.0000 (37.8905) mem 14939MB [2024-07-27 19:00:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][210/625] eta 0:02:46 lr 0.000012 wd 0.0500 time 0.3973 (0.4017) data time 0.0006 (0.0025) model time 0.3967 (0.3974) loss 6.3426 (6.2414) grad_norm 6.0966 (inf) loss_scale 32.0000 (37.6114) mem 14939MB [2024-07-27 19:00:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][220/625] eta 0:02:42 lr 0.000012 wd 0.0500 time 0.3935 (0.4015) data time 0.0007 (0.0024) model time 0.3928 (0.3973) loss 5.7255 (6.2453) grad_norm 2.9905 (inf) loss_scale 32.0000 (37.3575) mem 14939MB [2024-07-27 19:00:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][230/625] eta 0:02:38 lr 0.000012 wd 0.0500 time 0.3956 (0.4012) data time 0.0006 (0.0023) model time 0.3950 (0.3972) loss 5.0685 (6.2392) grad_norm 3.9808 (inf) loss_scale 32.0000 (37.1255) mem 14939MB [2024-07-27 19:00:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][240/625] eta 0:02:34 lr 0.000012 wd 0.0500 time 0.3970 (0.4010) data time 0.0007 (0.0023) model time 0.3964 (0.3970) loss 6.4326 (6.2402) grad_norm 2.7772 (inf) loss_scale 32.0000 (36.9129) mem 14939MB [2024-07-27 19:00:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][250/625] eta 0:02:30 lr 0.000012 wd 0.0500 time 0.3993 (0.4009) data time 0.0008 (0.0022) model time 0.3985 (0.3971) loss 7.0380 (6.2379) grad_norm 3.4898 (inf) 
loss_scale 32.0000 (36.7171) mem 14939MB [2024-07-27 19:00:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][260/625] eta 0:02:26 lr 0.000012 wd 0.0500 time 0.3958 (0.4008) data time 0.0006 (0.0022) model time 0.3952 (0.3971) loss 5.8751 (6.2322) grad_norm 3.6500 (inf) loss_scale 32.0000 (36.5364) mem 14939MB [2024-07-27 19:00:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][270/625] eta 0:02:22 lr 0.000012 wd 0.0500 time 0.6061 (0.4020) data time 0.0008 (0.0021) model time 0.6053 (0.3987) loss 7.1926 (6.2434) grad_norm 2.6458 (inf) loss_scale 32.0000 (36.3690) mem 14939MB [2024-07-27 19:00:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][280/625] eta 0:02:18 lr 0.000012 wd 0.0500 time 0.3931 (0.4019) data time 0.0009 (0.0021) model time 0.3922 (0.3987) loss 6.1510 (6.2510) grad_norm 4.3875 (inf) loss_scale 32.0000 (36.2135) mem 14939MB [2024-07-27 19:00:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][290/625] eta 0:02:14 lr 0.000012 wd 0.0500 time 0.3947 (0.4017) data time 0.0008 (0.0020) model time 0.3939 (0.3986) loss 6.2817 (6.2361) grad_norm 3.2624 (inf) loss_scale 32.0000 (36.0687) mem 14939MB [2024-07-27 19:00:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][300/625] eta 0:02:10 lr 0.000012 wd 0.0500 time 0.3953 (0.4015) data time 0.0008 (0.0020) model time 0.3944 (0.3985) loss 6.7086 (6.2306) grad_norm 3.2767 (inf) loss_scale 32.0000 (35.9336) mem 14939MB [2024-07-27 19:00:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][310/625] eta 0:02:06 lr 0.000012 wd 0.0500 time 0.3963 (0.4014) data time 0.0006 (0.0020) model time 0.3957 (0.3983) loss 6.7086 (6.2472) grad_norm 2.4536 (inf) loss_scale 32.0000 (35.8071) mem 14939MB [2024-07-27 19:00:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][320/625] eta 0:02:02 lr 0.000012 wd 0.0500 
time 0.3981 (0.4012) data time 0.0009 (0.0019) model time 0.3972 (0.3982) loss 5.2176 (6.2534) grad_norm 4.6318 (inf) loss_scale 32.0000 (35.6885) mem 14939MB [2024-07-27 19:00:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][330/625] eta 0:01:58 lr 0.000012 wd 0.0500 time 0.3948 (0.4010) data time 0.0008 (0.0019) model time 0.3940 (0.3981) loss 5.9380 (6.2525) grad_norm 3.6574 (inf) loss_scale 32.0000 (35.5770) mem 14939MB [2024-07-27 19:00:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][340/625] eta 0:01:54 lr 0.000012 wd 0.0500 time 0.4185 (0.4010) data time 0.0006 (0.0019) model time 0.4178 (0.3981) loss 6.3120 (6.2566) grad_norm 4.9586 (inf) loss_scale 32.0000 (35.4721) mem 14939MB [2024-07-27 19:01:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][350/625] eta 0:01:50 lr 0.000012 wd 0.0500 time 0.4032 (0.4010) data time 0.0006 (0.0018) model time 0.4026 (0.3982) loss 5.6221 (6.2458) grad_norm 2.7244 (inf) loss_scale 32.0000 (35.3732) mem 14939MB [2024-07-27 19:01:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][360/625] eta 0:01:46 lr 0.000012 wd 0.0500 time 0.3972 (0.4009) data time 0.0009 (0.0018) model time 0.3963 (0.3982) loss 7.0582 (6.2461) grad_norm 5.7192 (inf) loss_scale 32.0000 (35.2798) mem 14939MB [2024-07-27 19:01:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][370/625] eta 0:01:42 lr 0.000012 wd 0.0500 time 0.3956 (0.4008) data time 0.0007 (0.0018) model time 0.3950 (0.3981) loss 6.3899 (6.2475) grad_norm 3.4025 (inf) loss_scale 32.0000 (35.1914) mem 14939MB [2024-07-27 19:01:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][380/625] eta 0:01:38 lr 0.000012 wd 0.0500 time 0.4016 (0.4008) data time 0.0007 (0.0018) model time 0.4009 (0.3981) loss 6.8437 (6.2408) grad_norm 3.7795 (inf) loss_scale 32.0000 (35.1076) mem 14939MB [2024-07-27 19:01:19 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][390/625] eta 0:01:34 lr 0.000012 wd 0.0500 time 0.3989 (0.4008) data time 0.0006 (0.0018) model time 0.3983 (0.3981) loss 6.4465 (6.2412) grad_norm 2.8124 (inf) loss_scale 32.0000 (35.0281) mem 14939MB [2024-07-27 19:01:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][400/625] eta 0:01:30 lr 0.000012 wd 0.0500 time 0.3976 (0.4007) data time 0.0009 (0.0017) model time 0.3967 (0.3981) loss 6.2866 (6.2471) grad_norm 2.2544 (inf) loss_scale 32.0000 (34.9526) mem 14939MB [2024-07-27 19:01:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][410/625] eta 0:01:26 lr 0.000012 wd 0.0500 time 0.3960 (0.4006) data time 0.0006 (0.0017) model time 0.3954 (0.3980) loss 5.2276 (6.2441) grad_norm 3.9540 (inf) loss_scale 32.0000 (34.8808) mem 14939MB [2024-07-27 19:01:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][420/625] eta 0:01:22 lr 0.000012 wd 0.0500 time 0.3986 (0.4006) data time 0.0006 (0.0017) model time 0.3979 (0.3981) loss 6.1932 (6.2450) grad_norm 4.5957 (inf) loss_scale 32.0000 (34.8124) mem 14939MB [2024-07-27 19:01:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][430/625] eta 0:01:18 lr 0.000012 wd 0.0500 time 0.3954 (0.4010) data time 0.0008 (0.0017) model time 0.3946 (0.3985) loss 5.8938 (6.2447) grad_norm 3.2234 (inf) loss_scale 32.0000 (34.7471) mem 14939MB [2024-07-27 19:01:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][440/625] eta 0:01:14 lr 0.000012 wd 0.0500 time 0.3952 (0.4008) data time 0.0008 (0.0017) model time 0.3944 (0.3984) loss 7.1678 (6.2450) grad_norm 4.4244 (inf) loss_scale 32.0000 (34.6848) mem 14939MB [2024-07-27 19:01:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][450/625] eta 0:01:10 lr 0.000012 wd 0.0500 time 0.4029 (0.4008) data time 0.0008 (0.0016) model time 0.4020 
(0.3984) loss 6.1094 (6.2354) grad_norm 3.6672 (inf) loss_scale 32.0000 (34.6253) mem 14939MB [2024-07-27 19:01:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][460/625] eta 0:01:06 lr 0.000012 wd 0.0500 time 0.3991 (0.4007) data time 0.0006 (0.0016) model time 0.3985 (0.3984) loss 4.9993 (6.2306) grad_norm 3.0054 (inf) loss_scale 32.0000 (34.5683) mem 14939MB [2024-07-27 19:01:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][470/625] eta 0:01:02 lr 0.000012 wd 0.0500 time 0.4008 (0.4007) data time 0.0008 (0.0016) model time 0.4000 (0.3984) loss 6.6762 (6.2310) grad_norm 4.2598 (inf) loss_scale 32.0000 (34.5138) mem 14939MB [2024-07-27 19:01:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][480/625] eta 0:00:58 lr 0.000012 wd 0.0500 time 0.3970 (0.4006) data time 0.0008 (0.0016) model time 0.3962 (0.3983) loss 6.5029 (6.2397) grad_norm 2.8964 (inf) loss_scale 32.0000 (34.4615) mem 14939MB [2024-07-27 19:01:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][490/625] eta 0:00:54 lr 0.000012 wd 0.0500 time 0.3931 (0.4013) data time 0.0006 (0.0016) model time 0.3925 (0.3992) loss 5.7484 (6.2420) grad_norm 3.1346 (inf) loss_scale 32.0000 (34.4114) mem 14939MB [2024-07-27 19:02:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][500/625] eta 0:00:50 lr 0.000012 wd 0.0500 time 0.3970 (0.4012) data time 0.0006 (0.0016) model time 0.3964 (0.3991) loss 7.4042 (6.2445) grad_norm 4.3924 (inf) loss_scale 32.0000 (34.3633) mem 14939MB [2024-07-27 19:02:07 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][510/625] eta 0:00:46 lr 0.000012 wd 0.0500 time 0.4081 (0.4012) data time 0.0008 (0.0016) model time 0.4073 (0.3990) loss 5.9281 (6.2486) grad_norm 3.9353 (inf) loss_scale 32.0000 (34.3170) mem 14939MB [2024-07-27 19:02:11 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: 
[298/300][520/625] eta 0:00:42 lr 0.000012 wd 0.0500 time 0.4045 (0.4011) data time 0.0006 (0.0015) model time 0.4038 (0.3990) loss 6.1913 (6.2434) grad_norm 7.1392 (inf) loss_scale 32.0000 (34.2726) mem 14939MB [2024-07-27 19:02:15 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][530/625] eta 0:00:38 lr 0.000012 wd 0.0500 time 0.3979 (0.4011) data time 0.0008 (0.0015) model time 0.3972 (0.3990) loss 7.2304 (6.2429) grad_norm 5.0497 (inf) loss_scale 32.0000 (34.2298) mem 14939MB [2024-07-27 19:02:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][540/625] eta 0:00:34 lr 0.000012 wd 0.0500 time 0.3962 (0.4010) data time 0.0006 (0.0015) model time 0.3955 (0.3989) loss 6.9866 (6.2424) grad_norm 3.0197 (inf) loss_scale 32.0000 (34.1885) mem 14939MB [2024-07-27 19:02:23 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][550/625] eta 0:00:30 lr 0.000012 wd 0.0500 time 0.3957 (0.4009) data time 0.0006 (0.0015) model time 0.3950 (0.3989) loss 7.4617 (6.2432) grad_norm 6.7457 (inf) loss_scale 32.0000 (34.1488) mem 14939MB [2024-07-27 19:02:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][560/625] eta 0:00:26 lr 0.000012 wd 0.0500 time 0.3978 (0.4008) data time 0.0008 (0.0015) model time 0.3970 (0.3988) loss 6.0867 (6.2429) grad_norm 2.3535 (inf) loss_scale 32.0000 (34.1105) mem 14939MB [2024-07-27 19:02:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][570/625] eta 0:00:22 lr 0.000012 wd 0.0500 time 0.3965 (0.4008) data time 0.0006 (0.0015) model time 0.3959 (0.3988) loss 7.4687 (6.2460) grad_norm 3.3287 (inf) loss_scale 32.0000 (34.0736) mem 14939MB [2024-07-27 19:02:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][580/625] eta 0:00:18 lr 0.000012 wd 0.0500 time 0.3969 (0.4008) data time 0.0008 (0.0015) model time 0.3961 (0.3988) loss 5.8243 (6.2493) grad_norm 2.5674 (inf) loss_scale 
32.0000 (34.0379) mem 14939MB [2024-07-27 19:02:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][590/625] eta 0:00:14 lr 0.000012 wd 0.0500 time 0.3975 (0.4007) data time 0.0008 (0.0015) model time 0.3967 (0.3988) loss 5.5537 (6.2462) grad_norm 4.5318 (inf) loss_scale 32.0000 (34.0034) mem 14939MB [2024-07-27 19:02:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][600/625] eta 0:00:10 lr 0.000012 wd 0.0500 time 0.4055 (0.4007) data time 0.0006 (0.0015) model time 0.4048 (0.3988) loss 7.0746 (6.2433) grad_norm 3.6931 (inf) loss_scale 32.0000 (33.9700) mem 14939MB [2024-07-27 19:02:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][610/625] eta 0:00:06 lr 0.000012 wd 0.0500 time 0.3984 (0.4007) data time 0.0004 (0.0014) model time 0.3980 (0.3987) loss 6.6238 (6.2460) grad_norm 2.3910 (inf) loss_scale 32.0000 (33.9378) mem 14939MB [2024-07-27 19:02:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][620/625] eta 0:00:02 lr 0.000012 wd 0.0500 time 0.3971 (0.4007) data time 0.0006 (0.0014) model time 0.3965 (0.3987) loss 5.7854 (6.2446) grad_norm 2.5002 (inf) loss_scale 32.0000 (33.9066) mem 14939MB [2024-07-27 19:02:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 298 training takes 0:04:10 [2024-07-27 19:02:52 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-27 19:02:54 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! 
[2024-07-27 19:02:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.424 (0.424) Loss 0.5469 (0.5469) Acc@1 90.430 (90.430) Acc@5 99.072 (99.072) Mem 14939MB [2024-07-27 19:02:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.117) Loss 0.8110 (0.6569) Acc@1 82.861 (87.797) Acc@5 97.168 (98.051) Mem 14939MB [2024-07-27 19:02:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.102) Loss 0.9014 (0.7590) Acc@1 79.443 (84.882) Acc@5 96.143 (97.135) Mem 14939MB [2024-07-27 19:02:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.507 Acc@5 97.103 [2024-07-27 19:02:56 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.5% [2024-07-27 19:02:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.831 (0.831) Loss 0.5430 (0.5430) Acc@1 90.430 (90.430) Acc@5 99.023 (99.023) Mem 14939MB [2024-07-27 19:02:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.086 (0.155) Loss 0.8086 (0.6538) Acc@1 82.764 (87.753) Acc@5 97.217 (98.078) Mem 14939MB [2024-07-27 19:02:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.122) Loss 0.9019 (0.7558) Acc@1 79.395 (84.824) Acc@5 96.094 (97.119) Mem 14939MB [2024-07-27 19:02:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.433 Acc@5 97.089 [2024-07-27 19:02:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-07-27 19:03:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][0/625] eta 0:13:27 lr 0.000012 wd 0.0500 time 1.2926 (1.2926) data time 0.4435 (0.4435) model time 0.0000 (0.0000) loss 5.6687 (5.6687) grad_norm 2.7944 (2.7944) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:03:04 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][10/625] eta 0:04:54 lr 0.000012 wd 0.0500 time 0.3974 (0.4782) data time 0.0006 (0.0410) model time 0.0000 (0.0000) loss 5.6518 (5.9353) grad_norm 2.8659 (3.4377) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:03:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][20/625] eta 0:04:26 lr 0.000012 wd 0.0500 time 0.3964 (0.4398) data time 0.0006 (0.0219) model time 0.0000 (0.0000) loss 6.5194 (6.0505) grad_norm 3.2374 (3.3420) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:03:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][30/625] eta 0:04:13 lr 0.000012 wd 0.0500 time 0.3978 (0.4261) data time 0.0008 (0.0151) model time 0.0000 (0.0000) loss 6.6677 (5.9875) grad_norm 4.4758 (3.3300) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:03:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][40/625] eta 0:04:05 lr 0.000012 wd 0.0500 time 0.3962 (0.4189) data time 0.0006 (0.0116) model time 0.0000 (0.0000) loss 7.2956 (6.0506) grad_norm 33.0489 (4.3982) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:03:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][50/625] eta 0:03:58 lr 0.000012 wd 0.0500 time 0.3974 (0.4144) data time 0.0006 (0.0095) model time 0.0000 (0.0000) loss 6.9955 (6.0817) grad_norm 4.1114 (4.3343) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:03:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][60/625] eta 0:03:52 lr 0.000012 wd 0.0500 time 0.3987 (0.4117) data time 0.0009 (0.0081) model time 0.3978 (0.3969) loss 6.6960 (6.1217) grad_norm 6.0628 (4.1949) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:03:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][70/625] eta 0:03:47 lr 0.000012 wd 0.0500 time 0.3975 (0.4097) data time 0.0008 (0.0071) model 
time 0.3967 (0.3967) loss 6.5736 (6.1230) grad_norm 2.7645 (4.1461) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:03:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][80/625] eta 0:03:42 lr 0.000012 wd 0.0500 time 0.3986 (0.4082) data time 0.0007 (0.0063) model time 0.3979 (0.3968) loss 5.2441 (6.0909) grad_norm 3.9537 (4.0528) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:03:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][90/625] eta 0:03:40 lr 0.000012 wd 0.0500 time 0.3952 (0.4113) data time 0.0006 (0.0057) model time 0.3946 (0.4064) loss 5.6899 (6.1039) grad_norm 3.6919 (4.0038) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:03:40 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][100/625] eta 0:03:35 lr 0.000012 wd 0.0500 time 0.3981 (0.4100) data time 0.0009 (0.0052) model time 0.3973 (0.4047) loss 6.5800 (6.1200) grad_norm 4.0377 (4.1806) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:03:44 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][110/625] eta 0:03:30 lr 0.000012 wd 0.0500 time 0.3985 (0.4090) data time 0.0006 (0.0048) model time 0.3978 (0.4036) loss 5.3482 (6.1487) grad_norm 2.9930 (4.2395) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:03:48 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][120/625] eta 0:03:26 lr 0.000012 wd 0.0500 time 0.3966 (0.4081) data time 0.0008 (0.0045) model time 0.3957 (0.4026) loss 5.2525 (6.1538) grad_norm 4.5030 (4.2439) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:03:52 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][130/625] eta 0:03:21 lr 0.000012 wd 0.0500 time 0.3964 (0.4073) data time 0.0008 (0.0042) model time 0.3955 (0.4019) loss 6.2330 (6.1690) grad_norm 2.2257 (4.1716) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:03:56 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [299/300][140/625] eta 0:03:17 lr 0.000012 wd 0.0500 time 0.4003 (0.4068) data time 0.0009 (0.0040) model time 0.3995 (0.4015) loss 6.3881 (6.1858) grad_norm 2.6518 (4.1113) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:04:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][150/625] eta 0:03:12 lr 0.000012 wd 0.0500 time 0.4086 (0.4063) data time 0.0006 (0.0038) model time 0.4080 (0.4012) loss 5.2018 (6.1948) grad_norm 3.0111 (4.0851) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:04:04 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][160/625] eta 0:03:08 lr 0.000012 wd 0.0500 time 0.3957 (0.4058) data time 0.0009 (0.0036) model time 0.3947 (0.4009) loss 7.4102 (6.2050) grad_norm 3.1756 (4.0659) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:04:08 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][170/625] eta 0:03:04 lr 0.000012 wd 0.0500 time 0.3973 (0.4053) data time 0.0008 (0.0035) model time 0.3965 (0.4005) loss 6.0757 (6.1988) grad_norm 2.6778 (4.1054) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:04:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][180/625] eta 0:03:00 lr 0.000012 wd 0.0500 time 0.3956 (0.4048) data time 0.0008 (0.0033) model time 0.3948 (0.4001) loss 6.8943 (6.1828) grad_norm 9.0359 (4.0855) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:04:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][190/625] eta 0:02:55 lr 0.000012 wd 0.0500 time 0.3985 (0.4044) data time 0.0008 (0.0032) model time 0.3977 (0.3999) loss 7.0999 (6.1985) grad_norm 2.4004 (4.0476) loss_scale 32.0000 (32.0000) mem 14939MB [2024-07-27 19:04:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-07-27 19:04:18 vssd_mesa_retrain_small_e300] (utils.py 118): INFO 
./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving...... [2024-07-27 19:04:19 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!! [2024-07-27 19:06:19 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-27 19:06:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-27 19:06:43 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-27 19:10:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/config.json [2024-07-27 19:10:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 129): INFO Creating model:vmamba2/vssd_mesa_retrain_small_e300 [2024-07-27 19:11:19 vssd_mesa_retrain_small_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-27 19:11:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth [2024-07-27 19:11:35 vssd_mesa_retrain_small_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth.................... 
[2024-07-27 19:11:35 vssd_mesa_retrain_small_e300] (utils.py 30): INFO resuming model: [2024-07-27 19:11:35 vssd_mesa_retrain_small_e300] (utils.py 37): INFO resuming model_ema: [2024-07-27 19:11:36 vssd_mesa_retrain_small_e300] (utils.py 61): INFO => loaded successfully './exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth' (epoch 299) [2024-07-27 19:11:36 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-07-27 19:11:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][200/625] eta 0:15:31 lr 0.000012 wd 0.0500 time 0.4404 (2.1912) data time 0.0008 (0.1344) model time 0.4396 (2.0568) loss 6.6047 (6.7676) grad_norm 3.7823 (4.4663) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:11:57 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][210/625] eta 0:07:28 lr 0.000012 wd 0.0500 time 0.4081 (1.0814) data time 0.0010 (0.0511) model time 0.4071 (1.0304) loss 6.4691 (6.5598) grad_norm 2.7309 (3.5049) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:12:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][220/625] eta 0:05:33 lr 0.000012 wd 0.0500 time 0.4199 (0.8246) data time 0.0007 (0.0318) model time 0.4192 (0.7928) loss 5.2663 (6.5277) grad_norm 3.8164 (3.8090) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:12:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][230/625] eta 0:04:41 lr 0.000012 wd 0.0500 time 0.4165 (0.7120) data time 0.0011 (0.0235) model time 0.4155 (0.6885) loss 6.7550 (6.4689) grad_norm 3.4385 (3.8115) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:12:09 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][240/625] eta 0:04:08 lr 0.000012 wd 0.0500 time 0.4065 (0.6467) data time 0.0011 (0.0186) model time 0.4054 (0.6281) loss 6.5831 (6.3994) grad_norm 4.7998 (3.6421) loss_scale 32.0000 (32.0000) mem 14931MB 
[2024-07-27 19:12:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][250/625] eta 0:03:50 lr 0.000012 wd 0.0500 time 0.4117 (0.6139) data time 0.0008 (0.0155) model time 0.4109 (0.5985) loss 6.6623 (6.3541) grad_norm 2.6520 (3.6069) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:12:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][260/625] eta 0:03:33 lr 0.000012 wd 0.0500 time 0.4169 (0.5839) data time 0.0007 (0.0133) model time 0.4161 (0.5706) loss 5.1759 (6.3130) grad_norm 3.5311 (3.5491) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:12:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][270/625] eta 0:03:19 lr 0.000012 wd 0.0500 time 0.4177 (0.5614) data time 0.0009 (0.0117) model time 0.4168 (0.5498) loss 6.9703 (6.2753) grad_norm 2.5091 (3.4541) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:12:26 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][280/625] eta 0:03:07 lr 0.000012 wd 0.0500 time 0.4084 (0.5441) data time 0.0007 (0.0104) model time 0.4077 (0.5337) loss 5.1998 (6.2427) grad_norm 4.7398 (3.4111) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:12:30 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][290/625] eta 0:02:57 lr 0.000012 wd 0.0500 time 0.4071 (0.5304) data time 0.0008 (0.0095) model time 0.4063 (0.5209) loss 6.1073 (6.2546) grad_norm 2.4870 (3.4287) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:12:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][300/625] eta 0:02:48 lr 0.000012 wd 0.0500 time 0.4104 (0.5189) data time 0.0010 (0.0087) model time 0.4094 (0.5102) loss 6.9672 (6.2908) grad_norm 2.6193 (3.4328) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:12:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][310/625] eta 0:02:40 lr 0.000012 wd 0.0500 time 0.4117 (0.5096) data 
time 0.0008 (0.0080) model time 0.4109 (0.5016) loss 6.4631 (6.2924) grad_norm 3.2083 (3.4857) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:12:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][320/625] eta 0:02:33 lr 0.000012 wd 0.0500 time 0.4233 (0.5020) data time 0.0008 (0.0075) model time 0.4225 (0.4946) loss 5.6170 (6.3056) grad_norm 3.5420 (3.4753) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:12:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][330/625] eta 0:02:26 lr 0.000012 wd 0.0500 time 0.4140 (0.4955) data time 0.0010 (0.0070) model time 0.4130 (0.4885) loss 5.5394 (6.3175) grad_norm 3.5484 (3.5777) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:12:51 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][340/625] eta 0:02:19 lr 0.000012 wd 0.0500 time 0.4119 (0.4899) data time 0.0008 (0.0066) model time 0.4111 (0.4833) loss 4.9983 (6.2944) grad_norm 2.9692 (3.5473) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:12:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][350/625] eta 0:02:13 lr 0.000012 wd 0.0500 time 0.4130 (0.4849) data time 0.0012 (0.0062) model time 0.4119 (0.4786) loss 7.0362 (6.3142) grad_norm 3.1608 (3.5996) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:12:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][360/625] eta 0:02:07 lr 0.000012 wd 0.0500 time 0.4097 (0.4803) data time 0.0010 (0.0059) model time 0.4087 (0.4744) loss 6.9778 (6.3281) grad_norm 2.4537 (3.6112) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:13:03 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][370/625] eta 0:02:01 lr 0.000012 wd 0.0500 time 0.4092 (0.4765) data time 0.0009 (0.0057) model time 0.4083 (0.4709) loss 5.9694 (6.3167) grad_norm 3.1073 (3.5890) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:13:08 
vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][380/625] eta 0:01:55 lr 0.000012 wd 0.0500 time 0.4128 (0.4731) data time 0.0014 (0.0054) model time 0.4114 (0.4677) loss 5.9443 (6.3071) grad_norm 2.7265 (4.1917) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:13:12 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][390/625] eta 0:01:50 lr 0.000012 wd 0.0500 time 0.4118 (0.4700) data time 0.0008 (0.0052) model time 0.4110 (0.4649) loss 5.7398 (6.2915) grad_norm 2.8183 (4.1435) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:13:16 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][400/625] eta 0:01:45 lr 0.000012 wd 0.0500 time 0.4121 (0.4672) data time 0.0009 (0.0050) model time 0.4112 (0.4622) loss 5.2791 (6.2638) grad_norm 3.4057 (4.1042) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:13:20 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][410/625] eta 0:01:39 lr 0.000012 wd 0.0500 time 0.4187 (0.4649) data time 0.0008 (0.0048) model time 0.4180 (0.4601) loss 5.2334 (6.2607) grad_norm 4.8515 (4.1371) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:13:24 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][420/625] eta 0:01:34 lr 0.000012 wd 0.0500 time 0.4129 (0.4626) data time 0.0008 (0.0046) model time 0.4121 (0.4579) loss 6.1059 (6.2586) grad_norm 3.0562 (4.1091) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:13:28 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][430/625] eta 0:01:29 lr 0.000012 wd 0.0500 time 0.4181 (0.4605) data time 0.0008 (0.0045) model time 0.4173 (0.4560) loss 6.9398 (6.2654) grad_norm 2.3543 (4.0842) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:13:32 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][440/625] eta 0:01:24 lr 0.000012 wd 0.0500 time 0.4174 (0.4585) data time 0.0009 (0.0044) 
model time 0.4165 (0.4541) loss 5.5172 (6.2647) grad_norm 2.9812 (4.0516) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:13:37 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][450/625] eta 0:01:19 lr 0.000012 wd 0.0500 time 0.4245 (0.4568) data time 0.0011 (0.0042) model time 0.4233 (0.4526) loss 5.6484 (6.2490) grad_norm 2.3110 (4.0982) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:13:41 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][460/625] eta 0:01:15 lr 0.000012 wd 0.0500 time 0.4108 (0.4552) data time 0.0007 (0.0041) model time 0.4100 (0.4511) loss 5.8145 (6.2415) grad_norm 3.2795 (4.0660) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:13:45 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][470/625] eta 0:01:10 lr 0.000012 wd 0.0500 time 0.4125 (0.4546) data time 0.0010 (0.0040) model time 0.4115 (0.4506) loss 6.4985 (6.2447) grad_norm 3.1600 (4.0588) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:13:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][480/625] eta 0:01:05 lr 0.000012 wd 0.0500 time 0.4079 (0.4532) data time 0.0010 (0.0039) model time 0.4069 (0.4493) loss 5.5701 (6.2447) grad_norm 4.0353 (4.0230) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:13:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][490/625] eta 0:01:01 lr 0.000012 wd 0.0500 time 0.4201 (0.4520) data time 0.0008 (0.0038) model time 0.4193 (0.4482) loss 5.1844 (6.2330) grad_norm 3.8279 (4.0412) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:13:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][500/625] eta 0:00:56 lr 0.000012 wd 0.0500 time 0.4169 (0.4508) data time 0.0007 (0.0037) model time 0.4161 (0.4471) loss 5.4456 (6.2250) grad_norm 4.2984 (4.0238) loss_scale 32.0000 (32.0000) mem 14931MB [2024-07-27 19:14:02 vssd_mesa_retrain_small_e300] 
(main_hfai_mnodes.py 367): INFO Train: [299/300][510/625] eta 0:00:51 lr 0.000012 wd 0.0500 time 0.4176 (0.4498) data time 0.0007 (0.0037) model time 0.4169 (0.4461) loss 6.2378 (6.2333) grad_norm 2.4038 (4.0097) loss_scale 32.0000 (32.0000) mem 14931MB
[2024-07-27 19:14:06 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][520/625] eta 0:00:47 lr 0.000012 wd 0.0500 time 0.4065 (0.4488) data time 0.0011 (0.0036) model time 0.4054 (0.4452) loss 6.8637 (6.2473) grad_norm 3.1687 (4.0084) loss_scale 32.0000 (32.0000) mem 14931MB
[2024-07-27 19:14:10 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][530/625] eta 0:00:42 lr 0.000012 wd 0.0500 time 0.4203 (0.4478) data time 0.0012 (0.0035) model time 0.4191 (0.4443) loss 6.5672 (6.2460) grad_norm 2.8788 (4.0707) loss_scale 32.0000 (32.0000) mem 14931MB
[2024-07-27 19:14:14 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][540/625] eta 0:00:37 lr 0.000012 wd 0.0500 time 0.4270 (0.4468) data time 0.0011 (0.0035) model time 0.4260 (0.4434) loss 7.0651 (6.2441) grad_norm 2.9556 (4.0923) loss_scale 32.0000 (32.0000) mem 14931MB
[2024-07-27 19:14:18 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][550/625] eta 0:00:33 lr 0.000012 wd 0.0500 time 0.4176 (0.4459) data time 0.0008 (0.0034) model time 0.4168 (0.4425) loss 5.6226 (6.2368) grad_norm 3.4341 (4.0783) loss_scale 32.0000 (32.0000) mem 14931MB
[2024-07-27 19:14:22 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][560/625] eta 0:00:28 lr 0.000012 wd 0.0500 time 0.4352 (0.4451) data time 0.0011 (0.0033) model time 0.4341 (0.4418) loss 6.9080 (6.2375) grad_norm 3.2174 (4.0484) loss_scale 32.0000 (32.0000) mem 14931MB
[2024-07-27 19:14:27 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][570/625] eta 0:00:24 lr 0.000012 wd 0.0500 time 0.4100 (0.4443) data time 0.0008 (0.0033) model time 0.4091 (0.4410) loss 5.0500 (6.2373) grad_norm 6.3085 (4.1321) loss_scale 32.0000 (32.0000) mem 14931MB
[2024-07-27 19:14:31 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][580/625] eta 0:00:19 lr 0.000012 wd 0.0500 time 0.4060 (0.4434) data time 0.0011 (0.0032) model time 0.4049 (0.4402) loss 6.5742 (6.2364) grad_norm 3.0931 (4.1146) loss_scale 32.0000 (32.0000) mem 14931MB
[2024-07-27 19:14:35 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][590/625] eta 0:00:15 lr 0.000012 wd 0.0500 time 0.4093 (0.4428) data time 0.0008 (0.0032) model time 0.4086 (0.4396) loss 6.5583 (6.2390) grad_norm 2.2104 (4.0846) loss_scale 32.0000 (32.0000) mem 14931MB
[2024-07-27 19:14:39 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][600/625] eta 0:00:11 lr 0.000012 wd 0.0500 time 0.4118 (0.4421) data time 0.0008 (0.0031) model time 0.4110 (0.4389) loss 6.4988 (6.2395) grad_norm 2.6108 (4.1334) loss_scale 32.0000 (32.0000) mem 14931MB
[2024-07-27 19:14:43 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][610/625] eta 0:00:06 lr 0.000012 wd 0.0500 time 0.4101 (0.4413) data time 0.0005 (0.0031) model time 0.4095 (0.4382) loss 5.4399 (6.2419) grad_norm 2.4762 (4.1199) loss_scale 32.0000 (32.0000) mem 14931MB
[2024-07-27 19:14:47 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][620/625] eta 0:00:02 lr 0.000012 wd 0.0500 time 0.4122 (0.4406) data time 0.0007 (0.0030) model time 0.4115 (0.4375) loss 7.2967 (6.2428) grad_norm 3.4991 (4.1312) loss_scale 32.0000 (32.0000) mem 14931MB
[2024-07-27 19:14:49 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 394): INFO EPOCH 299 training takes 0:03:09
[2024-07-27 19:14:49 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saving......
[2024-07-27 19:14:53 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/latest_ckpt.pth saved !!!
[2024-07-27 19:14:53 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.492 (0.492) Loss 0.5488 (0.5488) Acc@1 90.527 (90.527) Acc@5 99.023 (99.023) Mem 14931MB
[2024-07-27 19:14:54 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.130) Loss 0.8120 (0.6568) Acc@1 82.812 (87.780) Acc@5 97.314 (98.078) Mem 14931MB
[2024-07-27 19:14:55 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.109) Loss 0.9028 (0.7599) Acc@1 79.590 (84.859) Acc@5 96.045 (97.119) Mem 14931MB
[2024-07-27 19:14:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.471 Acc@5 97.097
[2024-07-27 19:14:58 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 84.5%
[2024-07-27 19:14:59 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.812 (0.812) Loss 0.5430 (0.5430) Acc@1 90.430 (90.430) Acc@5 99.023 (99.023) Mem 14931MB
[2024-07-27 19:15:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.085 (0.162) Loss 0.8086 (0.6537) Acc@1 82.715 (87.753) Acc@5 97.217 (98.069) Mem 14931MB
[2024-07-27 19:15:00 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.086 (0.125) Loss 0.9014 (0.7557) Acc@1 79.395 (84.828) Acc@5 96.045 (97.112) Mem 14931MB
[2024-07-27 19:15:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.441 Acc@5 97.081
[2024-07-27 19:15:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4%
[2024-07-27 19:15:01 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.44%
[2024-07-27 19:15:01 vssd_mesa_retrain_small_e300] (utils.py 118): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saving......
[2024-07-27 19:15:05 vssd_mesa_retrain_small_e300] (utils.py 120): INFO ./exclude/output_mesa/vssd_mesa_retrain_small_e300/20240723083909/best_ckpt_ema.pth saved !!!
[2024-07-27 19:15:05 vssd_mesa_retrain_small_e300] (main_hfai_mnodes.py 291): INFO Training time 0:03:29