diff --git "a/sf_log.txt" "b/sf_log.txt"
--- "a/sf_log.txt"
+++ "b/sf_log.txt"
@@ -1,49 +1,92 @@
-[2025-03-15 15:53:32,225][06641] Saving configuration to /home/aa/Downloads/train_dir/default_experiment/config.json...
-[2025-03-15 15:53:32,227][06641] Rollout worker 0 uses device cpu
-[2025-03-15 15:53:32,227][06641] Rollout worker 1 uses device cpu
-[2025-03-15 15:53:32,228][06641] Rollout worker 2 uses device cpu
-[2025-03-15 15:53:32,229][06641] Rollout worker 3 uses device cpu
-[2025-03-15 15:53:32,230][06641] Rollout worker 4 uses device cpu
-[2025-03-15 15:53:32,230][06641] Rollout worker 5 uses device cpu
-[2025-03-15 15:53:32,231][06641] Rollout worker 6 uses device cpu
-[2025-03-15 15:53:32,232][06641] Rollout worker 7 uses device cpu
-[2025-03-15 15:53:32,283][06641] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-03-15 15:53:32,283][06641] InferenceWorker_p0-w0: min num requests: 2
-[2025-03-15 15:53:32,312][06641] Starting all processes...
-[2025-03-15 15:53:32,313][06641] Starting process learner_proc0
-[2025-03-15 15:53:32,362][06641] Starting all processes...
-[2025-03-15 15:53:32,367][06641] Starting process inference_proc0-0
-[2025-03-15 15:53:32,368][06641] Starting process rollout_proc0
-[2025-03-15 15:53:32,368][06641] Starting process rollout_proc1
-[2025-03-15 15:53:32,368][06641] Starting process rollout_proc2
-[2025-03-15 15:53:32,368][06641] Starting process rollout_proc3
-[2025-03-15 15:53:32,369][06641] Starting process rollout_proc4
-[2025-03-15 15:53:32,370][06641] Starting process rollout_proc5
-[2025-03-15 15:53:32,370][06641] Starting process rollout_proc6
-[2025-03-15 15:53:32,370][06641] Starting process rollout_proc7
-[2025-03-15 15:53:35,638][06726] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-03-15 15:53:35,638][06726] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
-[2025-03-15 15:53:35,660][06726] Num visible devices: 1
-[2025-03-15 15:53:35,664][06726] Starting seed is not provided
-[2025-03-15 15:53:35,665][06726] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-03-15 15:53:35,665][06726] Initializing actor-critic model on device cuda:0
-[2025-03-15 15:53:35,665][06726] RunningMeanStd input shape: (3, 72, 128)
-[2025-03-15 15:53:35,667][06726] RunningMeanStd input shape: (1,)
-[2025-03-15 15:53:35,688][06726] ConvEncoder: input_channels=3
-[2025-03-15 15:53:35,724][06740] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2025-03-15 15:53:35,734][06744] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2025-03-15 15:53:35,776][06746] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2025-03-15 15:53:35,776][06739] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-03-15 15:53:35,777][06739] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
-[2025-03-15 15:53:35,781][06741] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2025-03-15 15:53:35,794][06739] Num visible devices: 1
-[2025-03-15 15:53:35,820][06745] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2025-03-15 15:53:35,828][06742] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2025-03-15 15:53:35,841][06747] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2025-03-15 15:53:35,842][06726] Conv encoder output size: 512
-[2025-03-15 15:53:35,842][06726] Policy head output size: 512
-[2025-03-15 15:53:35,858][06726] Created Actor Critic model with architecture:
-[2025-03-15 15:53:35,858][06726] ActorCriticSharedWeights(
+[2025-03-15 16:00:11,916][09103] Saving configuration to /home/aa/Downloads/train_dir/default_experiment/config.json...
+[2025-03-15 16:00:11,917][09103] Rollout worker 0 uses device cpu
+[2025-03-15 16:00:11,918][09103] Rollout worker 1 uses device cpu
+[2025-03-15 16:00:11,919][09103] Rollout worker 2 uses device cpu
+[2025-03-15 16:00:11,920][09103] Rollout worker 3 uses device cpu
+[2025-03-15 16:00:11,920][09103] Rollout worker 4 uses device cpu
+[2025-03-15 16:00:11,921][09103] Rollout worker 5 uses device cpu
+[2025-03-15 16:00:11,922][09103] Rollout worker 6 uses device cpu
+[2025-03-15 16:00:11,923][09103] Rollout worker 7 uses device cpu
+[2025-03-15 16:00:11,971][09103] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-03-15 16:00:11,972][09103] InferenceWorker_p0-w0: min num requests: 2
+[2025-03-15 16:00:11,999][09103] Starting all processes...
+[2025-03-15 16:00:12,000][09103] Starting process learner_proc0
+[2025-03-15 16:00:12,049][09103] Starting all processes...
+[2025-03-15 16:00:12,053][09103] Starting process inference_proc0-0
+[2025-03-15 16:00:12,053][09103] Starting process rollout_proc0
+[2025-03-15 16:00:12,053][09103] Starting process rollout_proc1
+[2025-03-15 16:00:12,053][09103] Starting process rollout_proc2
+[2025-03-15 16:00:12,053][09103] Starting process rollout_proc3
+[2025-03-15 16:00:12,054][09103] Starting process rollout_proc4
+[2025-03-15 16:00:12,054][09103] Starting process rollout_proc5
+[2025-03-15 16:00:12,054][09103] Starting process rollout_proc6
+[2025-03-15 16:00:12,054][09103] Starting process rollout_proc7
+[2025-03-15 16:00:13,399][09103] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 9103], exiting...
+[2025-03-15 16:00:13,401][09103] Runner profile tree view:
+main_loop: 1.4017
+[2025-03-15 16:00:13,402][09103] Collected {}, FPS: 0.0
+[2025-03-15 16:00:14,743][09181] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2025-03-15 16:00:14,744][09181] Stopping RolloutWorker_w1...
+[2025-03-15 16:00:14,744][09181] Loop rollout_proc1_evt_loop terminating...
+[2025-03-15 16:00:14,854][09186] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2025-03-15 16:00:14,855][09186] Stopping RolloutWorker_w5...
+[2025-03-15 16:00:14,855][09186] Loop rollout_proc5_evt_loop terminating...
+[2025-03-15 16:00:14,878][09187] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2025-03-15 16:00:14,878][09187] Stopping RolloutWorker_w4...
+[2025-03-15 16:00:14,879][09187] Loop rollout_proc4_evt_loop terminating...
+[2025-03-15 16:00:14,997][09183] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2025-03-15 16:00:14,998][09183] Stopping RolloutWorker_w0...
+[2025-03-15 16:00:14,998][09183] Loop rollout_proc0_evt_loop terminating...
+[2025-03-15 16:00:15,033][09184] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2025-03-15 16:00:15,034][09184] Stopping RolloutWorker_w2...
+[2025-03-15 16:00:15,034][09184] Loop rollout_proc2_evt_loop terminating...
+[2025-03-15 16:00:42,102][09281] Saving configuration to /home/aa/Downloads/train_dir/default_experiment/config.json...
+[2025-03-15 16:00:42,104][09281] Rollout worker 0 uses device cpu
+[2025-03-15 16:00:42,104][09281] Rollout worker 1 uses device cpu
+[2025-03-15 16:00:42,105][09281] Rollout worker 2 uses device cpu
+[2025-03-15 16:00:42,106][09281] Rollout worker 3 uses device cpu
+[2025-03-15 16:00:42,106][09281] Rollout worker 4 uses device cpu
+[2025-03-15 16:00:42,107][09281] Rollout worker 5 uses device cpu
+[2025-03-15 16:00:42,108][09281] Rollout worker 6 uses device cpu
+[2025-03-15 16:00:42,108][09281] Rollout worker 7 uses device cpu
+[2025-03-15 16:00:42,189][09281] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-03-15 16:00:42,191][09281] InferenceWorker_p0-w0: min num requests: 2
+[2025-03-15 16:00:42,224][09281] Starting all processes...
+[2025-03-15 16:00:42,225][09281] Starting process learner_proc0
+[2025-03-15 16:00:42,274][09281] Starting all processes...
+[2025-03-15 16:00:42,279][09281] Starting process inference_proc0-0
+[2025-03-15 16:00:42,280][09281] Starting process rollout_proc0
+[2025-03-15 16:00:42,280][09281] Starting process rollout_proc1
+[2025-03-15 16:00:42,280][09281] Starting process rollout_proc2
+[2025-03-15 16:00:42,280][09281] Starting process rollout_proc3
+[2025-03-15 16:00:42,280][09281] Starting process rollout_proc4
+[2025-03-15 16:00:42,280][09281] Starting process rollout_proc5
+[2025-03-15 16:00:42,280][09281] Starting process rollout_proc6
+[2025-03-15 16:00:42,281][09281] Starting process rollout_proc7
+[2025-03-15 16:00:45,728][09460] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2025-03-15 16:00:45,735][09464] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2025-03-15 16:00:45,735][09461] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2025-03-15 16:00:45,750][09458] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2025-03-15 16:00:45,758][09445] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-03-15 16:00:45,758][09445] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2025-03-15 16:00:45,776][09445] Num visible devices: 1
+[2025-03-15 16:00:45,777][09445] Starting seed is not provided
+[2025-03-15 16:00:45,778][09445] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-03-15 16:00:45,778][09445] Initializing actor-critic model on device cuda:0
+[2025-03-15 16:00:45,778][09445] RunningMeanStd input shape: (3, 72, 128)
+[2025-03-15 16:00:45,780][09445] RunningMeanStd input shape: (1,)
+[2025-03-15 16:00:45,796][09445] ConvEncoder: input_channels=3
+[2025-03-15 16:00:45,798][09463] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2025-03-15 16:00:45,846][09462] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2025-03-15 16:00:45,868][09459] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-03-15 16:00:45,868][09459] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2025-03-15 16:00:45,885][09459] Num visible devices: 1
+[2025-03-15 16:00:45,904][09466] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2025-03-15 16:00:45,910][09465] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2025-03-15 16:00:45,937][09445] Conv encoder output size: 512
+[2025-03-15 16:00:45,937][09445] Policy head output size: 512
+[2025-03-15 16:00:45,952][09445] Created Actor Critic model with architecture:
+[2025-03-15 16:00:45,952][09445] ActorCriticSharedWeights(
   (obs_normalizer): ObservationNormalizer(
     (running_mean_std): RunningMeanStdDictInPlace(
       (running_mean_std): ModuleDict(
@@ -84,687 +127,1019 @@
     (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
   )
 )
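
Note: the ActorCriticSharedWeights printout above (truncated by the diff hunk) is the learner's policy/value network: an observation normalizer, a conv encoder mapping the 3x72x128 observation to a 512-dim core, and linear heads on that shared core. A minimal PyTorch sketch of the same idea, with the sizes taken from the log; the exact conv stack is an assumption, not Sample Factory's actual class:

import torch
import torch.nn as nn

class SharedActorCritic(nn.Module):
    """Sketch only: actor and critic share one encoder; only the heads differ."""
    def __init__(self, num_actions: int = 5, core_size: int = 512):
        super().__init__()
        self.encoder = nn.Sequential(              # "ConvEncoder: input_channels=3"
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            nn.Flatten(),
            nn.LazyLinear(core_size), nn.ELU(),    # "Conv encoder output size: 512"
        )
        self.distribution_linear = nn.Linear(core_size, num_actions)  # action logits, 512 -> 5
        self.critic_linear = nn.Linear(core_size, 1)                  # state-value head

    def forward(self, obs: torch.Tensor):
        core = self.encoder(obs)
        return self.distribution_linear(core), self.critic_linear(core)

logits, value = SharedActorCritic()(torch.zeros(1, 3, 72, 128))  # matches the logged input shape

Sharing the encoder halves the per-frame forward cost compared to separate actor and critic towers, which matters at the ~16k FPS this run sustains.
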
-[2025-03-15 15:53:35,893][06743] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2025-03-15 15:53:36,027][06726] Using optimizer <class 'torch.optim.adam.Adam'>
-[2025-03-15 15:53:37,132][06726] No checkpoints found
-[2025-03-15 15:53:37,132][06726] Did not load from checkpoint, starting from scratch!
-[2025-03-15 15:53:37,133][06726] Initialized policy 0 weights for model version 0
-[2025-03-15 15:53:37,136][06726] LearnerWorker_p0 finished initialization!
-[2025-03-15 15:53:37,136][06726] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-03-15 15:53:37,230][06739] RunningMeanStd input shape: (3, 72, 128)
-[2025-03-15 15:53:37,231][06739] RunningMeanStd input shape: (1,)
-[2025-03-15 15:53:37,241][06739] ConvEncoder: input_channels=3
-[2025-03-15 15:53:37,330][06739] Conv encoder output size: 512
-[2025-03-15 15:53:37,330][06739] Policy head output size: 512
-[2025-03-15 15:53:37,361][06641] Inference worker 0-0 is ready!
-[2025-03-15 15:53:37,361][06641] All inference workers are ready! Signal rollout workers to start!
-[2025-03-15 15:53:37,407][06743] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-03-15 15:53:37,407][06744] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-03-15 15:53:37,408][06740] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-03-15 15:53:37,409][06746] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-03-15 15:53:37,423][06745] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-03-15 15:53:37,423][06741] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-03-15 15:53:37,423][06747] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-03-15 15:53:37,424][06742] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-03-15 15:53:37,796][06740] Decorrelating experience for 0 frames...
-[2025-03-15 15:53:37,798][06744] Decorrelating experience for 0 frames...
-[2025-03-15 15:53:37,823][06743] Decorrelating experience for 0 frames...
-[2025-03-15 15:53:37,826][06745] Decorrelating experience for 0 frames...
-[2025-03-15 15:53:37,849][06741] Decorrelating experience for 0 frames...
-[2025-03-15 15:53:38,177][06746] Decorrelating experience for 0 frames...
-[2025-03-15 15:53:38,179][06742] Decorrelating experience for 0 frames...
-[2025-03-15 15:53:38,194][06743] Decorrelating experience for 32 frames...
-[2025-03-15 15:53:38,204][06747] Decorrelating experience for 0 frames...
-[2025-03-15 15:53:38,209][06744] Decorrelating experience for 32 frames...
-[2025-03-15 15:53:38,230][06641] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-03-15 15:53:38,531][06742] Decorrelating experience for 32 frames...
-[2025-03-15 15:53:38,546][06747] Decorrelating experience for 32 frames...
-[2025-03-15 15:53:38,579][06746] Decorrelating experience for 32 frames...
-[2025-03-15 15:53:38,582][06741] Decorrelating experience for 32 frames...
-[2025-03-15 15:53:38,683][06745] Decorrelating experience for 32 frames...
-[2025-03-15 15:53:38,946][06743] Decorrelating experience for 64 frames...
-[2025-03-15 15:53:39,063][06746] Decorrelating experience for 64 frames...
-[2025-03-15 15:53:39,110][06740] Decorrelating experience for 32 frames...
-[2025-03-15 15:53:39,148][06742] Decorrelating experience for 64 frames...
-[2025-03-15 15:53:39,162][06745] Decorrelating experience for 64 frames...
-[2025-03-15 15:53:39,210][06744] Decorrelating experience for 64 frames...
-[2025-03-15 15:53:39,221][06741] Decorrelating experience for 64 frames...
-[2025-03-15 15:53:39,395][06747] Decorrelating experience for 64 frames...
-[2025-03-15 15:53:39,504][06743] Decorrelating experience for 96 frames...
-[2025-03-15 15:53:39,593][06746] Decorrelating experience for 96 frames...
-[2025-03-15 15:53:39,641][06740] Decorrelating experience for 64 frames...
-[2025-03-15 15:53:39,697][06744] Decorrelating experience for 96 frames...
-[2025-03-15 15:53:39,876][06747] Decorrelating experience for 96 frames...
-[2025-03-15 15:53:39,887][06742] Decorrelating experience for 96 frames...
-[2025-03-15 15:53:40,067][06740] Decorrelating experience for 96 frames...
-[2025-03-15 15:53:40,090][06741] Decorrelating experience for 96 frames...
-[2025-03-15 15:53:40,243][06745] Decorrelating experience for 96 frames...
-[2025-03-15 15:53:40,885][06726] Signal inference workers to stop experience collection...
-[2025-03-15 15:53:40,893][06739] InferenceWorker_p0-w0: stopping experience collection
-[2025-03-15 15:53:42,668][06726] Signal inference workers to resume experience collection...
-[2025-03-15 15:53:42,669][06739] InferenceWorker_p0-w0: resuming experience collection
-[2025-03-15 15:53:43,229][06641] Fps is (10 sec: 2457.6, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 12288. Throughput: 0: 146.4. Samples: 732. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
-[2025-03-15 15:53:43,231][06641] Avg episode reward: [(0, '3.252')]
-[2025-03-15 15:53:44,741][06739] Updated weights for policy 0, policy_version 10 (0.0074)
-[2025-03-15 15:53:47,139][06739] Updated weights for policy 0, policy_version 20 (0.0013)
-[2025-03-15 15:53:48,229][06641] Fps is (10 sec: 9830.5, 60 sec: 9830.5, 300 sec: 9830.5). Total num frames: 98304. Throughput: 0: 1804.4. Samples: 18044. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
-[2025-03-15 15:53:48,230][06641] Avg episode reward: [(0, '4.309')]
-[2025-03-15 15:53:49,557][06739] Updated weights for policy 0, policy_version 30 (0.0012)
-[2025-03-15 15:53:52,055][06739] Updated weights for policy 0, policy_version 40 (0.0014)
-[2025-03-15 15:53:52,274][06641] Heartbeat connected on Batcher_0
-[2025-03-15 15:53:52,289][06641] Heartbeat connected on RolloutWorker_w0
-[2025-03-15 15:53:52,291][06641] Heartbeat connected on InferenceWorker_p0-w0
-[2025-03-15 15:53:52,294][06641] Heartbeat connected on LearnerWorker_p0
-[2025-03-15 15:53:52,295][06641] Heartbeat connected on RolloutWorker_w1
-[2025-03-15 15:53:52,296][06641] Heartbeat connected on RolloutWorker_w2
-[2025-03-15 15:53:52,299][06641] Heartbeat connected on RolloutWorker_w3
-[2025-03-15 15:53:52,303][06641] Heartbeat connected on RolloutWorker_w4
-[2025-03-15 15:53:52,307][06641] Heartbeat connected on RolloutWorker_w5
-[2025-03-15 15:53:52,311][06641] Heartbeat connected on RolloutWorker_w6
-[2025-03-15 15:53:52,315][06641] Heartbeat connected on RolloutWorker_w7
-[2025-03-15 15:53:53,229][06641] Fps is (10 sec: 16793.9, 60 sec: 12015.1, 300 sec: 12015.1). Total num frames: 180224. Throughput: 0: 2875.6. Samples: 43134. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2025-03-15 15:53:53,230][06641] Avg episode reward: [(0, '4.275')]
-[2025-03-15 15:53:53,235][06726] Saving new best policy, reward=4.275!
-[2025-03-15 15:53:54,439][06739] Updated weights for policy 0, policy_version 50 (0.0013)
-[2025-03-15 15:53:56,863][06739] Updated weights for policy 0, policy_version 60 (0.0013)
-[2025-03-15 15:53:58,229][06641] Fps is (10 sec: 16793.8, 60 sec: 13312.1, 300 sec: 13312.1). Total num frames: 266240. Throughput: 0: 2804.1. Samples: 56082. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2025-03-15 15:53:58,231][06641] Avg episode reward: [(0, '4.447')]
-[2025-03-15 15:53:58,237][06726] Saving new best policy, reward=4.447!
-[2025-03-15 15:53:59,254][06739] Updated weights for policy 0, policy_version 70 (0.0013)
-[2025-03-15 15:54:01,677][06739] Updated weights for policy 0, policy_version 80 (0.0012)
-[2025-03-15 15:54:03,229][06641] Fps is (10 sec: 17203.2, 60 sec: 14090.3, 300 sec: 14090.3). Total num frames: 352256. Throughput: 0: 3262.5. Samples: 81562. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2025-03-15 15:54:03,230][06641] Avg episode reward: [(0, '4.272')]
-[2025-03-15 15:54:04,080][06739] Updated weights for policy 0, policy_version 90 (0.0013)
-[2025-03-15 15:54:06,475][06739] Updated weights for policy 0, policy_version 100 (0.0013)
-[2025-03-15 15:54:08,229][06641] Fps is (10 sec: 17203.1, 60 sec: 14609.1, 300 sec: 14609.1). Total num frames: 438272. Throughput: 0: 3576.7. Samples: 107300. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
-[2025-03-15 15:54:08,230][06641] Avg episode reward: [(0, '4.562')]
-[2025-03-15 15:54:08,235][06726] Saving new best policy, reward=4.562!
-[2025-03-15 15:54:08,870][06739] Updated weights for policy 0, policy_version 110 (0.0012)
-[2025-03-15 15:54:11,265][06739] Updated weights for policy 0, policy_version 120 (0.0013)
-[2025-03-15 15:54:13,230][06641] Fps is (10 sec: 16793.1, 60 sec: 14862.6, 300 sec: 14862.6). Total num frames: 520192. Throughput: 0: 3428.3. Samples: 119990. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2025-03-15 15:54:13,231][06641] Avg episode reward: [(0, '4.600')]
-[2025-03-15 15:54:13,247][06726] Saving new best policy, reward=4.600!
-[2025-03-15 15:54:13,740][06739] Updated weights for policy 0, policy_version 130 (0.0013)
-[2025-03-15 15:54:16,156][06739] Updated weights for policy 0, policy_version 140 (0.0014)
-[2025-03-15 15:54:18,229][06641] Fps is (10 sec: 16793.6, 60 sec: 15155.2, 300 sec: 15155.2). Total num frames: 606208. Throughput: 0: 3631.6. Samples: 145264. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2025-03-15 15:54:18,230][06641] Avg episode reward: [(0, '4.290')]
-[2025-03-15 15:54:18,572][06739] Updated weights for policy 0, policy_version 150 (0.0013)
-[2025-03-15 15:54:20,990][06739] Updated weights for policy 0, policy_version 160 (0.0013)
-[2025-03-15 15:54:23,229][06641] Fps is (10 sec: 17203.6, 60 sec: 15382.8, 300 sec: 15382.8). Total num frames: 692224. Throughput: 0: 3794.0. Samples: 170730. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
-[2025-03-15 15:54:23,230][06641] Avg episode reward: [(0, '4.476')]
-[2025-03-15 15:54:23,402][06739] Updated weights for policy 0, policy_version 170 (0.0013)
-[2025-03-15 15:54:25,857][06739] Updated weights for policy 0, policy_version 180 (0.0013)
-[2025-03-15 15:54:28,229][06641] Fps is (10 sec: 16793.6, 60 sec: 15482.9, 300 sec: 15482.9). Total num frames: 774144. Throughput: 0: 4057.6. Samples: 183326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
-[2025-03-15 15:54:28,230][06641] Avg episode reward: [(0, '4.899')]
-[2025-03-15 15:54:28,236][06726] Saving new best policy, reward=4.899!
-[2025-03-15 15:54:28,367][06739] Updated weights for policy 0, policy_version 190 (0.0014)
-[2025-03-15 15:54:30,747][06739] Updated weights for policy 0, policy_version 200 (0.0014)
-[2025-03-15 15:54:33,229][06641] Fps is (10 sec: 16383.9, 60 sec: 15564.8, 300 sec: 15564.8). Total num frames: 856064. Throughput: 0: 4233.0. Samples: 208528. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
-[2025-03-15 15:54:33,230][06641] Avg episode reward: [(0, '4.671')]
-[2025-03-15 15:54:33,344][06739] Updated weights for policy 0, policy_version 210 (0.0013)
-[2025-03-15 15:54:36,049][06739] Updated weights for policy 0, policy_version 220 (0.0014)
-[2025-03-15 15:54:38,229][06641] Fps is (10 sec: 15974.4, 60 sec: 15564.8, 300 sec: 15564.8). Total num frames: 933888. Throughput: 0: 4190.3. Samples: 231698. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2025-03-15 15:54:38,233][06641] Avg episode reward: [(0, '4.477')]
-[2025-03-15 15:54:38,575][06739] Updated weights for policy 0, policy_version 230 (0.0014)
-[2025-03-15 15:54:41,215][06739] Updated weights for policy 0, policy_version 240 (0.0014)
-[2025-03-15 15:54:43,229][06641] Fps is (10 sec: 15974.5, 60 sec: 16725.4, 300 sec: 15627.8). Total num frames: 1015808. Throughput: 0: 4161.2. Samples: 243336. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2025-03-15 15:54:43,230][06641] Avg episode reward: [(0, '4.579')]
-[2025-03-15 15:54:43,726][06739] Updated weights for policy 0, policy_version 250 (0.0014)
-[2025-03-15 15:54:46,233][06739] Updated weights for policy 0, policy_version 260 (0.0014)
-[2025-03-15 15:54:48,229][06641] Fps is (10 sec: 15974.6, 60 sec: 16588.8, 300 sec: 15623.4). Total num frames: 1093632. Throughput: 0: 4136.4. Samples: 267698. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2025-03-15 15:54:48,230][06641] Avg episode reward: [(0, '4.644')]
-[2025-03-15 15:54:48,763][06739] Updated weights for policy 0, policy_version 270 (0.0014)
-[2025-03-15 15:54:51,265][06739] Updated weights for policy 0, policy_version 280 (0.0014)
-[2025-03-15 15:54:53,229][06641] Fps is (10 sec: 15974.4, 60 sec: 16588.8, 300 sec: 15674.1). Total num frames: 1175552. Throughput: 0: 4102.7. Samples: 291922. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2025-03-15 15:54:53,230][06641] Avg episode reward: [(0, '5.228')]
-[2025-03-15 15:54:53,232][06726] Saving new best policy, reward=5.228!
-[2025-03-15 15:54:53,834][06739] Updated weights for policy 0, policy_version 290 (0.0014)
-[2025-03-15 15:54:56,417][06739] Updated weights for policy 0, policy_version 300 (0.0013)
-[2025-03-15 15:54:58,229][06641] Fps is (10 sec: 15974.4, 60 sec: 16452.3, 300 sec: 15667.2). Total num frames: 1253376. Throughput: 0: 4087.0. Samples: 303904. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2025-03-15 15:54:58,230][06641] Avg episode reward: [(0, '5.186')]
-[2025-03-15 15:54:59,016][06739] Updated weights for policy 0, policy_version 310 (0.0015)
-[2025-03-15 15:55:01,588][06739] Updated weights for policy 0, policy_version 320 (0.0014)
-[2025-03-15 15:55:03,229][06641] Fps is (10 sec: 15974.3, 60 sec: 16384.0, 300 sec: 15709.4). Total num frames: 1335296. Throughput: 0: 4054.3. Samples: 327708. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2025-03-15 15:55:03,231][06641] Avg episode reward: [(0, '5.705')]
-[2025-03-15 15:55:03,232][06726] Saving new best policy, reward=5.705!
-[2025-03-15 15:55:04,126][06739] Updated weights for policy 0, policy_version 330 (0.0015)
-[2025-03-15 15:55:06,766][06739] Updated weights for policy 0, policy_version 340 (0.0013)
-[2025-03-15 15:55:08,229][06641] Fps is (10 sec: 15974.4, 60 sec: 16247.5, 300 sec: 15701.4). Total num frames: 1413120. Throughput: 0: 4011.1. Samples: 351228. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2025-03-15 15:55:08,231][06641] Avg episode reward: [(0, '6.511')]
-[2025-03-15 15:55:08,238][06726] Saving new best policy, reward=6.511!
-[2025-03-15 15:55:09,677][06739] Updated weights for policy 0, policy_version 350 (0.0015)
-[2025-03-15 15:55:12,329][06739] Updated weights for policy 0, policy_version 360 (0.0014)
-[2025-03-15 15:55:13,230][06641] Fps is (10 sec: 15153.9, 60 sec: 16110.8, 300 sec: 15650.9). Total num frames: 1486848. Throughput: 0: 3973.2. Samples: 362124. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2025-03-15 15:55:13,232][06641] Avg episode reward: [(0, '6.156')]
-[2025-03-15 15:55:15,199][06739] Updated weights for policy 0, policy_version 370 (0.0015)
-[2025-03-15 15:55:17,863][06739] Updated weights for policy 0, policy_version 380 (0.0014)
-[2025-03-15 15:55:18,230][06641] Fps is (10 sec: 14745.5, 60 sec: 15906.1, 300 sec: 15605.8). Total num frames: 1560576. Throughput: 0: 3906.4. Samples: 384318. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2025-03-15 15:55:18,231][06641] Avg episode reward: [(0, '6.009')]
-[2025-03-15 15:55:20,366][06739] Updated weights for policy 0, policy_version 390 (0.0014)
-[2025-03-15 15:55:23,076][06739] Updated weights for policy 0, policy_version 400 (0.0014)
-[2025-03-15 15:55:23,229][06641] Fps is (10 sec: 15156.6, 60 sec: 15769.6, 300 sec: 15603.8). Total num frames: 1638400. Throughput: 0: 3920.9. Samples: 408140. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2025-03-15 15:55:23,231][06641] Avg episode reward: [(0, '6.438')]
-[2025-03-15 15:55:25,534][06739] Updated weights for policy 0, policy_version 410 (0.0013)
-[2025-03-15 15:55:28,001][06739] Updated weights for policy 0, policy_version 420 (0.0013)
-[2025-03-15 15:55:28,229][06641] Fps is (10 sec: 15974.4, 60 sec: 15769.6, 300 sec: 15639.3). Total num frames: 1720320. Throughput: 0: 3927.7. Samples: 420084. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2025-03-15 15:55:28,231][06641] Avg episode reward: [(0, '6.799')]
-[2025-03-15 15:55:28,246][06726] Saving /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000000421_1724416.pth...
-[2025-03-15 15:55:28,307][06726] Saving new best policy, reward=6.799!
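
Note: the checkpoint file name encodes the policy version and total env frames (version 421 at 421 * 4096 = 1724416 frames here), and the "Removing ..." line near shutdown prunes the older file. A hedged sketch of this save-and-prune pattern; the dict keys and keep-count are assumptions, not Sample Factory's exact code:

import glob, os, torch

def save_checkpoint(model, optimizer, policy_version, env_steps, ckpt_dir, keep=2):
    # name scheme copied from the log: checkpoint_{version:09d}_{env_steps}.pth
    path = os.path.join(ckpt_dir, f"checkpoint_{policy_version:09d}_{env_steps}.pth")
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "policy_version": policy_version,
                "env_steps": env_steps}, path)
    # keep only the newest files, mirroring the later "Removing ..." log line
    for old in sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))[:-keep]:
        os.remove(old)
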
-[2025-03-15 15:55:30,444][06739] Updated weights for policy 0, policy_version 430 (0.0014)
-[2025-03-15 15:55:32,879][06739] Updated weights for policy 0, policy_version 440 (0.0014)
-[2025-03-15 15:55:33,229][06641] Fps is (10 sec: 16793.6, 60 sec: 15837.9, 300 sec: 15707.3). Total num frames: 1806336. Throughput: 0: 3942.5. Samples: 445112. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2025-03-15 15:55:33,231][06641] Avg episode reward: [(0, '8.663')]
-[2025-03-15 15:55:33,232][06726] Saving new best policy, reward=8.663!
-[2025-03-15 15:55:35,319][06739] Updated weights for policy 0, policy_version 450 (0.0013)
-[2025-03-15 15:55:37,850][06739] Updated weights for policy 0, policy_version 460 (0.0014)
-[2025-03-15 15:55:38,230][06641] Fps is (10 sec: 16793.4, 60 sec: 15906.1, 300 sec: 15735.5). Total num frames: 1888256. Throughput: 0: 3960.1. Samples: 470128. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2025-03-15 15:55:38,231][06641] Avg episode reward: [(0, '8.344')]
-[2025-03-15 15:55:40,440][06739] Updated weights for policy 0, policy_version 470 (0.0014)
-[2025-03-15 15:55:42,947][06739] Updated weights for policy 0, policy_version 480 (0.0013)
-[2025-03-15 15:55:43,229][06641] Fps is (10 sec: 16383.9, 60 sec: 15906.1, 300 sec: 15761.4). Total num frames: 1970176. Throughput: 0: 3955.3. Samples: 481894. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2025-03-15 15:55:43,230][06641] Avg episode reward: [(0, '9.046')]
-[2025-03-15 15:55:43,232][06726] Saving new best policy, reward=9.046!
-[2025-03-15 15:55:45,392][06739] Updated weights for policy 0, policy_version 490 (0.0013)
-[2025-03-15 15:55:47,963][06739] Updated weights for policy 0, policy_version 500 (0.0014)
-[2025-03-15 15:55:48,229][06641] Fps is (10 sec: 16384.2, 60 sec: 15974.4, 300 sec: 15785.4). Total num frames: 2052096. Throughput: 0: 3977.2. Samples: 506684. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2025-03-15 15:55:48,230][06641] Avg episode reward: [(0, '10.340')]
-[2025-03-15 15:55:48,236][06726] Saving new best policy, reward=10.340!
-[2025-03-15 15:55:50,441][06739] Updated weights for policy 0, policy_version 510 (0.0013)
-[2025-03-15 15:55:52,922][06739] Updated weights for policy 0, policy_version 520 (0.0013)
-[2025-03-15 15:55:53,229][06641] Fps is (10 sec: 16384.0, 60 sec: 15974.4, 300 sec: 15807.5). Total num frames: 2134016. Throughput: 0: 3998.4. Samples: 531154. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2025-03-15 15:55:53,231][06641] Avg episode reward: [(0, '10.348')]
-[2025-03-15 15:55:53,233][06726] Saving new best policy, reward=10.348!
-[2025-03-15 15:55:55,398][06739] Updated weights for policy 0, policy_version 530 (0.0014)
-[2025-03-15 15:55:57,893][06739] Updated weights for policy 0, policy_version 540 (0.0014)
-[2025-03-15 15:55:58,229][06641] Fps is (10 sec: 16383.9, 60 sec: 16042.6, 300 sec: 15828.1). Total num frames: 2215936. Throughput: 0: 4029.4. Samples: 543442. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2025-03-15 15:55:58,231][06641] Avg episode reward: [(0, '9.755')]
-[2025-03-15 15:56:00,381][06739] Updated weights for policy 0, policy_version 550 (0.0014)
-[2025-03-15 15:56:03,004][06739] Updated weights for policy 0, policy_version 560 (0.0015)
-[2025-03-15 15:56:03,229][06641] Fps is (10 sec: 15974.4, 60 sec: 15974.4, 300 sec: 15819.0). Total num frames: 2293760. Throughput: 0: 4082.3. Samples: 568022. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2025-03-15 15:56:03,230][06641] Avg episode reward: [(0, '13.241')]
-[2025-03-15 15:56:03,284][06726] Saving new best policy, reward=13.241!
-[2025-03-15 15:56:05,555][06739] Updated weights for policy 0, policy_version 570 (0.0015)
-[2025-03-15 15:56:08,092][06739] Updated weights for policy 0, policy_version 580 (0.0014)
-[2025-03-15 15:56:08,229][06641] Fps is (10 sec: 15974.4, 60 sec: 16042.6, 300 sec: 15837.9). Total num frames: 2375680. Throughput: 0: 4080.9. Samples: 591782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2025-03-15 15:56:08,231][06641] Avg episode reward: [(0, '12.836')]
-[2025-03-15 15:56:10,632][06739] Updated weights for policy 0, policy_version 590 (0.0014)
-[2025-03-15 15:56:13,229][06641] Fps is (10 sec: 15974.4, 60 sec: 16111.2, 300 sec: 15829.1). Total num frames: 2453504. Throughput: 0: 4086.0. Samples: 603954. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2025-03-15 15:56:13,231][06641] Avg episode reward: [(0, '14.640')]
-[2025-03-15 15:56:13,232][06726] Saving new best policy, reward=14.640!
-[2025-03-15 15:56:13,398][06739] Updated weights for policy 0, policy_version 600 (0.0014)
-[2025-03-15 15:56:15,937][06739] Updated weights for policy 0, policy_version 610 (0.0014)
-[2025-03-15 15:56:18,229][06641] Fps is (10 sec: 15565.0, 60 sec: 16179.2, 300 sec: 15820.8). Total num frames: 2531328. Throughput: 0: 4045.3. Samples: 627152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2025-03-15 15:56:18,231][06641] Avg episode reward: [(0, '15.983')]
-[2025-03-15 15:56:18,238][06726] Saving new best policy, reward=15.983!
-[2025-03-15 15:56:18,597][06739] Updated weights for policy 0, policy_version 620 (0.0013)
-[2025-03-15 15:56:21,516][06739] Updated weights for policy 0, policy_version 630 (0.0015)
-[2025-03-15 15:56:23,229][06641] Fps is (10 sec: 14745.5, 60 sec: 16042.6, 300 sec: 15763.4). Total num frames: 2600960. Throughput: 0: 3968.9. Samples: 648730. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2025-03-15 15:56:23,231][06641] Avg episode reward: [(0, '17.863')]
-[2025-03-15 15:56:23,233][06726] Saving new best policy, reward=17.863!
-[2025-03-15 15:56:24,325][06739] Updated weights for policy 0, policy_version 640 (0.0014)
-[2025-03-15 15:56:26,928][06739] Updated weights for policy 0, policy_version 650 (0.0014)
-[2025-03-15 15:56:28,230][06641] Fps is (10 sec: 14745.3, 60 sec: 15974.4, 300 sec: 15757.5). Total num frames: 2678784. Throughput: 0: 3963.9. Samples: 660272. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2025-03-15 15:56:28,231][06641] Avg episode reward: [(0, '18.605')]
-[2025-03-15 15:56:28,239][06726] Saving new best policy, reward=18.605!
-[2025-03-15 15:56:29,761][06739] Updated weights for policy 0, policy_version 660 (0.0015)
-[2025-03-15 15:56:32,562][06739] Updated weights for policy 0, policy_version 670 (0.0015)
-[2025-03-15 15:56:33,229][06641] Fps is (10 sec: 15155.4, 60 sec: 15769.6, 300 sec: 15728.7). Total num frames: 2752512. Throughput: 0: 3900.8. Samples: 682218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2025-03-15 15:56:33,231][06641] Avg episode reward: [(0, '17.530')]
-[2025-03-15 15:56:35,364][06739] Updated weights for policy 0, policy_version 680 (0.0015)
-[2025-03-15 15:56:37,945][06739] Updated weights for policy 0, policy_version 690 (0.0014)
-[2025-03-15 15:56:38,229][06641] Fps is (10 sec: 15155.4, 60 sec: 15701.4, 300 sec: 15724.1). Total num frames: 2830336. Throughput: 0: 3862.9. Samples: 704986. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2025-03-15 15:56:38,231][06641] Avg episode reward: [(0, '19.431')]
-[2025-03-15 15:56:38,239][06726] Saving new best policy, reward=19.431!
-[2025-03-15 15:56:41,194][06739] Updated weights for policy 0, policy_version 700 (0.0016)
-[2025-03-15 15:56:43,229][06641] Fps is (10 sec: 14335.9, 60 sec: 15428.3, 300 sec: 15653.4). Total num frames: 2895872. Throughput: 0: 3807.4. Samples: 714774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2025-03-15 15:56:43,231][06641] Avg episode reward: [(0, '18.313')]
-[2025-03-15 15:56:43,925][06739] Updated weights for policy 0, policy_version 710 (0.0015)
-[2025-03-15 15:56:46,617][06739] Updated weights for policy 0, policy_version 720 (0.0015)
-[2025-03-15 15:56:48,229][06641] Fps is (10 sec: 14336.0, 60 sec: 15360.0, 300 sec: 15651.0). Total num frames: 2973696. Throughput: 0: 3753.6. Samples: 736936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2025-03-15 15:56:48,231][06641] Avg episode reward: [(0, '20.308')]
-[2025-03-15 15:56:48,237][06726] Saving new best policy, reward=20.308!
-[2025-03-15 15:56:49,261][06739] Updated weights for policy 0, policy_version 730 (0.0014)
-[2025-03-15 15:56:51,940][06739] Updated weights for policy 0, policy_version 740 (0.0014)
-[2025-03-15 15:56:53,229][06641] Fps is (10 sec: 15155.3, 60 sec: 15223.5, 300 sec: 15627.8). Total num frames: 3047424. Throughput: 0: 3741.6. Samples: 760154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2025-03-15 15:56:53,231][06641] Avg episode reward: [(0, '18.605')]
-[2025-03-15 15:56:54,564][06739] Updated weights for policy 0, policy_version 750 (0.0015)
-[2025-03-15 15:56:57,219][06739] Updated weights for policy 0, policy_version 760 (0.0015)
-[2025-03-15 15:56:58,229][06641] Fps is (10 sec: 15155.3, 60 sec: 15155.2, 300 sec: 15626.2). Total num frames: 3125248. Throughput: 0: 3729.3. Samples: 771772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2025-03-15 15:56:58,231][06641] Avg episode reward: [(0, '22.034')]
-[2025-03-15 15:56:58,262][06726] Saving new best policy, reward=22.034!
-[2025-03-15 15:56:59,858][06739] Updated weights for policy 0, policy_version 770 (0.0014)
-[2025-03-15 15:57:02,509][06739] Updated weights for policy 0, policy_version 780 (0.0014)
-[2025-03-15 15:57:03,230][06641] Fps is (10 sec: 15564.5, 60 sec: 15155.2, 300 sec: 15624.7). Total num frames: 3203072. Throughput: 0: 3729.9. Samples: 795000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2025-03-15 15:57:03,231][06641] Avg episode reward: [(0, '18.466')]
-[2025-03-15 15:57:05,389][06739] Updated weights for policy 0, policy_version 790 (0.0015)
-[2025-03-15 15:57:08,230][06641] Fps is (10 sec: 14745.2, 60 sec: 14950.3, 300 sec: 15584.3). Total num frames: 3272704. Throughput: 0: 3721.3. Samples: 816190. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2025-03-15 15:57:08,231][06641] Avg episode reward: [(0, '20.328')]
-[2025-03-15 15:57:08,442][06739] Updated weights for policy 0, policy_version 800 (0.0015)
-[2025-03-15 15:57:11,291][06739] Updated weights for policy 0, policy_version 810 (0.0015)
-[2025-03-15 15:57:13,229][06641] Fps is (10 sec: 14336.2, 60 sec: 14882.1, 300 sec: 15564.8). Total num frames: 3346432. Throughput: 0: 3696.0. Samples: 826590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2025-03-15 15:57:13,230][06641] Avg episode reward: [(0, '20.912')]
-[2025-03-15 15:57:13,817][06739] Updated weights for policy 0, policy_version 820 (0.0014)
-[2025-03-15 15:57:16,283][06739] Updated weights for policy 0, policy_version 830 (0.0013)
-[2025-03-15 15:57:18,229][06641] Fps is (10 sec: 15565.1, 60 sec: 14950.4, 300 sec: 15583.4). Total num frames: 3428352. Throughput: 0: 3748.2. Samples: 850888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2025-03-15 15:57:18,231][06641] Avg episode reward: [(0, '17.853')]
-[2025-03-15 15:57:18,802][06739] Updated weights for policy 0, policy_version 840 (0.0014)
-[2025-03-15 15:57:21,316][06739] Updated weights for policy 0, policy_version 850 (0.0015)
-[2025-03-15 15:57:23,229][06641] Fps is (10 sec: 16384.0, 60 sec: 15155.2, 300 sec: 15601.2). Total num frames: 3510272. Throughput: 0: 3786.4. Samples: 875372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2025-03-15 15:57:23,230][06641] Avg episode reward: [(0, '20.200')]
-[2025-03-15 15:57:23,819][06739] Updated weights for policy 0, policy_version 860 (0.0015)
-[2025-03-15 15:57:26,318][06739] Updated weights for policy 0, policy_version 870 (0.0015)
-[2025-03-15 15:57:28,230][06641] Fps is (10 sec: 16384.0, 60 sec: 15223.5, 300 sec: 15618.2). Total num frames: 3592192. Throughput: 0: 3841.2. Samples: 887628. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2025-03-15 15:57:28,231][06641] Avg episode reward: [(0, '20.353')]
-[2025-03-15 15:57:28,238][06726] Saving /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000000877_3592192.pth...
-[2025-03-15 15:57:28,854][06739] Updated weights for policy 0, policy_version 880 (0.0014)
-[2025-03-15 15:57:31,382][06739] Updated weights for policy 0, policy_version 890 (0.0014)
-[2025-03-15 15:57:33,229][06641] Fps is (10 sec: 16384.0, 60 sec: 15360.0, 300 sec: 15634.5). Total num frames: 3674112. Throughput: 0: 3891.5. Samples: 912052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2025-03-15 15:57:33,230][06641] Avg episode reward: [(0, '20.766')]
-[2025-03-15 15:57:33,894][06739] Updated weights for policy 0, policy_version 900 (0.0013)
-[2025-03-15 15:57:36,426][06739] Updated weights for policy 0, policy_version 910 (0.0014)
-[2025-03-15 15:57:38,229][06641] Fps is (10 sec: 16384.1, 60 sec: 15428.3, 300 sec: 15650.1). Total num frames: 3756032. Throughput: 0: 3917.1. Samples: 936426. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2025-03-15 15:57:38,230][06641] Avg episode reward: [(0, '20.329')]
-[2025-03-15 15:57:38,976][06739] Updated weights for policy 0, policy_version 920 (0.0014)
-[2025-03-15 15:57:41,466][06739] Updated weights for policy 0, policy_version 930 (0.0014)
-[2025-03-15 15:57:43,229][06641] Fps is (10 sec: 16383.9, 60 sec: 15701.3, 300 sec: 15665.1). Total num frames: 3837952. Throughput: 0: 3928.0. Samples: 948530. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2025-03-15 15:57:43,231][06641] Avg episode reward: [(0, '22.139')]
-[2025-03-15 15:57:43,232][06726] Saving new best policy, reward=22.139!
-[2025-03-15 15:57:43,983][06739] Updated weights for policy 0, policy_version 940 (0.0014)
-[2025-03-15 15:57:46,495][06739] Updated weights for policy 0, policy_version 950 (0.0014)
-[2025-03-15 15:57:48,229][06641] Fps is (10 sec: 15974.5, 60 sec: 15701.4, 300 sec: 15663.1). Total num frames: 3915776. Throughput: 0: 3955.3. Samples: 972988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2025-03-15 15:57:48,230][06641] Avg episode reward: [(0, '21.062')]
-[2025-03-15 15:57:48,983][06739] Updated weights for policy 0, policy_version 960 (0.0013)
-[2025-03-15 15:57:51,461][06739] Updated weights for policy 0, policy_version 970 (0.0013)
-[2025-03-15 15:57:53,229][06641] Fps is (10 sec: 16384.1, 60 sec: 15906.1, 300 sec: 15693.3). Total num frames: 4001792. Throughput: 0: 4032.9. Samples: 997668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2025-03-15 15:57:53,230][06641] Avg episode reward: [(0, '22.185')]
-[2025-03-15 15:57:53,232][06726] Saving new best policy, reward=22.185!
-[2025-03-15 15:57:53,452][06726] Saving /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-03-15 15:57:53,452][06641] Component Batcher_0 stopped!
-[2025-03-15 15:57:53,452][06726] Stopping Batcher_0...
-[2025-03-15 15:57:53,458][06726] Loop batcher_evt_loop terminating...
-[2025-03-15 15:57:53,476][06739] Weights refcount: 2 0
-[2025-03-15 15:57:53,478][06739] Stopping InferenceWorker_p0-w0...
-[2025-03-15 15:57:53,478][06739] Loop inference_proc0-0_evt_loop terminating...
-[2025-03-15 15:57:53,478][06641] Component InferenceWorker_p0-w0 stopped!
-[2025-03-15 15:57:53,525][06743] Stopping RolloutWorker_w2...
-[2025-03-15 15:57:53,525][06740] Stopping RolloutWorker_w0...
-[2025-03-15 15:57:53,525][06743] Loop rollout_proc2_evt_loop terminating...
-[2025-03-15 15:57:53,525][06740] Loop rollout_proc0_evt_loop terminating...
-[2025-03-15 15:57:53,525][06744] Stopping RolloutWorker_w3...
-[2025-03-15 15:57:53,526][06744] Loop rollout_proc3_evt_loop terminating...
-[2025-03-15 15:57:53,526][06742] Stopping RolloutWorker_w4...
-[2025-03-15 15:57:53,526][06747] Stopping RolloutWorker_w6...
-[2025-03-15 15:57:53,526][06742] Loop rollout_proc4_evt_loop terminating...
-[2025-03-15 15:57:53,527][06747] Loop rollout_proc6_evt_loop terminating...
-[2025-03-15 15:57:53,527][06726] Removing /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000000421_1724416.pth
-[2025-03-15 15:57:53,528][06745] Stopping RolloutWorker_w5...
-[2025-03-15 15:57:53,529][06745] Loop rollout_proc5_evt_loop terminating...
-[2025-03-15 15:57:53,525][06641] Component RolloutWorker_w2 stopped!
-[2025-03-15 15:57:53,530][06641] Component RolloutWorker_w0 stopped!
-[2025-03-15 15:57:53,531][06641] Component RolloutWorker_w3 stopped!
-[2025-03-15 15:57:53,532][06641] Component RolloutWorker_w4 stopped!
-[2025-03-15 15:57:53,534][06746] Stopping RolloutWorker_w7...
-[2025-03-15 15:57:53,534][06746] Loop rollout_proc7_evt_loop terminating...
-[2025-03-15 15:57:53,534][06641] Component RolloutWorker_w6 stopped!
-[2025-03-15 15:57:53,536][06741] Stopping RolloutWorker_w1...
-[2025-03-15 15:57:53,536][06741] Loop rollout_proc1_evt_loop terminating...
-[2025-03-15 15:57:53,535][06641] Component RolloutWorker_w5 stopped!
-[2025-03-15 15:57:53,537][06726] Saving /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-03-15 15:57:53,537][06641] Component RolloutWorker_w7 stopped!
-[2025-03-15 15:57:53,539][06641] Component RolloutWorker_w1 stopped!
-[2025-03-15 15:57:53,644][06726] Stopping LearnerWorker_p0...
-[2025-03-15 15:57:53,644][06726] Loop learner_proc0_evt_loop terminating...
-[2025-03-15 15:57:53,644][06641] Component LearnerWorker_p0 stopped!
-[2025-03-15 15:57:53,646][06641] Waiting for process learner_proc0 to stop...
-[2025-03-15 15:57:54,650][06641] Waiting for process inference_proc0-0 to join...
-[2025-03-15 15:57:54,651][06641] Waiting for process rollout_proc0 to join...
-[2025-03-15 15:57:54,652][06641] Waiting for process rollout_proc1 to join...
-[2025-03-15 15:57:54,653][06641] Waiting for process rollout_proc2 to join...
-[2025-03-15 15:57:54,653][06641] Waiting for process rollout_proc3 to join...
-[2025-03-15 15:57:54,654][06641] Waiting for process rollout_proc4 to join...
-[2025-03-15 15:57:54,655][06641] Waiting for process rollout_proc5 to join...
-[2025-03-15 15:57:54,656][06641] Waiting for process rollout_proc6 to join...
-[2025-03-15 15:57:54,657][06641] Waiting for process rollout_proc7 to join...
-[2025-03-15 15:57:54,657][06641] Batcher 0 profile tree view:
-batching: 12.8715, releasing_batches: 0.0268
-[2025-03-15 15:57:54,658][06641] InferenceWorker_p0-w0 profile tree view:
+[2025-03-15 16:00:46,086][09445] Using optimizer <class 'torch.optim.adam.Adam'>
+[2025-03-15 16:00:47,077][09445] No checkpoints found
+[2025-03-15 16:00:47,077][09445] Did not load from checkpoint, starting from scratch!
+[2025-03-15 16:00:47,077][09445] Initialized policy 0 weights for model version 0
+[2025-03-15 16:00:47,081][09445] LearnerWorker_p0 finished initialization!
+[2025-03-15 16:00:47,081][09445] Using GPUs [0] for process 0 (actually maps to GPUs [0])
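
Note: the learner startup above reduces to a resume-or-scratch step: build the model on cuda:0, construct Adam, scan for checkpoints, and fall back to fresh weights at model version 0 when none exist. A minimal sketch under assumed checkpoint keys; the placeholder model stands in for the real actor-critic:

import glob, torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(512, 5).to(device)        # placeholder for the actor-critic
optimizer = torch.optim.Adam(model.parameters())  # "Using optimizer <class 'torch.optim.adam.Adam'>"

ckpts = sorted(glob.glob("train_dir/default_experiment/checkpoint_p0/checkpoint_*.pth"))
if not ckpts:
    policy_version = 0  # "No checkpoints found", "starting from scratch!"
else:
    state = torch.load(ckpts[-1], map_location=device)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    policy_version = state["policy_version"]
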
+[2025-03-15 16:00:47,186][09459] RunningMeanStd input shape: (3, 72, 128)
+[2025-03-15 16:00:47,187][09459] RunningMeanStd input shape: (1,)
+[2025-03-15 16:00:47,200][09459] ConvEncoder: input_channels=3
+[2025-03-15 16:00:47,319][09459] Conv encoder output size: 512
+[2025-03-15 16:00:47,319][09459] Policy head output size: 512
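
Note: the two "RunningMeanStd input shape" lines are streaming normalizers, one for observations (3, 72, 128) and one for scalar returns (1,). A sketch of the standard parallel-variance update such a normalizer typically uses (illustration, not the library's code):

import torch

class RunningMeanStd:
    def __init__(self, shape, eps: float = 1e-4):
        self.mean = torch.zeros(shape)
        self.var = torch.ones(shape)
        self.count = eps  # avoids division by zero before the first update

    def update(self, batch: torch.Tensor):
        # Chan et al. parallel merge of (mean, var, count) with the new batch
        b_mean = batch.mean(dim=0)
        b_var = batch.var(dim=0, unbiased=False)
        b_count = batch.shape[0]
        delta, tot = b_mean - self.mean, self.count + b_count
        self.var = (self.var * self.count + b_var * b_count
                    + delta.pow(2) * self.count * b_count / tot) / tot
        self.mean = self.mean + delta * b_count / tot
        self.count = tot

    def normalize(self, x: torch.Tensor) -> torch.Tensor:
        return (x - self.mean) / torch.sqrt(self.var + 1e-8)
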
+[2025-03-15 16:00:47,353][09281] Inference worker 0-0 is ready!
+[2025-03-15 16:00:47,353][09281] All inference workers are ready! Signal rollout workers to start!
+[2025-03-15 16:00:47,405][09462] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-03-15 16:00:47,411][09466] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-03-15 16:00:47,412][09460] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-03-15 16:00:47,428][09465] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-03-15 16:00:47,440][09464] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-03-15 16:00:47,456][09463] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-03-15 16:00:47,470][09461] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-03-15 16:00:47,485][09458] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-03-15 16:00:47,864][09460] Decorrelating experience for 0 frames...
+[2025-03-15 16:00:47,891][09461] Decorrelating experience for 0 frames...
+[2025-03-15 16:00:47,928][09466] Decorrelating experience for 0 frames...
+[2025-03-15 16:00:47,953][09463] Decorrelating experience for 0 frames...
+[2025-03-15 16:00:47,970][09462] Decorrelating experience for 0 frames...
+[2025-03-15 16:00:47,972][09464] Decorrelating experience for 0 frames...
+[2025-03-15 16:00:47,977][09281] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-03-15 16:00:48,268][09458] Decorrelating experience for 0 frames...
+[2025-03-15 16:00:48,276][09461] Decorrelating experience for 32 frames...
+[2025-03-15 16:00:48,356][09462] Decorrelating experience for 32 frames...
+[2025-03-15 16:00:48,374][09464] Decorrelating experience for 32 frames...
+[2025-03-15 16:00:48,434][09466] Decorrelating experience for 32 frames...
+[2025-03-15 16:00:48,454][09463] Decorrelating experience for 32 frames...
+[2025-03-15 16:00:48,705][09458] Decorrelating experience for 32 frames...
+[2025-03-15 16:00:48,819][09461] Decorrelating experience for 64 frames...
+[2025-03-15 16:00:48,821][09465] Decorrelating experience for 0 frames...
+[2025-03-15 16:00:48,856][09460] Decorrelating experience for 32 frames...
+[2025-03-15 16:00:48,922][09464] Decorrelating experience for 64 frames...
+[2025-03-15 16:00:49,125][09466] Decorrelating experience for 64 frames...
+[2025-03-15 16:00:49,197][09465] Decorrelating experience for 32 frames...
+[2025-03-15 16:00:49,214][09463] Decorrelating experience for 64 frames...
+[2025-03-15 16:00:49,362][09464] Decorrelating experience for 96 frames...
+[2025-03-15 16:00:49,395][09460] Decorrelating experience for 64 frames...
+[2025-03-15 16:00:49,408][09458] Decorrelating experience for 64 frames...
+[2025-03-15 16:00:49,563][09461] Decorrelating experience for 96 frames...
+[2025-03-15 16:00:49,679][09465] Decorrelating experience for 64 frames...
+[2025-03-15 16:00:49,786][09463] Decorrelating experience for 96 frames...
+[2025-03-15 16:00:49,819][09458] Decorrelating experience for 96 frames...
+[2025-03-15 16:00:49,939][09462] Decorrelating experience for 64 frames...
+[2025-03-15 16:00:50,100][09465] Decorrelating experience for 96 frames...
+[2025-03-15 16:00:50,196][09460] Decorrelating experience for 96 frames...
+[2025-03-15 16:00:50,342][09466] Decorrelating experience for 96 frames...
+[2025-03-15 16:00:50,418][09462] Decorrelating experience for 96 frames...
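
Note: the "Decorrelating experience for N frames" lines are a staggered warm-up: each env is stepped through a different number of random-action frames (0, 32, 64 or 96) before collection starts, so episodes run out of phase across workers and early batches are less correlated. A sketch of the idea against a Gym-style env API (assumed; not Sample Factory's implementation):

def decorrelate(envs, frames_per_stage: int = 32, num_stages: int = 4):
    for i, env in enumerate(envs):
        env.reset()
        warmup = (i % num_stages) * frames_per_stage  # 0, 32, 64 or 96, as in the log
        print(f"Decorrelating experience for {warmup} frames...")
        for _ in range(warmup):
            obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
            if terminated or truncated:
                env.reset()
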
+[2025-03-15 16:00:50,760][09445] Signal inference workers to stop experience collection...
+[2025-03-15 16:00:50,765][09459] InferenceWorker_p0-w0: stopping experience collection
+[2025-03-15 16:00:52,446][09445] Signal inference workers to resume experience collection...
+[2025-03-15 16:00:52,446][09459] InferenceWorker_p0-w0: resuming experience collection
+[2025-03-15 16:00:52,976][09281] Fps is (10 sec: 2457.9, 60 sec: 2457.9, 300 sec: 2457.9). Total num frames: 12288. Throughput: 0: 177.2. Samples: 886. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2025-03-15 16:00:52,977][09281] Avg episode reward: [(0, '2.959')]
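
Note: every "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" line reports frame throughput over trailing windows, which is why the very first report shows nan: no window has two samples yet. A sketch of a trailing-window FPS meter (assumed logic, not the framework's exact code):

import math, time
from collections import deque

class FpsMeter:
    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp, total_frames) pairs

    def update(self, total_frames: int):
        self.samples.append((time.time(), total_frames))
        # drop samples older than the longest reporting window
        while time.time() - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def fps(self, window: int) -> float:
        now = time.time()
        recent = [(t, f) for t, f in self.samples if now - t <= window]
        if len(recent) < 2:
            return math.nan  # first report: "Fps is (10 sec: nan, ...)"
        (t0, f0), (t1, f1) = recent[0], recent[-1]
        return (f1 - f0) / max(t1 - t0, 1e-6)
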
+[2025-03-15 16:00:54,521][09459] Updated weights for policy 0, policy_version 10 (0.0070)
+[2025-03-15 16:00:57,022][09459] Updated weights for policy 0, policy_version 20 (0.0013)
+[2025-03-15 16:00:57,977][09281] Fps is (10 sec: 9421.0, 60 sec: 9421.0, 300 sec: 9421.0). Total num frames: 94208. Throughput: 0: 1823.8. Samples: 18238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-03-15 16:00:57,978][09281] Avg episode reward: [(0, '4.438')]
+[2025-03-15 16:00:59,445][09459] Updated weights for policy 0, policy_version 30 (0.0013)
+[2025-03-15 16:01:01,821][09459] Updated weights for policy 0, policy_version 40 (0.0012)
+[2025-03-15 16:01:02,180][09281] Heartbeat connected on Batcher_0
+[2025-03-15 16:01:02,184][09281] Heartbeat connected on LearnerWorker_p0
+[2025-03-15 16:01:02,197][09281] Heartbeat connected on InferenceWorker_p0-w0
+[2025-03-15 16:01:02,199][09281] Heartbeat connected on RolloutWorker_w0
+[2025-03-15 16:01:02,203][09281] Heartbeat connected on RolloutWorker_w1
+[2025-03-15 16:01:02,206][09281] Heartbeat connected on RolloutWorker_w2
+[2025-03-15 16:01:02,213][09281] Heartbeat connected on RolloutWorker_w3
+[2025-03-15 16:01:02,214][09281] Heartbeat connected on RolloutWorker_w4
+[2025-03-15 16:01:02,217][09281] Heartbeat connected on RolloutWorker_w5
+[2025-03-15 16:01:02,222][09281] Heartbeat connected on RolloutWorker_w6
+[2025-03-15 16:01:02,226][09281] Heartbeat connected on RolloutWorker_w7
+[2025-03-15 16:01:02,976][09281] Fps is (10 sec: 16793.4, 60 sec: 12015.3, 300 sec: 12015.3). Total num frames: 180224. Throughput: 0: 2919.7. Samples: 43794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-03-15 16:01:02,978][09281] Avg episode reward: [(0, '4.523')]
+[2025-03-15 16:01:02,979][09445] Saving new best policy, reward=4.523!
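
Note: "Saving new best policy" fires when the average episode reward exceeds the best value seen so far; the learner snapshots those weights separately from the numbered rolling checkpoints. A tiny sketch (path and state layout assumed):

import torch

best_reward = float("-inf")

def maybe_save_best(model, avg_episode_reward: float, path: str = "checkpoint_p0/best.pth"):
    global best_reward
    if avg_episode_reward > best_reward:
        best_reward = avg_episode_reward
        torch.save(model.state_dict(), path)
        print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")
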
+[2025-03-15 16:01:04,242][09459] Updated weights for policy 0, policy_version 50 (0.0012)
+[2025-03-15 16:01:06,634][09459] Updated weights for policy 0, policy_version 60 (0.0012)
+[2025-03-15 16:01:07,976][09281] Fps is (10 sec: 17203.9, 60 sec: 13312.4, 300 sec: 13312.4). Total num frames: 266240. Throughput: 0: 2830.4. Samples: 56606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-03-15 16:01:07,978][09281] Avg episode reward: [(0, '4.448')]
+[2025-03-15 16:01:09,036][09459] Updated weights for policy 0, policy_version 70 (0.0013)
+[2025-03-15 16:01:11,448][09459] Updated weights for policy 0, policy_version 80 (0.0013)
+[2025-03-15 16:01:12,976][09281] Fps is (10 sec: 17203.1, 60 sec: 14090.5, 300 sec: 14090.5). Total num frames: 352256. Throughput: 0: 3286.5. Samples: 82162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-03-15 16:01:12,977][09281] Avg episode reward: [(0, '4.303')]
+[2025-03-15 16:01:13,865][09459] Updated weights for policy 0, policy_version 90 (0.0013)
+[2025-03-15 16:01:16,313][09459] Updated weights for policy 0, policy_version 100 (0.0014)
+[2025-03-15 16:01:17,976][09281] Fps is (10 sec: 16793.5, 60 sec: 14472.8, 300 sec: 14472.8). Total num frames: 434176. Throughput: 0: 3583.0. Samples: 107488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-03-15 16:01:17,980][09281] Avg episode reward: [(0, '4.684')]
+[2025-03-15 16:01:18,029][09445] Saving new best policy, reward=4.684!
+[2025-03-15 16:01:18,772][09459] Updated weights for policy 0, policy_version 110 (0.0013)
+[2025-03-15 16:01:21,195][09459] Updated weights for policy 0, policy_version 120 (0.0014)
+[2025-03-15 16:01:22,976][09281] Fps is (10 sec: 16793.8, 60 sec: 14862.9, 300 sec: 14862.9). Total num frames: 520192. Throughput: 0: 3429.1. Samples: 120016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:01:22,977][09281] Avg episode reward: [(0, '4.556')]
+[2025-03-15 16:01:23,631][09459] Updated weights for policy 0, policy_version 130 (0.0012)
+[2025-03-15 16:01:26,053][09459] Updated weights for policy 0, policy_version 140 (0.0012)
+[2025-03-15 16:01:27,976][09281] Fps is (10 sec: 16793.4, 60 sec: 15053.0, 300 sec: 15053.0). Total num frames: 602112. Throughput: 0: 3634.8. Samples: 145392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:01:27,978][09281] Avg episode reward: [(0, '4.356')]
+[2025-03-15 16:01:28,485][09459] Updated weights for policy 0, policy_version 150 (0.0013)
+[2025-03-15 16:01:30,919][09459] Updated weights for policy 0, policy_version 160 (0.0013)
+[2025-03-15 16:01:32,976][09281] Fps is (10 sec: 16793.6, 60 sec: 15291.9, 300 sec: 15291.9). Total num frames: 688128. Throughput: 0: 3791.5. Samples: 170616. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:01:32,977][09281] Avg episode reward: [(0, '4.677')]
+[2025-03-15 16:01:33,361][09459] Updated weights for policy 0, policy_version 170 (0.0013)
+[2025-03-15 16:01:35,830][09459] Updated weights for policy 0, policy_version 180 (0.0013)
+[2025-03-15 16:01:37,976][09281] Fps is (10 sec: 16793.5, 60 sec: 15401.1, 300 sec: 15401.1). Total num frames: 770048. Throughput: 0: 4051.3. Samples: 183194. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:01:37,978][09281] Avg episode reward: [(0, '4.554')]
+[2025-03-15 16:01:38,297][09459] Updated weights for policy 0, policy_version 190 (0.0014)
+[2025-03-15 16:01:40,738][09459] Updated weights for policy 0, policy_version 200 (0.0014)
+[2025-03-15 16:01:42,976][09281] Fps is (10 sec: 16793.6, 60 sec: 15564.9, 300 sec: 15564.9). Total num frames: 856064. Throughput: 0: 4221.8. Samples: 208218. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:01:42,977][09281] Avg episode reward: [(0, '4.673')]
+[2025-03-15 16:01:43,180][09459] Updated weights for policy 0, policy_version 210 (0.0014)
+[2025-03-15 16:01:45,608][09459] Updated weights for policy 0, policy_version 220 (0.0012)
+[2025-03-15 16:01:47,976][09281] Fps is (10 sec: 16793.7, 60 sec: 15633.2, 300 sec: 15633.2). Total num frames: 937984. Throughput: 0: 4213.4. Samples: 233398. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:01:47,978][09281] Avg episode reward: [(0, '4.598')]
+[2025-03-15 16:01:48,055][09459] Updated weights for policy 0, policy_version 230 (0.0014)
+[2025-03-15 16:01:50,475][09459] Updated weights for policy 0, policy_version 240 (0.0013)
+[2025-03-15 16:01:52,961][09459] Updated weights for policy 0, policy_version 250 (0.0013)
+[2025-03-15 16:01:52,976][09281] Fps is (10 sec: 16793.6, 60 sec: 16861.9, 300 sec: 15754.0). Total num frames: 1024000. Throughput: 0: 4208.1. Samples: 245970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-03-15 16:01:52,977][09281] Avg episode reward: [(0, '4.840')]
+[2025-03-15 16:01:52,979][09445] Saving new best policy, reward=4.840!
+[2025-03-15 16:01:55,422][09459] Updated weights for policy 0, policy_version 260 (0.0013)
+[2025-03-15 16:01:57,903][09459] Updated weights for policy 0, policy_version 270 (0.0014)
+[2025-03-15 16:01:57,976][09281] Fps is (10 sec: 16793.8, 60 sec: 16862.0, 300 sec: 15799.0). Total num frames: 1105920. Throughput: 0: 4194.3. Samples: 270906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:01:57,977][09281] Avg episode reward: [(0, '4.986')]
+[2025-03-15 16:01:57,984][09445] Saving new best policy, reward=4.986!
+[2025-03-15 16:02:00,396][09459] Updated weights for policy 0, policy_version 280 (0.0013)
+[2025-03-15 16:02:02,880][09459] Updated weights for policy 0, policy_version 290 (0.0014)
+[2025-03-15 16:02:02,976][09281] Fps is (10 sec: 16383.9, 60 sec: 16793.6, 300 sec: 15838.0). Total num frames: 1187840. Throughput: 0: 4180.1. Samples: 295592. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:02:02,977][09281] Avg episode reward: [(0, '5.463')]
+[2025-03-15 16:02:02,979][09445] Saving new best policy, reward=5.463!
+[2025-03-15 16:02:05,353][09459] Updated weights for policy 0, policy_version 300 (0.0014)
+[2025-03-15 16:02:07,802][09459] Updated weights for policy 0, policy_version 310 (0.0013)
+[2025-03-15 16:02:07,976][09281] Fps is (10 sec: 16383.8, 60 sec: 16725.3, 300 sec: 15872.1). Total num frames: 1269760. Throughput: 0: 4176.9. Samples: 307978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:02:07,978][09281] Avg episode reward: [(0, '5.790')]
+[2025-03-15 16:02:07,984][09445] Saving new best policy, reward=5.790!
+[2025-03-15 16:02:10,266][09459] Updated weights for policy 0, policy_version 320 (0.0014)
+[2025-03-15 16:02:12,754][09459] Updated weights for policy 0, policy_version 330 (0.0013)
+[2025-03-15 16:02:12,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16657.1, 300 sec: 15902.2). Total num frames: 1351680. Throughput: 0: 4167.0. Samples: 332908. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:02:12,978][09281] Avg episode reward: [(0, '6.506')]
+[2025-03-15 16:02:12,995][09445] Saving new best policy, reward=6.506!
+[2025-03-15 16:02:15,223][09459] Updated weights for policy 0, policy_version 340 (0.0013)
+[2025-03-15 16:02:17,672][09459] Updated weights for policy 0, policy_version 350 (0.0013)
+[2025-03-15 16:02:17,976][09281] Fps is (10 sec: 16793.6, 60 sec: 16725.3, 300 sec: 15974.5). Total num frames: 1437696. Throughput: 0: 4159.8. Samples: 357808. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:02:17,978][09281] Avg episode reward: [(0, '6.804')]
+[2025-03-15 16:02:17,983][09445] Saving new best policy, reward=6.804!
+[2025-03-15 16:02:20,157][09459] Updated weights for policy 0, policy_version 360 (0.0013)
+[2025-03-15 16:02:22,646][09459] Updated weights for policy 0, policy_version 370 (0.0013)
+[2025-03-15 16:02:22,976][09281] Fps is (10 sec: 16793.8, 60 sec: 16657.1, 300 sec: 15996.1). Total num frames: 1519616. Throughput: 0: 4157.0. Samples: 370260. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:02:22,977][09281] Avg episode reward: [(0, '7.122')]
+[2025-03-15 16:02:22,979][09445] Saving new best policy, reward=7.122!
+[2025-03-15 16:02:25,154][09459] Updated weights for policy 0, policy_version 380 (0.0014)
+[2025-03-15 16:02:27,614][09459] Updated weights for policy 0, policy_version 390 (0.0013)
+[2025-03-15 16:02:27,977][09281] Fps is (10 sec: 16383.2, 60 sec: 16656.9, 300 sec: 16015.4). Total num frames: 1601536. Throughput: 0: 4147.7. Samples: 394868. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:02:27,978][09281] Avg episode reward: [(0, '9.145')]
+[2025-03-15 16:02:27,983][09445] Saving new best policy, reward=9.145!
+[2025-03-15 16:02:30,086][09459] Updated weights for policy 0, policy_version 400 (0.0014)
+[2025-03-15 16:02:32,583][09459] Updated weights for policy 0, policy_version 410 (0.0013)
+[2025-03-15 16:02:32,976][09281] Fps is (10 sec: 16384.1, 60 sec: 16588.8, 300 sec: 16033.0). Total num frames: 1683456. Throughput: 0: 4138.1. Samples: 419612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-03-15 16:02:32,977][09281] Avg episode reward: [(0, '9.675')]
+[2025-03-15 16:02:32,979][09445] Saving new best policy, reward=9.675!
+[2025-03-15 16:02:35,085][09459] Updated weights for policy 0, policy_version 420 (0.0013)
+[2025-03-15 16:02:37,562][09459] Updated weights for policy 0, policy_version 430 (0.0014)
+[2025-03-15 16:02:37,976][09281] Fps is (10 sec: 16384.9, 60 sec: 16588.9, 300 sec: 16049.0). Total num frames: 1765376. Throughput: 0: 4132.9. Samples: 431950. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:02:37,978][09281] Avg episode reward: [(0, '10.881')]
+[2025-03-15 16:02:37,984][09445] Saving /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000000431_1765376.pth...
+[2025-03-15 16:02:38,048][09445] Saving new best policy, reward=10.881!
+[2025-03-15 16:02:40,049][09459] Updated weights for policy 0, policy_version 440 (0.0013)
+[2025-03-15 16:02:42,557][09459] Updated weights for policy 0, policy_version 450 (0.0014)
+[2025-03-15 16:02:42,976][09281] Fps is (10 sec: 16383.8, 60 sec: 16520.5, 300 sec: 16063.5). Total num frames: 1847296. Throughput: 0: 4125.5. Samples: 456556. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:02:42,977][09281] Avg episode reward: [(0, '13.622')]
+[2025-03-15 16:02:42,978][09445] Saving new best policy, reward=13.622!
+[2025-03-15 16:02:45,036][09459] Updated weights for policy 0, policy_version 460 (0.0014)
+[2025-03-15 16:02:47,520][09459] Updated weights for policy 0, policy_version 470 (0.0014)
+[2025-03-15 16:02:47,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16520.6, 300 sec: 16076.9). Total num frames: 1929216. Throughput: 0: 4127.5. Samples: 481330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-03-15 16:02:47,978][09281] Avg episode reward: [(0, '16.242')]
+[2025-03-15 16:02:48,016][09445] Saving new best policy, reward=16.242!
+[2025-03-15 16:02:50,032][09459] Updated weights for policy 0, policy_version 480 (0.0013)
+[2025-03-15 16:02:52,499][09459] Updated weights for policy 0, policy_version 490 (0.0014)
+[2025-03-15 16:02:52,976][09281] Fps is (10 sec: 16383.8, 60 sec: 16452.2, 300 sec: 16089.1). Total num frames: 2011136. Throughput: 0: 4124.8. Samples: 493596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:02:52,977][09281] Avg episode reward: [(0, '15.902')]
+[2025-03-15 16:02:54,957][09459] Updated weights for policy 0, policy_version 500 (0.0014)
+[2025-03-15 16:02:57,415][09459] Updated weights for policy 0, policy_version 510 (0.0013)
+[2025-03-15 16:02:57,977][09281] Fps is (10 sec: 16793.3, 60 sec: 16520.5, 300 sec: 16132.0). Total num frames: 2097152. Throughput: 0: 4125.9. Samples: 518576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-03-15 16:02:57,978][09281] Avg episode reward: [(0, '16.464')]
+[2025-03-15 16:02:57,983][09445] Saving new best policy, reward=16.464!
+[2025-03-15 16:02:59,895][09459] Updated weights for policy 0, policy_version 520 (0.0014)
+[2025-03-15 16:03:02,342][09459] Updated weights for policy 0, policy_version 530 (0.0014)
+[2025-03-15 16:03:02,976][09281] Fps is (10 sec: 16793.8, 60 sec: 16520.6, 300 sec: 16141.3). Total num frames: 2179072. Throughput: 0: 4124.3. Samples: 543402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-03-15 16:03:02,977][09281] Avg episode reward: [(0, '20.861')]
+[2025-03-15 16:03:02,979][09445] Saving new best policy, reward=20.861!
+[2025-03-15 16:03:04,824][09459] Updated weights for policy 0, policy_version 540 (0.0014)
+[2025-03-15 16:03:07,290][09459] Updated weights for policy 0, policy_version 550 (0.0014)
+[2025-03-15 16:03:07,977][09281] Fps is (10 sec: 16383.8, 60 sec: 16520.5, 300 sec: 16150.0). Total num frames: 2260992. Throughput: 0: 4122.1. Samples: 555754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-03-15 16:03:07,978][09281] Avg episode reward: [(0, '22.160')]
+[2025-03-15 16:03:08,030][09445] Saving new best policy, reward=22.160!
+[2025-03-15 16:03:09,775][09459] Updated weights for policy 0, policy_version 560 (0.0013)
+[2025-03-15 16:03:12,254][09459] Updated weights for policy 0, policy_version 570 (0.0013)
+[2025-03-15 16:03:12,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16520.5, 300 sec: 16158.1). Total num frames: 2342912. Throughput: 0: 4127.1. Samples: 580586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-03-15 16:03:12,978][09281] Avg episode reward: [(0, '19.857')]
+[2025-03-15 16:03:14,735][09459] Updated weights for policy 0, policy_version 580 (0.0013)
+[2025-03-15 16:03:17,257][09459] Updated weights for policy 0, policy_version 590 (0.0014)
+[2025-03-15 16:03:17,976][09281] Fps is (10 sec: 16384.2, 60 sec: 16452.2, 300 sec: 16165.6). Total num frames: 2424832. Throughput: 0: 4125.2. Samples: 605248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-03-15 16:03:17,978][09281] Avg episode reward: [(0, '21.228')]
+[2025-03-15 16:03:19,703][09459] Updated weights for policy 0, policy_version 600 (0.0014)
+[2025-03-15 16:03:22,166][09459] Updated weights for policy 0, policy_version 610 (0.0013)
+[2025-03-15 16:03:22,976][09281] Fps is (10 sec: 16793.7, 60 sec: 16520.5, 300 sec: 16199.1). Total num frames: 2510848. Throughput: 0: 4128.6. Samples: 617738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-03-15 16:03:22,977][09281] Avg episode reward: [(0, '22.539')]
+[2025-03-15 16:03:22,979][09445] Saving new best policy, reward=22.539!
+[2025-03-15 16:03:24,655][09459] Updated weights for policy 0, policy_version 620 (0.0013)
+[2025-03-15 16:03:27,125][09459] Updated weights for policy 0, policy_version 630 (0.0014)
+[2025-03-15 16:03:27,976][09281] Fps is (10 sec: 16793.8, 60 sec: 16520.7, 300 sec: 16204.9). Total num frames: 2592768. Throughput: 0: 4132.2. Samples: 642504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-03-15 16:03:27,977][09281] Avg episode reward: [(0, '19.086')]
+[2025-03-15 16:03:29,610][09459] Updated weights for policy 0, policy_version 640 (0.0013)
+[2025-03-15 16:03:32,071][09459] Updated weights for policy 0, policy_version 650 (0.0014)
+[2025-03-15 16:03:32,976][09281] Fps is (10 sec: 16383.9, 60 sec: 16520.5, 300 sec: 16210.3). Total num frames: 2674688. Throughput: 0: 4134.5. Samples: 667382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-03-15 16:03:32,978][09281] Avg episode reward: [(0, '21.123')]
+[2025-03-15 16:03:34,556][09459] Updated weights for policy 0, policy_version 660 (0.0014)
+[2025-03-15 16:03:37,024][09459] Updated weights for policy 0, policy_version 670 (0.0013)
+[2025-03-15 16:03:37,976][09281] Fps is (10 sec: 16384.1, 60 sec: 16520.5, 300 sec: 16215.4). Total num frames: 2756608. Throughput: 0: 4136.4. Samples: 679734. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-03-15 16:03:37,977][09281] Avg episode reward: [(0, '21.249')]
+[2025-03-15 16:03:39,541][09459] Updated weights for policy 0, policy_version 680 (0.0014)
+[2025-03-15 16:03:42,013][09459] Updated weights for policy 0, policy_version 690 (0.0014)
+[2025-03-15 16:03:42,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16520.5, 300 sec: 16220.2). Total num frames: 2838528. Throughput: 0: 4128.7. Samples: 704368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-03-15 16:03:42,978][09281] Avg episode reward: [(0, '22.851')]
+[2025-03-15 16:03:43,002][09445] Saving new best policy, reward=22.851!
+[2025-03-15 16:03:44,497][09459] Updated weights for policy 0, policy_version 700 (0.0013)
+[2025-03-15 16:03:47,003][09459] Updated weights for policy 0, policy_version 710 (0.0013)
+[2025-03-15 16:03:47,976][09281] Fps is (10 sec: 16383.9, 60 sec: 16520.5, 300 sec: 16224.8). Total num frames: 2920448. Throughput: 0: 4126.0. Samples: 729070. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-03-15 16:03:47,977][09281] Avg episode reward: [(0, '20.478')]
+[2025-03-15 16:03:49,481][09459] Updated weights for policy 0, policy_version 720 (0.0014)
+[2025-03-15 16:03:51,971][09459] Updated weights for policy 0, policy_version 730 (0.0013)
+[2025-03-15 16:03:52,976][09281] Fps is (10 sec: 16793.6, 60 sec: 16588.8, 300 sec: 16251.2). Total num frames: 3006464. Throughput: 0: 4127.1. Samples: 741474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-03-15 16:03:52,977][09281] Avg episode reward: [(0, '21.279')]
+[2025-03-15 16:03:54,460][09459] Updated weights for policy 0, policy_version 740 (0.0014)
+[2025-03-15 16:03:56,939][09459] Updated weights for policy 0, policy_version 750 (0.0014)
+[2025-03-15 16:03:57,976][09281] Fps is (10 sec: 16793.5, 60 sec: 16520.6, 300 sec: 16254.7). Total num frames: 3088384. Throughput: 0: 4124.8. Samples: 766202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-03-15 16:03:57,978][09281] Avg episode reward: [(0, '23.824')]
+[2025-03-15 16:03:57,984][09445] Saving new best policy, reward=23.824!
+[2025-03-15 16:03:59,445][09459] Updated weights for policy 0, policy_version 760 (0.0013)
+[2025-03-15 16:04:01,949][09459] Updated weights for policy 0, policy_version 770 (0.0014)
+[2025-03-15 16:04:02,976][09281] Fps is (10 sec: 16383.9, 60 sec: 16520.5, 300 sec: 16258.0). Total num frames: 3170304. Throughput: 0: 4124.9. Samples: 790868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-03-15 16:04:02,977][09281] Avg episode reward: [(0, '24.192')]
+[2025-03-15 16:04:02,979][09445] Saving new best policy, reward=24.192!
+[2025-03-15 16:04:04,442][09459] Updated weights for policy 0, policy_version 780 (0.0014)
+[2025-03-15 16:04:06,938][09459] Updated weights for policy 0, policy_version 790 (0.0013)
+[2025-03-15 16:04:07,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16520.6, 300 sec: 16261.2). Total num frames: 3252224. Throughput: 0: 4121.1. Samples: 803190. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-03-15 16:04:07,978][09281] Avg episode reward: [(0, '21.199')]
+[2025-03-15 16:04:09,456][09459] Updated weights for policy 0, policy_version 800 (0.0014)
+[2025-03-15 16:04:11,938][09459] Updated weights for policy 0, policy_version 810 (0.0014)
+[2025-03-15 16:04:12,976][09281] Fps is (10 sec: 16384.2, 60 sec: 16520.5, 300 sec: 16264.2). Total num frames: 3334144. Throughput: 0: 4118.0. Samples: 827814. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:04:12,977][09281] Avg episode reward: [(0, '23.722')]
+[2025-03-15 16:04:14,433][09459] Updated weights for policy 0, policy_version 820 (0.0013)
+[2025-03-15 16:04:16,960][09459] Updated weights for policy 0, policy_version 830 (0.0014)
+[2025-03-15 16:04:17,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16520.6, 300 sec: 16267.0). Total num frames: 3416064. Throughput: 0: 4109.3. Samples: 852300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:04:17,977][09281] Avg episode reward: [(0, '22.380')]
+[2025-03-15 16:04:19,451][09459] Updated weights for policy 0, policy_version 840 (0.0013)
+[2025-03-15 16:04:21,940][09459] Updated weights for policy 0, policy_version 850 (0.0014)
+[2025-03-15 16:04:22,976][09281] Fps is (10 sec: 16383.9, 60 sec: 16452.2, 300 sec: 16269.7). Total num frames: 3497984. Throughput: 0: 4109.2. Samples: 864648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2025-03-15 16:04:22,978][09281] Avg episode reward: [(0, '22.203')]
+[2025-03-15 16:04:24,455][09459] Updated weights for policy 0, policy_version 860 (0.0014)
+[2025-03-15 16:04:26,967][09459] Updated weights for policy 0, policy_version 870 (0.0014)
+[2025-03-15 16:04:27,977][09281] Fps is (10 sec: 16383.8, 60 sec: 16452.2, 300 sec: 16272.3). Total num frames: 3579904. Throughput: 0: 4105.1. Samples: 889096. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2025-03-15 16:04:27,977][09281] Avg episode reward: [(0, '22.411')]
+[2025-03-15 16:04:29,466][09459] Updated weights for policy 0, policy_version 880 (0.0013)
+[2025-03-15 16:04:31,934][09459] Updated weights for policy 0, policy_version 890 (0.0014)
+[2025-03-15 16:04:32,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16452.3, 300 sec: 16274.8). Total num frames: 3661824. Throughput: 0: 4105.4. Samples: 913814. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2025-03-15 16:04:32,977][09281] Avg episode reward: [(0, '21.047')]
+[2025-03-15 16:04:34,434][09459] Updated weights for policy 0, policy_version 900 (0.0014)
+[2025-03-15 16:04:36,941][09459] Updated weights for policy 0, policy_version 910 (0.0014)
+[2025-03-15 16:04:37,977][09281] Fps is (10 sec: 16383.9, 60 sec: 16452.2, 300 sec: 16277.2). Total num frames: 3743744. Throughput: 0: 4105.3. Samples: 926212. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:04:37,978][09281] Avg episode reward: [(0, '24.113')]
+[2025-03-15 16:04:37,984][09445] Saving /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000000914_3743744.pth...
+[2025-03-15 16:04:39,460][09459] Updated weights for policy 0, policy_version 920 (0.0015)
+[2025-03-15 16:04:41,984][09459] Updated weights for policy 0, policy_version 930 (0.0014)
+[2025-03-15 16:04:42,976][09281] Fps is (10 sec: 15974.4, 60 sec: 16384.0, 300 sec: 16262.0). Total num frames: 3821568. Throughput: 0: 4097.4. Samples: 950584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:04:42,978][09281] Avg episode reward: [(0, '27.418')]
+[2025-03-15 16:04:42,983][09445] Saving new best policy, reward=27.418!
+[2025-03-15 16:04:44,543][09459] Updated weights for policy 0, policy_version 940 (0.0014)
+[2025-03-15 16:04:47,087][09459] Updated weights for policy 0, policy_version 950 (0.0014)
+[2025-03-15 16:04:47,976][09281] Fps is (10 sec: 15974.8, 60 sec: 16384.0, 300 sec: 16264.6). Total num frames: 3903488. Throughput: 0: 4085.3. Samples: 974708. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:04:47,977][09281] Avg episode reward: [(0, '23.843')]
+[2025-03-15 16:04:49,575][09459] Updated weights for policy 0, policy_version 960 (0.0015)
+[2025-03-15 16:04:52,091][09459] Updated weights for policy 0, policy_version 970 (0.0014)
+[2025-03-15 16:04:52,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16315.7, 300 sec: 16267.0). Total num frames: 3985408. Throughput: 0: 4084.4. Samples: 986986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:04:52,978][09281] Avg episode reward: [(0, '25.150')]
+[2025-03-15 16:04:54,627][09459] Updated weights for policy 0, policy_version 980 (0.0014)
+[2025-03-15 16:04:57,140][09459] Updated weights for policy 0, policy_version 990 (0.0014)
+[2025-03-15 16:04:57,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16315.7, 300 sec: 16269.3). Total num frames: 4067328. Throughput: 0: 4078.9. Samples: 1011364. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:04:57,978][09281] Avg episode reward: [(0, '25.249')]
+[2025-03-15 16:04:59,649][09459] Updated weights for policy 0, policy_version 1000 (0.0014)
+[2025-03-15 16:05:02,155][09459] Updated weights for policy 0, policy_version 1010 (0.0014)
+[2025-03-15 16:05:02,976][09281] Fps is (10 sec: 16383.9, 60 sec: 16315.7, 300 sec: 16271.6). Total num frames: 4149248. Throughput: 0: 4078.4. Samples: 1035830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2025-03-15 16:05:02,978][09281] Avg episode reward: [(0, '25.765')]
+[2025-03-15 16:05:04,676][09459] Updated weights for policy 0, policy_version 1020 (0.0014)
+[2025-03-15 16:05:07,187][09459] Updated weights for policy 0, policy_version 1030 (0.0015)
+[2025-03-15 16:05:07,976][09281] Fps is (10 sec: 16383.9, 60 sec: 16315.7, 300 sec: 16273.8). Total num frames: 4231168. Throughput: 0: 4075.2. Samples: 1048032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2025-03-15 16:05:07,978][09281] Avg episode reward: [(0, '25.181')]
+[2025-03-15 16:05:09,836][09459] Updated weights for policy 0, policy_version 1040 (0.0015)
+[2025-03-15 16:05:12,602][09459] Updated weights for policy 0, policy_version 1050 (0.0014)
+[2025-03-15 16:05:12,976][09281] Fps is (10 sec: 15564.8, 60 sec: 16179.2, 300 sec: 16244.9). Total num frames: 4304896. Throughput: 0: 4048.1. Samples: 1071258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2025-03-15 16:05:12,977][09281] Avg episode reward: [(0, '24.763')]
+[2025-03-15 16:05:15,340][09459] Updated weights for policy 0, policy_version 1060 (0.0014)
+[2025-03-15 16:05:17,976][09281] Fps is (10 sec: 14745.7, 60 sec: 16042.7, 300 sec: 16217.2). Total num frames: 4378624. Throughput: 0: 3987.2. Samples: 1093240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-03-15 16:05:17,977][09281] Avg episode reward: [(0, '27.931')]
+[2025-03-15 16:05:17,984][09445] Saving new best policy, reward=27.931!
+[2025-03-15 16:05:18,134][09459] Updated weights for policy 0, policy_version 1070 (0.0014)
+[2025-03-15 16:05:20,699][09459] Updated weights for policy 0, policy_version 1080 (0.0013)
+[2025-03-15 16:05:22,976][09281] Fps is (10 sec: 15155.3, 60 sec: 15974.4, 300 sec: 16205.3). Total num frames: 4456448. Throughput: 0: 3979.0. Samples: 1105268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-03-15 16:05:22,977][09281] Avg episode reward: [(0, '25.898')]
+[2025-03-15 16:05:23,260][09459] Updated weights for policy 0, policy_version 1090 (0.0014)
+[2025-03-15 16:05:25,830][09459] Updated weights for policy 0, policy_version 1100 (0.0014)
+[2025-03-15 16:05:27,976][09281] Fps is (10 sec: 15974.3, 60 sec: 15974.4, 300 sec: 16208.5). Total num frames: 4538368. Throughput: 0: 3969.8. Samples: 1129224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2025-03-15 16:05:27,978][09281] Avg episode reward: [(0, '22.060')]
+[2025-03-15 16:05:28,360][09459] Updated weights for policy 0, policy_version 1110 (0.0014)
+[2025-03-15 16:05:30,843][09459] Updated weights for policy 0, policy_version 1120 (0.0014)
+[2025-03-15 16:05:32,976][09281] Fps is (10 sec: 16384.1, 60 sec: 15974.4, 300 sec: 16211.6). Total num frames: 4620288. Throughput: 0: 3980.1. Samples: 1153812. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2025-03-15 16:05:32,977][09281] Avg episode reward: [(0, '23.708')]
+[2025-03-15 16:05:33,335][09459] Updated weights for policy 0, policy_version 1130 (0.0014)
+[2025-03-15 16:05:35,924][09459] Updated weights for policy 0, policy_version 1140 (0.0014)
+[2025-03-15 16:05:37,976][09281] Fps is (10 sec: 16384.0, 60 sec: 15974.5, 300 sec: 16214.5). Total num frames: 4702208. Throughput: 0: 3975.6. Samples: 1165888. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:05:37,978][09281] Avg episode reward: [(0, '24.089')]
+[2025-03-15 16:05:38,412][09459] Updated weights for policy 0, policy_version 1150 (0.0014)
+[2025-03-15 16:05:40,949][09459] Updated weights for policy 0, policy_version 1160 (0.0014)
+[2025-03-15 16:05:42,976][09281] Fps is (10 sec: 16383.9, 60 sec: 16042.7, 300 sec: 16217.4). Total num frames: 4784128. Throughput: 0: 3977.3. Samples: 1190342. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:05:42,977][09281] Avg episode reward: [(0, '26.642')]
+[2025-03-15 16:05:43,446][09459] Updated weights for policy 0, policy_version 1170 (0.0013)
+[2025-03-15 16:05:45,918][09459] Updated weights for policy 0, policy_version 1180 (0.0013)
+[2025-03-15 16:05:47,976][09281] Fps is (10 sec: 16384.1, 60 sec: 16042.7, 300 sec: 16453.4). Total num frames: 4866048. Throughput: 0: 3983.6. Samples: 1215092. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:05:47,977][09281] Avg episode reward: [(0, '28.433')]
+[2025-03-15 16:05:47,984][09445] Saving new best policy, reward=28.433!
+[2025-03-15 16:05:48,407][09459] Updated weights for policy 0, policy_version 1190 (0.0014)
+[2025-03-15 16:05:50,927][09459] Updated weights for policy 0, policy_version 1200 (0.0014)
+[2025-03-15 16:05:52,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16042.7, 300 sec: 16453.4). Total num frames: 4947968. Throughput: 0: 3983.9. Samples: 1227306. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:05:52,978][09281] Avg episode reward: [(0, '27.533')]
+[2025-03-15 16:05:53,422][09459] Updated weights for policy 0, policy_version 1210 (0.0013)
+[2025-03-15 16:05:55,924][09459] Updated weights for policy 0, policy_version 1220 (0.0014)
+[2025-03-15 16:05:57,976][09281] Fps is (10 sec: 16383.8, 60 sec: 16042.6, 300 sec: 16439.5). Total num frames: 5029888. Throughput: 0: 4013.9. Samples: 1251886. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:05:57,978][09281] Avg episode reward: [(0, '24.998')]
+[2025-03-15 16:05:58,461][09459] Updated weights for policy 0, policy_version 1230 (0.0014)
+[2025-03-15 16:06:00,995][09459] Updated weights for policy 0, policy_version 1240 (0.0015)
+[2025-03-15 16:06:02,976][09281] Fps is (10 sec: 15974.4, 60 sec: 15974.4, 300 sec: 16411.8). Total num frames: 5107712. Throughput: 0: 4064.5. Samples: 1276142. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:06:02,977][09281] Avg episode reward: [(0, '28.744')]
+[2025-03-15 16:06:02,997][09445] Saving new best policy, reward=28.744!
+[2025-03-15 16:06:03,505][09459] Updated weights for policy 0, policy_version 1250 (0.0014)
+[2025-03-15 16:06:06,017][09459] Updated weights for policy 0, policy_version 1260 (0.0015)
+[2025-03-15 16:06:07,976][09281] Fps is (10 sec: 15974.5, 60 sec: 15974.4, 300 sec: 16397.9). Total num frames: 5189632. Throughput: 0: 4067.9. Samples: 1288322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-03-15 16:06:07,977][09281] Avg episode reward: [(0, '30.295')]
+[2025-03-15 16:06:07,983][09445] Saving new best policy, reward=30.295!
+[2025-03-15 16:06:08,591][09459] Updated weights for policy 0, policy_version 1270 (0.0014)
+[2025-03-15 16:06:11,147][09459] Updated weights for policy 0, policy_version 1280 (0.0014)
+[2025-03-15 16:06:12,976][09281] Fps is (10 sec: 16383.9, 60 sec: 16110.9, 300 sec: 16397.9). Total num frames: 5271552. Throughput: 0: 4072.0. Samples: 1312464. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:06:12,978][09281] Avg episode reward: [(0, '22.932')]
+[2025-03-15 16:06:13,643][09459] Updated weights for policy 0, policy_version 1290 (0.0014)
+[2025-03-15 16:06:16,142][09459] Updated weights for policy 0, policy_version 1300 (0.0014)
+[2025-03-15 16:06:17,976][09281] Fps is (10 sec: 16383.9, 60 sec: 16247.5, 300 sec: 16384.0). Total num frames: 5353472. Throughput: 0: 4070.7. Samples: 1336992. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:06:17,977][09281] Avg episode reward: [(0, '24.043')]
+[2025-03-15 16:06:18,666][09459] Updated weights for policy 0, policy_version 1310 (0.0014)
+[2025-03-15 16:06:21,163][09459] Updated weights for policy 0, policy_version 1320 (0.0014)
+[2025-03-15 16:06:22,976][09281] Fps is (10 sec: 16384.1, 60 sec: 16315.7, 300 sec: 16384.0). Total num frames: 5435392. Throughput: 0: 4073.9. Samples: 1349212. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2025-03-15 16:06:22,977][09281] Avg episode reward: [(0, '25.183')]
+[2025-03-15 16:06:23,702][09459] Updated weights for policy 0, policy_version 1330 (0.0014)
+[2025-03-15 16:06:26,211][09459] Updated weights for policy 0, policy_version 1340 (0.0014)
+[2025-03-15 16:06:27,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16315.7, 300 sec: 16370.1). Total num frames: 5517312. Throughput: 0: 4074.2. Samples: 1373682. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:06:27,977][09281] Avg episode reward: [(0, '24.623')]
+[2025-03-15 16:06:28,695][09459] Updated weights for policy 0, policy_version 1350 (0.0014)
+[2025-03-15 16:06:31,192][09459] Updated weights for policy 0, policy_version 1360 (0.0014)
+[2025-03-15 16:06:32,976][09281] Fps is (10 sec: 16383.8, 60 sec: 16315.7, 300 sec: 16370.1). Total num frames: 5599232. Throughput: 0: 4071.5. Samples: 1398310. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:06:32,977][09281] Avg episode reward: [(0, '27.281')]
+[2025-03-15 16:06:33,706][09459] Updated weights for policy 0, policy_version 1370 (0.0014)
+[2025-03-15 16:06:36,210][09459] Updated weights for policy 0, policy_version 1380 (0.0013)
+[2025-03-15 16:06:37,976][09281] Fps is (10 sec: 16384.1, 60 sec: 16315.7, 300 sec: 16356.2). Total num frames: 5681152. Throughput: 0: 4071.6. Samples: 1410530. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:06:37,978][09281] Avg episode reward: [(0, '26.395')]
+[2025-03-15 16:06:37,984][09445] Saving /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000001387_5681152.pth...
+[2025-03-15 16:06:38,045][09445] Removing /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000000431_1765376.pth
+[2025-03-15 16:06:38,767][09459] Updated weights for policy 0, policy_version 1390 (0.0014)
+[2025-03-15 16:06:41,301][09459] Updated weights for policy 0, policy_version 1400 (0.0014)
+[2025-03-15 16:06:42,976][09281] Fps is (10 sec: 15974.5, 60 sec: 16247.5, 300 sec: 16342.3). Total num frames: 5758976. Throughput: 0: 4065.7. Samples: 1434840. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2025-03-15 16:06:42,978][09281] Avg episode reward: [(0, '27.776')]
+[2025-03-15 16:06:43,795][09459] Updated weights for policy 0, policy_version 1410 (0.0013)
+[2025-03-15 16:06:46,294][09459] Updated weights for policy 0, policy_version 1420 (0.0014)
+[2025-03-15 16:06:47,976][09281] Fps is (10 sec: 15974.4, 60 sec: 16247.5, 300 sec: 16328.5). Total num frames: 5840896. Throughput: 0: 4070.9. Samples: 1459334. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2025-03-15 16:06:47,977][09281] Avg episode reward: [(0, '26.983')]
+[2025-03-15 16:06:48,786][09459] Updated weights for policy 0, policy_version 1430 (0.0014)
+[2025-03-15 16:06:51,279][09459] Updated weights for policy 0, policy_version 1440 (0.0014)
+[2025-03-15 16:06:52,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16247.5, 300 sec: 16328.5). Total num frames: 5922816. Throughput: 0: 4074.2. Samples: 1471662. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2025-03-15 16:06:52,977][09281] Avg episode reward: [(0, '26.194')]
+[2025-03-15 16:06:53,768][09459] Updated weights for policy 0, policy_version 1450 (0.0014)
+[2025-03-15 16:06:56,266][09459] Updated weights for policy 0, policy_version 1460 (0.0014)
+[2025-03-15 16:06:57,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16247.5, 300 sec: 16328.5). Total num frames: 6004736. Throughput: 0: 4086.4. Samples: 1496350. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2025-03-15 16:06:57,978][09281] Avg episode reward: [(0, '26.450')]
+[2025-03-15 16:06:58,777][09459] Updated weights for policy 0, policy_version 1470 (0.0014)
+[2025-03-15 16:07:01,280][09459] Updated weights for policy 0, policy_version 1480 (0.0014)
+[2025-03-15 16:07:02,977][09281] Fps is (10 sec: 16383.7, 60 sec: 16315.7, 300 sec: 16328.5). Total num frames: 6086656. Throughput: 0: 4084.4. Samples: 1520792. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:07:02,978][09281] Avg episode reward: [(0, '25.216')]
+[2025-03-15 16:07:03,783][09459] Updated weights for policy 0, policy_version 1490 (0.0014)
+[2025-03-15 16:07:06,305][09459] Updated weights for policy 0, policy_version 1500 (0.0014)
+[2025-03-15 16:07:07,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16315.7, 300 sec: 16328.5). Total num frames: 6168576. Throughput: 0: 4085.5. Samples: 1533062. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:07:07,978][09281] Avg episode reward: [(0, '24.200')]
+[2025-03-15 16:07:08,857][09459] Updated weights for policy 0, policy_version 1510 (0.0014)
+[2025-03-15 16:07:11,397][09459] Updated weights for policy 0, policy_version 1520 (0.0014)
+[2025-03-15 16:07:12,976][09281] Fps is (10 sec: 16384.4, 60 sec: 16315.8, 300 sec: 16314.6). Total num frames: 6250496. Throughput: 0: 4079.9. Samples: 1557278. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2025-03-15 16:07:12,977][09281] Avg episode reward: [(0, '26.237')]
+[2025-03-15 16:07:13,890][09459] Updated weights for policy 0, policy_version 1530 (0.0014)
+[2025-03-15 16:07:16,377][09459] Updated weights for policy 0, policy_version 1540 (0.0014)
+[2025-03-15 16:07:17,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16315.7, 300 sec: 16314.6). Total num frames: 6332416. Throughput: 0: 4078.8. Samples: 1581856. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2025-03-15 16:07:17,977][09281] Avg episode reward: [(0, '26.841')]
+[2025-03-15 16:07:18,869][09459] Updated weights for policy 0, policy_version 1550 (0.0014)
+[2025-03-15 16:07:21,347][09459] Updated weights for policy 0, policy_version 1560 (0.0014)
+[2025-03-15 16:07:22,976][09281] Fps is (10 sec: 16383.9, 60 sec: 16315.7, 300 sec: 16314.6). Total num frames: 6414336. Throughput: 0: 4081.1. Samples: 1594180. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:07:22,977][09281] Avg episode reward: [(0, '25.730')]
+[2025-03-15 16:07:23,841][09459] Updated weights for policy 0, policy_version 1570 (0.0014)
+[2025-03-15 16:07:26,421][09459] Updated weights for policy 0, policy_version 1580 (0.0014)
+[2025-03-15 16:07:27,976][09281] Fps is (10 sec: 16384.2, 60 sec: 16315.8, 300 sec: 16314.6). Total num frames: 6496256. Throughput: 0: 4083.7. Samples: 1618608. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:07:27,977][09281] Avg episode reward: [(0, '27.721')]
+[2025-03-15 16:07:28,912][09459] Updated weights for policy 0, policy_version 1590 (0.0013)
+[2025-03-15 16:07:31,388][09459] Updated weights for policy 0, policy_version 1600 (0.0013)
+[2025-03-15 16:07:32,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16315.7, 300 sec: 16314.6). Total num frames: 6578176. Throughput: 0: 4087.1. Samples: 1643254. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:07:32,977][09281] Avg episode reward: [(0, '28.395')]
+[2025-03-15 16:07:33,879][09459] Updated weights for policy 0, policy_version 1610 (0.0014)
+[2025-03-15 16:07:36,359][09459] Updated weights for policy 0, policy_version 1620 (0.0013)
+[2025-03-15 16:07:37,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16315.7, 300 sec: 16314.6). Total num frames: 6660096. Throughput: 0: 4087.4. Samples: 1655594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:07:37,977][09281] Avg episode reward: [(0, '26.676')]
+[2025-03-15 16:07:38,902][09459] Updated weights for policy 0, policy_version 1630 (0.0014)
+[2025-03-15 16:07:41,423][09459] Updated weights for policy 0, policy_version 1640 (0.0014)
+[2025-03-15 16:07:42,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16384.0, 300 sec: 16314.6). Total num frames: 6742016. Throughput: 0: 4081.5. Samples: 1680018. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:07:42,977][09281] Avg episode reward: [(0, '27.700')]
+[2025-03-15 16:07:43,916][09459] Updated weights for policy 0, policy_version 1650 (0.0013)
+[2025-03-15 16:07:46,409][09459] Updated weights for policy 0, policy_version 1660 (0.0014)
+[2025-03-15 16:07:47,976][09281] Fps is (10 sec: 16383.9, 60 sec: 16384.0, 300 sec: 16314.6). Total num frames: 6823936. Throughput: 0: 4086.2. Samples: 1704672. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:07:47,978][09281] Avg episode reward: [(0, '28.120')]
+[2025-03-15 16:07:48,899][09459] Updated weights for policy 0, policy_version 1670 (0.0013)
+[2025-03-15 16:07:51,383][09459] Updated weights for policy 0, policy_version 1680 (0.0013)
+[2025-03-15 16:07:52,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16384.0, 300 sec: 16300.7). Total num frames: 6905856. Throughput: 0: 4086.3. Samples: 1716944. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:07:52,978][09281] Avg episode reward: [(0, '28.519')]
+[2025-03-15 16:07:53,885][09459] Updated weights for policy 0, policy_version 1690 (0.0014)
+[2025-03-15 16:07:56,429][09459] Updated weights for policy 0, policy_version 1700 (0.0014)
+[2025-03-15 16:07:57,976][09281] Fps is (10 sec: 16384.2, 60 sec: 16384.0, 300 sec: 16300.7). Total num frames: 6987776. Throughput: 0: 4091.5. Samples: 1741396. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:07:57,978][09281] Avg episode reward: [(0, '26.862')]
+[2025-03-15 16:07:58,926][09459] Updated weights for policy 0, policy_version 1710 (0.0014)
+[2025-03-15 16:08:01,404][09459] Updated weights for policy 0, policy_version 1720 (0.0014)
+[2025-03-15 16:08:02,976][09281] Fps is (10 sec: 16384.1, 60 sec: 16384.1, 300 sec: 16300.7). Total num frames: 7069696. Throughput: 0: 4094.9. Samples: 1766124. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2025-03-15 16:08:02,978][09281] Avg episode reward: [(0, '26.064')]
+[2025-03-15 16:08:03,886][09459] Updated weights for policy 0, policy_version 1730 (0.0014)
+[2025-03-15 16:08:06,372][09459] Updated weights for policy 0, policy_version 1740 (0.0014)
+[2025-03-15 16:08:07,976][09281] Fps is (10 sec: 16383.8, 60 sec: 16384.0, 300 sec: 16300.7). Total num frames: 7151616. Throughput: 0: 4095.3. Samples: 1778468. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:08:07,978][09281] Avg episode reward: [(0, '28.796')]
+[2025-03-15 16:08:08,865][09459] Updated weights for policy 0, policy_version 1750 (0.0013)
+[2025-03-15 16:08:11,402][09459] Updated weights for policy 0, policy_version 1760 (0.0014)
+[2025-03-15 16:08:12,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16384.0, 300 sec: 16300.7). Total num frames: 7233536. Throughput: 0: 4098.1. Samples: 1803024. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2025-03-15 16:08:12,978][09281] Avg episode reward: [(0, '26.388')]
+[2025-03-15 16:08:13,902][09459] Updated weights for policy 0, policy_version 1770 (0.0013)
+[2025-03-15 16:08:16,415][09459] Updated weights for policy 0, policy_version 1780 (0.0013)
+[2025-03-15 16:08:17,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16384.0, 300 sec: 16286.8). Total num frames: 7315456. Throughput: 0: 4096.6. Samples: 1827600. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2025-03-15 16:08:17,978][09281] Avg episode reward: [(0, '25.842')]
+[2025-03-15 16:08:18,934][09459] Updated weights for policy 0, policy_version 1790 (0.0014)
+[2025-03-15 16:08:21,422][09459] Updated weights for policy 0, policy_version 1800 (0.0014)
+[2025-03-15 16:08:22,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16384.0, 300 sec: 16286.8). Total num frames: 7397376. Throughput: 0: 4093.5. Samples: 1839800. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2025-03-15 16:08:22,978][09281] Avg episode reward: [(0, '26.687')]
+[2025-03-15 16:08:23,918][09459] Updated weights for policy 0, policy_version 1810 (0.0014)
+[2025-03-15 16:08:26,465][09459] Updated weights for policy 0, policy_version 1820 (0.0013)
+[2025-03-15 16:08:27,976][09281] Fps is (10 sec: 16384.1, 60 sec: 16384.0, 300 sec: 16286.8). Total num frames: 7479296. Throughput: 0: 4094.0. Samples: 1864250. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2025-03-15 16:08:27,978][09281] Avg episode reward: [(0, '28.356')]
+[2025-03-15 16:08:28,964][09459] Updated weights for policy 0, policy_version 1830 (0.0013)
+[2025-03-15 16:08:31,456][09459] Updated weights for policy 0, policy_version 1840 (0.0013)
+[2025-03-15 16:08:32,976][09281] Fps is (10 sec: 16383.9, 60 sec: 16384.0, 300 sec: 16286.8). Total num frames: 7561216. Throughput: 0: 4091.6. Samples: 1888796. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2025-03-15 16:08:32,977][09281] Avg episode reward: [(0, '26.819')]
+[2025-03-15 16:08:33,952][09459] Updated weights for policy 0, policy_version 1850 (0.0013)
+[2025-03-15 16:08:36,476][09459] Updated weights for policy 0, policy_version 1860 (0.0013)
+[2025-03-15 16:08:37,976][09281] Fps is (10 sec: 16383.8, 60 sec: 16384.0, 300 sec: 16286.8). Total num frames: 7643136. Throughput: 0: 4091.9. Samples: 1901082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:08:37,978][09281] Avg episode reward: [(0, '28.723')]
+[2025-03-15 16:08:37,984][09445] Saving /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000001866_7643136.pth...
+[2025-03-15 16:08:38,047][09445] Removing /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000000914_3743744.pth
+[2025-03-15 16:08:38,990][09459] Updated weights for policy 0, policy_version 1870 (0.0013)
+[2025-03-15 16:08:41,505][09459] Updated weights for policy 0, policy_version 1880 (0.0014)
+[2025-03-15 16:08:42,977][09281] Fps is (10 sec: 15974.1, 60 sec: 16315.7, 300 sec: 16272.9). Total num frames: 7720960. Throughput: 0: 4087.8. Samples: 1925346. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:08:42,978][09281] Avg episode reward: [(0, '26.192')]
+[2025-03-15 16:08:44,039][09459] Updated weights for policy 0, policy_version 1890 (0.0014)
+[2025-03-15 16:08:46,540][09459] Updated weights for policy 0, policy_version 1900 (0.0013)
+[2025-03-15 16:08:47,976][09281] Fps is (10 sec: 15974.6, 60 sec: 16315.7, 300 sec: 16259.0). Total num frames: 7802880. Throughput: 0: 4081.6. Samples: 1949798. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2025-03-15 16:08:47,978][09281] Avg episode reward: [(0, '29.221')]
+[2025-03-15 16:08:49,017][09459] Updated weights for policy 0, policy_version 1910 (0.0013)
+[2025-03-15 16:08:51,528][09459] Updated weights for policy 0, policy_version 1920 (0.0014)
+[2025-03-15 16:08:52,976][09281] Fps is (10 sec: 16384.4, 60 sec: 16315.8, 300 sec: 16259.0). Total num frames: 7884800. Throughput: 0: 4079.6. Samples: 1962050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2025-03-15 16:08:52,978][09281] Avg episode reward: [(0, '30.322')]
+[2025-03-15 16:08:52,980][09445] Saving new best policy, reward=30.322!
+[2025-03-15 16:08:54,041][09459] Updated weights for policy 0, policy_version 1930 (0.0013)
+[2025-03-15 16:08:56,569][09459] Updated weights for policy 0, policy_version 1940 (0.0013)
+[2025-03-15 16:08:57,977][09281] Fps is (10 sec: 16383.8, 60 sec: 16315.7, 300 sec: 16259.0). Total num frames: 7966720. Throughput: 0: 4074.5. Samples: 1986376. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2025-03-15 16:08:57,978][09281] Avg episode reward: [(0, '27.962')]
+[2025-03-15 16:08:59,075][09459] Updated weights for policy 0, policy_version 1950 (0.0013)
+[2025-03-15 16:09:01,551][09459] Updated weights for policy 0, policy_version 1960 (0.0013)
+[2025-03-15 16:09:02,976][09281] Fps is (10 sec: 16383.8, 60 sec: 16315.7, 300 sec: 16259.0). Total num frames: 8048640. Throughput: 0: 4075.3. Samples: 2010990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2025-03-15 16:09:02,977][09281] Avg episode reward: [(0, '28.018')]
+[2025-03-15 16:09:04,056][09459] Updated weights for policy 0, policy_version 1970 (0.0013)
+[2025-03-15 16:09:06,554][09459] Updated weights for policy 0, policy_version 1980 (0.0014)
+[2025-03-15 16:09:07,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16315.7, 300 sec: 16259.0). Total num frames: 8130560. Throughput: 0: 4077.1. Samples: 2023272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-03-15 16:09:07,977][09281] Avg episode reward: [(0, '29.017')]
+[2025-03-15 16:09:09,061][09459] Updated weights for policy 0, policy_version 1990 (0.0014)
+[2025-03-15 16:09:11,567][09459] Updated weights for policy 0, policy_version 2000 (0.0014)
+[2025-03-15 16:09:12,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16315.7, 300 sec: 16259.0). Total num frames: 8212480. Throughput: 0: 4077.3. Samples: 2047728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-03-15 16:09:12,978][09281] Avg episode reward: [(0, '27.183')]
+[2025-03-15 16:09:14,100][09459] Updated weights for policy 0, policy_version 2010 (0.0014)
+[2025-03-15 16:09:16,641][09459] Updated weights for policy 0, policy_version 2020 (0.0015)
+[2025-03-15 16:09:17,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16315.7, 300 sec: 16259.0). Total num frames: 8294400. Throughput: 0: 4071.2. Samples: 2072000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-03-15 16:09:17,978][09281] Avg episode reward: [(0, '28.719')]
+[2025-03-15 16:09:19,142][09459] Updated weights for policy 0, policy_version 2030 (0.0013)
+[2025-03-15 16:09:21,637][09459] Updated weights for policy 0, policy_version 2040 (0.0013)
+[2025-03-15 16:09:22,976][09281] Fps is (10 sec: 16384.2, 60 sec: 16315.7, 300 sec: 16259.0). Total num frames: 8376320. Throughput: 0: 4071.4. Samples: 2084294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-03-15 16:09:22,977][09281] Avg episode reward: [(0, '26.148')]
+[2025-03-15 16:09:24,163][09459] Updated weights for policy 0, policy_version 2050 (0.0014)
+[2025-03-15 16:09:26,663][09459] Updated weights for policy 0, policy_version 2060 (0.0014)
+[2025-03-15 16:09:27,977][09281] Fps is (10 sec: 16383.3, 60 sec: 16315.6, 300 sec: 16259.0). Total num frames: 8458240. Throughput: 0: 4077.3. Samples: 2108828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-03-15 16:09:27,978][09281] Avg episode reward: [(0, '26.221')]
+[2025-03-15 16:09:29,180][09459] Updated weights for policy 0, policy_version 2070 (0.0014)
+[2025-03-15 16:09:31,711][09459] Updated weights for policy 0, policy_version 2080 (0.0014)
+[2025-03-15 16:09:32,976][09281] Fps is (10 sec: 16384.1, 60 sec: 16315.7, 300 sec: 16259.1). Total num frames: 8540160. Throughput: 0: 4076.6. Samples: 2133244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-03-15 16:09:32,978][09281] Avg episode reward: [(0, '27.797')]
+[2025-03-15 16:09:34,199][09459] Updated weights for policy 0, policy_version 2090 (0.0014)
+[2025-03-15 16:09:36,687][09459] Updated weights for policy 0, policy_version 2100 (0.0013)
+[2025-03-15 16:09:37,976][09281] Fps is (10 sec: 16384.8, 60 sec: 16315.8, 300 sec: 16272.9). Total num frames: 8622080. Throughput: 0: 4077.1. Samples: 2145522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-03-15 16:09:37,978][09281] Avg episode reward: [(0, '30.573')]
+[2025-03-15 16:09:37,984][09445] Saving new best policy, reward=30.573!
+[2025-03-15 16:09:39,226][09459] Updated weights for policy 0, policy_version 2110 (0.0014)
+[2025-03-15 16:09:41,780][09459] Updated weights for policy 0, policy_version 2120 (0.0014)
+[2025-03-15 16:09:42,977][09281] Fps is (10 sec: 15973.8, 60 sec: 16315.7, 300 sec: 16259.0). Total num frames: 8699904. Throughput: 0: 4075.4. Samples: 2169768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-03-15 16:09:42,978][09281] Avg episode reward: [(0, '27.938')]
+[2025-03-15 16:09:44,286][09459] Updated weights for policy 0, policy_version 2130 (0.0014)
+[2025-03-15 16:09:46,809][09459] Updated weights for policy 0, policy_version 2140 (0.0014)
+[2025-03-15 16:09:47,976][09281] Fps is (10 sec: 15974.4, 60 sec: 16315.7, 300 sec: 16259.0). Total num frames: 8781824. Throughput: 0: 4070.3. Samples: 2194154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-03-15 16:09:47,977][09281] Avg episode reward: [(0, '26.935')]
+[2025-03-15 16:09:49,327][09459] Updated weights for policy 0, policy_version 2150 (0.0013)
+[2025-03-15 16:09:51,826][09459] Updated weights for policy 0, policy_version 2160 (0.0013)
+[2025-03-15 16:09:52,976][09281] Fps is (10 sec: 16384.5, 60 sec: 16315.7, 300 sec: 16259.0). Total num frames: 8863744. Throughput: 0: 4068.5. Samples: 2206352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-03-15 16:09:52,978][09281] Avg episode reward: [(0, '27.644')]
+[2025-03-15 16:09:54,312][09459] Updated weights for policy 0, policy_version 2170 (0.0014)
+[2025-03-15 16:09:56,806][09459] Updated weights for policy 0, policy_version 2180 (0.0013)
+[2025-03-15 16:09:57,977][09281] Fps is (10 sec: 16384.0, 60 sec: 16315.8, 300 sec: 16259.0). Total num frames: 8945664. Throughput: 0: 4075.0. Samples: 2231102. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-03-15 16:09:57,978][09281] Avg episode reward: [(0, '27.232')]
+[2025-03-15 16:09:59,302][09459] Updated weights for policy 0, policy_version 2190 (0.0013)
+[2025-03-15 16:10:01,792][09459] Updated weights for policy 0, policy_version 2200 (0.0014)
+[2025-03-15 16:10:02,976][09281] Fps is (10 sec: 16383.9, 60 sec: 16315.7, 300 sec: 16259.0). Total num frames: 9027584. Throughput: 0: 4080.9. Samples: 2255640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-03-15 16:10:02,978][09281] Avg episode reward: [(0, '26.891')]
+[2025-03-15 16:10:04,303][09459] Updated weights for policy 0, policy_version 2210 (0.0014)
+[2025-03-15 16:10:06,810][09459] Updated weights for policy 0, policy_version 2220 (0.0015)
+[2025-03-15 16:10:07,976][09281] Fps is (10 sec: 16384.0, 60 sec: 16315.7, 300 sec: 16286.8). Total num frames: 9109504. Throughput: 0: 4080.8. Samples: 2267930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-03-15 16:10:07,977][09281] Avg episode reward: [(0, '25.807')]
+[2025-03-15 16:10:09,335][09459] Updated weights for policy 0, policy_version 2230 (0.0014)
+[2025-03-15 16:10:11,803][09459] Updated weights for policy 0, policy_version 2240 (0.0014)
+[2025-03-15 16:10:12,976][09281] Fps is (10 sec: 16384.1, 60 sec: 16315.8, 300 sec: 16314.6). Total num frames: 9191424. Throughput: 0: 4080.4. Samples: 2292446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-03-15 16:10:12,977][09281] Avg episode reward: [(0, '29.713')]
+[2025-03-15 16:10:14,319][09459] Updated weights for policy 0, policy_version 2250 (0.0014)
+[2025-03-15 16:10:16,920][09459] Updated weights for policy 0, policy_version 2260 (0.0014)
+[2025-03-15 16:10:17,976][09281] Fps is (10 sec: 15974.3, 60 sec: 16247.5, 300 sec: 16314.6). Total num frames: 9269248. Throughput: 0: 4069.5. Samples: 2316374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-03-15 16:10:17,978][09281] Avg episode reward: [(0, '31.112')]
+[2025-03-15 16:10:18,008][09445] Saving new best policy, reward=31.112!
+[2025-03-15 16:10:19,645][09459] Updated weights for policy 0, policy_version 2270 (0.0015)
+[2025-03-15 16:10:22,222][09459] Updated weights for policy 0, policy_version 2280 (0.0014)
+[2025-03-15 16:10:22,977][09281] Fps is (10 sec: 15564.5, 60 sec: 16179.1, 300 sec: 16300.7). Total num frames: 9347072. Throughput: 0: 4053.3. Samples: 2327922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-03-15 16:10:22,978][09281] Avg episode reward: [(0, '28.468')]
+[2025-03-15 16:10:24,801][09459] Updated weights for policy 0, policy_version 2290 (0.0014)
+[2025-03-15 16:10:27,262][09459] Updated weights for policy 0, policy_version 2300 (0.0014)
+[2025-03-15 16:10:27,976][09281] Fps is (10 sec: 16384.2, 60 sec: 16247.6, 300 sec: 16314.6). Total num frames: 9433088. Throughput: 0: 4043.9. Samples: 2351744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-03-15 16:10:27,977][09281] Avg episode reward: [(0, '29.412')]
+[2025-03-15 16:10:29,616][09459] Updated weights for policy 0, policy_version 2310 (0.0014)
+[2025-03-15 16:10:32,015][09459] Updated weights for policy 0, policy_version 2320 (0.0013)
+[2025-03-15 16:10:32,976][09281] Fps is (10 sec: 16794.1, 60 sec: 16247.5, 300 sec: 16314.6). Total num frames: 9515008. Throughput: 0: 4078.4. Samples: 2377682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-03-15 16:10:32,977][09281] Avg episode reward: [(0, '26.824')]
+[2025-03-15 16:10:34,415][09459] Updated weights for policy 0, policy_version 2330 (0.0013)
+[2025-03-15 16:10:36,767][09459] Updated weights for policy 0, policy_version 2340 (0.0013)
+[2025-03-15 16:10:37,976][09281] Fps is (10 sec: 17203.1, 60 sec: 16384.0, 300 sec: 16342.3). Total num frames: 9605120. Throughput: 0: 4092.8. Samples: 2390526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-03-15 16:10:37,977][09281] Avg episode reward: [(0, '31.490')]
+[2025-03-15 16:10:37,984][09445] Saving /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000002345_9605120.pth...
+[2025-03-15 16:10:38,044][09445] Removing /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000001387_5681152.pth
+[2025-03-15 16:10:38,052][09445] Saving new best policy, reward=31.490!
+[2025-03-15 16:10:39,162][09459] Updated weights for policy 0, policy_version 2350 (0.0013)
+[2025-03-15 16:10:41,682][09459] Updated weights for policy 0, policy_version 2360 (0.0013)
+[2025-03-15 16:10:42,976][09281] Fps is (10 sec: 17203.1, 60 sec: 16452.4, 300 sec: 16342.3). Total num frames: 9687040. Throughput: 0: 4105.3. Samples: 2415840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-03-15 16:10:42,977][09281] Avg episode reward: [(0, '27.658')]
+[2025-03-15 16:10:44,068][09459] Updated weights for policy 0, policy_version 2370 (0.0013)
+[2025-03-15 16:10:46,400][09459] Updated weights for policy 0, policy_version 2380 (0.0013)
+[2025-03-15 16:10:47,976][09281] Fps is (10 sec: 16793.7, 60 sec: 16520.5, 300 sec: 16356.2). Total num frames: 9773056. Throughput: 0: 4138.2. Samples: 2441858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-03-15 16:10:47,977][09281] Avg episode reward: [(0, '27.962')]
+[2025-03-15 16:10:48,743][09459] Updated weights for policy 0, policy_version 2390 (0.0013)
+[2025-03-15 16:10:51,107][09459] Updated weights for policy 0, policy_version 2400 (0.0013)
+[2025-03-15 16:10:52,976][09281] Fps is (10 sec: 17612.8, 60 sec: 16657.1, 300 sec: 16384.0). Total num frames: 9863168. Throughput: 0: 4155.9. Samples: 2454946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-03-15 16:10:52,977][09281] Avg episode reward: [(0, '28.989')]
+[2025-03-15 16:10:53,419][09459] Updated weights for policy 0, policy_version 2410 (0.0013)
+[2025-03-15 16:10:55,774][09459] Updated weights for policy 0, policy_version 2420 (0.0012)
+[2025-03-15 16:10:57,976][09281] Fps is (10 sec: 17612.8, 60 sec: 16725.3, 300 sec: 16411.8). Total num frames: 9949184. Throughput: 0: 4198.0. Samples: 2481354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-03-15 16:10:57,977][09281] Avg episode reward: [(0, '28.061')]
+[2025-03-15 16:10:58,126][09459] Updated weights for policy 0, policy_version 2430 (0.0013)
+[2025-03-15 16:11:00,455][09459] Updated weights for policy 0, policy_version 2440 (0.0012)
+[2025-03-15 16:11:01,145][09445] Saving /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
+[2025-03-15 16:11:01,152][09445] Stopping Batcher_0...
+[2025-03-15 16:11:01,153][09445] Loop batcher_evt_loop terminating...
+[2025-03-15 16:11:01,152][09281] Component Batcher_0 stopped!
+[2025-03-15 16:11:01,169][09459] Weights refcount: 2 0
+[2025-03-15 16:11:01,172][09459] Stopping InferenceWorker_p0-w0...
+[2025-03-15 16:11:01,172][09459] Loop inference_proc0-0_evt_loop terminating...
+[2025-03-15 16:11:01,173][09281] Component InferenceWorker_p0-w0 stopped!
+[2025-03-15 16:11:01,210][09465] Stopping RolloutWorker_w6...
+[2025-03-15 16:11:01,211][09465] Loop rollout_proc6_evt_loop terminating...
+[2025-03-15 16:11:01,210][09281] Component RolloutWorker_w6 stopped!
+[2025-03-15 16:11:01,210][09463] Stopping RolloutWorker_w4...
+[2025-03-15 16:11:01,213][09460] Stopping RolloutWorker_w1...
+[2025-03-15 16:11:01,212][09463] Loop rollout_proc4_evt_loop terminating...
+[2025-03-15 16:11:01,213][09460] Loop rollout_proc1_evt_loop terminating...
+[2025-03-15 16:11:01,213][09461] Stopping RolloutWorker_w2...
+[2025-03-15 16:11:01,214][09461] Loop rollout_proc2_evt_loop terminating...
+[2025-03-15 16:11:01,214][09462] Stopping RolloutWorker_w3...
+[2025-03-15 16:11:01,214][09462] Loop rollout_proc3_evt_loop terminating...
+[2025-03-15 16:11:01,212][09281] Component RolloutWorker_w4 stopped!
+[2025-03-15 16:11:01,217][09281] Component RolloutWorker_w1 stopped!
+[2025-03-15 16:11:01,218][09466] Stopping RolloutWorker_w7...
+[2025-03-15 16:11:01,219][09466] Loop rollout_proc7_evt_loop terminating...
+[2025-03-15 16:11:01,219][09445] Removing /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000001866_7643136.pth
+[2025-03-15 16:11:01,218][09281] Component RolloutWorker_w2 stopped!
+[2025-03-15 16:11:01,219][09281] Component RolloutWorker_w3 stopped!
+[2025-03-15 16:11:01,224][09464] Stopping RolloutWorker_w5...
+[2025-03-15 16:11:01,225][09464] Loop rollout_proc5_evt_loop terminating...
+[2025-03-15 16:11:01,225][09458] Stopping RolloutWorker_w0...
+[2025-03-15 16:11:01,226][09458] Loop rollout_proc0_evt_loop terminating...
+[2025-03-15 16:11:01,221][09281] Component RolloutWorker_w7 stopped!
+[2025-03-15 16:11:01,227][09281] Component RolloutWorker_w5 stopped!
+[2025-03-15 16:11:01,229][09445] Saving /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
+[2025-03-15 16:11:01,228][09281] Component RolloutWorker_w0 stopped!
+[2025-03-15 16:11:01,334][09281] Component LearnerWorker_p0 stopped!
+[2025-03-15 16:11:01,336][09445] Stopping LearnerWorker_p0...
+[2025-03-15 16:11:01,337][09445] Loop learner_proc0_evt_loop terminating...
+[2025-03-15 16:11:01,335][09281] Waiting for process learner_proc0 to stop...
+[2025-03-15 16:11:02,274][09281] Waiting for process inference_proc0-0 to join...
+[2025-03-15 16:11:02,275][09281] Waiting for process rollout_proc0 to join...
+[2025-03-15 16:11:02,275][09281] Waiting for process rollout_proc1 to join...
+[2025-03-15 16:11:02,276][09281] Waiting for process rollout_proc2 to join...
+[2025-03-15 16:11:02,277][09281] Waiting for process rollout_proc3 to join...
+[2025-03-15 16:11:02,278][09281] Waiting for process rollout_proc4 to join...
+[2025-03-15 16:11:02,279][09281] Waiting for process rollout_proc5 to join...
+[2025-03-15 16:11:02,280][09281] Waiting for process rollout_proc6 to join...
+[2025-03-15 16:11:02,281][09281] Waiting for process rollout_proc7 to join...
+[2025-03-15 16:11:02,282][09281] Batcher 0 profile tree view:
+batching: 27.2978, releasing_batches: 0.0629
+[2025-03-15 16:11:02,282][09281] InferenceWorker_p0-w0 profile tree view:
 wait_policy: 0.0001
-  wait_policy_total: 5.6023
-update_model: 4.0313
-  weight_update: 0.0014
-one_step: 0.0046
-  handle_policy_step: 233.6388
-    deserialize: 10.0874, stack: 1.3770, obs_to_device_normalize: 55.6877, forward: 105.9535, send_messages: 16.8725
-    prepare_outputs: 34.8387
-      to_cpu: 23.8238
-[2025-03-15 15:57:54,659][06641] Learner 0 profile tree view:
-misc: 0.0043, prepare_batch: 9.8281
-train: 39.5948
-  epoch_init: 0.0040, minibatch_init: 0.0059, losses_postprocess: 0.4596, kl_divergence: 0.3789, after_optimizer: 16.4166
-  calculate_losses: 13.7465
-    losses_init: 0.0028, forward_head: 0.9158, bptt_initial: 9.4557, tail: 0.6255, advantages_returns: 0.1704, losses: 1.3507
-    bptt: 1.0575
-      bptt_forward_core: 1.0001
-  update: 8.2319
-    clip: 0.8346
-[2025-03-15 15:57:54,659][06641] RolloutWorker_w0 profile tree view:
-wait_for_trajectories: 0.1595, enqueue_policy_requests: 10.9979, env_step: 140.6637, overhead: 7.2801, complete_rollouts: 0.2554
-save_policy_outputs: 12.1902
-  split_output_tensors: 3.9508
-[2025-03-15 15:57:54,660][06641] RolloutWorker_w7 profile tree view:
-wait_for_trajectories: 0.1608, enqueue_policy_requests: 10.9947, env_step: 140.5420, overhead: 7.3610, complete_rollouts: 0.2484
-save_policy_outputs: 12.2805
-  split_output_tensors: 4.0136
-[2025-03-15 15:57:54,660][06641] Loop Runner_EvtLoop terminating...
-[2025-03-15 15:57:54,662][06641] Runner profile tree view:
-main_loop: 262.3501
-[2025-03-15 15:57:54,663][06641] Collected {0: 4005888}, FPS: 15269.2
-[2025-03-15 15:57:54,695][06641] Loading existing experiment configuration from /home/aa/Downloads/train_dir/default_experiment/config.json
-[2025-03-15 15:57:54,696][06641] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-03-15 15:57:54,696][06641] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-03-15 15:57:54,697][06641] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-03-15 15:57:54,697][06641] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-03-15 15:57:54,698][06641] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-03-15 15:57:54,699][06641] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2025-03-15 15:57:54,700][06641] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-03-15 15:57:54,701][06641] Adding new argument 'push_to_hub'=False that is not in the saved config file!
-[2025-03-15 15:57:54,702][06641] Adding new argument 'hf_repository'=None that is not in the saved config file!
-[2025-03-15 15:57:54,702][06641] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-03-15 15:57:54,703][06641] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-03-15 15:57:54,703][06641] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-03-15 15:57:54,704][06641] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-03-15 15:57:54,705][06641] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-03-15 15:57:54,726][06641] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-03-15 15:57:54,728][06641] RunningMeanStd input shape: (3, 72, 128)
-[2025-03-15 15:57:54,729][06641] RunningMeanStd input shape: (1,)
-[2025-03-15 15:57:54,738][06641] ConvEncoder: input_channels=3
-[2025-03-15 15:57:54,835][06641] Conv encoder output size: 512
-[2025-03-15 15:57:54,835][06641] Policy head output size: 512
-[2025-03-15 15:57:54,964][06641] Loading state from checkpoint /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-03-15 15:57:55,659][06641] Num frames 100...
-[2025-03-15 15:57:55,766][06641] Num frames 200...
-[2025-03-15 15:57:55,875][06641] Num frames 300...
-[2025-03-15 15:57:55,996][06641] Num frames 400...
-[2025-03-15 15:57:56,105][06641] Num frames 500...
-[2025-03-15 15:57:56,213][06641] Num frames 600...
-[2025-03-15 15:57:56,322][06641] Num frames 700...
-[2025-03-15 15:57:56,434][06641] Num frames 800...
-[2025-03-15 15:57:56,575][06641] Avg episode rewards: #0: 23.800, true rewards: #0: 8.800
-[2025-03-15 15:57:56,576][06641] Avg episode reward: 23.800, avg true_objective: 8.800
-[2025-03-15 15:57:56,615][06641] Num frames 900...
-[2025-03-15 15:57:56,729][06641] Num frames 1000...
-[2025-03-15 15:57:56,833][06641] Num frames 1100...
-[2025-03-15 15:57:56,948][06641] Num frames 1200...
-[2025-03-15 15:57:57,062][06641] Num frames 1300...
-[2025-03-15 15:57:57,166][06641] Num frames 1400...
-[2025-03-15 15:57:57,270][06641] Num frames 1500...
-[2025-03-15 15:57:57,376][06641] Num frames 1600...
-[2025-03-15 15:57:57,484][06641] Num frames 1700...
-[2025-03-15 15:57:57,591][06641] Num frames 1800...
-[2025-03-15 15:57:57,700][06641] Num frames 1900...
-[2025-03-15 15:57:57,808][06641] Num frames 2000...
-[2025-03-15 15:57:57,964][06641] Avg episode rewards: #0: 25.990, true rewards: #0: 10.490
-[2025-03-15 15:57:57,964][06641] Avg episode reward: 25.990, avg true_objective: 10.490
-[2025-03-15 15:57:57,968][06641] Num frames 2100...
-[2025-03-15 15:57:58,092][06641] Num frames 2200...
-[2025-03-15 15:57:58,197][06641] Num frames 2300...
-[2025-03-15 15:57:58,302][06641] Num frames 2400...
-[2025-03-15 15:57:58,409][06641] Num frames 2500...
-[2025-03-15 15:57:58,512][06641] Num frames 2600...
-[2025-03-15 15:57:58,620][06641] Num frames 2700...
-[2025-03-15 15:57:58,725][06641] Num frames 2800...
-[2025-03-15 15:57:58,830][06641] Num frames 2900...
-[2025-03-15 15:57:58,938][06641] Num frames 3000...
-[2025-03-15 15:57:59,046][06641] Num frames 3100...
-[2025-03-15 15:57:59,150][06641] Num frames 3200...
-[2025-03-15 15:57:59,254][06641] Num frames 3300...
-[2025-03-15 15:57:59,359][06641] Num frames 3400...
-[2025-03-15 15:57:59,463][06641] Num frames 3500...
-[2025-03-15 15:57:59,555][06641] Avg episode rewards: #0: 28.793, true rewards: #0: 11.793
-[2025-03-15 15:57:59,556][06641] Avg episode reward: 28.793, avg true_objective: 11.793
-[2025-03-15 15:57:59,637][06641] Num frames 3600...
-[2025-03-15 15:57:59,742][06641] Num frames 3700...
-[2025-03-15 15:57:59,855][06641] Num frames 3800...
-[2025-03-15 15:57:59,963][06641] Num frames 3900...
-[2025-03-15 15:58:00,107][06641] Avg episode rewards: #0: 23.715, true rewards: #0: 9.965
-[2025-03-15 15:58:00,108][06641] Avg episode reward: 23.715, avg true_objective: 9.965
-[2025-03-15 15:58:00,127][06641] Num frames 4000...
-[2025-03-15 15:58:00,237][06641] Num frames 4100...
-[2025-03-15 15:58:00,342][06641] Num frames 4200...
-[2025-03-15 15:58:00,445][06641] Num frames 4300...
-[2025-03-15 15:58:00,550][06641] Num frames 4400...
-[2025-03-15 15:58:00,663][06641] Num frames 4500...
-[2025-03-15 15:58:00,773][06641] Num frames 4600...
-[2025-03-15 15:58:00,880][06641] Num frames 4700...
-[2025-03-15 15:58:00,987][06641] Num frames 4800...
-[2025-03-15 15:58:01,091][06641] Num frames 4900...
-[2025-03-15 15:58:01,195][06641] Num frames 5000...
-[2025-03-15 15:58:01,303][06641] Num frames 5100...
-[2025-03-15 15:58:01,408][06641] Num frames 5200...
-[2025-03-15 15:58:01,512][06641] Num frames 5300...
-[2025-03-15 15:58:01,616][06641] Num frames 5400...
-[2025-03-15 15:58:01,720][06641] Num frames 5500...
-[2025-03-15 15:58:01,825][06641] Num frames 5600...
-[2025-03-15 15:58:01,936][06641] Num frames 5700...
-[2025-03-15 15:58:02,006][06641] Avg episode rewards: #0: 27.628, true rewards: #0: 11.428
-[2025-03-15 15:58:02,008][06641] Avg episode reward: 27.628, avg true_objective: 11.428
-[2025-03-15 15:58:02,112][06641] Num frames 5800...
-[2025-03-15 15:58:02,216][06641] Num frames 5900...
-[2025-03-15 15:58:02,324][06641] Num frames 6000...
-[2025-03-15 15:58:02,435][06641] Num frames 6100...
-[2025-03-15 15:58:02,547][06641] Num frames 6200...
-[2025-03-15 15:58:02,655][06641] Num frames 6300...
-[2025-03-15 15:58:02,763][06641] Num frames 6400...
-[2025-03-15 15:58:02,871][06641] Num frames 6500...
-[2025-03-15 15:58:02,998][06641] Num frames 6600...
-[2025-03-15 15:58:03,112][06641] Num frames 6700...
-[2025-03-15 15:58:03,226][06641] Num frames 6800...
-[2025-03-15 15:58:03,342][06641] Num frames 6900...
-[2025-03-15 15:58:03,463][06641] Num frames 7000...
-[2025-03-15 15:58:03,584][06641] Num frames 7100...
-[2025-03-15 15:58:03,702][06641] Num frames 7200...
-[2025-03-15 15:58:03,819][06641] Num frames 7300...
-[2025-03-15 15:58:03,939][06641] Num frames 7400...
-[2025-03-15 15:58:04,054][06641] Num frames 7500...
-[2025-03-15 15:58:04,208][06641] Avg episode rewards: #0: 31.480, true rewards: #0: 12.647
-[2025-03-15 15:58:04,210][06641] Avg episode reward: 31.480, avg true_objective: 12.647
-[2025-03-15 15:58:04,243][06641] Num frames 7600...
-[2025-03-15 15:58:04,359][06641] Num frames 7700...
-[2025-03-15 15:58:04,462][06641] Num frames 7800...
-[2025-03-15 15:58:04,567][06641] Num frames 7900...
-[2025-03-15 15:58:04,666][06641] Avg episode rewards: #0: 27.628, true rewards: #0: 11.343
-[2025-03-15 15:58:04,668][06641] Avg episode reward: 27.628, avg true_objective: 11.343
-[2025-03-15 15:58:04,762][06641] Num frames 8000...
-[2025-03-15 15:58:04,866][06641] Num frames 8100...
-[2025-03-15 15:58:04,969][06641] Num frames 8200...
-[2025-03-15 15:58:05,074][06641] Num frames 8300...
-[2025-03-15 15:58:05,179][06641] Num frames 8400...
-[2025-03-15 15:58:05,283][06641] Num frames 8500...
-[2025-03-15 15:58:05,387][06641] Num frames 8600...
-[2025-03-15 15:58:05,491][06641] Num frames 8700...
-[2025-03-15 15:58:05,595][06641] Num frames 8800...
-[2025-03-15 15:58:05,698][06641] Num frames 8900...
-[2025-03-15 15:58:05,804][06641] Num frames 9000...
-[2025-03-15 15:58:05,909][06641] Num frames 9100...
-[2025-03-15 15:58:06,017][06641] Num frames 9200...
-[2025-03-15 15:58:06,119][06641] Num frames 9300...
-[2025-03-15 15:58:06,226][06641] Num frames 9400...
-[2025-03-15 15:58:06,332][06641] Num frames 9500...
-[2025-03-15 15:58:06,437][06641] Num frames 9600...
-[2025-03-15 15:58:06,545][06641] Avg episode rewards: #0: 29.816, true rewards: #0: 12.066
-[2025-03-15 15:58:06,546][06641] Avg episode reward: 29.816, avg true_objective: 12.066
-[2025-03-15 15:58:06,613][06641] Num frames 9700...
-[2025-03-15 15:58:06,717][06641] Num frames 9800...
-[2025-03-15 15:58:06,820][06641] Num frames 9900...
-[2025-03-15 15:58:06,925][06641] Num frames 10000...
-[2025-03-15 15:58:07,029][06641] Num frames 10100...
-[2025-03-15 15:58:07,133][06641] Num frames 10200...
-[2025-03-15 15:58:07,237][06641] Num frames 10300...
-[2025-03-15 15:58:07,342][06641] Num frames 10400...
-[2025-03-15 15:58:07,446][06641] Num frames 10500...
-[2025-03-15 15:58:07,549][06641] Num frames 10600...
-[2025-03-15 15:58:07,669][06641] Num frames 10700...
-[2025-03-15 15:58:07,815][06641] Avg episode rewards: #0: 29.859, true rewards: #0: 11.970
-[2025-03-15 15:58:07,817][06641] Avg episode reward: 29.859, avg true_objective: 11.970
-[2025-03-15 15:58:07,878][06641] Num frames 10800...
-[2025-03-15 15:58:07,982][06641] Num frames 10900...
-[2025-03-15 15:58:08,085][06641] Num frames 11000...
-[2025-03-15 15:58:08,191][06641] Num frames 11100...
-[2025-03-15 15:58:08,295][06641] Num frames 11200...
-[2025-03-15 15:58:08,398][06641] Num frames 11300...
-[2025-03-15 15:58:08,502][06641] Num frames 11400...
-[2025-03-15 15:58:08,608][06641] Num frames 11500...
-[2025-03-15 15:58:08,713][06641] Num frames 11600...
-[2025-03-15 15:58:08,818][06641] Num frames 11700...
-[2025-03-15 15:58:08,925][06641] Num frames 11800...
-[2025-03-15 15:58:09,029][06641] Num frames 11900...
-[2025-03-15 15:58:09,133][06641] Num frames 12000...
-[2025-03-15 15:58:09,209][06641] Avg episode rewards: #0: 29.521, true rewards: #0: 12.021
-[2025-03-15 15:58:09,210][06641] Avg episode reward: 29.521, avg true_objective: 12.021
-[2025-03-15 15:58:33,751][06641] Replay video saved to /home/aa/Downloads/train_dir/default_experiment/replay.mp4!
-[2025-03-15 15:58:35,081][06641] Loading existing experiment configuration from /home/aa/Downloads/train_dir/default_experiment/config.json
-[2025-03-15 15:58:35,082][06641] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-03-15 15:58:35,082][06641] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-03-15 15:58:35,083][06641] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-03-15 15:58:35,083][06641] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-03-15 15:58:35,084][06641] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-03-15 15:58:35,084][06641] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2025-03-15 15:58:35,085][06641] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-03-15 15:58:35,086][06641] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2025-03-15 15:58:35,087][06641] Adding new argument 'hf_repository'='ALEXIOSTER/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2025-03-15 15:58:35,087][06641] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-03-15 15:58:35,088][06641] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-03-15 15:58:35,088][06641] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-03-15 15:58:35,088][06641] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-03-15 15:58:35,090][06641] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-03-15 15:58:35,113][06641] RunningMeanStd input shape: (3, 72, 128)
-[2025-03-15 15:58:35,115][06641] RunningMeanStd input shape: (1,)
-[2025-03-15 15:58:35,123][06641] ConvEncoder: input_channels=3
-[2025-03-15 15:58:35,163][06641] Conv encoder output size: 512
-[2025-03-15 15:58:35,164][06641] Policy head output size: 512
-[2025-03-15 15:58:35,198][06641] Loading state from checkpoint /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-03-15 15:58:35,641][06641] Num frames 100...
-[2025-03-15 15:58:35,782][06641] Num frames 200...
-[2025-03-15 15:58:35,909][06641] Num frames 300...
-[2025-03-15 15:58:36,043][06641] Num frames 400...
-[2025-03-15 15:58:36,179][06641] Num frames 500...
-[2025-03-15 15:58:36,322][06641] Num frames 600...
-[2025-03-15 15:58:36,431][06641] Avg episode rewards: #0: 13.400, true rewards: #0: 6.400
-[2025-03-15 15:58:36,431][06641] Avg episode reward: 13.400, avg true_objective: 6.400
-[2025-03-15 15:58:36,503][06641] Num frames 700...
-[2025-03-15 15:58:36,619][06641] Num frames 800...
-[2025-03-15 15:58:36,730][06641] Num frames 900...
-[2025-03-15 15:58:36,839][06641] Num frames 1000...
-[2025-03-15 15:58:36,956][06641] Num frames 1100...
-[2025-03-15 15:58:37,090][06641] Num frames 1200...
-[2025-03-15 15:58:37,208][06641] Num frames 1300...
-[2025-03-15 15:58:37,336][06641] Num frames 1400...
-[2025-03-15 15:58:37,454][06641] Num frames 1500...
-[2025-03-15 15:58:37,552][06641] Avg episode rewards: #0: 18.180, true rewards: #0: 7.680
-[2025-03-15 15:58:37,552][06641] Avg episode reward: 18.180, avg true_objective: 7.680
-[2025-03-15 15:58:37,627][06641] Num frames 1600...
-[2025-03-15 15:58:37,737][06641] Num frames 1700...
-[2025-03-15 15:58:37,856][06641] Num frames 1800...
-[2025-03-15 15:58:37,967][06641] Num frames 1900...
-[2025-03-15 15:58:38,075][06641] Num frames 2000...
-[2025-03-15 15:58:38,189][06641] Num frames 2100...
-[2025-03-15 15:58:38,312][06641] Num frames 2200...
-[2025-03-15 15:58:38,417][06641] Num frames 2300...
-[2025-03-15 15:58:38,526][06641] Num frames 2400...
-[2025-03-15 15:58:38,638][06641] Num frames 2500...
-[2025-03-15 15:58:38,756][06641] Num frames 2600...
-[2025-03-15 15:58:38,876][06641] Num frames 2700...
-[2025-03-15 15:58:38,991][06641] Num frames 2800...
-[2025-03-15 15:58:39,104][06641] Num frames 2900...
-[2025-03-15 15:58:39,229][06641] Num frames 3000...
-[2025-03-15 15:58:39,348][06641] Num frames 3100...
-[2025-03-15 15:58:39,463][06641] Num frames 3200...
-[2025-03-15 15:58:39,625][06641] Avg episode rewards: #0: 27.320, true rewards: #0: 10.987
-[2025-03-15 15:58:39,626][06641] Avg episode reward: 27.320, avg true_objective: 10.987
-[2025-03-15 15:58:39,634][06641] Num frames 3300...
-[2025-03-15 15:58:39,755][06641] Num frames 3400...
-[2025-03-15 15:58:39,866][06641] Num frames 3500...
-[2025-03-15 15:58:39,975][06641] Num frames 3600...
-[2025-03-15 15:58:40,097][06641] Num frames 3700...
-[2025-03-15 15:58:40,210][06641] Num frames 3800...
-[2025-03-15 15:58:40,320][06641] Num frames 3900...
-[2025-03-15 15:58:40,430][06641] Num frames 4000...
-[2025-03-15 15:58:40,546][06641] Num frames 4100...
-[2025-03-15 15:58:40,707][06641] Avg episode rewards: #0: 25.230, true rewards: #0: 10.480
-[2025-03-15 15:58:40,710][06641] Avg episode reward: 25.230, avg true_objective: 10.480
-[2025-03-15 15:58:40,737][06641] Num frames 4200...
-[2025-03-15 15:58:40,855][06641] Num frames 4300...
-[2025-03-15 15:58:40,967][06641] Num frames 4400...
-[2025-03-15 15:58:41,072][06641] Num frames 4500...
-[2025-03-15 15:58:41,187][06641] Num frames 4600...
-[2025-03-15 15:58:41,301][06641] Num frames 4700...
-[2025-03-15 15:58:41,396][06641] Avg episode rewards: #0: 21.672, true rewards: #0: 9.472
-[2025-03-15 15:58:41,396][06641] Avg episode reward: 21.672, avg true_objective: 9.472
-[2025-03-15 15:58:41,487][06641] Num frames 4800...
-[2025-03-15 15:58:41,609][06641] Num frames 4900...
-[2025-03-15 15:58:41,729][06641] Num frames 5000...
-[2025-03-15 15:58:41,850][06641] Num frames 5100...
-[2025-03-15 15:58:42,004][06641] Avg episode rewards: #0: 19.473, true rewards: #0: 8.640
-[2025-03-15 15:58:42,005][06641] Avg episode reward: 19.473, avg true_objective: 8.640
-[2025-03-15 15:58:42,026][06641] Num frames 5200...
-[2025-03-15 15:58:42,140][06641] Num frames 5300...
-[2025-03-15 15:58:42,247][06641] Num frames 5400...
-[2025-03-15 15:58:42,360][06641] Num frames 5500...
-[2025-03-15 15:58:42,475][06641] Num frames 5600...
-[2025-03-15 15:58:42,596][06641] Num frames 5700...
-[2025-03-15 15:58:42,709][06641] Num frames 5800...
-[2025-03-15 15:58:42,825][06641] Num frames 5900...
-[2025-03-15 15:58:42,943][06641] Num frames 6000...
-[2025-03-15 15:58:43,065][06641] Num frames 6100...
-[2025-03-15 15:58:43,184][06641] Num frames 6200...
-[2025-03-15 15:58:43,293][06641] Num frames 6300...
-[2025-03-15 15:58:43,415][06641] Num frames 6400...
-[2025-03-15 15:58:43,522][06641] Num frames 6500...
-[2025-03-15 15:58:43,639][06641] Num frames 6600...
-[2025-03-15 15:58:43,757][06641] Num frames 6700...
-[2025-03-15 15:58:43,875][06641] Num frames 6800...
-[2025-03-15 15:58:43,985][06641] Num frames 6900...
-[2025-03-15 15:58:44,098][06641] Num frames 7000...
-[2025-03-15 15:58:44,210][06641] Num frames 7100...
-[2025-03-15 15:58:44,329][06641] Num frames 7200...
-[2025-03-15 15:58:44,479][06641] Avg episode rewards: #0: 24.406, true rewards: #0: 10.406
-[2025-03-15 15:58:44,479][06641] Avg episode reward: 24.406, avg true_objective: 10.406
-[2025-03-15 15:58:44,508][06641] Num frames 7300...
-[2025-03-15 15:58:44,632][06641] Num frames 7400...
-[2025-03-15 15:58:44,737][06641] Num frames 7500...
-[2025-03-15 15:58:44,846][06641] Num frames 7600...
-[2025-03-15 15:58:44,955][06641] Num frames 7700...
-[2025-03-15 15:58:45,112][06641] Avg episode rewards: #0: 22.370, true rewards: #0: 9.745
-[2025-03-15 15:58:45,112][06641] Avg episode reward: 22.370, avg true_objective: 9.745
-[2025-03-15 15:58:45,119][06641] Num frames 7800...
-[2025-03-15 15:58:45,224][06641] Num frames 7900...
-[2025-03-15 15:58:45,330][06641] Num frames 8000...
-[2025-03-15 15:58:45,441][06641] Num frames 8100...
-[2025-03-15 15:58:45,546][06641] Num frames 8200...
-[2025-03-15 15:58:45,651][06641] Num frames 8300...
-[2025-03-15 15:58:45,758][06641] Num frames 8400...
-[2025-03-15 15:58:45,868][06641] Num frames 8500...
-[2025-03-15 15:58:45,974][06641] Num frames 8600...
-[2025-03-15 15:58:46,083][06641] Num frames 8700...
-[2025-03-15 15:58:46,199][06641] Avg episode rewards: #0: 22.062, true rewards: #0: 9.729
-[2025-03-15 15:58:46,199][06641] Avg episode reward: 22.062, avg true_objective: 9.729
-[2025-03-15 15:58:46,279][06641] Num frames 8800...
-[2025-03-15 15:58:46,384][06641] Num frames 8900...
-[2025-03-15 15:58:46,493][06641] Num frames 9000...
-[2025-03-15 15:58:46,605][06641] Num frames 9100...
-[2025-03-15 15:58:46,722][06641] Num frames 9200...
-[2025-03-15 15:58:46,830][06641] Num frames 9300...
-[2025-03-15 15:58:46,940][06641] Num frames 9400...
-[2025-03-15 15:58:47,053][06641] Num frames 9500...
-[2025-03-15 15:58:47,161][06641] Num frames 9600...
-[2025-03-15 15:58:47,270][06641] Num frames 9700...
-[2025-03-15 15:58:47,385][06641] Num frames 9800...
-[2025-03-15 15:58:47,499][06641] Num frames 9900...
-[2025-03-15 15:58:47,618][06641] Num frames 10000...
-[2025-03-15 15:58:47,742][06641] Num frames 10100...
-[2025-03-15 15:58:47,851][06641] Num frames 10200...
-[2025-03-15 15:58:47,970][06641] Num frames 10300...
-[2025-03-15 15:58:48,091][06641] Num frames 10400...
-[2025-03-15 15:58:48,189][06641] Avg episode rewards: #0: 24.037, true rewards: #0: 10.437
-[2025-03-15 15:58:48,190][06641] Avg episode reward: 24.037, avg true_objective: 10.437
-[2025-03-15 15:59:10,238][06641] Replay video saved to /home/aa/Downloads/train_dir/default_experiment/replay.mp4!
+  wait_policy_total: 10.2723
+update_model: 9.4224
+  weight_update: 0.0013
+one_step: 0.0040
+  handle_policy_step: 565.7002
+    deserialize: 24.9351, stack: 3.3637, obs_to_device_normalize: 133.9778, forward: 255.6293, send_messages: 41.0534
+    prepare_outputs: 85.3800
+      to_cpu: 58.8882
+[2025-03-15 16:11:02,283][09281] Learner 0 profile tree view:
+misc: 0.0093, prepare_batch: 23.1607
+train: 96.1219
+  epoch_init: 0.0102, minibatch_init: 0.0144, losses_postprocess: 1.1455, kl_divergence: 0.9498, after_optimizer: 41.9016
+  calculate_losses: 32.7948
+    losses_init: 0.0075, forward_head: 1.8398, bptt_initial: 22.8006, tail: 1.4916, advantages_returns: 0.4144, losses: 3.2450
+    bptt: 2.5851
+      bptt_forward_core: 2.4455
+  update: 18.4554
+    clip: 1.9753
+[2025-03-15 16:11:02,284][09281] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.3951, enqueue_policy_requests: 26.6558, env_step: 337.6223, overhead: 17.8972, complete_rollouts: 0.5986
+save_policy_outputs: 29.5487
+  split_output_tensors: 9.6943
+[2025-03-15 16:11:02,285][09281] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.3968, enqueue_policy_requests: 26.7717, env_step: 337.7451, overhead: 17.6112, complete_rollouts: 0.5969
+save_policy_outputs: 29.5733
+  split_output_tensors: 9.6699
+[2025-03-15 16:11:02,285][09281] Loop Runner_EvtLoop terminating...
+[2025-03-15 16:11:02,286][09281] Runner profile tree view:
+main_loop: 620.0620
+[2025-03-15 16:11:02,287][09281] Collected {0: 10006528}, FPS: 16137.9
+[2025-03-15 16:11:02,311][09281] Loading existing experiment configuration from /home/aa/Downloads/train_dir/default_experiment/config.json
+[2025-03-15 16:11:02,312][09281] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-03-15 16:11:02,312][09281] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-03-15 16:11:02,313][09281] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-03-15 16:11:02,313][09281] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-03-15 16:11:02,314][09281] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-03-15 16:11:02,314][09281] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2025-03-15 16:11:02,315][09281] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-03-15 16:11:02,315][09281] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2025-03-15 16:11:02,317][09281] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2025-03-15 16:11:02,317][09281] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-03-15 16:11:02,317][09281] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-03-15 16:11:02,318][09281] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-03-15 16:11:02,318][09281] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-03-15 16:11:02,319][09281] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-03-15 16:11:02,339][09281] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-03-15 16:11:02,341][09281] RunningMeanStd input shape: (3, 72, 128)
+[2025-03-15 16:11:02,342][09281] RunningMeanStd input shape: (1,)
+[2025-03-15 16:11:02,351][09281] ConvEncoder: input_channels=3
+[2025-03-15 16:11:02,442][09281] Conv encoder output size: 512
+[2025-03-15 16:11:02,443][09281] Policy head output size: 512
+[2025-03-15 16:11:02,571][09281] Loading state from checkpoint /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
+[2025-03-15 16:11:03,194][09281] Num frames 100...
+[2025-03-15 16:11:03,291][09281] Num frames 200...
+[2025-03-15 16:11:03,384][09281] Num frames 300...
+[2025-03-15 16:11:03,478][09281] Num frames 400...
+[2025-03-15 16:11:03,570][09281] Num frames 500...
+[2025-03-15 16:11:03,662][09281] Num frames 600...
+[2025-03-15 16:11:03,756][09281] Num frames 700...
+[2025-03-15 16:11:03,850][09281] Num frames 800...
+[2025-03-15 16:11:03,944][09281] Num frames 900...
+[2025-03-15 16:11:04,039][09281] Num frames 1000...
+[2025-03-15 16:11:04,151][09281] Num frames 1100...
+[2025-03-15 16:11:04,254][09281] Num frames 1200...
+[2025-03-15 16:11:04,359][09281] Num frames 1300...
+[2025-03-15 16:11:04,456][09281] Num frames 1400...
+[2025-03-15 16:11:04,556][09281] Num frames 1500...
+[2025-03-15 16:11:04,660][09281] Num frames 1600...
+[2025-03-15 16:11:04,765][09281] Num frames 1700...
+[2025-03-15 16:11:04,868][09281] Num frames 1800...
+[2025-03-15 16:11:04,980][09281] Avg episode rewards: #0: 48.559, true rewards: #0: 18.560
+[2025-03-15 16:11:04,980][09281] Avg episode reward: 48.559, avg true_objective: 18.560
+[2025-03-15 16:11:05,064][09281] Num frames 1900...
+[2025-03-15 16:11:05,159][09281] Num frames 2000...
+[2025-03-15 16:11:05,259][09281] Num frames 2100...
+[2025-03-15 16:11:05,359][09281] Num frames 2200...
+[2025-03-15 16:11:05,460][09281] Num frames 2300...
+[2025-03-15 16:11:05,560][09281] Num frames 2400...
+[2025-03-15 16:11:05,661][09281] Num frames 2500...
+[2025-03-15 16:11:05,762][09281] Num frames 2600...
+[2025-03-15 16:11:05,863][09281] Num frames 2700...
+[2025-03-15 16:11:05,964][09281] Num frames 2800...
+[2025-03-15 16:11:06,068][09281] Num frames 2900...
+[2025-03-15 16:11:06,170][09281] Num frames 3000...
+[2025-03-15 16:11:06,273][09281] Num frames 3100...
+[2025-03-15 16:11:06,373][09281] Num frames 3200...
+[2025-03-15 16:11:06,476][09281] Num frames 3300...
+[2025-03-15 16:11:06,580][09281] Num frames 3400...
+[2025-03-15 16:11:06,682][09281] Num frames 3500...
+[2025-03-15 16:11:06,784][09281] Num frames 3600...
+[2025-03-15 16:11:06,886][09281] Num frames 3700...
+[2025-03-15 16:11:06,989][09281] Num frames 3800...
+[2025-03-15 16:11:07,094][09281] Num frames 3900...
+[2025-03-15 16:11:07,204][09281] Avg episode rewards: #0: 53.779, true rewards: #0: 19.780
+[2025-03-15 16:11:07,205][09281] Avg episode reward: 53.779, avg true_objective: 19.780
+[2025-03-15 16:11:07,273][09281] Num frames 4000...
+[2025-03-15 16:11:07,369][09281] Num frames 4100...
+[2025-03-15 16:11:07,470][09281] Num frames 4200...
+[2025-03-15 16:11:07,570][09281] Num frames 4300...
+[2025-03-15 16:11:07,674][09281] Num frames 4400...
+[2025-03-15 16:11:07,775][09281] Num frames 4500...
+[2025-03-15 16:11:07,877][09281] Num frames 4600...
+[2025-03-15 16:11:07,979][09281] Num frames 4700...
+[2025-03-15 16:11:08,080][09281] Num frames 4800...
+[2025-03-15 16:11:08,183][09281] Num frames 4900...
+[2025-03-15 16:11:08,285][09281] Num frames 5000...
+[2025-03-15 16:11:08,387][09281] Num frames 5100...
+[2025-03-15 16:11:08,490][09281] Num frames 5200...
+[2025-03-15 16:11:08,592][09281] Num frames 5300...
+[2025-03-15 16:11:08,695][09281] Num frames 5400...
+[2025-03-15 16:11:08,794][09281] Avg episode rewards: #0: 47.813, true rewards: #0: 18.147
+[2025-03-15 16:11:08,796][09281] Avg episode reward: 47.813, avg true_objective: 18.147
+[2025-03-15 16:11:08,893][09281] Num frames 5500...
+[2025-03-15 16:11:08,988][09281] Num frames 5600...
+[2025-03-15 16:11:09,088][09281] Num frames 5700...
+[2025-03-15 16:11:09,189][09281] Num frames 5800...
+[2025-03-15 16:11:09,253][09281] Avg episode rewards: #0: 37.274, true rewards: #0: 14.525
+[2025-03-15 16:11:09,254][09281] Avg episode reward: 37.274, avg true_objective: 14.525
+[2025-03-15 16:11:09,362][09281] Num frames 5900...
+[2025-03-15 16:11:09,462][09281] Num frames 6000...
+[2025-03-15 16:11:09,562][09281] Num frames 6100...
+[2025-03-15 16:11:09,661][09281] Num frames 6200...
+[2025-03-15 16:11:09,763][09281] Num frames 6300...
+[2025-03-15 16:11:09,873][09281] Num frames 6400...
+[2025-03-15 16:11:09,972][09281] Num frames 6500...
+[2025-03-15 16:11:10,071][09281] Num frames 6600...
+[2025-03-15 16:11:10,180][09281] Num frames 6700...
+[2025-03-15 16:11:10,282][09281] Num frames 6800...
+[2025-03-15 16:11:10,390][09281] Num frames 6900...
+[2025-03-15 16:11:10,491][09281] Num frames 7000...
+[2025-03-15 16:11:10,571][09281] Avg episode rewards: #0: 36.254, true rewards: #0: 14.054
+[2025-03-15 16:11:10,572][09281] Avg episode reward: 36.254, avg true_objective: 14.054
+[2025-03-15 16:11:10,660][09281] Num frames 7100...
+[2025-03-15 16:11:10,757][09281] Num frames 7200...
+[2025-03-15 16:11:10,858][09281] Num frames 7300...
+[2025-03-15 16:11:10,957][09281] Num frames 7400...
+[2025-03-15 16:11:11,056][09281] Num frames 7500...
+[2025-03-15 16:11:11,156][09281] Num frames 7600...
+[2025-03-15 16:11:11,257][09281] Num frames 7700...
+[2025-03-15 16:11:11,357][09281] Num frames 7800...
+[2025-03-15 16:11:11,452][09281] Num frames 7900...
+[2025-03-15 16:11:11,549][09281] Num frames 8000...
+[2025-03-15 16:11:11,644][09281] Num frames 8100...
+[2025-03-15 16:11:11,740][09281] Num frames 8200...
+[2025-03-15 16:11:11,839][09281] Num frames 8300...
+[2025-03-15 16:11:11,936][09281] Num frames 8400...
+[2025-03-15 16:11:12,053][09281] Avg episode rewards: #0: 36.278, true rewards: #0: 14.112
+[2025-03-15 16:11:12,054][09281] Avg episode reward: 36.278, avg true_objective: 14.112
+[2025-03-15 16:11:12,110][09281] Num frames 8500...
+[2025-03-15 16:11:12,205][09281] Num frames 8600...
+[2025-03-15 16:11:12,300][09281] Num frames 8700...
+[2025-03-15 16:11:12,396][09281] Num frames 8800...
+[2025-03-15 16:11:12,491][09281] Num frames 8900...
+[2025-03-15 16:11:12,585][09281] Num frames 9000...
+[2025-03-15 16:11:12,679][09281] Num frames 9100...
+[2025-03-15 16:11:12,776][09281] Num frames 9200...
+[2025-03-15 16:11:12,872][09281] Num frames 9300...
+[2025-03-15 16:11:12,967][09281] Num frames 9400...
+[2025-03-15 16:11:13,062][09281] Num frames 9500...
+[2025-03-15 16:11:13,157][09281] Num frames 9600...
+[2025-03-15 16:11:13,254][09281] Num frames 9700...
+[2025-03-15 16:11:13,349][09281] Num frames 9800...
+[2025-03-15 16:11:13,445][09281] Num frames 9900...
+[2025-03-15 16:11:13,541][09281] Num frames 10000...
+[2025-03-15 16:11:13,636][09281] Num frames 10100...
+[2025-03-15 16:11:13,731][09281] Num frames 10200...
+[2025-03-15 16:11:13,827][09281] Num frames 10300...
+[2025-03-15 16:11:13,922][09281] Num frames 10400...
+[2025-03-15 16:11:14,016][09281] Num frames 10500...
+[2025-03-15 16:11:14,133][09281] Avg episode rewards: #0: 39.095, true rewards: #0: 15.096
+[2025-03-15 16:11:14,134][09281] Avg episode reward: 39.095, avg true_objective: 15.096
+[2025-03-15 16:11:14,184][09281] Num frames 10600...
+[2025-03-15 16:11:14,279][09281] Num frames 10700...
+[2025-03-15 16:11:14,373][09281] Num frames 10800...
+[2025-03-15 16:11:14,465][09281] Num frames 10900...
+[2025-03-15 16:11:14,561][09281] Num frames 11000...
+[2025-03-15 16:11:14,659][09281] Avg episode rewards: #0: 35.808, true rewards: #0: 13.809
+[2025-03-15 16:11:14,660][09281] Avg episode reward: 35.808, avg true_objective: 13.809
+[2025-03-15 16:11:14,732][09281] Num frames 11100...
+[2025-03-15 16:11:14,824][09281] Num frames 11200...
+[2025-03-15 16:11:14,917][09281] Num frames 11300...
+[2025-03-15 16:11:15,010][09281] Num frames 11400...
+[2025-03-15 16:11:15,102][09281] Num frames 11500...
+[2025-03-15 16:11:15,195][09281] Num frames 11600...
+[2025-03-15 16:11:15,290][09281] Num frames 11700...
+[2025-03-15 16:11:15,383][09281] Num frames 11800...
+[2025-03-15 16:11:15,476][09281] Num frames 11900...
+[2025-03-15 16:11:15,571][09281] Num frames 12000...
+[2025-03-15 16:11:15,666][09281] Num frames 12100...
+[2025-03-15 16:11:15,723][09281] Avg episode rewards: #0: 34.781, true rewards: #0: 13.448
+[2025-03-15 16:11:15,725][09281] Avg episode reward: 34.781, avg true_objective: 13.448
+[2025-03-15 16:11:15,829][09281] Num frames 12200...
+[2025-03-15 16:11:15,920][09281] Num frames 12300...
+[2025-03-15 16:11:16,010][09281] Num frames 12400...
+[2025-03-15 16:11:16,101][09281] Num frames 12500...
+[2025-03-15 16:11:16,192][09281] Num frames 12600...
+[2025-03-15 16:11:16,259][09281] Avg episode rewards: #0: 32.215, true rewards: #0: 12.615
+[2025-03-15 16:11:16,260][09281] Avg episode reward: 32.215, avg true_objective: 12.615
+[2025-03-15 16:11:40,656][09281] Replay video saved to /home/aa/Downloads/train_dir/default_experiment/replay.mp4!
+[2025-03-15 16:11:41,962][09281] Loading existing experiment configuration from /home/aa/Downloads/train_dir/default_experiment/config.json
+[2025-03-15 16:11:41,962][09281] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-03-15 16:11:41,963][09281] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-03-15 16:11:41,964][09281] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-03-15 16:11:41,964][09281] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-03-15 16:11:41,965][09281] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-03-15 16:11:41,965][09281] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2025-03-15 16:11:41,966][09281] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-03-15 16:11:41,966][09281] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2025-03-15 16:11:41,966][09281] Adding new argument 'hf_repository'='ALEXIOSTER/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2025-03-15 16:11:41,967][09281] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-03-15 16:11:41,968][09281] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-03-15 16:11:41,969][09281] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-03-15 16:11:41,969][09281] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-03-15 16:11:41,970][09281] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-03-15 16:11:41,990][09281] RunningMeanStd input shape: (3, 72, 128)
+[2025-03-15 16:11:41,991][09281] RunningMeanStd input shape: (1,)
+[2025-03-15 16:11:42,000][09281] ConvEncoder: input_channels=3
+[2025-03-15 16:11:42,039][09281] Conv encoder output size: 512
+[2025-03-15 16:11:42,039][09281] Policy head output size: 512
+[2025-03-15 16:11:42,072][09281] Loading state from checkpoint /home/aa/Downloads/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
+[2025-03-15 16:11:42,484][09281] Num frames 100...
+[2025-03-15 16:11:42,588][09281] Num frames 200...
+[2025-03-15 16:11:42,693][09281] Num frames 300...
+[2025-03-15 16:11:42,800][09281] Num frames 400...
+[2025-03-15 16:11:42,902][09281] Num frames 500...
+[2025-03-15 16:11:43,001][09281] Num frames 600...
+[2025-03-15 16:11:43,102][09281] Num frames 700...
+[2025-03-15 16:11:43,208][09281] Num frames 800...
+[2025-03-15 16:11:43,309][09281] Num frames 900...
+[2025-03-15 16:11:43,411][09281] Num frames 1000...
+[2025-03-15 16:11:43,511][09281] Num frames 1100...
+[2025-03-15 16:11:43,610][09281] Num frames 1200...
+[2025-03-15 16:11:43,707][09281] Num frames 1300...
+[2025-03-15 16:11:43,803][09281] Num frames 1400...
+[2025-03-15 16:11:43,910][09281] Num frames 1500...
+[2025-03-15 16:11:44,024][09281] Num frames 1600...
+[2025-03-15 16:11:44,128][09281] Num frames 1700...
+[2025-03-15 16:11:44,228][09281] Num frames 1800...
+[2025-03-15 16:11:44,364][09281] Avg episode rewards: #0: 47.779, true rewards: #0: 18.780
+[2025-03-15 16:11:44,364][09281] Avg episode reward: 47.779, avg true_objective: 18.780
+[2025-03-15 16:11:44,399][09281] Num frames 1900...
+[2025-03-15 16:11:44,505][09281] Num frames 2000...
+[2025-03-15 16:11:44,614][09281] Num frames 2100...
+[2025-03-15 16:11:44,718][09281] Num frames 2200...
+[2025-03-15 16:11:44,822][09281] Num frames 2300...
+[2025-03-15 16:11:44,922][09281] Num frames 2400...
+[2025-03-15 16:11:45,022][09281] Num frames 2500...
+[2025-03-15 16:11:45,124][09281] Num frames 2600...
+[2025-03-15 16:11:45,225][09281] Num frames 2700...
+[2025-03-15 16:11:45,326][09281] Num frames 2800...
+[2025-03-15 16:11:45,428][09281] Num frames 2900...
+[2025-03-15 16:11:45,534][09281] Num frames 3000...
+[2025-03-15 16:11:45,637][09281] Num frames 3100...
+[2025-03-15 16:11:45,753][09281] Num frames 3200...
+[2025-03-15 16:11:45,852][09281] Num frames 3300...
+[2025-03-15 16:11:45,962][09281] Num frames 3400...
+[2025-03-15 16:11:46,065][09281] Num frames 3500...
+[2025-03-15 16:11:46,166][09281] Num frames 3600...
+[2025-03-15 16:11:46,273][09281] Num frames 3700...
+[2025-03-15 16:11:46,371][09281] Num frames 3800...
+[2025-03-15 16:11:46,470][09281] Num frames 3900...
+[2025-03-15 16:11:46,596][09281] Avg episode rewards: #0: 53.889, true rewards: #0: 19.890
+[2025-03-15 16:11:46,597][09281] Avg episode reward: 53.889, avg true_objective: 19.890
+[2025-03-15 16:11:46,632][09281] Num frames 4000...
+[2025-03-15 16:11:46,736][09281] Num frames 4100...
+[2025-03-15 16:11:46,831][09281] Num frames 4200...
+[2025-03-15 16:11:46,938][09281] Num frames 4300...
+[2025-03-15 16:11:47,040][09281] Num frames 4400...
+[2025-03-15 16:11:47,146][09281] Num frames 4500...
+[2025-03-15 16:11:47,249][09281] Num frames 4600...
+[2025-03-15 16:11:47,352][09281] Num frames 4700...
+[2025-03-15 16:11:47,453][09281] Avg episode rewards: #0: 41.819, true rewards: #0: 15.820
+[2025-03-15 16:11:47,454][09281] Avg episode reward: 41.819, avg true_objective: 15.820
+[2025-03-15 16:11:47,533][09281] Num frames 4800...
+[2025-03-15 16:11:47,629][09281] Num frames 4900...
+[2025-03-15 16:11:47,728][09281] Num frames 5000...
+[2025-03-15 16:11:47,877][09281] Avg episode rewards: #0: 32.995, true rewards: #0: 12.745
+[2025-03-15 16:11:47,878][09281] Avg episode reward: 32.995, avg true_objective: 12.745
+[2025-03-15 16:11:47,881][09281] Num frames 5100...
+[2025-03-15 16:11:47,994][09281] Num frames 5200...
+[2025-03-15 16:11:48,092][09281] Num frames 5300...
+[2025-03-15 16:11:48,189][09281] Num frames 5400...
+[2025-03-15 16:11:48,286][09281] Num frames 5500...
+[2025-03-15 16:11:48,383][09281] Num frames 5600...
+[2025-03-15 16:11:48,483][09281] Num frames 5700...
+[2025-03-15 16:11:48,580][09281] Num frames 5800...
+[2025-03-15 16:11:48,677][09281] Num frames 5900...
+[2025-03-15 16:11:48,774][09281] Num frames 6000...
+[2025-03-15 16:11:48,874][09281] Num frames 6100...
+[2025-03-15 16:11:48,975][09281] Num frames 6200...
+[2025-03-15 16:11:49,076][09281] Num frames 6300...
+[2025-03-15 16:11:49,174][09281] Num frames 6400...
+[2025-03-15 16:11:49,273][09281] Num frames 6500...
+[2025-03-15 16:11:49,372][09281] Num frames 6600...
+[2025-03-15 16:11:49,470][09281] Num frames 6700...
+[2025-03-15 16:11:49,570][09281] Num frames 6800...
+[2025-03-15 16:11:49,700][09281] Avg episode rewards: #0: 34.953, true rewards: #0: 13.754
+[2025-03-15 16:11:49,701][09281] Avg episode reward: 34.953, avg true_objective: 13.754
+[2025-03-15 16:11:49,745][09281] Num frames 6900...
+[2025-03-15 16:11:49,849][09281] Num frames 7000...
+[2025-03-15 16:11:49,948][09281] Num frames 7100...
+[2025-03-15 16:11:50,047][09281] Num frames 7200...
+[2025-03-15 16:11:50,143][09281] Num frames 7300...
+[2025-03-15 16:11:50,242][09281] Num frames 7400...
+[2025-03-15 16:11:50,345][09281] Num frames 7500...
+[2025-03-15 16:11:50,445][09281] Num frames 7600...
+[2025-03-15 16:11:50,544][09281] Num frames 7700...
+[2025-03-15 16:11:50,644][09281] Num frames 7800...
+[2025-03-15 16:11:50,746][09281] Num frames 7900...
+[2025-03-15 16:11:50,839][09281] Num frames 8000...
+[2025-03-15 16:11:50,940][09281] Num frames 8100...
+[2025-03-15 16:11:51,041][09281] Num frames 8200...
+[2025-03-15 16:11:51,145][09281] Num frames 8300...
+[2025-03-15 16:11:51,244][09281] Num frames 8400...
+[2025-03-15 16:11:51,341][09281] Num frames 8500...
+[2025-03-15 16:11:51,453][09281] Num frames 8600...
+[2025-03-15 16:11:51,551][09281] Num frames 8700...
+[2025-03-15 16:11:51,650][09281] Num frames 8800...
+[2025-03-15 16:11:51,765][09281] Avg episode rewards: #0: 38.434, true rewards: #0: 14.768
+[2025-03-15 16:11:51,766][09281] Avg episode reward: 38.434, avg true_objective: 14.768
+[2025-03-15 16:11:51,843][09281] Num frames 8900...
+[2025-03-15 16:11:51,941][09281] Num frames 9000...
+[2025-03-15 16:11:52,039][09281] Num frames 9100...
+[2025-03-15 16:11:52,137][09281] Num frames 9200...
+[2025-03-15 16:11:52,237][09281] Num frames 9300...
+[2025-03-15 16:11:52,335][09281] Num frames 9400...
+[2025-03-15 16:11:52,434][09281] Num frames 9500...
+[2025-03-15 16:11:52,537][09281] Num frames 9600...
+[2025-03-15 16:11:52,641][09281] Num frames 9700...
+[2025-03-15 16:11:52,741][09281] Num frames 9800...
+[2025-03-15 16:11:52,841][09281] Num frames 9900...
+[2025-03-15 16:11:52,941][09281] Num frames 10000...
+[2025-03-15 16:11:53,038][09281] Num frames 10100...
+[2025-03-15 16:11:53,136][09281] Num frames 10200...
+[2025-03-15 16:11:53,235][09281] Num frames 10300...
+[2025-03-15 16:11:53,336][09281] Num frames 10400...
+[2025-03-15 16:11:53,430][09281] Num frames 10500...
+[2025-03-15 16:11:53,532][09281] Num frames 10600...
+[2025-03-15 16:11:53,634][09281] Num frames 10700...
+[2025-03-15 16:11:53,735][09281] Num frames 10800...
+[2025-03-15 16:11:53,836][09281] Num frames 10900...
+[2025-03-15 16:11:53,953][09281] Avg episode rewards: #0: 40.801, true rewards: #0: 15.659
+[2025-03-15 16:11:53,953][09281] Avg episode reward: 40.801, avg true_objective: 15.659
+[2025-03-15 16:11:54,017][09281] Num frames 11000...
+[2025-03-15 16:11:54,113][09281] Num frames 11100...
+[2025-03-15 16:11:54,210][09281] Num frames 11200...
+[2025-03-15 16:11:54,311][09281] Num frames 11300...
+[2025-03-15 16:11:54,412][09281] Num frames 11400...
+[2025-03-15 16:11:54,514][09281] Num frames 11500...
+[2025-03-15 16:11:54,621][09281] Num frames 11600...
+[2025-03-15 16:11:54,712][09281] Avg episode rewards: #0: 37.541, true rewards: #0: 14.541
+[2025-03-15 16:11:54,713][09281] Avg episode reward: 37.541, avg true_objective: 14.541
+[2025-03-15 16:11:54,796][09281] Num frames 11700...
+[2025-03-15 16:11:54,895][09281] Num frames 11800...
+[2025-03-15 16:11:54,997][09281] Num frames 11900...
+[2025-03-15 16:11:55,101][09281] Num frames 12000...
+[2025-03-15 16:11:55,202][09281] Num frames 12100...
+[2025-03-15 16:11:55,272][09281] Avg episode rewards: #0: 34.682, true rewards: #0: 13.460
+[2025-03-15 16:11:55,273][09281] Avg episode reward: 34.682, avg true_objective: 13.460
+[2025-03-15 16:11:55,375][09281] Num frames 12200...
+[2025-03-15 16:11:55,475][09281] Num frames 12300...
+[2025-03-15 16:11:55,576][09281] Num frames 12400...
+[2025-03-15 16:11:55,677][09281] Num frames 12500...
+[2025-03-15 16:11:55,781][09281] Num frames 12600...
+[2025-03-15 16:11:55,886][09281] Num frames 12700...
+[2025-03-15 16:11:55,990][09281] Num frames 12800...
+[2025-03-15 16:11:56,091][09281] Num frames 12900...
+[2025-03-15 16:11:56,198][09281] Avg episode rewards: #0: 33.349, true rewards: #0: 12.949
+[2025-03-15 16:11:56,198][09281] Avg episode reward: 33.349, avg true_objective: 12.949
+[2025-03-15 16:12:21,878][09281] Replay video saved to /home/aa/Downloads/train_dir/default_experiment/replay.mp4!