[2025-02-20 15:25:40,690][00228] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-02-20 15:25:40,692][00228] Rollout worker 0 uses device cpu
[2025-02-20 15:25:40,693][00228] Rollout worker 1 uses device cpu
[2025-02-20 15:25:40,694][00228] Rollout worker 2 uses device cpu
[2025-02-20 15:25:40,695][00228] Rollout worker 3 uses device cpu
[2025-02-20 15:25:40,696][00228] Rollout worker 4 uses device cpu
[2025-02-20 15:25:40,697][00228] Rollout worker 5 uses device cpu
[2025-02-20 15:25:40,698][00228] Rollout worker 6 uses device cpu
[2025-02-20 15:25:40,699][00228] Rollout worker 7 uses device cpu
[2025-02-20 15:25:40,847][00228] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-20 15:25:40,848][00228] InferenceWorker_p0-w0: min num requests: 2
[2025-02-20 15:25:40,879][00228] Starting all processes...
[2025-02-20 15:25:40,880][00228] Starting process learner_proc0
[2025-02-20 15:25:40,934][00228] Starting all processes...
[2025-02-20 15:25:40,942][00228] Starting process inference_proc0-0
[2025-02-20 15:25:40,943][00228] Starting process rollout_proc0
[2025-02-20 15:25:40,943][00228] Starting process rollout_proc1
[2025-02-20 15:25:40,943][00228] Starting process rollout_proc2
[2025-02-20 15:25:40,943][00228] Starting process rollout_proc3
[2025-02-20 15:25:40,943][00228] Starting process rollout_proc4
[2025-02-20 15:25:40,943][00228] Starting process rollout_proc5
[2025-02-20 15:25:40,943][00228] Starting process rollout_proc6
[2025-02-20 15:25:40,943][00228] Starting process rollout_proc7
[2025-02-20 15:25:55,872][04092] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-20 15:25:55,878][04092] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-02-20 15:25:55,988][04092] Num visible devices: 1
[2025-02-20 15:25:55,996][04111] Worker 5 uses CPU cores [1]
[2025-02-20 15:25:56,049][04092] Starting seed is not provided
[2025-02-20 15:25:56,050][04092] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-20 15:25:56,050][04092] Initializing actor-critic model on device cuda:0
[2025-02-20 15:25:56,052][04092] RunningMeanStd input shape: (3, 72, 128)
[2025-02-20 15:25:56,056][04092] RunningMeanStd input shape: (1,)
[2025-02-20 15:25:56,187][04092] ConvEncoder: input_channels=3
[2025-02-20 15:25:56,191][04112] Worker 6 uses CPU cores [0]
[2025-02-20 15:25:56,297][04109] Worker 4 uses CPU cores [0]
[2025-02-20 15:25:56,392][04105] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-20 15:25:56,393][04105] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-02-20 15:25:56,410][04108] Worker 2 uses CPU cores [0]
[2025-02-20 15:25:56,500][04105] Num visible devices: 1
[2025-02-20 15:25:56,595][04106] Worker 0 uses CPU cores [0]
[2025-02-20 15:25:56,608][04113] Worker 7 uses CPU cores [1]
[2025-02-20 15:25:56,650][04110] Worker 3 uses CPU cores [1]
[2025-02-20 15:25:56,766][04107] Worker 1 uses CPU cores [1]
[2025-02-20 15:25:56,812][04092] Conv encoder output size: 512
[2025-02-20 15:25:56,813][04092] Policy head output size: 512
[2025-02-20 15:25:56,876][04092] Created Actor Critic model with architecture:
[2025-02-20 15:25:56,876][04092] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-02-20 15:25:57,238][04092] Using optimizer
[2025-02-20 15:26:00,841][00228] Heartbeat connected on Batcher_0
[2025-02-20 15:26:00,847][00228] Heartbeat connected on InferenceWorker_p0-w0
[2025-02-20 15:26:00,856][00228] Heartbeat connected on RolloutWorker_w0
[2025-02-20 15:26:00,858][00228] Heartbeat connected on RolloutWorker_w1
[2025-02-20 15:26:00,861][00228] Heartbeat connected on RolloutWorker_w2
[2025-02-20 15:26:00,865][00228] Heartbeat connected on RolloutWorker_w3
[2025-02-20 15:26:00,868][00228] Heartbeat connected on RolloutWorker_w4
[2025-02-20 15:26:00,875][00228] Heartbeat connected on RolloutWorker_w5
[2025-02-20 15:26:00,877][00228] Heartbeat connected on RolloutWorker_w6
[2025-02-20 15:26:00,881][00228] Heartbeat connected on RolloutWorker_w7
[2025-02-20 15:26:01,591][04092] No checkpoints found
[2025-02-20 15:26:01,592][04092] Did not load from checkpoint, starting from scratch!
[2025-02-20 15:26:01,592][04092] Initialized policy 0 weights for model version 0
[2025-02-20 15:26:01,594][04092] LearnerWorker_p0 finished initialization!
[2025-02-20 15:26:01,595][00228] Heartbeat connected on LearnerWorker_p0
[2025-02-20 15:26:01,598][04092] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-20 15:26:01,736][04105] RunningMeanStd input shape: (3, 72, 128)
[2025-02-20 15:26:01,737][04105] RunningMeanStd input shape: (1,)
[2025-02-20 15:26:01,749][04105] ConvEncoder: input_channels=3
[2025-02-20 15:26:01,849][04105] Conv encoder output size: 512
[2025-02-20 15:26:01,850][04105] Policy head output size: 512
[2025-02-20 15:26:01,884][00228] Inference worker 0-0 is ready!
[2025-02-20 15:26:01,886][00228] All inference workers are ready! Signal rollout workers to start!
[2025-02-20 15:26:02,173][04108] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-20 15:26:02,170][04110] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-20 15:26:02,179][04106] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-20 15:26:02,192][04111] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-20 15:26:02,210][04113] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-20 15:26:02,204][04112] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-20 15:26:02,257][04109] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-20 15:26:02,323][04107] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-20 15:26:03,594][04109] Decorrelating experience for 0 frames...
[2025-02-20 15:26:03,592][04108] Decorrelating experience for 0 frames...
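For reference, the module tree above only shows RecursiveScriptModule placeholders for the convolutional layers. The following is a minimal PyTorch sketch of an equivalent network, assuming Sample Factory's default three-layer convnet (32/64/128 channels); the kernel sizes and strides are assumptions, not values read from this log.

import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    """Sketch of the logged ActorCriticSharedWeights model (assumed hyperparameters)."""
    def __init__(self, num_actions: int = 5, hidden: int = 512):
        super().__init__()
        # Conv2d/ELU stack; 32/64/128 filters with 8x8/4, 4x4/2, 3x3/2 are assumptions
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer the flattened size for a (3, 72, 128) observation
            n_flat = self.conv_head(torch.zeros(1, 3, 72, 128)).numel()
        self.mlp = nn.Sequential(nn.Linear(n_flat, hidden), nn.ELU())  # -> 512, as logged
        self.core = nn.GRU(hidden, hidden)             # (core): GRU(512, 512)
        self.critic_linear = nn.Linear(hidden, 1)      # value head
        self.action_logits = nn.Linear(hidden, num_actions)  # 5 discrete actions

    def forward(self, obs, rnn_state=None):
        x = self.mlp(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # seq len 1
        x = x.squeeze(0)
        return self.action_logits(x), self.critic_linear(x), rnn_state

# usage: logits, value, h = ActorCriticSketch()(torch.zeros(4, 3, 72, 128))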
[2025-02-20 15:26:03,594][04113] Decorrelating experience for 0 frames...
[2025-02-20 15:26:03,595][04107] Decorrelating experience for 0 frames...
[2025-02-20 15:26:03,593][04110] Decorrelating experience for 0 frames...
[2025-02-20 15:26:04,345][04108] Decorrelating experience for 32 frames...
[2025-02-20 15:26:04,348][04109] Decorrelating experience for 32 frames...
[2025-02-20 15:26:04,874][04113] Decorrelating experience for 32 frames...
[2025-02-20 15:26:04,886][04110] Decorrelating experience for 32 frames...
[2025-02-20 15:26:04,982][04111] Decorrelating experience for 0 frames...
[2025-02-20 15:26:05,800][00228] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-20 15:26:06,193][04107] Decorrelating experience for 32 frames...
[2025-02-20 15:26:06,409][04112] Decorrelating experience for 0 frames...
[2025-02-20 15:26:06,451][04109] Decorrelating experience for 64 frames...
[2025-02-20 15:26:06,453][04108] Decorrelating experience for 64 frames...
[2025-02-20 15:26:06,694][04111] Decorrelating experience for 32 frames...
[2025-02-20 15:26:06,988][04110] Decorrelating experience for 64 frames...
[2025-02-20 15:26:06,992][04113] Decorrelating experience for 64 frames...
[2025-02-20 15:26:07,499][04108] Decorrelating experience for 96 frames...
[2025-02-20 15:26:07,504][04109] Decorrelating experience for 96 frames...
[2025-02-20 15:26:08,049][04113] Decorrelating experience for 96 frames...
[2025-02-20 15:26:08,061][04110] Decorrelating experience for 96 frames...
[2025-02-20 15:26:08,130][04112] Decorrelating experience for 32 frames...
[2025-02-20 15:26:08,651][04111] Decorrelating experience for 64 frames...
[2025-02-20 15:26:10,130][04112] Decorrelating experience for 64 frames...
[2025-02-20 15:26:10,802][00228] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 4.0. Samples: 20. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-20 15:26:10,804][00228] Avg episode reward: [(0, '1.672')]
[2025-02-20 15:26:11,300][04107] Decorrelating experience for 64 frames...
[2025-02-20 15:26:11,502][04111] Decorrelating experience for 96 frames...
[2025-02-20 15:26:14,046][04092] Signal inference workers to stop experience collection...
[2025-02-20 15:26:14,055][04105] InferenceWorker_p0-w0: stopping experience collection
[2025-02-20 15:26:14,277][04112] Decorrelating experience for 96 frames...
[2025-02-20 15:26:14,306][04107] Decorrelating experience for 96 frames...
[2025-02-20 15:26:15,709][04092] Signal inference workers to resume experience collection...
[2025-02-20 15:26:15,711][04105] InferenceWorker_p0-w0: resuming experience collection
[2025-02-20 15:26:15,800][00228] Fps is (10 sec: 409.6, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 4096. Throughput: 0: 234.2. Samples: 2342. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2025-02-20 15:26:15,803][00228] Avg episode reward: [(0, '2.772')]
[2025-02-20 15:26:20,800][00228] Fps is (10 sec: 2867.8, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 28672. Throughput: 0: 466.9. Samples: 7004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:26:20,801][00228] Avg episode reward: [(0, '3.705')]
[2025-02-20 15:26:23,948][04105] Updated weights for policy 0, policy_version 10 (0.0098)
[2025-02-20 15:26:25,801][00228] Fps is (10 sec: 3686.1, 60 sec: 2047.9, 300 sec: 2047.9). Total num frames: 40960. Throughput: 0: 484.0. Samples: 9680. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-20 15:26:25,802][00228] Avg episode reward: [(0, '4.196')]
[2025-02-20 15:26:30,800][00228] Fps is (10 sec: 3686.4, 60 sec: 2621.4, 300 sec: 2621.4). Total num frames: 65536. Throughput: 0: 593.4. Samples: 14836. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-20 15:26:30,801][00228] Avg episode reward: [(0, '4.308')]
[2025-02-20 15:26:34,361][04105] Updated weights for policy 0, policy_version 20 (0.0014)
[2025-02-20 15:26:35,800][00228] Fps is (10 sec: 4505.9, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 86016. Throughput: 0: 723.5. Samples: 21706. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-20 15:26:35,808][00228] Avg episode reward: [(0, '4.380')]
[2025-02-20 15:26:40,801][00228] Fps is (10 sec: 3686.0, 60 sec: 2925.6, 300 sec: 2925.6). Total num frames: 102400. Throughput: 0: 693.0. Samples: 24254. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-20 15:26:40,804][00228] Avg episode reward: [(0, '4.474')]
[2025-02-20 15:26:40,806][04092] Saving new best policy, reward=4.474!
[2025-02-20 15:26:45,320][04105] Updated weights for policy 0, policy_version 30 (0.0012)
[2025-02-20 15:26:45,800][00228] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 122880. Throughput: 0: 739.9. Samples: 29596. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:26:45,806][00228] Avg episode reward: [(0, '4.389')]
[2025-02-20 15:26:50,800][00228] Fps is (10 sec: 4505.9, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 147456. Throughput: 0: 810.9. Samples: 36492. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-20 15:26:50,801][00228] Avg episode reward: [(0, '4.488')]
[2025-02-20 15:26:50,805][04092] Saving new best policy, reward=4.488!
[2025-02-20 15:26:55,800][00228] Fps is (10 sec: 2867.2, 60 sec: 3031.0, 300 sec: 3031.0). Total num frames: 151552. Throughput: 0: 860.7. Samples: 38748. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-20 15:26:55,801][00228] Avg episode reward: [(0, '4.524')]
[2025-02-20 15:26:55,807][04092] Saving new best policy, reward=4.524!
[2025-02-20 15:26:59,682][04105] Updated weights for policy 0, policy_version 40 (0.0015)
[2025-02-20 15:27:00,800][00228] Fps is (10 sec: 1638.4, 60 sec: 2978.9, 300 sec: 2978.9). Total num frames: 163840. Throughput: 0: 859.4. Samples: 41014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:27:00,801][00228] Avg episode reward: [(0, '4.526')]
[2025-02-20 15:27:00,805][04092] Saving new best policy, reward=4.526!
[2025-02-20 15:27:05,801][00228] Fps is (10 sec: 3276.6, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 184320. Throughput: 0: 879.0. Samples: 46558. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:27:05,802][00228] Avg episode reward: [(0, '4.323')]
[2025-02-20 15:27:10,800][00228] Fps is (10 sec: 3686.4, 60 sec: 3345.2, 300 sec: 3087.8). Total num frames: 200704. Throughput: 0: 871.8. Samples: 48910. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:27:10,801][00228] Avg episode reward: [(0, '4.419')]
[2025-02-20 15:27:11,236][04105] Updated weights for policy 0, policy_version 50 (0.0026)
[2025-02-20 15:27:15,800][00228] Fps is (10 sec: 4096.3, 60 sec: 3686.4, 300 sec: 3218.3). Total num frames: 225280. Throughput: 0: 889.1. Samples: 54846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:27:15,801][00228] Avg episode reward: [(0, '4.563')]
[2025-02-20 15:27:15,806][04092] Saving new best policy, reward=4.563!
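The "Saving new best policy, reward=..." entries above are emitted whenever the reported average episode reward exceeds the best value seen so far. A minimal sketch of that gate; save_fn stands in for whatever checkpoint writer is used, and the names are illustrative rather than Sample Factory's API:

# Sketch of the "Saving new best policy" gate seen in the log:
# checkpoint only when the average episode reward improves.
best_reward = float('-inf')

def maybe_save_best(avg_reward: float, save_fn) -> bool:
    """save_fn is any callable that writes the current policy to disk."""
    global best_reward
    if avg_reward > best_reward:
        best_reward = avg_reward
        save_fn()
        print(f'Saving new best policy, reward={avg_reward:.3f}!')
        return True
    return False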
[2025-02-20 15:27:20,115][04105] Updated weights for policy 0, policy_version 60 (0.0016)
[2025-02-20 15:27:20,801][00228] Fps is (10 sec: 4505.0, 60 sec: 3618.1, 300 sec: 3276.7). Total num frames: 245760. Throughput: 0: 888.9. Samples: 61706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-20 15:27:20,807][00228] Avg episode reward: [(0, '4.414')]
[2025-02-20 15:27:25,800][00228] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3276.8). Total num frames: 262144. Throughput: 0: 879.6. Samples: 63836. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-20 15:27:25,801][00228] Avg episode reward: [(0, '4.365')]
[2025-02-20 15:27:30,627][04105] Updated weights for policy 0, policy_version 70 (0.0017)
[2025-02-20 15:27:30,800][00228] Fps is (10 sec: 4096.5, 60 sec: 3686.4, 300 sec: 3373.2). Total num frames: 286720. Throughput: 0: 900.5. Samples: 70120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:27:30,804][00228] Avg episode reward: [(0, '4.446')]
[2025-02-20 15:27:35,800][00228] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3413.3). Total num frames: 307200. Throughput: 0: 894.8. Samples: 76758. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-20 15:27:35,804][00228] Avg episode reward: [(0, '4.661')]
[2025-02-20 15:27:35,809][04092] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth...
[2025-02-20 15:27:35,958][04092] Saving new best policy, reward=4.661!
[2025-02-20 15:27:40,800][00228] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3406.1). Total num frames: 323584. Throughput: 0: 887.7. Samples: 78696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:27:40,808][00228] Avg episode reward: [(0, '4.803')]
[2025-02-20 15:27:40,810][04092] Saving new best policy, reward=4.803!
[2025-02-20 15:27:41,520][04105] Updated weights for policy 0, policy_version 80 (0.0026)
[2025-02-20 15:27:45,800][00228] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3440.6). Total num frames: 344064. Throughput: 0: 980.2. Samples: 85122. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-20 15:27:45,801][00228] Avg episode reward: [(0, '4.501')]
[2025-02-20 15:27:50,800][00228] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3471.8). Total num frames: 364544. Throughput: 0: 1000.8. Samples: 91594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-20 15:27:50,802][00228] Avg episode reward: [(0, '4.388')]
[2025-02-20 15:27:51,306][04105] Updated weights for policy 0, policy_version 90 (0.0031)
[2025-02-20 15:27:55,800][00228] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3500.2). Total num frames: 385024. Throughput: 0: 995.0. Samples: 93686. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:27:55,804][00228] Avg episode reward: [(0, '4.312')]
[2025-02-20 15:28:00,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3526.1). Total num frames: 405504. Throughput: 0: 1015.5. Samples: 100544. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-20 15:28:00,804][00228] Avg episode reward: [(0, '4.480')]
[2025-02-20 15:28:00,993][04105] Updated weights for policy 0, policy_version 100 (0.0028)
[2025-02-20 15:28:05,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3549.9). Total num frames: 425984. Throughput: 0: 996.8. Samples: 106562. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-20 15:28:05,802][00228] Avg episode reward: [(0, '4.602')]
[2025-02-20 15:28:10,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3538.9). Total num frames: 442368. Throughput: 0: 992.0. Samples: 108474. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:28:10,804][00228] Avg episode reward: [(0, '4.540')]
[2025-02-20 15:28:12,026][04105] Updated weights for policy 0, policy_version 110 (0.0012)
[2025-02-20 15:28:15,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3591.9). Total num frames: 466944. Throughput: 0: 1007.4. Samples: 115454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-20 15:28:15,805][00228] Avg episode reward: [(0, '4.514')]
[2025-02-20 15:28:20,800][00228] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3580.2). Total num frames: 483328. Throughput: 0: 988.5. Samples: 121242. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:28:20,801][00228] Avg episode reward: [(0, '4.693')]
[2025-02-20 15:28:22,399][04105] Updated weights for policy 0, policy_version 120 (0.0022)
[2025-02-20 15:28:25,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3598.6). Total num frames: 503808. Throughput: 0: 1005.3. Samples: 123934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-20 15:28:25,803][00228] Avg episode reward: [(0, '4.698')]
[2025-02-20 15:28:30,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3644.0). Total num frames: 528384. Throughput: 0: 1017.0. Samples: 130888. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:28:30,801][00228] Avg episode reward: [(0, '4.459')]
[2025-02-20 15:28:31,257][04105] Updated weights for policy 0, policy_version 130 (0.0012)
[2025-02-20 15:28:35,800][00228] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3631.8). Total num frames: 544768. Throughput: 0: 994.0. Samples: 136326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:28:35,802][00228] Avg episode reward: [(0, '4.251')]
[2025-02-20 15:28:40,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3646.8). Total num frames: 565248. Throughput: 0: 1012.1. Samples: 139230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-20 15:28:40,801][00228] Avg episode reward: [(0, '4.319')]
[2025-02-20 15:28:42,074][04105] Updated weights for policy 0, policy_version 140 (0.0015)
[2025-02-20 15:28:45,800][00228] Fps is (10 sec: 4505.8, 60 sec: 4096.0, 300 sec: 3686.4). Total num frames: 589824. Throughput: 0: 1010.1. Samples: 146000. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:28:45,802][00228] Avg episode reward: [(0, '4.415')]
[2025-02-20 15:28:50,800][00228] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3649.2). Total num frames: 602112. Throughput: 0: 990.0. Samples: 151114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:28:50,801][00228] Avg episode reward: [(0, '4.428')]
[2025-02-20 15:28:52,585][04105] Updated weights for policy 0, policy_version 150 (0.0013)
[2025-02-20 15:28:55,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3686.4). Total num frames: 626688. Throughput: 0: 1021.4. Samples: 154438. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:28:55,803][00228] Avg episode reward: [(0, '4.573')]
[2025-02-20 15:29:00,800][00228] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3721.5). Total num frames: 651264. Throughput: 0: 1020.8. Samples: 161392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-20 15:29:00,803][00228] Avg episode reward: [(0, '4.646')]
[2025-02-20 15:29:01,952][04105] Updated weights for policy 0, policy_version 160 (0.0013)
[2025-02-20 15:29:05,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3709.2). Total num frames: 667648. Throughput: 0: 1003.2. Samples: 166386. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:29:05,808][00228] Avg episode reward: [(0, '4.938')]
[2025-02-20 15:29:05,815][04092] Saving new best policy, reward=4.938!
[2025-02-20 15:29:10,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3719.6). Total num frames: 688128. Throughput: 0: 1019.1. Samples: 169792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-20 15:29:10,805][00228] Avg episode reward: [(0, '4.860')]
[2025-02-20 15:29:12,086][04105] Updated weights for policy 0, policy_version 170 (0.0021)
[2025-02-20 15:29:15,804][00228] Fps is (10 sec: 4094.4, 60 sec: 4027.5, 300 sec: 3729.4). Total num frames: 708608. Throughput: 0: 1015.4. Samples: 176586. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:29:15,807][00228] Avg episode reward: [(0, '4.829')]
[2025-02-20 15:29:20,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3717.9). Total num frames: 724992. Throughput: 0: 1006.7. Samples: 181626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:29:20,805][00228] Avg episode reward: [(0, '5.138')]
[2025-02-20 15:29:20,808][04092] Saving new best policy, reward=5.138!
[2025-02-20 15:29:22,726][04105] Updated weights for policy 0, policy_version 180 (0.0018)
[2025-02-20 15:29:25,800][00228] Fps is (10 sec: 4097.6, 60 sec: 4096.0, 300 sec: 3747.8). Total num frames: 749568. Throughput: 0: 1018.0. Samples: 185040. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:29:25,804][00228] Avg episode reward: [(0, '5.036')]
[2025-02-20 15:29:30,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3756.3). Total num frames: 770048. Throughput: 0: 1021.7. Samples: 191976. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-20 15:29:30,804][00228] Avg episode reward: [(0, '4.695')]
[2025-02-20 15:29:33,184][04105] Updated weights for policy 0, policy_version 190 (0.0019)
[2025-02-20 15:29:35,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3744.9). Total num frames: 786432. Throughput: 0: 1018.1. Samples: 196928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:29:35,801][00228] Avg episode reward: [(0, '4.679')]
[2025-02-20 15:29:35,810][04092] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth...
[2025-02-20 15:29:40,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3772.1). Total num frames: 811008. Throughput: 0: 1016.1. Samples: 200162. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-20 15:29:40,804][00228] Avg episode reward: [(0, '5.308')]
[2025-02-20 15:29:40,806][04092] Saving new best policy, reward=5.308!
[2025-02-20 15:29:42,389][04105] Updated weights for policy 0, policy_version 200 (0.0015)
[2025-02-20 15:29:45,800][00228] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3760.9). Total num frames: 827392. Throughput: 0: 1008.3. Samples: 206766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:29:45,803][00228] Avg episode reward: [(0, '5.360')]
[2025-02-20 15:29:45,810][04092] Saving new best policy, reward=5.360!
[2025-02-20 15:29:50,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3768.3). Total num frames: 847872. Throughput: 0: 1010.2. Samples: 211846. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:29:50,804][00228] Avg episode reward: [(0, '5.202')]
[2025-02-20 15:29:53,195][04105] Updated weights for policy 0, policy_version 210 (0.0016)
[2025-02-20 15:29:55,800][00228] Fps is (10 sec: 4505.8, 60 sec: 4096.0, 300 sec: 3793.3). Total num frames: 872448. Throughput: 0: 1011.6. Samples: 215312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:29:55,801][00228] Avg episode reward: [(0, '5.338')]
[2025-02-20 15:30:00,800][00228] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3782.3). Total num frames: 888832. Throughput: 0: 1004.9. Samples: 221802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:30:00,804][00228] Avg episode reward: [(0, '5.331')]
[2025-02-20 15:30:03,589][04105] Updated weights for policy 0, policy_version 220 (0.0028)
[2025-02-20 15:30:05,801][00228] Fps is (10 sec: 3686.1, 60 sec: 4027.7, 300 sec: 3788.8). Total num frames: 909312. Throughput: 0: 1012.1. Samples: 227172. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:30:05,802][00228] Avg episode reward: [(0, '5.306')]
[2025-02-20 15:30:10,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3795.1). Total num frames: 929792. Throughput: 0: 1015.5. Samples: 230736. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:30:10,805][00228] Avg episode reward: [(0, '5.697')]
[2025-02-20 15:30:10,829][04092] Saving new best policy, reward=5.697!
[2025-02-20 15:30:13,165][04105] Updated weights for policy 0, policy_version 230 (0.0014)
[2025-02-20 15:30:15,800][00228] Fps is (10 sec: 4096.3, 60 sec: 4028.0, 300 sec: 3801.1). Total num frames: 950272. Throughput: 0: 991.7. Samples: 236604. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-20 15:30:15,804][00228] Avg episode reward: [(0, '5.541')]
[2025-02-20 15:30:20,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3806.9). Total num frames: 970752. Throughput: 0: 1011.6. Samples: 242448. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-20 15:30:20,801][00228] Avg episode reward: [(0, '5.245')]
[2025-02-20 15:30:23,131][04105] Updated weights for policy 0, policy_version 240 (0.0014)
[2025-02-20 15:30:25,802][00228] Fps is (10 sec: 4504.8, 60 sec: 4095.9, 300 sec: 3828.2). Total num frames: 995328. Throughput: 0: 1017.6. Samples: 245958. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:30:25,803][00228] Avg episode reward: [(0, '5.592')]
[2025-02-20 15:30:30,800][00228] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3802.3). Total num frames: 1007616. Throughput: 0: 1004.0. Samples: 251944. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:30:30,801][00228] Avg episode reward: [(0, '5.883')]
[2025-02-20 15:30:30,803][04092] Saving new best policy, reward=5.883!
[2025-02-20 15:30:33,677][04105] Updated weights for policy 0, policy_version 250 (0.0014)
[2025-02-20 15:30:35,800][00228] Fps is (10 sec: 3687.1, 60 sec: 4096.0, 300 sec: 3822.9). Total num frames: 1032192. Throughput: 0: 1023.9. Samples: 257920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-20 15:30:35,804][00228] Avg episode reward: [(0, '5.598')]
[2025-02-20 15:30:40,800][00228] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3842.8). Total num frames: 1056768. Throughput: 0: 1025.2. Samples: 261444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-20 15:30:40,804][00228] Avg episode reward: [(0, '6.098')]
[2025-02-20 15:30:40,807][04092] Saving new best policy, reward=6.098!
[2025-02-20 15:30:43,704][04105] Updated weights for policy 0, policy_version 260 (0.0013)
[2025-02-20 15:30:45,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3818.1). Total num frames: 1069056. Throughput: 0: 1001.8. Samples: 266882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-20 15:30:45,804][00228] Avg episode reward: [(0, '6.372')]
[2025-02-20 15:30:45,820][04092] Saving new best policy, reward=6.372!
[2025-02-20 15:30:50,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3837.3). Total num frames: 1093632. Throughput: 0: 1024.2. Samples: 273258. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:30:50,806][00228] Avg episode reward: [(0, '6.235')]
[2025-02-20 15:30:53,106][04105] Updated weights for policy 0, policy_version 270 (0.0012)
[2025-02-20 15:30:55,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3841.8). Total num frames: 1114112. Throughput: 0: 1023.2. Samples: 276782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:30:55,801][00228] Avg episode reward: [(0, '6.158')]
[2025-02-20 15:31:00,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 1130496. Throughput: 0: 1008.4. Samples: 281982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:31:00,804][00228] Avg episode reward: [(0, '6.552')]
[2025-02-20 15:31:00,857][04092] Saving new best policy, reward=6.552!
[2025-02-20 15:31:03,744][04105] Updated weights for policy 0, policy_version 280 (0.0012)
[2025-02-20 15:31:05,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4096.1, 300 sec: 3915.5). Total num frames: 1155072. Throughput: 0: 1027.3. Samples: 288676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-20 15:31:05,804][00228] Avg episode reward: [(0, '7.201')]
[2025-02-20 15:31:05,810][04092] Saving new best policy, reward=7.201!
[2025-02-20 15:31:10,806][00228] Fps is (10 sec: 4502.9, 60 sec: 4095.6, 300 sec: 3971.0). Total num frames: 1175552. Throughput: 0: 1024.4. Samples: 292062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:31:10,808][00228] Avg episode reward: [(0, '7.212')]
[2025-02-20 15:31:10,809][04092] Saving new best policy, reward=7.212!
[2025-02-20 15:31:14,282][04105] Updated weights for policy 0, policy_version 290 (0.0018)
[2025-02-20 15:31:15,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 1191936. Throughput: 0: 1001.6. Samples: 297018. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:31:15,805][00228] Avg episode reward: [(0, '7.605')]
[2025-02-20 15:31:15,812][04092] Saving new best policy, reward=7.605!
[2025-02-20 15:31:20,800][00228] Fps is (10 sec: 4098.4, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 1216512. Throughput: 0: 1021.4. Samples: 303882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-20 15:31:20,802][00228] Avg episode reward: [(0, '8.146')]
[2025-02-20 15:31:20,804][04092] Saving new best policy, reward=8.146!
[2025-02-20 15:31:23,382][04105] Updated weights for policy 0, policy_version 300 (0.0014)
[2025-02-20 15:31:25,800][00228] Fps is (10 sec: 4505.5, 60 sec: 4027.8, 300 sec: 3971.0). Total num frames: 1236992. Throughput: 0: 1019.2. Samples: 307310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-20 15:31:25,803][00228] Avg episode reward: [(0, '8.109')]
[2025-02-20 15:31:30,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 1253376. Throughput: 0: 1009.5. Samples: 312310. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-20 15:31:30,805][00228] Avg episode reward: [(0, '8.388')]
[2025-02-20 15:31:30,809][04092] Saving new best policy, reward=8.388!
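Each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" entry above reports frame throughput averaged over three sliding windows, computed from periodic (timestamp, total frames) samples; the very first report earlier in the log shows nan because no earlier sample exists yet. A minimal sketch of that bookkeeping follows; the class name and structure are illustrative, not Sample Factory internals:

import time
from collections import deque

class FpsWindows:
    """Track frames-per-second over 10/60/300-second sliding windows."""
    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.history = deque()  # (timestamp, total_frames) samples

    def record(self, total_frames: int) -> dict:
        now = time.time()
        self.history.append((now, total_frames))
        # keep only samples still needed by the longest window
        while now - self.history[0][0] > max(self.windows):
            self.history.popleft()
        rates = {}
        for w in self.windows:
            # oldest sample inside the window; fall back to the oldest kept
            past = next(((t, f) for t, f in self.history if now - t <= w),
                        self.history[0])
            dt = now - past[0]
            rates[w] = (total_frames - past[1]) / dt if dt > 0 else float('nan')
        return rates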
[2025-02-20 15:31:33,882][04105] Updated weights for policy 0, policy_version 310 (0.0017)
[2025-02-20 15:31:35,800][00228] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 1277952. Throughput: 0: 1019.4. Samples: 319132. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-20 15:31:35,801][00228] Avg episode reward: [(0, '9.937')]
[2025-02-20 15:31:35,813][04092] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000312_1277952.pth...
[2025-02-20 15:31:35,934][04092] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth
[2025-02-20 15:31:35,943][04092] Saving new best policy, reward=9.937!
[2025-02-20 15:31:40,801][00228] Fps is (10 sec: 4095.6, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 1294336. Throughput: 0: 1013.1. Samples: 322374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:31:40,807][00228] Avg episode reward: [(0, '9.694')]
[2025-02-20 15:31:44,708][04105] Updated weights for policy 0, policy_version 320 (0.0012)
[2025-02-20 15:31:45,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 1314816. Throughput: 0: 1005.2. Samples: 327216. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:31:45,801][00228] Avg episode reward: [(0, '9.705')]
[2025-02-20 15:31:50,800][00228] Fps is (10 sec: 4096.3, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1335296. Throughput: 0: 1010.9. Samples: 334166. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:31:50,801][00228] Avg episode reward: [(0, '9.602')]
[2025-02-20 15:31:54,483][04105] Updated weights for policy 0, policy_version 330 (0.0013)
[2025-02-20 15:31:55,800][00228] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 1351680. Throughput: 0: 1006.9. Samples: 337368. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-20 15:31:55,804][00228] Avg episode reward: [(0, '10.735')]
[2025-02-20 15:31:55,827][04092] Saving new best policy, reward=10.735!
[2025-02-20 15:32:00,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1372160. Throughput: 0: 1008.3. Samples: 342392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-20 15:32:00,806][00228] Avg episode reward: [(0, '11.738')]
[2025-02-20 15:32:00,812][04092] Saving new best policy, reward=11.738!
[2025-02-20 15:32:04,674][04105] Updated weights for policy 0, policy_version 340 (0.0018)
[2025-02-20 15:32:05,800][00228] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1396736. Throughput: 0: 1004.8. Samples: 349098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:32:05,805][00228] Avg episode reward: [(0, '11.341')]
[2025-02-20 15:32:10,800][00228] Fps is (10 sec: 4096.0, 60 sec: 3959.9, 300 sec: 4026.6). Total num frames: 1413120. Throughput: 0: 995.0. Samples: 352084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-20 15:32:10,803][00228] Avg episode reward: [(0, '10.712')]
[2025-02-20 15:32:15,050][04105] Updated weights for policy 0, policy_version 350 (0.0012)
[2025-02-20 15:32:15,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1433600. Throughput: 0: 1001.6. Samples: 357382. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:32:15,802][00228] Avg episode reward: [(0, '10.631')]
[2025-02-20 15:32:20,800][00228] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1458176. Throughput: 0: 1004.1. Samples: 364318. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:32:20,805][00228] Avg episode reward: [(0, '11.765')]
[2025-02-20 15:32:20,809][04092] Saving new best policy, reward=11.765!
[2025-02-20 15:32:25,800][00228] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 1470464. Throughput: 0: 992.6. Samples: 367038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-20 15:32:25,804][00228] Avg episode reward: [(0, '12.502')]
[2025-02-20 15:32:25,813][04092] Saving new best policy, reward=12.502!
[2025-02-20 15:32:26,036][04105] Updated weights for policy 0, policy_version 360 (0.0013)
[2025-02-20 15:32:30,801][00228] Fps is (10 sec: 3686.1, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1495040. Throughput: 0: 1006.4. Samples: 372506. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:32:30,807][00228] Avg episode reward: [(0, '12.685')]
[2025-02-20 15:32:30,810][04092] Saving new best policy, reward=12.685!
[2025-02-20 15:32:34,858][04105] Updated weights for policy 0, policy_version 370 (0.0015)
[2025-02-20 15:32:35,800][00228] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1519616. Throughput: 0: 1005.6. Samples: 379418. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-20 15:32:35,803][00228] Avg episode reward: [(0, '12.100')]
[2025-02-20 15:32:40,800][00228] Fps is (10 sec: 3686.8, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 1531904. Throughput: 0: 988.6. Samples: 381854. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-20 15:32:40,805][00228] Avg episode reward: [(0, '11.743')]
[2025-02-20 15:32:45,463][04105] Updated weights for policy 0, policy_version 380 (0.0023)
[2025-02-20 15:32:45,800][00228] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1556480. Throughput: 0: 1008.0. Samples: 387752. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:32:45,802][00228] Avg episode reward: [(0, '12.469')]
[2025-02-20 15:32:50,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1576960. Throughput: 0: 1014.3. Samples: 394740. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:32:50,804][00228] Avg episode reward: [(0, '12.818')]
[2025-02-20 15:32:50,808][04092] Saving new best policy, reward=12.818!
[2025-02-20 15:32:55,800][00228] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1593344. Throughput: 0: 996.5. Samples: 396928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:32:55,801][00228] Avg episode reward: [(0, '11.921')]
[2025-02-20 15:32:55,870][04105] Updated weights for policy 0, policy_version 390 (0.0015)
[2025-02-20 15:33:00,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 1617920. Throughput: 0: 1016.3. Samples: 403116. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:33:00,802][00228] Avg episode reward: [(0, '11.991')]
[2025-02-20 15:33:05,148][04105] Updated weights for policy 0, policy_version 400 (0.0028)
[2025-02-20 15:33:05,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1638400. Throughput: 0: 1011.5. Samples: 409836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:33:05,803][00228] Avg episode reward: [(0, '12.854')]
[2025-02-20 15:33:05,811][04092] Saving new best policy, reward=12.854!
[2025-02-20 15:33:10,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1654784. Throughput: 0: 996.4. Samples: 411874. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:33:10,804][00228] Avg episode reward: [(0, '13.291')]
[2025-02-20 15:33:10,808][04092] Saving new best policy, reward=13.291!
[2025-02-20 15:33:15,488][04105] Updated weights for policy 0, policy_version 410 (0.0019)
[2025-02-20 15:33:15,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 1679360. Throughput: 0: 1016.6. Samples: 418250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:33:15,801][00228] Avg episode reward: [(0, '14.204')]
[2025-02-20 15:33:15,811][04092] Saving new best policy, reward=14.204!
[2025-02-20 15:33:20,800][00228] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1699840. Throughput: 0: 1007.8. Samples: 424768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-20 15:33:20,802][00228] Avg episode reward: [(0, '15.673')]
[2025-02-20 15:33:20,803][04092] Saving new best policy, reward=15.673!
[2025-02-20 15:33:25,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1716224. Throughput: 0: 999.9. Samples: 426848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-20 15:33:25,801][00228] Avg episode reward: [(0, '16.234')]
[2025-02-20 15:33:25,807][04092] Saving new best policy, reward=16.234!
[2025-02-20 15:33:26,222][04105] Updated weights for policy 0, policy_version 420 (0.0024)
[2025-02-20 15:33:30,800][00228] Fps is (10 sec: 4096.1, 60 sec: 4096.1, 300 sec: 4054.4). Total num frames: 1740800. Throughput: 0: 1021.5. Samples: 433720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:33:30,801][00228] Avg episode reward: [(0, '16.734')]
[2025-02-20 15:33:30,804][04092] Saving new best policy, reward=16.734!
[2025-02-20 15:33:35,800][00228] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 1757184. Throughput: 0: 1001.2. Samples: 439796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:33:35,801][00228] Avg episode reward: [(0, '16.959')]
[2025-02-20 15:33:35,813][04092] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000429_1757184.pth...
[2025-02-20 15:33:35,990][04092] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth
[2025-02-20 15:33:36,005][04092] Saving new best policy, reward=16.959!
[2025-02-20 15:33:36,387][04105] Updated weights for policy 0, policy_version 430 (0.0025)
[2025-02-20 15:33:40,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1777664. Throughput: 0: 1003.3. Samples: 442076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-20 15:33:40,804][00228] Avg episode reward: [(0, '15.889')]
[2025-02-20 15:33:45,716][04105] Updated weights for policy 0, policy_version 440 (0.0014)
[2025-02-20 15:33:45,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 1802240. Throughput: 0: 1016.8. Samples: 448872. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:33:45,804][00228] Avg episode reward: [(0, '16.122')]
[2025-02-20 15:33:50,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1818624. Throughput: 0: 996.2. Samples: 454664. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:33:50,804][00228] Avg episode reward: [(0, '16.544')]
[2025-02-20 15:33:55,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1839104. Throughput: 0: 1011.1. Samples: 457372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-20 15:33:55,804][00228] Avg episode reward: [(0, '15.672')]
[2025-02-20 15:33:56,340][04105] Updated weights for policy 0, policy_version 450 (0.0014)
[2025-02-20 15:34:00,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 1863680. Throughput: 0: 1024.7. Samples: 464360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:34:00,802][00228] Avg episode reward: [(0, '16.042')]
[2025-02-20 15:34:05,800][00228] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 1875968. Throughput: 0: 1000.4. Samples: 469788. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:34:05,801][00228] Avg episode reward: [(0, '15.693')]
[2025-02-20 15:34:06,707][04105] Updated weights for policy 0, policy_version 460 (0.0018)
[2025-02-20 15:34:10,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 1900544. Throughput: 0: 1018.6. Samples: 472684. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:34:10,801][00228] Avg episode reward: [(0, '16.136')]
[2025-02-20 15:34:15,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1921024. Throughput: 0: 1010.2. Samples: 479180. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:34:15,803][00228] Avg episode reward: [(0, '16.625')]
[2025-02-20 15:34:16,427][04105] Updated weights for policy 0, policy_version 470 (0.0022)
[2025-02-20 15:34:20,800][00228] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 1937408. Throughput: 0: 984.5. Samples: 484098. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:34:20,804][00228] Avg episode reward: [(0, '18.391')]
[2025-02-20 15:34:20,808][04092] Saving new best policy, reward=18.391!
[2025-02-20 15:34:25,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1957888. Throughput: 0: 998.8. Samples: 487020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-20 15:34:25,803][00228] Avg episode reward: [(0, '18.762')]
[2025-02-20 15:34:25,812][04092] Saving new best policy, reward=18.762!
[2025-02-20 15:34:27,411][04105] Updated weights for policy 0, policy_version 480 (0.0035)
[2025-02-20 15:34:30,802][00228] Fps is (10 sec: 4095.1, 60 sec: 3959.3, 300 sec: 4040.4). Total num frames: 1978368. Throughput: 0: 993.7. Samples: 493590. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-20 15:34:30,803][00228] Avg episode reward: [(0, '20.096')]
[2025-02-20 15:34:30,805][04092] Saving new best policy, reward=20.096!
[2025-02-20 15:34:35,800][00228] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 1994752. Throughput: 0: 972.6. Samples: 498432. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:34:35,801][00228] Avg episode reward: [(0, '20.280')]
[2025-02-20 15:34:35,807][04092] Saving new best policy, reward=20.280!
[2025-02-20 15:34:38,569][04105] Updated weights for policy 0, policy_version 490 (0.0016)
[2025-02-20 15:34:40,800][00228] Fps is (10 sec: 3687.1, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 2015232. Throughput: 0: 981.5. Samples: 501538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-20 15:34:40,801][00228] Avg episode reward: [(0, '19.112')]
[2025-02-20 15:34:45,800][00228] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 4026.6). Total num frames: 2035712. Throughput: 0: 968.8. Samples: 507958. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:34:45,804][00228] Avg episode reward: [(0, '18.099')]
[2025-02-20 15:34:49,561][04105] Updated weights for policy 0, policy_version 500 (0.0014)
[2025-02-20 15:34:50,800][00228] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 2052096. Throughput: 0: 954.8. Samples: 512754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:34:50,805][00228] Avg episode reward: [(0, '16.498')]
[2025-02-20 15:34:55,800][00228] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 2072576. Throughput: 0: 967.4. Samples: 516216. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:34:55,805][00228] Avg episode reward: [(0, '14.865')]
[2025-02-20 15:34:58,495][04105] Updated weights for policy 0, policy_version 510 (0.0020)
[2025-02-20 15:35:00,800][00228] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 4026.6). Total num frames: 2097152. Throughput: 0: 977.8. Samples: 523180. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:35:00,803][00228] Avg episode reward: [(0, '14.647')]
[2025-02-20 15:35:05,800][00228] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 2113536. Throughput: 0: 978.5. Samples: 528130. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:35:05,805][00228] Avg episode reward: [(0, '15.814')]
[2025-02-20 15:35:09,030][04105] Updated weights for policy 0, policy_version 520 (0.0012)
[2025-02-20 15:35:10,800][00228] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 2138112. Throughput: 0: 990.7. Samples: 531600. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:35:10,801][00228] Avg episode reward: [(0, '15.163')]
[2025-02-20 15:35:15,800][00228] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 2154496. Throughput: 0: 998.1. Samples: 538504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-20 15:35:15,801][00228] Avg episode reward: [(0, '15.337')]
[2025-02-20 15:35:19,635][04105] Updated weights for policy 0, policy_version 530 (0.0013)
[2025-02-20 15:35:20,800][00228] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2174976. Throughput: 0: 1001.9. Samples: 543516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:35:20,801][00228] Avg episode reward: [(0, '15.248')]
[2025-02-20 15:35:25,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2199552. Throughput: 0: 1011.3. Samples: 547048. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-20 15:35:25,803][00228] Avg episode reward: [(0, '15.617')]
[2025-02-20 15:35:28,447][04105] Updated weights for policy 0, policy_version 540 (0.0020)
[2025-02-20 15:35:30,803][00228] Fps is (10 sec: 4094.8, 60 sec: 3959.4, 300 sec: 4012.7). Total num frames: 2215936. Throughput: 0: 1017.6. Samples: 553754. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-20 15:35:30,807][00228] Avg episode reward: [(0, '17.250')]
[2025-02-20 15:35:35,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2236416. Throughput: 0: 1027.8. Samples: 559006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:35:35,804][00228] Avg episode reward: [(0, '17.355')]
[2025-02-20 15:35:35,810][04092] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000546_2236416.pth...
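The "Policy #0 lag" figures reported throughout summarize how many learner updates ("Updated weights for policy 0, policy_version N") happened between the policy version that generated a sample and the version currently being trained; the -1.0 values at the very start of the log appear only before any samples exist. A conceptual sketch of those statistics, not Sample Factory's internal code:

# Sketch: policy-lag statistics for a minibatch.
# Each rollout sample is assumed to be tagged with the policy_version
# that produced it; lag = current learner version minus that tag.
def policy_lag_stats(current_version: int, sample_versions: list[int]):
    lags = [current_version - v for v in sample_versions]
    return min(lags), sum(lags) / len(lags), max(lags)

# e.g. policy_lag_stats(410, [410, 409, 410, 408]) -> (0, 0.75, 2)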
[2025-02-20 15:35:35,934][04092] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000312_1277952.pth
[2025-02-20 15:35:39,005][04105] Updated weights for policy 0, policy_version 550 (0.0014)
[2025-02-20 15:35:40,800][00228] Fps is (10 sec: 4506.9, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2260992. Throughput: 0: 1026.4. Samples: 562406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:35:40,801][00228] Avg episode reward: [(0, '20.064')]
[2025-02-20 15:35:45,800][00228] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2277376. Throughput: 0: 1010.9. Samples: 568670. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:35:45,802][00228] Avg episode reward: [(0, '20.071')]
[2025-02-20 15:35:49,591][04105] Updated weights for policy 0, policy_version 560 (0.0019)
[2025-02-20 15:35:50,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 2297856. Throughput: 0: 1026.1. Samples: 574304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:35:50,801][00228] Avg episode reward: [(0, '21.964')]
[2025-02-20 15:35:50,807][04092] Saving new best policy, reward=21.964!
[2025-02-20 15:35:55,800][00228] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 2322432. Throughput: 0: 1024.4. Samples: 577700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-20 15:35:55,801][00228] Avg episode reward: [(0, '21.629')]
[2025-02-20 15:35:59,206][04105] Updated weights for policy 0, policy_version 570 (0.0027)
[2025-02-20 15:36:00,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2338816. Throughput: 0: 1008.8. Samples: 583900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-20 15:36:00,802][00228] Avg episode reward: [(0, '21.379')]
[2025-02-20 15:36:05,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4012.8). Total num frames: 2359296. Throughput: 0: 1028.0. Samples: 589774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-20 15:36:05,801][00228] Avg episode reward: [(0, '21.954')]
[2025-02-20 15:36:08,826][04105] Updated weights for policy 0, policy_version 580 (0.0013)
[2025-02-20 15:36:10,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2383872. Throughput: 0: 1029.0. Samples: 593352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:36:10,801][00228] Avg episode reward: [(0, '21.910')]
[2025-02-20 15:36:15,801][00228] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2396160. Throughput: 0: 1009.9. Samples: 599196. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:36:15,805][00228] Avg episode reward: [(0, '20.757')]
[2025-02-20 15:36:19,400][04105] Updated weights for policy 0, policy_version 590 (0.0021)
[2025-02-20 15:36:20,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 2420736. Throughput: 0: 1028.8. Samples: 605304. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-20 15:36:20,807][00228] Avg episode reward: [(0, '22.260')]
[2025-02-20 15:36:20,808][04092] Saving new best policy, reward=22.260!
[2025-02-20 15:36:25,800][00228] Fps is (10 sec: 4915.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2445312. Throughput: 0: 1028.9. Samples: 608706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-20 15:36:25,802][00228] Avg episode reward: [(0, '23.766')]
[2025-02-20 15:36:25,812][04092] Saving new best policy, reward=23.766!
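As the paired "Saving .../checkpoint_...pth" and "Removing ..." entries show, periodic checkpoints are rotated so that only the most recent few remain on disk (two at a time in this log), while the best-policy file is managed separately by the gate sketched earlier. A minimal sketch of such a rotation, assuming the zero-padded naming seen above so that lexical order equals age order:

import glob
import os

def rotate_checkpoints(ckpt_dir: str, keep: int = 2) -> None:
    """Delete the oldest periodic checkpoints, keeping the newest `keep`.

    Relies on zero-padded names like checkpoint_000000075_307200.pth,
    so a lexical sort orders files from oldest to newest.
    """
    ckpts = sorted(glob.glob(os.path.join(ckpt_dir, 'checkpoint_*.pth')))
    for old in ckpts[:-keep]:
        print(f'Removing {old}')
        os.remove(old)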
[2025-02-20 15:36:29,812][04105] Updated weights for policy 0, policy_version 600 (0.0022)
[2025-02-20 15:36:30,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4096.2, 300 sec: 4012.7). Total num frames: 2461696. Throughput: 0: 1012.9. Samples: 614250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:36:30,801][00228] Avg episode reward: [(0, '22.357')]
[2025-02-20 15:36:35,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 2482176. Throughput: 0: 1030.6. Samples: 620680. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-20 15:36:35,802][00228] Avg episode reward: [(0, '21.885')]
[2025-02-20 15:36:38,733][04105] Updated weights for policy 0, policy_version 610 (0.0026)
[2025-02-20 15:36:40,805][00228] Fps is (10 sec: 4503.4, 60 sec: 4095.7, 300 sec: 4040.4). Total num frames: 2506752. Throughput: 0: 1031.0. Samples: 624102. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:36:40,813][00228] Avg episode reward: [(0, '20.999')]
[2025-02-20 15:36:45,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4012.7). Total num frames: 2519040. Throughput: 0: 1010.3. Samples: 629364. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:36:45,805][00228] Avg episode reward: [(0, '21.559')]
[2025-02-20 15:36:49,470][04105] Updated weights for policy 0, policy_version 620 (0.0022)
[2025-02-20 15:36:50,800][00228] Fps is (10 sec: 3688.2, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2543616. Throughput: 0: 1026.6. Samples: 635972. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-20 15:36:50,801][00228] Avg episode reward: [(0, '22.577')]
[2025-02-20 15:36:55,808][00228] Fps is (10 sec: 4502.0, 60 sec: 4027.2, 300 sec: 4040.4). Total num frames: 2564096. Throughput: 0: 1024.4. Samples: 639456. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:36:55,809][00228] Avg episode reward: [(0, '22.583')]
[2025-02-20 15:36:59,828][04105] Updated weights for policy 0, policy_version 630 (0.0018)
[2025-02-20 15:37:00,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 2584576. Throughput: 0: 1009.0. Samples: 644600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:37:00,804][00228] Avg episode reward: [(0, '22.777')]
[2025-02-20 15:37:05,800][00228] Fps is (10 sec: 4099.3, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2605056. Throughput: 0: 1025.3. Samples: 651444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:37:05,803][00228] Avg episode reward: [(0, '23.119')]
[2025-02-20 15:37:08,596][04105] Updated weights for policy 0, policy_version 640 (0.0012)
[2025-02-20 15:37:10,805][00228] Fps is (10 sec: 4094.0, 60 sec: 4027.4, 300 sec: 4040.4). Total num frames: 2625536. Throughput: 0: 1026.6. Samples: 654906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:37:10,807][00228] Avg episode reward: [(0, '23.551')]
[2025-02-20 15:37:15,800][00228] Fps is (10 sec: 4095.9, 60 sec: 4164.3, 300 sec: 4026.6). Total num frames: 2646016. Throughput: 0: 1012.7. Samples: 659824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:37:15,803][00228] Avg episode reward: [(0, '23.838')]
[2025-02-20 15:37:15,809][04092] Saving new best policy, reward=23.838!
[2025-02-20 15:37:19,369][04105] Updated weights for policy 0, policy_version 650 (0.0022)
[2025-02-20 15:37:20,800][00228] Fps is (10 sec: 4098.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2666496. Throughput: 0: 1022.8. Samples: 666708. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:37:20,801][00228] Avg episode reward: [(0, '24.458')]
[2025-02-20 15:37:20,806][04092] Saving new best policy, reward=24.458!
[2025-02-20 15:37:25,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2686976. Throughput: 0: 1022.1. Samples: 670092. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:37:25,805][00228] Avg episode reward: [(0, '25.110')]
[2025-02-20 15:37:25,814][04092] Saving new best policy, reward=25.110!
[2025-02-20 15:37:29,996][04105] Updated weights for policy 0, policy_version 660 (0.0033)
[2025-02-20 15:37:30,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2703360. Throughput: 0: 1015.1. Samples: 675044. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:37:30,802][00228] Avg episode reward: [(0, '24.513')]
[2025-02-20 15:37:35,800][00228] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2727936. Throughput: 0: 1023.6. Samples: 682032. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-20 15:37:35,804][00228] Avg episode reward: [(0, '24.346')]
[2025-02-20 15:37:35,812][04092] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000666_2727936.pth...
[2025-02-20 15:37:35,929][04092] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000429_1757184.pth
[2025-02-20 15:37:39,166][04105] Updated weights for policy 0, policy_version 670 (0.0018)
[2025-02-20 15:37:40,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4028.1, 300 sec: 4040.5). Total num frames: 2748416. Throughput: 0: 1022.9. Samples: 685480. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:37:40,801][00228] Avg episode reward: [(0, '23.657')]
[2025-02-20 15:37:45,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 2764800. Throughput: 0: 1014.9. Samples: 690270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:37:45,801][00228] Avg episode reward: [(0, '24.155')]
[2025-02-20 15:37:49,536][04105] Updated weights for policy 0, policy_version 680 (0.0020)
[2025-02-20 15:37:50,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2789376. Throughput: 0: 1016.3. Samples: 697176. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-20 15:37:50,802][00228] Avg episode reward: [(0, '23.072')]
[2025-02-20 15:37:55,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4028.3, 300 sec: 4026.6). Total num frames: 2805760. Throughput: 0: 1015.1. Samples: 700580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:37:55,803][00228] Avg episode reward: [(0, '23.644')]
[2025-02-20 15:38:00,155][04105] Updated weights for policy 0, policy_version 690 (0.0015)
[2025-02-20 15:38:00,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2826240. Throughput: 0: 1018.3. Samples: 705646. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:38:00,804][00228] Avg episode reward: [(0, '21.734')]
[2025-02-20 15:38:05,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2850816. Throughput: 0: 1020.2. Samples: 712616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:38:05,801][00228] Avg episode reward: [(0, '19.998')]
[2025-02-20 15:38:10,080][04105] Updated weights for policy 0, policy_version 700 (0.0020)
[2025-02-20 15:38:10,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4028.1, 300 sec: 4026.6). Total num frames: 2867200. Throughput: 0: 1015.9. Samples: 715808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:38:10,803][00228] Avg episode reward: [(0, '19.090')]
[2025-02-20 15:38:15,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2887680. Throughput: 0: 1022.6. Samples: 721060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:38:15,805][00228] Avg episode reward: [(0, '18.047')]
[2025-02-20 15:38:19,393][04105] Updated weights for policy 0, policy_version 710 (0.0027)
[2025-02-20 15:38:20,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2912256. Throughput: 0: 1022.0. Samples: 728022. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-20 15:38:20,810][00228] Avg episode reward: [(0, '18.325')]
[2025-02-20 15:38:25,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 4026.6). Total num frames: 2928640. Throughput: 0: 1009.7. Samples: 730916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:38:25,805][00228] Avg episode reward: [(0, '19.114')]
[2025-02-20 15:38:29,945][04105] Updated weights for policy 0, policy_version 720 (0.0017)
[2025-02-20 15:38:30,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 2953216. Throughput: 0: 1029.4. Samples: 736594. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:38:30,801][00228] Avg episode reward: [(0, '18.324')]
[2025-02-20 15:38:35,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2973696. Throughput: 0: 1031.6. Samples: 743598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-20 15:38:35,801][00228] Avg episode reward: [(0, '21.651')]
[2025-02-20 15:38:40,323][04105] Updated weights for policy 0, policy_version 730 (0.0021)
[2025-02-20 15:38:40,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2990080. Throughput: 0: 1013.0. Samples: 746164. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-20 15:38:40,804][00228] Avg episode reward: [(0, '21.448')]
[2025-02-20 15:38:45,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3010560. Throughput: 0: 1029.4. Samples: 751968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-20 15:38:45,801][00228] Avg episode reward: [(0, '21.868')]
[2025-02-20 15:38:49,345][04105] Updated weights for policy 0, policy_version 740 (0.0022)
[2025-02-20 15:38:50,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3035136. Throughput: 0: 1028.5. Samples: 758898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-20 15:38:50,806][00228] Avg episode reward: [(0, '22.601')]
[2025-02-20 15:38:55,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 3051520. Throughput: 0: 1006.0. Samples: 761078. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-20 15:38:55,803][00228] Avg episode reward: [(0, '21.976')]
[2025-02-20 15:38:59,981][04105] Updated weights for policy 0, policy_version 750 (0.0019)
[2025-02-20 15:39:00,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3072000. Throughput: 0: 1028.7. Samples: 767352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-20 15:39:00,804][00228] Avg episode reward: [(0, '21.959')]
[2025-02-20 15:39:05,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3096576. Throughput: 0: 1020.9. Samples: 773962. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-20 15:39:05,801][00228] Avg episode reward: [(0, '22.501')]
[2025-02-20 15:39:10,561][04105] Updated weights for policy 0, policy_version 760 (0.0012)
[2025-02-20 15:39:10,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3112960. Throughput: 0: 1003.9. Samples: 776092. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-20 15:39:10,802][00228] Avg episode reward: [(0, '22.264')]
[2025-02-20 15:39:15,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3133440. Throughput: 0: 1021.9. Samples: 782580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:39:15,801][00228] Avg episode reward: [(0, '21.583')]
[2025-02-20 15:39:19,754][04105] Updated weights for policy 0, policy_version 770 (0.0012)
[2025-02-20 15:39:20,800][00228] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 3153920. Throughput: 0: 1007.7. Samples: 788946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:39:20,806][00228] Avg episode reward: [(0, '21.613')]
[2025-02-20 15:39:25,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.4). Total num frames: 3174400. Throughput: 0: 997.4. Samples: 791048. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-20 15:39:25,801][00228] Avg episode reward: [(0, '21.516')]
[2025-02-20 15:39:30,035][04105] Updated weights for policy 0, policy_version 780 (0.0014)
[2025-02-20 15:39:30,800][00228] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3194880. Throughput: 0: 1021.7. Samples: 797944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-20 15:39:30,805][00228] Avg episode reward: [(0, '20.513')]
[2025-02-20 15:39:35,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3215360. Throughput: 0: 1003.7. Samples: 804064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-20 15:39:35,802][00228] Avg episode reward: [(0, '21.266')]
[2025-02-20 15:39:35,812][04092] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000785_3215360.pth...
[2025-02-20 15:39:35,958][04092] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000546_2236416.pth
[2025-02-20 15:39:40,684][04105] Updated weights for policy 0, policy_version 790 (0.0021)
[2025-02-20 15:39:40,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 3235840. Throughput: 0: 1004.7. Samples: 806290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-20 15:39:40,801][00228] Avg episode reward: [(0, '21.847')]
[2025-02-20 15:39:45,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 3256320. Throughput: 0: 1015.7. Samples: 813060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:39:45,801][00228] Avg episode reward: [(0, '22.262')]
[2025-02-20 15:39:50,801][00228] Fps is (10 sec: 3685.9, 60 sec: 3959.4, 300 sec: 4068.2). Total num frames: 3272704. Throughput: 0: 998.2. Samples: 818884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-20 15:39:50,803][00228] Avg episode reward: [(0, '23.032')]
[2025-02-20 15:39:51,041][04105] Updated weights for policy 0, policy_version 800 (0.0013)
[2025-02-20 15:39:55,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 3293184. Throughput: 0: 1007.6. Samples: 821432.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-20 15:39:55,801][00228] Avg episode reward: [(0, '22.563')] [2025-02-20 15:40:00,328][04105] Updated weights for policy 0, policy_version 810 (0.0013) [2025-02-20 15:40:00,800][00228] Fps is (10 sec: 4506.2, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 3317760. Throughput: 0: 1015.6. Samples: 828280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-20 15:40:00,802][00228] Avg episode reward: [(0, '22.974')] [2025-02-20 15:40:05,800][00228] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 3334144. Throughput: 0: 997.4. Samples: 833828. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2025-02-20 15:40:05,802][00228] Avg episode reward: [(0, '22.751')] [2025-02-20 15:40:10,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3354624. Throughput: 0: 1016.2. Samples: 836778. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-20 15:40:10,802][00228] Avg episode reward: [(0, '23.303')] [2025-02-20 15:40:10,911][04105] Updated weights for policy 0, policy_version 820 (0.0012) [2025-02-20 15:40:15,800][00228] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 3379200. Throughput: 0: 1016.7. Samples: 843694. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-20 15:40:15,802][00228] Avg episode reward: [(0, '24.111')] [2025-02-20 15:40:20,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 4054.3). Total num frames: 3395584. Throughput: 0: 995.3. Samples: 848854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-20 15:40:20,801][00228] Avg episode reward: [(0, '24.833')] [2025-02-20 15:40:21,351][04105] Updated weights for policy 0, policy_version 830 (0.0016) [2025-02-20 15:40:25,800][00228] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 4082.2). Total num frames: 3420160. Throughput: 0: 1017.2. Samples: 852064. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-20 15:40:25,804][00228] Avg episode reward: [(0, '25.421')] [2025-02-20 15:40:25,811][04092] Saving new best policy, reward=25.421! [2025-02-20 15:40:30,442][04105] Updated weights for policy 0, policy_version 840 (0.0025) [2025-02-20 15:40:30,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 3440640. Throughput: 0: 1021.1. Samples: 859008. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-20 15:40:30,805][00228] Avg episode reward: [(0, '24.692')] [2025-02-20 15:40:35,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 3457024. Throughput: 0: 1003.1. Samples: 864022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-20 15:40:35,805][00228] Avg episode reward: [(0, '23.763')] [2025-02-20 15:40:40,791][04105] Updated weights for policy 0, policy_version 850 (0.0013) [2025-02-20 15:40:40,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 3481600. Throughput: 0: 1024.5. Samples: 867536. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-20 15:40:40,804][00228] Avg episode reward: [(0, '23.463')] [2025-02-20 15:40:45,800][00228] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 3502080. Throughput: 0: 1027.9. Samples: 874536. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-20 15:40:45,803][00228] Avg episode reward: [(0, '22.897')] [2025-02-20 15:40:50,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 4054.3). Total num frames: 3518464. Throughput: 0: 1013.9. Samples: 879454. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-20 15:40:50,805][00228] Avg episode reward: [(0, '24.039')] [2025-02-20 15:40:51,489][04105] Updated weights for policy 0, policy_version 860 (0.0015) [2025-02-20 15:40:55,800][00228] Fps is (10 sec: 3686.5, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 3538944. Throughput: 0: 1026.8. Samples: 882986. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-20 15:40:55,801][00228] Avg episode reward: [(0, '23.363')] [2025-02-20 15:41:00,715][04105] Updated weights for policy 0, policy_version 870 (0.0013) [2025-02-20 15:41:00,806][00228] Fps is (10 sec: 4503.2, 60 sec: 4095.6, 300 sec: 4082.0). Total num frames: 3563520. Throughput: 0: 1028.1. Samples: 889964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-20 15:41:00,809][00228] Avg episode reward: [(0, '24.157')] [2025-02-20 15:41:05,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3579904. Throughput: 0: 1024.5. Samples: 894956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-20 15:41:05,805][00228] Avg episode reward: [(0, '24.496')] [2025-02-20 15:41:10,507][04105] Updated weights for policy 0, policy_version 880 (0.0016) [2025-02-20 15:41:10,800][00228] Fps is (10 sec: 4098.2, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 3604480. Throughput: 0: 1030.5. Samples: 898438. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-20 15:41:10,801][00228] Avg episode reward: [(0, '24.839')] [2025-02-20 15:41:15,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 3624960. Throughput: 0: 1029.5. Samples: 905334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-20 15:41:15,804][00228] Avg episode reward: [(0, '23.927')] [2025-02-20 15:41:20,801][00228] Fps is (10 sec: 3685.9, 60 sec: 4095.9, 300 sec: 4054.3). Total num frames: 3641344. Throughput: 0: 1031.2. Samples: 910426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-20 15:41:20,806][00228] Avg episode reward: [(0, '23.513')] [2025-02-20 15:41:21,241][04105] Updated weights for policy 0, policy_version 890 (0.0025) [2025-02-20 15:41:25,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 3665920. Throughput: 0: 1030.5. Samples: 913910. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-20 15:41:25,803][00228] Avg episode reward: [(0, '24.496')] [2025-02-20 15:41:30,802][00228] Fps is (10 sec: 4095.7, 60 sec: 4027.6, 300 sec: 4068.2). Total num frames: 3682304. Throughput: 0: 1024.5. Samples: 920642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-20 15:41:30,805][00228] Avg episode reward: [(0, '23.723')] [2025-02-20 15:41:30,820][04105] Updated weights for policy 0, policy_version 900 (0.0022) [2025-02-20 15:41:35,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4054.4). Total num frames: 3702784. Throughput: 0: 1035.5. Samples: 926050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-20 15:41:35,806][00228] Avg episode reward: [(0, '23.987')] [2025-02-20 15:41:35,902][04092] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000905_3706880.pth... [2025-02-20 15:41:36,020][04092] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000666_2727936.pth [2025-02-20 15:41:40,257][04105] Updated weights for policy 0, policy_version 910 (0.0019) [2025-02-20 15:41:40,800][00228] Fps is (10 sec: 4506.5, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3727360. Throughput: 0: 1034.0. Samples: 929514. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-20 15:41:40,805][00228] Avg episode reward: [(0, '24.175')] [2025-02-20 15:41:45,801][00228] Fps is (10 sec: 4095.8, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3743744. Throughput: 0: 1018.9. Samples: 935810. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-20 15:41:45,805][00228] Avg episode reward: [(0, '24.321')] [2025-02-20 15:41:50,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4068.3). Total num frames: 3764224. Throughput: 0: 1032.1. Samples: 941402. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-20 15:41:50,805][00228] Avg episode reward: [(0, '23.946')] [2025-02-20 15:41:50,970][04105] Updated weights for policy 0, policy_version 920 (0.0028) [2025-02-20 15:41:55,800][00228] Fps is (10 sec: 4505.9, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 3788800. Throughput: 0: 1031.7. Samples: 944864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-20 15:41:55,805][00228] Avg episode reward: [(0, '24.742')] [2025-02-20 15:42:00,800][00228] Fps is (10 sec: 4095.9, 60 sec: 4028.1, 300 sec: 4068.2). Total num frames: 3805184. Throughput: 0: 1013.5. Samples: 950944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-20 15:42:00,802][00228] Avg episode reward: [(0, '26.557')] [2025-02-20 15:42:00,803][04092] Saving new best policy, reward=26.557! [2025-02-20 15:42:01,388][04105] Updated weights for policy 0, policy_version 930 (0.0013) [2025-02-20 15:42:05,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4068.3). Total num frames: 3825664. Throughput: 0: 1033.4. Samples: 956926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-20 15:42:05,801][00228] Avg episode reward: [(0, '26.206')] [2025-02-20 15:42:10,324][04105] Updated weights for policy 0, policy_version 940 (0.0012) [2025-02-20 15:42:10,800][00228] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 3850240. Throughput: 0: 1031.3. Samples: 960320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-20 15:42:10,805][00228] Avg episode reward: [(0, '27.205')] [2025-02-20 15:42:10,809][04092] Saving new best policy, reward=27.205! [2025-02-20 15:42:15,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3866624. Throughput: 0: 1008.6. Samples: 966026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-20 15:42:15,802][00228] Avg episode reward: [(0, '27.818')] [2025-02-20 15:42:15,808][04092] Saving new best policy, reward=27.818! [2025-02-20 15:42:20,800][00228] Fps is (10 sec: 3686.5, 60 sec: 4096.1, 300 sec: 4068.2). Total num frames: 3887104. Throughput: 0: 1023.5. Samples: 972106. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-20 15:42:20,802][00228] Avg episode reward: [(0, '25.793')] [2025-02-20 15:42:20,967][04105] Updated weights for policy 0, policy_version 950 (0.0028) [2025-02-20 15:42:25,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3911680. Throughput: 0: 1023.6. Samples: 975578. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-20 15:42:25,803][00228] Avg episode reward: [(0, '25.887')] [2025-02-20 15:42:30,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4096.1, 300 sec: 4068.2). Total num frames: 3928064. Throughput: 0: 1007.7. Samples: 981158. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-20 15:42:30,803][00228] Avg episode reward: [(0, '25.425')] [2025-02-20 15:42:31,341][04105] Updated weights for policy 0, policy_version 960 (0.0012) [2025-02-20 15:42:35,800][00228] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 3952640. Throughput: 0: 1028.5. Samples: 987684. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-20 15:42:35,804][00228] Avg episode reward: [(0, '24.930')] [2025-02-20 15:42:40,122][04105] Updated weights for policy 0, policy_version 970 (0.0020) [2025-02-20 15:42:40,800][00228] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3973120. Throughput: 0: 1030.3. Samples: 991228. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-02-20 15:42:40,801][00228] Avg episode reward: [(0, '24.085')] [2025-02-20 15:42:45,800][00228] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 3989504. Throughput: 0: 1008.3. Samples: 996318. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2025-02-20 15:42:45,802][00228] Avg episode reward: [(0, '25.680')] [2025-02-20 15:42:49,062][04092] Stopping Batcher_0... [2025-02-20 15:42:49,063][04092] Loop batcher_evt_loop terminating... [2025-02-20 15:42:49,063][00228] Component Batcher_0 stopped! [2025-02-20 15:42:49,065][00228] Component RolloutWorker_w0 process died already! Don't wait for it. [2025-02-20 15:42:49,074][04092] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-02-20 15:42:49,124][04105] Weights refcount: 2 0 [2025-02-20 15:42:49,130][04105] Stopping InferenceWorker_p0-w0... [2025-02-20 15:42:49,130][00228] Component InferenceWorker_p0-w0 stopped! [2025-02-20 15:42:49,131][04105] Loop inference_proc0-0_evt_loop terminating... [2025-02-20 15:42:49,190][04092] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000785_3215360.pth [2025-02-20 15:42:49,207][04092] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-02-20 15:42:49,380][00228] Component LearnerWorker_p0 stopped! [2025-02-20 15:42:49,381][00228] Component RolloutWorker_w6 stopped! [2025-02-20 15:42:49,382][04092] Stopping LearnerWorker_p0... [2025-02-20 15:42:49,383][04092] Loop learner_proc0_evt_loop terminating... [2025-02-20 15:42:49,380][04112] Stopping RolloutWorker_w6... [2025-02-20 15:42:49,386][04112] Loop rollout_proc6_evt_loop terminating... [2025-02-20 15:42:49,400][00228] Component RolloutWorker_w2 stopped! [2025-02-20 15:42:49,401][04108] Stopping RolloutWorker_w2... [2025-02-20 15:42:49,409][04108] Loop rollout_proc2_evt_loop terminating... [2025-02-20 15:42:49,414][00228] Component RolloutWorker_w4 stopped! [2025-02-20 15:42:49,415][04109] Stopping RolloutWorker_w4... [2025-02-20 15:42:49,418][04109] Loop rollout_proc4_evt_loop terminating... [2025-02-20 15:42:49,544][04107] Stopping RolloutWorker_w1... [2025-02-20 15:42:49,544][00228] Component RolloutWorker_w1 stopped! [2025-02-20 15:42:49,548][04107] Loop rollout_proc1_evt_loop terminating... [2025-02-20 15:42:49,560][04113] Stopping RolloutWorker_w7... [2025-02-20 15:42:49,561][04113] Loop rollout_proc7_evt_loop terminating... [2025-02-20 15:42:49,560][00228] Component RolloutWorker_w7 stopped! [2025-02-20 15:42:49,599][04110] Stopping RolloutWorker_w3... [2025-02-20 15:42:49,599][00228] Component RolloutWorker_w3 stopped! [2025-02-20 15:42:49,602][04110] Loop rollout_proc3_evt_loop terminating... 
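A note on the checkpoint files being written and pruned in the shutdown sequence above: the two numbers in each filename are the policy version and the cumulative environment frame count, and throughout this run they differ by a constant factor of 4096 frames per version (e.g. 978 × 4096 = 4005888). A minimal sketch that parses these names and verifies the relationship; `parse_checkpoint_name` is a hypothetical helper written for illustration, not part of Sample Factory:

```python
import re

# Filenames as they appear in the training log above.
CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth")

def parse_checkpoint_name(name: str) -> tuple[int, int]:
    """Return (policy_version, env_frames) from a filename like
    'checkpoint_000000978_4005888.pth' (layout taken from this log)."""
    m = CKPT_RE.search(name)
    if m is None:
        raise ValueError(f"not a checkpoint filename: {name}")
    return int(m.group(1)), int(m.group(2))

for name in [
    "checkpoint_000000666_2727936.pth",
    "checkpoint_000000785_3215360.pth",
    "checkpoint_000000905_3706880.pth",
    "checkpoint_000000978_4005888.pth",
]:
    version, frames = parse_checkpoint_name(name)
    assert frames == version * 4096  # constant ratio observed in this run
    print(f"policy_version {version:4d} -> {frames:>8d} env frames")
```

The same arithmetic explains why the runner summary below reports `Collected {0: 4005888}`: it matches the frame count encoded in the final checkpoint name.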
[2025-02-20 15:42:49,635][04111] Stopping RolloutWorker_w5... [2025-02-20 15:42:49,635][00228] Component RolloutWorker_w5 stopped! [2025-02-20 15:42:49,638][04111] Loop rollout_proc5_evt_loop terminating... [2025-02-20 15:42:49,637][00228] Waiting for process learner_proc0 to stop... [2025-02-20 15:42:51,038][00228] Waiting for process inference_proc0-0 to join... [2025-02-20 15:42:51,039][00228] Waiting for process rollout_proc0 to join... [2025-02-20 15:42:51,040][00228] Waiting for process rollout_proc1 to join... [2025-02-20 15:42:53,041][00228] Waiting for process rollout_proc2 to join... [2025-02-20 15:42:53,042][00228] Waiting for process rollout_proc3 to join... [2025-02-20 15:42:53,049][00228] Waiting for process rollout_proc4 to join... [2025-02-20 15:42:53,050][00228] Waiting for process rollout_proc5 to join... [2025-02-20 15:42:53,051][00228] Waiting for process rollout_proc6 to join... [2025-02-20 15:42:53,052][00228] Waiting for process rollout_proc7 to join... [2025-02-20 15:42:53,054][00228] Batcher 0 profile tree view: batching: 23.7952, releasing_batches: 0.0289 [2025-02-20 15:42:53,055][00228] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0005 wait_policy_total: 409.6675 update_model: 8.4336 weight_update: 0.0020 one_step: 0.0084 handle_policy_step: 554.0999 deserialize: 13.3453, stack: 3.0446, obs_to_device_normalize: 121.0718, forward: 287.8764, send_messages: 24.0966 prepare_outputs: 81.0108 to_cpu: 50.8417 [2025-02-20 15:42:53,056][00228] Learner 0 profile tree view: misc: 0.0042, prepare_batch: 12.7919 train: 71.0537 epoch_init: 0.0047, minibatch_init: 0.0154, losses_postprocess: 0.6019, kl_divergence: 0.6152, after_optimizer: 33.7085 calculate_losses: 24.2699 losses_init: 0.0050, forward_head: 1.2969, bptt_initial: 16.1375, tail: 1.0562, advantages_returns: 0.2671, losses: 3.2912 bptt: 1.9912 bptt_forward_core: 1.9343 update: 11.2810 clip: 0.8355 [2025-02-20 15:42:53,057][00228] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.3099, enqueue_policy_requests: 92.9892, env_step: 808.5695, overhead: 12.6361, complete_rollouts: 7.5392 save_policy_outputs: 19.0559 split_output_tensors: 7.3732 [2025-02-20 15:42:53,058][00228] Loop Runner_EvtLoop terminating... [2025-02-20 15:42:53,060][00228] Runner profile tree view: main_loop: 1032.1810 [2025-02-20 15:42:53,060][00228] Collected {0: 4005888}, FPS: 3881.0 [2025-02-20 15:42:53,492][00228] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-02-20 15:42:53,493][00228] Overriding arg 'num_workers' with value 1 passed from command line [2025-02-20 15:42:53,494][00228] Adding new argument 'no_render'=True that is not in the saved config file! [2025-02-20 15:42:53,495][00228] Adding new argument 'save_video'=True that is not in the saved config file! [2025-02-20 15:42:53,496][00228] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-02-20 15:42:53,497][00228] Adding new argument 'video_name'=None that is not in the saved config file! [2025-02-20 15:42:53,498][00228] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-02-20 15:42:53,499][00228] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-02-20 15:42:53,500][00228] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-02-20 15:42:53,501][00228] Adding new argument 'hf_repository'=None that is not in the saved config file! 
[2025-02-20 15:42:53,502][00228] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-02-20 15:42:53,504][00228] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-02-20 15:42:53,504][00228] Adding new argument 'train_script'=None that is not in the saved config file! [2025-02-20 15:42:53,505][00228] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-02-20 15:42:53,506][00228] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-02-20 15:42:53,539][00228] Doom resolution: 160x120, resize resolution: (128, 72) [2025-02-20 15:42:53,542][00228] RunningMeanStd input shape: (3, 72, 128) [2025-02-20 15:42:53,544][00228] RunningMeanStd input shape: (1,) [2025-02-20 15:42:53,559][00228] ConvEncoder: input_channels=3 [2025-02-20 15:42:53,654][00228] Conv encoder output size: 512 [2025-02-20 15:42:53,655][00228] Policy head output size: 512 [2025-02-20 15:42:53,834][00228] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-02-20 15:42:54,791][00228] Num frames 100... [2025-02-20 15:42:54,960][00228] Num frames 200... [2025-02-20 15:42:55,125][00228] Num frames 300... [2025-02-20 15:42:55,298][00228] Num frames 400... [2025-02-20 15:42:55,473][00228] Num frames 500... [2025-02-20 15:42:55,647][00228] Num frames 600... [2025-02-20 15:42:55,817][00228] Num frames 700... [2025-02-20 15:42:55,999][00228] Num frames 800... [2025-02-20 15:42:56,098][00228] Avg episode rewards: #0: 16.240, true rewards: #0: 8.240 [2025-02-20 15:42:56,100][00228] Avg episode reward: 16.240, avg true_objective: 8.240 [2025-02-20 15:42:56,211][00228] Num frames 900... [2025-02-20 15:42:56,343][00228] Num frames 1000... [2025-02-20 15:42:56,469][00228] Num frames 1100... [2025-02-20 15:42:56,595][00228] Num frames 1200... [2025-02-20 15:42:56,721][00228] Num frames 1300... [2025-02-20 15:42:56,852][00228] Num frames 1400... [2025-02-20 15:42:56,983][00228] Num frames 1500... [2025-02-20 15:42:57,118][00228] Num frames 1600... [2025-02-20 15:42:57,238][00228] Avg episode rewards: #0: 16.755, true rewards: #0: 8.255 [2025-02-20 15:42:57,240][00228] Avg episode reward: 16.755, avg true_objective: 8.255 [2025-02-20 15:42:57,312][00228] Num frames 1700... [2025-02-20 15:42:57,440][00228] Num frames 1800... [2025-02-20 15:42:57,572][00228] Num frames 1900... [2025-02-20 15:42:57,631][00228] Avg episode rewards: #0: 12.340, true rewards: #0: 6.340 [2025-02-20 15:42:57,631][00228] Avg episode reward: 12.340, avg true_objective: 6.340 [2025-02-20 15:42:57,759][00228] Num frames 2000... [2025-02-20 15:42:57,896][00228] Num frames 2100... [2025-02-20 15:42:58,032][00228] Num frames 2200... [2025-02-20 15:42:58,168][00228] Num frames 2300... [2025-02-20 15:42:58,305][00228] Num frames 2400... [2025-02-20 15:42:58,434][00228] Num frames 2500... [2025-02-20 15:42:58,543][00228] Avg episode rewards: #0: 12.105, true rewards: #0: 6.355 [2025-02-20 15:42:58,544][00228] Avg episode reward: 12.105, avg true_objective: 6.355 [2025-02-20 15:42:58,618][00228] Num frames 2600... [2025-02-20 15:42:58,748][00228] Num frames 2700... [2025-02-20 15:42:58,881][00228] Num frames 2800... [2025-02-20 15:42:59,014][00228] Num frames 2900... [2025-02-20 15:42:59,189][00228] Avg episode rewards: #0: 11.380, true rewards: #0: 5.980 [2025-02-20 15:42:59,190][00228] Avg episode reward: 11.380, avg true_objective: 5.980 [2025-02-20 15:42:59,204][00228] Num frames 3000... 
[2025-02-20 15:42:59,339][00228] Num frames 3100... [2025-02-20 15:42:59,469][00228] Num frames 3200... [2025-02-20 15:42:59,597][00228] Num frames 3300... [2025-02-20 15:42:59,726][00228] Num frames 3400... [2025-02-20 15:42:59,859][00228] Num frames 3500... [2025-02-20 15:42:59,989][00228] Num frames 3600... [2025-02-20 15:43:00,121][00228] Num frames 3700... [2025-02-20 15:43:00,249][00228] Num frames 3800... [2025-02-20 15:43:00,385][00228] Num frames 3900... [2025-02-20 15:43:00,516][00228] Num frames 4000... [2025-02-20 15:43:00,648][00228] Num frames 4100... [2025-02-20 15:43:00,787][00228] Num frames 4200... [2025-02-20 15:43:00,918][00228] Num frames 4300... [2025-02-20 15:43:01,052][00228] Num frames 4400... [2025-02-20 15:43:01,148][00228] Avg episode rewards: #0: 15.217, true rewards: #0: 7.383 [2025-02-20 15:43:01,149][00228] Avg episode reward: 15.217, avg true_objective: 7.383 [2025-02-20 15:43:01,240][00228] Num frames 4500... [2025-02-20 15:43:01,375][00228] Num frames 4600... [2025-02-20 15:43:01,503][00228] Num frames 4700... [2025-02-20 15:43:01,629][00228] Num frames 4800... [2025-02-20 15:43:01,757][00228] Num frames 4900... [2025-02-20 15:43:01,889][00228] Num frames 5000... [2025-02-20 15:43:02,021][00228] Num frames 5100... [2025-02-20 15:43:02,171][00228] Avg episode rewards: #0: 15.099, true rewards: #0: 7.384 [2025-02-20 15:43:02,172][00228] Avg episode reward: 15.099, avg true_objective: 7.384 [2025-02-20 15:43:02,213][00228] Num frames 5200... [2025-02-20 15:43:02,340][00228] Num frames 5300... [2025-02-20 15:43:02,476][00228] Num frames 5400... [2025-02-20 15:43:02,603][00228] Num frames 5500... [2025-02-20 15:43:02,769][00228] Avg episode rewards: #0: 14.231, true rewards: #0: 6.981 [2025-02-20 15:43:02,770][00228] Avg episode reward: 14.231, avg true_objective: 6.981 [2025-02-20 15:43:02,792][00228] Num frames 5600... [2025-02-20 15:43:02,923][00228] Num frames 5700... [2025-02-20 15:43:03,054][00228] Num frames 5800... [2025-02-20 15:43:03,180][00228] Num frames 5900... [2025-02-20 15:43:03,319][00228] Num frames 6000... [2025-02-20 15:43:03,378][00228] Avg episode rewards: #0: 13.335, true rewards: #0: 6.668 [2025-02-20 15:43:03,379][00228] Avg episode reward: 13.335, avg true_objective: 6.668 [2025-02-20 15:43:03,522][00228] Num frames 6100... [2025-02-20 15:43:03,650][00228] Num frames 6200... [2025-02-20 15:43:03,778][00228] Num frames 6300... [2025-02-20 15:43:03,900][00228] Avg episode rewards: #0: 12.453, true rewards: #0: 6.353 [2025-02-20 15:43:03,901][00228] Avg episode reward: 12.453, avg true_objective: 6.353 [2025-02-20 15:43:42,062][00228] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2025-02-20 15:45:21,955][00228] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-02-20 15:45:21,956][00228] Overriding arg 'num_workers' with value 1 passed from command line [2025-02-20 15:45:21,957][00228] Adding new argument 'no_render'=True that is not in the saved config file! [2025-02-20 15:45:21,958][00228] Adding new argument 'save_video'=True that is not in the saved config file! [2025-02-20 15:45:21,959][00228] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-02-20 15:45:21,960][00228] Adding new argument 'video_name'=None that is not in the saved config file! [2025-02-20 15:45:21,961][00228] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! 
[2025-02-20 15:45:21,961][00228] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-02-20 15:45:21,962][00228] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2025-02-20 15:45:21,963][00228] Adding new argument 'hf_repository'='ntn201105/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2025-02-20 15:45:21,964][00228] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-02-20 15:45:21,965][00228] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-02-20 15:45:21,966][00228] Adding new argument 'train_script'=None that is not in the saved config file! [2025-02-20 15:45:21,966][00228] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-02-20 15:45:21,967][00228] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-02-20 15:45:21,994][00228] RunningMeanStd input shape: (3, 72, 128) [2025-02-20 15:45:21,996][00228] RunningMeanStd input shape: (1,) [2025-02-20 15:45:22,008][00228] ConvEncoder: input_channels=3 [2025-02-20 15:45:22,046][00228] Conv encoder output size: 512 [2025-02-20 15:45:22,047][00228] Policy head output size: 512 [2025-02-20 15:45:22,068][00228] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-02-20 15:45:22,490][00228] Num frames 100... [2025-02-20 15:45:22,616][00228] Num frames 200... [2025-02-20 15:45:22,741][00228] Num frames 300... [2025-02-20 15:45:22,864][00228] Num frames 400... [2025-02-20 15:45:22,990][00228] Num frames 500... [2025-02-20 15:45:23,116][00228] Num frames 600... [2025-02-20 15:45:23,238][00228] Num frames 700... [2025-02-20 15:45:23,366][00228] Num frames 800... [2025-02-20 15:45:23,500][00228] Num frames 900... [2025-02-20 15:45:23,627][00228] Num frames 1000... [2025-02-20 15:45:23,751][00228] Num frames 1100... [2025-02-20 15:45:23,890][00228] Num frames 1200... [2025-02-20 15:45:24,061][00228] Avg episode rewards: #0: 28.920, true rewards: #0: 12.920 [2025-02-20 15:45:24,062][00228] Avg episode reward: 28.920, avg true_objective: 12.920 [2025-02-20 15:45:24,076][00228] Num frames 1300... [2025-02-20 15:45:24,202][00228] Num frames 1400... [2025-02-20 15:45:24,328][00228] Num frames 1500... [2025-02-20 15:45:24,455][00228] Num frames 1600... [2025-02-20 15:45:24,591][00228] Num frames 1700... [2025-02-20 15:45:24,717][00228] Num frames 1800... [2025-02-20 15:45:24,777][00228] Avg episode rewards: #0: 19.520, true rewards: #0: 9.020 [2025-02-20 15:45:24,778][00228] Avg episode reward: 19.520, avg true_objective: 9.020 [2025-02-20 15:45:24,900][00228] Num frames 1900... [2025-02-20 15:45:25,025][00228] Num frames 2000... [2025-02-20 15:45:25,154][00228] Num frames 2100... [2025-02-20 15:45:25,278][00228] Avg episode rewards: #0: 14.520, true rewards: #0: 7.187 [2025-02-20 15:45:25,279][00228] Avg episode reward: 14.520, avg true_objective: 7.187 [2025-02-20 15:45:25,349][00228] Num frames 2200... [2025-02-20 15:45:25,523][00228] Num frames 2300... [2025-02-20 15:45:25,698][00228] Num frames 2400... [2025-02-20 15:45:25,862][00228] Num frames 2500... [2025-02-20 15:45:26,026][00228] Num frames 2600... [2025-02-20 15:45:26,194][00228] Num frames 2700... [2025-02-20 15:45:26,358][00228] Num frames 2800... [2025-02-20 15:45:26,523][00228] Num frames 2900... [2025-02-20 15:45:26,707][00228] Num frames 3000... [2025-02-20 15:45:26,878][00228] Num frames 3100... 
[2025-02-20 15:45:27,058][00228] Num frames 3200... [2025-02-20 15:45:27,240][00228] Num frames 3300... [2025-02-20 15:45:27,420][00228] Num frames 3400... [2025-02-20 15:45:27,550][00228] Num frames 3500... [2025-02-20 15:45:27,682][00228] Num frames 3600... [2025-02-20 15:45:27,812][00228] Num frames 3700... [2025-02-20 15:45:27,938][00228] Num frames 3800... [2025-02-20 15:45:28,066][00228] Num frames 3900... [2025-02-20 15:45:28,191][00228] Num frames 4000... [2025-02-20 15:45:28,319][00228] Num frames 4100... [2025-02-20 15:45:28,444][00228] Num frames 4200... [2025-02-20 15:45:28,569][00228] Avg episode rewards: #0: 25.890, true rewards: #0: 10.640 [2025-02-20 15:45:28,570][00228] Avg episode reward: 25.890, avg true_objective: 10.640 [2025-02-20 15:45:28,628][00228] Num frames 4300... [2025-02-20 15:45:28,766][00228] Num frames 4400... [2025-02-20 15:45:28,890][00228] Num frames 4500... [2025-02-20 15:45:29,015][00228] Num frames 4600... [2025-02-20 15:45:29,145][00228] Num frames 4700... [2025-02-20 15:45:29,272][00228] Num frames 4800... [2025-02-20 15:45:29,401][00228] Num frames 4900... [2025-02-20 15:45:29,528][00228] Num frames 5000... [2025-02-20 15:45:29,651][00228] Avg episode rewards: #0: 23.712, true rewards: #0: 10.112 [2025-02-20 15:45:29,652][00228] Avg episode reward: 23.712, avg true_objective: 10.112 [2025-02-20 15:45:29,716][00228] Num frames 5100... [2025-02-20 15:45:29,842][00228] Num frames 5200... [2025-02-20 15:45:29,969][00228] Num frames 5300... [2025-02-20 15:45:30,097][00228] Num frames 5400... [2025-02-20 15:45:30,244][00228] Num frames 5500... [2025-02-20 15:45:30,369][00228] Num frames 5600... [2025-02-20 15:45:30,497][00228] Num frames 5700... [2025-02-20 15:45:30,626][00228] Num frames 5800... [2025-02-20 15:45:30,762][00228] Num frames 5900... [2025-02-20 15:45:30,887][00228] Num frames 6000... [2025-02-20 15:45:31,013][00228] Num frames 6100... [2025-02-20 15:45:31,144][00228] Num frames 6200... [2025-02-20 15:45:31,273][00228] Num frames 6300... [2025-02-20 15:45:31,400][00228] Num frames 6400... [2025-02-20 15:45:31,529][00228] Num frames 6500... [2025-02-20 15:45:31,656][00228] Num frames 6600... [2025-02-20 15:45:31,800][00228] Num frames 6700... [2025-02-20 15:45:31,899][00228] Avg episode rewards: #0: 27.391, true rewards: #0: 11.225 [2025-02-20 15:45:31,901][00228] Avg episode reward: 27.391, avg true_objective: 11.225 [2025-02-20 15:45:31,984][00228] Num frames 6800... [2025-02-20 15:45:32,114][00228] Num frames 6900... [2025-02-20 15:45:32,241][00228] Num frames 7000... [2025-02-20 15:45:32,367][00228] Num frames 7100... [2025-02-20 15:45:32,492][00228] Num frames 7200... [2025-02-20 15:45:32,620][00228] Num frames 7300... [2025-02-20 15:45:32,747][00228] Num frames 7400... [2025-02-20 15:45:32,883][00228] Num frames 7500... [2025-02-20 15:45:33,009][00228] Num frames 7600... [2025-02-20 15:45:33,080][00228] Avg episode rewards: #0: 26.301, true rewards: #0: 10.873 [2025-02-20 15:45:33,081][00228] Avg episode reward: 26.301, avg true_objective: 10.873 [2025-02-20 15:45:33,192][00228] Num frames 7700... [2025-02-20 15:45:33,318][00228] Num frames 7800... [2025-02-20 15:45:33,442][00228] Num frames 7900... [2025-02-20 15:45:33,571][00228] Num frames 8000... [2025-02-20 15:45:33,697][00228] Num frames 8100... [2025-02-20 15:45:33,831][00228] Num frames 8200... [2025-02-20 15:45:33,958][00228] Num frames 8300... [2025-02-20 15:45:34,086][00228] Num frames 8400... [2025-02-20 15:45:34,215][00228] Num frames 8500... 
[2025-02-20 15:45:34,346][00228] Avg episode rewards: #0: 25.700, true rewards: #0: 10.700 [2025-02-20 15:45:34,346][00228] Avg episode reward: 25.700, avg true_objective: 10.700 [2025-02-20 15:45:34,401][00228] Num frames 8600... [2025-02-20 15:45:34,529][00228] Num frames 8700... [2025-02-20 15:45:34,657][00228] Num frames 8800... [2025-02-20 15:45:34,785][00228] Num frames 8900... [2025-02-20 15:45:34,920][00228] Num frames 9000... [2025-02-20 15:45:35,049][00228] Num frames 9100... [2025-02-20 15:45:35,176][00228] Num frames 9200... [2025-02-20 15:45:35,304][00228] Num frames 9300... [2025-02-20 15:45:35,431][00228] Num frames 9400... [2025-02-20 15:45:35,560][00228] Num frames 9500... [2025-02-20 15:45:35,686][00228] Num frames 9600... [2025-02-20 15:45:35,814][00228] Num frames 9700... [2025-02-20 15:45:35,952][00228] Num frames 9800... [2025-02-20 15:45:36,083][00228] Num frames 9900... [2025-02-20 15:45:36,214][00228] Num frames 10000... [2025-02-20 15:45:36,341][00228] Num frames 10100... [2025-02-20 15:45:36,468][00228] Num frames 10200... [2025-02-20 15:45:36,596][00228] Num frames 10300... [2025-02-20 15:45:36,761][00228] Avg episode rewards: #0: 28.095, true rewards: #0: 11.540 [2025-02-20 15:45:36,762][00228] Avg episode reward: 28.095, avg true_objective: 11.540 [2025-02-20 15:45:36,783][00228] Num frames 10400... [2025-02-20 15:45:36,916][00228] Num frames 10500... [2025-02-20 15:45:37,047][00228] Num frames 10600... [2025-02-20 15:45:37,175][00228] Num frames 10700... [2025-02-20 15:45:37,301][00228] Num frames 10800... [2025-02-20 15:45:37,426][00228] Num frames 10900... [2025-02-20 15:45:37,601][00228] Num frames 11000... [2025-02-20 15:45:37,705][00228] Avg episode rewards: #0: 26.626, true rewards: #0: 11.026 [2025-02-20 15:45:37,706][00228] Avg episode reward: 26.626, avg true_objective: 11.026 [2025-02-20 15:46:43,763][00228] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
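For reference after the two evaluation runs above: each "Avg episode rewards" line is a running mean over the episodes finished so far, so the reward of episode k can be recovered by differencing, r_k = k*A_k - (k-1)*A_{k-1}. A minimal sketch under that assumption, with a regex mirroring the line format in this log (the helper name is illustrative, not part of Sample Factory):

```python
import re

# Matches lines like:
#   Avg episode rewards: #0: 26.626, true rewards: #0: 11.026
AVG_RE = re.compile(
    r"Avg episode rewards: #0: ([\d.]+), true rewards: #0: ([\d.]+)"
)

def per_episode_rewards(log_text: str) -> list[float]:
    """Recover individual (shaped) episode rewards from the running
    averages printed after each finished episode."""
    averages = [float(m.group(1)) for m in AVG_RE.finditer(log_text)]
    rewards = []
    for k, avg_k in enumerate(averages, start=1):
        prev = averages[k - 2] if k > 1 else 0.0
        rewards.append(round(k * avg_k - (k - 1) * prev, 3))
    return rewards

# First three running averages of the second evaluation run above:
sample = (
    "Avg episode rewards: #0: 28.920, true rewards: #0: 12.920\n"
    "Avg episode rewards: #0: 19.520, true rewards: #0: 9.020\n"
    "Avg episode rewards: #0: 14.520, true rewards: #0: 7.187\n"
)
print(per_episode_rewards(sample))  # -> [28.92, 10.12, 4.52]
```

Applied to the final push-to-hub run, the differenced values sum back to the reported 10-episode mean: 10 × 26.626 = 266.26 total shaped reward.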