2023-06-15 01:56:32,530 INFO [train.py:1056] (0/4) Training started
2023-06-15 01:56:32,535 INFO [train.py:1066] (0/4) Device: cuda:0
2023-06-15 01:56:32,538 INFO [train.py:1075] (0/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Debug', 'k2-with-cuda': True, 'k2-git-sha1': '38211604d6a24b15f320578a1a38f6c12d7a711c', 'k2-git-date': 'Mon Jun 12 10:59:44 2023', 'lhotse-version': '1.15.0.dev+git.f1fd23d.clean', 'torch-version': '2.0.0+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.8', 'icefall-git-branch': 'ted/zipformer', 'icefall-git-sha1': '323a299-dirty', 'icefall-git-date': 'Tue Jun 13 04:47:15 2023', 'icefall-path': '/exp/draj/jsalt2023/icefall', 'k2-path': '/exp/draj/jsalt2023/k2/k2/python/k2/__init__.py', 'lhotse-path': '/exp/draj/jsalt2023/lhotse/lhotse/__init__.py', 'hostname': 'r2n01', 'IP address': '10.1.2.1'}, 'world_size': 4, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp/v5'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.04, 'lr_batches': 7500, 'lr_epochs': 5.0, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 1, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'manifest_dir': PosixPath('data/manifests'), 'max_duration': 1000, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'blank_id': 0, 'vocab_size': 500}
2023-06-15 01:56:32,539 INFO [train.py:1077] (0/4) About to create model
2023-06-15 01:56:33,401 INFO [train.py:1081] (0/4) Number of model parameters: 65549011
2023-06-15 01:56:46,308 INFO [train.py:1096] (0/4) Using DDP
2023-06-15 01:56:46,646 INFO [asr_datamodule.py:356] (0/4) About to get train cuts
2023-06-15 01:56:46,702 INFO [asr_datamodule.py:185] (0/4) Enable SpecAugment
2023-06-15 01:56:46,702 INFO [asr_datamodule.py:186] (0/4) Time warp factor: 80
2023-06-15 01:56:46,703 INFO [asr_datamodule.py:202] (0/4) About to get Musan cuts
2023-06-15 01:56:46,703 INFO [asr_datamodule.py:205] (0/4) Enable MUSAN
2023-06-15 01:56:48,381 INFO [asr_datamodule.py:227] (0/4) About to create train dataset
2023-06-15 01:56:48,381 INFO [asr_datamodule.py:253] (0/4) Using DynamicBucketingSampler.
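The config entry above packs the six Zipformer encoder stacks into comma-separated strings (one value per stack). The short Python sketch below is not part of the log output; it only illustrates, under that reading, how those strings can be expanded for inspection. The values are copied from the logged dict, and the helper name parse_int_list is invented for this example rather than taken from icefall.

# Hedged, illustrative sketch: expand the per-stack settings logged above.
# parse_int_list is a made-up helper for this example, not an icefall function.
def parse_int_list(s: str) -> list:
    """Turn '192,256,384,512,384,256' into [192, 256, 384, 512, 384, 256]."""
    return [int(x) for x in s.split(",")]

logged = {
    "num_encoder_layers": "2,2,3,4,3,2",            # copied from the config dict above
    "downsampling_factor": "1,2,4,8,4,2",
    "encoder_dim": "192,256,384,512,384,256",
    "feedforward_dim": "512,768,1024,1536,1024,768",
    "num_heads": "4,4,4,8,4,4",
}
stacks = {key: parse_int_list(val) for key, val in logged.items()}
for i in range(len(stacks["encoder_dim"])):
    print(f"stack {i}: {stacks['num_encoder_layers'][i]} layers, "
          f"dim {stacks['encoder_dim'][i]}, heads {stacks['num_heads'][i]}, "
          f"ff {stacks['feedforward_dim'][i]}, downsample x{stacks['downsampling_factor'][i]}")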
2023-06-15 01:56:50,237 INFO [asr_datamodule.py:274] (0/4) About to create train dataloader 2023-06-15 01:56:50,237 INFO [asr_datamodule.py:361] (0/4) About to get dev cuts 2023-06-15 01:56:50,274 INFO [asr_datamodule.py:295] (0/4) About to create dev dataset 2023-06-15 01:56:50,305 INFO [asr_datamodule.py:314] (0/4) About to create dev dataloader 2023-06-15 01:56:50,305 INFO [train.py:1249] (0/4) Sanity check -- see if any of the batches in epoch 1 would cause OOM. 2023-06-15 01:57:37,207 INFO [scaling.py:962] (0/4) Whitening: name=None, num_groups=4, num_channels=128, metric=12.72 vs. limit=3.0 2023-06-15 01:57:37,536 INFO [scaling.py:962] (0/4) Whitening: name=None, num_groups=1, num_channels=256, metric=40.75 vs. limit=5.0 2023-06-15 01:57:37,876 INFO [train.py:1277] (0/4) Maximum memory allocated so far is 8736MB 2023-06-15 01:57:40,147 INFO [train.py:1277] (0/4) Maximum memory allocated so far is 8861MB 2023-06-15 01:57:51,998 INFO [train.py:1277] (0/4) Maximum memory allocated so far is 11505MB 2023-06-15 01:57:58,249 INFO [train.py:1277] (0/4) Maximum memory allocated so far is 11834MB 2023-06-15 01:58:17,633 INFO [train.py:1277] (0/4) Maximum memory allocated so far is 11847MB 2023-06-15 01:58:27,470 INFO [train.py:1277] (0/4) Maximum memory allocated so far is 11997MB 2023-06-15 01:58:50,926 INFO [train.py:988] (0/4) Epoch 1, batch 0, loss[loss=7.872, simple_loss=7.171, pruned_loss=6.999, over 16812.00 frames. ], tot_loss[loss=7.872, simple_loss=7.171, pruned_loss=6.999, over 16812.00 frames. ], batch size: 59, lr: 2.00e-02, grad_scale: 1.0 2023-06-15 01:58:50,927 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 01:58:57,871 INFO [train.py:1020] (0/4) Epoch 1, validation: loss=7.824, simple_loss=7.131, pruned_loss=6.914, over 143649.00 frames. 2023-06-15 01:58:57,871 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 11997MB 2023-06-15 01:59:07,016 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=0.0, ans=0.05 2023-06-15 01:59:19,874 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=0.0, ans=0.5 2023-06-15 01:59:22,810 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=66.66666666666667, ans=0.0985 2023-06-15 01:59:43,820 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=66.66666666666667, ans=0.496875 2023-06-15 01:59:44,809 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.45 vs. limit=7.525 2023-06-15 01:59:46,594 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=66.66666666666667, ans=0.0985 2023-06-15 01:59:55,282 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.87 vs. limit=4.026666666666666 2023-06-15 02:00:28,226 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.93 vs. limit=7.55 2023-06-15 02:00:33,672 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=239.15 vs. 
limit=7.65 2023-06-15 02:00:53,246 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=243.44 vs. limit=7.6 2023-06-15 02:01:08,946 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=66.69 vs. limit=7.7 2023-06-15 02:01:10,613 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=202.53 vs. limit=7.6 2023-06-15 02:01:14,756 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=116.19 vs. limit=7.625 2023-06-15 02:01:16,028 INFO [train.py:988] (0/4) Epoch 1, batch 50, loss[loss=1.433, simple_loss=1.286, pruned_loss=1.332, over 20576.00 frames. ], tot_loss[loss=3.401, simple_loss=3.132, pruned_loss=2.633, over 858085.62 frames. ], batch size: 189, lr: 2.20e-02, grad_scale: 0.25 2023-06-15 02:01:32,223 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=8.41 vs. limit=4.133333333333334 2023-06-15 02:02:13,232 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=466.6666666666667, ans=0.29533333333333334 2023-06-15 02:02:15,899 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=47.80 vs. limit=7.675 2023-06-15 02:02:19,638 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=50.14 vs. limit=7.675 2023-06-15 02:02:26,273 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=36.46 vs. limit=5.266666666666667 2023-06-15 02:02:28,043 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=533.3333333333334, ans=0.29466666666666663 2023-06-15 02:02:28,730 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=14.71 vs. limit=5.133333333333334 2023-06-15 02:02:29,193 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=69.53 vs. limit=7.7 2023-06-15 02:02:32,046 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=533.3333333333334, ans=0.475 2023-06-15 02:02:49,474 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=18.45 vs. limit=7.725 2023-06-15 02:02:52,009 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=374.32 vs. limit=7.725 2023-06-15 02:02:55,564 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=600.0, ans=0.294 2023-06-15 02:02:55,943 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=37.83 vs. 
limit=5.3 2023-06-15 02:02:56,071 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=18.87 vs. limit=5.15 2023-06-15 02:02:56,608 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=112.04 vs. limit=7.725 2023-06-15 02:02:57,520 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=600.0, ans=0.048125 2023-06-15 02:03:04,639 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=43.49 vs. limit=5.3 2023-06-15 02:03:07,767 INFO [train.py:988] (0/4) Epoch 1, batch 100, loss[loss=1.252, simple_loss=1.084, pruned_loss=1.345, over 18616.00 frames. ], tot_loss[loss=2.27, simple_loss=2.06, pruned_loss=1.94, over 1491521.07 frames. ], batch size: 80, lr: 2.40e-02, grad_scale: 0.5 2023-06-15 02:03:14,165 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 1.535e+02 6.068e+02 3.361e+03 1.967e+04, threshold=1.214e+03, percent-clipped=0.0 2023-06-15 02:03:28,247 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=46.95 vs. limit=7.775 2023-06-15 02:03:31,840 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=733.3333333333334, ans=0.7573333333333333 2023-06-15 02:03:38,421 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=733.3333333333334, ans=0.29266666666666663 2023-06-15 02:03:46,954 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=4.293333333333333 2023-06-15 02:03:51,339 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=14.01 vs. limit=5.2 2023-06-15 02:03:55,621 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=22.29 vs. limit=7.8 2023-06-15 02:04:16,302 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.90 vs. limit=8.15 2023-06-15 02:04:20,068 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.16 vs. limit=8.15 2023-06-15 02:04:20,316 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.35 vs. limit=8.15 2023-06-15 02:04:36,897 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.68 vs. limit=8.2 2023-06-15 02:04:40,162 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=933.3333333333334, ans=0.04666666666666667 2023-06-15 02:04:45,513 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=65.26 vs. 
limit=7.85 2023-06-15 02:04:56,163 INFO [train.py:988] (0/4) Epoch 1, batch 150, loss[loss=1.066, simple_loss=0.9105, pruned_loss=1.128, over 19226.00 frames. ], tot_loss[loss=1.78, simple_loss=1.593, pruned_loss=1.618, over 2019709.40 frames. ], batch size: 92, lr: 2.60e-02, grad_scale: 0.5 2023-06-15 02:05:01,023 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.33 vs. limit=5.25 2023-06-15 02:05:13,250 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1000.0, ans=0.453125 2023-06-15 02:05:19,886 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=32.36 vs. limit=7.9 2023-06-15 02:05:20,018 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=49.87 vs. limit=7.9 2023-06-15 02:05:20,688 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=187.37 vs. limit=7.9 2023-06-15 02:05:24,948 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.48 vs. limit=7.9 2023-06-15 02:05:34,105 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=137.67 vs. limit=7.9 2023-06-15 02:06:00,213 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1200.0, ans=0.44375 2023-06-15 02:06:29,906 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=49.49 vs. limit=7.975 2023-06-15 02:06:35,851 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1266.6666666666667, ans=0.440625 2023-06-15 02:06:44,809 INFO [train.py:988] (0/4) Epoch 1, batch 200, loss[loss=0.8894, simple_loss=0.7645, pruned_loss=0.8575, over 20042.00 frames. ], tot_loss[loss=1.513, simple_loss=1.34, pruned_loss=1.416, over 2406389.61 frames. ], batch size: 293, lr: 2.80e-02, grad_scale: 1.0 2023-06-15 02:06:45,863 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=19.84 vs. limit=8.0 2023-06-15 02:06:50,996 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 9.369e+01 1.058e+02 1.160e+02 1.378e+03, threshold=2.115e+02, percent-clipped=1.0 2023-06-15 02:06:56,427 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=65.64 vs. limit=8.0 2023-06-15 02:06:56,977 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=1333.3333333333333, ans=4.266666666666667 2023-06-15 02:06:59,643 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1333.3333333333333, ans=0.8533333333333334 2023-06-15 02:07:13,428 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=325.69 vs. 
limit=8.025 2023-06-15 02:07:17,817 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1400.0, ans=0.325 2023-06-15 02:07:36,443 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.75 vs. limit=5.733333333333333 2023-06-15 02:07:38,287 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1466.6666666666667, ans=5.916666666666667 2023-06-15 02:07:49,209 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=26.16 vs. limit=8.075 2023-06-15 02:07:51,460 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=8.075 2023-06-15 02:08:01,639 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=35.87 vs. limit=8.075 2023-06-15 02:08:07,875 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=8.1 2023-06-15 02:08:10,595 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=8.1 2023-06-15 02:08:13,163 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=8.1 2023-06-15 02:08:14,738 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.28 vs. limit=8.7 2023-06-15 02:08:25,990 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.90 vs. limit=5.4 2023-06-15 02:08:30,675 INFO [train.py:988] (0/4) Epoch 1, batch 250, loss[loss=0.8586, simple_loss=0.7342, pruned_loss=0.8005, over 20050.00 frames. ], tot_loss[loss=1.344, simple_loss=1.179, pruned_loss=1.272, over 2714482.89 frames. ], batch size: 293, lr: 3.00e-02, grad_scale: 1.0 2023-06-15 02:08:31,389 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=24.38 vs. limit=8.125 2023-06-15 02:08:37,155 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1666.6666666666667, ans=0.225 2023-06-15 02:08:42,613 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=176.80 vs. limit=8.125 2023-06-15 02:08:48,169 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.06 vs. limit=8.75 2023-06-15 02:08:58,725 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=53.17 vs. limit=8.15 2023-06-15 02:09:14,592 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=195.08 vs. 
limit=8.175 2023-06-15 02:09:28,248 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1800.0, ans=0.415625 2023-06-15 02:09:30,809 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=38.93 vs. limit=8.175 2023-06-15 02:09:34,753 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=1866.6666666666667, ans=8.2 2023-06-15 02:09:36,590 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1866.6666666666667, ans=0.4125 2023-06-15 02:09:42,616 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1866.6666666666667, ans=0.4125 2023-06-15 02:09:47,135 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.52 vs. limit=8.9 2023-06-15 02:09:47,611 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=24.26 vs. limit=8.2 2023-06-15 02:10:15,424 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=8.25 2023-06-15 02:10:16,494 INFO [train.py:988] (0/4) Epoch 1, batch 300, loss[loss=0.9198, simple_loss=0.7719, pruned_loss=0.875, over 19103.00 frames. ], tot_loss[loss=1.229, simple_loss=1.069, pruned_loss=1.167, over 2971018.12 frames. ], batch size: 94, lr: 3.20e-02, grad_scale: 2.0 2023-06-15 02:10:16,962 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2000.0, ans=0.40625 2023-06-15 02:10:22,493 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.963e+01 1.052e+02 1.248e+02 1.628e+02 2.864e+02, threshold=2.496e+02, percent-clipped=3.0 2023-06-15 02:10:29,406 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=15.17 vs. limit=8.25 2023-06-15 02:10:36,995 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2066.6666666666665, ans=0.403125 2023-06-15 02:11:10,144 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.29 vs. limit=4.8533333333333335 2023-06-15 02:11:20,734 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.08 vs. limit=9.15 2023-06-15 02:11:21,855 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2200.0, ans=0.050499999999999996 2023-06-15 02:11:47,484 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=66.84 vs. limit=6.133333333333333 2023-06-15 02:12:02,178 INFO [train.py:988] (0/4) Epoch 1, batch 350, loss[loss=1.014, simple_loss=0.8446, pruned_loss=0.944, over 18325.00 frames. ], tot_loss[loss=1.149, simple_loss=0.9918, pruned_loss=1.085, over 3168588.29 frames. 
], batch size: 72, lr: 3.40e-02, grad_scale: 2.0 2023-06-15 02:12:07,343 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=9.25 2023-06-15 02:12:10,623 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2333.3333333333335, ans=0.390625 2023-06-15 02:12:18,736 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2333.3333333333335, ans=0.20833333333333331 2023-06-15 02:12:19,852 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=27.39 vs. limit=9.25 2023-06-15 02:12:25,642 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=50.56 vs. limit=8.4 2023-06-15 02:12:27,592 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.12 vs. limit=8.4 2023-06-15 02:12:34,716 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2400.0, ans=0.27599999999999997 2023-06-15 02:12:38,611 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=16.08 vs. limit=8.4 2023-06-15 02:12:49,895 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=27.82 vs. limit=9.35 2023-06-15 02:12:55,103 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2466.6666666666665, ans=0.8136666666666666 2023-06-15 02:13:01,454 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=8.425 2023-06-15 02:13:23,951 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=8.475 2023-06-15 02:13:25,761 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=9.45 2023-06-15 02:13:27,174 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2600.0, ans=0.27399999999999997 2023-06-15 02:13:27,176 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2600.0, ans=0.378125 2023-06-15 02:13:29,598 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=19.21 vs. limit=8.475 2023-06-15 02:13:35,646 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.62 vs. limit=8.475 2023-06-15 02:13:41,722 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.72 vs. 
limit=9.45 2023-06-15 02:13:45,613 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=2666.6666666666665, ans=0.24 2023-06-15 02:13:46,995 INFO [train.py:988] (0/4) Epoch 1, batch 400, loss[loss=1.021, simple_loss=0.8513, pruned_loss=0.9097, over 15466.00 frames. ], tot_loss[loss=1.095, simple_loss=0.9382, pruned_loss=1.023, over 3308810.64 frames. ], batch size: 44, lr: 3.60e-02, grad_scale: 4.0 2023-06-15 02:13:50,137 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=29.64 vs. limit=8.5 2023-06-15 02:13:52,979 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.424e+01 1.333e+02 1.553e+02 2.129e+02 3.991e+02, threshold=3.107e+02, percent-clipped=15.0 2023-06-15 02:13:57,910 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.29 vs. limit=8.5 2023-06-15 02:14:13,816 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2733.3333333333335, ans=0.15833333333333333 2023-06-15 02:14:20,802 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.39 vs. limit=9.55 2023-06-15 02:14:39,124 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.11 vs. limit=6.4 2023-06-15 02:14:45,562 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=14.67 vs. limit=5.7 2023-06-15 02:14:49,728 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=9.65 2023-06-15 02:14:57,178 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2866.6666666666665, ans=0.365625 2023-06-15 02:15:09,390 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.069e+00 2023-06-15 02:15:13,668 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=9.7 2023-06-15 02:15:15,434 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=5.173333333333334 2023-06-15 02:15:33,120 INFO [train.py:988] (0/4) Epoch 1, batch 450, loss[loss=0.9921, simple_loss=0.8314, pruned_loss=0.8399, over 16275.00 frames. ], tot_loss[loss=1.048, simple_loss=0.8935, pruned_loss=0.9635, over 3417077.44 frames. ], batch size: 52, lr: 3.80e-02, grad_scale: 4.0 2023-06-15 02:15:33,565 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3000.0, ans=0.125 2023-06-15 02:15:36,523 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=20.15 vs. limit=8.625 2023-06-15 02:15:38,551 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.76 vs. 
limit=9.75 2023-06-15 02:15:48,165 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=47.36 vs. limit=8.625 2023-06-15 02:16:02,547 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.40 vs. limit=9.8 2023-06-15 02:16:02,707 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=8.65 2023-06-15 02:16:06,641 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=8.65 2023-06-15 02:16:28,964 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.97 vs. limit=6.566666666666666 2023-06-15 02:16:41,333 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=25.28 vs. limit=8.7 2023-06-15 02:16:53,186 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.34 vs. limit=9.95 2023-06-15 02:16:54,554 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=8.725 2023-06-15 02:17:00,713 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.22 vs. limit=8.725 2023-06-15 02:17:01,943 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3266.6666666666665, ans=0.346875 2023-06-15 02:17:12,969 INFO [train.py:988] (0/4) Epoch 1, batch 500, loss[loss=0.8328, simple_loss=0.707, pruned_loss=0.6587, over 19326.00 frames. ], tot_loss[loss=1.007, simple_loss=0.8565, pruned_loss=0.9045, over 3501897.30 frames. ], batch size: 98, lr: 4.00e-02, grad_scale: 8.0 2023-06-15 02:17:17,314 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3333.3333333333335, ans=0.09899494936611666 2023-06-15 02:17:18,742 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.868e+01 1.531e+02 1.902e+02 2.695e+02 6.993e+02, threshold=3.804e+02, percent-clipped=16.0 2023-06-15 02:17:24,088 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.79 vs. limit=6.666666666666667 2023-06-15 02:17:27,059 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3333.3333333333335, ans=0.26666666666666666 2023-06-15 02:17:34,753 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3400.0, ans=7.125 2023-06-15 02:17:34,874 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3400.0, ans=0.07250000000000001 2023-06-15 02:17:39,347 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.09 vs. 
limit=8.775 2023-06-15 02:17:46,406 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3400.0, ans=0.251 2023-06-15 02:17:52,290 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.52 vs. limit=8.8 2023-06-15 02:17:55,926 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3466.6666666666665, ans=0.2653333333333333 2023-06-15 02:17:58,464 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.46 vs. limit=8.8 2023-06-15 02:18:08,743 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3533.3333333333335, ans=0.334375 2023-06-15 02:18:15,226 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-1.pt 2023-06-15 02:18:43,123 INFO [train.py:988] (0/4) Epoch 2, batch 0, loss[loss=0.7612, simple_loss=0.65, pruned_loss=0.5817, over 20520.00 frames. ], tot_loss[loss=0.7612, simple_loss=0.65, pruned_loss=0.5817, over 20520.00 frames. ], batch size: 189, lr: 3.96e-02, grad_scale: 16.0 2023-06-15 02:18:43,124 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 02:18:49,242 INFO [train.py:1020] (0/4) Epoch 2, validation: loss=0.7911, simple_loss=0.6884, pruned_loss=0.5718, over 143649.00 frames. 2023-06-15 02:18:49,243 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 02:18:57,153 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.21 vs. limit=5.888333333333334 2023-06-15 02:19:17,083 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.23 vs. limit=10.215 2023-06-15 02:19:22,619 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=9.96 vs. limit=10.215 2023-06-15 02:19:29,804 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3686.6666666666665, ans=0.32718749999999996 2023-06-15 02:19:31,632 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3686.6666666666665, ans=0.32718749999999996 2023-06-15 02:19:32,783 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.60 vs. limit=5.921666666666667 2023-06-15 02:19:38,489 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=20.12 vs. limit=8.8825 2023-06-15 02:19:49,989 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3753.3333333333335, ans=0.21246666666666666 2023-06-15 02:19:55,746 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3753.3333333333335, ans=0.32406250000000003 2023-06-15 02:20:04,727 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.07 vs. 
limit=5.501333333333333 2023-06-15 02:20:12,345 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.16 vs. limit=10.365 2023-06-15 02:20:15,218 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3820.0, ans=0.3209375 2023-06-15 02:20:25,557 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=3.573 2023-06-15 02:20:28,463 INFO [train.py:988] (0/4) Epoch 2, batch 50, loss[loss=0.7525, simple_loss=0.6494, pruned_loss=0.5455, over 19971.00 frames. ], tot_loss[loss=0.7731, simple_loss=0.6637, pruned_loss=0.5749, over 855626.02 frames. ], batch size: 126, lr: 3.95e-02, grad_scale: 16.0 2023-06-15 02:20:32,777 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3886.6666666666665, ans=0.7639666666666667 2023-06-15 02:20:54,572 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3953.3333333333335, ans=0.01104999999999999 2023-06-15 02:20:58,155 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3953.3333333333335, ans=0.07529166666666667 2023-06-15 02:21:07,472 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.527e+02 2.379e+02 3.485e+02 5.608e+02 1.271e+03, threshold=6.971e+02, percent-clipped=46.0 2023-06-15 02:21:31,459 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=9.0325 2023-06-15 02:21:58,449 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4153.333333333333, ans=0.04936111111111111 2023-06-15 02:22:00,557 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4153.333333333333, ans=0.3053125 2023-06-15 02:22:06,303 INFO [train.py:988] (0/4) Epoch 2, batch 100, loss[loss=0.7012, simple_loss=0.6131, pruned_loss=0.4807, over 19960.00 frames. ], tot_loss[loss=0.7457, simple_loss=0.644, pruned_loss=0.5395, over 1517620.64 frames. ], batch size: 126, lr: 3.95e-02, grad_scale: 8.0 2023-06-15 02:22:16,192 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=4220.0, ans=9.0825 2023-06-15 02:22:26,803 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4286.666666666667, ans=0.04880555555555556 2023-06-15 02:23:37,621 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4486.666666666667, ans=0.7429666666666667 2023-06-15 02:23:45,679 INFO [train.py:988] (0/4) Epoch 2, batch 150, loss[loss=0.641, simple_loss=0.567, pruned_loss=0.4192, over 20267.00 frames. ], tot_loss[loss=0.7206, simple_loss=0.6265, pruned_loss=0.5067, over 2030889.08 frames. 
], batch size: 141, lr: 3.95e-02, grad_scale: 8.0 2023-06-15 02:23:47,960 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4553.333333333333, ans=0.0 2023-06-15 02:24:03,968 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.52 vs. limit=5.848 2023-06-15 02:24:04,213 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.94 vs. limit=9.2325 2023-06-15 02:24:14,064 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=9.2325 2023-06-15 02:24:25,597 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.805e+02 4.216e+02 6.424e+02 2.276e+03, threshold=8.432e+02, percent-clipped=19.0 2023-06-15 02:24:29,595 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4686.666666666667, ans=0.04713888888888889 2023-06-15 02:25:02,650 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=4820.0, ans=0.2723 2023-06-15 02:25:20,896 INFO [train.py:988] (0/4) Epoch 2, batch 200, loss[loss=0.65, simple_loss=0.5814, pruned_loss=0.4074, over 18634.00 frames. ], tot_loss[loss=0.6983, simple_loss=0.6106, pruned_loss=0.4787, over 2414865.48 frames. ], batch size: 80, lr: 3.95e-02, grad_scale: 8.0 2023-06-15 02:26:22,628 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5086.666666666667, ans=0.26156250000000003 2023-06-15 02:26:34,574 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=5086.666666666667, ans=0.26156250000000003 2023-06-15 02:26:51,742 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=11.365 2023-06-15 02:26:56,703 INFO [train.py:988] (0/4) Epoch 2, batch 250, loss[loss=0.6102, simple_loss=0.5448, pruned_loss=0.3803, over 20472.00 frames. ], tot_loss[loss=0.6759, simple_loss=0.5943, pruned_loss=0.4524, over 2705823.90 frames. 
], batch size: 160, lr: 3.95e-02, grad_scale: 8.0 2023-06-15 02:27:01,030 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.367e-01 2023-06-15 02:27:18,927 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5286.666666666667, ans=0.24713333333333332 2023-06-15 02:27:37,760 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.603e+02 2.822e+02 4.791e+02 8.752e+02 2.397e+03, threshold=9.582e+02, percent-clipped=28.0 2023-06-15 02:27:53,915 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.404e+00 2023-06-15 02:27:57,382 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=5420.0, ans=0.044083333333333335 2023-06-15 02:28:29,004 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5486.666666666667, ans=0.043805555555555556 2023-06-15 02:28:32,168 INFO [train.py:988] (0/4) Epoch 2, batch 300, loss[loss=0.6493, simple_loss=0.5859, pruned_loss=0.3903, over 16415.00 frames. ], tot_loss[loss=0.6551, simple_loss=0.5791, pruned_loss=0.4288, over 2942176.36 frames. ], batch size: 52, lr: 3.95e-02, grad_scale: 8.0 2023-06-15 02:28:50,629 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5620.0, ans=0.7033 2023-06-15 02:29:29,320 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=5753.333333333333, ans=0.03202083333333333 2023-06-15 02:30:06,972 INFO [train.py:988] (0/4) Epoch 2, batch 350, loss[loss=0.5489, simple_loss=0.5048, pruned_loss=0.3127, over 19443.00 frames. ], tot_loss[loss=0.634, simple_loss=0.564, pruned_loss=0.4052, over 3135593.58 frames. ], batch size: 105, lr: 3.95e-02, grad_scale: 8.0 2023-06-15 02:30:15,150 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=5886.666666666667, ans=0.1 2023-06-15 02:30:32,344 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5953.333333333333, ans=0.2209375 2023-06-15 02:30:46,020 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.693e+02 3.241e+02 6.004e+02 9.437e+02 1.791e+03, threshold=1.201e+03, percent-clipped=24.0 2023-06-15 02:31:38,541 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=6220.0, ans=0.2084375 2023-06-15 02:31:38,940 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=12.165 2023-06-15 02:31:40,055 INFO [train.py:988] (0/4) Epoch 2, batch 400, loss[loss=0.5802, simple_loss=0.5285, pruned_loss=0.3365, over 19547.00 frames. ], tot_loss[loss=0.6158, simple_loss=0.5509, pruned_loss=0.3853, over 3280662.60 frames. 
], batch size: 102, lr: 3.95e-02, grad_scale: 16.0 2023-06-15 02:32:07,402 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=6286.666666666667, ans=0.2053125 2023-06-15 02:32:13,059 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=6286.666666666667, ans=0.1 2023-06-15 02:32:29,956 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=6353.333333333333, ans=0.20218750000000002 2023-06-15 02:32:35,664 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.83 vs. limit=9.9075 2023-06-15 02:32:45,007 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=6420.0, ans=0.03991666666666667 2023-06-15 02:32:58,778 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.61 vs. limit=9.932500000000001 2023-06-15 02:33:13,455 INFO [train.py:988] (0/4) Epoch 2, batch 450, loss[loss=0.5392, simple_loss=0.5029, pruned_loss=0.2948, over 18278.00 frames. ], tot_loss[loss=0.5982, simple_loss=0.5378, pruned_loss=0.3672, over 3411185.62 frames. ], batch size: 74, lr: 3.94e-02, grad_scale: 8.0 2023-06-15 02:33:19,927 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=6553.333333333333, ans=0.23446666666666666 2023-06-15 02:33:23,270 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=6553.333333333333, ans=0.05904166666666667 2023-06-15 02:33:38,573 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=6620.0, ans=0.6683 2023-06-15 02:33:48,860 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=9.9825 2023-06-15 02:33:53,434 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=6686.666666666667, ans=0.18656250000000002 2023-06-15 02:33:54,731 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+02 3.705e+02 5.386e+02 8.050e+02 1.837e+03, threshold=1.077e+03, percent-clipped=8.0 2023-06-15 02:34:07,784 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=12.565000000000001 2023-06-15 02:34:13,906 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=6753.333333333333, ans=0.6636333333333334 2023-06-15 02:34:19,533 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.57 vs. limit=12.565000000000001 2023-06-15 02:34:41,930 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=6886.666666666667, ans=0.009372463768115942 2023-06-15 02:34:43,328 INFO [train.py:988] (0/4) Epoch 2, batch 500, loss[loss=0.5558, simple_loss=0.5112, pruned_loss=0.3127, over 19463.00 frames. ], tot_loss[loss=0.5828, simple_loss=0.5272, pruned_loss=0.3505, over 3504229.94 frames. 
], batch size: 105, lr: 3.94e-02, grad_scale: 8.0 2023-06-15 02:35:05,034 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=6953.333333333333, ans=0.6566333333333334 2023-06-15 02:35:38,479 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-2.pt 2023-06-15 02:35:59,374 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=7100.0, ans=0.1671875 2023-06-15 02:36:00,641 INFO [train.py:988] (0/4) Epoch 3, batch 0, loss[loss=0.5241, simple_loss=0.4958, pruned_loss=0.2767, over 17129.00 frames. ], tot_loss[loss=0.5241, simple_loss=0.4958, pruned_loss=0.2767, over 17129.00 frames. ], batch size: 60, lr: 3.84e-02, grad_scale: 16.0 2023-06-15 02:36:00,642 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 02:36:06,800 INFO [train.py:1020] (0/4) Epoch 3, validation: loss=0.4219, simple_loss=0.4383, pruned_loss=0.1731, over 143649.00 frames. 2023-06-15 02:36:06,801 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 02:36:10,618 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=7100.0, ans=0.1671875 2023-06-15 02:36:21,382 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.84 vs. limit=12.825 2023-06-15 02:36:35,624 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=7166.666666666667, ans=0.1640625 2023-06-15 02:36:39,346 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.63 vs. limit=4.075 2023-06-15 02:36:46,013 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=7233.333333333333, ans=0.0 2023-06-15 02:36:54,883 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=7233.333333333333, ans=0.22766666666666668 2023-06-15 02:37:05,107 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=7300.0, ans=0.15781250000000002 2023-06-15 02:37:12,039 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=7300.0, ans=0.6445000000000001 2023-06-15 02:37:17,508 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+02 3.334e+02 5.511e+02 7.843e+02 1.620e+03, threshold=1.102e+03, percent-clipped=11.0 2023-06-15 02:37:19,224 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.84 vs. limit=10.2625 2023-06-15 02:37:20,006 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=7366.666666666667, ans=0.15468749999999998 2023-06-15 02:37:33,184 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=7366.666666666667, ans=0.22633333333333333 2023-06-15 02:37:36,279 INFO [train.py:988] (0/4) Epoch 3, batch 50, loss[loss=0.5236, simple_loss=0.4947, pruned_loss=0.2771, over 18613.00 frames. 
], tot_loss[loss=0.5187, simple_loss=0.484, pruned_loss=0.2821, over 868266.97 frames. ], batch size: 80, lr: 3.83e-02, grad_scale: 8.0 2023-06-15 02:37:45,382 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=7433.333333333333, ans=0.6398333333333334 2023-06-15 02:38:06,986 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=7500.0, ans=0.1484375 2023-06-15 02:38:07,156 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=7500.0, ans=0.07 2023-06-15 02:38:46,188 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=7700.0, ans=9.8125 2023-06-15 02:39:06,059 INFO [train.py:988] (0/4) Epoch 3, batch 100, loss[loss=0.4933, simple_loss=0.4689, pruned_loss=0.2578, over 19771.00 frames. ], tot_loss[loss=0.5122, simple_loss=0.48, pruned_loss=0.2759, over 1537668.18 frames. ], batch size: 115, lr: 3.83e-02, grad_scale: 8.0 2023-06-15 02:39:34,219 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=7833.333333333333, ans=0.8283333333333334 2023-06-15 02:40:19,694 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+02 2.598e+02 5.024e+02 8.933e+02 2.174e+03, threshold=1.005e+03, percent-clipped=11.0 2023-06-15 02:40:35,853 INFO [train.py:988] (0/4) Epoch 3, batch 150, loss[loss=0.5438, simple_loss=0.5139, pruned_loss=0.2875, over 16973.00 frames. ], tot_loss[loss=0.5086, simple_loss=0.4785, pruned_loss=0.2717, over 2028131.96 frames. ], batch size: 60, lr: 3.83e-02, grad_scale: 8.0 2023-06-15 02:40:49,869 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=8100.0, ans=0.0 2023-06-15 02:41:09,081 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=8233.333333333334, ans=0.6118333333333335 2023-06-15 02:41:29,433 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=7.32 2023-06-15 02:41:36,142 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=8300.0, ans=0.03208333333333334 2023-06-15 02:41:41,019 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=8300.0, ans=0.03208333333333334 2023-06-15 02:41:57,197 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=8366.666666666666, ans=0.21633333333333332 2023-06-15 02:42:04,801 INFO [train.py:988] (0/4) Epoch 3, batch 200, loss[loss=0.4871, simple_loss=0.4549, pruned_loss=0.2631, over 20657.00 frames. ], tot_loss[loss=0.5024, simple_loss=0.4746, pruned_loss=0.2663, over 2418336.23 frames. ], batch size: 211, lr: 3.83e-02, grad_scale: 8.0 2023-06-15 02:42:26,816 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.23 vs. 
limit=7.125 2023-06-15 02:42:31,178 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=8500.0, ans=0.03125 2023-06-15 02:43:02,135 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=8633.333333333334, ans=0.008992753623188406 2023-06-15 02:43:05,327 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=8633.333333333334, ans=0.008992753623188406 2023-06-15 02:43:16,970 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+02 3.306e+02 5.233e+02 8.261e+02 1.948e+03, threshold=1.047e+03, percent-clipped=15.0 2023-06-15 02:43:17,307 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=8700.0, ans=0.5955 2023-06-15 02:43:17,512 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=8700.0, ans=0.008978260869565217 2023-06-15 02:43:33,618 INFO [train.py:988] (0/4) Epoch 3, batch 250, loss[loss=0.4856, simple_loss=0.4654, pruned_loss=0.2504, over 18280.00 frames. ], tot_loss[loss=0.4969, simple_loss=0.471, pruned_loss=0.2616, over 2700904.52 frames. ], batch size: 74, lr: 3.83e-02, grad_scale: 8.0 2023-06-15 02:43:58,363 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=8833.333333333334, ans=0.21166666666666667 2023-06-15 02:44:19,965 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=8900.0, ans=0.11767500000000002 2023-06-15 02:44:49,978 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=9033.333333333334, ans=0.125 2023-06-15 02:44:55,411 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9033.333333333334, ans=0.20966666666666667 2023-06-15 02:44:57,162 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=9033.333333333334, ans=0.035 2023-06-15 02:45:01,260 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=9100.0, ans=0.20900000000000002 2023-06-15 02:45:03,144 INFO [train.py:988] (0/4) Epoch 3, batch 300, loss[loss=0.4669, simple_loss=0.4526, pruned_loss=0.2363, over 20126.00 frames. ], tot_loss[loss=0.489, simple_loss=0.4655, pruned_loss=0.2555, over 2943931.61 frames. 
], batch size: 133, lr: 3.82e-02, grad_scale: 8.0 2023-06-15 02:45:03,661 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=9100.0, ans=0.125 2023-06-15 02:45:13,864 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=9100.0, ans=0.125 2023-06-15 02:45:29,760 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=9166.666666666666, ans=0.125 2023-06-15 02:45:57,244 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=9300.0, ans=0.125 2023-06-15 02:46:00,414 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=9300.0, ans=0.008847826086956521 2023-06-15 02:46:13,385 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=9366.666666666666, ans=0.008833333333333334 2023-06-15 02:46:16,345 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+02 2.844e+02 4.784e+02 7.568e+02 1.827e+03, threshold=9.568e+02, percent-clipped=19.0 2023-06-15 02:46:21,809 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=9366.666666666666, ans=0.3405 2023-06-15 02:46:32,393 INFO [train.py:988] (0/4) Epoch 3, batch 350, loss[loss=0.4706, simple_loss=0.4569, pruned_loss=0.238, over 18933.00 frames. ], tot_loss[loss=0.485, simple_loss=0.4635, pruned_loss=0.2517, over 3126171.56 frames. ], batch size: 86, lr: 3.82e-02, grad_scale: 8.0 2023-06-15 02:47:07,273 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=9566.666666666666, ans=0.04949747468305833 2023-06-15 02:47:43,544 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=9700.0, ans=0.008760869565217391 2023-06-15 02:48:02,100 INFO [train.py:988] (0/4) Epoch 3, batch 400, loss[loss=0.4466, simple_loss=0.4404, pruned_loss=0.2207, over 18607.00 frames. ], tot_loss[loss=0.4788, simple_loss=0.4597, pruned_loss=0.2467, over 3269632.62 frames. ], batch size: 80, lr: 3.82e-02, grad_scale: 16.0 2023-06-15 02:48:28,662 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.75 vs. limit=11.1875 2023-06-15 02:48:35,548 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.87 vs. limit=14.875 2023-06-15 02:49:03,926 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=9966.666666666666, ans=0.125 2023-06-15 02:49:11,015 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=9966.666666666666, ans=0.20033333333333334 2023-06-15 02:49:15,711 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+02 2.793e+02 4.517e+02 6.574e+02 1.219e+03, threshold=9.033e+02, percent-clipped=7.0 2023-06-15 02:49:19,052 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.68 vs. 
limit=11.2625 2023-06-15 02:49:23,379 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=10033.333333333334, ans=0.05 2023-06-15 02:49:29,583 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.84 vs. limit=7.508333333333334 2023-06-15 02:49:31,986 INFO [train.py:988] (0/4) Epoch 3, batch 450, loss[loss=0.5032, simple_loss=0.4912, pruned_loss=0.2533, over 15135.00 frames. ], tot_loss[loss=0.4732, simple_loss=0.4568, pruned_loss=0.2419, over 3372567.29 frames. ], batch size: 43, lr: 3.82e-02, grad_scale: 16.0 2023-06-15 02:50:00,181 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.40 vs. limit=10.083333333333332 2023-06-15 02:50:14,109 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.81 vs. limit=7.558333333333334 2023-06-15 02:50:18,631 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=10233.333333333334, ans=0.024027777777777773 2023-06-15 02:50:51,382 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.91 vs. limit=15.275 2023-06-15 02:50:59,006 INFO [train.py:988] (0/4) Epoch 3, batch 500, loss[loss=0.4118, simple_loss=0.4159, pruned_loss=0.1973, over 19477.00 frames. ], tot_loss[loss=0.4663, simple_loss=0.4519, pruned_loss=0.2372, over 3472537.81 frames. ], batch size: 105, lr: 3.81e-02, grad_scale: 16.0 2023-06-15 02:51:21,498 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=8.2 2023-06-15 02:51:50,905 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-3.pt 2023-06-15 02:52:16,085 INFO [train.py:988] (0/4) Epoch 4, batch 0, loss[loss=0.4392, simple_loss=0.4342, pruned_loss=0.2179, over 20326.00 frames. ], tot_loss[loss=0.4392, simple_loss=0.4342, pruned_loss=0.2179, over 20326.00 frames. ], batch size: 149, lr: 3.66e-02, grad_scale: 32.0 2023-06-15 02:52:16,086 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 02:52:22,222 INFO [train.py:1020] (0/4) Epoch 4, validation: loss=0.3338, simple_loss=0.3946, pruned_loss=0.1182, over 143649.00 frames. 2023-06-15 02:52:22,222 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 02:52:38,303 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+02 2.821e+02 4.565e+02 6.318e+02 1.774e+03, threshold=9.130e+02, percent-clipped=10.0 2023-06-15 02:52:48,220 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 02:53:10,950 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=10780.0, ans=0.021750000000000002 2023-06-15 02:53:50,950 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=10980.0, ans=0.125 2023-06-15 02:53:52,045 INFO [train.py:988] (0/4) Epoch 4, batch 50, loss[loss=0.4207, simple_loss=0.4298, pruned_loss=0.1996, over 19535.00 frames. ], tot_loss[loss=0.4365, simple_loss=0.4343, pruned_loss=0.2149, over 861769.03 frames. 
], batch size: 102, lr: 3.66e-02, grad_scale: 16.0 2023-06-15 02:54:18,036 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=11046.666666666666, ans=0.008468115942028986 2023-06-15 02:54:23,891 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=11046.666666666666, ans=0.18953333333333333 2023-06-15 02:54:48,690 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=11180.0, ans=0.125 2023-06-15 02:55:23,238 INFO [train.py:988] (0/4) Epoch 4, batch 100, loss[loss=0.4048, simple_loss=0.4147, pruned_loss=0.1922, over 19094.00 frames. ], tot_loss[loss=0.4348, simple_loss=0.4341, pruned_loss=0.2134, over 1511880.27 frames. ], batch size: 94, lr: 3.66e-02, grad_scale: 16.0 2023-06-15 02:55:42,254 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+02 3.113e+02 4.609e+02 7.608e+02 1.612e+03, threshold=9.219e+02, percent-clipped=13.0 2023-06-15 02:55:44,442 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=11380.0, ans=0.025 2023-06-15 02:55:57,170 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=11380.0, ans=0.125 2023-06-15 02:56:19,985 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=11513.333333333334, ans=10.0 2023-06-15 02:56:25,252 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.94 vs. limit=8.605333333333334 2023-06-15 02:56:52,237 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=11646.666666666666, ans=0.125 2023-06-15 02:56:53,607 INFO [train.py:988] (0/4) Epoch 4, batch 150, loss[loss=0.4707, simple_loss=0.4779, pruned_loss=0.2275, over 17629.00 frames. ], tot_loss[loss=0.4303, simple_loss=0.4324, pruned_loss=0.2098, over 2016230.84 frames. 
], batch size: 67, lr: 3.66e-02, grad_scale: 16.0 2023-06-15 02:57:06,145 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=11646.666666666666, ans=0.3747 2023-06-15 02:57:21,372 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=11713.333333333334, ans=0.4900333333333333 2023-06-15 02:57:22,986 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=11713.333333333334, ans=0.017861111111111105 2023-06-15 02:57:23,190 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=11713.333333333334, ans=0.125 2023-06-15 02:57:33,998 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11780.0, ans=0.18219999999999997 2023-06-15 02:57:45,765 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=11846.666666666666, ans=0.3777 2023-06-15 02:58:12,836 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=11913.333333333334, ans=0.4830333333333333 2023-06-15 02:58:23,377 INFO [train.py:988] (0/4) Epoch 4, batch 200, loss[loss=0.4018, simple_loss=0.4171, pruned_loss=0.1894, over 19071.00 frames. ], tot_loss[loss=0.426, simple_loss=0.4312, pruned_loss=0.2063, over 2392462.95 frames. ], batch size: 89, lr: 3.65e-02, grad_scale: 16.0 2023-06-15 02:58:40,779 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+02 2.791e+02 4.180e+02 6.641e+02 1.358e+03, threshold=8.360e+02, percent-clipped=6.0 2023-06-15 02:58:41,148 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=12046.666666666666, ans=0.125 2023-06-15 02:58:41,340 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=12046.666666666666, ans=0.008250724637681159 2023-06-15 02:58:54,463 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=12046.666666666666, ans=0.125 2023-06-15 02:59:06,193 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=12113.333333333334, ans=0.016194444444444442 2023-06-15 02:59:09,609 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=12113.333333333334, ans=0.05 2023-06-15 02:59:17,982 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=12180.0, ans=0.125 2023-06-15 02:59:21,672 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=12180.0, ans=0.8718 2023-06-15 02:59:42,871 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=12246.666666666666, ans=0.125 2023-06-15 02:59:55,810 INFO [train.py:988] (0/4) Epoch 4, batch 250, loss[loss=0.4143, simple_loss=0.4354, pruned_loss=0.1933, over 18927.00 frames. ], tot_loss[loss=0.422, simple_loss=0.4279, pruned_loss=0.2043, over 2718208.70 frames. 
], batch size: 86, lr: 3.65e-02, grad_scale: 16.0 2023-06-15 03:01:26,955 INFO [train.py:988] (0/4) Epoch 4, batch 300, loss[loss=0.4176, simple_loss=0.4404, pruned_loss=0.1951, over 17619.00 frames. ], tot_loss[loss=0.4188, simple_loss=0.4262, pruned_loss=0.2024, over 2965344.27 frames. ], batch size: 67, lr: 3.65e-02, grad_scale: 16.0 2023-06-15 03:01:37,765 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=12646.666666666666, ans=0.125 2023-06-15 03:01:41,384 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=12646.666666666666, ans=0.008120289855072464 2023-06-15 03:01:44,401 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.980e+02 4.845e+02 6.504e+02 1.050e+03, threshold=9.691e+02, percent-clipped=10.0 2023-06-15 03:02:56,505 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=12.342500000000001 2023-06-15 03:02:59,260 INFO [train.py:988] (0/4) Epoch 4, batch 350, loss[loss=0.3853, simple_loss=0.4083, pruned_loss=0.1801, over 19086.00 frames. ], tot_loss[loss=0.4141, simple_loss=0.4234, pruned_loss=0.1997, over 3154075.97 frames. ], batch size: 94, lr: 3.64e-02, grad_scale: 16.0 2023-06-15 03:03:08,125 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=12980.0, ans=0.125 2023-06-15 03:03:26,037 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=13046.666666666666, ans=0.07 2023-06-15 03:03:42,456 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=13113.333333333334, ans=12.4175 2023-06-15 03:03:43,579 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=13113.333333333334, ans=0.125 2023-06-15 03:04:29,342 INFO [train.py:988] (0/4) Epoch 4, batch 400, loss[loss=0.3921, simple_loss=0.4238, pruned_loss=0.1801, over 18335.00 frames. ], tot_loss[loss=0.4114, simple_loss=0.4232, pruned_loss=0.1977, over 3298369.52 frames. ], batch size: 72, lr: 3.64e-02, grad_scale: 32.0 2023-06-15 03:04:46,861 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+02 3.163e+02 4.789e+02 6.291e+02 1.274e+03, threshold=9.578e+02, percent-clipped=4.0 2023-06-15 03:05:20,554 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=13446.666666666666, ans=0.4017 2023-06-15 03:05:59,015 INFO [train.py:988] (0/4) Epoch 4, batch 450, loss[loss=0.3829, simple_loss=0.4191, pruned_loss=0.1734, over 15415.00 frames. ], tot_loss[loss=0.4087, simple_loss=0.4219, pruned_loss=0.1962, over 3393119.29 frames. 
], batch size: 44, lr: 3.64e-02, grad_scale: 16.0 2023-06-15 03:06:04,332 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=13646.666666666666, ans=0.007902898550724638 2023-06-15 03:06:42,117 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=13780.0, ans=0.125 2023-06-15 03:06:45,532 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=13780.0, ans=0.125 2023-06-15 03:06:48,718 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=13846.666666666666, ans=0.125 2023-06-15 03:06:52,047 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=13846.666666666666, ans=0.007859420289855073 2023-06-15 03:07:16,472 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=13913.333333333334, ans=0.4130333333333333 2023-06-15 03:07:24,277 INFO [train.py:988] (0/4) Epoch 4, batch 500, loss[loss=0.4152, simple_loss=0.4428, pruned_loss=0.1938, over 16760.00 frames. ], tot_loss[loss=0.4048, simple_loss=0.4201, pruned_loss=0.1935, over 3494708.52 frames. ], batch size: 59, lr: 3.63e-02, grad_scale: 16.0 2023-06-15 03:07:25,260 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=13980.0, ans=12.7425 2023-06-15 03:07:33,126 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=13980.0, ans=0.00841666666666667 2023-06-15 03:07:38,328 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=13980.0, ans=0.1602 2023-06-15 03:07:42,943 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+02 2.873e+02 4.186e+02 6.544e+02 1.200e+03, threshold=8.372e+02, percent-clipped=10.0 2023-06-15 03:08:07,666 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.90 vs. limit=12.7925 2023-06-15 03:08:10,701 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.01 vs. limit=9.645333333333333 2023-06-15 03:08:17,624 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-4.pt 2023-06-15 03:08:43,613 INFO [train.py:988] (0/4) Epoch 5, batch 0, loss[loss=0.3887, simple_loss=0.4107, pruned_loss=0.1833, over 20333.00 frames. ], tot_loss[loss=0.3887, simple_loss=0.4107, pruned_loss=0.1833, over 20333.00 frames. ], batch size: 149, lr: 3.47e-02, grad_scale: 32.0 2023-06-15 03:08:43,614 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 03:08:49,783 INFO [train.py:1020] (0/4) Epoch 5, validation: loss=0.2868, simple_loss=0.3756, pruned_loss=0.09898, over 143649.00 frames. 
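Note on the recurring "Computing validation loss" entries: at the start of each epoch the trainer runs one pass over the fixed dev set (143649 frames in every such entry in this excerpt) and reports a frame-weighted average loss. The snippet below is a minimal sketch of such a pass; the model interface it assumes (a callable returning a loss and a frame count) is an illustration, not the actual train.py code.

    import torch

    def run_validation(model, dev_loader):
        # Minimal sketch: assumes model(batch) returns
        # (frame-averaged loss tensor, number of frames in the batch);
        # that interface is an assumption made for illustration only.
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in dev_loader:
                loss, num_frames = model(batch)
                tot_loss += float(loss) * num_frames
                tot_frames += num_frames
        model.train()
        return tot_loss / max(tot_frames, 1.0)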
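The "Maximum memory allocated so far is ...MB" entries report the peak CUDA allocation on the device. PyTorch exposes this value directly; a small helper along these lines (the exact formatting here is an assumption) reproduces the figure.

    import torch

    def log_peak_memory(device: int = 0) -> None:
        # torch.cuda.max_memory_allocated returns the peak number of bytes
        # allocated on the device since startup (or since the last call to
        # torch.cuda.reset_peak_memory_stats).
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {peak_mb}MB")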
2023-06-15 03:08:49,784 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 03:09:27,188 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=14326.666666666666, ans=0.00697222222222222 2023-06-15 03:09:42,652 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=14393.333333333334, ans=0.025 2023-06-15 03:09:42,891 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=14393.333333333334, ans=0.125 2023-06-15 03:10:18,923 INFO [train.py:988] (0/4) Epoch 5, batch 50, loss[loss=0.4062, simple_loss=0.4326, pruned_loss=0.19, over 18471.00 frames. ], tot_loss[loss=0.3881, simple_loss=0.4104, pruned_loss=0.1829, over 867953.74 frames. ], batch size: 77, lr: 3.46e-02, grad_scale: 32.0 2023-06-15 03:10:41,094 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.20 vs. limit=9.837333333333333 2023-06-15 03:11:10,140 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.824e+02 3.862e+02 4.906e+02 1.527e+03, threshold=7.724e+02, percent-clipped=12.0 2023-06-15 03:11:15,776 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=14726.666666666666, ans=0.05 2023-06-15 03:11:27,037 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.89 vs. limit=12.363333333333333 2023-06-15 03:11:48,658 INFO [train.py:988] (0/4) Epoch 5, batch 100, loss[loss=0.3871, simple_loss=0.4171, pruned_loss=0.1786, over 19240.00 frames. ], tot_loss[loss=0.3844, simple_loss=0.4093, pruned_loss=0.1798, over 1532735.27 frames. ], batch size: 92, lr: 3.46e-02, grad_scale: 32.0 2023-06-15 03:11:55,149 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.19 vs. limit=13.0725 2023-06-15 03:12:09,142 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.94 vs. limit=13.0975 2023-06-15 03:12:23,987 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=14993.333333333334, ans=0.4249 2023-06-15 03:12:48,612 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 03:12:56,134 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=15060.0, ans=0.02 2023-06-15 03:13:17,973 INFO [train.py:988] (0/4) Epoch 5, batch 150, loss[loss=0.3912, simple_loss=0.4217, pruned_loss=0.1803, over 16375.00 frames. ], tot_loss[loss=0.385, simple_loss=0.4104, pruned_loss=0.1798, over 2025519.65 frames. 
], batch size: 52, lr: 3.46e-02, grad_scale: 32.0 2023-06-15 03:13:47,817 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=15260.0, ans=0.1474 2023-06-15 03:14:10,259 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+02 2.930e+02 4.326e+02 6.625e+02 9.040e+02, threshold=8.653e+02, percent-clipped=9.0 2023-06-15 03:14:14,152 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=15393.333333333334, ans=0.14606666666666665 2023-06-15 03:14:21,016 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=15393.333333333334, ans=0.0025277777777777746 2023-06-15 03:14:28,337 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=15460.0, ans=0.125 2023-06-15 03:14:48,000 INFO [train.py:988] (0/4) Epoch 5, batch 200, loss[loss=0.445, simple_loss=0.4696, pruned_loss=0.2102, over 15433.00 frames. ], tot_loss[loss=0.3839, simple_loss=0.4104, pruned_loss=0.1786, over 2410808.06 frames. ], batch size: 44, lr: 3.45e-02, grad_scale: 32.0 2023-06-15 03:15:05,667 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=15593.333333333334, ans=0.14406666666666668 2023-06-15 03:15:05,827 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=15593.333333333334, ans=0.14406666666666668 2023-06-15 03:15:09,868 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=15593.333333333334, ans=0.1 2023-06-15 03:15:16,287 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.08 vs. limit=19.195 2023-06-15 03:15:19,058 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=15593.333333333334, ans=0.0016944444444444429 2023-06-15 03:15:26,735 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=15660.0, ans=0.1434 2023-06-15 03:15:49,833 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=15726.666666666666, ans=19.295 2023-06-15 03:16:18,836 INFO [train.py:988] (0/4) Epoch 5, batch 250, loss[loss=0.3683, simple_loss=0.4035, pruned_loss=0.1666, over 19208.00 frames. ], tot_loss[loss=0.3816, simple_loss=0.409, pruned_loss=0.1771, over 2714995.77 frames. ], batch size: 92, lr: 3.45e-02, grad_scale: 32.0 2023-06-15 03:17:11,388 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+02 2.885e+02 4.184e+02 6.160e+02 1.201e+03, threshold=8.369e+02, percent-clipped=9.0 2023-06-15 03:17:16,892 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=16060.0, ans=0.05 2023-06-15 03:17:35,461 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=16126.666666666666, ans=0.3355666666666667 2023-06-15 03:17:49,369 INFO [train.py:988] (0/4) Epoch 5, batch 300, loss[loss=0.3364, simple_loss=0.377, pruned_loss=0.1479, over 19105.00 frames. 
], tot_loss[loss=0.379, simple_loss=0.4076, pruned_loss=0.1752, over 2955598.98 frames. ], batch size: 94, lr: 3.45e-02, grad_scale: 32.0 2023-06-15 03:19:18,972 INFO [train.py:988] (0/4) Epoch 5, batch 350, loss[loss=0.3573, simple_loss=0.3895, pruned_loss=0.1626, over 20510.00 frames. ], tot_loss[loss=0.3772, simple_loss=0.4064, pruned_loss=0.174, over 3135916.08 frames. ], batch size: 160, lr: 3.44e-02, grad_scale: 32.0 2023-06-15 03:19:25,852 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=16526.666666666668, ans=0.13473333333333332 2023-06-15 03:19:44,756 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=16593.333333333332, ans=0.1340666666666667 2023-06-15 03:19:56,149 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=16660.0, ans=0.007247826086956522 2023-06-15 03:19:57,567 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=16660.0, ans=0.125 2023-06-15 03:20:09,902 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+02 2.834e+02 3.544e+02 5.201e+02 9.187e+02, threshold=7.089e+02, percent-clipped=1.0 2023-06-15 03:20:10,192 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=16726.666666666668, ans=0.007233333333333333 2023-06-15 03:20:23,187 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.18 vs. limit=20.045 2023-06-15 03:20:39,793 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.93 vs. limit=20.095 2023-06-15 03:20:47,493 INFO [train.py:988] (0/4) Epoch 5, batch 400, loss[loss=0.3576, simple_loss=0.393, pruned_loss=0.1611, over 19844.00 frames. ], tot_loss[loss=0.3745, simple_loss=0.4049, pruned_loss=0.1721, over 3286630.64 frames. ], batch size: 120, lr: 3.44e-02, grad_scale: 32.0 2023-06-15 03:21:25,237 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=16993.333333333332, ans=0.30523333333333347 2023-06-15 03:21:34,414 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=16993.333333333332, ans=0.125 2023-06-15 03:22:03,584 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.42 vs. limit=13.9225 2023-06-15 03:22:16,331 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=17193.333333333332, ans=0.29823333333333346 2023-06-15 03:22:17,704 INFO [train.py:988] (0/4) Epoch 5, batch 450, loss[loss=0.3659, simple_loss=0.391, pruned_loss=0.1704, over 20688.00 frames. ], tot_loss[loss=0.3726, simple_loss=0.404, pruned_loss=0.1706, over 3396321.25 frames. 
], batch size: 211, lr: 3.44e-02, grad_scale: 32.0 2023-06-15 03:22:25,176 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=17193.333333333332, ans=0.0 2023-06-15 03:22:30,754 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=17193.333333333332, ans=0.29823333333333346 2023-06-15 03:23:02,375 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=17326.666666666668, ans=0.125 2023-06-15 03:23:08,864 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+02 3.161e+02 3.951e+02 6.319e+02 1.120e+03, threshold=7.903e+02, percent-clipped=17.0 2023-06-15 03:23:23,371 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=17393.333333333332, ans=0.29123333333333346 2023-06-15 03:23:35,833 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=17460.0, ans=0.0 2023-06-15 03:23:42,495 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=17460.0, ans=0.1254 2023-06-15 03:23:45,489 INFO [train.py:988] (0/4) Epoch 5, batch 500, loss[loss=0.3441, simple_loss=0.3843, pruned_loss=0.152, over 18641.00 frames. ], tot_loss[loss=0.3698, simple_loss=0.4021, pruned_loss=0.1687, over 3478891.90 frames. ], batch size: 80, lr: 3.43e-02, grad_scale: 32.0 2023-06-15 03:23:55,806 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=17526.666666666668, ans=0.125 2023-06-15 03:24:12,458 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=17593.333333333332, ans=0.125 2023-06-15 03:24:24,032 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=17660.0, ans=0.12340000000000001 2023-06-15 03:24:40,072 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-5.pt 2023-06-15 03:25:04,642 INFO [train.py:988] (0/4) Epoch 6, batch 0, loss[loss=0.3563, simple_loss=0.3916, pruned_loss=0.1605, over 19100.00 frames. ], tot_loss[loss=0.3563, simple_loss=0.3916, pruned_loss=0.1605, over 19100.00 frames. ], batch size: 89, lr: 3.27e-02, grad_scale: 32.0 2023-06-15 03:25:04,643 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 03:25:10,739 INFO [train.py:1020] (0/4) Epoch 6, validation: loss=0.268, simple_loss=0.365, pruned_loss=0.08554, over 143649.00 frames. 2023-06-15 03:25:10,740 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 03:25:13,265 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.44 vs. limit=9.436666666666667 2023-06-15 03:25:40,465 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=17813.333333333332, ans=0.006997101449275362 2023-06-15 03:25:41,561 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.69 vs. 
limit=14.18 2023-06-15 03:25:56,530 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=17880.0, ans=0.125 2023-06-15 03:25:56,599 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=17880.0, ans=0.2742 2023-06-15 03:26:07,494 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=17946.666666666668, ans=0.0 2023-06-15 03:26:21,323 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.81 vs. limit=21.009999999999998 2023-06-15 03:26:23,138 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.21 vs. limit=14.006666666666666 2023-06-15 03:26:30,214 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+02 2.488e+02 3.063e+02 4.314e+02 9.185e+02, threshold=6.126e+02, percent-clipped=4.0 2023-06-15 03:26:37,322 INFO [train.py:988] (0/4) Epoch 6, batch 50, loss[loss=0.3526, simple_loss=0.396, pruned_loss=0.1546, over 19526.00 frames. ], tot_loss[loss=0.3605, simple_loss=0.3977, pruned_loss=0.1617, over 853840.52 frames. ], batch size: 102, lr: 3.26e-02, grad_scale: 32.0 2023-06-15 03:26:41,602 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=18080.0, ans=0.125 2023-06-15 03:26:43,613 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=5.712 2023-06-15 03:26:55,798 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=18146.666666666668, ans=0.11853333333333332 2023-06-15 03:27:13,809 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=18213.333333333332, ans=0.0 2023-06-15 03:27:14,285 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.86 vs. limit=21.16 2023-06-15 03:27:29,474 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=18280.0, ans=0.125 2023-06-15 03:27:36,200 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=18280.0, ans=0.125 2023-06-15 03:27:45,516 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.61 vs. limit=21.259999999999998 2023-06-15 03:27:48,147 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=18346.666666666668, ans=0.125 2023-06-15 03:28:02,259 INFO [train.py:988] (0/4) Epoch 6, batch 100, loss[loss=0.336, simple_loss=0.3812, pruned_loss=0.1454, over 19071.00 frames. ], tot_loss[loss=0.3571, simple_loss=0.3942, pruned_loss=0.16, over 1508568.03 frames. 
], batch size: 94, lr: 3.26e-02, grad_scale: 16.0 2023-06-15 03:28:09,738 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=18413.333333333332, ans=0.125 2023-06-15 03:28:35,273 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=18546.666666666668, ans=0.0 2023-06-15 03:28:45,707 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.04 vs. limit=11.418666666666667 2023-06-15 03:29:10,742 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=18680.0, ans=14.504999999999999 2023-06-15 03:29:13,244 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=18680.0, ans=0.006808695652173913 2023-06-15 03:29:22,869 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.570e+02 3.710e+02 4.834e+02 1.052e+03, threshold=7.420e+02, percent-clipped=12.0 2023-06-15 03:29:27,804 INFO [train.py:988] (0/4) Epoch 6, batch 150, loss[loss=0.3719, simple_loss=0.4085, pruned_loss=0.1677, over 18284.00 frames. ], tot_loss[loss=0.3567, simple_loss=0.3932, pruned_loss=0.16, over 1984840.13 frames. ], batch size: 74, lr: 3.25e-02, grad_scale: 16.0 2023-06-15 03:29:33,026 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=18746.666666666668, ans=0.0 2023-06-15 03:29:44,104 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=18813.333333333332, ans=0.125 2023-06-15 03:30:54,544 INFO [train.py:988] (0/4) Epoch 6, batch 200, loss[loss=0.3559, simple_loss=0.3924, pruned_loss=0.1597, over 19101.00 frames. ], tot_loss[loss=0.3545, simple_loss=0.3913, pruned_loss=0.1589, over 2397319.02 frames. ], batch size: 89, lr: 3.25e-02, grad_scale: 16.0 2023-06-15 03:30:54,762 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=19080.0, ans=0.125 2023-06-15 03:31:01,477 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=19080.0, ans=0.125 2023-06-15 03:31:02,940 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=19080.0, ans=0.125 2023-06-15 03:31:17,691 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=19146.666666666668, ans=0.0 2023-06-15 03:31:32,529 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=19213.333333333332, ans=0.22753333333333337 2023-06-15 03:31:39,975 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. 
limit=5.882 2023-06-15 03:31:45,602 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=19280.0, ans=0.125 2023-06-15 03:31:48,554 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=19280.0, ans=0.006678260869565217 2023-06-15 03:31:58,962 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=19280.0, ans=0.0 2023-06-15 03:32:01,264 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=19280.0, ans=0.10720000000000002 2023-06-15 03:32:03,299 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.20 vs. limit=11.738666666666667 2023-06-15 03:32:15,537 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+02 2.813e+02 3.502e+02 4.540e+02 9.537e+02, threshold=7.003e+02, percent-clipped=3.0 2023-06-15 03:32:20,489 INFO [train.py:988] (0/4) Epoch 6, batch 250, loss[loss=0.3209, simple_loss=0.3723, pruned_loss=0.1347, over 19080.00 frames. ], tot_loss[loss=0.3534, simple_loss=0.3901, pruned_loss=0.1583, over 2699657.89 frames. ], batch size: 89, lr: 3.25e-02, grad_scale: 16.0 2023-06-15 03:32:22,489 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=19413.333333333332, ans=0.125 2023-06-15 03:32:22,678 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19413.333333333332, ans=0.10586666666666669 2023-06-15 03:32:37,965 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=19480.0, ans=0.21820000000000006 2023-06-15 03:32:45,558 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=19480.0, ans=0.125 2023-06-15 03:32:50,478 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=19480.0, ans=0.0 2023-06-15 03:33:02,845 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.21 vs. 
limit=5.932 2023-06-15 03:33:08,377 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=19546.666666666668, ans=0.125 2023-06-15 03:33:16,861 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=19613.333333333332, ans=0.0 2023-06-15 03:33:20,653 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=19613.333333333332, ans=0.05 2023-06-15 03:33:23,968 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=19613.333333333332, ans=0.0 2023-06-15 03:33:29,474 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=19680.0, ans=0.0 2023-06-15 03:33:38,420 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=19680.0, ans=0.10320000000000001 2023-06-15 03:33:46,375 INFO [train.py:988] (0/4) Epoch 6, batch 300, loss[loss=0.3658, simple_loss=0.3892, pruned_loss=0.1712, over 20281.00 frames. ], tot_loss[loss=0.3525, simple_loss=0.3896, pruned_loss=0.1577, over 2948880.97 frames. ], batch size: 141, lr: 3.24e-02, grad_scale: 16.0 2023-06-15 03:34:01,892 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=19813.333333333332, ans=0.20653333333333346 2023-06-15 03:34:27,009 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=19880.0, ans=0.125 2023-06-15 03:34:56,808 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=20013.333333333332, ans=0.006518840579710146 2023-06-15 03:35:05,807 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.20 vs. limit=6.0 2023-06-15 03:35:08,283 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+02 2.795e+02 3.555e+02 5.115e+02 9.485e+02, threshold=7.110e+02, percent-clipped=8.0 2023-06-15 03:35:09,369 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0 2023-06-15 03:35:13,422 INFO [train.py:988] (0/4) Epoch 6, batch 350, loss[loss=0.3369, simple_loss=0.3876, pruned_loss=0.1431, over 17043.00 frames. ], tot_loss[loss=0.3529, simple_loss=0.3908, pruned_loss=0.1575, over 3124848.11 frames. ], batch size: 60, lr: 3.24e-02, grad_scale: 16.0 2023-06-15 03:35:18,749 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=20080.0, ans=0.1 2023-06-15 03:35:25,460 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=20080.0, ans=0.125 2023-06-15 03:35:38,537 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.84 vs. 
limit=15.0 2023-06-15 03:35:47,845 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=20213.333333333332, ans=0.1 2023-06-15 03:36:20,324 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=20280.0, ans=0.125 2023-06-15 03:36:26,946 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.04 vs. limit=22.5 2023-06-15 03:36:36,135 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=20346.666666666668, ans=0.125 2023-06-15 03:36:40,615 INFO [train.py:988] (0/4) Epoch 6, batch 400, loss[loss=0.373, simple_loss=0.4155, pruned_loss=0.1653, over 16726.00 frames. ], tot_loss[loss=0.3508, simple_loss=0.389, pruned_loss=0.1563, over 3275879.60 frames. ], batch size: 59, lr: 3.24e-02, grad_scale: 32.0 2023-06-15 03:36:44,257 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=20413.333333333332, ans=0.125 2023-06-15 03:36:54,181 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=20413.333333333332, ans=0.125 2023-06-15 03:36:58,981 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=20480.0, ans=0.0 2023-06-15 03:37:15,935 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=20546.666666666668, ans=0.0 2023-06-15 03:38:01,337 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.580e+02 3.237e+02 4.539e+02 1.447e+03, threshold=6.474e+02, percent-clipped=10.0 2023-06-15 03:38:06,908 INFO [train.py:988] (0/4) Epoch 6, batch 450, loss[loss=0.3546, simple_loss=0.3881, pruned_loss=0.1605, over 20128.00 frames. ], tot_loss[loss=0.3504, simple_loss=0.3891, pruned_loss=0.1559, over 3393170.46 frames. ], batch size: 133, lr: 3.23e-02, grad_scale: 32.0 2023-06-15 03:38:17,039 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=20746.666666666668, ans=0.006359420289855073 2023-06-15 03:39:07,073 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=20946.666666666668, ans=0.5 2023-06-15 03:39:26,513 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=21013.333333333332, ans=0.125 2023-06-15 03:39:31,071 INFO [train.py:988] (0/4) Epoch 6, batch 500, loss[loss=0.4166, simple_loss=0.4533, pruned_loss=0.19, over 11371.00 frames. ], tot_loss[loss=0.3483, simple_loss=0.3866, pruned_loss=0.155, over 3479342.35 frames. ], batch size: 32, lr: 3.23e-02, grad_scale: 32.0 2023-06-15 03:39:46,824 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.59 vs. limit=15.0 2023-06-15 03:40:01,073 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=21146.666666666668, ans=0.2 2023-06-15 03:40:11,392 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.05 vs. 
limit=15.0 2023-06-15 03:40:17,543 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=21213.333333333332, ans=0.2 2023-06-15 03:40:24,343 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-6.pt 2023-06-15 03:40:50,446 INFO [train.py:988] (0/4) Epoch 7, batch 0, loss[loss=0.3674, simple_loss=0.4209, pruned_loss=0.157, over 18310.00 frames. ], tot_loss[loss=0.3674, simple_loss=0.4209, pruned_loss=0.157, over 18310.00 frames. ], batch size: 72, lr: 3.07e-02, grad_scale: 32.0 2023-06-15 03:40:50,447 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 03:40:56,359 INFO [train.py:1020] (0/4) Epoch 7, validation: loss=0.2561, simple_loss=0.3562, pruned_loss=0.07803, over 143649.00 frames. 2023-06-15 03:40:56,359 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 03:41:09,125 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.58 vs. limit=22.5 2023-06-15 03:41:14,032 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.54 vs. limit=10.0 2023-06-15 03:41:19,473 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+02 2.547e+02 3.083e+02 4.575e+02 1.238e+03, threshold=6.165e+02, percent-clipped=14.0 2023-06-15 03:41:44,659 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=21433.333333333332, ans=0.006210144927536233 2023-06-15 03:42:20,197 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=21633.333333333332, ans=0.0 2023-06-15 03:42:21,391 INFO [train.py:988] (0/4) Epoch 7, batch 50, loss[loss=0.3596, simple_loss=0.3537, pruned_loss=0.1827, over 16972.00 frames. ], tot_loss[loss=0.3423, simple_loss=0.3834, pruned_loss=0.1506, over 855765.30 frames. ], batch size: 391, lr: 3.07e-02, grad_scale: 32.0 2023-06-15 03:42:41,484 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 03:43:49,170 INFO [train.py:988] (0/4) Epoch 7, batch 100, loss[loss=0.3712, simple_loss=0.4123, pruned_loss=0.165, over 16787.00 frames. ], tot_loss[loss=0.3425, simple_loss=0.3836, pruned_loss=0.1507, over 1507512.36 frames. ], batch size: 59, lr: 3.06e-02, grad_scale: 32.0 2023-06-15 03:43:55,658 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.74 vs. 
limit=15.0 2023-06-15 03:44:13,034 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.700e+02 2.359e+02 2.777e+02 3.733e+02 1.026e+03, threshold=5.554e+02, percent-clipped=6.0 2023-06-15 03:44:21,939 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=22100.0, ans=0.0 2023-06-15 03:44:25,740 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=22100.0, ans=0.035 2023-06-15 03:44:52,778 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=22166.666666666668, ans=0.006050724637681159 2023-06-15 03:44:53,318 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.94 vs. limit=22.5 2023-06-15 03:45:04,440 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=22233.333333333332, ans=0.125 2023-06-15 03:45:13,923 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2023-06-15 03:45:16,414 INFO [train.py:988] (0/4) Epoch 7, batch 150, loss[loss=0.369, simple_loss=0.4171, pruned_loss=0.1604, over 16997.00 frames. ], tot_loss[loss=0.3413, simple_loss=0.3842, pruned_loss=0.1492, over 2007940.00 frames. ], batch size: 60, lr: 3.06e-02, grad_scale: 32.0 2023-06-15 03:45:19,019 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=22300.0, ans=0.0 2023-06-15 03:45:53,647 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=22433.333333333332, ans=0.125 2023-06-15 03:46:29,627 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=22566.666666666668, ans=0.015 2023-06-15 03:46:45,791 INFO [train.py:988] (0/4) Epoch 7, batch 200, loss[loss=0.3215, simple_loss=0.3572, pruned_loss=0.1428, over 20243.00 frames. ], tot_loss[loss=0.3393, simple_loss=0.3819, pruned_loss=0.1483, over 2411656.05 frames. ], batch size: 239, lr: 3.05e-02, grad_scale: 32.0 2023-06-15 03:46:58,071 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=22633.333333333332, ans=0.125 2023-06-15 03:47:05,362 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=22700.0, ans=0.125 2023-06-15 03:47:07,532 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.29 vs. 
limit=15.0 2023-06-15 03:47:11,638 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.670e+02 3.468e+02 4.313e+02 7.598e+02, threshold=6.936e+02, percent-clipped=8.0 2023-06-15 03:47:17,224 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=22700.0, ans=0.125 2023-06-15 03:48:01,652 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=22900.0, ans=0.1 2023-06-15 03:48:02,121 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.14 vs. limit=22.5 2023-06-15 03:48:10,867 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=22900.0, ans=0.1 2023-06-15 03:48:15,524 INFO [train.py:988] (0/4) Epoch 7, batch 250, loss[loss=0.3404, simple_loss=0.3844, pruned_loss=0.1482, over 19816.00 frames. ], tot_loss[loss=0.338, simple_loss=0.3816, pruned_loss=0.1472, over 2701784.01 frames. ], batch size: 115, lr: 3.05e-02, grad_scale: 32.0 2023-06-15 03:49:29,305 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=23233.333333333332, ans=0.005818840579710146 2023-06-15 03:49:31,140 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=23233.333333333332, ans=0.1 2023-06-15 03:49:36,215 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=23233.333333333332, ans=0.125 2023-06-15 03:49:44,561 INFO [train.py:988] (0/4) Epoch 7, batch 300, loss[loss=0.3313, simple_loss=0.392, pruned_loss=0.1353, over 15164.00 frames. ], tot_loss[loss=0.3372, simple_loss=0.3815, pruned_loss=0.1464, over 2945658.46 frames. ], batch size: 43, lr: 3.05e-02, grad_scale: 32.0 2023-06-15 03:50:08,792 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+02 2.398e+02 2.879e+02 3.819e+02 6.544e+02, threshold=5.757e+02, percent-clipped=0.0 2023-06-15 03:50:14,785 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=23366.666666666668, ans=0.005789855072463768 2023-06-15 03:50:21,912 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=23433.333333333332, ans=0.005775362318840581 2023-06-15 03:50:49,980 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=23500.0, ans=0.015 2023-06-15 03:51:12,754 INFO [train.py:988] (0/4) Epoch 7, batch 350, loss[loss=0.3138, simple_loss=0.3796, pruned_loss=0.124, over 17644.00 frames. ], tot_loss[loss=0.3373, simple_loss=0.381, pruned_loss=0.1468, over 3134911.75 frames. ], batch size: 67, lr: 3.04e-02, grad_scale: 32.0 2023-06-15 03:51:45,196 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=23700.0, ans=0.005717391304347826 2023-06-15 03:51:48,578 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=23766.666666666668, ans=0.0 2023-06-15 03:51:49,264 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.18 vs. 
limit=22.5 2023-06-15 03:52:21,217 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. limit=6.0 2023-06-15 03:52:41,275 INFO [train.py:988] (0/4) Epoch 7, batch 400, loss[loss=0.3451, simple_loss=0.3898, pruned_loss=0.1502, over 19075.00 frames. ], tot_loss[loss=0.3357, simple_loss=0.3795, pruned_loss=0.1459, over 3289133.38 frames. ], batch size: 89, lr: 3.04e-02, grad_scale: 32.0 2023-06-15 03:53:06,045 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 3.077e+02 3.837e+02 5.079e+02 9.527e+02, threshold=7.674e+02, percent-clipped=15.0 2023-06-15 03:53:32,860 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=24166.666666666668, ans=0.125 2023-06-15 03:53:36,290 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=24166.666666666668, ans=0.125 2023-06-15 03:53:37,914 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-06-15 03:53:46,998 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=24166.666666666668, ans=0.0 2023-06-15 03:53:47,098 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=24166.666666666668, ans=0.0 2023-06-15 03:53:50,092 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=24233.333333333332, ans=0.125 2023-06-15 03:54:04,426 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=24233.333333333332, ans=0.005601449275362319 2023-06-15 03:54:09,648 INFO [train.py:988] (0/4) Epoch 7, batch 450, loss[loss=0.3305, simple_loss=0.3637, pruned_loss=0.1486, over 20158.00 frames. ], tot_loss[loss=0.3356, simple_loss=0.3797, pruned_loss=0.1458, over 3396943.80 frames. ], batch size: 239, lr: 3.04e-02, grad_scale: 32.0 2023-06-15 03:54:15,681 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=24300.0, ans=0.2 2023-06-15 03:54:18,025 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. 
limit=15.0 2023-06-15 03:54:19,064 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=24300.0, ans=0.125 2023-06-15 03:54:33,771 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24366.666666666668, ans=0.1 2023-06-15 03:54:40,436 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=24366.666666666668, ans=10.0 2023-06-15 03:54:51,959 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=24433.333333333332, ans=0.125 2023-06-15 03:54:55,090 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=24433.333333333332, ans=0.1 2023-06-15 03:55:03,388 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=24500.0, ans=0.035 2023-06-15 03:55:03,601 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=24500.0, ans=0.125 2023-06-15 03:55:14,938 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=24500.0, ans=0.125 2023-06-15 03:55:17,403 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=24566.666666666668, ans=0.005528985507246377 2023-06-15 03:55:35,129 INFO [train.py:988] (0/4) Epoch 7, batch 500, loss[loss=0.3348, simple_loss=0.3769, pruned_loss=0.1464, over 20317.00 frames. ], tot_loss[loss=0.3339, simple_loss=0.3784, pruned_loss=0.1447, over 3495456.83 frames. ], batch size: 141, lr: 3.03e-02, grad_scale: 32.0 2023-06-15 03:55:37,037 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=24633.333333333332, ans=0.0 2023-06-15 03:55:38,811 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=24633.333333333332, ans=0.125 2023-06-15 03:55:52,717 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0 2023-06-15 03:55:58,176 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+02 2.744e+02 3.251e+02 4.322e+02 7.093e+02, threshold=6.501e+02, percent-clipped=0.0 2023-06-15 03:56:03,465 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=24700.0, ans=0.125 2023-06-15 03:56:05,558 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24700.0, ans=0.1 2023-06-15 03:56:27,474 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-7.pt 2023-06-15 03:56:53,603 INFO [train.py:988] (0/4) Epoch 8, batch 0, loss[loss=0.3415, simple_loss=0.3812, pruned_loss=0.1509, over 19970.00 frames. ], tot_loss[loss=0.3415, simple_loss=0.3812, pruned_loss=0.1509, over 19970.00 frames. ], batch size: 126, lr: 2.89e-02, grad_scale: 32.0 2023-06-15 03:56:53,604 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 03:56:59,690 INFO [train.py:1020] (0/4) Epoch 8, validation: loss=0.2482, simple_loss=0.3483, pruned_loss=0.0741, over 143649.00 frames. 
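The loss[...] figures in these entries are per-batch values, while tot_loss[...] grows over increasingly many frames within the epoch (2007940.00, 2411656.05, 2701784.01, ... frames across the epoch 7 batches above), which is consistent with a frame-weighted running average of the batch losses. A minimal sketch of that bookkeeping follows, assuming this interpretation; the class and variable names are illustrative and not taken from train.py.

# Illustrative only: a frame-weighted running average of the kind tot_loss[...]
# appears to report. The real accumulation in icefall's train.py may differ.
from dataclasses import dataclass, field

@dataclass
class RunningLoss:
    totals: dict = field(default_factory=dict)  # per-key sum of loss * frames
    frames: float = 0.0                         # total frames folded in so far

    def update(self, batch_losses: dict, batch_frames: float) -> dict:
        # Fold one batch, e.g. {"loss": 0.3094, "simple_loss": 0.3608,
        # "pruned_loss": 0.1291} over 19437 frames, into the running average.
        for name, value in batch_losses.items():
            self.totals[name] = self.totals.get(name, 0.0) + value * batch_frames
        self.frames += batch_frames
        return {name: total / self.frames for name, total in self.totals.items()}

# Example with two batches similar to those logged for epoch 8:
tracker = RunningLoss()
tracker.update({"loss": 0.3094, "simple_loss": 0.3608, "pruned_loss": 0.1291}, 19437.0)
tot = tracker.update({"loss": 0.3036, "simple_loss": 0.3571, "pruned_loss": 0.1250}, 19547.0)
print(tot)  # frame-weighted averages, analogous to the tot_loss[...] entries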
2023-06-15 03:56:59,691 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 03:57:06,040 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.72 vs. limit=6.0 2023-06-15 03:58:15,516 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=25113.333333333332, ans=0.125 2023-06-15 03:58:28,212 INFO [train.py:988] (0/4) Epoch 8, batch 50, loss[loss=0.3094, simple_loss=0.3608, pruned_loss=0.1291, over 19437.00 frames. ], tot_loss[loss=0.3284, simple_loss=0.376, pruned_loss=0.1404, over 845626.59 frames. ], batch size: 105, lr: 2.88e-02, grad_scale: 32.0 2023-06-15 03:58:28,430 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=25180.0, ans=0.125 2023-06-15 03:59:05,936 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2023-06-15 03:59:14,666 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.22 vs. limit=15.0 2023-06-15 03:59:25,152 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+02 2.660e+02 2.942e+02 3.457e+02 5.575e+02, threshold=5.885e+02, percent-clipped=0.0 2023-06-15 03:59:44,208 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.85 vs. limit=10.0 2023-06-15 03:59:57,949 INFO [train.py:988] (0/4) Epoch 8, batch 100, loss[loss=0.3036, simple_loss=0.3571, pruned_loss=0.125, over 19547.00 frames. ], tot_loss[loss=0.3268, simple_loss=0.3743, pruned_loss=0.1397, over 1496446.90 frames. ], batch size: 102, lr: 2.88e-02, grad_scale: 32.0 2023-06-15 04:00:04,931 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=25513.333333333332, ans=0.005323188405797102 2023-06-15 04:00:16,029 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. limit=6.0 2023-06-15 04:00:44,460 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=25646.666666666668, ans=0.125 2023-06-15 04:01:08,930 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=25780.0, ans=0.0 2023-06-15 04:01:18,055 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=25780.0, ans=0.1 2023-06-15 04:01:20,434 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=25780.0, ans=0.2 2023-06-15 04:01:27,340 INFO [train.py:988] (0/4) Epoch 8, batch 150, loss[loss=0.3558, simple_loss=0.3513, pruned_loss=0.1801, over 16959.00 frames. ], tot_loss[loss=0.3264, simple_loss=0.3736, pruned_loss=0.1396, over 2013123.90 frames. 
], batch size: 391, lr: 2.87e-02, grad_scale: 32.0 2023-06-15 04:01:50,162 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=25913.333333333332, ans=0.1 2023-06-15 04:01:57,874 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.18 vs. limit=10.0 2023-06-15 04:02:12,970 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.51 vs. limit=6.0 2023-06-15 04:02:20,772 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 04:02:23,620 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.658e+02 2.617e+02 3.387e+02 4.495e+02 9.103e+02, threshold=6.774e+02, percent-clipped=7.0 2023-06-15 04:02:49,171 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=26113.333333333332, ans=0.125 2023-06-15 04:02:55,956 INFO [train.py:988] (0/4) Epoch 8, batch 200, loss[loss=0.307, simple_loss=0.3625, pruned_loss=0.1257, over 19357.00 frames. ], tot_loss[loss=0.3258, simple_loss=0.3742, pruned_loss=0.1387, over 2394143.90 frames. ], batch size: 98, lr: 2.87e-02, grad_scale: 32.0 2023-06-15 04:03:23,977 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=26246.666666666668, ans=0.0 2023-06-15 04:03:31,088 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=26313.333333333332, ans=0.025 2023-06-15 04:03:40,295 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=26313.333333333332, ans=0.125 2023-06-15 04:04:01,129 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=26380.0, ans=0.0 2023-06-15 04:04:10,238 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=26446.666666666668, ans=0.125 2023-06-15 04:04:25,090 INFO [train.py:988] (0/4) Epoch 8, batch 250, loss[loss=0.3207, simple_loss=0.3571, pruned_loss=0.1422, over 20770.00 frames. ], tot_loss[loss=0.3238, simple_loss=0.3724, pruned_loss=0.1376, over 2715888.83 frames. ], batch size: 211, lr: 2.87e-02, grad_scale: 32.0 2023-06-15 04:04:54,067 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=26580.0, ans=0.0 2023-06-15 04:05:04,824 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/checkpoint-4000.pt 2023-06-15 04:05:25,757 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.627e+02 2.237e+02 2.856e+02 3.826e+02 6.923e+02, threshold=5.713e+02, percent-clipped=1.0 2023-06-15 04:05:52,665 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=26780.0, ans=0.0 2023-06-15 04:05:57,928 INFO [train.py:988] (0/4) Epoch 8, batch 300, loss[loss=0.316, simple_loss=0.3699, pruned_loss=0.131, over 19094.00 frames. ], tot_loss[loss=0.3242, simple_loss=0.3719, pruned_loss=0.1383, over 2973765.48 frames. 
], batch size: 94, lr: 2.86e-02, grad_scale: 32.0 2023-06-15 04:05:59,888 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=26846.666666666668, ans=0.125 2023-06-15 04:06:05,844 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=26846.666666666668, ans=0.125 2023-06-15 04:06:08,092 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-06-15 04:06:20,380 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.51 vs. limit=15.0 2023-06-15 04:07:02,751 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=27046.666666666668, ans=0.125 2023-06-15 04:07:10,156 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=27113.333333333332, ans=0.0 2023-06-15 04:07:11,930 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=27113.333333333332, ans=0.125 2023-06-15 04:07:16,069 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.88 vs. limit=15.0 2023-06-15 04:07:27,557 INFO [train.py:988] (0/4) Epoch 8, batch 350, loss[loss=0.3528, simple_loss=0.3837, pruned_loss=0.161, over 20470.00 frames. ], tot_loss[loss=0.3249, simple_loss=0.3722, pruned_loss=0.1388, over 3152754.32 frames. ], batch size: 160, lr: 2.86e-02, grad_scale: 32.0 2023-06-15 04:08:05,331 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.15 vs. limit=6.0 2023-06-15 04:08:17,250 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=27313.333333333332, ans=0.2 2023-06-15 04:08:24,596 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.602e+02 3.097e+02 4.121e+02 7.485e+02, threshold=6.195e+02, percent-clipped=4.0 2023-06-15 04:08:30,179 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=27380.0, ans=0.1 2023-06-15 04:08:56,013 INFO [train.py:988] (0/4) Epoch 8, batch 400, loss[loss=0.3257, simple_loss=0.3302, pruned_loss=0.1606, over 16731.00 frames. ], tot_loss[loss=0.3236, simple_loss=0.3717, pruned_loss=0.1377, over 3285832.04 frames. ], batch size: 392, lr: 2.85e-02, grad_scale: 32.0 2023-06-15 04:09:06,550 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=27513.333333333332, ans=0.125 2023-06-15 04:09:07,258 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=27.71 vs. 
limit=22.5 2023-06-15 04:09:19,125 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=27580.0, ans=0.0 2023-06-15 04:09:31,551 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=27646.666666666668, ans=0.2 2023-06-15 04:09:50,314 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.80 vs. limit=15.0 2023-06-15 04:10:13,291 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=27780.0, ans=0.125 2023-06-15 04:10:27,005 INFO [train.py:988] (0/4) Epoch 8, batch 450, loss[loss=0.3146, simple_loss=0.3633, pruned_loss=0.1329, over 19868.00 frames. ], tot_loss[loss=0.3229, simple_loss=0.3715, pruned_loss=0.1372, over 3408679.34 frames. ], batch size: 120, lr: 2.85e-02, grad_scale: 32.0 2023-06-15 04:10:36,193 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.70 vs. limit=22.5 2023-06-15 04:10:48,602 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.37 vs. limit=22.5 2023-06-15 04:10:51,309 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=27913.333333333332, ans=0.1 2023-06-15 04:11:03,219 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=27980.0, ans=0.125 2023-06-15 04:11:03,262 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=27980.0, ans=0.0 2023-06-15 04:11:06,879 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=27980.0, ans=0.125 2023-06-15 04:11:23,226 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.630e+02 2.599e+02 3.032e+02 3.753e+02 5.594e+02, threshold=6.064e+02, percent-clipped=0.0 2023-06-15 04:11:45,052 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=28113.333333333332, ans=0.0 2023-06-15 04:11:53,598 INFO [train.py:988] (0/4) Epoch 8, batch 500, loss[loss=0.3347, simple_loss=0.3979, pruned_loss=0.1357, over 17840.00 frames. ], tot_loss[loss=0.3225, simple_loss=0.3716, pruned_loss=0.1367, over 3481202.29 frames. ], batch size: 68, lr: 2.85e-02, grad_scale: 32.0 2023-06-15 04:11:57,187 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=28180.0, ans=0.125 2023-06-15 04:12:19,636 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=28246.666666666668, ans=0.07 2023-06-15 04:12:42,441 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=28380.0, ans=0.125 2023-06-15 04:12:47,235 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-8.pt 2023-06-15 04:13:14,338 INFO [train.py:988] (0/4) Epoch 9, batch 0, loss[loss=0.3198, simple_loss=0.362, pruned_loss=0.1388, over 20346.00 frames. ], tot_loss[loss=0.3198, simple_loss=0.362, pruned_loss=0.1388, over 20346.00 frames. 
], batch size: 149, lr: 2.72e-02, grad_scale: 32.0 2023-06-15 04:13:14,339 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 04:13:20,336 INFO [train.py:1020] (0/4) Epoch 9, validation: loss=0.2394, simple_loss=0.343, pruned_loss=0.06786, over 143649.00 frames. 2023-06-15 04:13:20,337 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 04:14:10,558 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=28526.666666666668, ans=0.125 2023-06-15 04:14:46,218 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=28660.0, ans=0.1 2023-06-15 04:14:49,778 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+02 2.336e+02 2.823e+02 3.585e+02 6.203e+02, threshold=5.645e+02, percent-clipped=2.0 2023-06-15 04:14:49,824 INFO [train.py:988] (0/4) Epoch 9, batch 50, loss[loss=0.3173, simple_loss=0.3598, pruned_loss=0.1374, over 20569.00 frames. ], tot_loss[loss=0.317, simple_loss=0.3677, pruned_loss=0.1331, over 862241.97 frames. ], batch size: 189, lr: 2.71e-02, grad_scale: 32.0 2023-06-15 04:15:00,596 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=28726.666666666668, ans=0.09899494936611666 2023-06-15 04:15:09,810 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.59 vs. limit=22.5 2023-06-15 04:15:35,303 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=28860.0, ans=0.0 2023-06-15 04:15:42,812 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 04:16:01,375 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=28993.333333333332, ans=0.125 2023-06-15 04:16:04,082 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.79 vs. limit=6.0 2023-06-15 04:16:17,496 INFO [train.py:988] (0/4) Epoch 9, batch 100, loss[loss=0.3126, simple_loss=0.3502, pruned_loss=0.1374, over 20561.00 frames. ], tot_loss[loss=0.3146, simple_loss=0.3654, pruned_loss=0.1319, over 1526464.15 frames. ], batch size: 189, lr: 2.71e-02, grad_scale: 32.0 2023-06-15 04:16:30,762 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=29060.0, ans=0.05 2023-06-15 04:16:41,115 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.38 vs. limit=15.0 2023-06-15 04:16:50,763 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=29193.333333333332, ans=0.125 2023-06-15 04:16:59,959 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.12 vs. 
limit=22.5 2023-06-15 04:17:04,503 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=29193.333333333332, ans=0.0 2023-06-15 04:17:09,455 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=29260.0, ans=0.125 2023-06-15 04:17:21,725 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=29260.0, ans=0.125 2023-06-15 04:17:44,301 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+02 2.451e+02 3.023e+02 4.117e+02 8.643e+02, threshold=6.045e+02, percent-clipped=4.0 2023-06-15 04:17:44,348 INFO [train.py:988] (0/4) Epoch 9, batch 150, loss[loss=0.2968, simple_loss=0.3547, pruned_loss=0.1194, over 19696.00 frames. ], tot_loss[loss=0.3166, simple_loss=0.367, pruned_loss=0.1332, over 2032913.76 frames. ], batch size: 110, lr: 2.70e-02, grad_scale: 32.0 2023-06-15 04:18:25,320 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=29526.666666666668, ans=0.125 2023-06-15 04:18:26,103 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.49 vs. limit=15.0 2023-06-15 04:19:02,739 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.20 vs. limit=15.0 2023-06-15 04:19:12,160 INFO [train.py:988] (0/4) Epoch 9, batch 200, loss[loss=0.3385, simple_loss=0.4067, pruned_loss=0.1351, over 16212.00 frames. ], tot_loss[loss=0.3166, simple_loss=0.3677, pruned_loss=0.1327, over 2417841.10 frames. ], batch size: 52, lr: 2.70e-02, grad_scale: 32.0 2023-06-15 04:19:25,118 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=29726.666666666668, ans=0.125 2023-06-15 04:19:28,448 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=29793.333333333332, ans=0.0 2023-06-15 04:19:32,270 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=29793.333333333332, ans=0.0 2023-06-15 04:19:39,555 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=29793.333333333332, ans=0.125 2023-06-15 04:20:12,636 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=29926.666666666668, ans=0.0 2023-06-15 04:20:41,735 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+02 2.377e+02 2.781e+02 3.474e+02 5.000e+02, threshold=5.562e+02, percent-clipped=0.0 2023-06-15 04:20:41,782 INFO [train.py:988] (0/4) Epoch 9, batch 250, loss[loss=0.311, simple_loss=0.3697, pruned_loss=0.1262, over 19479.00 frames. ], tot_loss[loss=0.315, simple_loss=0.3663, pruned_loss=0.1318, over 2722408.02 frames. ], batch size: 105, lr: 2.70e-02, grad_scale: 32.0 2023-06-15 04:20:44,359 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.17 vs. 
limit=10.0 2023-06-15 04:21:03,502 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=30126.666666666668, ans=0.125 2023-06-15 04:21:27,954 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=30193.333333333332, ans=0.125 2023-06-15 04:21:28,115 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=30193.333333333332, ans=0.125 2023-06-15 04:21:31,435 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=30193.333333333332, ans=0.125 2023-06-15 04:21:37,530 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.54 vs. limit=6.0 2023-06-15 04:21:54,833 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=30326.666666666668, ans=0.125 2023-06-15 04:21:58,050 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=30326.666666666668, ans=0.0 2023-06-15 04:22:09,952 INFO [train.py:988] (0/4) Epoch 9, batch 300, loss[loss=0.3409, simple_loss=0.3857, pruned_loss=0.1481, over 20448.00 frames. ], tot_loss[loss=0.3154, simple_loss=0.3667, pruned_loss=0.1321, over 2957980.05 frames. ], batch size: 160, lr: 2.69e-02, grad_scale: 32.0 2023-06-15 04:22:14,202 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=30393.333333333332, ans=0.00426231884057971 2023-06-15 04:22:25,512 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.06 vs. limit=15.0 2023-06-15 04:22:30,611 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.54 vs. limit=15.0 2023-06-15 04:22:44,551 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=30526.666666666668, ans=0.125 2023-06-15 04:23:07,476 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=30593.333333333332, ans=0.07 2023-06-15 04:23:19,608 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=30660.0, ans=0.05 2023-06-15 04:23:19,782 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=30660.0, ans=0.1 2023-06-15 04:23:31,319 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=30660.0, ans=0.125 2023-06-15 04:23:40,484 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 2.682e+02 3.189e+02 4.179e+02 7.690e+02, threshold=6.378e+02, percent-clipped=10.0 2023-06-15 04:23:40,556 INFO [train.py:988] (0/4) Epoch 9, batch 350, loss[loss=0.2947, simple_loss=0.357, pruned_loss=0.1162, over 19329.00 frames. ], tot_loss[loss=0.3132, simple_loss=0.3642, pruned_loss=0.1311, over 3152575.46 frames. 
], batch size: 98, lr: 2.69e-02, grad_scale: 32.0 2023-06-15 04:23:42,840 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2023-06-15 04:23:51,023 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=30726.666666666668, ans=0.0 2023-06-15 04:24:00,104 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2023-06-15 04:24:19,461 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=30860.0, ans=0.2 2023-06-15 04:24:30,414 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=30860.0, ans=0.1 2023-06-15 04:25:03,257 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=30993.333333333332, ans=0.125 2023-06-15 04:25:09,348 INFO [train.py:988] (0/4) Epoch 9, batch 400, loss[loss=0.3518, simple_loss=0.3885, pruned_loss=0.1575, over 20528.00 frames. ], tot_loss[loss=0.3136, simple_loss=0.3644, pruned_loss=0.1314, over 3291828.30 frames. ], batch size: 160, lr: 2.68e-02, grad_scale: 32.0 2023-06-15 04:25:40,851 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=31126.666666666668, ans=0.1 2023-06-15 04:25:46,002 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=31193.333333333332, ans=0.125 2023-06-15 04:25:57,360 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=31193.333333333332, ans=0.2 2023-06-15 04:26:19,619 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.50 vs. limit=12.0 2023-06-15 04:26:23,993 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=31326.666666666668, ans=0.0 2023-06-15 04:26:36,143 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.307e+02 2.949e+02 3.902e+02 6.879e+02, threshold=5.899e+02, percent-clipped=2.0 2023-06-15 04:26:36,190 INFO [train.py:988] (0/4) Epoch 9, batch 450, loss[loss=0.3295, simple_loss=0.3901, pruned_loss=0.1344, over 16708.00 frames. ], tot_loss[loss=0.3138, simple_loss=0.3648, pruned_loss=0.1314, over 3396496.12 frames. 
], batch size: 59, lr: 2.68e-02, grad_scale: 32.0 2023-06-15 04:27:05,912 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=31460.0, ans=0.0 2023-06-15 04:27:07,529 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=31460.0, ans=0.125 2023-06-15 04:27:07,656 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=31460.0, ans=0.125 2023-06-15 04:27:14,982 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=31526.666666666668, ans=0.2 2023-06-15 04:27:41,098 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=31593.333333333332, ans=0.1 2023-06-15 04:27:48,967 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=31660.0, ans=0.125 2023-06-15 04:27:53,323 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2023-06-15 04:28:01,910 INFO [train.py:988] (0/4) Epoch 9, batch 500, loss[loss=0.3181, simple_loss=0.3653, pruned_loss=0.1354, over 19359.00 frames. ], tot_loss[loss=0.3116, simple_loss=0.363, pruned_loss=0.1301, over 3483553.14 frames. ], batch size: 98, lr: 2.68e-02, grad_scale: 64.0 2023-06-15 04:28:04,480 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.16 vs. limit=15.0 2023-06-15 04:28:17,408 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=31793.333333333332, ans=0.5 2023-06-15 04:28:20,578 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=31793.333333333332, ans=0.125 2023-06-15 04:28:45,136 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=31860.0, ans=0.2 2023-06-15 04:28:51,455 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=31926.666666666668, ans=0.003928985507246376 2023-06-15 04:28:54,442 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-9.pt 2023-06-15 04:29:21,619 INFO [train.py:988] (0/4) Epoch 10, batch 0, loss[loss=0.2987, simple_loss=0.3544, pruned_loss=0.1215, over 19453.00 frames. ], tot_loss[loss=0.2987, simple_loss=0.3544, pruned_loss=0.1215, over 19453.00 frames. ], batch size: 105, lr: 2.56e-02, grad_scale: 64.0 2023-06-15 04:29:21,620 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 04:29:28,489 INFO [train.py:1020] (0/4) Epoch 10, validation: loss=0.2327, simple_loss=0.3375, pruned_loss=0.06395, over 143649.00 frames. 
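The recurring optim.py lines report Clipping_scale=2.0, five grad-norm statistics (min, 25%, median, 75%, max over a window of recent steps), a threshold that in the entries above equals Clipping_scale times the reported median (e.g. threshold=6.064e+02 is twice the median 3.032e+02), and the share of steps whose gradient norm exceeded it. A small, self-contained sketch of such a report is given below under those assumptions; the function name and the window handling are illustrative, not the actual logic in icefall's optim.py.

# Illustrative reconstruction of the clipping report format seen in this log.
# The threshold rule (clipping_scale * median) is inferred from the printed
# numbers; the real bookkeeping in optim.py may differ in detail.
import numpy as np

def clipping_report(grad_norms, clipping_scale=2.0):
    norms = np.asarray(grad_norms, dtype=float)
    q = np.quantile(norms, [0.0, 0.25, 0.5, 0.75, 1.0])  # min, quartiles, max
    threshold = clipping_scale * q[2]                     # scale times median
    percent_clipped = 100.0 * float(np.mean(norms > threshold))
    quartiles = " ".join(f"{v:.3e}" for v in q)
    return (f"Clipping_scale={clipping_scale}, grad-norm quartiles {quartiles}, "
            f"threshold={threshold:.3e}, percent-clipped={percent_clipped:.1f}")

# Example with synthetic norms; in training the norms come from the optimizer.
rng = np.random.default_rng(0)
print(clipping_report(rng.gamma(shape=4.0, scale=70.0, size=200)))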
2023-06-15 04:29:28,491 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 04:29:49,440 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=32006.666666666668, ans=0.1 2023-06-15 04:30:00,345 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=32006.666666666668, ans=0.125 2023-06-15 04:30:01,788 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+02 2.277e+02 2.643e+02 3.234e+02 5.475e+02, threshold=5.286e+02, percent-clipped=0.0 2023-06-15 04:30:14,822 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=32073.333333333332, ans=0.07 2023-06-15 04:30:16,450 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=32073.333333333332, ans=0.1 2023-06-15 04:30:46,467 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 04:30:58,887 INFO [train.py:988] (0/4) Epoch 10, batch 50, loss[loss=0.3172, simple_loss=0.3554, pruned_loss=0.1395, over 20339.00 frames. ], tot_loss[loss=0.3078, simple_loss=0.3576, pruned_loss=0.129, over 854471.83 frames. ], batch size: 239, lr: 2.56e-02, grad_scale: 64.0 2023-06-15 04:31:33,630 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=32406.666666666668, ans=0.125 2023-06-15 04:31:33,638 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=32406.666666666668, ans=0.00382463768115942 2023-06-15 04:31:42,363 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=15.0 2023-06-15 04:31:58,501 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=32473.333333333332, ans=0.07 2023-06-15 04:32:22,617 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=15.0 2023-06-15 04:32:28,849 INFO [train.py:988] (0/4) Epoch 10, batch 100, loss[loss=0.2934, simple_loss=0.3541, pruned_loss=0.1163, over 19465.00 frames. ], tot_loss[loss=0.3071, simple_loss=0.3596, pruned_loss=0.1273, over 1508984.08 frames. ], batch size: 105, lr: 2.55e-02, grad_scale: 64.0 2023-06-15 04:32:35,466 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.60 vs. limit=22.5 2023-06-15 04:32:35,570 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. 
limit=15.0 2023-06-15 04:32:40,528 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=32606.666666666668, ans=0.2 2023-06-15 04:33:02,080 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+02 2.450e+02 2.873e+02 3.278e+02 7.765e+02, threshold=5.745e+02, percent-clipped=3.0 2023-06-15 04:33:38,893 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=32806.666666666664, ans=0.0 2023-06-15 04:33:42,479 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=32873.333333333336, ans=0.0 2023-06-15 04:33:56,013 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.16 vs. limit=12.0 2023-06-15 04:33:59,943 INFO [train.py:988] (0/4) Epoch 10, batch 150, loss[loss=0.3261, simple_loss=0.3703, pruned_loss=0.141, over 19940.00 frames. ], tot_loss[loss=0.3045, simple_loss=0.3582, pruned_loss=0.1255, over 2027731.76 frames. ], batch size: 126, lr: 2.55e-02, grad_scale: 64.0 2023-06-15 04:34:03,828 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=32940.0, ans=0.2 2023-06-15 04:34:10,203 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.70 vs. limit=15.0 2023-06-15 04:34:17,529 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=33006.666666666664, ans=0.0 2023-06-15 04:34:46,591 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.46 vs. limit=15.0 2023-06-15 04:35:30,177 INFO [train.py:988] (0/4) Epoch 10, batch 200, loss[loss=0.3123, simple_loss=0.34, pruned_loss=0.1423, over 19874.00 frames. ], tot_loss[loss=0.3043, simple_loss=0.3577, pruned_loss=0.1255, over 2408563.85 frames. ], batch size: 293, lr: 2.54e-02, grad_scale: 64.0 2023-06-15 04:35:34,118 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=33273.333333333336, ans=0.1 2023-06-15 04:35:39,150 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-06-15 04:35:50,087 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=33340.0, ans=0.05 2023-06-15 04:35:55,898 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=33340.0, ans=0.125 2023-06-15 04:35:59,379 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=33340.0, ans=0.0 2023-06-15 04:36:02,396 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.690e+02 2.312e+02 2.745e+02 3.367e+02 5.641e+02, threshold=5.490e+02, percent-clipped=0.0 2023-06-15 04:36:03,475 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. 
limit=6.0 2023-06-15 04:36:13,783 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=33406.666666666664, ans=0.1 2023-06-15 04:36:15,650 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=33406.666666666664, ans=0.125 2023-06-15 04:36:17,908 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.85 vs. limit=6.0 2023-06-15 04:36:19,165 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=33406.666666666664, ans=0.0 2023-06-15 04:36:22,586 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=33473.333333333336, ans=0.07 2023-06-15 04:36:34,427 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=33473.333333333336, ans=0.125 2023-06-15 04:36:49,796 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=33540.0, ans=0.125 2023-06-15 04:36:55,319 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=33540.0, ans=0.125 2023-06-15 04:36:59,960 INFO [train.py:988] (0/4) Epoch 10, batch 250, loss[loss=0.3149, simple_loss=0.3222, pruned_loss=0.1538, over 16929.00 frames. ], tot_loss[loss=0.3048, simple_loss=0.3575, pruned_loss=0.1261, over 2712570.40 frames. ], batch size: 391, lr: 2.54e-02, grad_scale: 64.0 2023-06-15 04:37:08,987 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=33606.666666666664, ans=0.2 2023-06-15 04:37:19,801 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=33673.333333333336, ans=0.1 2023-06-15 04:37:23,524 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2023-06-15 04:37:35,737 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=33740.0, ans=0.125 2023-06-15 04:37:58,608 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=33806.666666666664, ans=0.125 2023-06-15 04:38:00,675 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.56 vs. limit=22.5 2023-06-15 04:38:06,321 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=33806.666666666664, ans=0.125 2023-06-15 04:38:21,412 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=33873.333333333336, ans=0.003505797101449275 2023-06-15 04:38:22,785 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=33873.333333333336, ans=0.125 2023-06-15 04:38:29,329 INFO [train.py:988] (0/4) Epoch 10, batch 300, loss[loss=0.3199, simple_loss=0.3608, pruned_loss=0.1395, over 20572.00 frames. ], tot_loss[loss=0.3057, simple_loss=0.3579, pruned_loss=0.1268, over 2959022.37 frames. 
], batch size: 173, lr: 2.54e-02, grad_scale: 64.0 2023-06-15 04:39:01,818 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.638e+02 2.483e+02 2.953e+02 3.724e+02 5.914e+02, threshold=5.906e+02, percent-clipped=1.0 2023-06-15 04:39:18,138 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=34073.333333333336, ans=0.2 2023-06-15 04:39:38,936 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=34206.666666666664, ans=0.125 2023-06-15 04:39:53,171 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2023-06-15 04:39:59,124 INFO [train.py:988] (0/4) Epoch 10, batch 350, loss[loss=0.3113, simple_loss=0.3699, pruned_loss=0.1264, over 18460.00 frames. ], tot_loss[loss=0.304, simple_loss=0.3571, pruned_loss=0.1255, over 3144613.85 frames. ], batch size: 77, lr: 2.53e-02, grad_scale: 64.0 2023-06-15 04:40:08,066 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=34273.333333333336, ans=0.0034188405797101447 2023-06-15 04:40:10,499 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=34273.333333333336, ans=15.0 2023-06-15 04:40:24,399 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=34340.0, ans=0.125 2023-06-15 04:40:36,069 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=34406.666666666664, ans=0.125 2023-06-15 04:40:47,985 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=34406.666666666664, ans=0.0 2023-06-15 04:41:29,068 INFO [train.py:988] (0/4) Epoch 10, batch 400, loss[loss=0.3334, simple_loss=0.3965, pruned_loss=0.1352, over 17612.00 frames. ], tot_loss[loss=0.3037, simple_loss=0.3577, pruned_loss=0.1248, over 3293005.82 frames. 
], batch size: 67, lr: 2.53e-02, grad_scale: 32.0 2023-06-15 04:42:02,410 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+02 2.389e+02 2.906e+02 3.855e+02 6.206e+02, threshold=5.812e+02, percent-clipped=1.0 2023-06-15 04:42:02,669 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=34740.0, ans=0.125 2023-06-15 04:42:02,745 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=34740.0, ans=0.003317391304347826 2023-06-15 04:42:10,308 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=34740.0, ans=0.0 2023-06-15 04:42:10,465 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=34740.0, ans=0.07 2023-06-15 04:42:12,050 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=34740.0, ans=0.125 2023-06-15 04:42:12,091 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=34740.0, ans=0.1 2023-06-15 04:42:54,476 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=34873.333333333336, ans=0.125 2023-06-15 04:42:58,383 INFO [train.py:988] (0/4) Epoch 10, batch 450, loss[loss=0.3346, simple_loss=0.3711, pruned_loss=0.1491, over 20107.00 frames. ], tot_loss[loss=0.3027, simple_loss=0.3563, pruned_loss=0.1246, over 3417454.91 frames. ], batch size: 239, lr: 2.52e-02, grad_scale: 32.0 2023-06-15 04:43:17,965 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=35006.666666666664, ans=0.0 2023-06-15 04:44:02,867 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=35140.0, ans=0.125 2023-06-15 04:44:24,825 INFO [train.py:988] (0/4) Epoch 10, batch 500, loss[loss=0.3001, simple_loss=0.3616, pruned_loss=0.1193, over 19506.00 frames. ], tot_loss[loss=0.3026, simple_loss=0.3573, pruned_loss=0.1239, over 3484807.01 frames. ], batch size: 102, lr: 2.52e-02, grad_scale: 32.0 2023-06-15 04:44:56,328 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.694e+02 2.429e+02 2.839e+02 3.294e+02 4.521e+02, threshold=5.678e+02, percent-clipped=0.0 2023-06-15 04:44:56,828 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=35406.666666666664, ans=0.0 2023-06-15 04:45:02,195 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.09 vs. limit=15.0 2023-06-15 04:45:06,696 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=35406.666666666664, ans=0.2 2023-06-15 04:45:19,615 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-10.pt 2023-06-15 04:45:44,017 INFO [train.py:988] (0/4) Epoch 11, batch 0, loss[loss=0.335, simple_loss=0.3887, pruned_loss=0.1407, over 19196.00 frames. ], tot_loss[loss=0.335, simple_loss=0.3887, pruned_loss=0.1407, over 19196.00 frames. 
], batch size: 92, lr: 2.42e-02, grad_scale: 32.0 2023-06-15 04:45:44,018 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 04:45:50,096 INFO [train.py:1020] (0/4) Epoch 11, validation: loss=0.2306, simple_loss=0.3357, pruned_loss=0.06271, over 143649.00 frames. 2023-06-15 04:45:50,097 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 04:46:18,004 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.50 vs. limit=22.5 2023-06-15 04:46:25,922 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=35626.666666666664, ans=0.003124637681159421 2023-06-15 04:46:29,817 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=35626.666666666664, ans=0.125 2023-06-15 04:46:40,111 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=35626.666666666664, ans=0.125 2023-06-15 04:46:52,901 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=35693.333333333336, ans=0.125 2023-06-15 04:47:19,220 INFO [train.py:988] (0/4) Epoch 11, batch 50, loss[loss=0.293, simple_loss=0.3533, pruned_loss=0.1164, over 18649.00 frames. ], tot_loss[loss=0.2993, simple_loss=0.357, pruned_loss=0.1208, over 866399.69 frames. ], batch size: 80, lr: 2.41e-02, grad_scale: 32.0 2023-06-15 04:47:28,045 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=35826.666666666664, ans=0.1 2023-06-15 04:47:30,028 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=35826.666666666664, ans=0.0 2023-06-15 04:47:33,861 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 04:47:50,266 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=35893.333333333336, ans=0.1 2023-06-15 04:48:05,363 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.16 vs. limit=15.0 2023-06-15 04:48:10,281 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.34 vs. limit=10.0 2023-06-15 04:48:22,827 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.697e+02 2.367e+02 2.815e+02 3.714e+02 5.103e+02, threshold=5.629e+02, percent-clipped=0.0 2023-06-15 04:48:30,564 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=36093.333333333336, ans=0.1 2023-06-15 04:48:47,410 INFO [train.py:988] (0/4) Epoch 11, batch 100, loss[loss=0.301, simple_loss=0.3586, pruned_loss=0.1217, over 19242.00 frames. ], tot_loss[loss=0.2994, simple_loss=0.3562, pruned_loss=0.1213, over 1512515.75 frames. 
], batch size: 92, lr: 2.41e-02, grad_scale: 32.0 2023-06-15 04:49:20,346 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=36226.666666666664, ans=0.125 2023-06-15 04:49:35,748 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=19.51 vs. limit=15.0 2023-06-15 04:50:06,009 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=36426.666666666664, ans=0.07 2023-06-15 04:50:18,762 INFO [train.py:988] (0/4) Epoch 11, batch 150, loss[loss=0.2986, simple_loss=0.3487, pruned_loss=0.1242, over 20319.00 frames. ], tot_loss[loss=0.3013, simple_loss=0.3559, pruned_loss=0.1234, over 2014949.84 frames. ], batch size: 149, lr: 2.40e-02, grad_scale: 32.0 2023-06-15 04:50:21,036 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.29 vs. limit=6.0 2023-06-15 04:50:31,465 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0 2023-06-15 04:50:46,385 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.75 vs. limit=10.0 2023-06-15 04:50:50,481 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-06-15 04:51:13,874 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=36693.333333333336, ans=0.125 2023-06-15 04:51:15,357 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=36693.333333333336, ans=0.125 2023-06-15 04:51:22,606 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.663e+02 2.254e+02 2.489e+02 3.022e+02 4.758e+02, threshold=4.979e+02, percent-clipped=0.0 2023-06-15 04:51:24,875 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=36693.333333333336, ans=0.002892753623188406 2023-06-15 04:51:47,711 INFO [train.py:988] (0/4) Epoch 11, batch 200, loss[loss=0.3116, simple_loss=0.3803, pruned_loss=0.1214, over 17629.00 frames. ], tot_loss[loss=0.2994, simple_loss=0.3555, pruned_loss=0.1217, over 2406437.99 frames. ], batch size: 67, lr: 2.40e-02, grad_scale: 32.0 2023-06-15 04:52:00,109 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.08 vs. 
limit=22.5 2023-06-15 04:52:17,943 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 04:52:31,944 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=36960.0, ans=0.025 2023-06-15 04:52:42,665 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=37026.666666666664, ans=0.125 2023-06-15 04:52:44,348 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=37026.666666666664, ans=0.2 2023-06-15 04:52:54,550 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.12 vs. limit=15.0 2023-06-15 04:52:56,087 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=37026.666666666664, ans=0.1 2023-06-15 04:53:17,717 INFO [train.py:988] (0/4) Epoch 11, batch 250, loss[loss=0.3154, simple_loss=0.3228, pruned_loss=0.154, over 16945.00 frames. ], tot_loss[loss=0.2992, simple_loss=0.3548, pruned_loss=0.1218, over 2694953.88 frames. ], batch size: 392, lr: 2.40e-02, grad_scale: 32.0 2023-06-15 04:53:21,640 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=37160.0, ans=0.125 2023-06-15 04:53:37,662 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=37226.666666666664, ans=0.1 2023-06-15 04:53:51,961 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=37293.333333333336, ans=0.125 2023-06-15 04:53:53,827 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=37293.333333333336, ans=0.09899494936611666 2023-06-15 04:54:01,108 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=37293.333333333336, ans=0.2 2023-06-15 04:54:03,024 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=37293.333333333336, ans=0.95 2023-06-15 04:54:06,562 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=37293.333333333336, ans=0.125 2023-06-15 04:54:10,505 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.29 vs. limit=22.5 2023-06-15 04:54:13,462 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=37360.0, ans=0.125 2023-06-15 04:54:22,485 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+02 2.173e+02 2.592e+02 3.214e+02 4.591e+02, threshold=5.183e+02, percent-clipped=0.0 2023-06-15 04:54:48,166 INFO [train.py:988] (0/4) Epoch 11, batch 300, loss[loss=0.3001, simple_loss=0.356, pruned_loss=0.1221, over 19884.00 frames. ], tot_loss[loss=0.2982, simple_loss=0.3541, pruned_loss=0.1212, over 2957239.24 frames. 
], batch size: 120, lr: 2.39e-02, grad_scale: 32.0 2023-06-15 04:54:50,236 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=37493.333333333336, ans=0.0 2023-06-15 04:55:06,703 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2023-06-15 04:55:21,640 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=37626.666666666664, ans=0.0 2023-06-15 04:55:32,630 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.30 vs. limit=22.5 2023-06-15 04:55:36,226 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.17 vs. limit=22.5 2023-06-15 04:56:18,505 INFO [train.py:988] (0/4) Epoch 11, batch 350, loss[loss=0.3189, simple_loss=0.3848, pruned_loss=0.1265, over 18315.00 frames. ], tot_loss[loss=0.2982, simple_loss=0.3538, pruned_loss=0.1212, over 3135283.16 frames. ], batch size: 72, lr: 2.39e-02, grad_scale: 32.0 2023-06-15 04:56:46,965 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=37893.333333333336, ans=0.125 2023-06-15 04:57:02,223 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=37960.0, ans=0.0 2023-06-15 04:57:18,830 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=38026.666666666664, ans=0.0 2023-06-15 04:57:23,392 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+02 2.398e+02 2.846e+02 3.298e+02 5.496e+02, threshold=5.692e+02, percent-clipped=3.0 2023-06-15 04:57:48,725 INFO [train.py:988] (0/4) Epoch 11, batch 400, loss[loss=0.3106, simple_loss=0.3449, pruned_loss=0.1381, over 20236.00 frames. ], tot_loss[loss=0.2976, simple_loss=0.3537, pruned_loss=0.1207, over 3283910.48 frames. ], batch size: 239, lr: 2.38e-02, grad_scale: 32.0 2023-06-15 04:58:01,547 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=38160.0, ans=0.025 2023-06-15 04:58:24,306 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=38293.333333333336, ans=0.1 2023-06-15 04:58:29,513 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.46 vs. limit=10.0 2023-06-15 04:59:00,965 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=38426.666666666664, ans=0.125 2023-06-15 04:59:02,595 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=38426.666666666664, ans=0.125 2023-06-15 04:59:18,303 INFO [train.py:988] (0/4) Epoch 11, batch 450, loss[loss=0.2789, simple_loss=0.3437, pruned_loss=0.1071, over 18948.00 frames. ], tot_loss[loss=0.2974, simple_loss=0.3533, pruned_loss=0.1207, over 3391323.37 frames. 
], batch size: 86, lr: 2.38e-02, grad_scale: 32.0 2023-06-15 05:00:21,921 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+02 2.134e+02 2.582e+02 3.273e+02 5.590e+02, threshold=5.163e+02, percent-clipped=0.0 2023-06-15 05:00:24,310 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0 2023-06-15 05:00:45,638 INFO [train.py:988] (0/4) Epoch 11, batch 500, loss[loss=0.2821, simple_loss=0.3356, pruned_loss=0.1143, over 20462.00 frames. ], tot_loss[loss=0.2957, simple_loss=0.3521, pruned_loss=0.1196, over 3483054.00 frames. ], batch size: 160, lr: 2.38e-02, grad_scale: 32.0 2023-06-15 05:00:52,744 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=38826.666666666664, ans=0.125 2023-06-15 05:01:05,174 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.26 vs. limit=22.5 2023-06-15 05:01:23,369 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.07 vs. limit=22.5 2023-06-15 05:01:26,322 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=38960.0, ans=0.125 2023-06-15 05:01:38,974 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-11.pt 2023-06-15 05:02:04,578 INFO [train.py:988] (0/4) Epoch 12, batch 0, loss[loss=0.2856, simple_loss=0.3484, pruned_loss=0.1114, over 19466.00 frames. ], tot_loss[loss=0.2856, simple_loss=0.3484, pruned_loss=0.1114, over 19466.00 frames. ], batch size: 105, lr: 2.28e-02, grad_scale: 32.0 2023-06-15 05:02:04,579 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 05:02:10,664 INFO [train.py:1020] (0/4) Epoch 12, validation: loss=0.2286, simple_loss=0.3321, pruned_loss=0.06259, over 143649.00 frames. 2023-06-15 05:02:10,665 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 05:02:15,036 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.22 vs. limit=12.0 2023-06-15 05:02:50,647 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=39173.333333333336, ans=0.125 2023-06-15 05:03:26,291 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.10 vs. limit=15.0 2023-06-15 05:03:39,947 INFO [train.py:988] (0/4) Epoch 12, batch 50, loss[loss=0.2867, simple_loss=0.3466, pruned_loss=0.1134, over 19983.00 frames. ], tot_loss[loss=0.2896, simple_loss=0.3454, pruned_loss=0.1169, over 873116.73 frames. 
], batch size: 126, lr: 2.28e-02, grad_scale: 32.0 2023-06-15 05:03:47,409 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.648e+02 2.229e+02 2.614e+02 3.246e+02 5.755e+02, threshold=5.228e+02, percent-clipped=1.0 2023-06-15 05:03:55,155 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=39373.333333333336, ans=0.125 2023-06-15 05:03:58,556 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=39440.0, ans=0.125 2023-06-15 05:04:23,787 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=15.0 2023-06-15 05:04:33,439 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.41 vs. limit=10.0 2023-06-15 05:04:37,861 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=39573.333333333336, ans=0.125 2023-06-15 05:04:53,020 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=6.0 2023-06-15 05:05:05,304 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=39640.0, ans=0.2 2023-06-15 05:05:09,994 INFO [train.py:988] (0/4) Epoch 12, batch 100, loss[loss=0.2725, simple_loss=0.3206, pruned_loss=0.1122, over 20254.00 frames. ], tot_loss[loss=0.2919, simple_loss=0.3491, pruned_loss=0.1173, over 1512387.42 frames. ], batch size: 239, lr: 2.28e-02, grad_scale: 32.0 2023-06-15 05:05:19,508 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=39706.666666666664, ans=0.5 2023-06-15 05:05:38,922 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.25 vs. limit=15.0 2023-06-15 05:05:41,004 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.53 vs. limit=22.5 2023-06-15 05:05:45,994 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=39840.0, ans=0.002208695652173912 2023-06-15 05:06:05,102 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.54 vs. limit=15.0 2023-06-15 05:06:28,962 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=39973.333333333336, ans=0.125 2023-06-15 05:06:40,215 INFO [train.py:988] (0/4) Epoch 12, batch 150, loss[loss=0.2973, simple_loss=0.3557, pruned_loss=0.1194, over 18947.00 frames. ], tot_loss[loss=0.2912, simple_loss=0.3496, pruned_loss=0.1164, over 2011587.22 frames. ], batch size: 86, lr: 2.27e-02, grad_scale: 32.0 2023-06-15 05:06:46,885 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.645e+02 2.288e+02 2.648e+02 3.128e+02 5.617e+02, threshold=5.296e+02, percent-clipped=1.0 2023-06-15 05:06:53,987 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.33 vs. 
limit=22.5 2023-06-15 05:08:09,671 INFO [train.py:988] (0/4) Epoch 12, batch 200, loss[loss=0.2941, simple_loss=0.3629, pruned_loss=0.1126, over 16337.00 frames. ], tot_loss[loss=0.2894, simple_loss=0.3483, pruned_loss=0.1153, over 2391640.84 frames. ], batch size: 52, lr: 2.27e-02, grad_scale: 32.0 2023-06-15 05:08:28,696 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=40440.0, ans=0.0 2023-06-15 05:08:37,559 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=40440.0, ans=0.0 2023-06-15 05:08:44,593 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=40506.666666666664, ans=0.0 2023-06-15 05:09:39,241 INFO [train.py:988] (0/4) Epoch 12, batch 250, loss[loss=0.261, simple_loss=0.3294, pruned_loss=0.09636, over 19698.00 frames. ], tot_loss[loss=0.2895, simple_loss=0.3488, pruned_loss=0.1151, over 2696445.03 frames. ], batch size: 110, lr: 2.27e-02, grad_scale: 32.0 2023-06-15 05:09:46,505 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.507e+02 2.177e+02 2.544e+02 3.093e+02 5.809e+02, threshold=5.088e+02, percent-clipped=2.0 2023-06-15 05:09:48,559 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=40706.666666666664, ans=0.125 2023-06-15 05:09:48,777 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=40706.666666666664, ans=0.2 2023-06-15 05:10:40,074 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=40906.666666666664, ans=0.0 2023-06-15 05:10:53,918 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=40973.333333333336, ans=0.0 2023-06-15 05:10:54,204 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=40973.333333333336, ans=0.5 2023-06-15 05:11:09,728 INFO [train.py:988] (0/4) Epoch 12, batch 300, loss[loss=0.3004, simple_loss=0.3468, pruned_loss=0.127, over 20555.00 frames. ], tot_loss[loss=0.2895, simple_loss=0.3479, pruned_loss=0.1156, over 2944754.16 frames. 
], batch size: 189, lr: 2.26e-02, grad_scale: 32.0 2023-06-15 05:11:13,420 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=41040.0, ans=0.0019478260869565216 2023-06-15 05:11:13,533 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=41040.0, ans=0.1 2023-06-15 05:11:14,491 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=41040.0, ans=6.0 2023-06-15 05:11:35,971 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=41106.666666666664, ans=0.125 2023-06-15 05:12:23,042 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=41306.666666666664, ans=0.5 2023-06-15 05:12:35,953 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=41306.666666666664, ans=0.0018898550724637687 2023-06-15 05:12:39,425 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=41373.333333333336, ans=0.125 2023-06-15 05:12:40,658 INFO [train.py:988] (0/4) Epoch 12, batch 350, loss[loss=0.2907, simple_loss=0.3578, pruned_loss=0.1118, over 17645.00 frames. ], tot_loss[loss=0.2891, simple_loss=0.3477, pruned_loss=0.1152, over 3133610.00 frames. ], batch size: 67, lr: 2.26e-02, grad_scale: 32.0 2023-06-15 05:12:44,492 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=41373.333333333336, ans=0.125 2023-06-15 05:12:47,498 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.640e+02 2.134e+02 2.417e+02 3.017e+02 4.561e+02, threshold=4.834e+02, percent-clipped=0.0 2023-06-15 05:13:21,446 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.47 vs. limit=6.0 2023-06-15 05:14:10,563 INFO [train.py:988] (0/4) Epoch 12, batch 400, loss[loss=0.2978, simple_loss=0.3695, pruned_loss=0.1131, over 16391.00 frames. ], tot_loss[loss=0.2888, simple_loss=0.3472, pruned_loss=0.1152, over 3275430.63 frames. ], batch size: 52, lr: 2.25e-02, grad_scale: 32.0 2023-06-15 05:14:12,839 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=41706.666666666664, ans=0.0 2023-06-15 05:14:14,481 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=41706.666666666664, ans=0.125 2023-06-15 05:15:21,979 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2023-06-15 05:15:35,829 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=41973.333333333336, ans=0.2 2023-06-15 05:15:40,740 INFO [train.py:988] (0/4) Epoch 12, batch 450, loss[loss=0.2693, simple_loss=0.3353, pruned_loss=0.1017, over 19534.00 frames. ], tot_loss[loss=0.289, simple_loss=0.3471, pruned_loss=0.1155, over 3370746.68 frames. 
], batch size: 102, lr: 2.25e-02, grad_scale: 32.0 2023-06-15 05:15:48,079 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.312e+02 2.678e+02 3.302e+02 6.342e+02, threshold=5.355e+02, percent-clipped=8.0 2023-06-15 05:15:48,433 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=42040.0, ans=0.1 2023-06-15 05:16:14,380 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.61 vs. limit=15.0 2023-06-15 05:16:48,050 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-06-15 05:17:08,309 INFO [train.py:988] (0/4) Epoch 12, batch 500, loss[loss=0.2945, simple_loss=0.3512, pruned_loss=0.1189, over 19703.00 frames. ], tot_loss[loss=0.2885, simple_loss=0.3468, pruned_loss=0.1151, over 3454158.81 frames. ], batch size: 110, lr: 2.25e-02, grad_scale: 32.0 2023-06-15 05:17:34,338 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=42440.0, ans=0.125 2023-06-15 05:17:35,996 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=42440.0, ans=0.001643478260869564 2023-06-15 05:17:44,101 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=42506.666666666664, ans=0.2 2023-06-15 05:18:03,345 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-12.pt 2023-06-15 05:18:28,973 INFO [train.py:988] (0/4) Epoch 13, batch 0, loss[loss=0.2784, simple_loss=0.3362, pruned_loss=0.1103, over 19540.00 frames. ], tot_loss[loss=0.2784, simple_loss=0.3362, pruned_loss=0.1103, over 19540.00 frames. ], batch size: 102, lr: 2.16e-02, grad_scale: 32.0 2023-06-15 05:18:28,974 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 05:18:35,086 INFO [train.py:1020] (0/4) Epoch 13, validation: loss=0.2246, simple_loss=0.3282, pruned_loss=0.06053, over 143649.00 frames. 2023-06-15 05:18:35,086 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 05:18:39,447 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=42593.333333333336, ans=0.0 2023-06-15 05:18:39,464 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=42593.333333333336, ans=0.0 2023-06-15 05:18:42,814 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=42593.333333333336, ans=0.0 2023-06-15 05:18:59,678 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=42660.0, ans=0.0 2023-06-15 05:19:13,300 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+02 2.235e+02 2.660e+02 3.477e+02 4.514e+02, threshold=5.320e+02, percent-clipped=0.0 2023-06-15 05:19:23,354 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. 
limit=6.0 2023-06-15 05:19:24,565 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=42726.666666666664, ans=0.5 2023-06-15 05:19:44,192 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 05:19:52,916 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=42860.0, ans=0.0 2023-06-15 05:20:04,778 INFO [train.py:988] (0/4) Epoch 13, batch 50, loss[loss=0.2842, simple_loss=0.3386, pruned_loss=0.1149, over 20329.00 frames. ], tot_loss[loss=0.289, simple_loss=0.3464, pruned_loss=0.1158, over 862922.53 frames. ], batch size: 149, lr: 2.16e-02, grad_scale: 32.0 2023-06-15 05:21:06,237 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=43126.666666666664, ans=0.125 2023-06-15 05:21:33,292 INFO [train.py:988] (0/4) Epoch 13, batch 100, loss[loss=0.2976, simple_loss=0.3637, pruned_loss=0.1157, over 19325.00 frames. ], tot_loss[loss=0.2867, simple_loss=0.3458, pruned_loss=0.1138, over 1501236.30 frames. ], batch size: 98, lr: 2.16e-02, grad_scale: 32.0 2023-06-15 05:21:36,875 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=43260.0, ans=0.1 2023-06-15 05:21:51,480 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=43326.666666666664, ans=0.0 2023-06-15 05:21:57,115 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=43326.666666666664, ans=0.00145072463768116 2023-06-15 05:22:10,400 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.673e+02 1.949e+02 2.269e+02 2.644e+02 4.836e+02, threshold=4.538e+02, percent-clipped=0.0 2023-06-15 05:22:14,737 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=43393.333333333336, ans=0.2 2023-06-15 05:22:14,827 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=43393.333333333336, ans=0.125 2023-06-15 05:22:35,646 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=43460.0, ans=0.0014217391304347828 2023-06-15 05:22:44,422 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=43526.666666666664, ans=0.1 2023-06-15 05:22:44,693 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=43526.666666666664, ans=0.1 2023-06-15 05:23:00,734 INFO [train.py:988] (0/4) Epoch 13, batch 150, loss[loss=0.2716, simple_loss=0.3404, pruned_loss=0.1014, over 19321.00 frames. ], tot_loss[loss=0.2863, simple_loss=0.3457, pruned_loss=0.1134, over 2009820.37 frames. 
], batch size: 98, lr: 2.15e-02, grad_scale: 32.0 2023-06-15 05:23:04,860 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=43593.333333333336, ans=0.0013927536231884054 2023-06-15 05:23:06,631 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=43593.333333333336, ans=0.125 2023-06-15 05:23:09,979 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=43593.333333333336, ans=0.5 2023-06-15 05:23:11,700 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=43593.333333333336, ans=0.2 2023-06-15 05:23:14,836 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=43593.333333333336, ans=0.0 2023-06-15 05:23:56,773 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=43793.333333333336, ans=0.125 2023-06-15 05:23:57,103 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=43793.333333333336, ans=0.0 2023-06-15 05:24:02,846 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.07 vs. limit=15.0 2023-06-15 05:24:28,494 INFO [train.py:988] (0/4) Epoch 13, batch 200, loss[loss=0.2969, simple_loss=0.3097, pruned_loss=0.142, over 17018.00 frames. ], tot_loss[loss=0.2853, simple_loss=0.3439, pruned_loss=0.1133, over 2419671.70 frames. ], batch size: 391, lr: 2.15e-02, grad_scale: 32.0 2023-06-15 05:24:34,817 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.00 vs. limit=15.0 2023-06-15 05:24:42,988 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=43926.666666666664, ans=0.125 2023-06-15 05:25:04,864 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=44060.0, ans=0.125 2023-06-15 05:25:05,895 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.630e+02 2.172e+02 2.424e+02 2.924e+02 5.184e+02, threshold=4.848e+02, percent-clipped=5.0 2023-06-15 05:25:14,581 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=44060.0, ans=0.2 2023-06-15 05:25:17,881 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=44060.0, ans=0.125 2023-06-15 05:25:21,645 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=44126.666666666664, ans=0.1 2023-06-15 05:25:56,672 INFO [train.py:988] (0/4) Epoch 13, batch 250, loss[loss=0.2873, simple_loss=0.3603, pruned_loss=0.1072, over 16706.00 frames. ], tot_loss[loss=0.2852, simple_loss=0.3443, pruned_loss=0.113, over 2711233.51 frames. 
], batch size: 59, lr: 2.15e-02, grad_scale: 16.0 2023-06-15 05:26:28,227 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=44326.666666666664, ans=10.0 2023-06-15 05:26:30,938 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=44393.333333333336, ans=0.04949747468305833 2023-06-15 05:26:36,149 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=44393.333333333336, ans=0.0 2023-06-15 05:26:42,874 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=44393.333333333336, ans=0.0 2023-06-15 05:26:51,901 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.06 vs. limit=15.0 2023-06-15 05:26:53,214 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=44460.0, ans=0.125 2023-06-15 05:27:24,263 INFO [train.py:988] (0/4) Epoch 13, batch 300, loss[loss=0.2752, simple_loss=0.3329, pruned_loss=0.1088, over 19953.00 frames. ], tot_loss[loss=0.2839, simple_loss=0.3437, pruned_loss=0.112, over 2955552.32 frames. ], batch size: 126, lr: 2.14e-02, grad_scale: 16.0 2023-06-15 05:27:41,958 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=44660.0, ans=0.5 2023-06-15 05:27:54,994 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=44660.0, ans=0.125 2023-06-15 05:28:03,430 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+02 2.137e+02 2.490e+02 3.192e+02 5.767e+02, threshold=4.980e+02, percent-clipped=3.0 2023-06-15 05:28:52,650 INFO [train.py:988] (0/4) Epoch 13, batch 350, loss[loss=0.2694, simple_loss=0.3335, pruned_loss=0.1027, over 19527.00 frames. ], tot_loss[loss=0.2833, simple_loss=0.3437, pruned_loss=0.1114, over 3151849.62 frames. ], batch size: 102, lr: 2.14e-02, grad_scale: 16.0 2023-06-15 05:29:06,672 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=44926.666666666664, ans=0.2 2023-06-15 05:29:34,775 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=45060.0, ans=0.125 2023-06-15 05:30:19,948 INFO [train.py:988] (0/4) Epoch 13, batch 400, loss[loss=0.2771, simple_loss=0.3339, pruned_loss=0.1101, over 19929.00 frames. ], tot_loss[loss=0.2829, simple_loss=0.3433, pruned_loss=0.1113, over 3299766.40 frames. 
], batch size: 126, lr: 2.14e-02, grad_scale: 32.0 2023-06-15 05:30:55,510 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=45393.333333333336, ans=0.125 2023-06-15 05:30:59,583 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+02 2.082e+02 2.371e+02 2.770e+02 5.646e+02, threshold=4.742e+02, percent-clipped=0.0 2023-06-15 05:30:59,958 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=45393.333333333336, ans=0.0 2023-06-15 05:31:05,329 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=45393.333333333336, ans=0.0010014492753623178 2023-06-15 05:31:06,589 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=45393.333333333336, ans=0.125 2023-06-15 05:31:12,426 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=45460.0, ans=10.0 2023-06-15 05:31:48,748 INFO [train.py:988] (0/4) Epoch 13, batch 450, loss[loss=0.2744, simple_loss=0.3428, pruned_loss=0.103, over 19686.00 frames. ], tot_loss[loss=0.2827, simple_loss=0.3435, pruned_loss=0.111, over 3391964.56 frames. ], batch size: 110, lr: 2.13e-02, grad_scale: 32.0 2023-06-15 05:31:49,224 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=45593.333333333336, ans=0.1 2023-06-15 05:31:56,528 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.45 vs. limit=22.5 2023-06-15 05:32:11,849 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=45660.0, ans=0.0009434782608695649 2023-06-15 05:32:45,483 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=45793.333333333336, ans=0.125 2023-06-15 05:32:55,537 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2023-06-15 05:33:13,471 INFO [train.py:988] (0/4) Epoch 13, batch 500, loss[loss=0.2943, simple_loss=0.3483, pruned_loss=0.1202, over 20291.00 frames. ], tot_loss[loss=0.2819, simple_loss=0.3427, pruned_loss=0.1106, over 3498525.42 frames. ], batch size: 149, lr: 2.13e-02, grad_scale: 32.0 2023-06-15 05:33:49,314 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+02 2.112e+02 2.450e+02 3.177e+02 4.704e+02, threshold=4.901e+02, percent-clipped=1.0 2023-06-15 05:33:58,093 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.82 vs. limit=22.5 2023-06-15 05:34:05,337 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-13.pt 2023-06-15 05:34:30,992 INFO [train.py:988] (0/4) Epoch 14, batch 0, loss[loss=0.2785, simple_loss=0.336, pruned_loss=0.1105, over 20315.00 frames. ], tot_loss[loss=0.2785, simple_loss=0.336, pruned_loss=0.1105, over 20315.00 frames. 
], batch size: 141, lr: 2.05e-02, grad_scale: 32.0 2023-06-15 05:34:30,993 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 05:34:37,020 INFO [train.py:1020] (0/4) Epoch 14, validation: loss=0.2205, simple_loss=0.3248, pruned_loss=0.05804, over 143649.00 frames. 2023-06-15 05:34:37,021 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 05:35:36,650 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=46340.0, ans=0.0 2023-06-15 05:36:03,637 INFO [train.py:988] (0/4) Epoch 14, batch 50, loss[loss=0.2822, simple_loss=0.3526, pruned_loss=0.1059, over 16729.00 frames. ], tot_loss[loss=0.2802, simple_loss=0.3406, pruned_loss=0.1099, over 846593.58 frames. ], batch size: 59, lr: 2.05e-02, grad_scale: 32.0 2023-06-15 05:36:12,987 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=46473.333333333336, ans=0.125 2023-06-15 05:36:23,662 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=46540.0, ans=0.125 2023-06-15 05:36:33,414 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=46540.0, ans=0.0007521739130434777 2023-06-15 05:36:37,237 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=46540.0, ans=22.5 2023-06-15 05:36:50,309 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2023-06-15 05:37:14,889 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+02 2.122e+02 2.332e+02 2.601e+02 5.252e+02, threshold=4.663e+02, percent-clipped=1.0 2023-06-15 05:37:15,377 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=46740.0, ans=0.0 2023-06-15 05:37:32,790 INFO [train.py:988] (0/4) Epoch 14, batch 100, loss[loss=0.2796, simple_loss=0.3327, pruned_loss=0.1132, over 20549.00 frames. ], tot_loss[loss=0.2771, simple_loss=0.3392, pruned_loss=0.1075, over 1512599.82 frames. ], batch size: 189, lr: 2.05e-02, grad_scale: 32.0 2023-06-15 05:37:42,909 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=46806.666666666664, ans=0.0 2023-06-15 05:38:01,066 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=46873.333333333336, ans=0.05 2023-06-15 05:38:13,012 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.40 vs. limit=15.0 2023-06-15 05:38:14,993 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.49 vs. limit=15.0 2023-06-15 05:38:18,926 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.41 vs. 
limit=15.0 2023-06-15 05:38:24,576 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=47006.666666666664, ans=0.125 2023-06-15 05:38:43,378 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=47073.333333333336, ans=0.125 2023-06-15 05:38:49,453 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0 2023-06-15 05:38:52,929 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=47073.333333333336, ans=0.0006362318840579702 2023-06-15 05:39:01,372 INFO [train.py:988] (0/4) Epoch 14, batch 150, loss[loss=0.2642, simple_loss=0.3283, pruned_loss=0.1001, over 18612.00 frames. ], tot_loss[loss=0.2777, simple_loss=0.3398, pruned_loss=0.1079, over 1998618.99 frames. ], batch size: 80, lr: 2.04e-02, grad_scale: 32.0 2023-06-15 05:39:05,644 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.93 vs. limit=15.0 2023-06-15 05:40:11,168 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.584e+02 2.106e+02 2.352e+02 2.722e+02 4.842e+02, threshold=4.704e+02, percent-clipped=2.0 2023-06-15 05:40:18,254 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=47406.666666666664, ans=0.125 2023-06-15 05:40:21,950 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=47406.666666666664, ans=0.0 2023-06-15 05:40:28,795 INFO [train.py:988] (0/4) Epoch 14, batch 200, loss[loss=0.254, simple_loss=0.3227, pruned_loss=0.0926, over 19087.00 frames. ], tot_loss[loss=0.2776, simple_loss=0.3398, pruned_loss=0.1077, over 2374906.57 frames. ], batch size: 89, lr: 2.04e-02, grad_scale: 32.0 2023-06-15 05:40:38,138 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=47473.333333333336, ans=10.0 2023-06-15 05:40:50,516 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=47540.0, ans=0.125 2023-06-15 05:40:54,586 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.70 vs. limit=10.0 2023-06-15 05:40:55,419 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=47540.0, ans=0.1 2023-06-15 05:40:56,829 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=47540.0, ans=0.125 2023-06-15 05:41:17,948 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 05:41:35,081 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.51 vs. limit=15.0 2023-06-15 05:41:47,672 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=47740.0, ans=0.125 2023-06-15 05:41:56,671 INFO [train.py:988] (0/4) Epoch 14, batch 250, loss[loss=0.2971, simple_loss=0.3514, pruned_loss=0.1214, over 19975.00 frames. 
], tot_loss[loss=0.278, simple_loss=0.3394, pruned_loss=0.1082, over 2681604.25 frames. ], batch size: 126, lr: 2.04e-02, grad_scale: 32.0 2023-06-15 05:41:57,300 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2023-06-15 05:42:02,176 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=47806.666666666664, ans=0.125 2023-06-15 05:42:05,007 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.77 vs. limit=22.5 2023-06-15 05:42:08,272 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.35 vs. limit=22.5 2023-06-15 05:43:06,524 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.140e+02 2.401e+02 2.980e+02 6.123e+02, threshold=4.801e+02, percent-clipped=4.0 2023-06-15 05:43:06,957 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=48073.333333333336, ans=0.125 2023-06-15 05:43:07,342 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.49 vs. limit=15.0 2023-06-15 05:43:14,431 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=48073.333333333336, ans=15.0 2023-06-15 05:43:23,497 INFO [train.py:988] (0/4) Epoch 14, batch 300, loss[loss=0.2621, simple_loss=0.3317, pruned_loss=0.09626, over 18918.00 frames. ], tot_loss[loss=0.2779, simple_loss=0.3397, pruned_loss=0.108, over 2930119.81 frames. ], batch size: 86, lr: 2.03e-02, grad_scale: 32.0 2023-06-15 05:43:39,188 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.44 vs. limit=15.0 2023-06-15 05:44:20,286 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=48340.0, ans=0.125 2023-06-15 05:44:35,834 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=48406.666666666664, ans=0.125 2023-06-15 05:44:42,459 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=48406.666666666664, ans=0.1 2023-06-15 05:44:50,834 INFO [train.py:988] (0/4) Epoch 14, batch 350, loss[loss=0.269, simple_loss=0.3383, pruned_loss=0.09987, over 18634.00 frames. ], tot_loss[loss=0.2772, simple_loss=0.3392, pruned_loss=0.1076, over 3121628.10 frames. ], batch size: 80, lr: 2.03e-02, grad_scale: 32.0 2023-06-15 05:45:00,586 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.53 vs. limit=10.0 2023-06-15 05:45:02,772 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.82 vs. 
limit=15.0 2023-06-15 05:45:55,052 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=48673.333333333336, ans=0.0 2023-06-15 05:45:55,225 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=48673.333333333336, ans=0.125 2023-06-15 05:45:59,719 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.587e+02 2.238e+02 2.722e+02 3.535e+02 5.292e+02, threshold=5.444e+02, percent-clipped=1.0 2023-06-15 05:46:11,711 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=48740.0, ans=0.05 2023-06-15 05:46:16,360 INFO [train.py:988] (0/4) Epoch 14, batch 400, loss[loss=0.3116, simple_loss=0.3451, pruned_loss=0.1391, over 20006.00 frames. ], tot_loss[loss=0.2764, simple_loss=0.339, pruned_loss=0.1069, over 3266407.07 frames. ], batch size: 294, lr: 2.03e-02, grad_scale: 32.0 2023-06-15 05:46:51,954 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=48940.0, ans=0.07 2023-06-15 05:47:12,751 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=49006.666666666664, ans=0.05 2023-06-15 05:47:17,631 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=49006.666666666664, ans=0.0 2023-06-15 05:47:28,901 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.60 vs. limit=22.5 2023-06-15 05:47:39,484 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=49073.333333333336, ans=0.0 2023-06-15 05:47:42,468 INFO [train.py:988] (0/4) Epoch 14, batch 450, loss[loss=0.2922, simple_loss=0.3429, pruned_loss=0.1208, over 20725.00 frames. ], tot_loss[loss=0.2764, simple_loss=0.339, pruned_loss=0.1069, over 3391983.85 frames. ], batch size: 211, lr: 2.02e-02, grad_scale: 32.0 2023-06-15 05:47:46,766 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.08 vs. limit=15.0 2023-06-15 05:47:56,171 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0 2023-06-15 05:48:20,119 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.00 vs. limit=15.0 2023-06-15 05:48:24,596 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=49273.333333333336, ans=0.125 2023-06-15 05:48:50,094 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.664e+02 2.110e+02 2.476e+02 3.051e+02 4.850e+02, threshold=4.953e+02, percent-clipped=0.0 2023-06-15 05:48:56,206 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=15.0 2023-06-15 05:49:06,429 INFO [train.py:988] (0/4) Epoch 14, batch 500, loss[loss=0.3005, simple_loss=0.3701, pruned_loss=0.1155, over 18296.00 frames. 
], tot_loss[loss=0.2761, simple_loss=0.3389, pruned_loss=0.1067, over 3464774.86 frames. ], batch size: 72, lr: 2.02e-02, grad_scale: 32.0 2023-06-15 05:49:12,139 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=49473.333333333336, ans=10.0 2023-06-15 05:49:25,047 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=49540.0, ans=9.99999999999994e-05 2023-06-15 05:49:59,468 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-14.pt 2023-06-15 05:50:25,274 INFO [train.py:988] (0/4) Epoch 15, batch 0, loss[loss=0.2949, simple_loss=0.345, pruned_loss=0.1224, over 20719.00 frames. ], tot_loss[loss=0.2949, simple_loss=0.345, pruned_loss=0.1224, over 20719.00 frames. ], batch size: 211, lr: 1.95e-02, grad_scale: 32.0 2023-06-15 05:50:25,275 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 05:50:31,410 INFO [train.py:1020] (0/4) Epoch 15, validation: loss=0.2189, simple_loss=0.3232, pruned_loss=0.05727, over 143649.00 frames. 2023-06-15 05:50:31,410 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 05:50:36,657 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=49693.333333333336, ans=0.07 2023-06-15 05:51:07,653 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=49826.666666666664, ans=0.1 2023-06-15 05:51:57,523 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=50026.666666666664, ans=0.125 2023-06-15 05:51:58,715 INFO [train.py:988] (0/4) Epoch 15, batch 50, loss[loss=0.256, simple_loss=0.3235, pruned_loss=0.09423, over 19869.00 frames. ], tot_loss[loss=0.2768, simple_loss=0.3352, pruned_loss=0.1092, over 867325.04 frames. ], batch size: 120, lr: 1.95e-02, grad_scale: 32.0 2023-06-15 05:52:03,997 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=50026.666666666664, ans=0.0 2023-06-15 05:52:10,523 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.697e+02 2.143e+02 2.454e+02 2.855e+02 6.420e+02, threshold=4.907e+02, percent-clipped=3.0 2023-06-15 05:52:41,374 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=50160.0, ans=0.0 2023-06-15 05:53:03,962 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.31 vs. limit=22.5 2023-06-15 05:53:12,062 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=50293.333333333336, ans=0.0 2023-06-15 05:53:14,108 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=50293.333333333336, ans=0.125 2023-06-15 05:53:26,701 INFO [train.py:988] (0/4) Epoch 15, batch 100, loss[loss=0.2926, simple_loss=0.3504, pruned_loss=0.1174, over 20521.00 frames. ], tot_loss[loss=0.2752, simple_loss=0.334, pruned_loss=0.1082, over 1523834.03 frames. 
], batch size: 160, lr: 1.95e-02, grad_scale: 32.0 2023-06-15 05:53:43,652 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=50426.666666666664, ans=0.125 2023-06-15 05:53:58,676 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=50493.333333333336, ans=0.1 2023-06-15 05:54:54,016 INFO [train.py:988] (0/4) Epoch 15, batch 150, loss[loss=0.2556, simple_loss=0.3254, pruned_loss=0.09292, over 19810.00 frames. ], tot_loss[loss=0.2717, simple_loss=0.333, pruned_loss=0.1052, over 2037172.55 frames. ], batch size: 115, lr: 1.94e-02, grad_scale: 32.0 2023-06-15 05:55:06,225 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.090e+02 2.403e+02 2.891e+02 4.165e+02, threshold=4.806e+02, percent-clipped=0.0 2023-06-15 05:55:52,200 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=50893.333333333336, ans=0.125 2023-06-15 05:56:03,760 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=50960.0, ans=0.035 2023-06-15 05:56:22,228 INFO [train.py:988] (0/4) Epoch 15, batch 200, loss[loss=0.2831, simple_loss=0.3477, pruned_loss=0.1092, over 18315.00 frames. ], tot_loss[loss=0.2711, simple_loss=0.3328, pruned_loss=0.1047, over 2417266.72 frames. ], batch size: 74, lr: 1.94e-02, grad_scale: 32.0 2023-06-15 05:56:33,175 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=51026.666666666664, ans=0.1 2023-06-15 05:56:46,820 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=51093.333333333336, ans=0.125 2023-06-15 05:57:14,994 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=51226.666666666664, ans=0.1 2023-06-15 05:57:35,919 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=51293.333333333336, ans=0.0 2023-06-15 05:57:50,247 INFO [train.py:988] (0/4) Epoch 15, batch 250, loss[loss=0.3027, simple_loss=0.3801, pruned_loss=0.1127, over 17646.00 frames. ], tot_loss[loss=0.272, simple_loss=0.3344, pruned_loss=0.1048, over 2730593.67 frames. ], batch size: 67, lr: 1.94e-02, grad_scale: 32.0 2023-06-15 05:58:03,087 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.609e+02 1.983e+02 2.255e+02 2.708e+02 4.170e+02, threshold=4.510e+02, percent-clipped=0.0 2023-06-15 05:58:30,546 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=51493.333333333336, ans=0.0 2023-06-15 05:59:01,910 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=51626.666666666664, ans=0.125 2023-06-15 05:59:18,435 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=51693.333333333336, ans=0.125 2023-06-15 05:59:19,779 INFO [train.py:988] (0/4) Epoch 15, batch 300, loss[loss=0.236, simple_loss=0.3081, pruned_loss=0.08197, over 19870.00 frames. ], tot_loss[loss=0.2719, simple_loss=0.3341, pruned_loss=0.1049, over 2963411.07 frames. 
], batch size: 120, lr: 1.93e-02, grad_scale: 32.0 2023-06-15 05:59:23,093 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0 2023-06-15 05:59:23,899 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=51693.333333333336, ans=0.1 2023-06-15 05:59:39,346 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=51760.0, ans=0.1 2023-06-15 05:59:41,184 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=51760.0, ans=0.125 2023-06-15 05:59:50,090 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=18.98 vs. limit=15.0 2023-06-15 05:59:52,833 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=51826.666666666664, ans=0.0 2023-06-15 05:59:52,849 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=51826.666666666664, ans=0.1 2023-06-15 06:00:07,116 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=51826.666666666664, ans=0.125 2023-06-15 06:00:32,197 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=51960.0, ans=0.125 2023-06-15 06:00:47,516 INFO [train.py:988] (0/4) Epoch 15, batch 350, loss[loss=0.2904, simple_loss=0.3451, pruned_loss=0.1179, over 20142.00 frames. ], tot_loss[loss=0.2717, simple_loss=0.3348, pruned_loss=0.1043, over 3126523.22 frames. ], batch size: 133, lr: 1.93e-02, grad_scale: 32.0 2023-06-15 06:00:52,971 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=52026.666666666664, ans=0.125 2023-06-15 06:01:00,528 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+02 2.062e+02 2.429e+02 2.907e+02 4.781e+02, threshold=4.857e+02, percent-clipped=2.0 2023-06-15 06:01:10,622 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=52093.333333333336, ans=0.0 2023-06-15 06:01:57,154 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=52293.333333333336, ans=0.2 2023-06-15 06:01:58,023 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.92 vs. limit=15.0 2023-06-15 06:01:58,829 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=52293.333333333336, ans=0.0 2023-06-15 06:02:11,094 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=52293.333333333336, ans=0.125 2023-06-15 06:02:13,330 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=52293.333333333336, ans=0.125 2023-06-15 06:02:16,545 INFO [train.py:988] (0/4) Epoch 15, batch 400, loss[loss=0.2666, simple_loss=0.3313, pruned_loss=0.101, over 18940.00 frames. ], tot_loss[loss=0.2717, simple_loss=0.3348, pruned_loss=0.1042, over 3284236.71 frames. 
], batch size: 86, lr: 1.93e-02, grad_scale: 32.0 2023-06-15 06:02:44,205 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=52426.666666666664, ans=0.125 2023-06-15 06:03:00,375 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=52493.333333333336, ans=0.05 2023-06-15 06:03:43,713 INFO [train.py:988] (0/4) Epoch 15, batch 450, loss[loss=0.264, simple_loss=0.3366, pruned_loss=0.09571, over 19219.00 frames. ], tot_loss[loss=0.2713, simple_loss=0.335, pruned_loss=0.1039, over 3401187.75 frames. ], batch size: 92, lr: 1.92e-02, grad_scale: 32.0 2023-06-15 06:03:56,849 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.715e+02 2.130e+02 2.421e+02 3.094e+02 4.907e+02, threshold=4.841e+02, percent-clipped=1.0 2023-06-15 06:03:59,663 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=52693.333333333336, ans=0.125 2023-06-15 06:04:06,547 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.74 vs. limit=15.0 2023-06-15 06:04:15,574 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.22 vs. limit=22.5 2023-06-15 06:04:23,282 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=52826.666666666664, ans=0.07 2023-06-15 06:04:34,928 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=52893.333333333336, ans=0.05 2023-06-15 06:04:45,255 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=52893.333333333336, ans=0.2 2023-06-15 06:05:10,592 INFO [train.py:988] (0/4) Epoch 15, batch 500, loss[loss=0.2731, simple_loss=0.3295, pruned_loss=0.1083, over 20068.00 frames. ], tot_loss[loss=0.2714, simple_loss=0.3347, pruned_loss=0.1041, over 3479939.22 frames. ], batch size: 133, lr: 1.92e-02, grad_scale: 32.0 2023-06-15 06:05:14,120 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=53026.666666666664, ans=0.0 2023-06-15 06:05:40,223 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=12.0 2023-06-15 06:05:42,904 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=53160.0, ans=0.125 2023-06-15 06:06:02,556 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-15.pt 2023-06-15 06:06:28,443 INFO [train.py:988] (0/4) Epoch 16, batch 0, loss[loss=0.2667, simple_loss=0.3301, pruned_loss=0.1017, over 20455.00 frames. ], tot_loss[loss=0.2667, simple_loss=0.3301, pruned_loss=0.1017, over 20455.00 frames. ], batch size: 160, lr: 1.86e-02, grad_scale: 32.0 2023-06-15 06:06:28,444 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 06:06:34,509 INFO [train.py:1020] (0/4) Epoch 16, validation: loss=0.2134, simple_loss=0.3194, pruned_loss=0.05367, over 143649.00 frames. 
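A minimal sketch (an assumed helper, not icefall code) for pulling the per-batch tot_loss and per-epoch validation loss out of a train.py log in the entry format shown above; the regular expressions and the train-log.txt filename are assumptions about that format, and the parsing is only as robust as the examples visible here:

import re

# Assumed shape of the entries shown above, e.g.
#   "Epoch 15, batch 500, loss[...], tot_loss[loss=0.2714, ...]"
#   "Epoch 16, validation: loss=0.2134, simple_loss=0.3194, ..."
# re.S lets the match cross line breaks, since entries may wrap in the log.
TRAIN_RE = re.compile(r"Epoch (\d+), batch (\d+), .*?tot_loss\[loss=([\d.]+)", re.S)
VALID_RE = re.compile(r"Epoch (\d+), validation: loss=([\d.]+)")

def parse_losses(log_text):
    """Return (train, valid): train is [(epoch, batch, tot_loss)], valid is [(epoch, loss)]."""
    train = [(int(e), int(b), float(l)) for e, b, l in TRAIN_RE.findall(log_text)]
    valid = [(int(e), float(l)) for e, l in VALID_RE.findall(log_text)]
    return train, valid

if __name__ == "__main__":
    with open("train-log.txt") as f:  # assumed path to a log like this one
        train, valid = parse_losses(f.read())
    for epoch, loss in valid:
        print(f"epoch {epoch:3d}  validation loss {loss:.4f}")

Run against a log in this format it would print one line per validation pass, e.g. the epoch-16 validation loss of 0.2134 logged above, which is handy for a quick loss-curve check without TensorBoard.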
2023-06-15 06:06:34,510 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 06:06:54,049 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=53306.666666666664, ans=0.0 2023-06-15 06:06:57,325 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/checkpoint-8000.pt 2023-06-15 06:07:19,736 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+02 2.212e+02 2.676e+02 3.191e+02 5.269e+02, threshold=5.353e+02, percent-clipped=1.0 2023-06-15 06:08:03,128 INFO [train.py:988] (0/4) Epoch 16, batch 50, loss[loss=0.2789, simple_loss=0.3276, pruned_loss=0.1151, over 20223.00 frames. ], tot_loss[loss=0.2705, simple_loss=0.3329, pruned_loss=0.104, over 839234.49 frames. ], batch size: 239, lr: 1.86e-02, grad_scale: 32.0 2023-06-15 06:08:53,837 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=53773.333333333336, ans=0.2 2023-06-15 06:09:19,888 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=53840.0, ans=0.025 2023-06-15 06:09:29,420 INFO [train.py:988] (0/4) Epoch 16, batch 100, loss[loss=0.2808, simple_loss=0.3341, pruned_loss=0.1138, over 19948.00 frames. ], tot_loss[loss=0.2708, simple_loss=0.335, pruned_loss=0.1033, over 1484621.04 frames. ], batch size: 126, lr: 1.85e-02, grad_scale: 32.0 2023-06-15 06:10:12,487 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.622e+02 1.999e+02 2.214e+02 2.667e+02 3.874e+02, threshold=4.428e+02, percent-clipped=0.0 2023-06-15 06:10:21,659 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=54106.666666666664, ans=0.125 2023-06-15 06:10:55,313 INFO [train.py:988] (0/4) Epoch 16, batch 150, loss[loss=0.2974, simple_loss=0.3705, pruned_loss=0.1121, over 16208.00 frames. ], tot_loss[loss=0.2691, simple_loss=0.3335, pruned_loss=0.1024, over 1996117.53 frames. ], batch size: 52, lr: 1.85e-02, grad_scale: 32.0 2023-06-15 06:11:00,590 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=54240.0, ans=0.125 2023-06-15 06:11:29,391 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=54373.333333333336, ans=0.2 2023-06-15 06:11:35,812 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=54373.333333333336, ans=0.125 2023-06-15 06:11:44,424 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=54373.333333333336, ans=0.1 2023-06-15 06:11:53,074 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=54440.0, ans=0.0 2023-06-15 06:11:59,768 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 06:12:14,406 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=54506.666666666664, ans=0.015 2023-06-15 06:12:22,651 INFO [train.py:988] (0/4) Epoch 16, batch 200, loss[loss=0.2635, simple_loss=0.3249, pruned_loss=0.101, over 20327.00 frames. ], tot_loss[loss=0.2682, simple_loss=0.3321, pruned_loss=0.1021, over 2405877.57 frames. 
], batch size: 141, lr: 1.85e-02, grad_scale: 32.0 2023-06-15 06:12:26,101 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=54573.333333333336, ans=0.125 2023-06-15 06:12:40,251 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.26 vs. limit=12.0 2023-06-15 06:13:05,778 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+02 2.168e+02 2.479e+02 3.039e+02 4.350e+02, threshold=4.958e+02, percent-clipped=0.0 2023-06-15 06:13:06,429 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=12.0 2023-06-15 06:13:13,586 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.48 vs. limit=12.0 2023-06-15 06:13:18,837 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=54773.333333333336, ans=0.125 2023-06-15 06:13:29,428 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.79 vs. limit=15.0 2023-06-15 06:13:50,152 INFO [train.py:988] (0/4) Epoch 16, batch 250, loss[loss=0.2658, simple_loss=0.3297, pruned_loss=0.101, over 20285.00 frames. ], tot_loss[loss=0.2683, simple_loss=0.3324, pruned_loss=0.1021, over 2722421.44 frames. ], batch size: 141, lr: 1.85e-02, grad_scale: 32.0 2023-06-15 06:13:53,826 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=54906.666666666664, ans=0.125 2023-06-15 06:14:00,411 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=54906.666666666664, ans=0.125 2023-06-15 06:14:13,017 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=54973.333333333336, ans=15.0 2023-06-15 06:14:19,923 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=54973.333333333336, ans=0.0 2023-06-15 06:14:21,920 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=54973.333333333336, ans=0.1 2023-06-15 06:14:42,904 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=55106.666666666664, ans=0.125 2023-06-15 06:15:15,820 INFO [train.py:988] (0/4) Epoch 16, batch 300, loss[loss=0.2547, simple_loss=0.3226, pruned_loss=0.0934, over 19317.00 frames. ], tot_loss[loss=0.2673, simple_loss=0.3316, pruned_loss=0.1015, over 2963380.68 frames. 
], batch size: 98, lr: 1.84e-02, grad_scale: 32.0 2023-06-15 06:15:17,845 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=55240.0, ans=0.0 2023-06-15 06:15:30,708 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=55240.0, ans=0.125 2023-06-15 06:15:59,333 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.606e+02 2.170e+02 2.637e+02 3.215e+02 4.848e+02, threshold=5.274e+02, percent-clipped=0.0 2023-06-15 06:16:38,266 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=55506.666666666664, ans=0.0 2023-06-15 06:16:39,991 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=55506.666666666664, ans=0.125 2023-06-15 06:16:43,036 INFO [train.py:988] (0/4) Epoch 16, batch 350, loss[loss=0.2535, simple_loss=0.3281, pruned_loss=0.08947, over 19827.00 frames. ], tot_loss[loss=0.2663, simple_loss=0.3316, pruned_loss=0.1005, over 3147597.12 frames. ], batch size: 115, lr: 1.84e-02, grad_scale: 32.0 2023-06-15 06:17:06,467 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=55640.0, ans=0.125 2023-06-15 06:17:30,861 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.01 vs. limit=22.5 2023-06-15 06:18:10,031 INFO [train.py:988] (0/4) Epoch 16, batch 400, loss[loss=0.2485, simple_loss=0.3181, pruned_loss=0.08946, over 19652.00 frames. ], tot_loss[loss=0.2663, simple_loss=0.3317, pruned_loss=0.1005, over 3294374.53 frames. ], batch size: 110, lr: 1.84e-02, grad_scale: 32.0 2023-06-15 06:18:28,229 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=55973.333333333336, ans=0.125 2023-06-15 06:18:33,421 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=55973.333333333336, ans=0.125 2023-06-15 06:18:53,436 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.566e+02 1.986e+02 2.238e+02 2.515e+02 3.872e+02, threshold=4.476e+02, percent-clipped=0.0 2023-06-15 06:19:08,330 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.72 vs. limit=12.0 2023-06-15 06:19:26,751 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.45 vs. limit=15.0 2023-06-15 06:19:32,439 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=56173.333333333336, ans=0.125 2023-06-15 06:19:36,764 INFO [train.py:988] (0/4) Epoch 16, batch 450, loss[loss=0.2983, simple_loss=0.3687, pruned_loss=0.1139, over 17606.00 frames. ], tot_loss[loss=0.266, simple_loss=0.3311, pruned_loss=0.1005, over 3415588.64 frames. ], batch size: 67, lr: 1.83e-02, grad_scale: 32.0 2023-06-15 06:19:47,333 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.74 vs. 
limit=15.0 2023-06-15 06:19:48,834 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=56240.0, ans=0.2 2023-06-15 06:19:55,923 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=56306.666666666664, ans=0.2 2023-06-15 06:20:01,698 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0 2023-06-15 06:20:33,463 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.23 vs. limit=6.0 2023-06-15 06:20:48,881 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=56506.666666666664, ans=0.125 2023-06-15 06:20:53,933 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.71 vs. limit=15.0 2023-06-15 06:20:59,733 INFO [train.py:988] (0/4) Epoch 16, batch 500, loss[loss=0.2957, simple_loss=0.3484, pruned_loss=0.1215, over 19969.00 frames. ], tot_loss[loss=0.2657, simple_loss=0.3313, pruned_loss=0.1001, over 3501187.07 frames. ], batch size: 126, lr: 1.83e-02, grad_scale: 32.0 2023-06-15 06:21:07,971 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-06-15 06:21:11,094 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=56573.333333333336, ans=0.125 2023-06-15 06:21:40,073 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.580e+02 2.020e+02 2.317e+02 2.823e+02 4.617e+02, threshold=4.634e+02, percent-clipped=2.0 2023-06-15 06:21:50,930 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-16.pt 2023-06-15 06:22:11,794 INFO [train.py:988] (0/4) Epoch 17, batch 0, loss[loss=0.2574, simple_loss=0.3348, pruned_loss=0.08997, over 19822.00 frames. ], tot_loss[loss=0.2574, simple_loss=0.3348, pruned_loss=0.08997, over 19822.00 frames. ], batch size: 120, lr: 1.78e-02, grad_scale: 32.0 2023-06-15 06:22:11,795 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 06:22:17,845 INFO [train.py:1020] (0/4) Epoch 17, validation: loss=0.2144, simple_loss=0.3175, pruned_loss=0.05564, over 143649.00 frames. 2023-06-15 06:22:17,846 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 06:22:28,118 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.71 vs. 
limit=15.0 2023-06-15 06:22:36,240 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=56853.333333333336, ans=0.125 2023-06-15 06:22:42,752 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=56853.333333333336, ans=0.07 2023-06-15 06:22:55,458 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=56920.0, ans=0.125 2023-06-15 06:23:26,845 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=57053.333333333336, ans=0.0 2023-06-15 06:23:45,531 INFO [train.py:988] (0/4) Epoch 17, batch 50, loss[loss=0.272, simple_loss=0.33, pruned_loss=0.107, over 19961.00 frames. ], tot_loss[loss=0.2619, simple_loss=0.3274, pruned_loss=0.09818, over 868415.97 frames. ], batch size: 126, lr: 1.77e-02, grad_scale: 32.0 2023-06-15 06:24:05,497 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=57186.666666666664, ans=0.0 2023-06-15 06:24:29,926 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=57253.333333333336, ans=0.125 2023-06-15 06:24:37,451 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=57320.0, ans=0.09899494936611666 2023-06-15 06:24:53,600 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=57320.0, ans=0.0 2023-06-15 06:24:54,183 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.07 vs. limit=6.0 2023-06-15 06:25:01,586 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.492e+02 2.063e+02 2.327e+02 2.647e+02 3.796e+02, threshold=4.655e+02, percent-clipped=0.0 2023-06-15 06:25:09,354 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=57386.666666666664, ans=0.125 2023-06-15 06:25:14,202 INFO [train.py:988] (0/4) Epoch 17, batch 100, loss[loss=0.2649, simple_loss=0.3267, pruned_loss=0.1016, over 19830.00 frames. ], tot_loss[loss=0.2658, simple_loss=0.3306, pruned_loss=0.1005, over 1498019.82 frames. ], batch size: 115, lr: 1.77e-02, grad_scale: 32.0 2023-06-15 06:25:39,882 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=57520.0, ans=0.125 2023-06-15 06:25:41,611 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=57520.0, ans=0.0 2023-06-15 06:26:32,879 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=57720.0, ans=0.125 2023-06-15 06:26:42,433 INFO [train.py:988] (0/4) Epoch 17, batch 150, loss[loss=0.2453, simple_loss=0.3199, pruned_loss=0.08538, over 18935.00 frames. ], tot_loss[loss=0.264, simple_loss=0.3292, pruned_loss=0.09941, over 2020464.44 frames. ], batch size: 86, lr: 1.77e-02, grad_scale: 64.0 2023-06-15 06:26:49,401 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.15 vs. 
limit=15.0 2023-06-15 06:27:39,186 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=57986.666666666664, ans=0.125 2023-06-15 06:27:56,808 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.21 vs. limit=22.5 2023-06-15 06:27:57,483 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+02 2.306e+02 2.737e+02 3.252e+02 5.355e+02, threshold=5.474e+02, percent-clipped=3.0 2023-06-15 06:28:04,520 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=58053.333333333336, ans=0.05 2023-06-15 06:28:09,791 INFO [train.py:988] (0/4) Epoch 17, batch 200, loss[loss=0.2834, simple_loss=0.3543, pruned_loss=0.1063, over 17161.00 frames. ], tot_loss[loss=0.2654, simple_loss=0.3307, pruned_loss=0.1001, over 2407797.73 frames. ], batch size: 60, lr: 1.76e-02, grad_scale: 64.0 2023-06-15 06:28:18,621 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=58120.0, ans=0.1 2023-06-15 06:28:24,288 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=58120.0, ans=0.125 2023-06-15 06:28:32,486 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.98 vs. limit=15.0 2023-06-15 06:28:35,591 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=58186.666666666664, ans=0.125 2023-06-15 06:28:38,975 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=58186.666666666664, ans=0.95 2023-06-15 06:28:52,334 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=58253.333333333336, ans=0.0 2023-06-15 06:29:01,689 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=15.0 2023-06-15 06:29:38,285 INFO [train.py:988] (0/4) Epoch 17, batch 250, loss[loss=0.2537, simple_loss=0.3288, pruned_loss=0.08929, over 18292.00 frames. ], tot_loss[loss=0.2643, simple_loss=0.3293, pruned_loss=0.09965, over 2713923.24 frames. ], batch size: 74, lr: 1.76e-02, grad_scale: 64.0 2023-06-15 06:29:42,075 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=58453.333333333336, ans=0.125 2023-06-15 06:30:33,145 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=58653.333333333336, ans=0.125 2023-06-15 06:30:54,266 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.584e+02 1.968e+02 2.205e+02 2.465e+02 3.814e+02, threshold=4.411e+02, percent-clipped=0.0 2023-06-15 06:31:06,218 INFO [train.py:988] (0/4) Epoch 17, batch 300, loss[loss=0.26, simple_loss=0.3216, pruned_loss=0.09921, over 19335.00 frames. ], tot_loss[loss=0.2635, simple_loss=0.3279, pruned_loss=0.09957, over 2952789.92 frames. 
], batch size: 98, lr: 1.76e-02, grad_scale: 64.0 2023-06-15 06:31:27,890 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=58853.333333333336, ans=0.0 2023-06-15 06:31:31,029 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.22 vs. limit=15.0 2023-06-15 06:31:37,756 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.92 vs. limit=15.0 2023-06-15 06:32:10,769 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=58986.666666666664, ans=0.0 2023-06-15 06:32:28,982 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=59053.333333333336, ans=0.125 2023-06-15 06:32:33,711 INFO [train.py:988] (0/4) Epoch 17, batch 350, loss[loss=0.2538, simple_loss=0.3296, pruned_loss=0.08904, over 18786.00 frames. ], tot_loss[loss=0.2631, simple_loss=0.3279, pruned_loss=0.09917, over 3139167.22 frames. ], batch size: 83, lr: 1.76e-02, grad_scale: 64.0 2023-06-15 06:32:46,433 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=59120.0, ans=0.125 2023-06-15 06:33:03,251 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=59186.666666666664, ans=0.1 2023-06-15 06:33:14,760 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=59253.333333333336, ans=0.0 2023-06-15 06:33:49,928 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.577e+02 2.042e+02 2.346e+02 2.711e+02 3.857e+02, threshold=4.693e+02, percent-clipped=0.0 2023-06-15 06:34:01,699 INFO [train.py:988] (0/4) Epoch 17, batch 400, loss[loss=0.2982, simple_loss=0.366, pruned_loss=0.1152, over 16286.00 frames. ], tot_loss[loss=0.2626, simple_loss=0.3279, pruned_loss=0.09866, over 3278492.43 frames. ], batch size: 52, lr: 1.75e-02, grad_scale: 64.0 2023-06-15 06:34:20,208 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=59520.0, ans=0.125 2023-06-15 06:34:39,910 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=59586.666666666664, ans=0.1 2023-06-15 06:35:00,348 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=59653.333333333336, ans=0.0 2023-06-15 06:35:24,898 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=59720.0, ans=0.125 2023-06-15 06:35:27,598 INFO [train.py:988] (0/4) Epoch 17, batch 450, loss[loss=0.2344, simple_loss=0.3029, pruned_loss=0.08295, over 19667.00 frames. ], tot_loss[loss=0.2615, simple_loss=0.3276, pruned_loss=0.09765, over 3379593.65 frames. 
], batch size: 110, lr: 1.75e-02, grad_scale: 64.0 2023-06-15 06:35:50,328 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=59853.333333333336, ans=0.125 2023-06-15 06:36:01,023 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=59920.0, ans=0.0 2023-06-15 06:36:41,062 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.612e+02 2.138e+02 2.622e+02 3.155e+02 6.039e+02, threshold=5.245e+02, percent-clipped=6.0 2023-06-15 06:36:52,653 INFO [train.py:988] (0/4) Epoch 17, batch 500, loss[loss=0.2373, simple_loss=0.3116, pruned_loss=0.0815, over 18800.00 frames. ], tot_loss[loss=0.2618, simple_loss=0.3282, pruned_loss=0.0977, over 3450900.58 frames. ], batch size: 83, lr: 1.75e-02, grad_scale: 64.0 2023-06-15 06:37:10,411 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=60186.666666666664, ans=0.125 2023-06-15 06:37:33,194 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=60253.333333333336, ans=0.0 2023-06-15 06:37:45,539 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-17.pt 2023-06-15 06:38:08,683 INFO [train.py:988] (0/4) Epoch 18, batch 0, loss[loss=0.2433, simple_loss=0.3181, pruned_loss=0.08425, over 19352.00 frames. ], tot_loss[loss=0.2433, simple_loss=0.3181, pruned_loss=0.08425, over 19352.00 frames. ], batch size: 98, lr: 1.70e-02, grad_scale: 64.0 2023-06-15 06:38:08,684 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 06:38:14,735 INFO [train.py:1020] (0/4) Epoch 18, validation: loss=0.2126, simple_loss=0.3161, pruned_loss=0.05459, over 143649.00 frames. 2023-06-15 06:38:14,735 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 06:38:22,212 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=60333.333333333336, ans=0.125 2023-06-15 06:38:51,486 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=60466.666666666664, ans=0.125 2023-06-15 06:39:19,972 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=60533.333333333336, ans=0.0 2023-06-15 06:39:35,576 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=60600.0, ans=0.2 2023-06-15 06:39:42,216 INFO [train.py:988] (0/4) Epoch 18, batch 50, loss[loss=0.2503, simple_loss=0.3158, pruned_loss=0.09245, over 20260.00 frames. ], tot_loss[loss=0.262, simple_loss=0.3298, pruned_loss=0.09713, over 856949.85 frames. ], batch size: 141, lr: 1.69e-02, grad_scale: 64.0 2023-06-15 06:40:00,852 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.503e+02 1.965e+02 2.316e+02 2.715e+02 4.312e+02, threshold=4.632e+02, percent-clipped=0.0 2023-06-15 06:40:03,888 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.94 vs. 
limit=12.0 2023-06-15 06:40:12,250 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=60733.333333333336, ans=0.125 2023-06-15 06:40:17,891 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.44 vs. limit=12.0 2023-06-15 06:40:45,211 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=60866.666666666664, ans=0.0 2023-06-15 06:41:04,613 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=60933.333333333336, ans=0.2 2023-06-15 06:41:10,123 INFO [train.py:988] (0/4) Epoch 18, batch 100, loss[loss=0.2576, simple_loss=0.3276, pruned_loss=0.09383, over 18936.00 frames. ], tot_loss[loss=0.2602, simple_loss=0.3279, pruned_loss=0.0962, over 1508724.35 frames. ], batch size: 86, lr: 1.69e-02, grad_scale: 64.0 2023-06-15 06:41:11,254 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.86 vs. limit=15.0 2023-06-15 06:42:01,084 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=61200.0, ans=0.1 2023-06-15 06:42:32,906 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=15.0 2023-06-15 06:42:37,674 INFO [train.py:988] (0/4) Epoch 18, batch 150, loss[loss=0.2549, simple_loss=0.3315, pruned_loss=0.0892, over 18645.00 frames. ], tot_loss[loss=0.261, simple_loss=0.3284, pruned_loss=0.09679, over 2010041.81 frames. ], batch size: 80, lr: 1.69e-02, grad_scale: 64.0 2023-06-15 06:42:48,703 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=61333.333333333336, ans=0.0 2023-06-15 06:42:57,615 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.664e+02 2.047e+02 2.339e+02 2.700e+02 3.981e+02, threshold=4.677e+02, percent-clipped=0.0 2023-06-15 06:43:17,267 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=61466.666666666664, ans=15.0 2023-06-15 06:43:31,018 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-06-15 06:43:34,604 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=61533.333333333336, ans=0.0 2023-06-15 06:43:41,650 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=61533.333333333336, ans=0.125 2023-06-15 06:44:06,116 INFO [train.py:988] (0/4) Epoch 18, batch 200, loss[loss=0.2682, simple_loss=0.3456, pruned_loss=0.09537, over 16311.00 frames. ], tot_loss[loss=0.2597, simple_loss=0.3258, pruned_loss=0.0968, over 2409867.45 frames. 
], batch size: 52, lr: 1.69e-02, grad_scale: 64.0 2023-06-15 06:44:41,853 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=61800.0, ans=0.1 2023-06-15 06:45:07,606 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=61866.666666666664, ans=0.1 2023-06-15 06:45:14,760 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=61933.333333333336, ans=0.125 2023-06-15 06:45:31,178 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=61933.333333333336, ans=0.0 2023-06-15 06:45:32,637 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=62000.0, ans=0.125 2023-06-15 06:45:33,957 INFO [train.py:988] (0/4) Epoch 18, batch 250, loss[loss=0.2554, simple_loss=0.3271, pruned_loss=0.09185, over 19704.00 frames. ], tot_loss[loss=0.2597, simple_loss=0.3267, pruned_loss=0.09634, over 2687238.38 frames. ], batch size: 110, lr: 1.68e-02, grad_scale: 64.0 2023-06-15 06:45:53,599 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.568e+02 2.076e+02 2.248e+02 2.609e+02 3.858e+02, threshold=4.496e+02, percent-clipped=0.0 2023-06-15 06:46:11,239 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=62133.333333333336, ans=0.125 2023-06-15 06:46:20,620 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=62133.333333333336, ans=0.2 2023-06-15 06:46:23,882 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=62133.333333333336, ans=0.2 2023-06-15 06:46:23,946 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 06:47:02,674 INFO [train.py:988] (0/4) Epoch 18, batch 300, loss[loss=0.2709, simple_loss=0.301, pruned_loss=0.1204, over 17318.00 frames. ], tot_loss[loss=0.2595, simple_loss=0.3264, pruned_loss=0.09624, over 2942642.89 frames. ], batch size: 391, lr: 1.68e-02, grad_scale: 64.0 2023-06-15 06:47:03,253 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=62333.333333333336, ans=0.125 2023-06-15 06:47:21,737 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=62400.0, ans=0.125 2023-06-15 06:47:51,814 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=62466.666666666664, ans=0.125 2023-06-15 06:47:51,850 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 06:48:06,740 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. limit=6.0 2023-06-15 06:48:30,193 INFO [train.py:988] (0/4) Epoch 18, batch 350, loss[loss=0.2545, simple_loss=0.3326, pruned_loss=0.08814, over 18299.00 frames. ], tot_loss[loss=0.2585, simple_loss=0.3257, pruned_loss=0.09567, over 3121748.90 frames. 
], batch size: 74, lr: 1.68e-02, grad_scale: 32.0 2023-06-15 06:48:32,325 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=62666.666666666664, ans=0.125 2023-06-15 06:48:34,088 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=62666.666666666664, ans=0.125 2023-06-15 06:48:49,837 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=62733.333333333336, ans=0.125 2023-06-15 06:48:51,048 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.624e+02 2.098e+02 2.384e+02 2.713e+02 4.621e+02, threshold=4.767e+02, percent-clipped=2.0 2023-06-15 06:49:01,282 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.02 vs. limit=22.5 2023-06-15 06:49:36,087 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2023-06-15 06:49:52,444 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=62933.333333333336, ans=0.125 2023-06-15 06:49:57,682 INFO [train.py:988] (0/4) Epoch 18, batch 400, loss[loss=0.2493, simple_loss=0.3177, pruned_loss=0.09049, over 19950.00 frames. ], tot_loss[loss=0.2581, simple_loss=0.3257, pruned_loss=0.09528, over 3277936.06 frames. ], batch size: 126, lr: 1.68e-02, grad_scale: 32.0 2023-06-15 06:50:01,232 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=63000.0, ans=0.125 2023-06-15 06:50:04,768 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=63000.0, ans=0.2 2023-06-15 06:50:04,797 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=63000.0, ans=0.125 2023-06-15 06:50:21,014 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=63066.666666666664, ans=0.125 2023-06-15 06:50:49,394 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.85 vs. limit=15.0 2023-06-15 06:51:26,151 INFO [train.py:988] (0/4) Epoch 18, batch 450, loss[loss=0.2763, simple_loss=0.3328, pruned_loss=0.1099, over 19958.00 frames. ], tot_loss[loss=0.2583, simple_loss=0.3258, pruned_loss=0.09534, over 3377776.97 frames. ], batch size: 126, lr: 1.67e-02, grad_scale: 32.0 2023-06-15 06:51:47,892 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.627e+02 2.045e+02 2.298e+02 2.779e+02 4.422e+02, threshold=4.596e+02, percent-clipped=0.0 2023-06-15 06:51:48,807 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.42 vs. limit=15.0 2023-06-15 06:52:04,818 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.64 vs. 
limit=15.0 2023-06-15 06:52:07,417 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=63466.666666666664, ans=0.125 2023-06-15 06:52:28,830 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0 2023-06-15 06:52:38,467 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=63600.0, ans=0.0 2023-06-15 06:52:40,312 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=63600.0, ans=10.0 2023-06-15 06:52:51,113 INFO [train.py:988] (0/4) Epoch 18, batch 500, loss[loss=0.241, simple_loss=0.3098, pruned_loss=0.08611, over 19471.00 frames. ], tot_loss[loss=0.2585, simple_loss=0.3253, pruned_loss=0.09581, over 3480849.39 frames. ], batch size: 105, lr: 1.67e-02, grad_scale: 32.0 2023-06-15 06:52:59,613 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=63666.666666666664, ans=0.1 2023-06-15 06:53:03,193 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=63666.666666666664, ans=0.2 2023-06-15 06:53:22,294 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0 2023-06-15 06:53:24,787 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=63800.0, ans=0.0 2023-06-15 06:53:32,708 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=63800.0, ans=0.125 2023-06-15 06:53:43,101 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-18.pt 2023-06-15 06:54:08,034 INFO [train.py:988] (0/4) Epoch 19, batch 0, loss[loss=0.2709, simple_loss=0.3303, pruned_loss=0.1057, over 20567.00 frames. ], tot_loss[loss=0.2709, simple_loss=0.3303, pruned_loss=0.1057, over 20567.00 frames. ], batch size: 173, lr: 1.62e-02, grad_scale: 32.0 2023-06-15 06:54:08,035 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 06:54:14,159 INFO [train.py:1020] (0/4) Epoch 19, validation: loss=0.2113, simple_loss=0.3157, pruned_loss=0.05349, over 143649.00 frames. 2023-06-15 06:54:14,160 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 06:54:21,099 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=63880.0, ans=0.0 2023-06-15 06:54:43,477 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=63946.666666666664, ans=0.125 2023-06-15 06:55:05,671 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.559e+02 1.944e+02 2.133e+02 2.428e+02 3.266e+02, threshold=4.266e+02, percent-clipped=0.0 2023-06-15 06:55:12,942 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=64080.0, ans=0.125 2023-06-15 06:55:40,296 INFO [train.py:988] (0/4) Epoch 19, batch 50, loss[loss=0.2406, simple_loss=0.3118, pruned_loss=0.08471, over 20291.00 frames. ], tot_loss[loss=0.2538, simple_loss=0.3222, pruned_loss=0.09268, over 866042.69 frames. 
], batch size: 149, lr: 1.62e-02, grad_scale: 32.0 2023-06-15 06:55:40,651 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=64213.333333333336, ans=0.1 2023-06-15 06:55:51,433 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=64213.333333333336, ans=0.0 2023-06-15 06:55:54,515 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=64213.333333333336, ans=0.07 2023-06-15 06:55:54,528 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=64213.333333333336, ans=0.0 2023-06-15 06:56:34,925 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=64413.333333333336, ans=0.2 2023-06-15 06:56:41,650 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=64413.333333333336, ans=0.07 2023-06-15 06:57:08,420 INFO [train.py:988] (0/4) Epoch 19, batch 100, loss[loss=0.2606, simple_loss=0.326, pruned_loss=0.09764, over 20287.00 frames. ], tot_loss[loss=0.2547, simple_loss=0.3222, pruned_loss=0.09362, over 1508104.87 frames. ], batch size: 149, lr: 1.62e-02, grad_scale: 32.0 2023-06-15 06:57:11,918 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=64546.666666666664, ans=0.2 2023-06-15 06:57:30,879 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=64613.333333333336, ans=0.2 2023-06-15 06:57:38,357 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-06-15 06:57:47,114 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.27 vs. limit=15.0 2023-06-15 06:58:00,778 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.583e+02 2.073e+02 2.297e+02 2.619e+02 4.375e+02, threshold=4.594e+02, percent-clipped=1.0 2023-06-15 06:58:10,240 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=64746.666666666664, ans=0.125 2023-06-15 06:58:27,324 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=64813.333333333336, ans=0.2 2023-06-15 06:58:36,690 INFO [train.py:988] (0/4) Epoch 19, batch 150, loss[loss=0.2737, simple_loss=0.3415, pruned_loss=0.103, over 19333.00 frames. ], tot_loss[loss=0.2549, simple_loss=0.3218, pruned_loss=0.094, over 2021620.43 frames. ], batch size: 98, lr: 1.62e-02, grad_scale: 32.0 2023-06-15 06:58:45,579 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=64880.0, ans=0.0 2023-06-15 06:58:59,348 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=64946.666666666664, ans=0.0 2023-06-15 06:59:01,784 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.07 vs. 
limit=12.0 2023-06-15 06:59:47,980 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=65146.666666666664, ans=0.125 2023-06-15 06:59:58,411 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=65146.666666666664, ans=0.0 2023-06-15 06:59:58,616 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=65146.666666666664, ans=0.0 2023-06-15 07:00:04,047 INFO [train.py:988] (0/4) Epoch 19, batch 200, loss[loss=0.2542, simple_loss=0.3213, pruned_loss=0.09353, over 19465.00 frames. ], tot_loss[loss=0.2545, simple_loss=0.3223, pruned_loss=0.09339, over 2416510.39 frames. ], batch size: 105, lr: 1.61e-02, grad_scale: 32.0 2023-06-15 07:00:11,677 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=65213.333333333336, ans=0.2 2023-06-15 07:00:13,466 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=65213.333333333336, ans=0.125 2023-06-15 07:00:25,413 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=65280.0, ans=0.125 2023-06-15 07:00:27,018 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=65280.0, ans=0.125 2023-06-15 07:00:54,080 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65346.666666666664, ans=0.1 2023-06-15 07:00:55,715 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=65413.333333333336, ans=0.1 2023-06-15 07:00:56,943 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+02 1.984e+02 2.245e+02 2.601e+02 3.971e+02, threshold=4.489e+02, percent-clipped=0.0 2023-06-15 07:01:00,069 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.79 vs. limit=10.0 2023-06-15 07:01:04,556 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=65413.333333333336, ans=0.0 2023-06-15 07:01:09,606 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=65413.333333333336, ans=0.125 2023-06-15 07:01:32,670 INFO [train.py:988] (0/4) Epoch 19, batch 250, loss[loss=0.2463, simple_loss=0.3145, pruned_loss=0.0891, over 19715.00 frames. ], tot_loss[loss=0.2543, simple_loss=0.3217, pruned_loss=0.09342, over 2732469.78 frames. ], batch size: 110, lr: 1.61e-02, grad_scale: 32.0 2023-06-15 07:01:38,552 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=65546.66666666667, ans=0.0 2023-06-15 07:02:06,832 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.94 vs. limit=15.0 2023-06-15 07:02:12,608 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.90 vs. 
limit=15.0 2023-06-15 07:02:14,555 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.80 vs. limit=12.0 2023-06-15 07:02:38,651 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.41 vs. limit=22.5 2023-06-15 07:02:55,566 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=65813.33333333333, ans=0.125 2023-06-15 07:03:01,180 INFO [train.py:988] (0/4) Epoch 19, batch 300, loss[loss=0.251, simple_loss=0.317, pruned_loss=0.09249, over 20549.00 frames. ], tot_loss[loss=0.2544, simple_loss=0.322, pruned_loss=0.09343, over 2962075.12 frames. ], batch size: 173, lr: 1.61e-02, grad_scale: 32.0 2023-06-15 07:03:03,195 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=65880.0, ans=0.0 2023-06-15 07:03:06,505 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=65880.0, ans=0.125 2023-06-15 07:03:43,968 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0 2023-06-15 07:03:53,127 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.568e+02 1.956e+02 2.157e+02 2.430e+02 3.269e+02, threshold=4.313e+02, percent-clipped=0.0 2023-06-15 07:04:04,726 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=66080.0, ans=0.125 2023-06-15 07:04:24,394 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=66146.66666666667, ans=15.0 2023-06-15 07:04:28,906 INFO [train.py:988] (0/4) Epoch 19, batch 350, loss[loss=0.247, simple_loss=0.3208, pruned_loss=0.08666, over 18647.00 frames. ], tot_loss[loss=0.2539, simple_loss=0.3219, pruned_loss=0.09293, over 3145703.67 frames. ], batch size: 80, lr: 1.61e-02, grad_scale: 32.0 2023-06-15 07:04:34,150 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=66213.33333333333, ans=0.125 2023-06-15 07:05:24,290 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=66413.33333333333, ans=0.2 2023-06-15 07:05:26,171 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=66413.33333333333, ans=0.0 2023-06-15 07:05:36,766 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=66480.0, ans=0.0 2023-06-15 07:05:51,358 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=66480.0, ans=0.1 2023-06-15 07:05:54,455 INFO [train.py:988] (0/4) Epoch 19, batch 400, loss[loss=0.2643, simple_loss=0.3409, pruned_loss=0.09383, over 18296.00 frames. ], tot_loss[loss=0.2539, simple_loss=0.3217, pruned_loss=0.09306, over 3308478.42 frames. ], batch size: 74, lr: 1.60e-02, grad_scale: 32.0 2023-06-15 07:06:02,916 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.99 vs. 
limit=15.0 2023-06-15 07:06:20,148 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=66613.33333333333, ans=0.025 2023-06-15 07:06:47,307 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.583e+02 1.968e+02 2.388e+02 2.889e+02 4.258e+02, threshold=4.776e+02, percent-clipped=0.0 2023-06-15 07:07:01,484 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=66746.66666666667, ans=0.0 2023-06-15 07:07:22,219 INFO [train.py:988] (0/4) Epoch 19, batch 450, loss[loss=0.2493, simple_loss=0.3143, pruned_loss=0.09213, over 20261.00 frames. ], tot_loss[loss=0.2528, simple_loss=0.3216, pruned_loss=0.09198, over 3413991.88 frames. ], batch size: 141, lr: 1.60e-02, grad_scale: 32.0 2023-06-15 07:07:22,550 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=66880.0, ans=0.125 2023-06-15 07:07:31,285 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=66880.0, ans=0.125 2023-06-15 07:07:41,938 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=66946.66666666667, ans=0.0 2023-06-15 07:08:06,336 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=67013.33333333333, ans=0.1 2023-06-15 07:08:06,433 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=67013.33333333333, ans=0.1 2023-06-15 07:08:07,076 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.86 vs. limit=15.0 2023-06-15 07:08:08,090 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=67013.33333333333, ans=0.1 2023-06-15 07:08:22,193 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=67080.0, ans=0.1 2023-06-15 07:08:28,621 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=67080.0, ans=0.125 2023-06-15 07:08:37,137 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=67146.66666666667, ans=0.125 2023-06-15 07:08:48,902 INFO [train.py:988] (0/4) Epoch 19, batch 500, loss[loss=0.2448, simple_loss=0.3119, pruned_loss=0.08881, over 20138.00 frames. ], tot_loss[loss=0.2526, simple_loss=0.3217, pruned_loss=0.09175, over 3487726.37 frames. 
], batch size: 133, lr: 1.60e-02, grad_scale: 32.0 2023-06-15 07:08:54,110 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=67213.33333333333, ans=0.125 2023-06-15 07:09:05,862 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=67280.0, ans=0.1 2023-06-15 07:09:15,245 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 07:09:20,307 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2023-06-15 07:09:37,677 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.568e+02 1.943e+02 2.120e+02 2.445e+02 3.405e+02, threshold=4.239e+02, percent-clipped=0.0 2023-06-15 07:09:42,274 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-19.pt 2023-06-15 07:10:07,197 INFO [train.py:988] (0/4) Epoch 20, batch 0, loss[loss=0.2517, simple_loss=0.3186, pruned_loss=0.0924, over 18767.00 frames. ], tot_loss[loss=0.2517, simple_loss=0.3186, pruned_loss=0.0924, over 18767.00 frames. ], batch size: 83, lr: 1.56e-02, grad_scale: 32.0 2023-06-15 07:10:07,198 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 07:10:13,265 INFO [train.py:1020] (0/4) Epoch 20, validation: loss=0.2092, simple_loss=0.3126, pruned_loss=0.05295, over 143649.00 frames. 2023-06-15 07:10:13,267 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 07:10:24,618 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=67433.33333333333, ans=0.125 2023-06-15 07:10:37,057 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=67500.0, ans=0.125 2023-06-15 07:10:47,817 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.48 vs. limit=15.0 2023-06-15 07:11:13,533 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=67633.33333333333, ans=0.035 2023-06-15 07:11:29,351 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=67700.0, ans=0.125 2023-06-15 07:11:31,238 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=67700.0, ans=0.1 2023-06-15 07:11:41,412 INFO [train.py:988] (0/4) Epoch 20, batch 50, loss[loss=0.2503, simple_loss=0.3005, pruned_loss=0.1001, over 19959.00 frames. ], tot_loss[loss=0.25, simple_loss=0.3199, pruned_loss=0.09001, over 847098.29 frames. ], batch size: 293, lr: 1.55e-02, grad_scale: 32.0 2023-06-15 07:12:43,772 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.33 vs. 
limit=12.0 2023-06-15 07:12:55,704 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=68033.33333333333, ans=0.125 2023-06-15 07:13:03,934 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.494e+02 2.103e+02 2.332e+02 2.749e+02 4.592e+02, threshold=4.664e+02, percent-clipped=1.0 2023-06-15 07:13:09,724 INFO [train.py:988] (0/4) Epoch 20, batch 100, loss[loss=0.2407, simple_loss=0.3121, pruned_loss=0.08462, over 19697.00 frames. ], tot_loss[loss=0.2511, simple_loss=0.3205, pruned_loss=0.09082, over 1497899.88 frames. ], batch size: 110, lr: 1.55e-02, grad_scale: 32.0 2023-06-15 07:13:17,646 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.82 vs. limit=15.0 2023-06-15 07:13:34,734 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=68166.66666666667, ans=0.125 2023-06-15 07:13:45,327 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=68233.33333333333, ans=0.1 2023-06-15 07:13:55,893 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=68233.33333333333, ans=0.0 2023-06-15 07:14:06,016 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=68300.0, ans=0.0 2023-06-15 07:14:16,922 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=68300.0, ans=0.1 2023-06-15 07:14:22,075 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=68366.66666666667, ans=0.125 2023-06-15 07:14:26,107 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=68366.66666666667, ans=0.125 2023-06-15 07:14:37,883 INFO [train.py:988] (0/4) Epoch 20, batch 150, loss[loss=0.2395, simple_loss=0.3175, pruned_loss=0.08072, over 18748.00 frames. ], tot_loss[loss=0.2508, simple_loss=0.32, pruned_loss=0.09085, over 2003839.76 frames. ], batch size: 83, lr: 1.55e-02, grad_scale: 32.0 2023-06-15 07:14:41,505 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=68433.33333333333, ans=0.125 2023-06-15 07:15:10,704 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.14 vs. limit=22.5 2023-06-15 07:15:18,217 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=68566.66666666667, ans=0.2 2023-06-15 07:15:59,988 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.562e+02 1.981e+02 2.212e+02 2.517e+02 3.865e+02, threshold=4.424e+02, percent-clipped=0.0 2023-06-15 07:16:05,157 INFO [train.py:988] (0/4) Epoch 20, batch 200, loss[loss=0.2407, simple_loss=0.307, pruned_loss=0.08721, over 20569.00 frames. ], tot_loss[loss=0.251, simple_loss=0.3204, pruned_loss=0.09086, over 2402044.68 frames. ], batch size: 173, lr: 1.55e-02, grad_scale: 32.0 2023-06-15 07:16:39,739 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.66 vs. 
limit=15.0 2023-06-15 07:16:46,993 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.51 vs. limit=10.0 2023-06-15 07:17:07,471 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=68966.66666666667, ans=0.125 2023-06-15 07:17:15,267 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.87 vs. limit=15.0 2023-06-15 07:17:17,718 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=69033.33333333333, ans=0.125 2023-06-15 07:17:21,846 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=69033.33333333333, ans=0.2 2023-06-15 07:17:29,850 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=69033.33333333333, ans=0.2 2023-06-15 07:17:32,287 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=69100.0, ans=0.125 2023-06-15 07:17:33,543 INFO [train.py:988] (0/4) Epoch 20, batch 250, loss[loss=0.261, simple_loss=0.3253, pruned_loss=0.09836, over 20575.00 frames. ], tot_loss[loss=0.251, simple_loss=0.3202, pruned_loss=0.09087, over 2713139.95 frames. ], batch size: 173, lr: 1.54e-02, grad_scale: 32.0 2023-06-15 07:17:57,280 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.35 vs. limit=12.0 2023-06-15 07:18:17,256 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.77 vs. limit=22.5 2023-06-15 07:18:22,251 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=69233.33333333333, ans=0.1 2023-06-15 07:18:27,453 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=69300.0, ans=0.125 2023-06-15 07:18:36,630 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=69300.0, ans=0.0 2023-06-15 07:18:55,597 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.504e+02 1.907e+02 2.171e+02 2.570e+02 4.262e+02, threshold=4.342e+02, percent-clipped=0.0 2023-06-15 07:19:00,688 INFO [train.py:988] (0/4) Epoch 20, batch 300, loss[loss=0.2516, simple_loss=0.3233, pruned_loss=0.08996, over 18807.00 frames. ], tot_loss[loss=0.25, simple_loss=0.3191, pruned_loss=0.09044, over 2955693.73 frames. ], batch size: 83, lr: 1.54e-02, grad_scale: 32.0 2023-06-15 07:19:13,082 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=69433.33333333333, ans=0.0 2023-06-15 07:19:44,743 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=12.0 2023-06-15 07:19:45,857 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=69566.66666666667, ans=0.0 2023-06-15 07:20:20,445 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.72 vs. 
limit=15.0 2023-06-15 07:20:28,554 INFO [train.py:988] (0/4) Epoch 20, batch 350, loss[loss=0.2543, simple_loss=0.3222, pruned_loss=0.09322, over 20288.00 frames. ], tot_loss[loss=0.2498, simple_loss=0.3192, pruned_loss=0.09022, over 3151132.44 frames. ], batch size: 149, lr: 1.54e-02, grad_scale: 32.0 2023-06-15 07:20:49,915 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=69833.33333333333, ans=0.2 2023-06-15 07:20:59,893 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.95 vs. limit=10.0 2023-06-15 07:21:00,686 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=69833.33333333333, ans=0.07 2023-06-15 07:21:36,845 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=70033.33333333333, ans=0.125 2023-06-15 07:21:43,360 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=70033.33333333333, ans=0.0 2023-06-15 07:21:50,095 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.529e+02 2.058e+02 2.277e+02 2.664e+02 3.684e+02, threshold=4.554e+02, percent-clipped=0.0 2023-06-15 07:21:55,239 INFO [train.py:988] (0/4) Epoch 20, batch 400, loss[loss=0.241, simple_loss=0.3121, pruned_loss=0.08497, over 20121.00 frames. ], tot_loss[loss=0.2505, simple_loss=0.3203, pruned_loss=0.09032, over 3270572.55 frames. ], batch size: 133, lr: 1.54e-02, grad_scale: 32.0 2023-06-15 07:22:13,246 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=70166.66666666667, ans=0.125 2023-06-15 07:22:16,493 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.69 vs. limit=15.0 2023-06-15 07:22:35,089 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=70233.33333333333, ans=0.2 2023-06-15 07:22:38,357 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=70233.33333333333, ans=0.125 2023-06-15 07:22:40,707 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.17 vs. limit=15.0 2023-06-15 07:22:45,656 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=70233.33333333333, ans=0.125 2023-06-15 07:22:56,427 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=70300.0, ans=0.125 2023-06-15 07:23:02,218 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=70300.0, ans=0.125 2023-06-15 07:23:23,996 INFO [train.py:988] (0/4) Epoch 20, batch 450, loss[loss=0.261, simple_loss=0.317, pruned_loss=0.1025, over 20219.00 frames. ], tot_loss[loss=0.2503, simple_loss=0.3198, pruned_loss=0.09039, over 3382138.80 frames. 
], batch size: 239, lr: 1.54e-02, grad_scale: 32.0 2023-06-15 07:23:47,763 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=70500.0, ans=0.2 2023-06-15 07:23:52,899 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=70500.0, ans=0.0 2023-06-15 07:23:59,642 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=70566.66666666667, ans=22.5 2023-06-15 07:24:02,396 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0 2023-06-15 07:24:08,157 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=70566.66666666667, ans=0.125 2023-06-15 07:24:24,117 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.26 vs. limit=22.5 2023-06-15 07:24:25,453 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2023-06-15 07:24:44,758 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.638e+02 2.031e+02 2.228e+02 2.537e+02 4.676e+02, threshold=4.456e+02, percent-clipped=1.0 2023-06-15 07:24:47,309 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=12.0 2023-06-15 07:24:48,197 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=70766.66666666667, ans=0.125 2023-06-15 07:24:49,645 INFO [train.py:988] (0/4) Epoch 20, batch 500, loss[loss=0.2626, simple_loss=0.3203, pruned_loss=0.1025, over 20235.00 frames. ], tot_loss[loss=0.2501, simple_loss=0.3195, pruned_loss=0.09032, over 3474093.09 frames. ], batch size: 239, lr: 1.53e-02, grad_scale: 32.0 2023-06-15 07:25:42,103 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-20.pt 2023-06-15 07:26:07,905 INFO [train.py:988] (0/4) Epoch 21, batch 0, loss[loss=0.2472, simple_loss=0.3189, pruned_loss=0.08776, over 16379.00 frames. ], tot_loss[loss=0.2472, simple_loss=0.3189, pruned_loss=0.08776, over 16379.00 frames. ], batch size: 52, lr: 1.49e-02, grad_scale: 32.0 2023-06-15 07:26:07,906 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 07:26:14,424 INFO [train.py:1020] (0/4) Epoch 21, validation: loss=0.209, simple_loss=0.3126, pruned_loss=0.05274, over 143649.00 frames. 2023-06-15 07:26:14,425 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 07:26:15,436 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.05 vs. limit=15.0 2023-06-15 07:26:37,525 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.33 vs. 
limit=15.0 2023-06-15 07:26:47,935 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.689e-03 2023-06-15 07:26:51,016 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=71113.33333333333, ans=0.0 2023-06-15 07:27:06,965 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=71180.0, ans=0.125 2023-06-15 07:27:08,705 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=71180.0, ans=0.2 2023-06-15 07:27:15,609 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=71180.0, ans=0.0 2023-06-15 07:27:31,419 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=71246.66666666667, ans=0.125 2023-06-15 07:27:41,164 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.03 vs. limit=15.0 2023-06-15 07:27:41,939 INFO [train.py:988] (0/4) Epoch 21, batch 50, loss[loss=0.227, simple_loss=0.2897, pruned_loss=0.08215, over 20515.00 frames. ], tot_loss[loss=0.2508, simple_loss=0.3219, pruned_loss=0.08984, over 861331.27 frames. ], batch size: 173, lr: 1.49e-02, grad_scale: 32.0 2023-06-15 07:27:49,084 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=71313.33333333333, ans=0.1 2023-06-15 07:27:49,104 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=71313.33333333333, ans=0.125 2023-06-15 07:27:56,180 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=71313.33333333333, ans=0.0 2023-06-15 07:28:07,647 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+02 2.043e+02 2.374e+02 2.878e+02 4.060e+02, threshold=4.748e+02, percent-clipped=0.0 2023-06-15 07:28:28,063 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=71446.66666666667, ans=0.2 2023-06-15 07:28:36,422 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=71513.33333333333, ans=0.125 2023-06-15 07:28:55,226 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=71580.0, ans=0.0 2023-06-15 07:29:09,317 INFO [train.py:988] (0/4) Epoch 21, batch 100, loss[loss=0.2311, simple_loss=0.3035, pruned_loss=0.07941, over 20249.00 frames. ], tot_loss[loss=0.2484, simple_loss=0.3208, pruned_loss=0.08801, over 1496771.58 frames. 
], batch size: 141, lr: 1.49e-02, grad_scale: 32.0 2023-06-15 07:29:59,621 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=71846.66666666667, ans=0.1 2023-06-15 07:30:07,761 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=71846.66666666667, ans=0.125 2023-06-15 07:30:11,112 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=71846.66666666667, ans=0.1 2023-06-15 07:30:14,991 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=71846.66666666667, ans=0.125 2023-06-15 07:30:32,587 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=71913.33333333333, ans=0.95 2023-06-15 07:30:35,470 INFO [train.py:988] (0/4) Epoch 21, batch 150, loss[loss=0.2439, simple_loss=0.3136, pruned_loss=0.08709, over 20508.00 frames. ], tot_loss[loss=0.2463, simple_loss=0.3188, pruned_loss=0.08687, over 1991974.02 frames. ], batch size: 173, lr: 1.49e-02, grad_scale: 32.0 2023-06-15 07:30:42,025 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=71980.0, ans=0.125 2023-06-15 07:31:01,908 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.683e+02 2.007e+02 2.293e+02 2.740e+02 3.931e+02, threshold=4.586e+02, percent-clipped=0.0 2023-06-15 07:31:16,643 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=72113.33333333333, ans=0.125 2023-06-15 07:31:20,611 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.69 vs. limit=15.0 2023-06-15 07:31:27,008 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=72180.0, ans=0.0 2023-06-15 07:31:47,216 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=72246.66666666667, ans=0.125 2023-06-15 07:32:02,953 INFO [train.py:988] (0/4) Epoch 21, batch 200, loss[loss=0.2223, simple_loss=0.3016, pruned_loss=0.07147, over 18808.00 frames. ], tot_loss[loss=0.2478, simple_loss=0.3182, pruned_loss=0.08874, over 2395842.44 frames. ], batch size: 83, lr: 1.49e-02, grad_scale: 32.0 2023-06-15 07:32:06,613 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=72313.33333333333, ans=0.1 2023-06-15 07:32:20,492 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=72380.0, ans=0.1 2023-06-15 07:32:37,243 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.86 vs. 
limit=15.0 2023-06-15 07:32:42,938 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=72446.66666666667, ans=0.1 2023-06-15 07:32:50,163 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-06-15 07:32:55,611 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=72513.33333333333, ans=0.125 2023-06-15 07:33:13,172 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.22 vs. limit=22.5 2023-06-15 07:33:25,136 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.07 vs. limit=22.5 2023-06-15 07:33:29,794 INFO [train.py:988] (0/4) Epoch 21, batch 250, loss[loss=0.2305, simple_loss=0.3115, pruned_loss=0.07472, over 19331.00 frames. ], tot_loss[loss=0.2476, simple_loss=0.3177, pruned_loss=0.08876, over 2717491.48 frames. ], batch size: 98, lr: 1.48e-02, grad_scale: 32.0 2023-06-15 07:33:29,989 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=72646.66666666667, ans=0.2 2023-06-15 07:33:51,669 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-06-15 07:33:54,719 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.599e+02 1.979e+02 2.224e+02 2.730e+02 4.525e+02, threshold=4.448e+02, percent-clipped=0.0 2023-06-15 07:34:07,711 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=72780.0, ans=0.125 2023-06-15 07:34:14,097 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=72780.0, ans=0.0 2023-06-15 07:34:35,168 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=72846.66666666667, ans=0.125 2023-06-15 07:34:52,084 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=15.0 2023-06-15 07:34:55,959 INFO [train.py:988] (0/4) Epoch 21, batch 300, loss[loss=0.2696, simple_loss=0.349, pruned_loss=0.09509, over 16888.00 frames. ], tot_loss[loss=0.2479, simple_loss=0.318, pruned_loss=0.08885, over 2948960.93 frames. ], batch size: 60, lr: 1.48e-02, grad_scale: 32.0 2023-06-15 07:35:10,854 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.97 vs. 
limit=12.0 2023-06-15 07:35:27,434 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=73046.66666666667, ans=0.125 2023-06-15 07:35:29,169 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=73113.33333333333, ans=0.0 2023-06-15 07:35:34,745 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=73113.33333333333, ans=0.0 2023-06-15 07:35:44,235 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=73113.33333333333, ans=0.125 2023-06-15 07:35:50,669 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=73180.0, ans=0.1 2023-06-15 07:35:57,404 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=73180.0, ans=0.0 2023-06-15 07:36:20,565 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=73246.66666666667, ans=0.5 2023-06-15 07:36:23,384 INFO [train.py:988] (0/4) Epoch 21, batch 350, loss[loss=0.2482, simple_loss=0.2747, pruned_loss=0.1108, over 17128.00 frames. ], tot_loss[loss=0.248, simple_loss=0.3175, pruned_loss=0.08929, over 3123618.47 frames. ], batch size: 391, lr: 1.48e-02, grad_scale: 32.0 2023-06-15 07:36:37,617 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=73313.33333333333, ans=0.04949747468305833 2023-06-15 07:36:38,316 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.55 vs. limit=22.5 2023-06-15 07:36:49,398 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.513e+02 2.024e+02 2.303e+02 3.042e+02 4.564e+02, threshold=4.607e+02, percent-clipped=2.0 2023-06-15 07:37:32,495 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=73580.0, ans=0.125 2023-06-15 07:37:39,454 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 07:37:49,165 INFO [train.py:988] (0/4) Epoch 21, batch 400, loss[loss=0.263, simple_loss=0.2943, pruned_loss=0.1158, over 16795.00 frames. ], tot_loss[loss=0.2488, simple_loss=0.3179, pruned_loss=0.08982, over 3267970.87 frames. ], batch size: 391, lr: 1.48e-02, grad_scale: 32.0 2023-06-15 07:37:49,775 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.55 vs. 
limit=15.0 2023-06-15 07:37:58,627 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=73646.66666666667, ans=0.0 2023-06-15 07:38:00,435 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=73646.66666666667, ans=0.125 2023-06-15 07:38:14,140 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=73713.33333333333, ans=10.0 2023-06-15 07:38:31,160 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=73780.0, ans=0.125 2023-06-15 07:38:59,010 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=73913.33333333333, ans=0.1 2023-06-15 07:39:08,274 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.25 vs. limit=10.0 2023-06-15 07:39:14,172 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 07:39:15,483 INFO [train.py:988] (0/4) Epoch 21, batch 450, loss[loss=0.2334, simple_loss=0.3047, pruned_loss=0.08101, over 19303.00 frames. ], tot_loss[loss=0.2475, simple_loss=0.3169, pruned_loss=0.08908, over 3383981.04 frames. ], batch size: 98, lr: 1.47e-02, grad_scale: 32.0 2023-06-15 07:39:17,720 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=73980.0, ans=0.125 2023-06-15 07:39:34,831 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.97 vs. limit=15.0 2023-06-15 07:39:42,139 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.623e+02 2.035e+02 2.412e+02 2.747e+02 5.192e+02, threshold=4.824e+02, percent-clipped=2.0 2023-06-15 07:39:45,738 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=74046.66666666667, ans=0.125 2023-06-15 07:39:53,199 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=74113.33333333333, ans=0.125 2023-06-15 07:39:54,874 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=74113.33333333333, ans=0.1 2023-06-15 07:40:10,215 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.79 vs. limit=10.0 2023-06-15 07:40:11,583 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.98 vs. limit=15.0 2023-06-15 07:40:40,642 INFO [train.py:988] (0/4) Epoch 21, batch 500, loss[loss=0.2515, simple_loss=0.321, pruned_loss=0.09103, over 20442.00 frames. ], tot_loss[loss=0.2476, simple_loss=0.3163, pruned_loss=0.08948, over 3475279.05 frames. 
], batch size: 160, lr: 1.47e-02, grad_scale: 32.0 2023-06-15 07:40:50,811 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=74313.33333333333, ans=0.1 2023-06-15 07:41:00,562 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=74380.0, ans=0.125 2023-06-15 07:41:19,879 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=74446.66666666667, ans=0.125 2023-06-15 07:41:31,478 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.61 vs. limit=15.0 2023-06-15 07:41:33,897 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-21.pt 2023-06-15 07:42:00,896 INFO [train.py:988] (0/4) Epoch 22, batch 0, loss[loss=0.2534, simple_loss=0.3145, pruned_loss=0.09621, over 20572.00 frames. ], tot_loss[loss=0.2534, simple_loss=0.3145, pruned_loss=0.09621, over 20572.00 frames. ], batch size: 173, lr: 1.44e-02, grad_scale: 32.0 2023-06-15 07:42:00,897 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 07:42:07,057 INFO [train.py:1020] (0/4) Epoch 22, validation: loss=0.2075, simple_loss=0.3107, pruned_loss=0.05212, over 143649.00 frames. 2023-06-15 07:42:07,057 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 07:42:07,621 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=74533.33333333333, ans=0.125 2023-06-15 07:42:18,967 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=74533.33333333333, ans=0.05 2023-06-15 07:42:23,955 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=74600.0, ans=0.0 2023-06-15 07:43:03,106 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.484e+02 1.996e+02 2.190e+02 2.519e+02 3.668e+02, threshold=4.380e+02, percent-clipped=0.0 2023-06-15 07:43:05,239 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=74733.33333333333, ans=0.125 2023-06-15 07:43:28,564 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=74800.0, ans=0.125 2023-06-15 07:43:35,409 INFO [train.py:988] (0/4) Epoch 22, batch 50, loss[loss=0.2395, simple_loss=0.313, pruned_loss=0.083, over 19802.00 frames. ], tot_loss[loss=0.247, simple_loss=0.315, pruned_loss=0.08952, over 861805.21 frames. ], batch size: 115, lr: 1.43e-02, grad_scale: 32.0 2023-06-15 07:44:05,909 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=15.0 2023-06-15 07:44:31,571 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.32 vs. limit=22.5 2023-06-15 07:44:35,946 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=75066.66666666667, ans=0.0 2023-06-15 07:45:02,459 INFO [train.py:988] (0/4) Epoch 22, batch 100, loss[loss=0.2282, simple_loss=0.299, pruned_loss=0.07874, over 19671.00 frames. ], tot_loss[loss=0.2457, simple_loss=0.3164, pruned_loss=0.08743, over 1514060.76 frames. 
], batch size: 110, lr: 1.43e-02, grad_scale: 32.0 2023-06-15 07:45:06,143 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=75200.0, ans=0.09899494936611666 2023-06-15 07:45:13,704 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. limit=10.0 2023-06-15 07:45:23,694 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=75266.66666666667, ans=0.0 2023-06-15 07:45:35,833 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0 2023-06-15 07:45:35,922 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=28.80 vs. limit=15.0 2023-06-15 07:45:40,780 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=75333.33333333333, ans=0.5 2023-06-15 07:45:49,177 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=75333.33333333333, ans=0.1 2023-06-15 07:45:59,418 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.531e+02 1.974e+02 2.218e+02 2.504e+02 3.922e+02, threshold=4.437e+02, percent-clipped=0.0 2023-06-15 07:46:07,510 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=12.0 2023-06-15 07:46:18,270 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=75466.66666666667, ans=0.125 2023-06-15 07:46:27,007 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=75466.66666666667, ans=0.0 2023-06-15 07:46:29,303 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.24 vs. limit=15.0 2023-06-15 07:46:31,638 INFO [train.py:988] (0/4) Epoch 22, batch 150, loss[loss=0.2487, simple_loss=0.3096, pruned_loss=0.09391, over 20303.00 frames. ], tot_loss[loss=0.2462, simple_loss=0.3158, pruned_loss=0.08825, over 2009212.69 frames. ], batch size: 149, lr: 1.43e-02, grad_scale: 32.0 2023-06-15 07:46:42,431 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=75533.33333333333, ans=0.125 2023-06-15 07:46:46,287 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=12.0 2023-06-15 07:46:46,344 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2023-06-15 07:47:06,175 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.76 vs. limit=22.5 2023-06-15 07:47:41,826 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=75800.0, ans=0.0 2023-06-15 07:48:00,061 INFO [train.py:988] (0/4) Epoch 22, batch 200, loss[loss=0.2352, simple_loss=0.3109, pruned_loss=0.07979, over 19823.00 frames. 
], tot_loss[loss=0.2455, simple_loss=0.3158, pruned_loss=0.08759, over 2403326.44 frames. ], batch size: 115, lr: 1.43e-02, grad_scale: 32.0 2023-06-15 07:48:24,048 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.14 vs. limit=15.0 2023-06-15 07:48:26,873 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0 2023-06-15 07:48:30,429 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=75933.33333333333, ans=0.125 2023-06-15 07:48:45,006 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-06-15 07:48:55,142 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=76066.66666666667, ans=0.125 2023-06-15 07:48:56,262 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.537e+02 1.899e+02 2.203e+02 2.409e+02 3.907e+02, threshold=4.406e+02, percent-clipped=0.0 2023-06-15 07:48:57,316 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=76066.66666666667, ans=0.1 2023-06-15 07:49:01,335 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.36 vs. limit=15.0 2023-06-15 07:49:16,327 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=76133.33333333333, ans=0.1 2023-06-15 07:49:27,455 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=76200.0, ans=0.0 2023-06-15 07:49:29,219 INFO [train.py:988] (0/4) Epoch 22, batch 250, loss[loss=0.2632, simple_loss=0.3492, pruned_loss=0.08866, over 15469.00 frames. ], tot_loss[loss=0.2448, simple_loss=0.3147, pruned_loss=0.08743, over 2708138.13 frames. ], batch size: 44, lr: 1.43e-02, grad_scale: 64.0 2023-06-15 07:49:34,642 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=76200.0, ans=0.125 2023-06-15 07:49:57,064 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=76266.66666666667, ans=0.125 2023-06-15 07:50:03,551 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.58 vs. limit=15.0 2023-06-15 07:50:37,935 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=76466.66666666667, ans=0.0 2023-06-15 07:50:52,098 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=76466.66666666667, ans=0.125 2023-06-15 07:50:57,239 INFO [train.py:988] (0/4) Epoch 22, batch 300, loss[loss=0.2578, simple_loss=0.3149, pruned_loss=0.1004, over 19858.00 frames. ], tot_loss[loss=0.2448, simple_loss=0.3147, pruned_loss=0.08749, over 2948361.70 frames. 
], batch size: 294, lr: 1.42e-02, grad_scale: 64.0 2023-06-15 07:51:12,907 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=76600.0, ans=0.2 2023-06-15 07:51:16,975 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=76600.0, ans=0.1 2023-06-15 07:51:26,525 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=76600.0, ans=0.0 2023-06-15 07:51:43,909 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=76666.66666666667, ans=0.0 2023-06-15 07:51:53,162 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.638e+02 1.887e+02 2.089e+02 2.496e+02 3.491e+02, threshold=4.177e+02, percent-clipped=0.0 2023-06-15 07:52:23,191 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=76866.66666666667, ans=0.125 2023-06-15 07:52:24,344 INFO [train.py:988] (0/4) Epoch 22, batch 350, loss[loss=0.2537, simple_loss=0.3315, pruned_loss=0.08795, over 15147.00 frames. ], tot_loss[loss=0.2441, simple_loss=0.3144, pruned_loss=0.08685, over 3137411.43 frames. ], batch size: 43, lr: 1.42e-02, grad_scale: 64.0 2023-06-15 07:52:28,220 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=76866.66666666667, ans=0.1 2023-06-15 07:53:26,280 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=77066.66666666667, ans=0.125 2023-06-15 07:53:30,053 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=77066.66666666667, ans=0.0 2023-06-15 07:53:54,159 INFO [train.py:988] (0/4) Epoch 22, batch 400, loss[loss=0.2455, simple_loss=0.315, pruned_loss=0.08801, over 19648.00 frames. ], tot_loss[loss=0.2438, simple_loss=0.3143, pruned_loss=0.08669, over 3306201.11 frames. ], batch size: 110, lr: 1.42e-02, grad_scale: 64.0 2023-06-15 07:53:56,780 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=77200.0, ans=0.0 2023-06-15 07:54:18,041 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=77266.66666666667, ans=0.2 2023-06-15 07:54:32,460 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=77333.33333333333, ans=0.125 2023-06-15 07:54:39,014 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=77333.33333333333, ans=0.0 2023-06-15 07:54:47,777 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 07:54:50,821 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.618e+02 1.936e+02 2.140e+02 2.519e+02 4.231e+02, threshold=4.281e+02, percent-clipped=1.0 2023-06-15 07:54:56,370 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.66 vs. 
limit=22.5 2023-06-15 07:54:59,079 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=77400.0, ans=0.125 2023-06-15 07:55:03,184 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.20 vs. limit=10.0 2023-06-15 07:55:14,037 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=77466.66666666667, ans=0.09899494936611666 2023-06-15 07:55:19,504 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=77466.66666666667, ans=0.0 2023-06-15 07:55:22,554 INFO [train.py:988] (0/4) Epoch 22, batch 450, loss[loss=0.2557, simple_loss=0.3195, pruned_loss=0.09594, over 20622.00 frames. ], tot_loss[loss=0.2443, simple_loss=0.315, pruned_loss=0.08677, over 3392119.54 frames. ], batch size: 173, lr: 1.42e-02, grad_scale: 64.0 2023-06-15 07:55:52,573 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=77600.0, ans=0.5 2023-06-15 07:56:04,162 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 07:56:18,612 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.11 vs. limit=15.0 2023-06-15 07:56:46,432 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=77800.0, ans=0.125 2023-06-15 07:56:49,258 INFO [train.py:988] (0/4) Epoch 22, batch 500, loss[loss=0.2432, simple_loss=0.3174, pruned_loss=0.08454, over 19081.00 frames. ], tot_loss[loss=0.2448, simple_loss=0.3157, pruned_loss=0.08698, over 3461584.78 frames. ], batch size: 89, lr: 1.42e-02, grad_scale: 64.0 2023-06-15 07:57:03,372 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=77866.66666666667, ans=0.125 2023-06-15 07:57:29,675 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=78000.0, ans=0.125 2023-06-15 07:57:38,907 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=78066.66666666667, ans=0.125 2023-06-15 07:57:39,394 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.10 vs. limit=22.5 2023-06-15 07:57:41,527 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-22.pt 2023-06-15 07:58:03,422 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.50 vs. limit=15.0 2023-06-15 07:58:03,840 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.539e+02 1.901e+02 2.129e+02 2.481e+02 3.635e+02, threshold=4.258e+02, percent-clipped=0.0 2023-06-15 07:58:03,890 INFO [train.py:988] (0/4) Epoch 23, batch 0, loss[loss=0.2418, simple_loss=0.325, pruned_loss=0.07935, over 15192.00 frames. ], tot_loss[loss=0.2418, simple_loss=0.325, pruned_loss=0.07935, over 15192.00 frames. 
], batch size: 43, lr: 1.38e-02, grad_scale: 64.0 2023-06-15 07:58:03,891 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 07:58:10,166 INFO [train.py:1020] (0/4) Epoch 23, validation: loss=0.2051, simple_loss=0.3092, pruned_loss=0.05051, over 143649.00 frames. 2023-06-15 07:58:10,166 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 07:58:14,000 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=78080.0, ans=0.0 2023-06-15 07:58:16,281 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=78080.0, ans=0.125 2023-06-15 07:58:41,854 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=78146.66666666667, ans=0.0 2023-06-15 07:58:46,298 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.28 vs. limit=15.0 2023-06-15 07:59:03,225 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=78280.0, ans=0.125 2023-06-15 07:59:09,953 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=78280.0, ans=0.125 2023-06-15 07:59:39,926 INFO [train.py:988] (0/4) Epoch 23, batch 50, loss[loss=0.2424, simple_loss=0.3152, pruned_loss=0.08477, over 20317.00 frames. ], tot_loss[loss=0.243, simple_loss=0.3151, pruned_loss=0.08544, over 856075.40 frames. ], batch size: 149, lr: 1.38e-02, grad_scale: 64.0 2023-06-15 07:59:46,951 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=78413.33333333333, ans=0.1 2023-06-15 08:00:00,413 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=78480.0, ans=0.2 2023-06-15 08:00:13,586 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=78480.0, ans=0.0 2023-06-15 08:00:20,522 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=78546.66666666667, ans=0.09899494936611666 2023-06-15 08:00:55,306 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=78680.0, ans=0.125 2023-06-15 08:00:57,176 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=78680.0, ans=0.125 2023-06-15 08:01:10,775 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.600e+02 1.902e+02 2.065e+02 2.419e+02 3.199e+02, threshold=4.129e+02, percent-clipped=0.0 2023-06-15 08:01:10,821 INFO [train.py:988] (0/4) Epoch 23, batch 100, loss[loss=0.2478, simple_loss=0.325, pruned_loss=0.0853, over 18613.00 frames. ], tot_loss[loss=0.2433, simple_loss=0.3144, pruned_loss=0.08613, over 1507149.31 frames. 
], batch size: 80, lr: 1.38e-02, grad_scale: 64.0 2023-06-15 08:01:16,229 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=78746.66666666667, ans=0.025 2023-06-15 08:01:17,945 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 08:01:28,389 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=78813.33333333333, ans=0.125 2023-06-15 08:01:30,342 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=78813.33333333333, ans=0.1 2023-06-15 08:01:41,614 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 08:02:28,821 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.69 vs. limit=15.0 2023-06-15 08:02:31,972 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=79013.33333333333, ans=0.125 2023-06-15 08:02:40,318 INFO [train.py:988] (0/4) Epoch 23, batch 150, loss[loss=0.2233, simple_loss=0.3036, pruned_loss=0.07146, over 19317.00 frames. ], tot_loss[loss=0.2409, simple_loss=0.3124, pruned_loss=0.08473, over 2019138.70 frames. ], batch size: 98, lr: 1.38e-02, grad_scale: 64.0 2023-06-15 08:02:58,340 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0 2023-06-15 08:03:46,496 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=79280.0, ans=0.125 2023-06-15 08:04:09,176 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.427e+02 1.913e+02 2.150e+02 2.448e+02 4.177e+02, threshold=4.300e+02, percent-clipped=1.0 2023-06-15 08:04:09,223 INFO [train.py:988] (0/4) Epoch 23, batch 200, loss[loss=0.2351, simple_loss=0.3016, pruned_loss=0.08428, over 20644.00 frames. ], tot_loss[loss=0.2416, simple_loss=0.313, pruned_loss=0.08509, over 2412706.82 frames. ], batch size: 189, lr: 1.37e-02, grad_scale: 64.0 2023-06-15 08:04:16,971 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.51 vs. 
limit=22.5 2023-06-15 08:04:26,458 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=79480.0, ans=0.1 2023-06-15 08:05:10,412 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=79613.33333333333, ans=0.1 2023-06-15 08:05:10,699 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 08:05:12,440 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=79613.33333333333, ans=0.0 2023-06-15 08:05:32,518 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=79680.0, ans=0.0 2023-06-15 08:05:36,552 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=79746.66666666667, ans=0.2 2023-06-15 08:05:37,750 INFO [train.py:988] (0/4) Epoch 23, batch 250, loss[loss=0.2527, simple_loss=0.3295, pruned_loss=0.08793, over 18413.00 frames. ], tot_loss[loss=0.2414, simple_loss=0.3132, pruned_loss=0.0848, over 2713806.83 frames. ], batch size: 77, lr: 1.37e-02, grad_scale: 64.0 2023-06-15 08:05:56,544 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=79813.33333333333, ans=0.125 2023-06-15 08:06:09,699 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.47 vs. limit=15.0 2023-06-15 08:06:34,331 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=79946.66666666667, ans=0.125 2023-06-15 08:06:37,407 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=79946.66666666667, ans=0.0 2023-06-15 08:06:40,881 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=79946.66666666667, ans=0.1 2023-06-15 08:06:42,665 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/checkpoint-12000.pt 2023-06-15 08:07:06,547 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 08:07:10,029 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.585e+02 1.868e+02 2.154e+02 2.660e+02 5.559e+02, threshold=4.308e+02, percent-clipped=2.0 2023-06-15 08:07:10,076 INFO [train.py:988] (0/4) Epoch 23, batch 300, loss[loss=0.2524, simple_loss=0.3198, pruned_loss=0.09249, over 20447.00 frames. ], tot_loss[loss=0.2414, simple_loss=0.3136, pruned_loss=0.08462, over 2953363.24 frames. ], batch size: 160, lr: 1.37e-02, grad_scale: 64.0 2023-06-15 08:07:53,177 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=80213.33333333333, ans=0.1 2023-06-15 08:08:31,692 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=80346.66666666667, ans=0.125 2023-06-15 08:08:38,474 INFO [train.py:988] (0/4) Epoch 23, batch 350, loss[loss=0.2405, simple_loss=0.3023, pruned_loss=0.08937, over 20278.00 frames. ], tot_loss[loss=0.2412, simple_loss=0.3132, pruned_loss=0.08464, over 3121070.43 frames. 
], batch size: 239, lr: 1.37e-02, grad_scale: 64.0 2023-06-15 08:08:47,363 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=80413.33333333333, ans=0.0 2023-06-15 08:08:50,497 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=80413.33333333333, ans=0.0 2023-06-15 08:09:05,520 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=80480.0, ans=0.125 2023-06-15 08:09:16,855 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=80546.66666666667, ans=0.125 2023-06-15 08:09:32,248 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=80613.33333333333, ans=0.2 2023-06-15 08:09:47,829 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=80680.0, ans=0.125 2023-06-15 08:10:05,632 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.477e+02 1.945e+02 2.149e+02 2.617e+02 4.906e+02, threshold=4.297e+02, percent-clipped=1.0 2023-06-15 08:10:05,678 INFO [train.py:988] (0/4) Epoch 23, batch 400, loss[loss=0.2388, simple_loss=0.3188, pruned_loss=0.07937, over 18334.00 frames. ], tot_loss[loss=0.2413, simple_loss=0.3135, pruned_loss=0.08454, over 3271066.18 frames. ], batch size: 74, lr: 1.37e-02, grad_scale: 64.0 2023-06-15 08:10:46,587 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=80880.0, ans=0.0 2023-06-15 08:11:34,898 INFO [train.py:988] (0/4) Epoch 23, batch 450, loss[loss=0.2404, simple_loss=0.3055, pruned_loss=0.0877, over 20593.00 frames. ], tot_loss[loss=0.2418, simple_loss=0.3141, pruned_loss=0.0847, over 3384939.82 frames. ], batch size: 189, lr: 1.36e-02, grad_scale: 64.0 2023-06-15 08:11:35,106 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=81080.0, ans=0.1 2023-06-15 08:12:15,791 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=81213.33333333333, ans=0.1 2023-06-15 08:12:39,971 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.78 vs. limit=15.0 2023-06-15 08:12:58,052 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 08:12:59,902 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=81413.33333333333, ans=0.0 2023-06-15 08:13:01,014 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.601e+02 2.014e+02 2.230e+02 2.721e+02 4.611e+02, threshold=4.461e+02, percent-clipped=1.0 2023-06-15 08:13:01,062 INFO [train.py:988] (0/4) Epoch 23, batch 500, loss[loss=0.2488, simple_loss=0.326, pruned_loss=0.08577, over 18271.00 frames. ], tot_loss[loss=0.241, simple_loss=0.3136, pruned_loss=0.08425, over 3479135.05 frames. ], batch size: 74, lr: 1.36e-02, grad_scale: 64.0 2023-06-15 08:13:03,890 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.78 vs. 
limit=15.0 2023-06-15 08:13:36,448 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=81546.66666666667, ans=0.09899494936611666 2023-06-15 08:13:54,244 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-23.pt 2023-06-15 08:14:20,676 INFO [train.py:988] (0/4) Epoch 24, batch 0, loss[loss=0.2771, simple_loss=0.3593, pruned_loss=0.09747, over 15206.00 frames. ], tot_loss[loss=0.2771, simple_loss=0.3593, pruned_loss=0.09747, over 15206.00 frames. ], batch size: 43, lr: 1.33e-02, grad_scale: 64.0 2023-06-15 08:14:20,678 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 08:14:27,199 INFO [train.py:1020] (0/4) Epoch 24, validation: loss=0.2057, simple_loss=0.3089, pruned_loss=0.05123, over 143649.00 frames. 2023-06-15 08:14:27,200 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 08:14:58,819 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=81693.33333333333, ans=0.1 2023-06-15 08:15:04,253 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=81760.0, ans=0.125 2023-06-15 08:15:23,517 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=15.0 2023-06-15 08:15:35,605 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=81826.66666666667, ans=15.0 2023-06-15 08:15:40,279 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=81893.33333333333, ans=0.0 2023-06-15 08:15:48,554 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=81893.33333333333, ans=0.125 2023-06-15 08:15:57,117 INFO [train.py:988] (0/4) Epoch 24, batch 50, loss[loss=0.254, simple_loss=0.3157, pruned_loss=0.09616, over 20108.00 frames. ], tot_loss[loss=0.236, simple_loss=0.3098, pruned_loss=0.08115, over 873637.13 frames. ], batch size: 239, lr: 1.33e-02, grad_scale: 64.0 2023-06-15 08:16:08,623 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 08:16:11,382 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.55 vs. limit=22.5 2023-06-15 08:16:24,493 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=82026.66666666667, ans=0.1 2023-06-15 08:16:24,499 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=82026.66666666667, ans=0.0 2023-06-15 08:16:29,077 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.509e+02 1.940e+02 2.249e+02 2.625e+02 3.999e+02, threshold=4.499e+02, percent-clipped=0.0 2023-06-15 08:17:00,188 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=82160.0, ans=0.125 2023-06-15 08:17:25,776 INFO [train.py:988] (0/4) Epoch 24, batch 100, loss[loss=0.2624, simple_loss=0.3442, pruned_loss=0.09035, over 18325.00 frames. ], tot_loss[loss=0.2379, simple_loss=0.3117, pruned_loss=0.08203, over 1523238.75 frames. 
], batch size: 72, lr: 1.33e-02, grad_scale: 64.0 2023-06-15 08:18:16,216 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 08:18:45,627 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.82 vs. limit=12.0 2023-06-15 08:18:54,699 INFO [train.py:988] (0/4) Epoch 24, batch 150, loss[loss=0.2307, simple_loss=0.3066, pruned_loss=0.07741, over 18919.00 frames. ], tot_loss[loss=0.2396, simple_loss=0.3121, pruned_loss=0.08355, over 2010420.46 frames. ], batch size: 86, lr: 1.33e-02, grad_scale: 64.0 2023-06-15 08:18:55,060 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=82626.66666666667, ans=0.125 2023-06-15 08:19:19,220 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.55 vs. limit=22.5 2023-06-15 08:19:27,041 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.579e+02 1.862e+02 2.076e+02 2.332e+02 3.767e+02, threshold=4.152e+02, percent-clipped=0.0 2023-06-15 08:19:57,024 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=82826.66666666667, ans=0.0 2023-06-15 08:19:57,229 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=82826.66666666667, ans=0.0 2023-06-15 08:20:17,350 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=82893.33333333333, ans=0.0 2023-06-15 08:20:19,239 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=82893.33333333333, ans=0.025 2023-06-15 08:20:19,309 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=82893.33333333333, ans=0.125 2023-06-15 08:20:22,781 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=82960.0, ans=0.125 2023-06-15 08:20:24,124 INFO [train.py:988] (0/4) Epoch 24, batch 200, loss[loss=0.2376, simple_loss=0.3073, pruned_loss=0.08394, over 20123.00 frames. ], tot_loss[loss=0.2407, simple_loss=0.3131, pruned_loss=0.08411, over 2395663.18 frames. ], batch size: 133, lr: 1.32e-02, grad_scale: 64.0 2023-06-15 08:20:32,758 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=82960.0, ans=0.1 2023-06-15 08:20:40,260 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=83026.66666666667, ans=0.125 2023-06-15 08:21:04,235 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0 2023-06-15 08:21:29,303 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.54 vs. limit=15.0 2023-06-15 08:21:33,025 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. 
limit=6.0 2023-06-15 08:21:36,104 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=83226.66666666667, ans=0.0 2023-06-15 08:21:53,197 INFO [train.py:988] (0/4) Epoch 24, batch 250, loss[loss=0.2474, simple_loss=0.3306, pruned_loss=0.08206, over 16403.00 frames. ], tot_loss[loss=0.2396, simple_loss=0.3119, pruned_loss=0.08361, over 2681144.28 frames. ], batch size: 52, lr: 1.32e-02, grad_scale: 64.0 2023-06-15 08:22:24,751 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.498e+02 1.954e+02 2.132e+02 2.511e+02 4.253e+02, threshold=4.265e+02, percent-clipped=1.0 2023-06-15 08:22:25,259 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=83360.0, ans=0.125 2023-06-15 08:22:55,709 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.25 vs. limit=15.0 2023-06-15 08:23:01,889 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=83560.0, ans=0.2 2023-06-15 08:23:17,683 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.07 vs. limit=10.0 2023-06-15 08:23:21,359 INFO [train.py:988] (0/4) Epoch 24, batch 300, loss[loss=0.2226, simple_loss=0.2982, pruned_loss=0.07348, over 19538.00 frames. ], tot_loss[loss=0.2394, simple_loss=0.3116, pruned_loss=0.08356, over 2930527.45 frames. ], batch size: 102, lr: 1.32e-02, grad_scale: 64.0 2023-06-15 08:23:42,744 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-06-15 08:23:49,365 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=83693.33333333333, ans=0.1 2023-06-15 08:24:14,787 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=83826.66666666667, ans=0.2 2023-06-15 08:24:23,305 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=12.0 2023-06-15 08:24:34,660 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=83893.33333333333, ans=22.5 2023-06-15 08:24:38,589 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=83893.33333333333, ans=0.125 2023-06-15 08:24:50,649 INFO [train.py:988] (0/4) Epoch 24, batch 350, loss[loss=0.2184, simple_loss=0.3013, pruned_loss=0.0678, over 18669.00 frames. ], tot_loss[loss=0.2386, simple_loss=0.3109, pruned_loss=0.08313, over 3118822.52 frames. 
], batch size: 80, lr: 1.32e-02, grad_scale: 64.0 2023-06-15 08:25:12,795 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=84026.66666666667, ans=0.0 2023-06-15 08:25:22,580 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.651e+02 1.988e+02 2.360e+02 2.767e+02 4.250e+02, threshold=4.720e+02, percent-clipped=0.0 2023-06-15 08:25:32,664 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=84093.33333333333, ans=0.125 2023-06-15 08:25:48,030 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=84160.0, ans=0.1 2023-06-15 08:25:59,380 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=84160.0, ans=0.125 2023-06-15 08:26:09,768 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=84226.66666666667, ans=0.125 2023-06-15 08:26:21,076 INFO [train.py:988] (0/4) Epoch 24, batch 400, loss[loss=0.2298, simple_loss=0.3031, pruned_loss=0.07831, over 19450.00 frames. ], tot_loss[loss=0.2378, simple_loss=0.3105, pruned_loss=0.08255, over 3282467.09 frames. ], batch size: 105, lr: 1.32e-02, grad_scale: 64.0 2023-06-15 08:26:29,991 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=84293.33333333333, ans=0.2 2023-06-15 08:26:33,078 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=84293.33333333333, ans=0.125 2023-06-15 08:27:16,276 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=84493.33333333333, ans=0.125 2023-06-15 08:27:26,656 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=84493.33333333333, ans=0.1 2023-06-15 08:27:46,536 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=84560.0, ans=0.125 2023-06-15 08:27:49,970 INFO [train.py:988] (0/4) Epoch 24, batch 450, loss[loss=0.2248, simple_loss=0.3031, pruned_loss=0.07322, over 19703.00 frames. ], tot_loss[loss=0.2382, simple_loss=0.311, pruned_loss=0.08271, over 3390319.07 frames. ], batch size: 110, lr: 1.31e-02, grad_scale: 64.0 2023-06-15 08:28:00,596 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=84626.66666666667, ans=0.125 2023-06-15 08:28:14,044 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=84693.33333333333, ans=0.2 2023-06-15 08:28:21,504 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.485e+02 1.787e+02 2.025e+02 2.287e+02 3.179e+02, threshold=4.050e+02, percent-clipped=0.0 2023-06-15 08:29:06,304 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.94 vs. limit=15.0 2023-06-15 08:29:15,502 INFO [train.py:988] (0/4) Epoch 24, batch 500, loss[loss=0.2511, simple_loss=0.3119, pruned_loss=0.09518, over 20624.00 frames. ], tot_loss[loss=0.2385, simple_loss=0.3108, pruned_loss=0.08308, over 3469422.41 frames. 
], batch size: 211, lr: 1.31e-02, grad_scale: 32.0 2023-06-15 08:29:30,893 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=85026.66666666667, ans=0.0 2023-06-15 08:29:31,474 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-06-15 08:29:47,092 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=85093.33333333333, ans=0.05 2023-06-15 08:29:55,412 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=85093.33333333333, ans=0.0 2023-06-15 08:30:08,477 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-24.pt 2023-06-15 08:30:31,629 INFO [train.py:988] (0/4) Epoch 25, batch 0, loss[loss=0.2257, simple_loss=0.3029, pruned_loss=0.07423, over 18652.00 frames. ], tot_loss[loss=0.2257, simple_loss=0.3029, pruned_loss=0.07423, over 18652.00 frames. ], batch size: 80, lr: 1.29e-02, grad_scale: 32.0 2023-06-15 08:30:31,630 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 08:30:37,743 INFO [train.py:1020] (0/4) Epoch 25, validation: loss=0.205, simple_loss=0.3085, pruned_loss=0.05071, over 143649.00 frames. 2023-06-15 08:30:37,744 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 08:30:52,857 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=22.5 2023-06-15 08:31:31,795 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.23 vs. limit=10.0 2023-06-15 08:31:40,898 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=85373.33333333333, ans=0.125 2023-06-15 08:31:44,634 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.449e+02 1.894e+02 2.218e+02 2.492e+02 3.446e+02, threshold=4.437e+02, percent-clipped=0.0 2023-06-15 08:31:48,387 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=85440.0, ans=0.0 2023-06-15 08:32:07,397 INFO [train.py:988] (0/4) Epoch 25, batch 50, loss[loss=0.2305, simple_loss=0.2891, pruned_loss=0.08592, over 20195.00 frames. ], tot_loss[loss=0.2342, simple_loss=0.3076, pruned_loss=0.08044, over 863996.61 frames. ], batch size: 239, lr: 1.28e-02, grad_scale: 32.0 2023-06-15 08:32:28,396 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.79 vs. limit=15.0 2023-06-15 08:32:53,015 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=85640.0, ans=0.0 2023-06-15 08:32:59,857 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.30 vs. limit=15.0 2023-06-15 08:33:15,119 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=85706.66666666667, ans=0.125 2023-06-15 08:33:25,729 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.55 vs. 
limit=15.0 2023-06-15 08:33:34,497 INFO [train.py:988] (0/4) Epoch 25, batch 100, loss[loss=0.246, simple_loss=0.3058, pruned_loss=0.09313, over 20512.00 frames. ], tot_loss[loss=0.2373, simple_loss=0.3106, pruned_loss=0.08204, over 1499412.49 frames. ], batch size: 160, lr: 1.28e-02, grad_scale: 32.0 2023-06-15 08:33:48,144 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=85840.0, ans=0.1 2023-06-15 08:33:53,003 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 08:34:26,755 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=22.5 2023-06-15 08:34:39,967 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.426e+02 1.831e+02 2.059e+02 2.312e+02 3.649e+02, threshold=4.117e+02, percent-clipped=0.0 2023-06-15 08:34:45,267 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.25 vs. limit=22.5 2023-06-15 08:35:02,331 INFO [train.py:988] (0/4) Epoch 25, batch 150, loss[loss=0.2511, simple_loss=0.2846, pruned_loss=0.1088, over 16749.00 frames. ], tot_loss[loss=0.2371, simple_loss=0.3106, pruned_loss=0.08176, over 2009311.81 frames. ], batch size: 391, lr: 1.28e-02, grad_scale: 32.0 2023-06-15 08:35:39,989 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=86306.66666666667, ans=0.0 2023-06-15 08:35:52,909 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=12.0 2023-06-15 08:36:19,352 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=86440.0, ans=0.0 2023-06-15 08:36:27,594 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=86440.0, ans=0.0 2023-06-15 08:36:30,647 INFO [train.py:988] (0/4) Epoch 25, batch 200, loss[loss=0.2366, simple_loss=0.3007, pruned_loss=0.08621, over 20662.00 frames. ], tot_loss[loss=0.2381, simple_loss=0.3105, pruned_loss=0.08279, over 2388863.46 frames. ], batch size: 211, lr: 1.28e-02, grad_scale: 32.0 2023-06-15 08:36:34,722 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.68 vs. limit=15.0 2023-06-15 08:37:15,580 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=86640.0, ans=0.125 2023-06-15 08:37:34,642 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.465e+02 1.840e+02 2.036e+02 2.337e+02 3.806e+02, threshold=4.072e+02, percent-clipped=0.0 2023-06-15 08:37:34,877 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=86706.66666666667, ans=0.05 2023-06-15 08:37:35,723 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.72 vs. 
limit=12.0 2023-06-15 08:37:40,827 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=86773.33333333333, ans=0.025 2023-06-15 08:37:58,074 INFO [train.py:988] (0/4) Epoch 25, batch 250, loss[loss=0.2188, simple_loss=0.299, pruned_loss=0.06932, over 18450.00 frames. ], tot_loss[loss=0.237, simple_loss=0.3104, pruned_loss=0.08179, over 2695880.28 frames. ], batch size: 77, lr: 1.28e-02, grad_scale: 32.0 2023-06-15 08:37:58,406 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=86840.0, ans=0.125 2023-06-15 08:38:11,147 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=86840.0, ans=0.035 2023-06-15 08:38:25,233 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=86906.66666666667, ans=0.125 2023-06-15 08:38:32,008 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=86973.33333333333, ans=0.0 2023-06-15 08:39:18,096 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=87106.66666666667, ans=0.2 2023-06-15 08:39:21,405 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=87106.66666666667, ans=0.1 2023-06-15 08:39:21,462 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=87106.66666666667, ans=0.2 2023-06-15 08:39:26,083 INFO [train.py:988] (0/4) Epoch 25, batch 300, loss[loss=0.2394, simple_loss=0.3035, pruned_loss=0.08765, over 20540.00 frames. ], tot_loss[loss=0.2371, simple_loss=0.3105, pruned_loss=0.08179, over 2932610.78 frames. ], batch size: 189, lr: 1.27e-02, grad_scale: 32.0 2023-06-15 08:39:30,598 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.07 vs. limit=12.0 2023-06-15 08:40:06,998 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=87306.66666666667, ans=0.2 2023-06-15 08:40:31,123 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.531e+02 1.920e+02 2.106e+02 2.371e+02 3.707e+02, threshold=4.212e+02, percent-clipped=0.0 2023-06-15 08:40:35,056 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=87440.0, ans=0.1 2023-06-15 08:40:39,098 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=87440.0, ans=0.0 2023-06-15 08:40:52,819 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=87506.66666666667, ans=0.2 2023-06-15 08:40:54,079 INFO [train.py:988] (0/4) Epoch 25, batch 350, loss[loss=0.2488, simple_loss=0.33, pruned_loss=0.08384, over 17063.00 frames. ], tot_loss[loss=0.2362, simple_loss=0.3098, pruned_loss=0.08131, over 3117744.47 frames. 
], batch size: 60, lr: 1.27e-02, grad_scale: 32.0 2023-06-15 08:41:11,608 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=87573.33333333333, ans=0.09899494936611666 2023-06-15 08:41:24,033 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=22.5 2023-06-15 08:41:33,671 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2023-06-15 08:41:35,031 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=87640.0, ans=0.125 2023-06-15 08:42:21,279 INFO [train.py:988] (0/4) Epoch 25, batch 400, loss[loss=0.2415, simple_loss=0.3168, pruned_loss=0.08312, over 19514.00 frames. ], tot_loss[loss=0.2357, simple_loss=0.3094, pruned_loss=0.08103, over 3278393.61 frames. ], batch size: 102, lr: 1.27e-02, grad_scale: 32.0 2023-06-15 08:42:25,789 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=87840.0, ans=0.125 2023-06-15 08:42:41,709 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=87906.66666666667, ans=0.1 2023-06-15 08:42:43,127 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=87906.66666666667, ans=0.125 2023-06-15 08:43:15,489 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=88040.0, ans=0.125 2023-06-15 08:43:27,908 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.558e+02 1.933e+02 2.148e+02 2.533e+02 3.587e+02, threshold=4.297e+02, percent-clipped=0.0 2023-06-15 08:43:31,011 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.93 vs. limit=15.0 2023-06-15 08:43:31,982 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=88106.66666666667, ans=0.125 2023-06-15 08:43:40,798 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=88106.66666666667, ans=0.0 2023-06-15 08:43:42,321 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=88106.66666666667, ans=0.125 2023-06-15 08:43:42,461 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=88106.66666666667, ans=0.125 2023-06-15 08:43:44,139 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=88106.66666666667, ans=0.2 2023-06-15 08:43:50,322 INFO [train.py:988] (0/4) Epoch 25, batch 450, loss[loss=0.2247, simple_loss=0.2997, pruned_loss=0.07486, over 19075.00 frames. ], tot_loss[loss=0.2351, simple_loss=0.3093, pruned_loss=0.08045, over 3393479.85 frames. ], batch size: 89, lr: 1.27e-02, grad_scale: 32.0 2023-06-15 08:44:54,504 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.99 vs. 
limit=15.0 2023-06-15 08:45:00,391 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=88440.0, ans=0.0 2023-06-15 08:45:02,246 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=88440.0, ans=0.0 2023-06-15 08:45:07,048 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=88440.0, ans=0.0 2023-06-15 08:45:15,637 INFO [train.py:988] (0/4) Epoch 25, batch 500, loss[loss=0.2256, simple_loss=0.2893, pruned_loss=0.08092, over 20257.00 frames. ], tot_loss[loss=0.2354, simple_loss=0.3093, pruned_loss=0.08073, over 3496110.83 frames. ], batch size: 239, lr: 1.27e-02, grad_scale: 32.0 2023-06-15 08:45:29,146 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=88506.66666666667, ans=0.1 2023-06-15 08:45:42,355 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=88573.33333333333, ans=0.125 2023-06-15 08:46:07,656 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-25.pt 2023-06-15 08:46:29,698 INFO [train.py:988] (0/4) Epoch 26, batch 0, loss[loss=0.2387, simple_loss=0.3159, pruned_loss=0.08081, over 19809.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.3159, pruned_loss=0.08081, over 19809.00 frames. ], batch size: 115, lr: 1.24e-02, grad_scale: 32.0 2023-06-15 08:46:29,699 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 08:46:35,679 INFO [train.py:1020] (0/4) Epoch 26, validation: loss=0.2057, simple_loss=0.3076, pruned_loss=0.05187, over 143649.00 frames. 2023-06-15 08:46:35,680 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 08:46:43,670 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.407e+02 1.988e+02 2.148e+02 2.357e+02 3.601e+02, threshold=4.296e+02, percent-clipped=0.0 2023-06-15 08:46:51,251 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=88786.66666666667, ans=0.1 2023-06-15 08:46:56,004 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=88786.66666666667, ans=0.0 2023-06-15 08:46:58,486 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.93 vs. limit=15.0 2023-06-15 08:47:05,175 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=88786.66666666667, ans=0.0 2023-06-15 08:47:10,728 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=88853.33333333333, ans=0.0 2023-06-15 08:47:11,307 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2023-06-15 08:47:21,310 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=88853.33333333333, ans=0.125 2023-06-15 08:47:28,534 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.79 vs. 
limit=15.0 2023-06-15 08:48:02,871 INFO [train.py:988] (0/4) Epoch 26, batch 50, loss[loss=0.2271, simple_loss=0.2951, pruned_loss=0.0795, over 19973.00 frames. ], tot_loss[loss=0.2371, simple_loss=0.3093, pruned_loss=0.0824, over 865896.22 frames. ], batch size: 126, lr: 1.24e-02, grad_scale: 32.0 2023-06-15 08:48:06,696 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=89053.33333333333, ans=0.0 2023-06-15 08:48:14,973 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=89053.33333333333, ans=0.125 2023-06-15 08:48:20,290 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=89120.0, ans=0.1 2023-06-15 08:48:20,350 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=89120.0, ans=0.035 2023-06-15 08:48:25,847 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=89120.0, ans=0.0 2023-06-15 08:48:58,021 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=89253.33333333333, ans=0.1 2023-06-15 08:49:14,712 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.79 vs. limit=15.0 2023-06-15 08:49:26,497 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=89320.0, ans=0.125 2023-06-15 08:49:32,997 INFO [train.py:988] (0/4) Epoch 26, batch 100, loss[loss=0.2607, simple_loss=0.3354, pruned_loss=0.09296, over 16709.00 frames. ], tot_loss[loss=0.2355, simple_loss=0.3075, pruned_loss=0.08175, over 1530055.69 frames. ], batch size: 59, lr: 1.24e-02, grad_scale: 32.0 2023-06-15 08:49:41,540 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.560e+02 1.958e+02 2.207e+02 2.501e+02 3.726e+02, threshold=4.413e+02, percent-clipped=0.0 2023-06-15 08:50:06,734 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=89520.0, ans=0.2 2023-06-15 08:50:10,246 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=89520.0, ans=0.2 2023-06-15 08:50:47,759 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=89653.33333333333, ans=0.125 2023-06-15 08:51:01,447 INFO [train.py:988] (0/4) Epoch 26, batch 150, loss[loss=0.2248, simple_loss=0.2908, pruned_loss=0.07943, over 20288.00 frames. ], tot_loss[loss=0.2341, simple_loss=0.306, pruned_loss=0.08108, over 2025753.90 frames. ], batch size: 239, lr: 1.24e-02, grad_scale: 32.0 2023-06-15 08:51:12,142 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=89720.0, ans=0.125 2023-06-15 08:51:55,306 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=15.0 2023-06-15 08:51:56,317 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=89920.0, ans=0.125 2023-06-15 08:52:29,982 INFO [train.py:988] (0/4) Epoch 26, batch 200, loss[loss=0.2272, simple_loss=0.3059, pruned_loss=0.07423, over 19698.00 frames. 
], tot_loss[loss=0.2332, simple_loss=0.3055, pruned_loss=0.08043, over 2429355.98 frames. ], batch size: 110, lr: 1.23e-02, grad_scale: 32.0 2023-06-15 08:52:37,910 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=90053.33333333333, ans=0.125 2023-06-15 08:52:39,014 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.544e+02 1.858e+02 1.962e+02 2.261e+02 3.889e+02, threshold=3.924e+02, percent-clipped=0.0 2023-06-15 08:53:03,562 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=90186.66666666667, ans=0.0 2023-06-15 08:53:58,459 INFO [train.py:988] (0/4) Epoch 26, batch 250, loss[loss=0.2445, simple_loss=0.3211, pruned_loss=0.08399, over 19207.00 frames. ], tot_loss[loss=0.2329, simple_loss=0.3056, pruned_loss=0.08013, over 2732957.73 frames. ], batch size: 92, lr: 1.23e-02, grad_scale: 32.0 2023-06-15 08:53:59,781 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.51 vs. limit=15.0 2023-06-15 08:54:08,160 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 08:54:43,877 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=90520.0, ans=0.125 2023-06-15 08:54:55,714 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=90586.66666666667, ans=0.05 2023-06-15 08:55:05,453 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.05 vs. limit=22.5 2023-06-15 08:55:08,048 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=90653.33333333333, ans=0.07 2023-06-15 08:55:24,156 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2023-06-15 08:55:25,293 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=90720.0, ans=0.125 2023-06-15 08:55:26,661 INFO [train.py:988] (0/4) Epoch 26, batch 300, loss[loss=0.2531, simple_loss=0.3354, pruned_loss=0.08535, over 18293.00 frames. ], tot_loss[loss=0.2331, simple_loss=0.3068, pruned_loss=0.07971, over 2968118.56 frames. 
], batch size: 72, lr: 1.23e-02, grad_scale: 32.0 2023-06-15 08:55:36,598 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.497e+02 1.907e+02 2.229e+02 2.657e+02 5.301e+02, threshold=4.457e+02, percent-clipped=1.0 2023-06-15 08:55:40,577 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=90720.0, ans=0.125 2023-06-15 08:56:35,441 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=90920.0, ans=0.125 2023-06-15 08:56:40,646 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=90986.66666666667, ans=0.0 2023-06-15 08:56:44,061 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=90986.66666666667, ans=0.09899494936611666 2023-06-15 08:56:55,550 INFO [train.py:988] (0/4) Epoch 26, batch 350, loss[loss=0.2361, simple_loss=0.3036, pruned_loss=0.0843, over 20762.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.3061, pruned_loss=0.07921, over 3156797.00 frames. ], batch size: 211, lr: 1.23e-02, grad_scale: 32.0 2023-06-15 08:56:56,047 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=91053.33333333333, ans=0.0 2023-06-15 08:57:42,280 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=91186.66666666667, ans=0.125 2023-06-15 08:58:24,523 INFO [train.py:988] (0/4) Epoch 26, batch 400, loss[loss=0.2371, simple_loss=0.3248, pruned_loss=0.07475, over 18306.00 frames. ], tot_loss[loss=0.2329, simple_loss=0.3064, pruned_loss=0.07968, over 3290310.75 frames. ], batch size: 72, lr: 1.23e-02, grad_scale: 32.0 2023-06-15 08:58:33,461 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.572e+02 1.996e+02 2.310e+02 2.636e+02 4.226e+02, threshold=4.620e+02, percent-clipped=0.0 2023-06-15 08:58:52,887 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0 2023-06-15 08:58:54,177 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=91453.33333333333, ans=0.125 2023-06-15 08:58:59,323 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=91520.0, ans=0.125 2023-06-15 08:59:30,529 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.67 vs. limit=6.0 2023-06-15 08:59:35,636 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.65 vs. limit=12.0 2023-06-15 08:59:40,654 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=91653.33333333333, ans=0.5 2023-06-15 08:59:53,733 INFO [train.py:988] (0/4) Epoch 26, batch 450, loss[loss=0.2338, simple_loss=0.2985, pruned_loss=0.08453, over 20009.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.3062, pruned_loss=0.07947, over 3415759.42 frames. 
], batch size: 126, lr: 1.23e-02, grad_scale: 32.0 2023-06-15 08:59:54,063 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=91720.0, ans=0.125 2023-06-15 09:00:15,696 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=91786.66666666667, ans=0.125 2023-06-15 09:01:03,968 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=91986.66666666667, ans=0.0 2023-06-15 09:01:10,955 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=91986.66666666667, ans=0.0 2023-06-15 09:01:20,832 INFO [train.py:988] (0/4) Epoch 26, batch 500, loss[loss=0.248, simple_loss=0.33, pruned_loss=0.08302, over 16311.00 frames. ], tot_loss[loss=0.2324, simple_loss=0.3063, pruned_loss=0.07929, over 3491303.20 frames. ], batch size: 52, lr: 1.22e-02, grad_scale: 32.0 2023-06-15 09:01:28,965 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.589e+02 1.890e+02 2.061e+02 2.503e+02 4.030e+02, threshold=4.121e+02, percent-clipped=0.0 2023-06-15 09:01:41,753 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=92120.0, ans=0.0 2023-06-15 09:01:51,607 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=92120.0, ans=0.0 2023-06-15 09:02:15,408 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-26.pt 2023-06-15 09:02:43,436 INFO [train.py:988] (0/4) Epoch 27, batch 0, loss[loss=0.2107, simple_loss=0.2928, pruned_loss=0.06428, over 18443.00 frames. ], tot_loss[loss=0.2107, simple_loss=0.2928, pruned_loss=0.06428, over 18443.00 frames. ], batch size: 77, lr: 1.20e-02, grad_scale: 32.0 2023-06-15 09:02:43,437 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 09:02:52,292 INFO [train.py:1020] (0/4) Epoch 27, validation: loss=0.2009, simple_loss=0.305, pruned_loss=0.04841, over 143649.00 frames. 2023-06-15 09:02:52,293 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 09:03:46,468 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=92473.33333333333, ans=0.125 2023-06-15 09:03:52,502 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=92473.33333333333, ans=10.0 2023-06-15 09:04:00,773 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=92473.33333333333, ans=0.0 2023-06-15 09:04:04,214 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=92540.0, ans=0.2 2023-06-15 09:04:21,896 INFO [train.py:988] (0/4) Epoch 27, batch 50, loss[loss=0.2146, simple_loss=0.2965, pruned_loss=0.06629, over 18911.00 frames. ], tot_loss[loss=0.2314, simple_loss=0.3045, pruned_loss=0.07916, over 857950.19 frames. 
], batch size: 86, lr: 1.20e-02, grad_scale: 32.0 2023-06-15 09:04:35,515 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=92606.66666666667, ans=0.125 2023-06-15 09:04:37,867 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=92673.33333333333, ans=0.0 2023-06-15 09:04:38,024 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=92673.33333333333, ans=0.07 2023-06-15 09:04:52,653 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=92673.33333333333, ans=0.125 2023-06-15 09:04:59,691 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.519e+02 1.842e+02 2.115e+02 2.294e+02 3.126e+02, threshold=4.230e+02, percent-clipped=0.0 2023-06-15 09:05:15,241 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.35 vs. limit=15.0 2023-06-15 09:05:21,536 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=92806.66666666667, ans=0.1 2023-06-15 09:05:24,153 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.34 vs. limit=15.0 2023-06-15 09:05:47,897 INFO [train.py:988] (0/4) Epoch 27, batch 100, loss[loss=0.2249, simple_loss=0.306, pruned_loss=0.07188, over 19322.00 frames. ], tot_loss[loss=0.2301, simple_loss=0.3048, pruned_loss=0.07774, over 1526817.18 frames. ], batch size: 98, lr: 1.20e-02, grad_scale: 32.0 2023-06-15 09:06:01,010 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=92940.0, ans=0.125 2023-06-15 09:06:01,040 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=92940.0, ans=0.125 2023-06-15 09:06:10,146 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=15.0 2023-06-15 09:06:21,839 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=93073.33333333333, ans=0.025 2023-06-15 09:06:24,903 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=93073.33333333333, ans=0.0 2023-06-15 09:06:41,255 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=93140.0, ans=0.0 2023-06-15 09:06:44,751 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=93140.0, ans=0.0 2023-06-15 09:06:56,271 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=93206.66666666667, ans=0.0 2023-06-15 09:07:13,726 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=93273.33333333333, ans=0.125 2023-06-15 09:07:14,881 INFO [train.py:988] (0/4) Epoch 27, batch 150, loss[loss=0.2409, simple_loss=0.2765, pruned_loss=0.1027, over 17062.00 frames. 
], tot_loss[loss=0.2285, simple_loss=0.303, pruned_loss=0.07706, over 2026681.14 frames. ], batch size: 391, lr: 1.19e-02, grad_scale: 32.0 2023-06-15 09:07:33,825 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=93340.0, ans=0.125 2023-06-15 09:07:54,429 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.514e+02 1.908e+02 2.202e+02 2.518e+02 3.722e+02, threshold=4.404e+02, percent-clipped=0.0 2023-06-15 09:07:54,954 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=93406.66666666667, ans=0.2 2023-06-15 09:08:05,436 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=93406.66666666667, ans=0.125 2023-06-15 09:08:14,961 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=93473.33333333333, ans=0.0 2023-06-15 09:08:24,810 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=93540.0, ans=0.125 2023-06-15 09:08:28,695 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=93540.0, ans=0.125 2023-06-15 09:08:30,673 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=93540.0, ans=0.0 2023-06-15 09:08:43,694 INFO [train.py:988] (0/4) Epoch 27, batch 200, loss[loss=0.2361, simple_loss=0.3038, pruned_loss=0.08417, over 20480.00 frames. ], tot_loss[loss=0.2304, simple_loss=0.3047, pruned_loss=0.07803, over 2409677.35 frames. ], batch size: 160, lr: 1.19e-02, grad_scale: 32.0 2023-06-15 09:08:45,247 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2023-06-15 09:09:17,716 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=93740.0, ans=0.2 2023-06-15 09:09:29,572 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=93740.0, ans=0.04949747468305833 2023-06-15 09:09:33,940 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.31 vs. limit=22.5 2023-06-15 09:09:37,570 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=12.0 2023-06-15 09:09:49,536 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=93806.66666666667, ans=0.2 2023-06-15 09:10:08,523 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=93873.33333333333, ans=0.125 2023-06-15 09:10:11,427 INFO [train.py:988] (0/4) Epoch 27, batch 250, loss[loss=0.2223, simple_loss=0.3066, pruned_loss=0.06901, over 19445.00 frames. ], tot_loss[loss=0.2301, simple_loss=0.3046, pruned_loss=0.07778, over 2719058.15 frames. 
], batch size: 105, lr: 1.19e-02, grad_scale: 32.0 2023-06-15 09:10:36,911 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=94006.66666666667, ans=0.125 2023-06-15 09:10:38,693 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=22.5 2023-06-15 09:10:50,260 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.425e+02 1.782e+02 1.944e+02 2.302e+02 3.570e+02, threshold=3.888e+02, percent-clipped=0.0 2023-06-15 09:11:09,841 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=94140.0, ans=0.0 2023-06-15 09:11:13,780 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=94140.0, ans=0.09899494936611666 2023-06-15 09:11:28,007 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=94206.66666666667, ans=0.1 2023-06-15 09:11:33,017 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=94206.66666666667, ans=0.5 2023-06-15 09:11:39,400 INFO [train.py:988] (0/4) Epoch 27, batch 300, loss[loss=0.2032, simple_loss=0.2832, pruned_loss=0.06163, over 19103.00 frames. ], tot_loss[loss=0.2308, simple_loss=0.3049, pruned_loss=0.07835, over 2959519.47 frames. ], batch size: 94, lr: 1.19e-02, grad_scale: 32.0 2023-06-15 09:11:44,609 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=94273.33333333333, ans=0.2 2023-06-15 09:12:21,448 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=94406.66666666667, ans=0.125 2023-06-15 09:12:31,506 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=94473.33333333333, ans=0.1 2023-06-15 09:12:57,594 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=94540.0, ans=0.125 2023-06-15 09:12:59,830 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 09:13:06,373 INFO [train.py:988] (0/4) Epoch 27, batch 350, loss[loss=0.2331, simple_loss=0.3099, pruned_loss=0.07813, over 19074.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.3046, pruned_loss=0.0779, over 3159057.18 frames. 
], batch size: 94, lr: 1.19e-02, grad_scale: 16.0 2023-06-15 09:13:14,942 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=94606.66666666667, ans=0.0 2023-06-15 09:13:45,655 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=94740.0, ans=0.125 2023-06-15 09:13:46,983 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.541e+02 1.867e+02 2.054e+02 2.333e+02 3.496e+02, threshold=4.108e+02, percent-clipped=0.0 2023-06-15 09:14:02,582 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=94806.66666666667, ans=0.1 2023-06-15 09:14:02,592 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=94806.66666666667, ans=0.125 2023-06-15 09:14:10,312 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=94806.66666666667, ans=0.125 2023-06-15 09:14:24,614 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=94873.33333333333, ans=0.05 2023-06-15 09:14:35,030 INFO [train.py:988] (0/4) Epoch 27, batch 400, loss[loss=0.218, simple_loss=0.2994, pruned_loss=0.06829, over 19090.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.3047, pruned_loss=0.07818, over 3296314.13 frames. ], batch size: 94, lr: 1.19e-02, grad_scale: 32.0 2023-06-15 09:14:35,549 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=94940.0, ans=0.125 2023-06-15 09:14:46,971 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=94940.0, ans=0.05 2023-06-15 09:14:53,037 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2023-06-15 09:15:01,402 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=95006.66666666667, ans=0.125 2023-06-15 09:15:32,670 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=95140.0, ans=0.0 2023-06-15 09:15:36,058 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=95140.0, ans=0.125 2023-06-15 09:15:50,234 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=95206.66666666667, ans=0.2 2023-06-15 09:16:02,640 INFO [train.py:988] (0/4) Epoch 27, batch 450, loss[loss=0.2264, simple_loss=0.3002, pruned_loss=0.07627, over 18788.00 frames. ], tot_loss[loss=0.2305, simple_loss=0.3053, pruned_loss=0.07787, over 3401322.65 frames. ], batch size: 83, lr: 1.18e-02, grad_scale: 16.0 2023-06-15 09:16:13,783 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.27 vs. 
limit=15.0 2023-06-15 09:16:18,063 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=95340.0, ans=0.0 2023-06-15 09:16:19,631 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=95340.0, ans=0.07 2023-06-15 09:16:28,522 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=95340.0, ans=0.0 2023-06-15 09:16:41,504 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0 2023-06-15 09:16:43,712 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.600e+02 1.922e+02 2.176e+02 2.834e+02 5.039e+02, threshold=4.352e+02, percent-clipped=1.0 2023-06-15 09:17:16,125 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=95540.0, ans=0.1 2023-06-15 09:17:27,056 INFO [train.py:988] (0/4) Epoch 27, batch 500, loss[loss=0.2347, simple_loss=0.2955, pruned_loss=0.08695, over 20287.00 frames. ], tot_loss[loss=0.2301, simple_loss=0.3048, pruned_loss=0.07768, over 3485215.01 frames. ], batch size: 239, lr: 1.18e-02, grad_scale: 16.0 2023-06-15 09:17:37,731 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.65 vs. limit=15.0 2023-06-15 09:18:04,077 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=95740.0, ans=0.0 2023-06-15 09:18:07,939 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=95740.0, ans=0.125 2023-06-15 09:18:14,459 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=95740.0, ans=0.95 2023-06-15 09:18:22,349 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-27.pt 2023-06-15 09:18:47,751 INFO [train.py:988] (0/4) Epoch 28, batch 0, loss[loss=0.2192, simple_loss=0.2965, pruned_loss=0.07096, over 18924.00 frames. ], tot_loss[loss=0.2192, simple_loss=0.2965, pruned_loss=0.07096, over 18924.00 frames. ], batch size: 86, lr: 1.16e-02, grad_scale: 32.0 2023-06-15 09:18:47,752 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 09:18:53,821 INFO [train.py:1020] (0/4) Epoch 28, validation: loss=0.203, simple_loss=0.307, pruned_loss=0.0495, over 143649.00 frames. 2023-06-15 09:18:53,822 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 09:18:59,749 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.19 vs. 
limit=15.0 2023-06-15 09:19:23,105 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=95893.33333333333, ans=0.125 2023-06-15 09:19:35,882 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=95960.0, ans=0.0 2023-06-15 09:19:48,413 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=96026.66666666667, ans=0.125 2023-06-15 09:20:04,489 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=15.0 2023-06-15 09:20:05,429 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.551e+02 1.841e+02 2.127e+02 2.560e+02 4.411e+02, threshold=4.254e+02, percent-clipped=1.0 2023-06-15 09:20:07,494 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=96093.33333333333, ans=0.0 2023-06-15 09:20:21,889 INFO [train.py:988] (0/4) Epoch 28, batch 50, loss[loss=0.2192, simple_loss=0.3028, pruned_loss=0.06785, over 18624.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.3027, pruned_loss=0.07655, over 865860.64 frames. ], batch size: 80, lr: 1.16e-02, grad_scale: 32.0 2023-06-15 09:20:34,528 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=96160.0, ans=0.125 2023-06-15 09:21:01,726 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=96293.33333333333, ans=0.0 2023-06-15 09:21:47,586 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=96426.66666666667, ans=0.2 2023-06-15 09:21:49,322 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=96493.33333333333, ans=0.1 2023-06-15 09:21:51,028 INFO [train.py:988] (0/4) Epoch 28, batch 100, loss[loss=0.2226, simple_loss=0.3037, pruned_loss=0.07078, over 18918.00 frames. ], tot_loss[loss=0.228, simple_loss=0.3035, pruned_loss=0.07629, over 1514684.21 frames. ], batch size: 86, lr: 1.16e-02, grad_scale: 32.0 2023-06-15 09:22:06,355 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 09:22:57,816 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=96693.33333333333, ans=0.125 2023-06-15 09:23:02,135 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.88 vs. limit=15.0 2023-06-15 09:23:02,541 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.442e+02 1.826e+02 2.059e+02 2.321e+02 3.339e+02, threshold=4.117e+02, percent-clipped=0.0 2023-06-15 09:23:18,517 INFO [train.py:988] (0/4) Epoch 28, batch 150, loss[loss=0.2274, simple_loss=0.3113, pruned_loss=0.0718, over 19083.00 frames. ], tot_loss[loss=0.2291, simple_loss=0.3037, pruned_loss=0.07725, over 2005020.16 frames. ], batch size: 89, lr: 1.16e-02, grad_scale: 32.0 2023-06-15 09:24:39,857 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.52 vs. 
limit=15.0 2023-06-15 09:24:46,176 INFO [train.py:988] (0/4) Epoch 28, batch 200, loss[loss=0.2482, simple_loss=0.3263, pruned_loss=0.08502, over 16071.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.3053, pruned_loss=0.07793, over 2397720.84 frames. ], batch size: 51, lr: 1.15e-02, grad_scale: 32.0 2023-06-15 09:24:50,282 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2023-06-15 09:25:00,152 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=97160.0, ans=0.125 2023-06-15 09:25:05,113 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=97226.66666666667, ans=0.125 2023-06-15 09:25:15,990 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=97226.66666666667, ans=0.1 2023-06-15 09:25:56,378 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.620e+02 1.821e+02 1.977e+02 2.337e+02 4.646e+02, threshold=3.954e+02, percent-clipped=1.0 2023-06-15 09:25:58,372 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=97426.66666666667, ans=0.0 2023-06-15 09:26:11,642 INFO [train.py:988] (0/4) Epoch 28, batch 250, loss[loss=0.2052, simple_loss=0.2858, pruned_loss=0.06237, over 18899.00 frames. ], tot_loss[loss=0.23, simple_loss=0.3045, pruned_loss=0.07774, over 2716742.08 frames. ], batch size: 86, lr: 1.15e-02, grad_scale: 32.0 2023-06-15 09:26:23,508 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=97493.33333333333, ans=0.125 2023-06-15 09:26:23,518 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=97493.33333333333, ans=0.125 2023-06-15 09:26:46,824 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=97626.66666666667, ans=0.0 2023-06-15 09:27:17,099 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 09:27:23,844 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=97760.0, ans=0.125 2023-06-15 09:27:23,930 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=97760.0, ans=0.0 2023-06-15 09:27:41,178 INFO [train.py:988] (0/4) Epoch 28, batch 300, loss[loss=0.2441, simple_loss=0.3141, pruned_loss=0.08705, over 20297.00 frames. ], tot_loss[loss=0.2292, simple_loss=0.3044, pruned_loss=0.07698, over 2952634.33 frames. ], batch size: 141, lr: 1.15e-02, grad_scale: 32.0 2023-06-15 09:27:41,430 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=97826.66666666667, ans=0.125 2023-06-15 09:27:52,778 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.41 vs. 
limit=15.0 2023-06-15 09:27:57,881 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=97893.33333333333, ans=0.0 2023-06-15 09:28:09,482 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=97893.33333333333, ans=0.2 2023-06-15 09:28:11,206 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=97893.33333333333, ans=0.125 2023-06-15 09:28:52,629 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.613e+02 1.947e+02 2.273e+02 2.757e+02 4.812e+02, threshold=4.546e+02, percent-clipped=3.0 2023-06-15 09:29:06,042 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=98160.0, ans=0.0 2023-06-15 09:29:07,334 INFO [train.py:988] (0/4) Epoch 28, batch 350, loss[loss=0.2216, simple_loss=0.3016, pruned_loss=0.07074, over 19060.00 frames. ], tot_loss[loss=0.2283, simple_loss=0.3034, pruned_loss=0.0766, over 3150949.48 frames. ], batch size: 89, lr: 1.15e-02, grad_scale: 32.0 2023-06-15 09:29:26,185 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=98226.66666666667, ans=0.0 2023-06-15 09:29:39,514 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=98226.66666666667, ans=0.0 2023-06-15 09:29:43,319 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0 2023-06-15 09:29:49,714 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=98293.33333333333, ans=0.125 2023-06-15 09:30:19,754 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=98426.66666666667, ans=0.0 2023-06-15 09:30:34,472 INFO [train.py:988] (0/4) Epoch 28, batch 400, loss[loss=0.2125, simple_loss=0.2952, pruned_loss=0.06494, over 18314.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.3026, pruned_loss=0.07648, over 3289312.41 frames. ], batch size: 74, lr: 1.15e-02, grad_scale: 32.0 2023-06-15 09:30:38,126 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=98493.33333333333, ans=0.2 2023-06-15 09:30:50,365 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.21 vs. 
limit=22.5 2023-06-15 09:30:51,412 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=98560.0, ans=0.0 2023-06-15 09:31:01,840 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=98560.0, ans=0.125 2023-06-15 09:31:19,051 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=98626.66666666667, ans=0.125 2023-06-15 09:31:20,634 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=98626.66666666667, ans=0.2 2023-06-15 09:31:24,485 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=98626.66666666667, ans=0.125 2023-06-15 09:31:38,868 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=98693.33333333333, ans=0.2 2023-06-15 09:31:42,418 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=98693.33333333333, ans=0.04949747468305833 2023-06-15 09:31:48,681 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.579e+02 1.911e+02 2.084e+02 2.355e+02 3.960e+02, threshold=4.169e+02, percent-clipped=0.0 2023-06-15 09:32:01,893 INFO [train.py:988] (0/4) Epoch 28, batch 450, loss[loss=0.2181, simple_loss=0.307, pruned_loss=0.06459, over 18596.00 frames. ], tot_loss[loss=0.2274, simple_loss=0.3024, pruned_loss=0.07622, over 3390975.55 frames. ], batch size: 80, lr: 1.15e-02, grad_scale: 32.0 2023-06-15 09:32:17,158 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=98826.66666666667, ans=0.2 2023-06-15 09:32:42,832 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=98960.0, ans=0.125 2023-06-15 09:32:48,429 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.88 vs. limit=15.0 2023-06-15 09:33:06,465 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=99026.66666666667, ans=0.1 2023-06-15 09:33:28,920 INFO [train.py:988] (0/4) Epoch 28, batch 500, loss[loss=0.2406, simple_loss=0.3136, pruned_loss=0.08383, over 20130.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.3027, pruned_loss=0.07649, over 3483096.09 frames. ], batch size: 133, lr: 1.15e-02, grad_scale: 32.0 2023-06-15 09:33:37,762 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.77 vs. 
limit=10.0 2023-06-15 09:33:40,917 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=99160.0, ans=0.1 2023-06-15 09:33:54,000 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=99226.66666666667, ans=0.125 2023-06-15 09:34:13,409 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=99293.33333333333, ans=0.125 2023-06-15 09:34:22,267 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-28.pt 2023-06-15 09:34:46,542 INFO [train.py:988] (0/4) Epoch 29, batch 0, loss[loss=0.2233, simple_loss=0.2947, pruned_loss=0.07596, over 20698.00 frames. ], tot_loss[loss=0.2233, simple_loss=0.2947, pruned_loss=0.07596, over 20698.00 frames. ], batch size: 211, lr: 1.12e-02, grad_scale: 32.0 2023-06-15 09:34:46,543 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 09:34:52,716 INFO [train.py:1020] (0/4) Epoch 29, validation: loss=0.2012, simple_loss=0.3049, pruned_loss=0.04872, over 143649.00 frames. 2023-06-15 09:34:52,718 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 09:34:54,748 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=99380.0, ans=0.125 2023-06-15 09:35:07,423 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.516e+02 1.808e+02 2.025e+02 2.226e+02 3.535e+02, threshold=4.050e+02, percent-clipped=0.0 2023-06-15 09:36:20,597 INFO [train.py:988] (0/4) Epoch 29, batch 50, loss[loss=0.2355, simple_loss=0.2881, pruned_loss=0.09145, over 19917.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.3015, pruned_loss=0.0731, over 848320.18 frames. ], batch size: 294, lr: 1.12e-02, grad_scale: 32.0 2023-06-15 09:36:29,175 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=99713.33333333333, ans=0.125 2023-06-15 09:36:48,654 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.53 vs. limit=12.0 2023-06-15 09:37:06,488 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=99846.66666666667, ans=0.2 2023-06-15 09:37:34,222 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=99980.0, ans=0.0 2023-06-15 09:37:48,022 INFO [train.py:988] (0/4) Epoch 29, batch 100, loss[loss=0.2055, simple_loss=0.2778, pruned_loss=0.06655, over 19203.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.3022, pruned_loss=0.07611, over 1506849.89 frames. ], batch size: 92, lr: 1.12e-02, grad_scale: 32.0 2023-06-15 09:38:03,418 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.495e+02 1.850e+02 2.039e+02 2.478e+02 3.886e+02, threshold=4.079e+02, percent-clipped=0.0 2023-06-15 09:38:37,285 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=100180.0, ans=0.0 2023-06-15 09:38:45,443 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=100246.66666666667, ans=0.1 2023-06-15 09:39:14,993 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.83 vs. 
limit=15.0 2023-06-15 09:39:15,398 INFO [train.py:988] (0/4) Epoch 29, batch 150, loss[loss=0.2209, simple_loss=0.2948, pruned_loss=0.07348, over 19997.00 frames. ], tot_loss[loss=0.2282, simple_loss=0.3019, pruned_loss=0.07729, over 2010449.32 frames. ], batch size: 126, lr: 1.12e-02, grad_scale: 32.0 2023-06-15 09:39:19,037 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=100380.0, ans=0.0 2023-06-15 09:39:32,885 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=12.0 2023-06-15 09:39:44,579 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=100446.66666666667, ans=0.0 2023-06-15 09:40:40,943 INFO [train.py:988] (0/4) Epoch 29, batch 200, loss[loss=0.2275, simple_loss=0.303, pruned_loss=0.07602, over 19252.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.3022, pruned_loss=0.07596, over 2400421.49 frames. ], batch size: 92, lr: 1.12e-02, grad_scale: 32.0 2023-06-15 09:40:50,789 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=100713.33333333333, ans=0.125 2023-06-15 09:40:50,854 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=100713.33333333333, ans=0.125 2023-06-15 09:40:57,018 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.536e+02 1.792e+02 2.001e+02 2.365e+02 3.519e+02, threshold=4.002e+02, percent-clipped=0.0 2023-06-15 09:41:39,888 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=100913.33333333333, ans=0.1 2023-06-15 09:42:07,401 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=101046.66666666667, ans=0.2 2023-06-15 09:42:08,455 INFO [train.py:988] (0/4) Epoch 29, batch 250, loss[loss=0.2158, simple_loss=0.2983, pruned_loss=0.06668, over 19106.00 frames. ], tot_loss[loss=0.2267, simple_loss=0.3023, pruned_loss=0.07554, over 2698814.77 frames. ], batch size: 94, lr: 1.12e-02, grad_scale: 32.0 2023-06-15 09:42:17,889 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=101046.66666666667, ans=0.0 2023-06-15 09:42:33,241 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=101113.33333333333, ans=0.125 2023-06-15 09:42:33,319 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=101113.33333333333, ans=0.1 2023-06-15 09:42:36,845 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=101113.33333333333, ans=0.125 2023-06-15 09:42:45,456 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=101180.0, ans=0.035 2023-06-15 09:43:06,315 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=101246.66666666667, ans=0.125 2023-06-15 09:43:24,654 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.25 vs. 
limit=15.0 2023-06-15 09:43:35,033 INFO [train.py:988] (0/4) Epoch 29, batch 300, loss[loss=0.2239, simple_loss=0.3145, pruned_loss=0.06663, over 18323.00 frames. ], tot_loss[loss=0.228, simple_loss=0.3042, pruned_loss=0.07593, over 2924139.82 frames. ], batch size: 72, lr: 1.12e-02, grad_scale: 32.0 2023-06-15 09:43:50,905 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.486e+02 1.806e+02 2.034e+02 2.286e+02 3.270e+02, threshold=4.068e+02, percent-clipped=0.0 2023-06-15 09:43:54,779 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=101446.66666666667, ans=0.05 2023-06-15 09:44:00,301 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=101446.66666666667, ans=0.125 2023-06-15 09:44:58,107 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=101646.66666666667, ans=0.5 2023-06-15 09:45:02,599 INFO [train.py:988] (0/4) Epoch 29, batch 350, loss[loss=0.2118, simple_loss=0.2969, pruned_loss=0.06328, over 19214.00 frames. ], tot_loss[loss=0.2281, simple_loss=0.3044, pruned_loss=0.07591, over 3102303.22 frames. ], batch size: 92, lr: 1.11e-02, grad_scale: 16.0 2023-06-15 09:45:25,233 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=101780.0, ans=0.125 2023-06-15 09:45:28,552 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=101780.0, ans=0.025 2023-06-15 09:45:31,741 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=101780.0, ans=0.125 2023-06-15 09:45:37,373 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=101846.66666666667, ans=0.125 2023-06-15 09:46:24,058 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=101980.0, ans=0.1 2023-06-15 09:46:25,728 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=101980.0, ans=0.125 2023-06-15 09:46:29,148 INFO [train.py:988] (0/4) Epoch 29, batch 400, loss[loss=0.2426, simple_loss=0.3079, pruned_loss=0.08867, over 20081.00 frames. ], tot_loss[loss=0.2284, simple_loss=0.3053, pruned_loss=0.07577, over 3246024.26 frames. ], batch size: 133, lr: 1.11e-02, grad_scale: 32.0 2023-06-15 09:46:33,059 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. 
limit=6.0 2023-06-15 09:46:43,746 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=102046.66666666667, ans=0.2 2023-06-15 09:46:46,571 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.587e+02 1.969e+02 2.274e+02 2.602e+02 3.577e+02, threshold=4.548e+02, percent-clipped=0.0 2023-06-15 09:46:51,833 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=102113.33333333333, ans=0.1 2023-06-15 09:47:17,636 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=102180.0, ans=0.0 2023-06-15 09:47:17,738 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=102180.0, ans=0.1 2023-06-15 09:47:54,295 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=102380.0, ans=0.2 2023-06-15 09:47:55,545 INFO [train.py:988] (0/4) Epoch 29, batch 450, loss[loss=0.2337, simple_loss=0.3109, pruned_loss=0.07822, over 19951.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.3042, pruned_loss=0.07514, over 3364890.85 frames. ], batch size: 126, lr: 1.11e-02, grad_scale: 32.0 2023-06-15 09:49:20,987 INFO [train.py:988] (0/4) Epoch 29, batch 500, loss[loss=0.2242, simple_loss=0.3031, pruned_loss=0.07267, over 19067.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.3031, pruned_loss=0.07523, over 3452310.05 frames. ], batch size: 89, lr: 1.11e-02, grad_scale: 32.0 2023-06-15 09:49:27,817 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=102713.33333333333, ans=0.125 2023-06-15 09:49:28,002 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=102713.33333333333, ans=0.1 2023-06-15 09:49:32,972 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=102713.33333333333, ans=0.0 2023-06-15 09:49:37,630 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.560e+02 1.900e+02 2.167e+02 2.548e+02 3.918e+02, threshold=4.335e+02, percent-clipped=0.0 2023-06-15 09:49:50,796 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=102780.0, ans=0.125 2023-06-15 09:50:12,179 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-29.pt 2023-06-15 09:50:35,325 INFO [train.py:988] (0/4) Epoch 30, batch 0, loss[loss=0.2223, simple_loss=0.3004, pruned_loss=0.07211, over 19103.00 frames. ], tot_loss[loss=0.2223, simple_loss=0.3004, pruned_loss=0.07211, over 19103.00 frames. ], batch size: 94, lr: 1.09e-02, grad_scale: 32.0 2023-06-15 09:50:35,326 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 09:50:41,562 INFO [train.py:1020] (0/4) Epoch 30, validation: loss=0.2006, simple_loss=0.3036, pruned_loss=0.04881, over 143649.00 frames. 2023-06-15 09:50:41,562 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 09:51:42,521 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=103126.66666666667, ans=0.0 2023-06-15 09:52:08,094 INFO [train.py:988] (0/4) Epoch 30, batch 50, loss[loss=0.2204, simple_loss=0.3061, pruned_loss=0.0673, over 18306.00 frames. 
], tot_loss[loss=0.226, simple_loss=0.302, pruned_loss=0.07499, over 849250.76 frames. ], batch size: 74, lr: 1.09e-02, grad_scale: 32.0 2023-06-15 09:52:10,165 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=103260.0, ans=0.1 2023-06-15 09:52:24,083 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=103326.66666666667, ans=0.2 2023-06-15 09:52:44,217 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=103393.33333333333, ans=0.125 2023-06-15 09:52:48,144 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=103393.33333333333, ans=0.125 2023-06-15 09:52:54,678 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=103393.33333333333, ans=0.0 2023-06-15 09:52:55,895 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.533e+02 1.895e+02 2.155e+02 2.482e+02 4.117e+02, threshold=4.310e+02, percent-clipped=0.0 2023-06-15 09:52:57,008 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.31 vs. limit=22.5 2023-06-15 09:53:12,386 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=103460.0, ans=0.2 2023-06-15 09:53:34,732 INFO [train.py:988] (0/4) Epoch 30, batch 100, loss[loss=0.223, simple_loss=0.2979, pruned_loss=0.07404, over 20093.00 frames. ], tot_loss[loss=0.226, simple_loss=0.3017, pruned_loss=0.07517, over 1489179.14 frames. ], batch size: 133, lr: 1.09e-02, grad_scale: 32.0 2023-06-15 09:53:37,521 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.84 vs. limit=22.5 2023-06-15 09:53:54,127 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=103660.0, ans=0.125 2023-06-15 09:53:55,492 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.81 vs. limit=15.0 2023-06-15 09:53:59,434 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=103660.0, ans=0.04949747468305833 2023-06-15 09:54:36,078 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.27 vs. limit=22.5 2023-06-15 09:54:49,514 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=103860.0, ans=0.0 2023-06-15 09:54:54,627 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=103860.0, ans=0.2 2023-06-15 09:54:59,702 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.18 vs. limit=10.0 2023-06-15 09:55:01,990 INFO [train.py:988] (0/4) Epoch 30, batch 150, loss[loss=0.2218, simple_loss=0.2918, pruned_loss=0.07587, over 20319.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.3017, pruned_loss=0.07566, over 1991825.87 frames. 
], batch size: 141, lr: 1.09e-02, grad_scale: 32.0 2023-06-15 09:55:14,458 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.08 vs. limit=22.5 2023-06-15 09:55:15,826 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=103926.66666666667, ans=0.125 2023-06-15 09:55:40,783 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=104060.0, ans=0.125 2023-06-15 09:55:44,652 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=104060.0, ans=0.125 2023-06-15 09:55:51,193 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.525e+02 1.792e+02 2.087e+02 2.377e+02 3.180e+02, threshold=4.174e+02, percent-clipped=0.0 2023-06-15 09:55:53,399 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=104126.66666666667, ans=0.0 2023-06-15 09:56:10,938 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=104193.33333333333, ans=0.1 2023-06-15 09:56:22,533 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=104193.33333333333, ans=0.125 2023-06-15 09:56:26,492 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=104193.33333333333, ans=10.0 2023-06-15 09:56:29,875 INFO [train.py:988] (0/4) Epoch 30, batch 200, loss[loss=0.2303, simple_loss=0.3098, pruned_loss=0.07538, over 18759.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.3022, pruned_loss=0.07547, over 2371934.19 frames. ], batch size: 83, lr: 1.08e-02, grad_scale: 32.0 2023-06-15 09:56:36,963 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=104260.0, ans=0.125 2023-06-15 09:57:51,677 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=104526.66666666667, ans=0.125 2023-06-15 09:57:54,034 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.38 vs. limit=12.0 2023-06-15 09:57:56,078 INFO [train.py:988] (0/4) Epoch 30, batch 250, loss[loss=0.2202, simple_loss=0.3004, pruned_loss=0.07001, over 18448.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.3016, pruned_loss=0.07406, over 2693083.75 frames. ], batch size: 77, lr: 1.08e-02, grad_scale: 32.0 2023-06-15 09:58:01,355 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=22.5 2023-06-15 09:58:28,569 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.85 vs. 
limit=15.0 2023-06-15 09:58:33,826 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=104726.66666666667, ans=0.2 2023-06-15 09:58:45,936 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.493e+02 1.806e+02 1.918e+02 2.139e+02 3.492e+02, threshold=3.836e+02, percent-clipped=0.0 2023-06-15 09:58:54,538 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=104793.33333333333, ans=0.125 2023-06-15 09:59:17,672 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.56 vs. limit=15.0 2023-06-15 09:59:23,432 INFO [train.py:988] (0/4) Epoch 30, batch 300, loss[loss=0.2129, simple_loss=0.2782, pruned_loss=0.07384, over 20637.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.3015, pruned_loss=0.07467, over 2923697.69 frames. ], batch size: 211, lr: 1.08e-02, grad_scale: 32.0 2023-06-15 09:59:36,026 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=104926.66666666667, ans=0.125 2023-06-15 09:59:39,492 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=104993.33333333333, ans=0.1 2023-06-15 09:59:41,021 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=104993.33333333333, ans=0.1 2023-06-15 09:59:44,250 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=104993.33333333333, ans=0.125 2023-06-15 10:00:03,689 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 10:00:24,946 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=105126.66666666667, ans=0.0 2023-06-15 10:00:29,872 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=105126.66666666667, ans=0.0 2023-06-15 10:00:40,350 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=105193.33333333333, ans=0.1 2023-06-15 10:00:49,820 INFO [train.py:988] (0/4) Epoch 30, batch 350, loss[loss=0.2153, simple_loss=0.2984, pruned_loss=0.0661, over 18789.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.3005, pruned_loss=0.07358, over 3120214.70 frames. ], batch size: 83, lr: 1.08e-02, grad_scale: 32.0 2023-06-15 10:01:25,486 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2023-06-15 10:01:32,459 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.46 vs. limit=10.0 2023-06-15 10:01:38,548 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.487e+02 1.770e+02 1.941e+02 2.233e+02 2.810e+02, threshold=3.882e+02, percent-clipped=0.0 2023-06-15 10:01:41,878 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.51 vs. 
limit=12.0 2023-06-15 10:01:42,657 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=105460.0, ans=0.0 2023-06-15 10:01:54,253 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=105460.0, ans=0.1 2023-06-15 10:02:09,128 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=105526.66666666667, ans=0.125 2023-06-15 10:02:15,102 INFO [train.py:988] (0/4) Epoch 30, batch 400, loss[loss=0.2039, simple_loss=0.2885, pruned_loss=0.05963, over 19509.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.3005, pruned_loss=0.07361, over 3268193.76 frames. ], batch size: 105, lr: 1.08e-02, grad_scale: 32.0 2023-06-15 10:02:33,997 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=105660.0, ans=0.125 2023-06-15 10:02:45,953 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=105660.0, ans=0.0 2023-06-15 10:03:01,728 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=105726.66666666667, ans=0.125 2023-06-15 10:03:04,163 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2023-06-15 10:03:40,282 INFO [train.py:988] (0/4) Epoch 30, batch 450, loss[loss=0.2413, simple_loss=0.2855, pruned_loss=0.09854, over 17008.00 frames. ], tot_loss[loss=0.2237, simple_loss=0.2998, pruned_loss=0.07378, over 3393320.25 frames. ], batch size: 392, lr: 1.08e-02, grad_scale: 32.0 2023-06-15 10:04:05,293 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=105993.33333333333, ans=0.0 2023-06-15 10:04:28,191 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.453e+02 1.828e+02 2.027e+02 2.323e+02 3.183e+02, threshold=4.054e+02, percent-clipped=0.0 2023-06-15 10:04:32,103 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.91 vs. limit=15.0 2023-06-15 10:04:41,410 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=106126.66666666667, ans=0.0 2023-06-15 10:05:04,415 INFO [train.py:988] (0/4) Epoch 30, batch 500, loss[loss=0.2327, simple_loss=0.3235, pruned_loss=0.07097, over 18336.00 frames. ], tot_loss[loss=0.2234, simple_loss=0.3, pruned_loss=0.07346, over 3496206.95 frames. 
], batch size: 72, lr: 1.08e-02, grad_scale: 32.0 2023-06-15 10:05:26,296 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=106326.66666666667, ans=0.125 2023-06-15 10:05:37,352 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=106393.33333333333, ans=0.05 2023-06-15 10:05:38,970 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=106393.33333333333, ans=0.0 2023-06-15 10:05:57,501 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-30.pt 2023-06-15 10:06:22,312 INFO [train.py:988] (0/4) Epoch 31, batch 0, loss[loss=0.1984, simple_loss=0.282, pruned_loss=0.05734, over 19078.00 frames. ], tot_loss[loss=0.1984, simple_loss=0.282, pruned_loss=0.05734, over 19078.00 frames. ], batch size: 94, lr: 1.06e-02, grad_scale: 32.0 2023-06-15 10:06:22,313 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 10:06:28,532 INFO [train.py:1020] (0/4) Epoch 31, validation: loss=0.2014, simple_loss=0.3032, pruned_loss=0.0498, over 143649.00 frames. 2023-06-15 10:06:28,533 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 10:06:33,974 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=106480.0, ans=0.125 2023-06-15 10:07:15,089 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/checkpoint-16000.pt 2023-06-15 10:07:22,654 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=106680.0, ans=0.125 2023-06-15 10:07:25,714 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=106680.0, ans=0.02 2023-06-15 10:07:30,845 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=106680.0, ans=0.0 2023-06-15 10:07:44,681 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=106746.66666666667, ans=0.09899494936611666 2023-06-15 10:07:48,110 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.517e+02 1.719e+02 1.932e+02 2.142e+02 3.238e+02, threshold=3.865e+02, percent-clipped=0.0 2023-06-15 10:07:49,976 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=106746.66666666667, ans=0.125 2023-06-15 10:07:55,176 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=106813.33333333333, ans=0.0 2023-06-15 10:07:56,262 INFO [train.py:988] (0/4) Epoch 31, batch 50, loss[loss=0.2197, simple_loss=0.2859, pruned_loss=0.07672, over 20689.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.3013, pruned_loss=0.07241, over 854495.50 frames. ], batch size: 211, lr: 1.06e-02, grad_scale: 32.0 2023-06-15 10:08:04,298 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.12 vs. limit=22.5 2023-06-15 10:08:19,107 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.04 vs. 
limit=15.0 2023-06-15 10:08:41,128 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=106946.66666666667, ans=0.1 2023-06-15 10:08:44,145 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.94 vs. limit=12.0 2023-06-15 10:08:47,157 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.79 vs. limit=12.0 2023-06-15 10:08:50,329 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=107013.33333333333, ans=0.125 2023-06-15 10:09:22,800 INFO [train.py:988] (0/4) Epoch 31, batch 100, loss[loss=0.2326, simple_loss=0.3244, pruned_loss=0.07038, over 18286.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.3023, pruned_loss=0.07425, over 1498911.85 frames. ], batch size: 72, lr: 1.05e-02, grad_scale: 32.0 2023-06-15 10:09:41,519 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 10:10:25,052 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=107346.66666666667, ans=0.125 2023-06-15 10:10:40,187 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.492e+02 1.809e+02 2.076e+02 2.464e+02 3.768e+02, threshold=4.152e+02, percent-clipped=0.0 2023-06-15 10:10:43,208 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=107413.33333333333, ans=10.0 2023-06-15 10:10:49,498 INFO [train.py:988] (0/4) Epoch 31, batch 150, loss[loss=0.2269, simple_loss=0.3107, pruned_loss=0.07157, over 18946.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.3017, pruned_loss=0.07367, over 1999761.57 frames. ], batch size: 86, lr: 1.05e-02, grad_scale: 32.0 2023-06-15 10:11:06,649 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=107546.66666666667, ans=0.05 2023-06-15 10:11:28,433 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.33 vs. limit=15.0 2023-06-15 10:11:40,242 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=107680.0, ans=0.2 2023-06-15 10:11:48,538 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=107680.0, ans=0.0 2023-06-15 10:11:58,374 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 10:12:03,228 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=107746.66666666667, ans=0.1 2023-06-15 10:12:10,405 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=107746.66666666667, ans=0.125 2023-06-15 10:12:15,586 INFO [train.py:988] (0/4) Epoch 31, batch 200, loss[loss=0.2197, simple_loss=0.2892, pruned_loss=0.07508, over 20194.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.3006, pruned_loss=0.07318, over 2399271.94 frames. 
], batch size: 239, lr: 1.05e-02, grad_scale: 32.0 2023-06-15 10:12:55,678 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=107946.66666666667, ans=0.125 2023-06-15 10:13:23,704 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 10:13:33,054 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.438e+02 1.862e+02 2.218e+02 2.544e+02 4.456e+02, threshold=4.436e+02, percent-clipped=1.0 2023-06-15 10:13:39,973 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=108146.66666666667, ans=0.125 2023-06-15 10:13:41,908 INFO [train.py:988] (0/4) Epoch 31, batch 250, loss[loss=0.2166, simple_loss=0.2863, pruned_loss=0.07341, over 20298.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.3014, pruned_loss=0.07359, over 2709937.67 frames. ], batch size: 239, lr: 1.05e-02, grad_scale: 32.0 2023-06-15 10:13:56,696 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=108146.66666666667, ans=0.0 2023-06-15 10:13:56,842 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=108146.66666666667, ans=0.1 2023-06-15 10:14:18,501 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=108280.0, ans=0.125 2023-06-15 10:14:35,441 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.75 vs. limit=12.0 2023-06-15 10:14:44,170 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=108346.66666666667, ans=0.2 2023-06-15 10:14:44,481 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.27 vs. limit=15.0 2023-06-15 10:15:02,187 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=108413.33333333333, ans=0.125 2023-06-15 10:15:09,067 INFO [train.py:988] (0/4) Epoch 31, batch 300, loss[loss=0.2248, simple_loss=0.3014, pruned_loss=0.07413, over 20079.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.3009, pruned_loss=0.07318, over 2958936.80 frames. ], batch size: 133, lr: 1.05e-02, grad_scale: 32.0 2023-06-15 10:15:24,530 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0 2023-06-15 10:15:48,633 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=108613.33333333333, ans=0.0 2023-06-15 10:16:26,651 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.536e+02 1.814e+02 2.012e+02 2.294e+02 3.766e+02, threshold=4.023e+02, percent-clipped=0.0 2023-06-15 10:16:27,171 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=108746.66666666667, ans=0.0 2023-06-15 10:16:34,825 INFO [train.py:988] (0/4) Epoch 31, batch 350, loss[loss=0.2213, simple_loss=0.3052, pruned_loss=0.06876, over 18492.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2995, pruned_loss=0.07312, over 3147933.74 frames. 
], batch size: 77, lr: 1.05e-02, grad_scale: 32.0 2023-06-15 10:17:02,337 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=22.5 2023-06-15 10:17:07,819 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.57 vs. limit=22.5 2023-06-15 10:17:19,408 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.09 vs. limit=22.5 2023-06-15 10:17:27,256 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=109013.33333333333, ans=0.1 2023-06-15 10:18:00,831 INFO [train.py:988] (0/4) Epoch 31, batch 400, loss[loss=0.2167, simple_loss=0.2878, pruned_loss=0.07281, over 20734.00 frames. ], tot_loss[loss=0.222, simple_loss=0.299, pruned_loss=0.0725, over 3292410.08 frames. ], batch size: 211, lr: 1.05e-02, grad_scale: 32.0 2023-06-15 10:18:39,125 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=109280.0, ans=0.1 2023-06-15 10:19:04,893 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=109346.66666666667, ans=0.125 2023-06-15 10:19:20,539 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.455e+02 1.873e+02 2.079e+02 2.420e+02 3.208e+02, threshold=4.158e+02, percent-clipped=0.0 2023-06-15 10:19:25,826 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=109480.0, ans=0.0 2023-06-15 10:19:27,266 INFO [train.py:988] (0/4) Epoch 31, batch 450, loss[loss=0.2179, simple_loss=0.2999, pruned_loss=0.06798, over 18647.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2997, pruned_loss=0.07301, over 3388615.29 frames. ], batch size: 80, lr: 1.05e-02, grad_scale: 32.0 2023-06-15 10:19:29,117 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=109480.0, ans=0.125 2023-06-15 10:20:25,993 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=109680.0, ans=0.125 2023-06-15 10:20:26,000 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=109680.0, ans=0.125 2023-06-15 10:20:28,259 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.68 vs. limit=15.0 2023-06-15 10:20:32,439 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=109680.0, ans=0.1 2023-06-15 10:20:39,218 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=109746.66666666667, ans=0.2 2023-06-15 10:20:45,715 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=109746.66666666667, ans=0.0 2023-06-15 10:20:51,574 INFO [train.py:988] (0/4) Epoch 31, batch 500, loss[loss=0.2285, simple_loss=0.3131, pruned_loss=0.07196, over 17127.00 frames. ], tot_loss[loss=0.2225, simple_loss=0.2997, pruned_loss=0.07268, over 3479570.11 frames. 
], batch size: 60, lr: 1.04e-02, grad_scale: 32.0 2023-06-15 10:20:58,355 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=109813.33333333333, ans=0.1 2023-06-15 10:20:59,949 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=109813.33333333333, ans=0.125 2023-06-15 10:21:04,748 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=109813.33333333333, ans=0.0 2023-06-15 10:21:28,964 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.03 vs. limit=15.0 2023-06-15 10:21:43,598 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-31.pt 2023-06-15 10:22:06,252 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=110026.66666666667, ans=0.2 2023-06-15 10:22:07,561 INFO [train.py:988] (0/4) Epoch 32, batch 0, loss[loss=0.2601, simple_loss=0.2949, pruned_loss=0.1127, over 17089.00 frames. ], tot_loss[loss=0.2601, simple_loss=0.2949, pruned_loss=0.1127, over 17089.00 frames. ], batch size: 391, lr: 1.03e-02, grad_scale: 32.0 2023-06-15 10:22:07,562 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 10:22:13,538 INFO [train.py:1020] (0/4) Epoch 32, validation: loss=0.1996, simple_loss=0.3022, pruned_loss=0.04853, over 143649.00 frames. 2023-06-15 10:22:13,538 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 10:22:17,710 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=110026.66666666667, ans=0.0 2023-06-15 10:22:32,894 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=110093.33333333333, ans=0.1 2023-06-15 10:22:34,614 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=110093.33333333333, ans=0.125 2023-06-15 10:22:37,961 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.534e+02 1.793e+02 2.026e+02 2.443e+02 4.216e+02, threshold=4.052e+02, percent-clipped=1.0 2023-06-15 10:22:56,968 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=110160.0, ans=0.2 2023-06-15 10:22:58,854 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=110160.0, ans=0.2 2023-06-15 10:23:06,113 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=110226.66666666667, ans=0.2 2023-06-15 10:23:06,182 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=110226.66666666667, ans=0.125 2023-06-15 10:23:10,990 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=110226.66666666667, ans=0.125 2023-06-15 10:23:16,942 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=110226.66666666667, ans=0.125 2023-06-15 10:23:18,593 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, 
batch_count=110226.66666666667, ans=0.125 2023-06-15 10:23:20,376 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.75 vs. limit=15.0 2023-06-15 10:23:29,770 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.66 vs. limit=15.0 2023-06-15 10:23:33,953 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=110293.33333333333, ans=0.0 2023-06-15 10:23:40,693 INFO [train.py:988] (0/4) Epoch 32, batch 50, loss[loss=0.2182, simple_loss=0.2995, pruned_loss=0.06848, over 19653.00 frames. ], tot_loss[loss=0.2233, simple_loss=0.3004, pruned_loss=0.0731, over 852135.07 frames. ], batch size: 110, lr: 1.03e-02, grad_scale: 32.0 2023-06-15 10:23:43,250 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.54 vs. limit=22.5 2023-06-15 10:24:07,451 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=110426.66666666667, ans=0.125 2023-06-15 10:24:27,863 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=110493.33333333333, ans=0.05 2023-06-15 10:25:08,055 INFO [train.py:988] (0/4) Epoch 32, batch 100, loss[loss=0.2103, simple_loss=0.2943, pruned_loss=0.06317, over 19530.00 frames. ], tot_loss[loss=0.2189, simple_loss=0.2967, pruned_loss=0.07053, over 1527811.94 frames. ], batch size: 102, lr: 1.02e-02, grad_scale: 32.0 2023-06-15 10:25:18,253 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 10:25:31,497 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.473e+02 1.727e+02 1.862e+02 2.037e+02 3.271e+02, threshold=3.724e+02, percent-clipped=0.0 2023-06-15 10:25:39,152 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=110760.0, ans=0.0 2023-06-15 10:25:51,676 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=110826.66666666667, ans=0.2 2023-06-15 10:26:05,618 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.75 vs. limit=15.0 2023-06-15 10:26:10,110 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=110893.33333333333, ans=0.125 2023-06-15 10:26:33,268 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=111026.66666666667, ans=0.125 2023-06-15 10:26:34,700 INFO [train.py:988] (0/4) Epoch 32, batch 150, loss[loss=0.2064, simple_loss=0.2954, pruned_loss=0.0587, over 18293.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2962, pruned_loss=0.07017, over 2029836.29 frames. 
], batch size: 74, lr: 1.02e-02, grad_scale: 32.0 2023-06-15 10:27:22,329 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=111160.0, ans=0.0 2023-06-15 10:27:49,315 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=111293.33333333333, ans=0.125 2023-06-15 10:27:53,113 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=111293.33333333333, ans=0.2 2023-06-15 10:28:00,885 INFO [train.py:988] (0/4) Epoch 32, batch 200, loss[loss=0.2341, simple_loss=0.3136, pruned_loss=0.07724, over 18274.00 frames. ], tot_loss[loss=0.2196, simple_loss=0.2976, pruned_loss=0.07078, over 2417153.74 frames. ], batch size: 74, lr: 1.02e-02, grad_scale: 32.0 2023-06-15 10:28:06,072 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.99 vs. limit=15.0 2023-06-15 10:28:14,753 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.93 vs. limit=15.0 2023-06-15 10:28:24,814 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.498e+02 1.849e+02 2.097e+02 2.469e+02 3.862e+02, threshold=4.194e+02, percent-clipped=1.0 2023-06-15 10:28:25,289 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=111426.66666666667, ans=0.1 2023-06-15 10:28:31,956 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=111426.66666666667, ans=0.125 2023-06-15 10:28:39,836 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.60 vs. limit=15.0 2023-06-15 10:29:01,683 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=111560.0, ans=10.0 2023-06-15 10:29:26,811 INFO [train.py:988] (0/4) Epoch 32, batch 250, loss[loss=0.2033, simple_loss=0.2826, pruned_loss=0.06198, over 19813.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.2988, pruned_loss=0.07097, over 2692026.23 frames. ], batch size: 115, lr: 1.02e-02, grad_scale: 32.0 2023-06-15 10:30:11,140 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2023-06-15 10:30:43,231 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=111960.0, ans=0.0 2023-06-15 10:30:53,807 INFO [train.py:988] (0/4) Epoch 32, batch 300, loss[loss=0.2182, simple_loss=0.3019, pruned_loss=0.06728, over 19116.00 frames. ], tot_loss[loss=0.2211, simple_loss=0.2987, pruned_loss=0.07176, over 2925161.64 frames. 
], batch size: 94, lr: 1.02e-02, grad_scale: 32.0 2023-06-15 10:31:10,175 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 10:31:18,114 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.463e+02 1.817e+02 2.017e+02 2.252e+02 3.365e+02, threshold=4.033e+02, percent-clipped=0.0 2023-06-15 10:31:54,534 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=112226.66666666667, ans=0.0 2023-06-15 10:31:54,690 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=112226.66666666667, ans=0.0 2023-06-15 10:32:20,596 INFO [train.py:988] (0/4) Epoch 32, batch 350, loss[loss=0.2249, simple_loss=0.3051, pruned_loss=0.07239, over 19465.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2986, pruned_loss=0.0715, over 3117671.75 frames. ], batch size: 105, lr: 1.02e-02, grad_scale: 32.0 2023-06-15 10:33:03,605 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=112493.33333333333, ans=0.125 2023-06-15 10:33:34,864 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-06-15 10:33:39,143 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-06-15 10:33:40,970 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.39 vs. limit=22.5 2023-06-15 10:33:42,983 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=112626.66666666667, ans=15.0 2023-06-15 10:33:45,061 INFO [train.py:988] (0/4) Epoch 32, batch 400, loss[loss=0.2283, simple_loss=0.2887, pruned_loss=0.08391, over 19982.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.2982, pruned_loss=0.07132, over 3267783.16 frames. ], batch size: 293, lr: 1.02e-02, grad_scale: 32.0 2023-06-15 10:33:51,450 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=112693.33333333333, ans=0.0 2023-06-15 10:34:09,517 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.518e+02 1.921e+02 2.168e+02 2.475e+02 4.297e+02, threshold=4.337e+02, percent-clipped=1.0 2023-06-15 10:34:49,460 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=112893.33333333333, ans=0.1 2023-06-15 10:35:00,199 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.96 vs. limit=22.5 2023-06-15 10:35:08,995 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2023-06-15 10:35:11,181 INFO [train.py:988] (0/4) Epoch 32, batch 450, loss[loss=0.2116, simple_loss=0.2886, pruned_loss=0.06735, over 19518.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2986, pruned_loss=0.07155, over 3386266.16 frames. ], batch size: 102, lr: 1.02e-02, grad_scale: 32.0 2023-06-15 10:36:36,129 INFO [train.py:988] (0/4) Epoch 32, batch 500, loss[loss=0.2107, simple_loss=0.2964, pruned_loss=0.06255, over 17627.00 frames. 
], tot_loss[loss=0.2203, simple_loss=0.2983, pruned_loss=0.07116, over 3471571.09 frames. ], batch size: 67, lr: 1.01e-02, grad_scale: 32.0 2023-06-15 10:36:59,351 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.475e+02 1.786e+02 2.074e+02 2.314e+02 3.487e+02, threshold=4.148e+02, percent-clipped=0.0 2023-06-15 10:37:27,516 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-32.pt 2023-06-15 10:37:52,764 INFO [train.py:988] (0/4) Epoch 33, batch 0, loss[loss=0.223, simple_loss=0.2897, pruned_loss=0.07814, over 20220.00 frames. ], tot_loss[loss=0.223, simple_loss=0.2897, pruned_loss=0.07814, over 20220.00 frames. ], batch size: 239, lr: 9.98e-03, grad_scale: 32.0 2023-06-15 10:37:52,765 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 10:37:58,955 INFO [train.py:1020] (0/4) Epoch 33, validation: loss=0.2021, simple_loss=0.3035, pruned_loss=0.05038, over 143649.00 frames. 2023-06-15 10:37:58,956 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 10:38:33,657 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.48 vs. limit=22.5 2023-06-15 10:39:11,452 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 10:39:13,771 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=113840.0, ans=0.125 2023-06-15 10:39:26,784 INFO [train.py:988] (0/4) Epoch 33, batch 50, loss[loss=0.2217, simple_loss=0.2986, pruned_loss=0.07236, over 19201.00 frames. ], tot_loss[loss=0.2192, simple_loss=0.2959, pruned_loss=0.07123, over 864385.32 frames. ], batch size: 92, lr: 9.96e-03, grad_scale: 32.0 2023-06-15 10:39:27,203 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=113906.66666666667, ans=0.09899494936611666 2023-06-15 10:39:32,095 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=113906.66666666667, ans=0.0 2023-06-15 10:39:54,149 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.18 vs. limit=12.0 2023-06-15 10:40:02,108 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=114040.0, ans=0.125 2023-06-15 10:40:02,324 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=114040.0, ans=0.125 2023-06-15 10:40:22,220 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.544e+02 1.812e+02 2.020e+02 2.321e+02 4.264e+02, threshold=4.041e+02, percent-clipped=1.0 2023-06-15 10:40:31,427 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 10:40:53,372 INFO [train.py:988] (0/4) Epoch 33, batch 100, loss[loss=0.2109, simple_loss=0.2942, pruned_loss=0.06383, over 19846.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2959, pruned_loss=0.07039, over 1527587.83 frames. 
], batch size: 120, lr: 9.95e-03, grad_scale: 32.0 2023-06-15 10:41:01,648 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=114240.0, ans=0.0 2023-06-15 10:41:07,434 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=114240.0, ans=0.125 2023-06-15 10:41:08,997 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=114306.66666666667, ans=0.1 2023-06-15 10:41:10,644 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=114306.66666666667, ans=0.125 2023-06-15 10:41:11,103 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2023-06-15 10:41:17,661 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=114306.66666666667, ans=0.04949747468305833 2023-06-15 10:41:17,809 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=114306.66666666667, ans=0.125 2023-06-15 10:41:50,566 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=114440.0, ans=0.0 2023-06-15 10:41:52,867 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.75 vs. limit=15.0 2023-06-15 10:42:07,747 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=114506.66666666667, ans=0.0 2023-06-15 10:42:19,606 INFO [train.py:988] (0/4) Epoch 33, batch 150, loss[loss=0.1997, simple_loss=0.2837, pruned_loss=0.05787, over 19513.00 frames. ], tot_loss[loss=0.2189, simple_loss=0.2959, pruned_loss=0.07093, over 2043007.14 frames. ], batch size: 102, lr: 9.94e-03, grad_scale: 32.0 2023-06-15 10:42:31,783 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=114573.33333333333, ans=0.0 2023-06-15 10:42:45,897 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=114640.0, ans=0.2 2023-06-15 10:42:46,632 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.86 vs. limit=22.5 2023-06-15 10:43:14,642 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.568e+02 1.854e+02 2.036e+02 2.409e+02 3.930e+02, threshold=4.072e+02, percent-clipped=0.0 2023-06-15 10:43:34,369 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.97 vs. limit=15.0 2023-06-15 10:43:45,585 INFO [train.py:988] (0/4) Epoch 33, batch 200, loss[loss=0.2065, simple_loss=0.276, pruned_loss=0.06856, over 20238.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.2968, pruned_loss=0.07091, over 2415443.10 frames. 
], batch size: 239, lr: 9.93e-03, grad_scale: 32.0 2023-06-15 10:43:58,019 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=114906.66666666667, ans=0.0 2023-06-15 10:44:24,347 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.02 vs. limit=10.0 2023-06-15 10:44:48,428 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115106.66666666667, ans=0.1 2023-06-15 10:44:50,081 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=115106.66666666667, ans=0.0 2023-06-15 10:45:00,638 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=115173.33333333333, ans=0.125 2023-06-15 10:45:08,579 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=115173.33333333333, ans=0.125 2023-06-15 10:45:11,582 INFO [train.py:988] (0/4) Epoch 33, batch 250, loss[loss=0.215, simple_loss=0.2958, pruned_loss=0.06712, over 18635.00 frames. ], tot_loss[loss=0.22, simple_loss=0.2966, pruned_loss=0.07172, over 2715967.76 frames. ], batch size: 80, lr: 9.92e-03, grad_scale: 32.0 2023-06-15 10:45:32,201 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=115306.66666666667, ans=0.125 2023-06-15 10:45:40,790 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=115306.66666666667, ans=0.0 2023-06-15 10:45:53,098 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=115373.33333333333, ans=0.125 2023-06-15 10:46:04,662 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.90 vs. limit=22.5 2023-06-15 10:46:06,694 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.489e+02 1.758e+02 1.982e+02 2.404e+02 3.997e+02, threshold=3.964e+02, percent-clipped=0.0 2023-06-15 10:46:12,560 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0 2023-06-15 10:46:17,238 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.90 vs. limit=15.0 2023-06-15 10:46:27,663 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=115506.66666666667, ans=0.05 2023-06-15 10:46:37,242 INFO [train.py:988] (0/4) Epoch 33, batch 300, loss[loss=0.2292, simple_loss=0.2979, pruned_loss=0.08029, over 20660.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.2967, pruned_loss=0.07202, over 2941447.92 frames. ], batch size: 211, lr: 9.90e-03, grad_scale: 32.0 2023-06-15 10:46:39,847 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.24 vs. 
limit=10.0 2023-06-15 10:46:41,060 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=115573.33333333333, ans=0.0 2023-06-15 10:46:54,332 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=115640.0, ans=0.125 2023-06-15 10:47:08,567 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=115640.0, ans=0.0 2023-06-15 10:47:16,717 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=115706.66666666667, ans=0.1 2023-06-15 10:47:28,709 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=115773.33333333333, ans=0.125 2023-06-15 10:47:44,974 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=115840.0, ans=0.125 2023-06-15 10:48:02,185 INFO [train.py:988] (0/4) Epoch 33, batch 350, loss[loss=0.21, simple_loss=0.2985, pruned_loss=0.06072, over 18296.00 frames. ], tot_loss[loss=0.2201, simple_loss=0.2966, pruned_loss=0.07176, over 3135149.41 frames. ], batch size: 74, lr: 9.89e-03, grad_scale: 32.0 2023-06-15 10:48:52,536 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=12.0 2023-06-15 10:48:57,396 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.480e+02 1.859e+02 2.087e+02 2.471e+02 4.224e+02, threshold=4.174e+02, percent-clipped=1.0 2023-06-15 10:49:21,988 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=116173.33333333333, ans=0.1 2023-06-15 10:49:24,308 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.54 vs. limit=15.0 2023-06-15 10:49:28,267 INFO [train.py:988] (0/4) Epoch 33, batch 400, loss[loss=0.2277, simple_loss=0.2901, pruned_loss=0.08259, over 20277.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.2964, pruned_loss=0.07172, over 3298201.69 frames. ], batch size: 239, lr: 9.88e-03, grad_scale: 32.0 2023-06-15 10:49:35,750 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.60 vs. limit=15.0 2023-06-15 10:49:44,744 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=116306.66666666667, ans=0.2 2023-06-15 10:50:08,966 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=116373.33333333333, ans=0.0 2023-06-15 10:50:11,159 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.23 vs. limit=15.0 2023-06-15 10:50:12,085 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=116373.33333333333, ans=0.125 2023-06-15 10:50:13,800 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=116373.33333333333, ans=0.2 2023-06-15 10:50:36,802 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.78 vs. 
limit=12.0 2023-06-15 10:50:41,032 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=116506.66666666667, ans=0.0 2023-06-15 10:50:54,719 INFO [train.py:988] (0/4) Epoch 33, batch 450, loss[loss=0.2192, simple_loss=0.2917, pruned_loss=0.07341, over 19946.00 frames. ], tot_loss[loss=0.2191, simple_loss=0.2963, pruned_loss=0.07094, over 3382804.90 frames. ], batch size: 126, lr: 9.87e-03, grad_scale: 32.0 2023-06-15 10:51:00,197 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=116573.33333333333, ans=0.1 2023-06-15 10:51:17,685 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=116640.0, ans=0.125 2023-06-15 10:51:50,040 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.555e+02 1.786e+02 1.951e+02 2.072e+02 2.874e+02, threshold=3.901e+02, percent-clipped=0.0 2023-06-15 10:52:11,794 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=116840.0, ans=0.125 2023-06-15 10:52:17,728 INFO [train.py:988] (0/4) Epoch 33, batch 500, loss[loss=0.2295, simple_loss=0.2701, pruned_loss=0.09443, over 16808.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2961, pruned_loss=0.07076, over 3457416.05 frames. ], batch size: 391, lr: 9.86e-03, grad_scale: 32.0 2023-06-15 10:52:26,409 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=116906.66666666667, ans=0.2 2023-06-15 10:52:29,423 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=116906.66666666667, ans=0.1 2023-06-15 10:52:41,400 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=116973.33333333333, ans=0.0 2023-06-15 10:52:59,783 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2023-06-15 10:53:11,362 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-33.pt 2023-06-15 10:53:34,840 INFO [train.py:988] (0/4) Epoch 34, batch 0, loss[loss=0.2141, simple_loss=0.2922, pruned_loss=0.06799, over 19225.00 frames. ], tot_loss[loss=0.2141, simple_loss=0.2922, pruned_loss=0.06799, over 19225.00 frames. ], batch size: 92, lr: 9.70e-03, grad_scale: 32.0 2023-06-15 10:53:34,841 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 10:53:41,147 INFO [train.py:1020] (0/4) Epoch 34, validation: loss=0.2011, simple_loss=0.3024, pruned_loss=0.04991, over 143649.00 frames. 
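The per-batch lines above report tot_loss together with a frame count that grows across the epoch, which suggests a running average weighted by the number of frames seen so far. The following is a minimal illustrative sketch of that bookkeeping under this assumption; the class name and interface are hypothetical, not the actual icefall tracking code.

# Hypothetical sketch: frame-weighted running averages, assuming tot_loss is
# the sum of (per-batch loss * frames) divided by the total frames so far.
from collections import defaultdict

class RunningLoss:
    def __init__(self) -> None:
        self.sums = defaultdict(float)  # per-metric sum of value * frames
        self.frames = 0.0               # total frames accumulated

    def update(self, metrics: dict, num_frames: float) -> None:
        for name, value in metrics.items():
            self.sums[name] += value * num_frames
        self.frames += num_frames

    def averages(self) -> dict:
        return {name: s / self.frames for name, s in self.sums.items()}

# After the first batch of an epoch the running averages equal that batch's
# own values, e.g.:
tracker = RunningLoss()
tracker.update({"loss": 0.2141, "simple_loss": 0.2922, "pruned_loss": 0.06799}, 19225.0)
print(tracker.averages())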
2023-06-15 10:53:41,148 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 10:53:47,084 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=117126.66666666667, ans=0.2 2023-06-15 10:54:09,506 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=117193.33333333333, ans=0.125 2023-06-15 10:54:14,541 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=117260.0, ans=0.0 2023-06-15 10:54:16,915 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=117260.0, ans=0.0 2023-06-15 10:54:23,074 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.97 vs. limit=12.0 2023-06-15 10:55:10,948 INFO [train.py:988] (0/4) Epoch 34, batch 50, loss[loss=0.2026, simple_loss=0.2778, pruned_loss=0.06371, over 20515.00 frames. ], tot_loss[loss=0.2191, simple_loss=0.2982, pruned_loss=0.06997, over 828964.69 frames. ], batch size: 173, lr: 9.69e-03, grad_scale: 16.0 2023-06-15 10:55:12,580 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.555e+02 1.849e+02 2.100e+02 2.383e+02 3.120e+02, threshold=4.200e+02, percent-clipped=0.0 2023-06-15 10:55:38,358 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=117526.66666666667, ans=0.125 2023-06-15 10:56:02,301 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=117593.33333333333, ans=0.1 2023-06-15 10:56:13,340 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=117660.0, ans=0.0 2023-06-15 10:56:16,603 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=117660.0, ans=0.0 2023-06-15 10:56:19,005 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.63 vs. limit=22.5 2023-06-15 10:56:40,473 INFO [train.py:988] (0/4) Epoch 34, batch 100, loss[loss=0.2108, simple_loss=0.2948, pruned_loss=0.06342, over 19462.00 frames. ], tot_loss[loss=0.2165, simple_loss=0.2958, pruned_loss=0.06861, over 1497322.98 frames. ], batch size: 105, lr: 9.68e-03, grad_scale: 16.0 2023-06-15 10:56:44,538 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=117793.33333333333, ans=0.1 2023-06-15 10:57:04,749 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=117860.0, ans=0.125 2023-06-15 10:58:02,664 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=118060.0, ans=0.125 2023-06-15 10:58:09,683 INFO [train.py:988] (0/4) Epoch 34, batch 150, loss[loss=0.1938, simple_loss=0.2793, pruned_loss=0.0542, over 19533.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.2966, pruned_loss=0.0691, over 2006965.39 frames. 
], batch size: 102, lr: 9.67e-03, grad_scale: 16.0 2023-06-15 10:58:11,347 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.542e+02 1.765e+02 1.971e+02 2.282e+02 3.738e+02, threshold=3.942e+02, percent-clipped=0.0 2023-06-15 10:58:55,979 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=118260.0, ans=0.025 2023-06-15 10:59:19,838 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=118393.33333333333, ans=0.1 2023-06-15 10:59:29,437 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=118393.33333333333, ans=0.035 2023-06-15 10:59:37,539 INFO [train.py:988] (0/4) Epoch 34, batch 200, loss[loss=0.1951, simple_loss=0.2781, pruned_loss=0.05608, over 19681.00 frames. ], tot_loss[loss=0.2162, simple_loss=0.2955, pruned_loss=0.06845, over 2400601.04 frames. ], batch size: 110, lr: 9.65e-03, grad_scale: 16.0 2023-06-15 10:59:49,324 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=118460.0, ans=0.0 2023-06-15 11:00:10,726 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=118526.66666666667, ans=0.1 2023-06-15 11:00:36,218 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=118660.0, ans=0.0 2023-06-15 11:00:41,253 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=118660.0, ans=0.1 2023-06-15 11:00:47,344 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.62 vs. limit=15.0 2023-06-15 11:01:07,017 INFO [train.py:988] (0/4) Epoch 34, batch 250, loss[loss=0.231, simple_loss=0.3183, pruned_loss=0.07188, over 15633.00 frames. ], tot_loss[loss=0.2166, simple_loss=0.2952, pruned_loss=0.06897, over 2699492.17 frames. ], batch size: 44, lr: 9.64e-03, grad_scale: 16.0 2023-06-15 11:01:09,083 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.505e+02 1.911e+02 2.159e+02 2.494e+02 3.715e+02, threshold=4.319e+02, percent-clipped=0.0 2023-06-15 11:01:11,213 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=118793.33333333333, ans=0.125 2023-06-15 11:01:27,385 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=118860.0, ans=0.125 2023-06-15 11:01:31,652 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.19 vs. 
limit=10.0 2023-06-15 11:01:36,439 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=118860.0, ans=0.1 2023-06-15 11:01:47,865 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=118926.66666666667, ans=0.1 2023-06-15 11:02:04,697 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=118993.33333333333, ans=0.2 2023-06-15 11:02:19,665 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=119060.0, ans=0.0 2023-06-15 11:02:34,399 INFO [train.py:988] (0/4) Epoch 34, batch 300, loss[loss=0.2022, simple_loss=0.2863, pruned_loss=0.05906, over 19444.00 frames. ], tot_loss[loss=0.2166, simple_loss=0.2953, pruned_loss=0.06898, over 2953581.57 frames. ], batch size: 105, lr: 9.63e-03, grad_scale: 16.0 2023-06-15 11:03:40,775 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=119326.66666666667, ans=0.07 2023-06-15 11:03:52,187 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=119393.33333333333, ans=0.0 2023-06-15 11:03:54,352 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-06-15 11:04:03,794 INFO [train.py:988] (0/4) Epoch 34, batch 350, loss[loss=0.1862, simple_loss=0.2693, pruned_loss=0.0515, over 19459.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2956, pruned_loss=0.06886, over 3124142.09 frames. ], batch size: 105, lr: 9.62e-03, grad_scale: 16.0 2023-06-15 11:04:05,481 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.557e+02 1.947e+02 2.157e+02 2.600e+02 3.674e+02, threshold=4.313e+02, percent-clipped=0.0 2023-06-15 11:04:13,298 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=119460.0, ans=0.125 2023-06-15 11:04:13,493 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=119460.0, ans=0.1 2023-06-15 11:04:15,702 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.91 vs. limit=10.0 2023-06-15 11:04:52,779 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=119593.33333333333, ans=0.0 2023-06-15 11:05:00,098 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=119660.0, ans=0.0 2023-06-15 11:05:34,492 INFO [train.py:988] (0/4) Epoch 34, batch 400, loss[loss=0.2208, simple_loss=0.3059, pruned_loss=0.06784, over 18339.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2959, pruned_loss=0.06871, over 3269888.12 frames. ], batch size: 72, lr: 9.61e-03, grad_scale: 32.0 2023-06-15 11:05:40,614 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.19 vs. 
limit=15.0 2023-06-15 11:05:50,600 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=119860.0, ans=0.125 2023-06-15 11:05:54,549 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=119860.0, ans=0.125 2023-06-15 11:06:19,595 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=119926.66666666667, ans=0.95 2023-06-15 11:06:23,023 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=119926.66666666667, ans=0.125 2023-06-15 11:06:46,262 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=120060.0, ans=0.125 2023-06-15 11:06:49,526 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=120060.0, ans=0.125 2023-06-15 11:07:03,745 INFO [train.py:988] (0/4) Epoch 34, batch 450, loss[loss=0.2474, simple_loss=0.3314, pruned_loss=0.08167, over 16742.00 frames. ], tot_loss[loss=0.2168, simple_loss=0.2958, pruned_loss=0.06889, over 3391931.77 frames. ], batch size: 59, lr: 9.60e-03, grad_scale: 32.0 2023-06-15 11:07:05,356 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.501e+02 1.842e+02 2.161e+02 2.491e+02 3.686e+02, threshold=4.322e+02, percent-clipped=0.0 2023-06-15 11:07:05,843 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=120126.66666666667, ans=0.1 2023-06-15 11:07:58,000 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=120326.66666666667, ans=10.0 2023-06-15 11:08:17,981 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=120393.33333333333, ans=0.0 2023-06-15 11:08:29,408 INFO [train.py:988] (0/4) Epoch 34, batch 500, loss[loss=0.1968, simple_loss=0.2768, pruned_loss=0.0584, over 19845.00 frames. ], tot_loss[loss=0.2166, simple_loss=0.2956, pruned_loss=0.06877, over 3474568.03 frames. ], batch size: 120, lr: 9.59e-03, grad_scale: 32.0 2023-06-15 11:08:31,167 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=120460.0, ans=0.0 2023-06-15 11:08:42,444 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.14 vs. limit=22.5 2023-06-15 11:08:51,867 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=120526.66666666667, ans=0.1 2023-06-15 11:09:09,483 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=120593.33333333333, ans=0.95 2023-06-15 11:09:22,687 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-34.pt 2023-06-15 11:09:43,650 INFO [train.py:988] (0/4) Epoch 35, batch 0, loss[loss=0.2199, simple_loss=0.2975, pruned_loss=0.07118, over 19354.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.2975, pruned_loss=0.07118, over 19354.00 frames. 
], batch size: 98, lr: 9.44e-03, grad_scale: 32.0 2023-06-15 11:09:43,650 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 11:09:49,775 INFO [train.py:1020] (0/4) Epoch 35, validation: loss=0.2016, simple_loss=0.3016, pruned_loss=0.05077, over 143649.00 frames. 2023-06-15 11:09:49,776 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 11:10:21,599 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.525e+02 1.805e+02 2.012e+02 2.315e+02 3.975e+02, threshold=4.025e+02, percent-clipped=0.0 2023-06-15 11:10:26,705 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.34 vs. limit=15.0 2023-06-15 11:10:34,345 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 11:10:46,897 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=120880.0, ans=0.015 2023-06-15 11:10:52,290 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=120880.0, ans=0.1 2023-06-15 11:11:18,993 INFO [train.py:988] (0/4) Epoch 35, batch 50, loss[loss=0.2526, simple_loss=0.3372, pruned_loss=0.08403, over 17099.00 frames. ], tot_loss[loss=0.2163, simple_loss=0.2954, pruned_loss=0.06862, over 856823.39 frames. ], batch size: 60, lr: 9.43e-03, grad_scale: 32.0 2023-06-15 11:11:23,672 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.53 vs. limit=22.5 2023-06-15 11:11:54,170 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=121146.66666666667, ans=0.1 2023-06-15 11:11:56,529 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=121146.66666666667, ans=0.0 2023-06-15 11:12:21,776 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.62 vs. limit=15.0 2023-06-15 11:12:35,146 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=121280.0, ans=0.0 2023-06-15 11:12:39,895 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.32 vs. limit=15.0 2023-06-15 11:12:47,367 INFO [train.py:988] (0/4) Epoch 35, batch 100, loss[loss=0.2133, simple_loss=0.2929, pruned_loss=0.06683, over 18630.00 frames. ], tot_loss[loss=0.2152, simple_loss=0.2939, pruned_loss=0.06824, over 1495433.65 frames. ], batch size: 80, lr: 9.42e-03, grad_scale: 32.0 2023-06-15 11:12:47,780 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=121346.66666666667, ans=0.125 2023-06-15 11:13:00,845 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.66 vs. limit=15.0 2023-06-15 11:13:10,569 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=121413.33333333333, ans=0.2 2023-06-15 11:13:12,667 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.81 vs. 
limit=15.0 2023-06-15 11:13:18,548 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.621e+02 1.883e+02 2.096e+02 2.428e+02 4.337e+02, threshold=4.193e+02, percent-clipped=1.0 2023-06-15 11:14:15,093 INFO [train.py:988] (0/4) Epoch 35, batch 150, loss[loss=0.2046, simple_loss=0.286, pruned_loss=0.06159, over 19082.00 frames. ], tot_loss[loss=0.2144, simple_loss=0.293, pruned_loss=0.06789, over 1995783.17 frames. ], batch size: 89, lr: 9.41e-03, grad_scale: 32.0 2023-06-15 11:14:21,986 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.61 vs. limit=22.5 2023-06-15 11:14:31,182 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=121746.66666666667, ans=0.125 2023-06-15 11:14:32,806 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=121746.66666666667, ans=0.125 2023-06-15 11:14:46,802 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=121746.66666666667, ans=0.125 2023-06-15 11:14:50,083 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=121813.33333333333, ans=0.2 2023-06-15 11:14:58,016 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=121813.33333333333, ans=0.125 2023-06-15 11:15:05,282 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=121813.33333333333, ans=0.1 2023-06-15 11:15:08,706 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=121880.0, ans=0.125 2023-06-15 11:15:22,604 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=121880.0, ans=0.125 2023-06-15 11:15:31,521 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=15.0 2023-06-15 11:15:43,552 INFO [train.py:988] (0/4) Epoch 35, batch 200, loss[loss=0.2009, simple_loss=0.2873, pruned_loss=0.05724, over 18791.00 frames. ], tot_loss[loss=0.214, simple_loss=0.2926, pruned_loss=0.06771, over 2404411.91 frames. ], batch size: 83, lr: 9.40e-03, grad_scale: 32.0 2023-06-15 11:15:56,231 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=122013.33333333333, ans=0.1 2023-06-15 11:15:57,093 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.07 vs. 
limit=15.0 2023-06-15 11:16:05,224 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=122080.0, ans=0.0 2023-06-15 11:16:14,786 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.481e+02 1.845e+02 2.048e+02 2.405e+02 3.914e+02, threshold=4.095e+02, percent-clipped=0.0 2023-06-15 11:16:18,507 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=122146.66666666667, ans=0.125 2023-06-15 11:16:23,447 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=122146.66666666667, ans=0.0 2023-06-15 11:17:09,616 INFO [train.py:988] (0/4) Epoch 35, batch 250, loss[loss=0.197, simple_loss=0.2828, pruned_loss=0.05563, over 18959.00 frames. ], tot_loss[loss=0.2146, simple_loss=0.293, pruned_loss=0.06807, over 2701324.18 frames. ], batch size: 86, lr: 9.38e-03, grad_scale: 32.0 2023-06-15 11:17:33,111 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=122413.33333333333, ans=0.125 2023-06-15 11:17:38,210 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=122413.33333333333, ans=0.2 2023-06-15 11:17:48,539 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.01 vs. limit=15.0 2023-06-15 11:18:16,583 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=122546.66666666667, ans=0.0 2023-06-15 11:18:36,375 INFO [train.py:988] (0/4) Epoch 35, batch 300, loss[loss=0.207, simple_loss=0.2916, pruned_loss=0.0612, over 19530.00 frames. ], tot_loss[loss=0.2147, simple_loss=0.2928, pruned_loss=0.06834, over 2942450.23 frames. ], batch size: 102, lr: 9.37e-03, grad_scale: 32.0 2023-06-15 11:19:05,095 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.28 vs. limit=10.0 2023-06-15 11:19:06,915 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.434e+02 1.736e+02 1.889e+02 2.139e+02 2.972e+02, threshold=3.778e+02, percent-clipped=0.0 2023-06-15 11:19:37,003 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=122880.0, ans=0.125 2023-06-15 11:19:57,454 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=122946.66666666667, ans=0.125 2023-06-15 11:19:58,976 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=122946.66666666667, ans=0.0 2023-06-15 11:20:01,988 INFO [train.py:988] (0/4) Epoch 35, batch 350, loss[loss=0.2026, simple_loss=0.2888, pruned_loss=0.05821, over 19450.00 frames. ], tot_loss[loss=0.2144, simple_loss=0.293, pruned_loss=0.06794, over 3133316.82 frames. 
], batch size: 105, lr: 9.36e-03, grad_scale: 32.0 2023-06-15 11:20:16,374 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=123013.33333333333, ans=0.125 2023-06-15 11:21:10,408 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=123280.0, ans=0.1 2023-06-15 11:21:28,664 INFO [train.py:988] (0/4) Epoch 35, batch 400, loss[loss=0.2296, simple_loss=0.3117, pruned_loss=0.07374, over 18287.00 frames. ], tot_loss[loss=0.2142, simple_loss=0.2927, pruned_loss=0.06789, over 3285317.77 frames. ], batch size: 74, lr: 9.35e-03, grad_scale: 32.0 2023-06-15 11:21:59,501 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.407e+02 1.852e+02 2.101e+02 2.531e+02 3.269e+02, threshold=4.203e+02, percent-clipped=0.0 2023-06-15 11:22:01,935 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=123480.0, ans=0.0 2023-06-15 11:22:08,739 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=123480.0, ans=0.0 2023-06-15 11:22:29,959 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.40 vs. limit=22.5 2023-06-15 11:22:54,578 INFO [train.py:988] (0/4) Epoch 35, batch 450, loss[loss=0.2009, simple_loss=0.2873, pruned_loss=0.05723, over 19675.00 frames. ], tot_loss[loss=0.2142, simple_loss=0.2929, pruned_loss=0.06777, over 3396516.36 frames. ], batch size: 110, lr: 9.34e-03, grad_scale: 32.0 2023-06-15 11:22:54,914 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 11:22:56,589 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=123680.0, ans=0.125 2023-06-15 11:23:11,203 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=123746.66666666667, ans=0.125 2023-06-15 11:23:26,397 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=123746.66666666667, ans=0.1 2023-06-15 11:24:11,875 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=123946.66666666667, ans=0.09899494936611666 2023-06-15 11:24:18,934 INFO [train.py:988] (0/4) Epoch 35, batch 500, loss[loss=0.2082, simple_loss=0.2947, pruned_loss=0.0609, over 19472.00 frames. ], tot_loss[loss=0.2147, simple_loss=0.2933, pruned_loss=0.06808, over 3467586.45 frames. 
], batch size: 105, lr: 9.33e-03, grad_scale: 32.0 2023-06-15 11:24:25,507 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=124013.33333333333, ans=0.0 2023-06-15 11:24:27,142 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=124013.33333333333, ans=0.125 2023-06-15 11:24:38,149 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=124080.0, ans=0.125 2023-06-15 11:24:47,760 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.531e+02 1.833e+02 2.002e+02 2.181e+02 2.864e+02, threshold=4.004e+02, percent-clipped=0.0 2023-06-15 11:25:09,579 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-35.pt 2023-06-15 11:25:33,791 INFO [train.py:988] (0/4) Epoch 36, batch 0, loss[loss=0.2038, simple_loss=0.2861, pruned_loss=0.06077, over 19110.00 frames. ], tot_loss[loss=0.2038, simple_loss=0.2861, pruned_loss=0.06077, over 19110.00 frames. ], batch size: 94, lr: 9.19e-03, grad_scale: 32.0 2023-06-15 11:25:33,792 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 11:25:39,896 INFO [train.py:1020] (0/4) Epoch 36, validation: loss=0.2014, simple_loss=0.3017, pruned_loss=0.05055, over 143649.00 frames. 2023-06-15 11:25:39,897 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 11:25:44,802 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.73 vs. limit=15.0 2023-06-15 11:25:49,312 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=124226.66666666667, ans=0.0 2023-06-15 11:26:06,446 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=124293.33333333333, ans=0.125 2023-06-15 11:26:08,628 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.62 vs. limit=15.0 2023-06-15 11:26:32,067 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=124426.66666666667, ans=10.0 2023-06-15 11:27:05,065 INFO [train.py:988] (0/4) Epoch 36, batch 50, loss[loss=0.2351, simple_loss=0.3112, pruned_loss=0.07948, over 20289.00 frames. ], tot_loss[loss=0.213, simple_loss=0.2924, pruned_loss=0.06678, over 861141.01 frames. ], batch size: 141, lr: 9.18e-03, grad_scale: 32.0 2023-06-15 11:27:17,956 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=124560.0, ans=0.05 2023-06-15 11:27:21,076 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=124626.66666666667, ans=0.125 2023-06-15 11:27:27,921 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=124626.66666666667, ans=0.2 2023-06-15 11:27:43,721 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=124693.33333333333, ans=0.125 2023-06-15 11:28:02,859 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. 
limit=6.0 2023-06-15 11:28:04,862 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.41 vs. limit=22.5 2023-06-15 11:28:07,133 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.578e+02 1.827e+02 2.009e+02 2.333e+02 3.474e+02, threshold=4.018e+02, percent-clipped=0.0 2023-06-15 11:28:16,677 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=124826.66666666667, ans=0.0 2023-06-15 11:28:25,314 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=124826.66666666667, ans=0.2 2023-06-15 11:28:31,565 INFO [train.py:988] (0/4) Epoch 36, batch 100, loss[loss=0.2019, simple_loss=0.2843, pruned_loss=0.05973, over 19213.00 frames. ], tot_loss[loss=0.2126, simple_loss=0.2926, pruned_loss=0.06628, over 1505111.73 frames. ], batch size: 92, lr: 9.17e-03, grad_scale: 32.0 2023-06-15 11:28:59,047 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2023-06-15 11:29:03,846 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=124960.0, ans=0.125 2023-06-15 11:29:04,003 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=124960.0, ans=0.125 2023-06-15 11:29:20,162 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=125026.66666666667, ans=0.125 2023-06-15 11:29:25,618 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=125093.33333333333, ans=0.1 2023-06-15 11:29:44,912 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=125160.0, ans=0.2 2023-06-15 11:29:53,305 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.87 vs. limit=15.0 2023-06-15 11:29:58,755 INFO [train.py:988] (0/4) Epoch 36, batch 150, loss[loss=0.237, simple_loss=0.3312, pruned_loss=0.07141, over 18346.00 frames. ], tot_loss[loss=0.2128, simple_loss=0.2933, pruned_loss=0.06619, over 2017483.25 frames. ], batch size: 72, lr: 9.16e-03, grad_scale: 16.0 2023-06-15 11:30:17,068 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=125293.33333333333, ans=0.125 2023-06-15 11:30:20,065 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=125293.33333333333, ans=0.125 2023-06-15 11:30:44,478 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=125360.0, ans=0.125 2023-06-15 11:30:54,628 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.27 vs. 
limit=22.5 2023-06-15 11:31:02,865 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.526e+02 1.954e+02 2.200e+02 2.726e+02 5.615e+02, threshold=4.401e+02, percent-clipped=3.0 2023-06-15 11:31:25,664 INFO [train.py:988] (0/4) Epoch 36, batch 200, loss[loss=0.2222, simple_loss=0.2959, pruned_loss=0.07429, over 20537.00 frames. ], tot_loss[loss=0.2131, simple_loss=0.2936, pruned_loss=0.0663, over 2409808.13 frames. ], batch size: 173, lr: 9.15e-03, grad_scale: 16.0 2023-06-15 11:31:44,527 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=125626.66666666667, ans=0.125 2023-06-15 11:32:40,551 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=125826.66666666667, ans=0.125 2023-06-15 11:32:52,597 INFO [train.py:988] (0/4) Epoch 36, batch 250, loss[loss=0.2099, simple_loss=0.2919, pruned_loss=0.06393, over 19090.00 frames. ], tot_loss[loss=0.2133, simple_loss=0.2933, pruned_loss=0.06661, over 2721309.18 frames. ], batch size: 89, lr: 9.14e-03, grad_scale: 16.0 2023-06-15 11:33:02,725 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=125893.33333333333, ans=0.1 2023-06-15 11:33:03,115 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0 2023-06-15 11:33:56,228 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.389e+02 1.785e+02 1.964e+02 2.188e+02 3.452e+02, threshold=3.927e+02, percent-clipped=0.0 2023-06-15 11:34:18,425 INFO [train.py:988] (0/4) Epoch 36, batch 300, loss[loss=0.2248, simple_loss=0.2838, pruned_loss=0.08293, over 19880.00 frames. ], tot_loss[loss=0.2137, simple_loss=0.293, pruned_loss=0.06716, over 2975833.96 frames. ], batch size: 293, lr: 9.13e-03, grad_scale: 16.0 2023-06-15 11:34:35,769 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=126293.33333333333, ans=0.125 2023-06-15 11:34:55,196 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=126360.0, ans=0.125 2023-06-15 11:35:12,645 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.84 vs. limit=22.5 2023-06-15 11:35:24,097 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=126426.66666666667, ans=0.125 2023-06-15 11:35:25,726 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=126493.33333333333, ans=0.125 2023-06-15 11:35:45,204 INFO [train.py:988] (0/4) Epoch 36, batch 350, loss[loss=0.2041, simple_loss=0.2864, pruned_loss=0.06087, over 18275.00 frames. ], tot_loss[loss=0.2132, simple_loss=0.2927, pruned_loss=0.06681, over 3151892.17 frames. ], batch size: 74, lr: 9.12e-03, grad_scale: 16.0 2023-06-15 11:36:38,548 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=126760.0, ans=0.125 2023-06-15 11:36:39,209 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.13 vs. 
limit=15.0 2023-06-15 11:36:50,191 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.411e+02 1.822e+02 2.085e+02 2.267e+02 3.723e+02, threshold=4.169e+02, percent-clipped=0.0 2023-06-15 11:37:13,980 INFO [train.py:988] (0/4) Epoch 36, batch 400, loss[loss=0.193, simple_loss=0.2747, pruned_loss=0.05564, over 19855.00 frames. ], tot_loss[loss=0.2137, simple_loss=0.2935, pruned_loss=0.06693, over 3280480.58 frames. ], batch size: 120, lr: 9.11e-03, grad_scale: 32.0 2023-06-15 11:37:29,300 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=126960.0, ans=0.125 2023-06-15 11:37:46,346 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-06-15 11:38:04,081 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=127093.33333333333, ans=0.0 2023-06-15 11:38:13,704 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=127093.33333333333, ans=0.125 2023-06-15 11:38:40,669 INFO [train.py:988] (0/4) Epoch 36, batch 450, loss[loss=0.223, simple_loss=0.29, pruned_loss=0.07797, over 20538.00 frames. ], tot_loss[loss=0.2138, simple_loss=0.2938, pruned_loss=0.06695, over 3399511.65 frames. ], batch size: 189, lr: 9.10e-03, grad_scale: 32.0 2023-06-15 11:39:43,808 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.505e+02 1.789e+02 1.972e+02 2.291e+02 3.379e+02, threshold=3.945e+02, percent-clipped=0.0 2023-06-15 11:39:44,070 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=127426.66666666667, ans=0.1 2023-06-15 11:39:44,072 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=127426.66666666667, ans=0.125 2023-06-15 11:39:55,472 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=127493.33333333333, ans=0.0 2023-06-15 11:40:05,541 INFO [train.py:988] (0/4) Epoch 36, batch 500, loss[loss=0.1977, simple_loss=0.2682, pruned_loss=0.06354, over 20288.00 frames. ], tot_loss[loss=0.2136, simple_loss=0.2931, pruned_loss=0.06705, over 3499898.47 frames. ], batch size: 239, lr: 9.09e-03, grad_scale: 32.0 2023-06-15 11:40:15,842 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=127560.0, ans=0.0 2023-06-15 11:40:19,301 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.00 vs. limit=10.0 2023-06-15 11:40:38,339 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=127693.33333333333, ans=0.5 2023-06-15 11:40:58,076 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-36.pt 2023-06-15 11:41:22,924 INFO [train.py:988] (0/4) Epoch 37, batch 0, loss[loss=0.2144, simple_loss=0.2899, pruned_loss=0.06941, over 19974.00 frames. ], tot_loss[loss=0.2144, simple_loss=0.2899, pruned_loss=0.06941, over 19974.00 frames. ], batch size: 126, lr: 8.96e-03, grad_scale: 32.0 2023-06-15 11:41:22,924 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 11:41:29,087 INFO [train.py:1020] (0/4) Epoch 37, validation: loss=0.2017, simple_loss=0.3019, pruned_loss=0.05073, over 143649.00 frames. 
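The ScheduledFloat entries above record per-module values such as skip rates, dropout probabilities and balancer limits that change with batch_count. One plausible way to produce such batch-count-driven values is piecewise-linear interpolation between breakpoints, sketched below; the helper is hypothetical and is not the scaling.py implementation.

# Hypothetical sketch: piecewise-linear schedule keyed on batch_count,
# interpolating between sorted (batch_count, value) breakpoints and clamping
# to the endpoint values outside the covered range.
from bisect import bisect_right
from typing import List, Tuple

def scheduled_value(breakpoints: List[Tuple[float, float]], batch_count: float) -> float:
    xs = [b for b, _ in breakpoints]
    if batch_count <= xs[0]:
        return breakpoints[0][1]
    if batch_count >= xs[-1]:
        return breakpoints[-1][1]
    i = bisect_right(xs, batch_count)
    (x0, y0), (x1, y1) = breakpoints[i - 1], breakpoints[i]
    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# Example: a rate that decays from 0.1 to 0.0 over the first 20000 batches
# and stays at 0.0 afterwards, evaluated at batch_count=115000:
print(scheduled_value([(0.0, 0.1), (20000.0, 0.0)], 115000.0))  # -> 0.0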
2023-06-15 11:41:29,088 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 11:41:34,179 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=127780.0, ans=0.125 2023-06-15 11:41:38,266 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=127780.0, ans=0.125 2023-06-15 11:41:55,164 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=127846.66666666667, ans=0.125 2023-06-15 11:41:55,291 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=127846.66666666667, ans=0.125 2023-06-15 11:42:03,555 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0 2023-06-15 11:42:22,419 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=127980.0, ans=0.125 2023-06-15 11:42:50,047 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=128046.66666666667, ans=0.0 2023-06-15 11:42:53,556 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=128046.66666666667, ans=0.1 2023-06-15 11:42:56,860 INFO [train.py:988] (0/4) Epoch 37, batch 50, loss[loss=0.213, simple_loss=0.2804, pruned_loss=0.07285, over 20192.00 frames. ], tot_loss[loss=0.2098, simple_loss=0.29, pruned_loss=0.06478, over 866429.78 frames. ], batch size: 239, lr: 8.95e-03, grad_scale: 32.0 2023-06-15 11:43:04,218 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.437e+02 1.676e+02 1.887e+02 2.171e+02 3.433e+02, threshold=3.773e+02, percent-clipped=0.0 2023-06-15 11:43:20,471 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=128180.0, ans=0.1 2023-06-15 11:43:35,788 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=128246.66666666667, ans=0.0 2023-06-15 11:43:53,338 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=128313.33333333333, ans=0.02 2023-06-15 11:44:21,199 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=128380.0, ans=0.125 2023-06-15 11:44:21,441 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=128380.0, ans=0.125 2023-06-15 11:44:24,266 INFO [train.py:988] (0/4) Epoch 37, batch 100, loss[loss=0.2004, simple_loss=0.2699, pruned_loss=0.0654, over 20739.00 frames. ], tot_loss[loss=0.2118, simple_loss=0.2916, pruned_loss=0.06606, over 1500997.24 frames. 
], batch size: 211, lr: 8.94e-03, grad_scale: 32.0 2023-06-15 11:44:37,390 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=128446.66666666667, ans=0.0 2023-06-15 11:44:51,572 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=128513.33333333333, ans=0.2 2023-06-15 11:45:00,230 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=128580.0, ans=0.0 2023-06-15 11:45:03,772 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=128580.0, ans=0.1 2023-06-15 11:45:19,086 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=128646.66666666667, ans=0.125 2023-06-15 11:45:38,989 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.03 vs. limit=15.0 2023-06-15 11:45:47,788 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.16 vs. limit=15.0 2023-06-15 11:45:52,372 INFO [train.py:988] (0/4) Epoch 37, batch 150, loss[loss=0.2222, simple_loss=0.3099, pruned_loss=0.06721, over 17666.00 frames. ], tot_loss[loss=0.2125, simple_loss=0.2922, pruned_loss=0.0664, over 2004160.92 frames. ], batch size: 67, lr: 8.93e-03, grad_scale: 32.0 2023-06-15 11:45:59,532 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.609e+02 1.837e+02 2.114e+02 2.317e+02 3.549e+02, threshold=4.229e+02, percent-clipped=0.0 2023-06-15 11:46:57,711 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=128980.0, ans=0.1 2023-06-15 11:47:20,593 INFO [train.py:988] (0/4) Epoch 37, batch 200, loss[loss=0.2137, simple_loss=0.299, pruned_loss=0.06422, over 18240.00 frames. ], tot_loss[loss=0.2116, simple_loss=0.2918, pruned_loss=0.06573, over 2401510.84 frames. ], batch size: 74, lr: 8.92e-03, grad_scale: 32.0 2023-06-15 11:47:34,644 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=129113.33333333333, ans=0.2 2023-06-15 11:47:57,661 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2023-06-15 11:48:27,805 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=129313.33333333333, ans=0.0 2023-06-15 11:48:41,999 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=129380.0, ans=0.2 2023-06-15 11:48:48,305 INFO [train.py:988] (0/4) Epoch 37, batch 250, loss[loss=0.2241, simple_loss=0.2959, pruned_loss=0.07618, over 20129.00 frames. ], tot_loss[loss=0.2116, simple_loss=0.292, pruned_loss=0.06563, over 2702107.21 frames. 
], batch size: 133, lr: 8.91e-03, grad_scale: 32.0 2023-06-15 11:48:54,946 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.541e+02 1.981e+02 2.345e+02 2.837e+02 3.921e+02, threshold=4.691e+02, percent-clipped=0.0 2023-06-15 11:49:08,763 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=129513.33333333333, ans=0.0 2023-06-15 11:49:21,062 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=129513.33333333333, ans=0.0 2023-06-15 11:49:21,164 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=129513.33333333333, ans=0.0 2023-06-15 11:50:12,395 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=129713.33333333333, ans=0.125 2023-06-15 11:50:16,946 INFO [train.py:988] (0/4) Epoch 37, batch 300, loss[loss=0.2088, simple_loss=0.2907, pruned_loss=0.0635, over 18762.00 frames. ], tot_loss[loss=0.2119, simple_loss=0.2922, pruned_loss=0.06575, over 2956750.16 frames. ], batch size: 83, lr: 8.90e-03, grad_scale: 32.0 2023-06-15 11:50:36,351 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.90 vs. limit=22.5 2023-06-15 11:50:48,518 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=15.0 2023-06-15 11:51:16,293 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=129980.0, ans=0.125 2023-06-15 11:51:20,170 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=129980.0, ans=0.0 2023-06-15 11:51:21,950 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=129980.0, ans=0.125 2023-06-15 11:51:45,594 INFO [train.py:988] (0/4) Epoch 37, batch 350, loss[loss=0.2306, simple_loss=0.298, pruned_loss=0.08161, over 20287.00 frames. ], tot_loss[loss=0.212, simple_loss=0.2922, pruned_loss=0.06587, over 3148861.07 frames. ], batch size: 141, lr: 8.89e-03, grad_scale: 32.0 2023-06-15 11:51:49,247 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=130113.33333333333, ans=0.05 2023-06-15 11:51:52,281 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.549e+02 1.921e+02 2.094e+02 2.447e+02 3.479e+02, threshold=4.189e+02, percent-clipped=0.0 2023-06-15 11:51:57,544 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=130113.33333333333, ans=15.0 2023-06-15 11:52:11,277 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=130180.0, ans=0.5 2023-06-15 11:52:20,053 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=130246.66666666667, ans=0.0 2023-06-15 11:52:26,740 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. 
limit=6.0 2023-06-15 11:52:45,983 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=130313.33333333333, ans=0.0 2023-06-15 11:52:58,118 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=130380.0, ans=0.1 2023-06-15 11:53:08,541 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=130380.0, ans=0.125 2023-06-15 11:53:13,086 INFO [train.py:988] (0/4) Epoch 37, batch 400, loss[loss=0.2081, simple_loss=0.2957, pruned_loss=0.06025, over 19081.00 frames. ], tot_loss[loss=0.2121, simple_loss=0.2923, pruned_loss=0.06593, over 3303434.89 frames. ], batch size: 89, lr: 8.88e-03, grad_scale: 32.0 2023-06-15 11:53:27,411 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-06-15 11:53:29,135 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=130513.33333333333, ans=0.125 2023-06-15 11:53:43,961 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=130513.33333333333, ans=0.2 2023-06-15 11:53:53,120 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=130580.0, ans=0.0 2023-06-15 11:54:22,065 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=130646.66666666667, ans=0.07 2023-06-15 11:54:26,464 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=130713.33333333333, ans=15.0 2023-06-15 11:54:43,356 INFO [train.py:988] (0/4) Epoch 37, batch 450, loss[loss=0.1894, simple_loss=0.2746, pruned_loss=0.05208, over 18483.00 frames. ], tot_loss[loss=0.2116, simple_loss=0.2918, pruned_loss=0.06573, over 3404872.28 frames. ], batch size: 77, lr: 8.87e-03, grad_scale: 16.0 2023-06-15 11:54:51,576 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.554e+02 1.773e+02 2.041e+02 2.312e+02 3.124e+02, threshold=4.082e+02, percent-clipped=0.0 2023-06-15 11:55:04,246 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.10 vs. limit=10.0 2023-06-15 11:55:16,088 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.76 vs. limit=15.0 2023-06-15 11:55:25,998 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=130913.33333333333, ans=0.125 2023-06-15 11:55:41,556 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=130980.0, ans=0.125 2023-06-15 11:56:09,208 INFO [train.py:988] (0/4) Epoch 37, batch 500, loss[loss=0.2084, simple_loss=0.2832, pruned_loss=0.06684, over 20302.00 frames. ], tot_loss[loss=0.2114, simple_loss=0.2916, pruned_loss=0.06565, over 3503387.46 frames. ], batch size: 239, lr: 8.86e-03, grad_scale: 16.0 2023-06-15 11:56:39,702 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.62 vs. 
limit=15.0 2023-06-15 11:56:51,279 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=131246.66666666666, ans=0.0 2023-06-15 11:57:01,938 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-37.pt 2023-06-15 11:57:24,717 INFO [train.py:988] (0/4) Epoch 38, batch 0, loss[loss=0.207, simple_loss=0.2889, pruned_loss=0.06256, over 18781.00 frames. ], tot_loss[loss=0.207, simple_loss=0.2889, pruned_loss=0.06256, over 18781.00 frames. ], batch size: 83, lr: 8.73e-03, grad_scale: 32.0 2023-06-15 11:57:24,717 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 11:57:31,191 INFO [train.py:1020] (0/4) Epoch 38, validation: loss=0.2046, simple_loss=0.3024, pruned_loss=0.05337, over 143649.00 frames. 2023-06-15 11:57:31,192 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 11:57:40,249 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=131326.66666666666, ans=0.0 2023-06-15 11:57:45,961 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=131326.66666666666, ans=0.125 2023-06-15 11:57:47,721 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=131393.33333333334, ans=0.0 2023-06-15 11:58:05,529 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=131460.0, ans=0.2 2023-06-15 11:58:07,567 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-06-15 11:58:11,965 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=15.0 2023-06-15 11:58:14,479 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.554e+02 1.971e+02 2.221e+02 2.654e+02 3.969e+02, threshold=4.441e+02, percent-clipped=0.0 2023-06-15 11:59:00,147 INFO [train.py:988] (0/4) Epoch 38, batch 50, loss[loss=0.227, simple_loss=0.3195, pruned_loss=0.06728, over 17678.00 frames. ], tot_loss[loss=0.2109, simple_loss=0.2922, pruned_loss=0.06475, over 847176.34 frames. ], batch size: 67, lr: 8.72e-03, grad_scale: 16.0 2023-06-15 11:59:04,770 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.40 vs. limit=6.0 2023-06-15 11:59:13,210 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.38 vs. limit=22.5 2023-06-15 11:59:16,404 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=131726.66666666666, ans=0.125 2023-06-15 11:59:24,798 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=131726.66666666666, ans=0.125 2023-06-15 12:00:07,906 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=131926.66666666666, ans=0.0 2023-06-15 12:00:26,787 INFO [train.py:988] (0/4) Epoch 38, batch 100, loss[loss=0.1959, simple_loss=0.2774, pruned_loss=0.0572, over 19073.00 frames. ], tot_loss[loss=0.2108, simple_loss=0.2912, pruned_loss=0.06516, over 1502053.63 frames. 
], batch size: 94, lr: 8.71e-03, grad_scale: 16.0 2023-06-15 12:00:29,524 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.00 vs. limit=15.0 2023-06-15 12:00:43,727 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=132060.0, ans=0.125 2023-06-15 12:00:55,732 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=132060.0, ans=0.125 2023-06-15 12:01:07,824 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.404e+02 1.839e+02 2.066e+02 2.376e+02 4.240e+02, threshold=4.131e+02, percent-clipped=0.0 2023-06-15 12:01:26,888 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=132193.33333333334, ans=0.125 2023-06-15 12:01:27,167 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=132193.33333333334, ans=0.0 2023-06-15 12:01:32,582 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.54 vs. limit=22.5 2023-06-15 12:01:52,138 INFO [train.py:988] (0/4) Epoch 38, batch 150, loss[loss=0.2263, simple_loss=0.3159, pruned_loss=0.06835, over 16754.00 frames. ], tot_loss[loss=0.2108, simple_loss=0.2916, pruned_loss=0.06499, over 1999605.11 frames. ], batch size: 59, lr: 8.70e-03, grad_scale: 16.0 2023-06-15 12:01:57,143 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=12.0 2023-06-15 12:02:03,497 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=132326.66666666666, ans=0.125 2023-06-15 12:02:04,913 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 12:02:05,071 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=132326.66666666666, ans=0.125 2023-06-15 12:02:38,723 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=132460.0, ans=0.125 2023-06-15 12:03:04,081 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.51 vs. limit=22.5 2023-06-15 12:03:17,123 INFO [train.py:988] (0/4) Epoch 38, batch 200, loss[loss=0.2384, simple_loss=0.3325, pruned_loss=0.07211, over 17589.00 frames. ], tot_loss[loss=0.2111, simple_loss=0.292, pruned_loss=0.06509, over 2391526.25 frames. ], batch size: 67, lr: 8.69e-03, grad_scale: 16.0 2023-06-15 12:03:58,558 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.434e+02 1.792e+02 1.952e+02 2.227e+02 3.208e+02, threshold=3.904e+02, percent-clipped=0.0 2023-06-15 12:04:13,798 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=132860.0, ans=0.04949747468305833 2023-06-15 12:04:38,397 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=132926.66666666666, ans=0.035 2023-06-15 12:04:43,044 INFO [train.py:988] (0/4) Epoch 38, batch 250, loss[loss=0.2188, simple_loss=0.309, pruned_loss=0.06436, over 18443.00 frames. 
], tot_loss[loss=0.2112, simple_loss=0.2923, pruned_loss=0.06501, over 2687782.97 frames. ], batch size: 77, lr: 8.68e-03, grad_scale: 16.0 2023-06-15 12:04:46,430 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2023-06-15 12:04:57,777 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.19 vs. limit=15.0 2023-06-15 12:05:28,903 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-06-15 12:05:36,625 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=15.0 2023-06-15 12:05:55,756 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=133260.0, ans=0.125 2023-06-15 12:05:59,578 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.23 vs. limit=15.0 2023-06-15 12:06:10,585 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/checkpoint-20000.pt 2023-06-15 12:06:13,049 INFO [train.py:988] (0/4) Epoch 38, batch 300, loss[loss=0.2274, simple_loss=0.3004, pruned_loss=0.07719, over 20580.00 frames. ], tot_loss[loss=0.2117, simple_loss=0.2929, pruned_loss=0.06523, over 2925274.04 frames. ], batch size: 173, lr: 8.67e-03, grad_scale: 16.0 2023-06-15 12:06:30,345 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=133393.33333333334, ans=0.07 2023-06-15 12:06:54,536 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.506e+02 1.862e+02 2.054e+02 2.308e+02 3.545e+02, threshold=4.107e+02, percent-clipped=0.0 2023-06-15 12:07:23,271 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=133593.33333333334, ans=0.125 2023-06-15 12:07:39,133 INFO [train.py:988] (0/4) Epoch 38, batch 350, loss[loss=0.209, simple_loss=0.2849, pruned_loss=0.06658, over 20458.00 frames. ], tot_loss[loss=0.2117, simple_loss=0.2931, pruned_loss=0.06516, over 3106155.33 frames. ], batch size: 160, lr: 8.66e-03, grad_scale: 16.0 2023-06-15 12:08:16,329 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=133793.33333333334, ans=0.04949747468305833 2023-06-15 12:08:28,270 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=133793.33333333334, ans=0.2 2023-06-15 12:08:45,875 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=133860.0, ans=0.0 2023-06-15 12:09:05,468 INFO [train.py:988] (0/4) Epoch 38, batch 400, loss[loss=0.1895, simple_loss=0.2713, pruned_loss=0.05388, over 19360.00 frames. ], tot_loss[loss=0.2108, simple_loss=0.2919, pruned_loss=0.06486, over 3256211.20 frames. 
], batch size: 98, lr: 8.65e-03, grad_scale: 32.0 2023-06-15 12:09:09,348 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 12:09:20,700 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=133993.33333333334, ans=0.0 2023-06-15 12:09:26,306 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.06 vs. limit=15.0 2023-06-15 12:09:27,205 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=134060.0, ans=0.0 2023-06-15 12:09:39,471 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=15.0 2023-06-15 12:09:40,406 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=134126.66666666666, ans=0.0 2023-06-15 12:09:49,034 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.427e+02 1.764e+02 2.036e+02 2.337e+02 3.432e+02, threshold=4.071e+02, percent-clipped=0.0 2023-06-15 12:10:13,899 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=134260.0, ans=0.1 2023-06-15 12:10:15,912 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=134260.0, ans=0.1 2023-06-15 12:10:32,626 INFO [train.py:988] (0/4) Epoch 38, batch 450, loss[loss=0.1905, simple_loss=0.2779, pruned_loss=0.05153, over 19323.00 frames. ], tot_loss[loss=0.2106, simple_loss=0.2913, pruned_loss=0.06491, over 3390888.57 frames. ], batch size: 98, lr: 8.65e-03, grad_scale: 16.0 2023-06-15 12:10:34,666 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=134326.66666666666, ans=0.125 2023-06-15 12:10:36,494 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=134326.66666666666, ans=0.125 2023-06-15 12:11:36,822 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=134526.66666666666, ans=0.2 2023-06-15 12:11:41,869 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=134593.33333333334, ans=0.2 2023-06-15 12:11:43,756 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=134593.33333333334, ans=0.2 2023-06-15 12:11:54,902 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-06-15 12:11:56,436 INFO [train.py:988] (0/4) Epoch 38, batch 500, loss[loss=0.2158, simple_loss=0.2924, pruned_loss=0.06958, over 20285.00 frames. ], tot_loss[loss=0.21, simple_loss=0.2908, pruned_loss=0.0646, over 3486014.99 frames. 
], batch size: 141, lr: 8.64e-03, grad_scale: 16.0 2023-06-15 12:12:37,174 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.567e+02 1.813e+02 2.102e+02 2.344e+02 3.588e+02, threshold=4.203e+02, percent-clipped=0.0 2023-06-15 12:12:40,761 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=134793.33333333334, ans=0.0 2023-06-15 12:12:41,984 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=134793.33333333334, ans=0.1 2023-06-15 12:12:42,589 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.49 vs. limit=15.0 2023-06-15 12:12:47,754 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-38.pt 2023-06-15 12:13:08,843 INFO [train.py:988] (0/4) Epoch 39, batch 0, loss[loss=0.2028, simple_loss=0.292, pruned_loss=0.05676, over 18939.00 frames. ], tot_loss[loss=0.2028, simple_loss=0.292, pruned_loss=0.05676, over 18939.00 frames. ], batch size: 86, lr: 8.52e-03, grad_scale: 32.0 2023-06-15 12:13:08,845 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 12:13:14,994 INFO [train.py:1020] (0/4) Epoch 39, validation: loss=0.2008, simple_loss=0.3008, pruned_loss=0.05042, over 143649.00 frames. 2023-06-15 12:13:14,994 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 12:13:35,945 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=134940.0, ans=0.125 2023-06-15 12:13:36,052 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=134940.0, ans=15.0 2023-06-15 12:13:37,538 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=134940.0, ans=0.125 2023-06-15 12:14:09,049 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=135073.33333333334, ans=0.125 2023-06-15 12:14:17,834 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=135073.33333333334, ans=0.1 2023-06-15 12:14:29,001 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=135140.0, ans=0.125 2023-06-15 12:14:42,556 INFO [train.py:988] (0/4) Epoch 39, batch 50, loss[loss=0.222, simple_loss=0.2764, pruned_loss=0.08378, over 19837.00 frames. ], tot_loss[loss=0.2114, simple_loss=0.2898, pruned_loss=0.06647, over 858350.34 frames. ], batch size: 293, lr: 8.51e-03, grad_scale: 16.0 2023-06-15 12:15:28,972 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=135340.0, ans=0.125 2023-06-15 12:15:29,070 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=135340.0, ans=0.025 2023-06-15 12:15:58,878 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.571e+02 1.823e+02 2.055e+02 2.289e+02 2.991e+02, threshold=4.109e+02, percent-clipped=0.0 2023-06-15 12:16:09,597 INFO [train.py:988] (0/4) Epoch 39, batch 100, loss[loss=0.2166, simple_loss=0.2773, pruned_loss=0.07794, over 19884.00 frames. ], tot_loss[loss=0.2127, simple_loss=0.2924, pruned_loss=0.06652, over 1495471.60 frames. 
], batch size: 294, lr: 8.50e-03, grad_scale: 16.0 2023-06-15 12:16:26,290 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=135606.66666666666, ans=0.95 2023-06-15 12:16:40,633 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=135606.66666666666, ans=0.0 2023-06-15 12:17:26,447 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.45 vs. limit=12.0 2023-06-15 12:17:35,691 INFO [train.py:988] (0/4) Epoch 39, batch 150, loss[loss=0.2112, simple_loss=0.2866, pruned_loss=0.06785, over 20214.00 frames. ], tot_loss[loss=0.2114, simple_loss=0.2907, pruned_loss=0.066, over 1992703.78 frames. ], batch size: 141, lr: 8.49e-03, grad_scale: 16.0 2023-06-15 12:17:51,220 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=135940.0, ans=0.125 2023-06-15 12:17:53,298 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=135940.0, ans=0.0 2023-06-15 12:18:21,437 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=136006.66666666666, ans=0.125 2023-06-15 12:18:52,154 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.450e+02 1.763e+02 2.014e+02 2.292e+02 3.195e+02, threshold=4.028e+02, percent-clipped=0.0 2023-06-15 12:18:53,843 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.96 vs. limit=22.5 2023-06-15 12:19:03,515 INFO [train.py:988] (0/4) Epoch 39, batch 200, loss[loss=0.2, simple_loss=0.2876, pruned_loss=0.05623, over 19467.00 frames. ], tot_loss[loss=0.21, simple_loss=0.2891, pruned_loss=0.06544, over 2398619.70 frames. ], batch size: 105, lr: 8.48e-03, grad_scale: 16.0 2023-06-15 12:20:08,788 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=136406.66666666666, ans=0.125 2023-06-15 12:20:26,161 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=136473.33333333334, ans=0.125 2023-06-15 12:20:31,125 INFO [train.py:988] (0/4) Epoch 39, batch 250, loss[loss=0.2026, simple_loss=0.2905, pruned_loss=0.05732, over 18322.00 frames. ], tot_loss[loss=0.209, simple_loss=0.2891, pruned_loss=0.06443, over 2704030.38 frames. ], batch size: 74, lr: 8.47e-03, grad_scale: 16.0 2023-06-15 12:20:55,634 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=136606.66666666666, ans=0.0 2023-06-15 12:20:56,781 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.85 vs. 
limit=15.0 2023-06-15 12:21:06,501 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=136673.33333333334, ans=0.125 2023-06-15 12:21:18,907 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=136673.33333333334, ans=0.2 2023-06-15 12:21:28,194 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=136740.0, ans=0.2 2023-06-15 12:21:35,102 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=136740.0, ans=0.5 2023-06-15 12:21:48,757 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.433e+02 1.779e+02 1.952e+02 2.172e+02 3.259e+02, threshold=3.903e+02, percent-clipped=0.0 2023-06-15 12:21:50,762 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=136806.66666666666, ans=0.125 2023-06-15 12:21:59,978 INFO [train.py:988] (0/4) Epoch 39, batch 300, loss[loss=0.1973, simple_loss=0.2813, pruned_loss=0.05663, over 18623.00 frames. ], tot_loss[loss=0.2085, simple_loss=0.2887, pruned_loss=0.06419, over 2946122.56 frames. ], batch size: 80, lr: 8.46e-03, grad_scale: 16.0 2023-06-15 12:22:03,855 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=136873.33333333334, ans=0.025 2023-06-15 12:22:31,699 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=136940.0, ans=0.125 2023-06-15 12:23:04,628 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=137073.33333333334, ans=0.0 2023-06-15 12:23:06,374 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=137073.33333333334, ans=0.0 2023-06-15 12:23:26,377 INFO [train.py:988] (0/4) Epoch 39, batch 350, loss[loss=0.2345, simple_loss=0.3267, pruned_loss=0.07116, over 16206.00 frames. ], tot_loss[loss=0.2094, simple_loss=0.2895, pruned_loss=0.06462, over 3136782.03 frames. 
], batch size: 52, lr: 8.45e-03, grad_scale: 16.0 2023-06-15 12:23:34,473 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=137206.66666666666, ans=0.0 2023-06-15 12:23:36,405 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=137206.66666666666, ans=0.0 2023-06-15 12:23:46,732 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=137273.33333333334, ans=0.1 2023-06-15 12:23:53,713 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=137273.33333333334, ans=0.2 2023-06-15 12:23:55,362 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=137273.33333333334, ans=0.125 2023-06-15 12:24:16,079 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=137340.0, ans=0.125 2023-06-15 12:24:28,375 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=137406.66666666666, ans=0.1 2023-06-15 12:24:44,642 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.611e+02 1.816e+02 2.065e+02 2.366e+02 3.841e+02, threshold=4.130e+02, percent-clipped=0.0 2023-06-15 12:24:54,401 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.33 vs. limit=22.5 2023-06-15 12:24:55,381 INFO [train.py:988] (0/4) Epoch 39, batch 400, loss[loss=0.1909, simple_loss=0.2718, pruned_loss=0.05501, over 19326.00 frames. ], tot_loss[loss=0.209, simple_loss=0.2891, pruned_loss=0.06448, over 3290004.28 frames. ], batch size: 98, lr: 8.44e-03, grad_scale: 32.0 2023-06-15 12:24:59,874 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=137540.0, ans=0.0 2023-06-15 12:25:17,595 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=137606.66666666666, ans=0.1 2023-06-15 12:25:21,078 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=137606.66666666666, ans=0.5 2023-06-15 12:26:14,343 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=137806.66666666666, ans=0.125 2023-06-15 12:26:24,762 INFO [train.py:988] (0/4) Epoch 39, batch 450, loss[loss=0.2197, simple_loss=0.3098, pruned_loss=0.06479, over 16280.00 frames. ], tot_loss[loss=0.209, simple_loss=0.2895, pruned_loss=0.06426, over 3400625.90 frames. 
], batch size: 52, lr: 8.44e-03, grad_scale: 16.0 2023-06-15 12:26:30,025 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=137873.33333333334, ans=0.125 2023-06-15 12:26:36,315 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=137873.33333333334, ans=0.125 2023-06-15 12:26:41,390 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=137940.0, ans=0.125 2023-06-15 12:26:54,631 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=137940.0, ans=0.2 2023-06-15 12:27:04,836 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.31 vs. limit=22.5 2023-06-15 12:27:22,863 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.75 vs. limit=15.0 2023-06-15 12:27:27,390 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=138073.33333333334, ans=0.0 2023-06-15 12:27:41,492 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.423e+02 1.826e+02 2.179e+02 2.454e+02 3.798e+02, threshold=4.358e+02, percent-clipped=0.0 2023-06-15 12:27:48,573 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=138206.66666666666, ans=0.2 2023-06-15 12:27:49,800 INFO [train.py:988] (0/4) Epoch 39, batch 500, loss[loss=0.2056, simple_loss=0.2917, pruned_loss=0.05976, over 18800.00 frames. ], tot_loss[loss=0.2091, simple_loss=0.29, pruned_loss=0.06409, over 3483296.28 frames. ], batch size: 83, lr: 8.43e-03, grad_scale: 16.0 2023-06-15 12:28:30,235 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=138340.0, ans=0.125 2023-06-15 12:28:30,643 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=22.5 2023-06-15 12:28:42,434 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-39.pt 2023-06-15 12:29:07,665 INFO [train.py:988] (0/4) Epoch 40, batch 0, loss[loss=0.2059, simple_loss=0.2809, pruned_loss=0.06543, over 20521.00 frames. ], tot_loss[loss=0.2059, simple_loss=0.2809, pruned_loss=0.06543, over 20521.00 frames. ], batch size: 160, lr: 8.31e-03, grad_scale: 32.0 2023-06-15 12:29:07,666 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 12:29:13,814 INFO [train.py:1020] (0/4) Epoch 40, validation: loss=0.2011, simple_loss=0.3008, pruned_loss=0.05073, over 143649.00 frames. 2023-06-15 12:29:13,815 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 12:29:19,497 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0 2023-06-15 12:29:41,070 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.22 vs. limit=22.5 2023-06-15 12:29:51,782 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.72 vs. 
limit=15.0 2023-06-15 12:29:54,275 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=138553.33333333334, ans=0.125 2023-06-15 12:30:21,472 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=138620.0, ans=0.0 2023-06-15 12:30:26,916 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=138686.66666666666, ans=0.0 2023-06-15 12:30:42,317 INFO [train.py:988] (0/4) Epoch 40, batch 50, loss[loss=0.212, simple_loss=0.2889, pruned_loss=0.06749, over 20103.00 frames. ], tot_loss[loss=0.2075, simple_loss=0.2884, pruned_loss=0.06335, over 858584.40 frames. ], batch size: 133, lr: 8.31e-03, grad_scale: 32.0 2023-06-15 12:30:46,298 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.72 vs. limit=12.0 2023-06-15 12:30:51,034 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=138753.33333333334, ans=0.125 2023-06-15 12:31:01,875 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=138820.0, ans=0.5 2023-06-15 12:31:05,858 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.495e+02 1.755e+02 2.069e+02 2.333e+02 3.346e+02, threshold=4.138e+02, percent-clipped=0.0 2023-06-15 12:31:30,765 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=138886.66666666666, ans=0.05 2023-06-15 12:31:32,510 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=138886.66666666666, ans=0.025 2023-06-15 12:32:12,083 INFO [train.py:988] (0/4) Epoch 40, batch 100, loss[loss=0.2255, simple_loss=0.3125, pruned_loss=0.06924, over 16725.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.2888, pruned_loss=0.06301, over 1509077.37 frames. ], batch size: 59, lr: 8.30e-03, grad_scale: 32.0 2023-06-15 12:32:20,167 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.04 vs. limit=10.0 2023-06-15 12:32:24,493 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=139086.66666666666, ans=0.0 2023-06-15 12:32:31,863 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 12:32:37,694 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=139153.33333333334, ans=0.0 2023-06-15 12:32:52,044 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=139220.0, ans=0.0 2023-06-15 12:32:57,531 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=139220.0, ans=0.1 2023-06-15 12:33:01,599 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.93 vs. 
limit=12.0 2023-06-15 12:33:21,489 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=139353.33333333334, ans=0.2 2023-06-15 12:33:30,713 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=139353.33333333334, ans=0.125 2023-06-15 12:33:36,397 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=139353.33333333334, ans=0.125 2023-06-15 12:33:36,607 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=139353.33333333334, ans=0.0 2023-06-15 12:33:39,792 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=139420.0, ans=0.125 2023-06-15 12:33:41,067 INFO [train.py:988] (0/4) Epoch 40, batch 150, loss[loss=0.197, simple_loss=0.2719, pruned_loss=0.06102, over 20618.00 frames. ], tot_loss[loss=0.2076, simple_loss=0.289, pruned_loss=0.06312, over 2001481.87 frames. ], batch size: 189, lr: 8.29e-03, grad_scale: 32.0 2023-06-15 12:34:03,172 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.509e+02 1.849e+02 1.991e+02 2.229e+02 4.188e+02, threshold=3.982e+02, percent-clipped=1.0 2023-06-15 12:34:05,079 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.80 vs. limit=15.0 2023-06-15 12:34:34,204 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=139620.0, ans=0.0 2023-06-15 12:34:51,609 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=139686.66666666666, ans=0.015 2023-06-15 12:35:09,523 INFO [train.py:988] (0/4) Epoch 40, batch 200, loss[loss=0.2015, simple_loss=0.2854, pruned_loss=0.05884, over 19535.00 frames. ], tot_loss[loss=0.2081, simple_loss=0.2905, pruned_loss=0.06281, over 2365987.60 frames. ], batch size: 102, lr: 8.28e-03, grad_scale: 32.0 2023-06-15 12:35:21,869 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.91 vs. limit=10.0 2023-06-15 12:35:25,886 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=139820.0, ans=0.125 2023-06-15 12:35:44,198 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=139886.66666666666, ans=0.125 2023-06-15 12:35:49,469 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=139886.66666666666, ans=0.125 2023-06-15 12:36:03,275 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=139953.33333333334, ans=0.1 2023-06-15 12:36:08,151 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 12:36:16,274 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.82 vs. 
limit=6.0 2023-06-15 12:36:36,555 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=140086.66666666666, ans=0.125 2023-06-15 12:36:37,805 INFO [train.py:988] (0/4) Epoch 40, batch 250, loss[loss=0.2153, simple_loss=0.2979, pruned_loss=0.06634, over 18794.00 frames. ], tot_loss[loss=0.2076, simple_loss=0.2894, pruned_loss=0.0629, over 2687447.44 frames. ], batch size: 83, lr: 8.27e-03, grad_scale: 32.0 2023-06-15 12:36:52,686 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=140086.66666666666, ans=10.0 2023-06-15 12:37:01,101 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.450e+02 1.820e+02 2.078e+02 2.421e+02 4.152e+02, threshold=4.155e+02, percent-clipped=1.0 2023-06-15 12:37:42,781 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 12:37:48,451 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=140353.33333333334, ans=0.125 2023-06-15 12:37:51,896 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=140353.33333333334, ans=0.125 2023-06-15 12:37:52,530 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.61 vs. limit=15.0 2023-06-15 12:38:08,087 INFO [train.py:988] (0/4) Epoch 40, batch 300, loss[loss=0.1951, simple_loss=0.2801, pruned_loss=0.05503, over 19676.00 frames. ], tot_loss[loss=0.2075, simple_loss=0.2888, pruned_loss=0.0631, over 2931583.05 frames. ], batch size: 110, lr: 8.26e-03, grad_scale: 32.0 2023-06-15 12:38:37,212 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140486.66666666666, ans=0.1 2023-06-15 12:38:57,561 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.74 vs. limit=15.0 2023-06-15 12:39:06,544 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.79 vs. limit=15.0 2023-06-15 12:39:26,231 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.66 vs. limit=10.0 2023-06-15 12:39:38,305 INFO [train.py:988] (0/4) Epoch 40, batch 350, loss[loss=0.1931, simple_loss=0.2749, pruned_loss=0.05568, over 19215.00 frames. ], tot_loss[loss=0.2072, simple_loss=0.288, pruned_loss=0.06322, over 3110659.31 frames. 
], batch size: 92, lr: 8.25e-03, grad_scale: 32.0 2023-06-15 12:40:01,669 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.518e+02 1.757e+02 1.917e+02 2.241e+02 2.935e+02, threshold=3.834e+02, percent-clipped=0.0 2023-06-15 12:40:59,249 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=141020.0, ans=0.125 2023-06-15 12:41:00,797 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=141020.0, ans=0.0 2023-06-15 12:41:04,321 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=141020.0, ans=0.2 2023-06-15 12:41:08,568 INFO [train.py:988] (0/4) Epoch 40, batch 400, loss[loss=0.2032, simple_loss=0.2857, pruned_loss=0.06038, over 19815.00 frames. ], tot_loss[loss=0.2063, simple_loss=0.287, pruned_loss=0.06284, over 3258761.65 frames. ], batch size: 120, lr: 8.24e-03, grad_scale: 32.0 2023-06-15 12:41:20,604 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=141086.66666666666, ans=0.2 2023-06-15 12:41:26,556 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=141153.33333333334, ans=0.0 2023-06-15 12:41:56,274 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=141220.0, ans=0.125 2023-06-15 12:42:32,389 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.95 vs. limit=15.0 2023-06-15 12:42:36,464 INFO [train.py:988] (0/4) Epoch 40, batch 450, loss[loss=0.2102, simple_loss=0.2857, pruned_loss=0.06735, over 20601.00 frames. ], tot_loss[loss=0.207, simple_loss=0.288, pruned_loss=0.06306, over 3369692.88 frames. ], batch size: 173, lr: 8.24e-03, grad_scale: 32.0 2023-06-15 12:42:40,274 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 12:42:57,944 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=141486.66666666666, ans=0.2 2023-06-15 12:42:59,283 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.571e+02 1.768e+02 1.885e+02 2.206e+02 3.327e+02, threshold=3.770e+02, percent-clipped=0.0 2023-06-15 12:43:32,642 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=141620.0, ans=0.025 2023-06-15 12:43:47,280 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.56 vs. limit=15.0 2023-06-15 12:44:03,573 INFO [train.py:988] (0/4) Epoch 40, batch 500, loss[loss=0.1838, simple_loss=0.2694, pruned_loss=0.04907, over 19815.00 frames. ], tot_loss[loss=0.2068, simple_loss=0.2876, pruned_loss=0.06297, over 3450851.12 frames. 
], batch size: 115, lr: 8.23e-03, grad_scale: 32.0 2023-06-15 12:44:10,415 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=141753.33333333334, ans=0.0 2023-06-15 12:44:10,727 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=141753.33333333334, ans=0.125 2023-06-15 12:44:33,619 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=141820.0, ans=0.1 2023-06-15 12:44:49,139 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. limit=6.0 2023-06-15 12:44:51,754 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=141953.33333333334, ans=0.0 2023-06-15 12:44:57,846 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-40.pt 2023-06-15 12:45:20,960 INFO [train.py:988] (0/4) Epoch 41, batch 0, loss[loss=0.2092, simple_loss=0.2919, pruned_loss=0.0632, over 19509.00 frames. ], tot_loss[loss=0.2092, simple_loss=0.2919, pruned_loss=0.0632, over 19509.00 frames. ], batch size: 105, lr: 8.12e-03, grad_scale: 32.0 2023-06-15 12:45:20,961 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 12:45:28,030 INFO [train.py:1020] (0/4) Epoch 41, validation: loss=0.2002, simple_loss=0.2999, pruned_loss=0.05026, over 143649.00 frames. 2023-06-15 12:45:28,030 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 12:45:39,729 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.47 vs. limit=22.5 2023-06-15 12:45:44,281 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=142040.0, ans=0.125 2023-06-15 12:46:21,183 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.355e+02 1.817e+02 2.110e+02 2.443e+02 3.477e+02, threshold=4.219e+02, percent-clipped=0.0 2023-06-15 12:46:57,296 INFO [train.py:988] (0/4) Epoch 41, batch 50, loss[loss=0.1904, simple_loss=0.2758, pruned_loss=0.05255, over 19854.00 frames. ], tot_loss[loss=0.2044, simple_loss=0.2856, pruned_loss=0.06154, over 865602.62 frames. ], batch size: 120, lr: 8.11e-03, grad_scale: 32.0 2023-06-15 12:47:45,569 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=142440.0, ans=0.1 2023-06-15 12:47:54,637 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=142506.66666666666, ans=0.2 2023-06-15 12:48:25,564 INFO [train.py:988] (0/4) Epoch 41, batch 100, loss[loss=0.2034, simple_loss=0.2796, pruned_loss=0.06358, over 20565.00 frames. ], tot_loss[loss=0.2059, simple_loss=0.2865, pruned_loss=0.06264, over 1528147.22 frames. 
], batch size: 173, lr: 8.10e-03, grad_scale: 32.0 2023-06-15 12:48:56,564 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=142706.66666666666, ans=0.0 2023-06-15 12:49:17,450 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=142840.0, ans=0.0 2023-06-15 12:49:18,944 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.448e+02 1.848e+02 2.100e+02 2.504e+02 3.647e+02, threshold=4.200e+02, percent-clipped=0.0 2023-06-15 12:49:51,274 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=142906.66666666666, ans=0.1 2023-06-15 12:49:54,451 INFO [train.py:988] (0/4) Epoch 41, batch 150, loss[loss=0.2091, simple_loss=0.2844, pruned_loss=0.06689, over 20447.00 frames. ], tot_loss[loss=0.205, simple_loss=0.2865, pruned_loss=0.0617, over 2038549.15 frames. ], batch size: 160, lr: 8.09e-03, grad_scale: 32.0 2023-06-15 12:49:58,906 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=142973.33333333334, ans=0.0 2023-06-15 12:50:39,904 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=143106.66666666666, ans=0.125 2023-06-15 12:50:50,872 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=143173.33333333334, ans=0.125 2023-06-15 12:50:52,550 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=143173.33333333334, ans=0.2 2023-06-15 12:50:52,638 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=143173.33333333334, ans=0.0 2023-06-15 12:51:24,264 INFO [train.py:988] (0/4) Epoch 41, batch 200, loss[loss=0.2327, simple_loss=0.3098, pruned_loss=0.07785, over 10872.00 frames. ], tot_loss[loss=0.2061, simple_loss=0.2878, pruned_loss=0.06222, over 2427142.45 frames. ], batch size: 30, lr: 8.09e-03, grad_scale: 32.0 2023-06-15 12:51:28,157 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=143306.66666666666, ans=0.05 2023-06-15 12:51:40,293 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=143373.33333333334, ans=0.125 2023-06-15 12:51:46,009 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 12:51:50,422 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.70 vs. limit=6.0 2023-06-15 12:51:57,209 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=143373.33333333334, ans=0.125 2023-06-15 12:51:59,333 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.33 vs. limit=15.0 2023-06-15 12:51:59,555 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.30 vs. 
limit=15.0 2023-06-15 12:52:11,090 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=143440.0, ans=0.0 2023-06-15 12:52:18,421 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.520e+02 1.761e+02 1.969e+02 2.341e+02 3.526e+02, threshold=3.938e+02, percent-clipped=0.0 2023-06-15 12:52:54,192 INFO [train.py:988] (0/4) Epoch 41, batch 250, loss[loss=0.2281, simple_loss=0.3056, pruned_loss=0.07535, over 20108.00 frames. ], tot_loss[loss=0.2069, simple_loss=0.2881, pruned_loss=0.06285, over 2724295.68 frames. ], batch size: 133, lr: 8.08e-03, grad_scale: 32.0 2023-06-15 12:53:16,132 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=22.5 2023-06-15 12:53:43,644 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=15.0 2023-06-15 12:53:59,513 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=143840.0, ans=0.0 2023-06-15 12:54:24,715 INFO [train.py:988] (0/4) Epoch 41, batch 300, loss[loss=0.1972, simple_loss=0.2835, pruned_loss=0.05541, over 18460.00 frames. ], tot_loss[loss=0.2064, simple_loss=0.2875, pruned_loss=0.06269, over 2966795.75 frames. ], batch size: 77, lr: 8.07e-03, grad_scale: 32.0 2023-06-15 12:54:33,488 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.80 vs. limit=6.0 2023-06-15 12:54:49,959 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=144040.0, ans=0.125 2023-06-15 12:55:00,286 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.13 vs. limit=22.5 2023-06-15 12:55:19,255 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.484e+02 1.842e+02 2.030e+02 2.348e+02 3.359e+02, threshold=4.059e+02, percent-clipped=0.0 2023-06-15 12:55:35,775 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.79 vs. limit=10.0 2023-06-15 12:55:44,857 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=144240.0, ans=0.0 2023-06-15 12:55:54,947 INFO [train.py:988] (0/4) Epoch 41, batch 350, loss[loss=0.2251, simple_loss=0.303, pruned_loss=0.07356, over 10779.00 frames. ], tot_loss[loss=0.206, simple_loss=0.2867, pruned_loss=0.06268, over 3153931.33 frames. ], batch size: 30, lr: 8.06e-03, grad_scale: 32.0 2023-06-15 12:56:43,959 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.18 vs. 
limit=22.5 2023-06-15 12:56:48,766 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=144506.66666666666, ans=0.0 2023-06-15 12:57:14,976 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=144573.33333333334, ans=0.2 2023-06-15 12:57:22,275 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=144573.33333333334, ans=0.2 2023-06-15 12:57:25,290 INFO [train.py:988] (0/4) Epoch 41, batch 400, loss[loss=0.2175, simple_loss=0.2912, pruned_loss=0.07193, over 20631.00 frames. ], tot_loss[loss=0.206, simple_loss=0.2863, pruned_loss=0.06281, over 3307848.48 frames. ], batch size: 173, lr: 8.05e-03, grad_scale: 32.0 2023-06-15 12:57:25,591 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=144640.0, ans=0.0 2023-06-15 12:57:25,631 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=144640.0, ans=0.2 2023-06-15 12:57:29,316 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=144640.0, ans=0.125 2023-06-15 12:57:38,310 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=144640.0, ans=0.015 2023-06-15 12:57:59,044 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=144773.33333333334, ans=0.2 2023-06-15 12:58:04,679 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=144773.33333333334, ans=0.1 2023-06-15 12:58:17,903 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.445e+02 1.778e+02 1.933e+02 2.211e+02 3.033e+02, threshold=3.866e+02, percent-clipped=0.0 2023-06-15 12:58:29,714 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.91 vs. limit=15.0 2023-06-15 12:58:39,262 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=144906.66666666666, ans=0.125 2023-06-15 12:58:39,315 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=144906.66666666666, ans=0.125 2023-06-15 12:58:45,361 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.83 vs. limit=10.0 2023-06-15 12:58:53,404 INFO [train.py:988] (0/4) Epoch 41, batch 450, loss[loss=0.1955, simple_loss=0.2824, pruned_loss=0.05429, over 19665.00 frames. ], tot_loss[loss=0.2054, simple_loss=0.2858, pruned_loss=0.06254, over 3431757.60 frames. ], batch size: 110, lr: 8.04e-03, grad_scale: 32.0 2023-06-15 12:59:02,211 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=144973.33333333334, ans=0.09899494936611666 2023-06-15 12:59:25,514 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. 
limit=6.0 2023-06-15 12:59:27,609 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145106.66666666666, ans=0.1 2023-06-15 12:59:33,669 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.42 vs. limit=15.0 2023-06-15 12:59:36,043 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=145106.66666666666, ans=0.2 2023-06-15 13:00:17,856 INFO [train.py:988] (0/4) Epoch 41, batch 500, loss[loss=0.2184, simple_loss=0.2944, pruned_loss=0.07119, over 20721.00 frames. ], tot_loss[loss=0.2054, simple_loss=0.2861, pruned_loss=0.06239, over 3519495.93 frames. ], batch size: 211, lr: 8.04e-03, grad_scale: 32.0 2023-06-15 13:00:31,683 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145306.66666666666, ans=0.1 2023-06-15 13:00:51,639 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=145440.0, ans=0.0 2023-06-15 13:01:07,580 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.500e+02 1.783e+02 1.958e+02 2.209e+02 2.904e+02, threshold=3.915e+02, percent-clipped=0.0 2023-06-15 13:01:10,586 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-41.pt 2023-06-15 13:01:34,519 INFO [train.py:988] (0/4) Epoch 42, batch 0, loss[loss=0.189, simple_loss=0.2772, pruned_loss=0.05042, over 19750.00 frames. ], tot_loss[loss=0.189, simple_loss=0.2772, pruned_loss=0.05042, over 19750.00 frames. ], batch size: 115, lr: 7.93e-03, grad_scale: 32.0 2023-06-15 13:01:34,520 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 13:01:40,652 INFO [train.py:1020] (0/4) Epoch 42, validation: loss=0.1999, simple_loss=0.2992, pruned_loss=0.05028, over 143649.00 frames. 2023-06-15 13:01:40,653 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 13:01:44,463 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=145520.0, ans=0.125 2023-06-15 13:01:51,809 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=145520.0, ans=0.125 2023-06-15 13:02:05,378 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=145586.66666666666, ans=0.125 2023-06-15 13:02:13,369 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.31 vs. limit=15.0 2023-06-15 13:02:17,811 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=145653.33333333334, ans=0.125 2023-06-15 13:02:17,890 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=145653.33333333334, ans=0.125 2023-06-15 13:02:28,214 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=15.0 2023-06-15 13:03:10,725 INFO [train.py:988] (0/4) Epoch 42, batch 50, loss[loss=0.1912, simple_loss=0.2645, pruned_loss=0.0589, over 20239.00 frames. 
], tot_loss[loss=0.2048, simple_loss=0.2866, pruned_loss=0.06151, over 873561.75 frames. ], batch size: 239, lr: 7.93e-03, grad_scale: 32.0 2023-06-15 13:03:15,355 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.78 vs. limit=15.0 2023-06-15 13:03:31,012 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=145920.0, ans=0.125 2023-06-15 13:03:31,070 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=145920.0, ans=0.2 2023-06-15 13:04:07,272 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.89 vs. limit=15.0 2023-06-15 13:04:19,112 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=146053.33333333334, ans=0.125 2023-06-15 13:04:35,827 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.571e+02 1.815e+02 2.002e+02 2.267e+02 3.167e+02, threshold=4.003e+02, percent-clipped=0.0 2023-06-15 13:04:39,175 INFO [train.py:988] (0/4) Epoch 42, batch 100, loss[loss=0.212, simple_loss=0.2945, pruned_loss=0.06477, over 18646.00 frames. ], tot_loss[loss=0.2047, simple_loss=0.2867, pruned_loss=0.06139, over 1513880.56 frames. ], batch size: 80, lr: 7.92e-03, grad_scale: 32.0 2023-06-15 13:04:58,442 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.88 vs. limit=22.5 2023-06-15 13:05:34,160 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=146386.66666666666, ans=0.0 2023-06-15 13:05:47,684 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.82 vs. limit=6.0 2023-06-15 13:06:02,354 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0 2023-06-15 13:06:08,108 INFO [train.py:988] (0/4) Epoch 42, batch 150, loss[loss=0.2012, simple_loss=0.2881, pruned_loss=0.05715, over 19550.00 frames. ], tot_loss[loss=0.2054, simple_loss=0.2872, pruned_loss=0.0618, over 2007981.57 frames. ], batch size: 102, lr: 7.91e-03, grad_scale: 32.0 2023-06-15 13:06:12,392 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=15.0 2023-06-15 13:06:29,852 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=146586.66666666666, ans=0.5 2023-06-15 13:06:55,413 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=146653.33333333334, ans=0.2 2023-06-15 13:07:14,046 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.40 vs. 
limit=15.0 2023-06-15 13:07:16,354 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=146720.0, ans=0.0 2023-06-15 13:07:25,049 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=146786.66666666666, ans=0.0 2023-06-15 13:07:34,325 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.452e+02 1.782e+02 1.960e+02 2.224e+02 3.500e+02, threshold=3.921e+02, percent-clipped=0.0 2023-06-15 13:07:37,701 INFO [train.py:988] (0/4) Epoch 42, batch 200, loss[loss=0.1973, simple_loss=0.2859, pruned_loss=0.05436, over 18883.00 frames. ], tot_loss[loss=0.2049, simple_loss=0.2867, pruned_loss=0.06152, over 2407447.34 frames. ], batch size: 86, lr: 7.90e-03, grad_scale: 32.0 2023-06-15 13:08:04,522 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=146920.0, ans=0.0 2023-06-15 13:08:22,092 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.19 vs. limit=22.5 2023-06-15 13:08:53,487 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=147120.0, ans=0.125 2023-06-15 13:09:07,777 INFO [train.py:988] (0/4) Epoch 42, batch 250, loss[loss=0.222, simple_loss=0.3123, pruned_loss=0.0659, over 16699.00 frames. ], tot_loss[loss=0.2046, simple_loss=0.2867, pruned_loss=0.06122, over 2700865.90 frames. ], batch size: 59, lr: 7.89e-03, grad_scale: 32.0 2023-06-15 13:09:11,707 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=147186.66666666666, ans=0.1 2023-06-15 13:09:13,790 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=22.5 2023-06-15 13:09:31,875 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 13:09:35,371 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=147253.33333333334, ans=0.125 2023-06-15 13:09:37,029 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=147253.33333333334, ans=0.02 2023-06-15 13:09:38,427 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.86 vs. limit=6.0 2023-06-15 13:09:41,201 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=147320.0, ans=0.125 2023-06-15 13:09:51,358 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=147320.0, ans=22.5 2023-06-15 13:10:10,927 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=147386.66666666666, ans=0.0 2023-06-15 13:10:32,062 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.470e+02 1.737e+02 1.907e+02 2.072e+02 2.821e+02, threshold=3.814e+02, percent-clipped=0.0 2023-06-15 13:10:36,345 INFO [train.py:988] (0/4) Epoch 42, batch 300, loss[loss=0.2004, simple_loss=0.2863, pruned_loss=0.05726, over 15500.00 frames. 
], tot_loss[loss=0.2043, simple_loss=0.2863, pruned_loss=0.06109, over 2942103.78 frames. ], batch size: 44, lr: 7.88e-03, grad_scale: 32.0 2023-06-15 13:11:03,007 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=147586.66666666666, ans=0.125 2023-06-15 13:11:53,148 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.03 vs. limit=6.0 2023-06-15 13:12:05,696 INFO [train.py:988] (0/4) Epoch 42, batch 350, loss[loss=0.2016, simple_loss=0.292, pruned_loss=0.05562, over 16975.00 frames. ], tot_loss[loss=0.2044, simple_loss=0.2859, pruned_loss=0.06145, over 3152704.16 frames. ], batch size: 60, lr: 7.88e-03, grad_scale: 32.0 2023-06-15 13:12:10,035 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=147853.33333333334, ans=0.2 2023-06-15 13:12:17,087 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=15.0 2023-06-15 13:12:30,270 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=147920.0, ans=0.1 2023-06-15 13:13:29,565 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-06-15 13:13:30,814 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.475e+02 1.764e+02 1.957e+02 2.256e+02 2.981e+02, threshold=3.914e+02, percent-clipped=0.0 2023-06-15 13:13:34,291 INFO [train.py:988] (0/4) Epoch 42, batch 400, loss[loss=0.2146, simple_loss=0.3058, pruned_loss=0.06169, over 15539.00 frames. ], tot_loss[loss=0.2038, simple_loss=0.2855, pruned_loss=0.06109, over 3305331.12 frames. ], batch size: 44, lr: 7.87e-03, grad_scale: 32.0 2023-06-15 13:13:38,530 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=148186.66666666666, ans=0.2 2023-06-15 13:13:56,448 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0 2023-06-15 13:14:04,487 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=148253.33333333334, ans=0.125 2023-06-15 13:14:54,159 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=148453.33333333334, ans=0.04949747468305833 2023-06-15 13:15:03,353 INFO [train.py:988] (0/4) Epoch 42, batch 450, loss[loss=0.216, simple_loss=0.2876, pruned_loss=0.07217, over 19951.00 frames. ], tot_loss[loss=0.2041, simple_loss=0.2857, pruned_loss=0.0612, over 3417206.99 frames. 
], batch size: 126, lr: 7.86e-03, grad_scale: 32.0 2023-06-15 13:15:05,466 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=148520.0, ans=0.0 2023-06-15 13:15:16,389 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=148520.0, ans=0.125 2023-06-15 13:15:26,929 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=148586.66666666666, ans=0.0 2023-06-15 13:15:30,012 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=148586.66666666666, ans=0.125 2023-06-15 13:15:40,638 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=148653.33333333334, ans=0.0 2023-06-15 13:15:43,020 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=12.0 2023-06-15 13:15:50,723 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=148653.33333333334, ans=0.05 2023-06-15 13:16:07,939 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=148720.0, ans=0.0 2023-06-15 13:16:26,115 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.384e+02 1.894e+02 2.076e+02 2.350e+02 3.042e+02, threshold=4.151e+02, percent-clipped=0.0 2023-06-15 13:16:29,393 INFO [train.py:988] (0/4) Epoch 42, batch 500, loss[loss=0.2176, simple_loss=0.2642, pruned_loss=0.08549, over 17016.00 frames. ], tot_loss[loss=0.2038, simple_loss=0.2856, pruned_loss=0.06105, over 3502825.15 frames. ], batch size: 391, lr: 7.85e-03, grad_scale: 32.0 2023-06-15 13:16:36,440 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=148853.33333333334, ans=0.07 2023-06-15 13:16:41,559 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=148853.33333333334, ans=0.125 2023-06-15 13:16:58,932 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=148920.0, ans=0.1 2023-06-15 13:17:05,274 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=148986.66666666666, ans=0.0 2023-06-15 13:17:12,466 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.42 vs. limit=22.5 2023-06-15 13:17:13,396 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=148986.66666666666, ans=0.0 2023-06-15 13:17:23,796 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-42.pt 2023-06-15 13:17:51,474 INFO [train.py:988] (0/4) Epoch 43, batch 0, loss[loss=0.2007, simple_loss=0.2909, pruned_loss=0.0552, over 18928.00 frames. ], tot_loss[loss=0.2007, simple_loss=0.2909, pruned_loss=0.0552, over 18928.00 frames. ], batch size: 86, lr: 7.76e-03, grad_scale: 32.0 2023-06-15 13:17:51,475 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 13:17:57,700 INFO [train.py:1020] (0/4) Epoch 43, validation: loss=0.2014, simple_loss=0.3004, pruned_loss=0.05115, over 143649.00 frames. 
2023-06-15 13:17:57,701 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 13:17:59,779 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=149073.33333333334, ans=0.07 2023-06-15 13:18:09,439 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=149073.33333333334, ans=0.1 2023-06-15 13:18:18,549 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=149140.0, ans=0.125 2023-06-15 13:18:42,460 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=149206.66666666666, ans=0.125 2023-06-15 13:19:01,720 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2023-06-15 13:19:09,913 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=149340.0, ans=0.0 2023-06-15 13:19:23,826 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=149340.0, ans=0.0 2023-06-15 13:19:26,754 INFO [train.py:988] (0/4) Epoch 43, batch 50, loss[loss=0.2329, simple_loss=0.3159, pruned_loss=0.0749, over 16879.00 frames. ], tot_loss[loss=0.2027, simple_loss=0.2848, pruned_loss=0.0603, over 861704.51 frames. ], batch size: 59, lr: 7.75e-03, grad_scale: 32.0 2023-06-15 13:19:53,430 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.400e+02 1.770e+02 1.939e+02 2.280e+02 3.061e+02, threshold=3.878e+02, percent-clipped=0.0 2023-06-15 13:20:12,887 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=149540.0, ans=0.0 2023-06-15 13:20:16,753 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=15.0 2023-06-15 13:20:18,135 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=149606.66666666666, ans=0.1 2023-06-15 13:20:18,172 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=149606.66666666666, ans=0.0 2023-06-15 13:20:22,759 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=149606.66666666666, ans=0.1 2023-06-15 13:20:55,037 INFO [train.py:988] (0/4) Epoch 43, batch 100, loss[loss=0.2057, simple_loss=0.3003, pruned_loss=0.05556, over 18351.00 frames. ], tot_loss[loss=0.2021, simple_loss=0.2854, pruned_loss=0.05945, over 1525270.51 frames. 
], batch size: 72, lr: 7.74e-03, grad_scale: 32.0 2023-06-15 13:21:48,261 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=149940.0, ans=0.0 2023-06-15 13:22:15,085 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=150006.66666666666, ans=10.0 2023-06-15 13:22:15,364 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=150006.66666666666, ans=0.05 2023-06-15 13:22:22,499 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.23 vs. limit=15.0 2023-06-15 13:22:23,158 INFO [train.py:988] (0/4) Epoch 43, batch 150, loss[loss=0.2007, simple_loss=0.2889, pruned_loss=0.05631, over 19459.00 frames. ], tot_loss[loss=0.2033, simple_loss=0.2859, pruned_loss=0.06034, over 2014288.80 frames. ], batch size: 105, lr: 7.73e-03, grad_scale: 32.0 2023-06-15 13:22:23,406 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=150073.33333333334, ans=0.1 2023-06-15 13:22:50,679 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.522e+02 1.767e+02 1.914e+02 2.114e+02 3.326e+02, threshold=3.828e+02, percent-clipped=0.0 2023-06-15 13:23:06,448 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.34 vs. limit=12.0 2023-06-15 13:23:51,949 INFO [train.py:988] (0/4) Epoch 43, batch 200, loss[loss=0.2281, simple_loss=0.3195, pruned_loss=0.06836, over 17036.00 frames. ], tot_loss[loss=0.2036, simple_loss=0.2854, pruned_loss=0.06092, over 2385564.74 frames. ], batch size: 60, lr: 7.72e-03, grad_scale: 32.0 2023-06-15 13:23:56,215 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=150406.66666666666, ans=0.1 2023-06-15 13:23:59,385 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 13:24:06,117 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.41 vs. limit=15.0 2023-06-15 13:24:07,029 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=150406.66666666666, ans=0.07 2023-06-15 13:24:08,716 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=150473.33333333334, ans=10.0 2023-06-15 13:24:38,923 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=150540.0, ans=0.04949747468305833 2023-06-15 13:25:21,381 INFO [train.py:988] (0/4) Epoch 43, batch 250, loss[loss=0.1885, simple_loss=0.2678, pruned_loss=0.05463, over 18606.00 frames. ], tot_loss[loss=0.2029, simple_loss=0.2845, pruned_loss=0.0606, over 2706206.30 frames. 
], batch size: 80, lr: 7.72e-03, grad_scale: 32.0 2023-06-15 13:25:23,423 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=150740.0, ans=0.1 2023-06-15 13:25:28,429 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=150740.0, ans=0.2 2023-06-15 13:25:47,974 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.413e+02 1.795e+02 2.042e+02 2.211e+02 3.400e+02, threshold=4.084e+02, percent-clipped=0.0 2023-06-15 13:25:52,506 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=150806.66666666666, ans=0.0 2023-06-15 13:26:32,954 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=151006.66666666666, ans=0.5 2023-06-15 13:26:47,332 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=151006.66666666666, ans=0.2 2023-06-15 13:26:50,349 INFO [train.py:988] (0/4) Epoch 43, batch 300, loss[loss=0.2006, simple_loss=0.2776, pruned_loss=0.06185, over 20300.00 frames. ], tot_loss[loss=0.2028, simple_loss=0.2847, pruned_loss=0.06044, over 2955483.86 frames. ], batch size: 141, lr: 7.71e-03, grad_scale: 32.0 2023-06-15 13:26:50,707 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=151073.33333333334, ans=0.125 2023-06-15 13:26:59,069 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=151073.33333333334, ans=0.1 2023-06-15 13:27:29,282 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=151206.66666666666, ans=0.0 2023-06-15 13:27:33,261 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=15.0 2023-06-15 13:28:04,098 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=151340.0, ans=0.0 2023-06-15 13:28:04,763 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.95 vs. limit=15.0 2023-06-15 13:28:17,957 INFO [train.py:988] (0/4) Epoch 43, batch 350, loss[loss=0.2213, simple_loss=0.3119, pruned_loss=0.06531, over 17666.00 frames. ], tot_loss[loss=0.2034, simple_loss=0.2855, pruned_loss=0.06065, over 3129458.24 frames. ], batch size: 67, lr: 7.70e-03, grad_scale: 64.0 2023-06-15 13:28:32,403 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.07 vs. 
limit=22.5 2023-06-15 13:28:44,426 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.378e+02 1.733e+02 1.920e+02 2.080e+02 2.736e+02, threshold=3.841e+02, percent-clipped=0.0 2023-06-15 13:29:01,251 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=151540.0, ans=0.0 2023-06-15 13:29:04,739 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=151540.0, ans=0.125 2023-06-15 13:29:28,549 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.26 vs. limit=15.0 2023-06-15 13:29:47,166 INFO [train.py:988] (0/4) Epoch 43, batch 400, loss[loss=0.2025, simple_loss=0.2937, pruned_loss=0.05567, over 19438.00 frames. ], tot_loss[loss=0.2026, simple_loss=0.285, pruned_loss=0.0601, over 3277295.17 frames. ], batch size: 105, lr: 7.69e-03, grad_scale: 32.0 2023-06-15 13:29:58,421 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. limit=6.0 2023-06-15 13:30:15,894 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.10 vs. limit=12.0 2023-06-15 13:30:42,527 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=151940.0, ans=0.2 2023-06-15 13:30:54,388 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 13:31:16,470 INFO [train.py:988] (0/4) Epoch 43, batch 450, loss[loss=0.2149, simple_loss=0.2894, pruned_loss=0.0702, over 20104.00 frames. ], tot_loss[loss=0.2024, simple_loss=0.2845, pruned_loss=0.06015, over 3405179.63 frames. ], batch size: 133, lr: 7.69e-03, grad_scale: 32.0 2023-06-15 13:31:30,539 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=152073.33333333334, ans=0.0 2023-06-15 13:31:43,609 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.450e+02 1.899e+02 2.101e+02 2.447e+02 3.803e+02, threshold=4.202e+02, percent-clipped=0.0 2023-06-15 13:32:16,325 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=152273.33333333334, ans=0.125 2023-06-15 13:32:42,696 INFO [train.py:988] (0/4) Epoch 43, batch 500, loss[loss=0.201, simple_loss=0.2865, pruned_loss=0.05776, over 18795.00 frames. ], tot_loss[loss=0.2022, simple_loss=0.2852, pruned_loss=0.05964, over 3492375.79 frames. ], batch size: 83, lr: 7.68e-03, grad_scale: 32.0 2023-06-15 13:32:53,222 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=152406.66666666666, ans=0.0 2023-06-15 13:33:11,805 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=152473.33333333334, ans=0.0 2023-06-15 13:33:23,332 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=152540.0, ans=0.125 2023-06-15 13:33:37,239 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-43.pt 2023-06-15 13:34:02,662 INFO [train.py:988] (0/4) Epoch 44, batch 0, loss[loss=0.2162, simple_loss=0.2803, pruned_loss=0.07601, over 19848.00 frames. 
], tot_loss[loss=0.2162, simple_loss=0.2803, pruned_loss=0.07601, over 19848.00 frames. ], batch size: 293, lr: 7.58e-03, grad_scale: 32.0 2023-06-15 13:34:02,663 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 13:34:08,930 INFO [train.py:1020] (0/4) Epoch 44, validation: loss=0.204, simple_loss=0.3011, pruned_loss=0.05343, over 143649.00 frames. 2023-06-15 13:34:08,930 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 13:34:22,157 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.04 vs. limit=10.0 2023-06-15 13:34:38,549 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=152693.33333333334, ans=0.125 2023-06-15 13:34:59,351 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=152826.66666666666, ans=0.0 2023-06-15 13:35:06,267 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.436e+02 1.836e+02 2.115e+02 2.307e+02 4.215e+02, threshold=4.230e+02, percent-clipped=1.0 2023-06-15 13:35:08,417 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=152826.66666666666, ans=0.125 2023-06-15 13:35:22,803 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=22.5 2023-06-15 13:35:35,455 INFO [train.py:988] (0/4) Epoch 44, batch 50, loss[loss=0.2033, simple_loss=0.2833, pruned_loss=0.06165, over 20451.00 frames. ], tot_loss[loss=0.2011, simple_loss=0.2808, pruned_loss=0.06071, over 864936.36 frames. ], batch size: 160, lr: 7.58e-03, grad_scale: 32.0 2023-06-15 13:35:37,199 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.57 vs. limit=10.0 2023-06-15 13:35:41,831 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=152960.0, ans=0.1 2023-06-15 13:35:47,569 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=152960.0, ans=0.125 2023-06-15 13:36:16,127 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=153093.33333333334, ans=0.04949747468305833 2023-06-15 13:36:36,418 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=153160.0, ans=0.125 2023-06-15 13:36:41,496 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=153160.0, ans=0.125 2023-06-15 13:36:43,341 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=153160.0, ans=0.125 2023-06-15 13:37:03,133 INFO [train.py:988] (0/4) Epoch 44, batch 100, loss[loss=0.2163, simple_loss=0.3081, pruned_loss=0.06223, over 16778.00 frames. ], tot_loss[loss=0.201, simple_loss=0.2831, pruned_loss=0.05942, over 1506980.71 frames. 
], batch size: 59, lr: 7.57e-03, grad_scale: 32.0 2023-06-15 13:37:05,151 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=153293.33333333334, ans=0.0 2023-06-15 13:37:05,185 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=153293.33333333334, ans=0.0 2023-06-15 13:37:21,233 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=153360.0, ans=0.125 2023-06-15 13:37:31,338 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=153360.0, ans=0.0 2023-06-15 13:37:42,918 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=153426.66666666666, ans=0.1 2023-06-15 13:37:54,705 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.98 vs. limit=15.0 2023-06-15 13:37:59,271 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=153493.33333333334, ans=0.125 2023-06-15 13:37:59,580 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=153493.33333333334, ans=0.1 2023-06-15 13:38:03,100 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.517e+02 1.855e+02 2.123e+02 2.461e+02 3.591e+02, threshold=4.246e+02, percent-clipped=0.0 2023-06-15 13:38:07,919 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.88 vs. limit=10.0 2023-06-15 13:38:24,849 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=153560.0, ans=0.125 2023-06-15 13:38:32,994 INFO [train.py:988] (0/4) Epoch 44, batch 150, loss[loss=0.211, simple_loss=0.2869, pruned_loss=0.06749, over 20143.00 frames. ], tot_loss[loss=0.2014, simple_loss=0.2837, pruned_loss=0.05961, over 2018560.55 frames. ], batch size: 133, lr: 7.56e-03, grad_scale: 32.0 2023-06-15 13:38:54,281 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=153693.33333333334, ans=0.125 2023-06-15 13:39:23,928 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.57 vs. limit=12.0 2023-06-15 13:40:02,702 INFO [train.py:988] (0/4) Epoch 44, batch 200, loss[loss=0.2011, simple_loss=0.2811, pruned_loss=0.06056, over 18608.00 frames. ], tot_loss[loss=0.2011, simple_loss=0.2834, pruned_loss=0.05944, over 2413287.54 frames. ], batch size: 80, lr: 7.56e-03, grad_scale: 32.0 2023-06-15 13:40:10,539 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=153960.0, ans=0.1 2023-06-15 13:41:02,102 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.414e+02 1.737e+02 1.890e+02 2.056e+02 2.866e+02, threshold=3.780e+02, percent-clipped=0.0 2023-06-15 13:41:19,450 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.73 vs. 
limit=15.0 2023-06-15 13:41:32,041 INFO [train.py:988] (0/4) Epoch 44, batch 250, loss[loss=0.2107, simple_loss=0.3074, pruned_loss=0.05703, over 17615.00 frames. ], tot_loss[loss=0.2015, simple_loss=0.2841, pruned_loss=0.05946, over 2709083.89 frames. ], batch size: 67, lr: 7.55e-03, grad_scale: 32.0 2023-06-15 13:41:35,955 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=154293.33333333334, ans=0.2 2023-06-15 13:41:44,667 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=154293.33333333334, ans=0.125 2023-06-15 13:42:17,663 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=154426.66666666666, ans=10.0 2023-06-15 13:42:43,136 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=154560.0, ans=0.025 2023-06-15 13:43:00,297 INFO [train.py:988] (0/4) Epoch 44, batch 300, loss[loss=0.2291, simple_loss=0.3048, pruned_loss=0.07674, over 20015.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.2841, pruned_loss=0.0593, over 2951329.09 frames. ], batch size: 126, lr: 7.54e-03, grad_scale: 32.0 2023-06-15 13:43:34,834 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=154760.0, ans=0.2 2023-06-15 13:44:00,066 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.445e+02 1.826e+02 2.061e+02 2.441e+02 3.264e+02, threshold=4.122e+02, percent-clipped=0.0 2023-06-15 13:44:01,085 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2023-06-15 13:44:27,392 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.55 vs. limit=15.0 2023-06-15 13:44:30,149 INFO [train.py:988] (0/4) Epoch 44, batch 350, loss[loss=0.1928, simple_loss=0.2793, pruned_loss=0.05309, over 19136.00 frames. ], tot_loss[loss=0.2011, simple_loss=0.2835, pruned_loss=0.05938, over 3148103.58 frames. ], batch size: 94, lr: 7.53e-03, grad_scale: 32.0 2023-06-15 13:45:03,317 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=155026.66666666666, ans=0.125 2023-06-15 13:45:05,831 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=4.51 vs. limit=15.0 2023-06-15 13:45:12,332 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=155093.33333333334, ans=0.125 2023-06-15 13:45:18,976 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=155093.33333333334, ans=0.125 2023-06-15 13:45:59,546 INFO [train.py:988] (0/4) Epoch 44, batch 400, loss[loss=0.2222, simple_loss=0.3058, pruned_loss=0.06927, over 16398.00 frames. ], tot_loss[loss=0.2008, simple_loss=0.283, pruned_loss=0.05924, over 3289184.01 frames. ], batch size: 52, lr: 7.53e-03, grad_scale: 32.0 2023-06-15 13:46:08,863 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.81 vs. 
limit=15.0 2023-06-15 13:46:57,029 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.440e+02 1.791e+02 1.946e+02 2.236e+02 4.124e+02, threshold=3.892e+02, percent-clipped=1.0 2023-06-15 13:47:26,919 INFO [train.py:988] (0/4) Epoch 44, batch 450, loss[loss=0.2124, simple_loss=0.3044, pruned_loss=0.06025, over 15142.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.284, pruned_loss=0.05956, over 3402643.49 frames. ], batch size: 43, lr: 7.52e-03, grad_scale: 32.0 2023-06-15 13:47:33,136 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.82 vs. limit=12.0 2023-06-15 13:47:41,350 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=155626.66666666666, ans=0.125 2023-06-15 13:47:42,998 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=155693.33333333334, ans=0.125 2023-06-15 13:48:46,234 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=155893.33333333334, ans=0.125 2023-06-15 13:48:52,713 INFO [train.py:988] (0/4) Epoch 44, batch 500, loss[loss=0.2047, simple_loss=0.2882, pruned_loss=0.06062, over 18500.00 frames. ], tot_loss[loss=0.202, simple_loss=0.2843, pruned_loss=0.05987, over 3504096.83 frames. ], batch size: 77, lr: 7.51e-03, grad_scale: 32.0 2023-06-15 13:48:55,740 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.41 vs. limit=22.5 2023-06-15 13:49:08,747 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=156026.66666666666, ans=0.125 2023-06-15 13:49:46,149 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-44.pt 2023-06-15 13:50:05,158 INFO [train.py:988] (0/4) Epoch 45, batch 0, loss[loss=0.2094, simple_loss=0.2893, pruned_loss=0.06477, over 19668.00 frames. ], tot_loss[loss=0.2094, simple_loss=0.2893, pruned_loss=0.06477, over 19668.00 frames. ], batch size: 110, lr: 7.42e-03, grad_scale: 32.0 2023-06-15 13:50:05,159 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 13:50:12,412 INFO [train.py:1020] (0/4) Epoch 45, validation: loss=0.2006, simple_loss=0.2992, pruned_loss=0.05105, over 143649.00 frames. 2023-06-15 13:50:12,413 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 13:50:14,100 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.558e+02 1.838e+02 2.044e+02 2.323e+02 3.630e+02, threshold=4.088e+02, percent-clipped=0.0 2023-06-15 13:50:26,795 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=156173.33333333334, ans=0.1 2023-06-15 13:50:34,634 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.06 vs. limit=15.0 2023-06-15 13:50:47,053 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.18 vs. 
limit=15.0 2023-06-15 13:51:02,937 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=156306.66666666666, ans=0.125 2023-06-15 13:51:11,577 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=156373.33333333334, ans=0.125 2023-06-15 13:51:18,247 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=156373.33333333334, ans=0.125 2023-06-15 13:51:41,663 INFO [train.py:988] (0/4) Epoch 45, batch 50, loss[loss=0.2058, simple_loss=0.2857, pruned_loss=0.06299, over 20276.00 frames. ], tot_loss[loss=0.2051, simple_loss=0.2867, pruned_loss=0.06175, over 846465.36 frames. ], batch size: 149, lr: 7.41e-03, grad_scale: 32.0 2023-06-15 13:51:55,405 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=156506.66666666666, ans=0.0 2023-06-15 13:52:09,556 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=156573.33333333334, ans=0.125 2023-06-15 13:52:13,727 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=156573.33333333334, ans=0.125 2023-06-15 13:52:24,121 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=156640.0, ans=0.1 2023-06-15 13:52:31,000 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=156640.0, ans=0.2 2023-06-15 13:53:10,637 INFO [train.py:988] (0/4) Epoch 45, batch 100, loss[loss=0.2015, simple_loss=0.2767, pruned_loss=0.06317, over 20537.00 frames. ], tot_loss[loss=0.2037, simple_loss=0.2862, pruned_loss=0.06063, over 1500210.10 frames. ], batch size: 160, lr: 7.41e-03, grad_scale: 32.0 2023-06-15 13:53:12,188 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.512e+02 1.891e+02 2.087e+02 2.341e+02 3.228e+02, threshold=4.175e+02, percent-clipped=0.0 2023-06-15 13:53:19,947 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.80 vs. limit=22.5 2023-06-15 13:53:35,950 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 13:53:36,204 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=156906.66666666666, ans=0.2 2023-06-15 13:53:50,508 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=156973.33333333334, ans=0.2 2023-06-15 13:54:07,133 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=157040.0, ans=0.125 2023-06-15 13:54:30,790 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0 2023-06-15 13:54:33,575 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 13:54:38,931 INFO [train.py:988] (0/4) Epoch 45, batch 150, loss[loss=0.1881, simple_loss=0.2808, pruned_loss=0.0477, over 19226.00 frames. ], tot_loss[loss=0.2029, simple_loss=0.2855, pruned_loss=0.06018, over 2002584.44 frames. 
], batch size: 92, lr: 7.40e-03, grad_scale: 32.0 2023-06-15 13:54:46,682 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=157173.33333333334, ans=0.125 2023-06-15 13:55:04,037 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.15 vs. limit=10.0 2023-06-15 13:55:39,647 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=157373.33333333334, ans=0.125 2023-06-15 13:55:58,797 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=157440.0, ans=0.025 2023-06-15 13:56:07,416 INFO [train.py:988] (0/4) Epoch 45, batch 200, loss[loss=0.2173, simple_loss=0.2929, pruned_loss=0.07085, over 20334.00 frames. ], tot_loss[loss=0.2021, simple_loss=0.2848, pruned_loss=0.0597, over 2396307.44 frames. ], batch size: 149, lr: 7.39e-03, grad_scale: 32.0 2023-06-15 13:56:09,111 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.387e+02 1.822e+02 1.971e+02 2.216e+02 3.677e+02, threshold=3.943e+02, percent-clipped=0.0 2023-06-15 13:56:17,642 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=157506.66666666666, ans=0.1 2023-06-15 13:56:42,225 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.12 vs. limit=22.5 2023-06-15 13:56:49,961 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=157640.0, ans=0.0 2023-06-15 13:56:50,813 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.20 vs. limit=10.0 2023-06-15 13:57:20,728 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.62 vs. limit=6.0 2023-06-15 13:57:23,703 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=157773.33333333334, ans=0.125 2023-06-15 13:57:35,488 INFO [train.py:988] (0/4) Epoch 45, batch 250, loss[loss=0.2187, simple_loss=0.2851, pruned_loss=0.07613, over 20013.00 frames. ], tot_loss[loss=0.2025, simple_loss=0.2856, pruned_loss=0.05972, over 2695588.38 frames. ], batch size: 293, lr: 7.39e-03, grad_scale: 32.0 2023-06-15 13:57:39,072 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=157840.0, ans=0.125 2023-06-15 13:57:50,962 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=157840.0, ans=0.125 2023-06-15 13:58:06,009 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=157906.66666666666, ans=0.1 2023-06-15 13:58:11,395 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.99 vs. 
limit=22.5 2023-06-15 13:58:26,435 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=158040.0, ans=0.09899494936611666 2023-06-15 13:58:51,181 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.77 vs. limit=6.0 2023-06-15 13:59:02,820 INFO [train.py:988] (0/4) Epoch 45, batch 300, loss[loss=0.2117, simple_loss=0.2924, pruned_loss=0.06548, over 18771.00 frames. ], tot_loss[loss=0.2017, simple_loss=0.2849, pruned_loss=0.0592, over 2944896.71 frames. ], batch size: 83, lr: 7.38e-03, grad_scale: 32.0 2023-06-15 13:59:04,367 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.486e+02 1.818e+02 2.016e+02 2.367e+02 3.194e+02, threshold=4.032e+02, percent-clipped=0.0 2023-06-15 13:59:14,512 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=158173.33333333334, ans=0.125 2023-06-15 13:59:19,178 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0 2023-06-15 13:59:29,106 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=12.0 2023-06-15 13:59:47,602 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=158306.66666666666, ans=0.1 2023-06-15 13:59:49,694 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2023-06-15 13:59:52,778 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=158306.66666666666, ans=0.125 2023-06-15 14:00:31,870 INFO [train.py:988] (0/4) Epoch 45, batch 350, loss[loss=0.201, simple_loss=0.2901, pruned_loss=0.05601, over 18472.00 frames. ], tot_loss[loss=0.2018, simple_loss=0.2843, pruned_loss=0.05967, over 3130262.75 frames. 
], batch size: 77, lr: 7.37e-03, grad_scale: 32.0 2023-06-15 14:00:44,351 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=158506.66666666666, ans=0.125 2023-06-15 14:01:09,567 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=158640.0, ans=0.0 2023-06-15 14:01:15,397 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=158640.0, ans=0.0 2023-06-15 14:01:37,318 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=158706.66666666666, ans=0.125 2023-06-15 14:01:43,665 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=158773.33333333334, ans=0.125 2023-06-15 14:01:43,750 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=158773.33333333334, ans=0.125 2023-06-15 14:01:49,011 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=158773.33333333334, ans=0.1 2023-06-15 14:02:00,908 INFO [train.py:988] (0/4) Epoch 45, batch 400, loss[loss=0.2262, simple_loss=0.3142, pruned_loss=0.06908, over 15516.00 frames. ], tot_loss[loss=0.2014, simple_loss=0.2838, pruned_loss=0.05949, over 3277077.34 frames. ], batch size: 44, lr: 7.36e-03, grad_scale: 32.0 2023-06-15 14:02:02,525 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.377e+02 1.797e+02 1.967e+02 2.284e+02 3.128e+02, threshold=3.934e+02, percent-clipped=0.0 2023-06-15 14:02:33,826 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.77 vs. limit=15.0 2023-06-15 14:03:13,631 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=159106.66666666666, ans=0.125 2023-06-15 14:03:25,333 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=159106.66666666666, ans=0.125 2023-06-15 14:03:28,282 INFO [train.py:988] (0/4) Epoch 45, batch 450, loss[loss=0.204, simple_loss=0.2936, pruned_loss=0.05716, over 16366.00 frames. ], tot_loss[loss=0.2009, simple_loss=0.283, pruned_loss=0.05938, over 3387843.84 frames. ], batch size: 52, lr: 7.36e-03, grad_scale: 16.0 2023-06-15 14:04:29,119 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.37 vs. limit=12.0 2023-06-15 14:04:52,856 INFO [train.py:988] (0/4) Epoch 45, batch 500, loss[loss=0.1959, simple_loss=0.2819, pruned_loss=0.05496, over 18650.00 frames. ], tot_loss[loss=0.201, simple_loss=0.2831, pruned_loss=0.05947, over 3485410.41 frames. ], batch size: 80, lr: 7.35e-03, grad_scale: 16.0 2023-06-15 14:04:56,041 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.475e+02 1.833e+02 2.042e+02 2.435e+02 3.752e+02, threshold=4.085e+02, percent-clipped=0.0 2023-06-15 14:05:22,622 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.93 vs. 
limit=15.0 2023-06-15 14:05:33,056 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=159640.0, ans=0.125 2023-06-15 14:05:46,695 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-45.pt 2023-06-15 14:06:16,046 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=159726.66666666666, ans=0.0 2023-06-15 14:06:16,352 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0 2023-06-15 14:06:17,151 INFO [train.py:988] (0/4) Epoch 46, batch 0, loss[loss=0.1964, simple_loss=0.2703, pruned_loss=0.06124, over 20255.00 frames. ], tot_loss[loss=0.1964, simple_loss=0.2703, pruned_loss=0.06124, over 20255.00 frames. ], batch size: 141, lr: 7.27e-03, grad_scale: 32.0 2023-06-15 14:06:17,152 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 14:06:23,568 INFO [train.py:1020] (0/4) Epoch 46, validation: loss=0.2018, simple_loss=0.3001, pruned_loss=0.05177, over 143649.00 frames. 2023-06-15 14:06:23,569 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 14:06:35,354 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=159726.66666666666, ans=0.2 2023-06-15 14:06:40,916 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=159793.33333333334, ans=0.1 2023-06-15 14:06:42,283 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=159793.33333333334, ans=0.125 2023-06-15 14:06:45,790 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=159793.33333333334, ans=0.125 2023-06-15 14:07:12,237 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=159860.0, ans=0.2 2023-06-15 14:07:32,447 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/checkpoint-24000.pt 2023-06-15 14:07:39,030 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=159993.33333333334, ans=0.1 2023-06-15 14:07:39,740 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.17 vs. limit=22.5 2023-06-15 14:07:53,063 INFO [train.py:988] (0/4) Epoch 46, batch 50, loss[loss=0.2033, simple_loss=0.2882, pruned_loss=0.05919, over 18460.00 frames. ], tot_loss[loss=0.199, simple_loss=0.2801, pruned_loss=0.05899, over 858691.88 frames. ], batch size: 77, lr: 7.26e-03, grad_scale: 32.0 2023-06-15 14:08:02,048 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=12.0 2023-06-15 14:08:21,652 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.34 vs. 
limit=15.0 2023-06-15 14:08:25,939 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.498e+02 1.799e+02 2.021e+02 2.409e+02 3.297e+02, threshold=4.042e+02, percent-clipped=0.0 2023-06-15 14:08:46,278 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.35 vs. limit=15.0 2023-06-15 14:09:09,573 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=160326.66666666666, ans=0.0 2023-06-15 14:09:19,621 INFO [train.py:988] (0/4) Epoch 46, batch 100, loss[loss=0.1925, simple_loss=0.2773, pruned_loss=0.05382, over 19789.00 frames. ], tot_loss[loss=0.1999, simple_loss=0.2827, pruned_loss=0.05855, over 1515837.77 frames. ], batch size: 115, lr: 7.25e-03, grad_scale: 16.0 2023-06-15 14:09:37,030 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=160460.0, ans=0.0 2023-06-15 14:09:40,101 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=160460.0, ans=0.125 2023-06-15 14:09:59,412 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=160526.66666666666, ans=0.0 2023-06-15 14:10:35,210 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-06-15 14:10:45,277 INFO [train.py:988] (0/4) Epoch 46, batch 150, loss[loss=0.2004, simple_loss=0.2746, pruned_loss=0.0631, over 20313.00 frames. ], tot_loss[loss=0.2002, simple_loss=0.2832, pruned_loss=0.05858, over 2004920.09 frames. ], batch size: 149, lr: 7.24e-03, grad_scale: 16.0 2023-06-15 14:10:50,614 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=160726.66666666666, ans=0.125 2023-06-15 14:11:01,413 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=160793.33333333334, ans=0.125 2023-06-15 14:11:19,623 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.420e+02 1.724e+02 1.880e+02 2.096e+02 2.711e+02, threshold=3.761e+02, percent-clipped=0.0 2023-06-15 14:11:59,422 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=160993.33333333334, ans=0.125 2023-06-15 14:12:09,134 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=2.73 vs. limit=12.0 2023-06-15 14:12:11,663 INFO [train.py:988] (0/4) Epoch 46, batch 200, loss[loss=0.1896, simple_loss=0.2761, pruned_loss=0.05151, over 18608.00 frames. ], tot_loss[loss=0.1998, simple_loss=0.2831, pruned_loss=0.05824, over 2398209.38 frames. 
], batch size: 80, lr: 7.24e-03, grad_scale: 16.0 2023-06-15 14:12:36,514 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=161126.66666666666, ans=0.0 2023-06-15 14:12:56,099 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=161193.33333333334, ans=0.125 2023-06-15 14:13:28,194 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=161326.66666666666, ans=0.125 2023-06-15 14:13:32,072 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=161326.66666666666, ans=0.0 2023-06-15 14:13:36,121 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.65 vs. limit=10.0 2023-06-15 14:13:38,826 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=161393.33333333334, ans=0.125 2023-06-15 14:13:40,040 INFO [train.py:988] (0/4) Epoch 46, batch 250, loss[loss=0.2041, simple_loss=0.2768, pruned_loss=0.06566, over 20632.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.2825, pruned_loss=0.05913, over 2709930.63 frames. ], batch size: 211, lr: 7.23e-03, grad_scale: 16.0 2023-06-15 14:13:57,283 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=161460.0, ans=0.125 2023-06-15 14:13:59,732 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=161460.0, ans=0.2 2023-06-15 14:14:15,413 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.495e+02 1.802e+02 2.040e+02 2.477e+02 3.551e+02, threshold=4.080e+02, percent-clipped=0.0 2023-06-15 14:14:22,359 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=161526.66666666666, ans=0.1 2023-06-15 14:14:32,765 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=161593.33333333334, ans=0.125 2023-06-15 14:14:38,301 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=161593.33333333334, ans=0.2 2023-06-15 14:14:40,545 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.34 vs. limit=15.0 2023-06-15 14:14:51,895 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=161660.0, ans=0.125 2023-06-15 14:15:07,642 INFO [train.py:988] (0/4) Epoch 46, batch 300, loss[loss=0.2079, simple_loss=0.2913, pruned_loss=0.06228, over 19547.00 frames. ], tot_loss[loss=0.1996, simple_loss=0.2826, pruned_loss=0.05836, over 2951795.07 frames. ], batch size: 102, lr: 7.22e-03, grad_scale: 16.0 2023-06-15 14:15:15,072 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.09 vs. 
limit=22.5 2023-06-15 14:15:44,111 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=161860.0, ans=0.125 2023-06-15 14:16:13,831 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=161926.66666666666, ans=0.125 2023-06-15 14:16:16,546 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.53 vs. limit=22.5 2023-06-15 14:16:32,069 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=161993.33333333334, ans=0.125 2023-06-15 14:16:35,641 INFO [train.py:988] (0/4) Epoch 46, batch 350, loss[loss=0.2076, simple_loss=0.2978, pruned_loss=0.05872, over 18312.00 frames. ], tot_loss[loss=0.1995, simple_loss=0.2828, pruned_loss=0.05813, over 3124922.02 frames. ], batch size: 72, lr: 7.22e-03, grad_scale: 16.0 2023-06-15 14:16:37,562 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=162060.0, ans=0.0 2023-06-15 14:16:39,248 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=162060.0, ans=0.125 2023-06-15 14:16:55,376 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.65 vs. limit=15.0 2023-06-15 14:17:08,987 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=162193.33333333334, ans=0.125 2023-06-15 14:17:10,329 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.494e+02 1.820e+02 1.979e+02 2.259e+02 3.140e+02, threshold=3.959e+02, percent-clipped=0.0 2023-06-15 14:17:13,047 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=162193.33333333334, ans=0.0 2023-06-15 14:17:20,104 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=162193.33333333334, ans=0.125 2023-06-15 14:17:34,654 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=15.0 2023-06-15 14:17:41,241 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=162260.0, ans=0.125 2023-06-15 14:17:49,403 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.27 vs. limit=15.0 2023-06-15 14:18:04,277 INFO [train.py:988] (0/4) Epoch 46, batch 400, loss[loss=0.199, simple_loss=0.2677, pruned_loss=0.06511, over 20135.00 frames. ], tot_loss[loss=0.1992, simple_loss=0.282, pruned_loss=0.05818, over 3270552.03 frames. 
], batch size: 239, lr: 7.21e-03, grad_scale: 32.0 2023-06-15 14:18:06,434 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=162393.33333333334, ans=0.2 2023-06-15 14:18:35,053 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=162460.0, ans=15.0 2023-06-15 14:18:42,567 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=162526.66666666666, ans=0.0 2023-06-15 14:18:46,598 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=162526.66666666666, ans=0.0 2023-06-15 14:19:00,513 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=162593.33333333334, ans=0.0 2023-06-15 14:19:33,237 INFO [train.py:988] (0/4) Epoch 46, batch 450, loss[loss=0.2252, simple_loss=0.3167, pruned_loss=0.06685, over 16025.00 frames. ], tot_loss[loss=0.199, simple_loss=0.2814, pruned_loss=0.05825, over 3399301.64 frames. ], batch size: 51, lr: 7.20e-03, grad_scale: 32.0 2023-06-15 14:19:38,731 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=162726.66666666666, ans=0.125 2023-06-15 14:20:07,069 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.498e+02 1.857e+02 2.090e+02 2.387e+02 3.299e+02, threshold=4.180e+02, percent-clipped=0.0 2023-06-15 14:20:34,536 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=162926.66666666666, ans=0.1 2023-06-15 14:20:57,265 INFO [train.py:988] (0/4) Epoch 46, batch 500, loss[loss=0.1979, simple_loss=0.2756, pruned_loss=0.0601, over 20594.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.282, pruned_loss=0.0587, over 3480121.14 frames. ], batch size: 173, lr: 7.20e-03, grad_scale: 32.0 2023-06-15 14:21:41,327 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 14:21:49,966 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-46.pt 2023-06-15 14:22:12,864 INFO [train.py:988] (0/4) Epoch 47, batch 0, loss[loss=0.219, simple_loss=0.3042, pruned_loss=0.06687, over 16317.00 frames. ], tot_loss[loss=0.219, simple_loss=0.3042, pruned_loss=0.06687, over 16317.00 frames. ], batch size: 52, lr: 7.11e-03, grad_scale: 32.0 2023-06-15 14:22:12,865 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 14:22:19,326 INFO [train.py:1020] (0/4) Epoch 47, validation: loss=0.2046, simple_loss=0.3006, pruned_loss=0.05427, over 143649.00 frames. 
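A brief aside on the checkpoint files the run keeps reporting (zipformer/exp/v5/epoch-45.pt, .../epoch-46.pt, .../checkpoint-24000.pt above): a minimal sketch of how such a file could be inspected offline with plain PyTorch. This assumes the usual icefall layout in which the .pt file is a pickled dict holding a "model" state_dict alongside optimizer/sampler state; the exact key names are an assumption, not something confirmed by this log.

    # Sketch only: inspect a checkpoint written by this run.
    # Assumes the file is a dict; the "model" key is an assumed icefall convention.
    import torch

    ckpt_path = "zipformer/exp/v5/epoch-46.pt"  # path taken from the log above
    ckpt = torch.load(ckpt_path, map_location="cpu")

    # See what the training script actually stored.
    print(sorted(ckpt.keys()))

    # If a model state_dict is present, count its parameters as a quick sanity check.
    if "model" in ckpt:
        n_params = sum(t.numel() for t in ckpt["model"].values() if hasattr(t, "numel"))
        print(f"parameters in checkpoint: {n_params}")

Loading on CPU with map_location avoids needing the training GPUs just to look at the saved state.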
2023-06-15 14:22:19,327 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 14:22:24,427 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=163280.0, ans=0.0 2023-06-15 14:22:24,521 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=163280.0, ans=0.0 2023-06-15 14:22:52,938 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=163413.33333333334, ans=0.1 2023-06-15 14:23:23,394 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.400e+02 1.765e+02 1.978e+02 2.218e+02 3.606e+02, threshold=3.956e+02, percent-clipped=0.0 2023-06-15 14:23:31,044 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=163546.66666666666, ans=0.125 2023-06-15 14:23:46,074 INFO [train.py:988] (0/4) Epoch 47, batch 50, loss[loss=0.2104, simple_loss=0.3003, pruned_loss=0.06022, over 16963.00 frames. ], tot_loss[loss=0.1986, simple_loss=0.2829, pruned_loss=0.0571, over 844398.18 frames. ], batch size: 60, lr: 7.11e-03, grad_scale: 32.0 2023-06-15 14:24:16,251 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=163680.0, ans=0.0 2023-06-15 14:24:21,795 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=163746.66666666666, ans=0.1 2023-06-15 14:24:35,081 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-06-15 14:24:36,789 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=163813.33333333334, ans=0.1 2023-06-15 14:24:38,159 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.57 vs. limit=15.0 2023-06-15 14:25:01,523 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=163880.0, ans=0.125 2023-06-15 14:25:13,275 INFO [train.py:988] (0/4) Epoch 47, batch 100, loss[loss=0.1904, simple_loss=0.2776, pruned_loss=0.05163, over 18630.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.282, pruned_loss=0.0587, over 1507482.66 frames. ], batch size: 80, lr: 7.10e-03, grad_scale: 32.0 2023-06-15 14:25:14,591 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.95 vs. 
limit=15.0 2023-06-15 14:25:55,587 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=164080.0, ans=0.0 2023-06-15 14:26:00,871 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=164080.0, ans=0.2 2023-06-15 14:26:02,481 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=164080.0, ans=0.125 2023-06-15 14:26:02,602 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=164080.0, ans=0.1 2023-06-15 14:26:17,524 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.364e+02 1.736e+02 1.928e+02 2.284e+02 3.416e+02, threshold=3.856e+02, percent-clipped=0.0 2023-06-15 14:26:18,081 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=164146.66666666666, ans=0.0 2023-06-15 14:26:19,575 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=164146.66666666666, ans=0.2 2023-06-15 14:26:24,603 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=164213.33333333334, ans=0.125 2023-06-15 14:26:27,857 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=164213.33333333334, ans=0.125 2023-06-15 14:26:37,628 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=164213.33333333334, ans=0.0 2023-06-15 14:26:39,452 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=164280.0, ans=0.125 2023-06-15 14:26:41,349 INFO [train.py:988] (0/4) Epoch 47, batch 150, loss[loss=0.1895, simple_loss=0.2768, pruned_loss=0.05104, over 19112.00 frames. ], tot_loss[loss=0.1989, simple_loss=0.283, pruned_loss=0.05734, over 2007063.72 frames. ], batch size: 94, lr: 7.10e-03, grad_scale: 32.0 2023-06-15 14:26:41,681 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=164280.0, ans=0.125 2023-06-15 14:26:55,022 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=164280.0, ans=0.1 2023-06-15 14:27:00,654 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-06-15 14:27:20,046 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=164413.33333333334, ans=0.1 2023-06-15 14:27:26,562 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.22 vs. limit=15.0 2023-06-15 14:27:31,230 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=164413.33333333334, ans=0.0 2023-06-15 14:27:34,370 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=164480.0, ans=0.0 2023-06-15 14:27:49,255 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=4.94 vs. 
limit=15.0 2023-06-15 14:28:08,848 INFO [train.py:988] (0/4) Epoch 47, batch 200, loss[loss=0.1827, simple_loss=0.2689, pruned_loss=0.04829, over 19562.00 frames. ], tot_loss[loss=0.199, simple_loss=0.2835, pruned_loss=0.05726, over 2404189.83 frames. ], batch size: 102, lr: 7.09e-03, grad_scale: 32.0 2023-06-15 14:28:38,958 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=164680.0, ans=0.1 2023-06-15 14:28:45,538 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=164746.66666666666, ans=0.125 2023-06-15 14:29:14,105 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.455e+02 1.808e+02 2.037e+02 2.330e+02 4.045e+02, threshold=4.073e+02, percent-clipped=1.0 2023-06-15 14:29:29,562 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=164880.0, ans=0.1 2023-06-15 14:29:36,286 INFO [train.py:988] (0/4) Epoch 47, batch 250, loss[loss=0.2154, simple_loss=0.2879, pruned_loss=0.07142, over 20261.00 frames. ], tot_loss[loss=0.2001, simple_loss=0.2845, pruned_loss=0.05783, over 2690032.68 frames. ], batch size: 141, lr: 7.08e-03, grad_scale: 32.0 2023-06-15 14:30:04,090 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.79 vs. limit=22.5 2023-06-15 14:30:10,496 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=22.5 2023-06-15 14:30:14,237 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=12.0 2023-06-15 14:30:24,975 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2023-06-15 14:30:25,885 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=165080.0, ans=0.1 2023-06-15 14:30:42,148 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=165146.66666666666, ans=0.0 2023-06-15 14:30:52,757 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=165213.33333333334, ans=0.0 2023-06-15 14:31:04,910 INFO [train.py:988] (0/4) Epoch 47, batch 300, loss[loss=0.2046, simple_loss=0.292, pruned_loss=0.0586, over 18789.00 frames. ], tot_loss[loss=0.2005, simple_loss=0.285, pruned_loss=0.05803, over 2922550.78 frames. 
], batch size: 83, lr: 7.08e-03, grad_scale: 32.0 2023-06-15 14:31:06,908 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=165280.0, ans=0.0 2023-06-15 14:31:16,017 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=165280.0, ans=0.125 2023-06-15 14:31:18,288 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=165280.0, ans=0.125 2023-06-15 14:31:34,655 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=165346.66666666666, ans=0.2 2023-06-15 14:31:52,149 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=165413.33333333334, ans=0.0 2023-06-15 14:32:08,597 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=165480.0, ans=0.125 2023-06-15 14:32:11,339 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.401e+02 1.863e+02 2.079e+02 2.322e+02 4.081e+02, threshold=4.157e+02, percent-clipped=1.0 2023-06-15 14:32:16,798 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=165546.66666666666, ans=0.125 2023-06-15 14:32:33,707 INFO [train.py:988] (0/4) Epoch 47, batch 350, loss[loss=0.205, simple_loss=0.2864, pruned_loss=0.0618, over 18626.00 frames. ], tot_loss[loss=0.2, simple_loss=0.2844, pruned_loss=0.05783, over 3110819.23 frames. ], batch size: 80, lr: 7.07e-03, grad_scale: 32.0 2023-06-15 14:33:48,703 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2023-06-15 14:33:52,792 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=165880.0, ans=0.125 2023-06-15 14:33:53,019 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=165880.0, ans=0.04949747468305833 2023-06-15 14:34:03,133 INFO [train.py:988] (0/4) Epoch 47, batch 400, loss[loss=0.2162, simple_loss=0.3122, pruned_loss=0.06004, over 15440.00 frames. ], tot_loss[loss=0.2003, simple_loss=0.2842, pruned_loss=0.05814, over 3260449.20 frames. ], batch size: 44, lr: 7.06e-03, grad_scale: 32.0 2023-06-15 14:34:24,085 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.73 vs. 
limit=22.5 2023-06-15 14:34:28,592 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=166013.33333333334, ans=0.125 2023-06-15 14:34:43,294 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=166080.0, ans=0.2 2023-06-15 14:34:57,390 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=166146.66666666666, ans=0.125 2023-06-15 14:35:05,927 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=166146.66666666666, ans=0.0 2023-06-15 14:35:07,491 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=166146.66666666666, ans=0.0 2023-06-15 14:35:08,765 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.617e+02 1.781e+02 2.023e+02 2.326e+02 3.205e+02, threshold=4.047e+02, percent-clipped=0.0 2023-06-15 14:35:32,248 INFO [train.py:988] (0/4) Epoch 47, batch 450, loss[loss=0.2015, simple_loss=0.2755, pruned_loss=0.06376, over 19955.00 frames. ], tot_loss[loss=0.1999, simple_loss=0.283, pruned_loss=0.05839, over 3390803.12 frames. ], batch size: 126, lr: 7.06e-03, grad_scale: 32.0 2023-06-15 14:35:34,253 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=166280.0, ans=0.0 2023-06-15 14:35:50,161 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=166346.66666666666, ans=0.0 2023-06-15 14:36:01,961 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.45 vs. limit=22.5 2023-06-15 14:36:06,262 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=166413.33333333334, ans=0.2 2023-06-15 14:36:14,549 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=166413.33333333334, ans=0.125 2023-06-15 14:36:31,938 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=166480.0, ans=0.125 2023-06-15 14:36:40,021 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=166546.66666666666, ans=0.1 2023-06-15 14:36:42,330 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.55 vs. limit=15.0 2023-06-15 14:36:43,180 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=166546.66666666666, ans=0.1 2023-06-15 14:36:46,444 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=166546.66666666666, ans=0.2 2023-06-15 14:36:50,151 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.60 vs. limit=15.0 2023-06-15 14:36:58,234 INFO [train.py:988] (0/4) Epoch 47, batch 500, loss[loss=0.2142, simple_loss=0.2965, pruned_loss=0.06593, over 18921.00 frames. ], tot_loss[loss=0.1994, simple_loss=0.2824, pruned_loss=0.05819, over 3477033.73 frames. 
], batch size: 86, lr: 7.05e-03, grad_scale: 32.0 2023-06-15 14:37:10,426 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=166613.33333333334, ans=0.125 2023-06-15 14:37:39,175 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=166746.66666666666, ans=0.125 2023-06-15 14:37:45,662 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=166813.33333333334, ans=0.0 2023-06-15 14:37:50,082 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-47.pt 2023-06-15 14:38:11,467 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=166826.66666666666, ans=0.125 2023-06-15 14:38:12,616 INFO [train.py:988] (0/4) Epoch 48, batch 0, loss[loss=0.1895, simple_loss=0.2752, pruned_loss=0.05193, over 19862.00 frames. ], tot_loss[loss=0.1895, simple_loss=0.2752, pruned_loss=0.05193, over 19862.00 frames. ], batch size: 120, lr: 6.97e-03, grad_scale: 32.0 2023-06-15 14:38:12,617 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 14:38:18,656 INFO [train.py:1020] (0/4) Epoch 48, validation: loss=0.1998, simple_loss=0.298, pruned_loss=0.05082, over 143649.00 frames. 2023-06-15 14:38:18,657 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 14:38:26,900 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.394e+02 1.746e+02 1.946e+02 2.285e+02 3.541e+02, threshold=3.892e+02, percent-clipped=0.0 2023-06-15 14:39:44,293 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=167093.33333333334, ans=0.5 2023-06-15 14:39:47,297 INFO [train.py:988] (0/4) Epoch 48, batch 50, loss[loss=0.182, simple_loss=0.273, pruned_loss=0.04553, over 19894.00 frames. ], tot_loss[loss=0.1973, simple_loss=0.2797, pruned_loss=0.05738, over 840023.55 frames. ], batch size: 120, lr: 6.96e-03, grad_scale: 32.0 2023-06-15 14:40:01,941 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-06-15 14:40:24,933 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167293.33333333334, ans=0.1 2023-06-15 14:41:00,149 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.07 vs. limit=10.0 2023-06-15 14:41:01,742 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=167426.66666666666, ans=0.125 2023-06-15 14:41:08,102 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2023-06-15 14:41:15,603 INFO [train.py:988] (0/4) Epoch 48, batch 100, loss[loss=0.1901, simple_loss=0.2796, pruned_loss=0.05031, over 19834.00 frames. ], tot_loss[loss=0.1988, simple_loss=0.2814, pruned_loss=0.05811, over 1493238.89 frames. 
], batch size: 115, lr: 6.96e-03, grad_scale: 32.0 2023-06-15 14:41:25,085 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.447e+02 1.839e+02 2.012e+02 2.249e+02 3.194e+02, threshold=4.023e+02, percent-clipped=0.0 2023-06-15 14:41:37,899 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=167560.0, ans=0.125 2023-06-15 14:41:46,338 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=167560.0, ans=0.0 2023-06-15 14:42:04,346 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=167626.66666666666, ans=0.1 2023-06-15 14:42:23,679 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=167693.33333333334, ans=0.1 2023-06-15 14:42:39,214 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=167760.0, ans=0.0 2023-06-15 14:42:43,938 INFO [train.py:988] (0/4) Epoch 48, batch 150, loss[loss=0.1863, simple_loss=0.2776, pruned_loss=0.04747, over 19677.00 frames. ], tot_loss[loss=0.1975, simple_loss=0.2811, pruned_loss=0.05694, over 2019348.15 frames. ], batch size: 110, lr: 6.95e-03, grad_scale: 32.0 2023-06-15 14:43:00,558 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=167893.33333333334, ans=0.2 2023-06-15 14:43:16,698 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167893.33333333334, ans=0.1 2023-06-15 14:43:32,659 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=167960.0, ans=0.1 2023-06-15 14:43:40,000 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=168026.66666666666, ans=0.0 2023-06-15 14:43:47,339 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=168026.66666666666, ans=0.1 2023-06-15 14:44:05,378 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.29 vs. limit=15.0 2023-06-15 14:44:11,708 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=168160.0, ans=0.125 2023-06-15 14:44:12,978 INFO [train.py:988] (0/4) Epoch 48, batch 200, loss[loss=0.1968, simple_loss=0.2733, pruned_loss=0.06015, over 19967.00 frames. ], tot_loss[loss=0.1979, simple_loss=0.2824, pruned_loss=0.05666, over 2418657.60 frames. ], batch size: 126, lr: 6.95e-03, grad_scale: 32.0 2023-06-15 14:44:20,083 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=168160.0, ans=0.0 2023-06-15 14:44:21,894 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.504e+02 1.752e+02 1.958e+02 2.186e+02 2.989e+02, threshold=3.915e+02, percent-clipped=0.0 2023-06-15 14:44:31,530 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.73 vs. 
limit=15.0 2023-06-15 14:45:00,857 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=168293.33333333334, ans=0.0 2023-06-15 14:45:41,716 INFO [train.py:988] (0/4) Epoch 48, batch 250, loss[loss=0.1952, simple_loss=0.2814, pruned_loss=0.05448, over 18426.00 frames. ], tot_loss[loss=0.1976, simple_loss=0.2819, pruned_loss=0.05662, over 2732082.75 frames. ], batch size: 77, lr: 6.94e-03, grad_scale: 32.0 2023-06-15 14:45:56,035 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=168493.33333333334, ans=0.125 2023-06-15 14:45:56,291 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0 2023-06-15 14:46:12,644 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=168560.0, ans=0.125 2023-06-15 14:47:02,107 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=168760.0, ans=0.0 2023-06-15 14:47:10,236 INFO [train.py:988] (0/4) Epoch 48, batch 300, loss[loss=0.1837, simple_loss=0.27, pruned_loss=0.04867, over 19083.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.2819, pruned_loss=0.05733, over 2959756.21 frames. ], batch size: 89, lr: 6.93e-03, grad_scale: 32.0 2023-06-15 14:47:18,059 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=168826.66666666666, ans=0.0 2023-06-15 14:47:19,383 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.542e+02 1.766e+02 2.086e+02 2.478e+02 4.078e+02, threshold=4.173e+02, percent-clipped=1.0 2023-06-15 14:47:20,243 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.73 vs. limit=22.5 2023-06-15 14:47:32,091 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=168893.33333333334, ans=0.125 2023-06-15 14:48:16,149 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=169026.66666666666, ans=0.2 2023-06-15 14:48:38,571 INFO [train.py:988] (0/4) Epoch 48, batch 350, loss[loss=0.205, simple_loss=0.2881, pruned_loss=0.06091, over 18607.00 frames. ], tot_loss[loss=0.1982, simple_loss=0.2818, pruned_loss=0.05733, over 3147045.71 frames. ], batch size: 80, lr: 6.93e-03, grad_scale: 32.0 2023-06-15 14:48:45,551 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=169160.0, ans=0.0 2023-06-15 14:50:05,146 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.15 vs. limit=15.0 2023-06-15 14:50:05,735 INFO [train.py:988] (0/4) Epoch 48, batch 400, loss[loss=0.1898, simple_loss=0.269, pruned_loss=0.05534, over 20512.00 frames. ], tot_loss[loss=0.1981, simple_loss=0.2815, pruned_loss=0.05737, over 3292691.97 frames. 
], batch size: 189, lr: 6.92e-03, grad_scale: 32.0 2023-06-15 14:50:06,291 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=169493.33333333334, ans=0.04949747468305833 2023-06-15 14:50:13,966 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.461e+02 1.792e+02 1.982e+02 2.238e+02 3.664e+02, threshold=3.964e+02, percent-clipped=0.0 2023-06-15 14:50:59,889 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=169693.33333333334, ans=0.2 2023-06-15 14:50:59,908 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=169693.33333333334, ans=0.125 2023-06-15 14:51:07,168 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=169693.33333333334, ans=0.0 2023-06-15 14:51:32,752 INFO [train.py:988] (0/4) Epoch 48, batch 450, loss[loss=0.1874, simple_loss=0.2779, pruned_loss=0.04847, over 19484.00 frames. ], tot_loss[loss=0.1976, simple_loss=0.2812, pruned_loss=0.05695, over 3400566.60 frames. ], batch size: 105, lr: 6.91e-03, grad_scale: 32.0 2023-06-15 14:51:45,401 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=169826.66666666666, ans=0.2 2023-06-15 14:51:56,610 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.52 vs. limit=15.0 2023-06-15 14:51:59,649 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=169893.33333333334, ans=0.2 2023-06-15 14:52:31,850 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=170026.66666666666, ans=0.2 2023-06-15 14:52:32,385 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.07 vs. limit=15.0 2023-06-15 14:52:48,147 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=170093.33333333334, ans=0.5 2023-06-15 14:52:57,827 INFO [train.py:988] (0/4) Epoch 48, batch 500, loss[loss=0.1931, simple_loss=0.2841, pruned_loss=0.05103, over 19129.00 frames. ], tot_loss[loss=0.197, simple_loss=0.2804, pruned_loss=0.05681, over 3482163.02 frames. ], batch size: 94, lr: 6.91e-03, grad_scale: 32.0 2023-06-15 14:53:06,274 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.529e+02 1.838e+02 2.045e+02 2.451e+02 3.381e+02, threshold=4.090e+02, percent-clipped=0.0 2023-06-15 14:53:28,937 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.99 vs. limit=15.0 2023-06-15 14:53:34,292 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=170293.33333333334, ans=0.0 2023-06-15 14:53:40,759 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 14:53:49,334 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-48.pt 2023-06-15 14:54:12,509 INFO [train.py:988] (0/4) Epoch 49, batch 0, loss[loss=0.1898, simple_loss=0.2711, pruned_loss=0.05429, over 19110.00 frames. 
], tot_loss[loss=0.1898, simple_loss=0.2711, pruned_loss=0.05429, over 19110.00 frames. ], batch size: 94, lr: 6.83e-03, grad_scale: 32.0 2023-06-15 14:54:12,510 INFO [train.py:1011] (0/4) Computing validation loss 2023-06-15 14:54:19,031 INFO [train.py:1020] (0/4) Epoch 49, validation: loss=0.2025, simple_loss=0.2999, pruned_loss=0.05253, over 143649.00 frames. 2023-06-15 14:54:19,032 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB 2023-06-15 14:54:35,653 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170440.0, ans=0.1 2023-06-15 14:54:40,533 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=170440.0, ans=0.125 2023-06-15 14:55:01,582 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=170506.66666666666, ans=0.1 2023-06-15 14:55:04,739 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=170506.66666666666, ans=0.125 2023-06-15 14:55:12,541 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.70 vs. limit=22.5 2023-06-15 14:55:13,656 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=170573.33333333334, ans=0.125 2023-06-15 14:55:47,862 INFO [train.py:988] (0/4) Epoch 49, batch 50, loss[loss=0.2009, simple_loss=0.2802, pruned_loss=0.06077, over 20614.00 frames. ], tot_loss[loss=0.1992, simple_loss=0.2832, pruned_loss=0.05758, over 855760.48 frames. ], batch size: 173, lr: 6.83e-03, grad_scale: 32.0 2023-06-15 14:56:17,363 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=170773.33333333334, ans=0.0 2023-06-15 14:56:19,002 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=170773.33333333334, ans=0.125 2023-06-15 14:56:28,381 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.83 vs. limit=10.0 2023-06-15 14:56:31,040 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.526e+02 1.705e+02 1.886e+02 2.163e+02 3.210e+02, threshold=3.772e+02, percent-clipped=0.0 2023-06-15 14:57:13,182 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=170973.33333333334, ans=0.0 2023-06-15 14:57:16,253 INFO [train.py:988] (0/4) Epoch 49, batch 100, loss[loss=0.1793, simple_loss=0.2678, pruned_loss=0.04545, over 19510.00 frames. ], tot_loss[loss=0.1985, simple_loss=0.2817, pruned_loss=0.0577, over 1516806.89 frames. ], batch size: 102, lr: 6.82e-03, grad_scale: 32.0 2023-06-15 14:57:18,334 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-06-15 14:57:33,053 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=171106.66666666666, ans=0.2 2023-06-15 14:57:55,049 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.28 vs. 
limit=15.0 2023-06-15 14:57:57,564 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=171173.33333333334, ans=0.1 2023-06-15 14:58:23,400 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.16 vs. limit=15.0 2023-06-15 14:58:39,741 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=171306.66666666666, ans=0.125 2023-06-15 14:58:44,425 INFO [train.py:988] (0/4) Epoch 49, batch 150, loss[loss=0.192, simple_loss=0.2711, pruned_loss=0.05649, over 19943.00 frames. ], tot_loss[loss=0.1994, simple_loss=0.2831, pruned_loss=0.05785, over 2028517.81 frames. ], batch size: 126, lr: 6.81e-03, grad_scale: 32.0 2023-06-15 14:59:05,922 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=12.0 2023-06-15 14:59:10,901 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=22.5 2023-06-15 14:59:27,436 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.416e+02 1.782e+02 1.923e+02 2.190e+02 3.146e+02, threshold=3.845e+02, percent-clipped=0.0 2023-06-15 15:00:12,223 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=171706.66666666666, ans=0.1 2023-06-15 15:00:12,515 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=171706.66666666666, ans=0.0 2023-06-15 15:00:13,676 INFO [train.py:988] (0/4) Epoch 49, batch 200, loss[loss=0.1726, simple_loss=0.2643, pruned_loss=0.04042, over 19456.00 frames. ], tot_loss[loss=0.1984, simple_loss=0.2823, pruned_loss=0.05727, over 2418603.70 frames. ], batch size: 105, lr: 6.81e-03, grad_scale: 32.0 2023-06-15 15:00:17,434 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=171706.66666666666, ans=0.125 2023-06-15 15:00:54,336 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=171840.0, ans=0.04949747468305833 2023-06-15 15:01:11,525 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=171906.66666666666, ans=15.0 2023-06-15 15:01:41,439 INFO [train.py:988] (0/4) Epoch 49, batch 250, loss[loss=0.2051, simple_loss=0.2837, pruned_loss=0.06321, over 20658.00 frames. ], tot_loss[loss=0.1986, simple_loss=0.2825, pruned_loss=0.05734, over 2719574.49 frames. 
], batch size: 211, lr: 6.80e-03, grad_scale: 32.0 2023-06-15 15:01:56,336 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=172040.0, ans=0.125 2023-06-15 15:02:08,071 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=172106.66666666666, ans=0.125 2023-06-15 15:02:16,307 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-06-15 15:02:22,898 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.449e+02 1.787e+02 2.024e+02 2.612e+02 4.231e+02, threshold=4.048e+02, percent-clipped=3.0 2023-06-15 15:02:27,275 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=172173.33333333334, ans=0.125 2023-06-15 15:03:09,732 INFO [train.py:988] (0/4) Epoch 49, batch 300, loss[loss=0.1732, simple_loss=0.2619, pruned_loss=0.04226, over 19129.00 frames. ], tot_loss[loss=0.1975, simple_loss=0.2822, pruned_loss=0.05642, over 2950531.12 frames. ], batch size: 94, lr: 6.80e-03, grad_scale: 32.0 2023-06-15 15:03:32,649 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=172440.0, ans=0.125 2023-06-15 15:03:38,646 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.61 vs. limit=10.0 2023-06-15 15:03:59,363 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.00 vs. limit=15.0 2023-06-15 15:04:21,993 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=172640.0, ans=0.1 2023-06-15 15:04:31,563 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2023-06-15 15:04:33,967 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=172640.0, ans=0.0 2023-06-15 15:04:38,776 INFO [train.py:988] (0/4) Epoch 49, batch 350, loss[loss=0.1872, simple_loss=0.2807, pruned_loss=0.0468, over 18454.00 frames. ], tot_loss[loss=0.1973, simple_loss=0.2822, pruned_loss=0.05622, over 3126794.82 frames. ], batch size: 77, lr: 6.79e-03, grad_scale: 32.0 2023-06-15 15:04:44,708 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=172706.66666666666, ans=0.1 2023-06-15 15:04:51,798 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=172706.66666666666, ans=0.0 2023-06-15 15:04:56,238 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=172773.33333333334, ans=0.05 2023-06-15 15:05:22,049 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.466e+02 1.759e+02 1.916e+02 2.182e+02 3.623e+02, threshold=3.831e+02, percent-clipped=0.0 2023-06-15 15:06:07,872 INFO [train.py:988] (0/4) Epoch 49, batch 400, loss[loss=0.2232, simple_loss=0.3213, pruned_loss=0.06259, over 18328.00 frames. ], tot_loss[loss=0.1968, simple_loss=0.2815, pruned_loss=0.05607, over 3259022.66 frames. 
2023-06-15 15:06:20,820 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.81 vs. limit=6.0
2023-06-15 15:06:50,198 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.26 vs. limit=15.0
2023-06-15 15:07:03,665 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=173240.0, ans=0.125
2023-06-15 15:07:25,912 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=173306.66666666666, ans=0.125
2023-06-15 15:07:37,453 INFO [train.py:988] (0/4) Epoch 49, batch 450, loss[loss=0.1964, simple_loss=0.2709, pruned_loss=0.06099, over 20231.00 frames. ], tot_loss[loss=0.1976, simple_loss=0.2814, pruned_loss=0.05687, over 3382897.22 frames. ], batch size: 239, lr: 6.78e-03, grad_scale: 32.0
2023-06-15 15:07:55,511 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=173440.0, ans=0.125
2023-06-15 15:08:03,956 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=173440.0, ans=0.0
2023-06-15 15:08:08,714 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=173440.0, ans=0.2
2023-06-15 15:08:13,799 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=173506.66666666666, ans=0.125
2023-06-15 15:08:20,124 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.431e+02 1.766e+02 1.952e+02 2.304e+02 4.249e+02, threshold=3.904e+02, percent-clipped=1.0
2023-06-15 15:08:37,312 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=173573.33333333334, ans=0.125
2023-06-15 15:08:58,058 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=173640.0, ans=22.5
2023-06-15 15:09:03,987 INFO [train.py:988] (0/4) Epoch 49, batch 500, loss[loss=0.1933, simple_loss=0.2761, pruned_loss=0.05522, over 19239.00 frames. ], tot_loss[loss=0.1978, simple_loss=0.2813, pruned_loss=0.05719, over 3482773.33 frames. ], batch size: 92, lr: 6.77e-03, grad_scale: 32.0
2023-06-15 15:09:28,902 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=173773.33333333334, ans=0.2
2023-06-15 15:09:31,046 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=173773.33333333334, ans=0.0
2023-06-15 15:09:36,086 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=173840.0, ans=0.125
2023-06-15 15:09:58,708 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-49.pt
2023-06-15 15:10:17,336 INFO [train.py:988] (0/4) Epoch 50, batch 0, loss[loss=0.2004, simple_loss=0.2748, pruned_loss=0.06302, over 20317.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.2748, pruned_loss=0.06302, over 20317.00 frames. ], batch size: 239, lr: 6.70e-03, grad_scale: 32.0
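Note on the checkpoint.py line: an epoch checkpoint such as zipformer/exp/v5/epoch-49.pt is written once per epoch. How to reload it later depends on the keys icefall stores; a minimal sketch assuming the common layout with a "model" state-dict key (verify against icefall's checkpoint.py before relying on it):

    import torch

    def load_epoch_checkpoint(path, model, device="cpu"):
        """Restore model weights from an epoch checkpoint (assumed 'model' key)."""
        ckpt = torch.load(path, map_location=device)
        state_dict = ckpt.get("model", ckpt)  # fall back to a bare state_dict
        missing, unexpected = model.load_state_dict(state_dict, strict=False)
        return missing, unexpected

    # usage (model construction not shown):
    # load_epoch_checkpoint("zipformer/exp/v5/epoch-49.pt", model)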
2023-06-15 15:10:17,337 INFO [train.py:1011] (0/4) Computing validation loss
2023-06-15 15:10:23,504 INFO [train.py:1020] (0/4) Epoch 50, validation: loss=0.202, simple_loss=0.299, pruned_loss=0.05252, over 143649.00 frames.
2023-06-15 15:10:23,504 INFO [train.py:1021] (0/4) Maximum memory allocated so far is 13775MB
2023-06-15 15:10:45,930 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=12.0
2023-06-15 15:10:47,352 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-06-15 15:11:05,839 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=174060.0, ans=0.07
2023-06-15 15:11:21,364 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=2.75 vs. limit=12.0
2023-06-15 15:11:31,070 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=174193.33333333334, ans=0.1
2023-06-15 15:11:33,961 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.423e+02 1.754e+02 1.986e+02 2.353e+02 3.317e+02, threshold=3.972e+02, percent-clipped=0.0
2023-06-15 15:11:50,943 INFO [train.py:988] (0/4) Epoch 50, batch 50, loss[loss=0.2091, simple_loss=0.3077, pruned_loss=0.05526, over 17060.00 frames. ], tot_loss[loss=0.1949, simple_loss=0.2783, pruned_loss=0.05571, over 850642.16 frames. ], batch size: 60, lr: 6.69e-03, grad_scale: 32.0
2023-06-15 15:12:43,407 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=174460.0, ans=0.2
2023-06-15 15:12:54,319 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.57 vs. limit=10.0
2023-06-15 15:13:17,581 INFO [train.py:988] (0/4) Epoch 50, batch 100, loss[loss=0.1991, simple_loss=0.2869, pruned_loss=0.05569, over 19451.00 frames. ], tot_loss[loss=0.1971, simple_loss=0.2806, pruned_loss=0.05682, over 1505886.49 frames. ], batch size: 105, lr: 6.69e-03, grad_scale: 32.0
2023-06-15 15:13:21,206 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=174593.33333333334, ans=0.1
2023-06-15 15:13:22,828 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=174593.33333333334, ans=0.0
2023-06-15 15:13:37,054 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=174660.0, ans=0.125
2023-06-15 15:13:37,289 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0
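Note on the "Computing validation loss" / "Epoch 50, validation: ..." pair: it comes from running the dev dataloader through the model without gradient updates and averaging per frame. A generic sketch of that step (compute_loss and the batch layout below are assumptions, not the actual helpers in train.py):

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, dev_loader, compute_loss):
        """Frame-weighted validation loss over the whole dev set."""
        model.eval()
        total, frames = 0.0, 0.0
        for batch in dev_loader:
            loss, num_frames = compute_loss(model, batch)  # assumed to return per-frame loss
            total += loss.item() * num_frames
            frames += num_frames
        model.train()
        return total / frames  # e.g. 0.202 over 143649 frames in the entry above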
2023-06-15 15:13:51,934 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=174726.66666666666, ans=0.125
2023-06-15 15:13:53,598 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=174726.66666666666, ans=0.125
2023-06-15 15:13:54,982 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=174726.66666666666, ans=0.0
2023-06-15 15:13:55,059 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=174726.66666666666, ans=0.0
2023-06-15 15:14:20,653 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0
2023-06-15 15:14:23,518 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=174793.33333333334, ans=0.0
2023-06-15 15:14:28,214 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.477e+02 1.801e+02 1.997e+02 2.286e+02 3.614e+02, threshold=3.994e+02, percent-clipped=0.0
2023-06-15 15:14:43,471 INFO [train.py:988] (0/4) Epoch 50, batch 150, loss[loss=0.1927, simple_loss=0.2751, pruned_loss=0.05516, over 20583.00 frames. ], tot_loss[loss=0.1971, simple_loss=0.2805, pruned_loss=0.05686, over 2012749.35 frames. ], batch size: 173, lr: 6.68e-03, grad_scale: 32.0
2023-06-15 15:15:02,492 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.33 vs. limit=12.0
2023-06-15 15:15:33,711 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=175060.0, ans=0.0
2023-06-15 15:16:09,937 INFO [train.py:988] (0/4) Epoch 50, batch 200, loss[loss=0.1902, simple_loss=0.268, pruned_loss=0.05618, over 20683.00 frames. ], tot_loss[loss=0.1959, simple_loss=0.2801, pruned_loss=0.05588, over 2421098.54 frames. ], batch size: 211, lr: 6.68e-03, grad_scale: 32.0
2023-06-15 15:16:27,202 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=175326.66666666666, ans=0.1
2023-06-15 15:16:49,770 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=175393.33333333334, ans=0.125
2023-06-15 15:16:51,314 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=175393.33333333334, ans=0.0
2023-06-15 15:17:22,725 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.67 vs. limit=12.0
2023-06-15 15:17:23,281 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.474e+02 1.770e+02 1.950e+02 2.283e+02 3.292e+02, threshold=3.901e+02, percent-clipped=0.0
2023-06-15 15:17:28,793 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=175526.66666666666, ans=0.0
2023-06-15 15:17:38,322 INFO [train.py:988] (0/4) Epoch 50, batch 250, loss[loss=0.1977, simple_loss=0.2856, pruned_loss=0.05486, over 19476.00 frames. ], tot_loss[loss=0.196, simple_loss=0.2798, pruned_loss=0.05606, over 2732752.71 frames. ], batch size: 105, lr: 6.67e-03, grad_scale: 32.0
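Note on the ScheduledFloat lines: each names a module hyperparameter (dropout probability, skip rate, balancer limit) whose current value "ans" is a function of batch_count. The actual ScheduledFloat class lives in icefall's scaling.py and is not reproduced here; the sketch below is only an illustrative piecewise-linear schedule keyed on batch count, with made-up breakpoints, to show the idea:

    def scheduled_value(batch_count: float, schedule):
        """Piecewise-linear interpolation over (batch_count, value) breakpoints.

        schedule: sorted list of (batch_count, value) pairs, e.g.
                  [(0.0, 0.3), (20000.0, 0.1)] for a dropout that decays from
                  0.3 to 0.1 over the first 20k batches (made-up numbers).
        """
        if batch_count <= schedule[0][0]:
            return schedule[0][1]
        if batch_count >= schedule[-1][0]:
            return schedule[-1][1]
        for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

    # e.g. scheduled_value(175000.0, [(0.0, 0.3), (20000.0, 0.1)]) -> 0.1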
2023-06-15 15:18:01,731 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=175660.0, ans=0.125
2023-06-15 15:18:07,324 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-06-15 15:18:22,616 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=175726.66666666666, ans=0.125
2023-06-15 15:18:40,899 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=175793.33333333334, ans=0.2
2023-06-15 15:18:42,517 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=175793.33333333334, ans=0.125
2023-06-15 15:18:54,979 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=175860.0, ans=0.0
2023-06-15 15:19:01,728 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=175860.0, ans=0.0
2023-06-15 15:19:06,266 INFO [train.py:988] (0/4) Epoch 50, batch 300, loss[loss=0.1879, simple_loss=0.2785, pruned_loss=0.0486, over 18313.00 frames. ], tot_loss[loss=0.1955, simple_loss=0.2798, pruned_loss=0.05559, over 2977497.71 frames. ], batch size: 74, lr: 6.66e-03, grad_scale: 16.0
2023-06-15 15:19:30,080 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=175993.33333333334, ans=0.0
2023-06-15 15:19:40,997 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0
2023-06-15 15:19:42,284 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=176060.0, ans=0.1
2023-06-15 15:19:42,339 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=176060.0, ans=0.125
2023-06-15 15:20:08,830 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=2.89 vs. limit=12.0
2023-06-15 15:20:21,161 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.505e+02 1.741e+02 2.011e+02 2.290e+02 4.002e+02, threshold=4.022e+02, percent-clipped=1.0
2023-06-15 15:20:24,870 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=176193.33333333334, ans=0.125
2023-06-15 15:20:34,808 INFO [train.py:988] (0/4) Epoch 50, batch 350, loss[loss=0.1993, simple_loss=0.2851, pruned_loss=0.05673, over 18631.00 frames. ], tot_loss[loss=0.1954, simple_loss=0.2799, pruned_loss=0.05544, over 3163975.81 frames. ], batch size: 80, lr: 6.66e-03, grad_scale: 16.0
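Note on grad_scale: it is the fp16 loss-scaling factor, and it drops from 32.0 to 16.0 around Epoch 50, batch 300, which is consistent with how torch.cuda.amp.GradScaler backs off after an overflowing step. A minimal, generic AMP step showing where such a scale comes from (standard PyTorch usage, not icefall's exact training loop; compute_loss and the init_scale value are assumptions):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)  # init value chosen for illustration

    def amp_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)  # assumed helper
        scaler.scale(loss).backward()          # scale the loss before backward
        scaler.step(optimizer)                 # unscales grads, skips the step on inf/nan
        scaler.update()                        # halves the scale after an overflow
        return scaler.get_scale()              # the number logged as grad_scale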
2023-06-15 15:20:35,148 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=176260.0, ans=0.125
2023-06-15 15:21:17,999 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=176393.33333333334, ans=0.0
2023-06-15 15:21:46,938 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=176526.66666666666, ans=0.0
2023-06-15 15:21:54,097 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=176526.66666666666, ans=0.0
2023-06-15 15:22:00,733 INFO [scaling.py:1052] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-06-15 15:22:03,791 INFO [train.py:988] (0/4) Epoch 50, batch 400, loss[loss=0.1834, simple_loss=0.2593, pruned_loss=0.05376, over 20728.00 frames. ], tot_loss[loss=0.1951, simple_loss=0.2799, pruned_loss=0.05515, over 3312030.68 frames. ], batch size: 211, lr: 6.65e-03, grad_scale: 32.0
2023-06-15 15:22:11,331 INFO [scaling.py:962] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.17 vs. limit=10.0
2023-06-15 15:22:22,144 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=176660.0, ans=0.125
2023-06-15 15:23:01,485 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=176793.33333333334, ans=0.0
2023-06-15 15:23:17,304 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.504e+02 1.749e+02 1.909e+02 2.121e+02 2.900e+02, threshold=3.818e+02, percent-clipped=0.0
2023-06-15 15:23:26,214 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=176860.0, ans=0.125
2023-06-15 15:23:32,324 INFO [train.py:988] (0/4) Epoch 50, batch 450, loss[loss=0.1931, simple_loss=0.2912, pruned_loss=0.04752, over 15477.00 frames. ], tot_loss[loss=0.1954, simple_loss=0.2795, pruned_loss=0.05561, over 3411085.97 frames. ], batch size: 44, lr: 6.65e-03, grad_scale: 32.0
2023-06-15 15:23:46,063 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=176926.66666666666, ans=0.125
2023-06-15 15:23:53,490 INFO [scaling.py:182] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=176993.33333333334, ans=10.0
2023-06-15 15:24:57,649 INFO [train.py:988] (0/4) Epoch 50, batch 500, loss[loss=0.1967, simple_loss=0.2851, pruned_loss=0.05416, over 18950.00 frames. ], tot_loss[loss=0.1957, simple_loss=0.2794, pruned_loss=0.05599, over 3487550.70 frames. ], batch size: 86, lr: 6.64e-03, grad_scale: 32.0
2023-06-15 15:25:51,655 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/v5/epoch-50.pt
2023-06-15 15:26:00,545 INFO [train.py:1201] (0/4) Done!
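Once training prints "Done!", the tot_loss values scattered through this log are the simplest way to check convergence. A small, self-contained sketch for pulling them out of a log file like this one (the file name is a placeholder; the regex matches only the train.py:988 lines shown above):

    import re

    PATTERN = re.compile(
        r"Epoch (\d+), batch (\d+), .*?tot_loss\[loss=([\d.]+), "
        r"simple_loss=([\d.]+), pruned_loss=([\d.]+)"
    )

    def parse_tot_loss(log_path="train.log"):  # placeholder path
        """Yield (epoch, batch, loss, simple_loss, pruned_loss) per logged batch."""
        with open(log_path) as f:
            for line in f:
                m = PATTERN.search(line)
                if m:
                    epoch, batch = int(m.group(1)), int(m.group(2))
                    loss, simple, pruned = map(float, m.group(3, 4, 5))
                    yield epoch, batch, loss, simple, pruned

    # e.g. the last training entry above yields (50, 500, 0.1957, 0.2794, 0.05599)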