2023-11-19 16:28:29,702 INFO [train_asr.py:1330] (2/4) Training started
2023-11-19 16:28:29,703 INFO [train_asr.py:1340] (2/4) Device: cuda:2
2023-11-19 16:28:29,706 INFO [train_asr.py:1352] (2/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'multi_KD', 'icefall-git-sha1': 'ae3d64ff-dirty', 'icefall-git-date': 'Sun Nov 19 00:54:09 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/anaconda3/envs/multi_KD/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-6-0423201309-7c68fd68fb-qfn6b', 'IP address': '10.177.58.19'}, 'world_size': 4, 'master_port': 13490, 'tensorboard': True, 'num_epochs': 40, 'start_epoch': 10, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'audio_tagging_loss_scale': 1.0, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'do_finetune': False, 'init_modules': None, 'freeze_modules': None, 'finetune_ckpt': None, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'do_audio_tagging': True, 'use_encoder_projection': False, 'encoder_projection_dim': -1, 'freeze_encoder': False, 'freezing_encoder_layer_index': '-1', 'freeze_encoder_steps': -1, 'encoder_lr_scale': 1.0, 'full_libri': True, 'mini_libri': False, 'use_vox2': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_audioset': True, 'audioset_subset': 'unbalanced', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'small.en', 'blank_id': 0, 'vocab_size': 500}
2023-11-19 16:28:29,706 INFO [train_asr.py:1361] (2/4) About to create model
2023-11-19 16:28:30,819 INFO [train_asr.py:1365] (2/4) Number of model parameters: 65819362
2023-11-19 16:28:30,819 INFO [checkpoint.py:112] (2/4) Loading checkpoint from multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-9.pt
2023-11-19 16:28:34,250 INFO [checkpoint.py:112] (2/4) Loading checkpoint from multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-9.pt
2023-11-19 16:28:36,547 INFO [train_asr.py:1396] (2/4) Setting the lr scale of parameters in encoder and encoder_embed to 1.0
2023-11-19 16:28:40,613 INFO [train_asr.py:1405] (2/4) Using DDP
2023-11-19 16:28:40,972 INFO [train_asr.py:1428] (2/4) Loading optimizer state dict
2023-11-19 16:28:41,755 INFO [train_asr.py:1436] (2/4) Loading scheduler state dict
2023-11-19 16:28:41,768 INFO [train_asr.py:1458] (2/4) Getting audioset cuts
2023-11-19 16:28:41,768 INFO [kd_datamodule.py:796] (2/4) About to get the audioset cuts.
2023-11-19 16:28:41,796 INFO [train_asr.py:1464] (2/4) Using mux to combine Librispeech with audioset
2023-11-19 16:28:41,796 INFO [train_asr.py:1474] (2/4) CutSet(len=2748469) [underlying data type: ]
2023-11-19 16:28:57,491 INFO [kd_datamodule.py:396] (2/4) Enable MUSAN
2023-11-19 16:28:57,491 INFO [kd_datamodule.py:397] (2/4) About to get Musan cuts
2023-11-19 16:29:00,987 INFO [kd_datamodule.py:427] (2/4) Enable SpecAugment
2023-11-19 16:29:00,987 INFO [kd_datamodule.py:428] (2/4) Time warp factor: 80
2023-11-19 16:29:00,987 INFO [kd_datamodule.py:438] (2/4) Num frame mask: 10
2023-11-19 16:29:00,988 INFO [kd_datamodule.py:451] (2/4) About to create train dataset
2023-11-19 16:29:00,994 INFO [kd_datamodule.py:487] (2/4) Using SimpleCutSampler
2023-11-19 16:29:00,995 INFO [kd_datamodule.py:495] (2/4) About to create train dataloader
2023-11-19 16:29:01,041 INFO [kd_datamodule.py:814] (2/4) About to get the audioset eval cuts.
2023-11-19 16:29:01,065 INFO [train_asr.py:1538] (2/4) CutSet(len=20681) [underlying data type: ]
2023-11-19 16:29:01,174 INFO [kd_datamodule.py:529] (2/4) About to create dev dataset
2023-11-19 16:29:02,021 INFO [kd_datamodule.py:550] (2/4) About to create dev dataloader
2023-11-19 16:29:02,022 INFO [train_asr.py:1552] (2/4) Loading grad scaler state dict
2023-11-19 16:29:40,875 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 0, loss[loss=0.08449, simple_loss=0.09213, pruned_loss=0.01464, audio_tagging_loss=0.02379, over 14816.00 frames. ], tot_loss[loss=0.08449, simple_loss=0.09213, pruned_loss=0.01464, audio_tagging_loss=0.02379, over 14816.00 frames. ], batch size: 55, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 16:29:40,876 INFO [train_asr.py:1285] (2/4) Computing validation loss
2023-11-19 16:30:18,314 INFO [train_asr.py:1294] (2/4) Epoch 10, validation: loss=0.06458, simple_loss=0.05578, pruned_loss=0.006608, audio_tagging_loss=0.03008, over 4681554.00 frames.
2023-11-19 16:30:18,314 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB
2023-11-19 16:30:22,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=721400.0, ans=0.2
2023-11-19 16:30:27,516 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.461e+01 9.125e+01 9.697e+01 1.516e+02, threshold=1.825e+02, percent-clipped=0.0
2023-11-19 16:30:44,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=721466.6666666666, ans=0.2
2023-11-19 16:30:46,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=721533.3333333334, ans=0.0
2023-11-19 16:30:55,419 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=15.0
2023-11-19 16:31:10,019 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 108250
2023-11-19 16:31:16,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=721666.6666666666, ans=0.1
2023-11-19 16:31:25,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=721733.3333333334, ans=0.0
2023-11-19 16:31:26,678 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 50, loss[loss=0.07747, simple_loss=0.083, pruned_loss=0.01524, audio_tagging_loss=0.02073, over 16131.00 frames. ], tot_loss[loss=0.09456, simple_loss=0.1035, pruned_loss=0.02245, audio_tagging_loss=0.02037, over 686404.04 frames. ], batch size: 62, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 16:31:39,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=721800.0, ans=0.125
2023-11-19 16:31:42,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=721800.0, ans=0.1
2023-11-19 16:31:54,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=721866.6666666666, ans=0.2
2023-11-19 16:32:12,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=721933.3333333334, ans=0.125
2023-11-19 16:32:16,821 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 108300
2023-11-19 16:32:31,699 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 100, loss[loss=0.1082, simple_loss=0.1336, pruned_loss=0.0268, audio_tagging_loss=0.01465, over 14543.00 frames. ], tot_loss[loss=0.09392, simple_loss=0.104, pruned_loss=0.02245, audio_tagging_loss=0.01949, over 1201697.42 frames. ], batch size: 55, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 16:32:39,945 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=15.0
2023-11-19 16:32:40,250 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.545e+01 8.782e+01 9.608e+01 1.042e+02 1.365e+02, threshold=1.922e+02, percent-clipped=0.0
2023-11-19 16:32:47,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722133.3333333334, ans=0.1
2023-11-19 16:32:50,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=722133.3333333334, ans=0.125
2023-11-19 16:33:06,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=722200.0, ans=0.2
2023-11-19 16:33:20,384 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 108350
2023-11-19 16:33:35,185 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 150, loss[loss=0.09038, simple_loss=0.1106, pruned_loss=0.02288, audio_tagging_loss=0.01222, over 14840.00 frames. ], tot_loss[loss=0.09245, simple_loss=0.1046, pruned_loss=0.02267, audio_tagging_loss=0.01748, over 1617348.41 frames. ], batch size: 53, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 16:33:35,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=722400.0, ans=0.0
2023-11-19 16:34:09,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=722533.3333333334, ans=0.125
2023-11-19 16:34:10,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=722533.3333333334, ans=0.0
2023-11-19 16:34:20,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=722600.0, ans=0.0
2023-11-19 16:34:22,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=722600.0, ans=0.125
2023-11-19 16:34:24,503 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 108400
2023-11-19 16:34:27,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=722666.6666666666, ans=0.2
2023-11-19 16:34:39,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=722733.3333333334, ans=0.07
2023-11-19 16:34:40,709 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 200, loss[loss=0.118, simple_loss=0.1434, pruned_loss=0.03706, audio_tagging_loss=0.0093, over 15714.00 frames. ], tot_loss[loss=0.09071, simple_loss=0.1051, pruned_loss=0.02282, audio_tagging_loss=0.01533, over 1931386.33 frames. ], batch size: 56, lr: 7.12e-03, grad_scale: 16.0
2023-11-19 16:34:45,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=722733.3333333334, ans=0.0
2023-11-19 16:34:49,906 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=22.5
2023-11-19 16:34:51,744 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.543e+01 8.378e+01 9.256e+01 1.031e+02 1.304e+02, threshold=1.851e+02, percent-clipped=0.0
2023-11-19 16:34:53,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=722800.0, ans=0.0
2023-11-19 16:34:56,580 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.28 vs. limit=8.0
2023-11-19 16:35:08,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=722866.6666666666, ans=0.0
2023-11-19 16:35:14,780 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.61 vs. limit=22.5
2023-11-19 16:35:15,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=722866.6666666666, ans=0.0
2023-11-19 16:35:17,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=722933.3333333334, ans=0.5
2023-11-19 16:35:20,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=722933.3333333334, ans=0.125
2023-11-19 16:35:25,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722933.3333333334, ans=0.1
2023-11-19 16:35:28,275 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.56 vs. limit=22.5
2023-11-19 16:35:29,470 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 108450
2023-11-19 16:35:35,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=723000.0, ans=0.0
2023-11-19 16:35:35,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=723000.0, ans=0.0
2023-11-19 16:35:43,395 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.58 vs. limit=6.0
2023-11-19 16:35:45,236 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 250, loss[loss=0.06134, simple_loss=0.06206, pruned_loss=0.01419, audio_tagging_loss=0.01612, over 14536.00 frames. ], tot_loss[loss=0.08981, simple_loss=0.1053, pruned_loss=0.02328, audio_tagging_loss=0.01389, over 2177798.19 frames. ], batch size: 58, lr: 7.12e-03, grad_scale: 16.0
2023-11-19 16:35:51,862 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.61 vs. limit=15.0
2023-11-19 16:35:57,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=723133.3333333334, ans=0.125
2023-11-19 16:36:07,864 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.98 vs. limit=12.0
2023-11-19 16:36:08,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=723200.0, ans=0.125
2023-11-19 16:36:15,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=723200.0, ans=0.2
2023-11-19 16:36:33,698 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 108500
2023-11-19 16:36:42,828 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=22.5
2023-11-19 16:36:48,246 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 300, loss[loss=0.08574, simple_loss=0.1098, pruned_loss=0.02172, audio_tagging_loss=0.009126, over 16319.00 frames. ], tot_loss[loss=0.08983, simple_loss=0.1065, pruned_loss=0.02376, audio_tagging_loss=0.01282, over 2373589.51 frames. ], batch size: 59, lr: 7.11e-03, grad_scale: 16.0
2023-11-19 16:36:58,050 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.707e+01 8.557e+01 9.217e+01 9.967e+01 1.431e+02, threshold=1.843e+02, percent-clipped=0.0
2023-11-19 16:37:03,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=723466.6666666666, ans=0.1
2023-11-19 16:37:32,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=723600.0, ans=0.1
2023-11-19 16:37:37,041 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 108550
2023-11-19 16:37:51,885 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 350, loss[loss=0.08342, simple_loss=0.1066, pruned_loss=0.02232, audio_tagging_loss=0.007819, over 15587.00 frames. ], tot_loss[loss=0.08862, simple_loss=0.1058, pruned_loss=0.02354, audio_tagging_loss=0.01219, over 2528418.71 frames. ], batch size: 57, lr: 7.11e-03, grad_scale: 16.0
2023-11-19 16:37:52,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=723733.3333333334, ans=0.125
2023-11-19 16:38:03,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=723733.3333333334, ans=0.125
2023-11-19 16:38:11,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=723800.0, ans=0.125
2023-11-19 16:38:40,133 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 108600
2023-11-19 16:38:55,832 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 400, loss[loss=0.07644, simple_loss=0.0924, pruned_loss=0.01535, audio_tagging_loss=0.01489, over 14392.00 frames. ], tot_loss[loss=0.08853, simple_loss=0.107, pruned_loss=0.02339, audio_tagging_loss=0.01163, over 2643750.50 frames. ], batch size: 54, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 16:38:56,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=724066.6666666666, ans=0.0
2023-11-19 16:39:00,964 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0
2023-11-19 16:39:01,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=724066.6666666666, ans=0.125
2023-11-19 16:39:06,204 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.939e+01 8.916e+01 9.621e+01 1.044e+02 1.431e+02, threshold=1.924e+02, percent-clipped=0.0
2023-11-19 16:39:18,030 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0
2023-11-19 16:39:27,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=724200.0, ans=0.125
2023-11-19 16:39:35,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=724266.6666666666, ans=0.1
2023-11-19 16:39:44,959 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 108650
2023-11-19 16:39:46,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=724333.3333333334, ans=0.0
2023-11-19 16:39:51,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=724333.3333333334, ans=0.04949747468305833
2023-11-19 16:39:52,004 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0
2023-11-19 16:39:59,935 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 450, loss[loss=0.0897, simple_loss=0.1071, pruned_loss=0.02168, audio_tagging_loss=0.01447, over 14760.00 frames. ], tot_loss[loss=0.08772, simple_loss=0.1063, pruned_loss=0.02323, audio_tagging_loss=0.01132, over 2729257.42 frames. ], batch size: 56, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 16:40:08,981 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0
2023-11-19 16:40:19,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=724466.6666666666, ans=0.125
2023-11-19 16:40:29,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=724533.3333333334, ans=0.125
2023-11-19 16:40:48,031 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 108700
2023-11-19 16:41:02,592 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 500, loss[loss=0.06757, simple_loss=0.07825, pruned_loss=0.01724, audio_tagging_loss=0.0112, over 14989.00 frames. ], tot_loss[loss=0.0867, simple_loss=0.1053, pruned_loss=0.02299, audio_tagging_loss=0.01106, over 2797344.19 frames. ], batch size: 57, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 16:41:11,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=724733.3333333334, ans=0.2
2023-11-19 16:41:13,960 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.347e+01 9.313e+01 1.051e+02 1.429e+02, threshold=1.863e+02, percent-clipped=0.0
2023-11-19 16:41:14,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=724733.3333333334, ans=0.125
2023-11-19 16:41:40,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=724933.3333333334, ans=0.1
2023-11-19 16:41:46,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=724933.3333333334, ans=0.0
2023-11-19 16:41:51,535 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 108750
2023-11-19 16:41:58,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=725000.0, ans=0.125
2023-11-19 16:42:07,388 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 550, loss[loss=0.08444, simple_loss=0.1091, pruned_loss=0.02004, audio_tagging_loss=0.009839, over 15839.00 frames. ], tot_loss[loss=0.08733, simple_loss=0.1064, pruned_loss=0.0233, audio_tagging_loss=0.01081, over 2855185.45 frames. ], batch size: 60, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 16:42:07,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=725066.6666666666, ans=0.2
2023-11-19 16:42:16,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=725066.6666666666, ans=0.04949747468305833
2023-11-19 16:42:24,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=725133.3333333334, ans=0.125
2023-11-19 16:42:28,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=725133.3333333334, ans=0.0
2023-11-19 16:42:34,608 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=22.5
2023-11-19 16:42:37,304 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.95 vs. limit=15.0
2023-11-19 16:42:38,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=725200.0, ans=0.1
2023-11-19 16:42:50,884 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=22.5
2023-11-19 16:42:56,346 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 108800
2023-11-19 16:43:12,323 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 600, loss[loss=0.1072, simple_loss=0.1318, pruned_loss=0.02785, audio_tagging_loss=0.01342, over 16804.00 frames. ], tot_loss[loss=0.08696, simple_loss=0.1061, pruned_loss=0.0232, audio_tagging_loss=0.01072, over 2897744.88 frames. ], batch size: 61, lr: 7.10e-03, grad_scale: 32.0
2023-11-19 16:43:17,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=725400.0, ans=0.015
2023-11-19 16:43:19,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=725400.0, ans=0.125
2023-11-19 16:43:21,974 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.124e+01 8.186e+01 8.803e+01 9.595e+01 1.577e+02, threshold=1.761e+02, percent-clipped=0.0
2023-11-19 16:43:23,760 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=12.0
2023-11-19 16:43:29,995 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.27 vs. limit=10.0
2023-11-19 16:43:48,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=725533.3333333334, ans=0.2
2023-11-19 16:44:01,066 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 108850
2023-11-19 16:44:08,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=725666.6666666666, ans=0.125
2023-11-19 16:44:15,721 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 650, loss[loss=0.07446, simple_loss=0.09153, pruned_loss=0.01936, audio_tagging_loss=0.00934, over 15185.00 frames. ], tot_loss[loss=0.08721, simple_loss=0.1062, pruned_loss=0.0234, audio_tagging_loss=0.01068, over 2929534.75 frames. ], batch size: 57, lr: 7.10e-03, grad_scale: 16.0
2023-11-19 16:44:33,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=725800.0, ans=0.125
2023-11-19 16:44:46,553 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.035e-03
2023-11-19 16:45:04,121 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 108900
2023-11-19 16:45:20,007 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 700, loss[loss=0.09623, simple_loss=0.1289, pruned_loss=0.02397, audio_tagging_loss=0.007823, over 16386.00 frames. ], tot_loss[loss=0.08673, simple_loss=0.1059, pruned_loss=0.02305, audio_tagging_loss=0.01075, over 2956502.72 frames. ], batch size: 61, lr: 7.10e-03, grad_scale: 16.0
2023-11-19 16:45:21,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=726066.6666666666, ans=0.125
2023-11-19 16:45:30,779 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.106e+01 8.886e+01 9.595e+01 1.122e+02, threshold=1.777e+02, percent-clipped=0.0
2023-11-19 16:45:40,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=726133.3333333334, ans=0.125
2023-11-19 16:45:40,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=726133.3333333334, ans=0.125
2023-11-19 16:45:55,223 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.16 vs. limit=15.0
2023-11-19 16:46:06,743 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 16:46:07,910 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 108950
2023-11-19 16:46:15,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=726333.3333333334, ans=0.0
2023-11-19 16:46:15,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=726333.3333333334, ans=0.0
2023-11-19 16:46:18,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=726333.3333333334, ans=0.125
2023-11-19 16:46:22,487 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 750, loss[loss=0.08129, simple_loss=0.1056, pruned_loss=0.0161, audio_tagging_loss=0.01238, over 14706.00 frames. ], tot_loss[loss=0.08733, simple_loss=0.1069, pruned_loss=0.02321, audio_tagging_loss=0.01068, over 2981469.65 frames. ], batch size: 55, lr: 7.10e-03, grad_scale: 16.0
2023-11-19 16:46:56,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=726533.3333333334, ans=0.125
2023-11-19 16:47:11,173 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109000
2023-11-19 16:47:11,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=726600.0, ans=0.0
2023-11-19 16:47:27,218 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 800, loss[loss=0.08512, simple_loss=0.09109, pruned_loss=0.02371, audio_tagging_loss=0.01586, over 15161.00 frames. ], tot_loss[loss=0.08719, simple_loss=0.1064, pruned_loss=0.02321, audio_tagging_loss=0.01076, over 2995578.37 frames. ], batch size: 58, lr: 7.10e-03, grad_scale: 32.0
2023-11-19 16:47:38,634 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.715e+01 8.461e+01 9.150e+01 1.030e+02 1.294e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-19 16:47:58,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=726866.6666666666, ans=0.0
2023-11-19 16:48:12,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=726933.3333333334, ans=0.1
2023-11-19 16:48:15,669 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109050
2023-11-19 16:48:15,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=726933.3333333334, ans=0.125
2023-11-19 16:48:23,101 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 16:48:23,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=727000.0, ans=0.0
2023-11-19 16:48:30,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=727066.6666666666, ans=0.125
2023-11-19 16:48:30,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=727066.6666666666, ans=0.2
2023-11-19 16:48:31,300 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 850, loss[loss=0.07269, simple_loss=0.08396, pruned_loss=0.01931, audio_tagging_loss=0.0114, over 15014.00 frames. ], tot_loss[loss=0.08664, simple_loss=0.1054, pruned_loss=0.02314, audio_tagging_loss=0.01078, over 3015970.27 frames. ], batch size: 59, lr: 7.10e-03, grad_scale: 32.0
2023-11-19 16:49:19,695 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109100
2023-11-19 16:49:21,099 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.287e-01
2023-11-19 16:49:30,164 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.69 vs. limit=15.0
2023-11-19 16:49:30,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=727333.3333333334, ans=0.125
2023-11-19 16:49:34,190 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 900, loss[loss=0.07094, simple_loss=0.09316, pruned_loss=0.01453, audio_tagging_loss=0.009828, over 15693.00 frames. ], tot_loss[loss=0.08638, simple_loss=0.1052, pruned_loss=0.02305, audio_tagging_loss=0.01074, over 3019200.90 frames. ], batch size: 61, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:49:45,197 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.269e+01 9.055e+01 9.679e+01 1.261e+02, threshold=1.811e+02, percent-clipped=0.0
2023-11-19 16:49:58,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=727533.3333333334, ans=0.125
2023-11-19 16:50:05,142 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0
2023-11-19 16:50:06,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=727533.3333333334, ans=0.125
2023-11-19 16:50:12,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=727600.0, ans=0.2
2023-11-19 16:50:22,852 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109150
2023-11-19 16:50:32,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=727666.6666666666, ans=0.2
2023-11-19 16:50:37,248 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 950, loss[loss=0.0927, simple_loss=0.1126, pruned_loss=0.02763, audio_tagging_loss=0.008764, over 15884.00 frames. ], tot_loss[loss=0.08629, simple_loss=0.1052, pruned_loss=0.02301, audio_tagging_loss=0.01068, over 3029334.13 frames. ], batch size: 58, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:50:46,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=727733.3333333334, ans=0.125
2023-11-19 16:51:07,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=727866.6666666666, ans=0.125
2023-11-19 16:51:11,677 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0
2023-11-19 16:51:25,977 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109200
2023-11-19 16:51:42,986 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1000, loss[loss=0.09022, simple_loss=0.1088, pruned_loss=0.02703, audio_tagging_loss=0.008784, over 15351.00 frames. ], tot_loss[loss=0.08606, simple_loss=0.105, pruned_loss=0.02312, audio_tagging_loss=0.01044, over 3032914.51 frames. ], batch size: 56, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:51:51,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=728066.6666666666, ans=0.0
2023-11-19 16:51:52,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=728066.6666666666, ans=0.0
2023-11-19 16:51:53,748 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.418e+01 8.048e+01 8.889e+01 9.743e+01 1.398e+02, threshold=1.778e+02, percent-clipped=0.0
2023-11-19 16:52:08,477 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 16:52:10,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=728200.0, ans=0.0
2023-11-19 16:52:31,677 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109250
2023-11-19 16:52:45,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=728400.0, ans=0.0
2023-11-19 16:52:46,327 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1050, loss[loss=0.1084, simple_loss=0.1309, pruned_loss=0.03265, audio_tagging_loss=0.01027, over 14830.00 frames. ], tot_loss[loss=0.08566, simple_loss=0.1047, pruned_loss=0.02294, audio_tagging_loss=0.01039, over 3029898.03 frames. ], batch size: 54, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:53:01,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=728466.6666666666, ans=0.2
2023-11-19 16:53:10,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=728533.3333333334, ans=0.0
2023-11-19 16:53:20,051 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.25 vs. limit=15.0
2023-11-19 16:53:22,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=728533.3333333334, ans=0.04949747468305833
2023-11-19 16:53:28,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=728600.0, ans=0.125
2023-11-19 16:53:30,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=728600.0, ans=0.125
2023-11-19 16:53:35,481 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109300
2023-11-19 16:53:49,975 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1100, loss[loss=0.06085, simple_loss=0.07574, pruned_loss=0.01095, audio_tagging_loss=0.01204, over 15254.00 frames. ], tot_loss[loss=0.08608, simple_loss=0.1053, pruned_loss=0.02321, audio_tagging_loss=0.01024, over 3032820.17 frames. ], batch size: 58, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:53:52,571 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 16:54:01,855 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.245e+01 8.411e+01 9.070e+01 1.020e+02 1.382e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-19 16:54:04,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=728800.0, ans=0.0
2023-11-19 16:54:23,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=728866.6666666666, ans=0.1
2023-11-19 16:54:27,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=728933.3333333334, ans=0.125
2023-11-19 16:54:38,789 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109350
2023-11-19 16:54:43,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=729000.0, ans=0.125
2023-11-19 16:54:54,104 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1150, loss[loss=0.08829, simple_loss=0.1135, pruned_loss=0.02311, audio_tagging_loss=0.008424, over 15560.00 frames. ], tot_loss[loss=0.08684, simple_loss=0.1064, pruned_loss=0.02346, audio_tagging_loss=0.01021, over 3035757.61 frames. ], batch size: 57, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:55:13,167 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.72 vs. limit=15.0
2023-11-19 16:55:42,479 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109400
2023-11-19 16:55:49,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.98 vs. limit=15.0
2023-11-19 16:55:57,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=729400.0, ans=0.125
2023-11-19 16:55:58,775 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1200, loss[loss=0.08335, simple_loss=0.09954, pruned_loss=0.02453, audio_tagging_loss=0.009051, over 15888.00 frames. ], tot_loss[loss=0.08643, simple_loss=0.106, pruned_loss=0.02326, audio_tagging_loss=0.01018, over 3036528.05 frames. ], batch size: 57, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 16:56:09,729 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.219e+01 8.170e+01 9.038e+01 9.712e+01 1.366e+02, threshold=1.808e+02, percent-clipped=0.0
2023-11-19 16:56:26,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=729533.3333333334, ans=0.125
2023-11-19 16:56:47,486 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109450
2023-11-19 16:57:02,019 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1250, loss[loss=0.09083, simple_loss=0.1121, pruned_loss=0.0243, audio_tagging_loss=0.01046, over 15974.00 frames. ], tot_loss[loss=0.08649, simple_loss=0.106, pruned_loss=0.02326, audio_tagging_loss=0.01023, over 3041543.68 frames. ], batch size: 59, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 16:57:32,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=729866.6666666666, ans=0.0
2023-11-19 16:57:51,012 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109500
2023-11-19 16:57:53,651 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 16:57:56,518 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.64 vs. limit=22.5
2023-11-19 16:57:59,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=730000.0, ans=0.0
2023-11-19 16:58:05,806 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1300, loss[loss=0.1008, simple_loss=0.1311, pruned_loss=0.02841, audio_tagging_loss=0.006879, over 15906.00 frames. ], tot_loss[loss=0.08654, simple_loss=0.106, pruned_loss=0.02324, audio_tagging_loss=0.01032, over 3040404.14 frames. ], batch size: 61, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 16:58:10,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=730066.6666666666, ans=0.125
2023-11-19 16:58:18,232 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.587e+01 8.086e+01 8.673e+01 9.719e+01 1.253e+02, threshold=1.735e+02, percent-clipped=0.0
2023-11-19 16:58:24,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=730133.3333333334, ans=0.0
2023-11-19 16:58:38,211 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 16:58:54,053 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109550
2023-11-19 16:59:06,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=730333.3333333334, ans=0.1
2023-11-19 16:59:10,762 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1350, loss[loss=0.1038, simple_loss=0.1309, pruned_loss=0.03049, audio_tagging_loss=0.007856, over 15010.00 frames. ], tot_loss[loss=0.08709, simple_loss=0.1068, pruned_loss=0.02344, audio_tagging_loss=0.01023, over 3042994.58 frames. ], batch size: 55, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 16:59:10,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=730400.0, ans=0.0
2023-11-19 16:59:12,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=730400.0, ans=0.1
2023-11-19 16:59:26,225 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0
2023-11-19 16:59:35,487 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 16:59:36,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=730533.3333333334, ans=0.04949747468305833
2023-11-19 16:59:56,680 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 16:59:59,158 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109600
2023-11-19 17:00:14,628 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1400, loss[loss=0.08169, simple_loss=0.1053, pruned_loss=0.0172, audio_tagging_loss=0.01185, over 15231.00 frames. ], tot_loss[loss=0.0864, simple_loss=0.1059, pruned_loss=0.02313, audio_tagging_loss=0.01034, over 3039347.46 frames. ], batch size: 55, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 17:00:16,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=730733.3333333334, ans=0.09899494936611666
2023-11-19 17:00:25,868 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.607e+01 8.437e+01 9.173e+01 9.925e+01 1.308e+02, threshold=1.835e+02, percent-clipped=0.0
2023-11-19 17:00:36,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=730800.0, ans=0.0
2023-11-19 17:00:48,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=730866.6666666666, ans=0.1
2023-11-19 17:00:55,157 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0
2023-11-19 17:01:03,405 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109650
2023-11-19 17:01:18,072 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1450, loss[loss=0.0809, simple_loss=0.1031, pruned_loss=0.01988, audio_tagging_loss=0.009479, over 14910.00 frames. ], tot_loss[loss=0.08584, simple_loss=0.1051, pruned_loss=0.02283, audio_tagging_loss=0.01048, over 3037939.24 frames. ], batch size: 57, lr: 7.08e-03, grad_scale: 16.0
2023-11-19 17:01:25,140 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 17:01:39,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=731133.3333333334, ans=0.2
2023-11-19 17:01:48,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=731200.0, ans=0.125
2023-11-19 17:01:57,093 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.31 vs. limit=15.0
2023-11-19 17:02:01,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=731266.6666666666, ans=0.0
2023-11-19 17:02:06,407 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109700
2023-11-19 17:02:22,230 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1500, loss[loss=0.08473, simple_loss=0.09668, pruned_loss=0.02536, audio_tagging_loss=0.01103, over 15109.00 frames. ], tot_loss[loss=0.08635, simple_loss=0.1054, pruned_loss=0.02313, audio_tagging_loss=0.01051, over 3040788.96 frames. ], batch size: 57, lr: 7.08e-03, grad_scale: 16.0
2023-11-19 17:02:31,024 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.39 vs. limit=12.0
2023-11-19 17:02:32,908 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 17:02:34,833 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.17 vs. limit=22.5
2023-11-19 17:02:35,056 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 8.288e+01 9.153e+01 9.955e+01 1.243e+02, threshold=1.831e+02, percent-clipped=0.0
2023-11-19 17:02:36,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=731466.6666666666, ans=0.125
2023-11-19 17:02:45,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=731466.6666666666, ans=0.1
2023-11-19 17:03:06,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=731600.0, ans=0.125
2023-11-19 17:03:08,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=731600.0, ans=0.2
2023-11-19 17:03:11,040 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109750
2023-11-19 17:03:16,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=731666.6666666666, ans=0.1
2023-11-19 17:03:19,364 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.27 vs. limit=15.0
2023-11-19 17:03:26,129 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1550, loss[loss=0.09636, simple_loss=0.1254, pruned_loss=0.02538, audio_tagging_loss=0.008258, over 14545.00 frames. ], tot_loss[loss=0.08557, simple_loss=0.1042, pruned_loss=0.02281, audio_tagging_loss=0.01068, over 3040166.10 frames. ], batch size: 54, lr: 7.07e-03, grad_scale: 16.0
2023-11-19 17:03:27,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=731733.3333333334, ans=0.0
2023-11-19 17:03:28,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=731733.3333333334, ans=0.125
2023-11-19 17:03:33,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=731733.3333333334, ans=0.5
2023-11-19 17:03:44,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=731800.0, ans=0.125
2023-11-19 17:03:47,823 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.18 vs. limit=10.0
2023-11-19 17:03:50,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=731866.6666666666, ans=0.125
2023-11-19 17:03:57,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=731866.6666666666, ans=0.125
2023-11-19 17:03:58,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=731866.6666666666, ans=0.0
2023-11-19 17:04:00,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=731866.6666666666, ans=0.0
2023-11-19 17:04:03,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=731933.3333333334, ans=0.1
2023-11-19 17:04:12,576 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=22.5
2023-11-19 17:04:14,525 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109800
2023-11-19 17:04:23,058 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.61 vs. limit=15.0
2023-11-19 17:04:29,595 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1600, loss[loss=0.08655, simple_loss=0.1127, pruned_loss=0.0203, audio_tagging_loss=0.009921, over 16055.00 frames. ], tot_loss[loss=0.0861, simple_loss=0.1048, pruned_loss=0.02295, audio_tagging_loss=0.01077, over 3044395.44 frames. ], batch size: 59, lr: 7.07e-03, grad_scale: 32.0
2023-11-19 17:04:32,594 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=22.5
2023-11-19 17:04:42,927 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.880e+01 8.694e+01 9.571e+01 1.026e+02 1.392e+02, threshold=1.914e+02, percent-clipped=0.0
2023-11-19 17:04:50,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=732133.3333333334, ans=0.0
2023-11-19 17:04:53,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=732133.3333333334, ans=0.0
2023-11-19 17:05:14,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=732266.6666666666, ans=0.125
2023-11-19 17:05:18,446 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109850
2023-11-19 17:05:29,376 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0
2023-11-19 17:05:30,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=732333.3333333334, ans=0.125
2023-11-19 17:05:34,203 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1650, loss[loss=0.1112, simple_loss=0.1362, pruned_loss=0.03449, audio_tagging_loss=0.008616, over 15754.00 frames. ], tot_loss[loss=0.08669, simple_loss=0.1054, pruned_loss=0.02331, audio_tagging_loss=0.0107, over 3048539.47 frames. ], batch size: 57, lr: 7.07e-03, grad_scale: 32.0
2023-11-19 17:05:39,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=732400.0, ans=0.125
2023-11-19 17:05:41,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=732400.0, ans=0.2
2023-11-19 17:05:53,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=732466.6666666666, ans=0.125
2023-11-19 17:06:19,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=732600.0, ans=0.125
2023-11-19 17:06:22,813 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109900
2023-11-19 17:06:38,161 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1700, loss[loss=0.07684, simple_loss=0.09646, pruned_loss=0.01826, audio_tagging_loss=0.01035, over 14967.00 frames. ], tot_loss[loss=0.08624, simple_loss=0.1048, pruned_loss=0.02306, audio_tagging_loss=0.0108, over 3046485.06 frames. ], batch size: 56, lr: 7.07e-03, grad_scale: 32.0
2023-11-19 17:06:50,478 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 8.219e+01 8.857e+01 9.747e+01 1.189e+02, threshold=1.771e+02, percent-clipped=0.0
2023-11-19 17:07:14,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=732866.6666666666, ans=0.0
2023-11-19 17:07:27,189 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 109950
2023-11-19 17:07:41,746 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1750, loss[loss=0.08558, simple_loss=0.1039, pruned_loss=0.02332, audio_tagging_loss=0.0103, over 15816.00 frames. ], tot_loss[loss=0.08573, simple_loss=0.1042, pruned_loss=0.02297, audio_tagging_loss=0.01068, over 3047642.18 frames. ], batch size: 61, lr: 7.07e-03, grad_scale: 32.0
2023-11-19 17:07:52,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=733066.6666666666, ans=0.0
2023-11-19 17:07:59,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=733133.3333333334, ans=0.125
2023-11-19 17:08:30,319 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110000
2023-11-19 17:08:39,583 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 17:08:46,821 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1800, loss[loss=0.07012, simple_loss=0.09186, pruned_loss=0.01443, audio_tagging_loss=0.009766, over 15504.00 frames. ], tot_loss[loss=0.0862, simple_loss=0.1049, pruned_loss=0.0231, audio_tagging_loss=0.01065, over 3046723.63 frames. ], batch size: 58, lr: 7.07e-03, grad_scale: 32.0
], batch size: 58, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 17:08:49,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=733400.0, ans=0.1 2023-11-19 17:09:00,174 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.669e+01 8.372e+01 9.088e+01 1.009e+02 1.305e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-19 17:09:01,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=733466.6666666666, ans=0.125 2023-11-19 17:09:03,271 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0 2023-11-19 17:09:07,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=733466.6666666666, ans=0.125 2023-11-19 17:09:10,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=733533.3333333334, ans=0.04949747468305833 2023-11-19 17:09:27,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=733600.0, ans=0.2 2023-11-19 17:09:35,473 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110050 2023-11-19 17:09:50,045 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1850, loss[loss=0.07508, simple_loss=0.09224, pruned_loss=0.01705, audio_tagging_loss=0.01191, over 15680.00 frames. ], tot_loss[loss=0.08647, simple_loss=0.1055, pruned_loss=0.02321, audio_tagging_loss=0.01049, over 3047997.96 frames. ], batch size: 58, lr: 7.06e-03, grad_scale: 16.0 2023-11-19 17:10:01,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=733800.0, ans=0.125 2023-11-19 17:10:05,590 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:10:15,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=733866.6666666666, ans=0.1 2023-11-19 17:10:38,793 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110100 2023-11-19 17:10:49,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=734000.0, ans=0.125 2023-11-19 17:10:54,480 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1900, loss[loss=0.07866, simple_loss=0.09783, pruned_loss=0.01981, audio_tagging_loss=0.009942, over 14604.00 frames. ], tot_loss[loss=0.08532, simple_loss=0.1043, pruned_loss=0.02273, audio_tagging_loss=0.01046, over 3054942.97 frames. 
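The optim.py:476 entries expose the clipping rule: the five numbers are the [min, 25%, 50%, 75%, max] of recently observed gradient norms, and the threshold is Clipping_scale times the median (2.0 * 9.088e+01 = 1.818e+02 in the entry above; the other entries match the same way). A sketch of that behaviour; the class name and window size are mine, not the recipe's:

    from collections import deque
    import torch

    class MedianGradClipper:
        """Clip gradients at clipping_scale * median of recent norms
        and track how often clipping fires (percent-clipped)."""
        def __init__(self, clipping_scale=2.0, window=1024):
            self.scale = clipping_scale
            self.norms = deque(maxlen=window)
            self.clipped = self.total = 0

        def clip_(self, params):
            params = [p for p in params if p.grad is not None]
            norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
            self.norms.append(float(norm))
            threshold = self.scale * sorted(self.norms)[len(self.norms) // 2]
            self.total += 1
            if norm > threshold:
                self.clipped += 1
                for p in params:
                    p.grad.mul_(threshold / norm)
            return float(norm), threshold

        @property
        def percent_clipped(self):
            return 100.0 * self.clipped / max(self.total, 1)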
], batch size: 57, lr: 7.06e-03, grad_scale: 16.0 2023-11-19 17:10:59,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=734066.6666666666, ans=0.0 2023-11-19 17:11:08,898 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.512e+01 8.528e+01 8.978e+01 9.700e+01 1.316e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-19 17:11:27,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=734200.0, ans=0.0 2023-11-19 17:11:43,546 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110150 2023-11-19 17:11:53,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=734333.3333333334, ans=0.0 2023-11-19 17:11:55,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=734333.3333333334, ans=0.2 2023-11-19 17:11:57,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=734400.0, ans=0.0 2023-11-19 17:11:59,394 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 1950, loss[loss=0.09692, simple_loss=0.1168, pruned_loss=0.02838, audio_tagging_loss=0.01012, over 14047.00 frames. ], tot_loss[loss=0.08543, simple_loss=0.1044, pruned_loss=0.0228, audio_tagging_loss=0.01042, over 3049933.14 frames. ], batch size: 53, lr: 7.06e-03, grad_scale: 16.0 2023-11-19 17:12:12,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=734466.6666666666, ans=0.125 2023-11-19 17:12:13,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=734466.6666666666, ans=0.125 2023-11-19 17:12:43,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=734600.0, ans=15.0 2023-11-19 17:12:48,395 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110200 2023-11-19 17:13:02,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=734733.3333333334, ans=0.125 2023-11-19 17:13:03,588 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2000, loss[loss=0.09574, simple_loss=0.1191, pruned_loss=0.02602, audio_tagging_loss=0.01017, over 16448.00 frames. ], tot_loss[loss=0.08564, simple_loss=0.1046, pruned_loss=0.02289, audio_tagging_loss=0.01047, over 3046005.68 frames. 
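The lr field has just ticked from 7.07e-03 down to 7.06e-03 and reaches 7.00e-03 a few thousand batches further on. Both endpoints are reproduced by icefall's Eden schedule with the configured base_lr=0.045, lr_batches=7500, lr_epochs=3.5, nine completed epochs, and batch_idx_train around 110k; a sketch of the formula:

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Eden (sketch): smooth power-law decay in batch and epoch."""
        return (base_lr
                * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

    print(f"{eden_lr(0.045, 109800, 9):.2e}")   # 7.07e-03, as at the top of this stretch
    print(f"{eden_lr(0.045, 112000, 9):.2e}")   # 7.00e-03, as logged near batch 3800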
], batch size: 60, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 17:13:17,911 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.098e+01 8.336e+01 8.914e+01 9.433e+01 1.309e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-19 17:13:20,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=734800.0, ans=0.05 2023-11-19 17:13:23,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=734800.0, ans=0.0 2023-11-19 17:13:44,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=734933.3333333334, ans=0.125 2023-11-19 17:13:52,722 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110250 2023-11-19 17:13:55,781 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.35 vs. limit=10.0 2023-11-19 17:14:08,084 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2050, loss[loss=0.06, simple_loss=0.06653, pruned_loss=0.01404, audio_tagging_loss=0.01269, over 14987.00 frames. ], tot_loss[loss=0.08591, simple_loss=0.1049, pruned_loss=0.02304, audio_tagging_loss=0.01042, over 3051162.70 frames. ], batch size: 59, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 17:14:27,780 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:14:29,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=735133.3333333334, ans=0.2 2023-11-19 17:14:37,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=735200.0, ans=0.0 2023-11-19 17:14:50,287 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=15.0 2023-11-19 17:14:55,500 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110300 2023-11-19 17:14:55,911 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=22.5 2023-11-19 17:15:02,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=735333.3333333334, ans=0.125 2023-11-19 17:15:10,840 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2100, loss[loss=0.08961, simple_loss=0.1041, pruned_loss=0.02909, audio_tagging_loss=0.008489, over 15335.00 frames. ], tot_loss[loss=0.08603, simple_loss=0.105, pruned_loss=0.0232, audio_tagging_loss=0.01032, over 3048509.84 frames. 
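The Whitening entries compare a per-module statistic against a limit (e.g. metric=8.35 vs. limit=10.0 above); the whitening penalty only activates once the metric exceeds its limit, so most of these are "still below the limit" reports. One plausible form of the metric, equal to 1.0 for perfectly whitened features, is sketched below; this is an approximation, not the exact scaling.py computation:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        """E[lambda^2] / E[lambda]^2 over eigenvalues of the per-group
        feature covariance: >= 1.0, and 1.0 iff cov = sigma^2 * I."""
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups)
        x = x - x.mean(dim=0, keepdim=True)
        metrics = []
        for g in range(num_groups):
            cov = x[:, g, :].T @ x[:, g, :] / n
            eigs = torch.linalg.eigvalsh(cov)
            metrics.append((eigs ** 2).mean() / eigs.mean().clamp(min=1e-20) ** 2)
        return torch.stack(metrics).mean()

    x = torch.randn(1000, 256)           # near-white random features
    print(float(whitening_metric(x)))    # ~1.3; grows as channels correlate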
], batch size: 56, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 17:15:20,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735400.0, ans=0.1 2023-11-19 17:15:25,438 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.159e+01 8.890e+01 9.967e+01 1.434e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-19 17:15:34,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=735466.6666666666, ans=0.2 2023-11-19 17:15:41,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=735533.3333333334, ans=0.125 2023-11-19 17:15:49,940 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:15:57,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=735600.0, ans=0.125 2023-11-19 17:15:59,783 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110350 2023-11-19 17:16:15,526 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2150, loss[loss=0.07635, simple_loss=0.08754, pruned_loss=0.01836, audio_tagging_loss=0.01422, over 14789.00 frames. ], tot_loss[loss=0.08651, simple_loss=0.1059, pruned_loss=0.02332, audio_tagging_loss=0.01024, over 3048753.47 frames. ], batch size: 57, lr: 7.05e-03, grad_scale: 32.0 2023-11-19 17:16:19,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=735733.3333333334, ans=0.125 2023-11-19 17:16:19,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=735733.3333333334, ans=0.0 2023-11-19 17:16:33,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=735800.0, ans=0.125 2023-11-19 17:16:33,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=735800.0, ans=0.0 2023-11-19 17:16:37,846 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.00 vs. limit=22.5 2023-11-19 17:16:45,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=735866.6666666666, ans=0.125 2023-11-19 17:16:55,036 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 17:17:03,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=735933.3333333334, ans=0.2 2023-11-19 17:17:04,966 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110400 2023-11-19 17:17:12,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=736000.0, ans=0.0 2023-11-19 17:17:20,314 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2200, loss[loss=0.07211, simple_loss=0.09397, pruned_loss=0.01641, audio_tagging_loss=0.008717, over 14344.00 frames. ], tot_loss[loss=0.08692, simple_loss=0.1063, pruned_loss=0.02347, audio_tagging_loss=0.0103, over 3050543.16 frames. ], batch size: 56, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 17:17:33,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=736133.3333333334, ans=0.125 2023-11-19 17:17:35,743 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.097e+01 8.594e+01 9.409e+01 1.055e+02 1.451e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-19 17:17:43,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=736133.3333333334, ans=0.125 2023-11-19 17:17:45,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=736133.3333333334, ans=0.0 2023-11-19 17:17:45,514 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:18:10,129 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110450 2023-11-19 17:18:25,407 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2250, loss[loss=0.09219, simple_loss=0.1183, pruned_loss=0.02495, audio_tagging_loss=0.008075, over 14936.00 frames. ], tot_loss[loss=0.08741, simple_loss=0.1074, pruned_loss=0.02347, audio_tagging_loss=0.01023, over 3057047.04 frames. ], batch size: 55, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 17:18:25,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=736400.0, ans=0.0 2023-11-19 17:18:41,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=736466.6666666666, ans=0.0 2023-11-19 17:18:46,459 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=15.0 2023-11-19 17:19:07,538 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.96 vs. limit=15.0 2023-11-19 17:19:15,055 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110500 2023-11-19 17:19:23,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=736666.6666666666, ans=0.125 2023-11-19 17:19:31,629 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2300, loss[loss=0.07644, simple_loss=0.09282, pruned_loss=0.0189, audio_tagging_loss=0.01114, over 16356.00 frames. ], tot_loss[loss=0.08791, simple_loss=0.1079, pruned_loss=0.0236, audio_tagging_loss=0.01034, over 3056690.60 frames. 
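The WARNING just above (and its repeats later in this log) comes from a pre-batching sanity filter: these AudioSet cuts carry a 24-token dummy transcript but only 100 input frames, about 23 after the 4x subsampling, and a transducer loss cannot align more tokens than encoder frames. A sketch of the check; the length formula is approximated to match the logged 100 -> 23:

    def keep_cut(num_frames: int, num_tokens: int,
                 subsampling_factor: int = 4) -> bool:
        """Drop cuts whose subsampled length is shorter than the token
        sequence; such cuts make the transducer loss undefined."""
        t = (num_frames - 7) // subsampling_factor    # 100 -> 23, as logged
        return t >= num_tokens

    print(keep_cut(100, 24))    # False -> "Exclude cut ... from training."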
], batch size: 60, lr: 7.05e-03, grad_scale: 8.0 2023-11-19 17:19:39,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=736733.3333333334, ans=0.1 2023-11-19 17:19:46,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=736800.0, ans=0.1 2023-11-19 17:19:47,342 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.474e+01 9.296e+01 1.022e+02 1.350e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 17:19:52,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=736800.0, ans=0.0 2023-11-19 17:20:21,207 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110550 2023-11-19 17:20:28,654 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:20:33,150 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.07 vs. limit=10.0 2023-11-19 17:20:36,086 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2350, loss[loss=0.07567, simple_loss=0.09549, pruned_loss=0.02076, audio_tagging_loss=0.007168, over 14739.00 frames. ], tot_loss[loss=0.08808, simple_loss=0.1081, pruned_loss=0.02359, audio_tagging_loss=0.01045, over 3058385.20 frames. ], batch size: 58, lr: 7.05e-03, grad_scale: 8.0 2023-11-19 17:20:39,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=737066.6666666666, ans=6.0 2023-11-19 17:20:42,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=737066.6666666666, ans=0.0 2023-11-19 17:20:45,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=737066.6666666666, ans=0.125 2023-11-19 17:20:51,703 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.24 vs. limit=15.0 2023-11-19 17:21:25,372 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110600 2023-11-19 17:21:28,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737333.3333333334, ans=0.1 2023-11-19 17:21:32,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=737333.3333333334, ans=0.0 2023-11-19 17:21:37,037 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:21:37,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=737333.3333333334, ans=0.0 2023-11-19 17:21:40,462 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2400, loss[loss=0.07478, simple_loss=0.09609, pruned_loss=0.01742, audio_tagging_loss=0.009312, over 14748.00 frames. 
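grad_scale has stepped down 32.0 -> 16.0 -> 8.0 over the preceding batches and recovers to 32.0 further below. With use_fp16=True this field tracks the dynamic loss scale of mixed-precision training: the scaler backs off when a step produces inf/nan gradients and regrows after a stretch of clean steps. A sketch using PyTorch's stock scaler (the recipe may wrap its own variant, so treat the parameters as illustrative):

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,        # where this stretch of the log starts
        backoff_factor=0.5,     # 32 -> 16 -> 8 on overflowing steps
        growth_factor=2.0,      # 8 -> 16 -> 32 after clean stretches
        growth_interval=2000)

    # Typical step inside the training loop:
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()         # adjusts the scale reported as grad_scale
    print(scaler.get_scale())   # 32.0 (on a CUDA machine)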
], tot_loss[loss=0.08857, simple_loss=0.1085, pruned_loss=0.02383, audio_tagging_loss=0.0105, over 3049873.43 frames. ], batch size: 58, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 17:21:41,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737400.0, ans=0.1 2023-11-19 17:21:55,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=737466.6666666666, ans=0.125 2023-11-19 17:21:57,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=737466.6666666666, ans=0.125 2023-11-19 17:21:58,859 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.550e+01 9.088e+01 1.010e+02 1.299e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-19 17:22:17,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=737533.3333333334, ans=0.125 2023-11-19 17:22:19,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=737600.0, ans=0.125 2023-11-19 17:22:22,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=737600.0, ans=0.125 2023-11-19 17:22:29,546 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110650 2023-11-19 17:22:46,862 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2450, loss[loss=0.09268, simple_loss=0.1221, pruned_loss=0.02282, audio_tagging_loss=0.008791, over 15253.00 frames. ], tot_loss[loss=0.08807, simple_loss=0.1076, pruned_loss=0.02367, audio_tagging_loss=0.01061, over 3056059.36 frames. ], batch size: 56, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:23:11,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=737866.6666666666, ans=0.0 2023-11-19 17:23:12,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=737866.6666666666, ans=0.2 2023-11-19 17:23:16,795 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.77 vs. limit=15.0 2023-11-19 17:23:35,181 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110700 2023-11-19 17:23:46,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=738000.0, ans=0.2 2023-11-19 17:23:48,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.94 vs. limit=15.0 2023-11-19 17:23:49,636 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2500, loss[loss=0.09881, simple_loss=0.1116, pruned_loss=0.03055, audio_tagging_loss=0.01245, over 16338.00 frames. ], tot_loss[loss=0.08768, simple_loss=0.1067, pruned_loss=0.02355, audio_tagging_loss=0.01076, over 3056418.26 frames. 
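The tot_loss[...] figures are not whole-epoch averages: the "over ~3.0M frames" count stays flat because the tracker behaves like an exponentially decaying, frame-weighted sum whose horizon is roughly the configured reset_interval=200 batches (about 15k frames per batch at max_duration=1000). A sketch of that bookkeeping; the class and field names are mine:

    class DecayingLossTracker:
        """Frame-weighted loss average over a sliding exponential
        window of ~reset_interval batches (sketch)."""
        def __init__(self, reset_interval: int = 200):
            self.keep = 1.0 - 1.0 / reset_interval
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.loss_sum = self.loss_sum * self.keep + batch_loss * batch_frames
            self.frames = self.frames * self.keep + batch_frames

        @property
        def avg(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

    t = DecayingLossTracker()
    for _ in range(5000):
        t.update(0.086, 15250.0)
    print(round(t.frames))      # 3050000: the ~3.0M-frame horizon seen above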
], batch size: 61, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:24:05,694 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.741e+01 8.346e+01 8.795e+01 9.751e+01 1.396e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-19 17:24:16,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=738200.0, ans=0.125 2023-11-19 17:24:27,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=738266.6666666666, ans=0.125 2023-11-19 17:24:36,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=738266.6666666666, ans=0.0 2023-11-19 17:24:38,655 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110750 2023-11-19 17:24:53,100 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2550, loss[loss=0.0906, simple_loss=0.1146, pruned_loss=0.02516, audio_tagging_loss=0.008124, over 15241.00 frames. ], tot_loss[loss=0.08823, simple_loss=0.1075, pruned_loss=0.02385, audio_tagging_loss=0.01065, over 3048104.47 frames. ], batch size: 57, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:24:59,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=738400.0, ans=0.0 2023-11-19 17:25:03,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=738400.0, ans=0.07 2023-11-19 17:25:26,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=738533.3333333334, ans=0.0 2023-11-19 17:25:42,409 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110800 2023-11-19 17:25:46,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=738666.6666666666, ans=0.04949747468305833 2023-11-19 17:25:58,264 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.91 vs. limit=15.0 2023-11-19 17:26:00,199 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2600, loss[loss=0.1027, simple_loss=0.1265, pruned_loss=0.03024, audio_tagging_loss=0.009263, over 15337.00 frames. ], tot_loss[loss=0.08766, simple_loss=0.1071, pruned_loss=0.02369, audio_tagging_loss=0.01042, over 3048901.60 frames. 
], batch size: 56, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:26:16,157 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.092e+01 8.270e+01 8.898e+01 9.575e+01 2.029e+02, threshold=1.780e+02, percent-clipped=1.0 2023-11-19 17:26:17,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=738800.0, ans=0.125 2023-11-19 17:26:22,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=738800.0, ans=0.2 2023-11-19 17:26:27,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=738866.6666666666, ans=0.1 2023-11-19 17:26:33,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=738866.6666666666, ans=0.125 2023-11-19 17:26:36,628 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.13 vs. limit=15.0 2023-11-19 17:26:46,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=738933.3333333334, ans=0.2 2023-11-19 17:26:49,144 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110850 2023-11-19 17:27:03,858 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2650, loss[loss=0.07102, simple_loss=0.08514, pruned_loss=0.01589, audio_tagging_loss=0.01256, over 15493.00 frames. ], tot_loss[loss=0.08701, simple_loss=0.1066, pruned_loss=0.02335, audio_tagging_loss=0.01036, over 3048901.88 frames. ], batch size: 59, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:27:11,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=739066.6666666666, ans=0.2 2023-11-19 17:27:16,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.98 vs. limit=15.0 2023-11-19 17:27:21,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=739133.3333333334, ans=0.95 2023-11-19 17:27:25,298 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.69 vs. limit=22.5 2023-11-19 17:27:27,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=739200.0, ans=0.1 2023-11-19 17:27:49,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=739266.6666666666, ans=0.0 2023-11-19 17:27:53,018 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110900 2023-11-19 17:28:07,632 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2700, loss[loss=0.06697, simple_loss=0.08068, pruned_loss=0.01455, audio_tagging_loss=0.01208, over 14450.00 frames. ], tot_loss[loss=0.08734, simple_loss=0.1069, pruned_loss=0.02349, audio_tagging_loss=0.01039, over 3046845.83 frames. 
], batch size: 56, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:28:07,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=739400.0, ans=0.125 2023-11-19 17:28:13,301 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.79 vs. limit=12.0 2023-11-19 17:28:25,563 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.580e+01 8.552e+01 9.403e+01 1.042e+02 1.397e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-19 17:28:28,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=739466.6666666666, ans=0.2 2023-11-19 17:28:50,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=739600.0, ans=0.1 2023-11-19 17:28:56,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=739600.0, ans=0.0 2023-11-19 17:28:57,044 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 110950 2023-11-19 17:29:12,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=739733.3333333334, ans=0.125 2023-11-19 17:29:13,016 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2750, loss[loss=0.08573, simple_loss=0.1031, pruned_loss=0.02226, audio_tagging_loss=0.01194, over 16700.00 frames. ], tot_loss[loss=0.08757, simple_loss=0.1071, pruned_loss=0.02364, audio_tagging_loss=0.01038, over 3039706.45 frames. ], batch size: 64, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:29:14,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=739733.3333333334, ans=0.125 2023-11-19 17:29:17,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=739733.3333333334, ans=0.1 2023-11-19 17:29:21,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=739733.3333333334, ans=0.0 2023-11-19 17:30:01,044 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111000 2023-11-19 17:30:01,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=739933.3333333334, ans=0.125 2023-11-19 17:30:06,971 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:30:16,711 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2800, loss[loss=0.09201, simple_loss=0.1062, pruned_loss=0.02626, audio_tagging_loss=0.01268, over 14245.00 frames. ], tot_loss[loss=0.08655, simple_loss=0.1055, pruned_loss=0.02336, audio_tagging_loss=0.01044, over 3034359.72 frames. 
], batch size: 56, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 17:30:32,829 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.580e+01 8.267e+01 8.759e+01 9.728e+01 1.191e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-19 17:30:44,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=740200.0, ans=0.0 2023-11-19 17:31:05,938 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111050 2023-11-19 17:31:11,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=740333.3333333334, ans=0.0 2023-11-19 17:31:20,586 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2850, loss[loss=0.09794, simple_loss=0.1302, pruned_loss=0.02509, audio_tagging_loss=0.007752, over 16211.00 frames. ], tot_loss[loss=0.08636, simple_loss=0.1057, pruned_loss=0.02325, audio_tagging_loss=0.01027, over 3034817.33 frames. ], batch size: 58, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 17:32:09,369 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111100 2023-11-19 17:32:14,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=740666.6666666666, ans=0.125 2023-11-19 17:32:22,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=740666.6666666666, ans=0.2 2023-11-19 17:32:25,181 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2900, loss[loss=0.1029, simple_loss=0.1288, pruned_loss=0.03016, audio_tagging_loss=0.008357, over 15412.00 frames. ], tot_loss[loss=0.08641, simple_loss=0.1056, pruned_loss=0.02332, audio_tagging_loss=0.0103, over 3044864.64 frames. ], batch size: 57, lr: 7.03e-03, grad_scale: 16.0 2023-11-19 17:32:37,289 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0 2023-11-19 17:32:42,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=740800.0, ans=0.0 2023-11-19 17:32:43,304 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.731e+01 8.402e+01 9.299e+01 9.982e+01 1.196e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 17:32:44,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=740800.0, ans=0.2 2023-11-19 17:32:48,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=740800.0, ans=0.1 2023-11-19 17:33:12,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=740933.3333333334, ans=0.125 2023-11-19 17:33:14,517 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111150 2023-11-19 17:33:24,896 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.91 vs. limit=15.0 2023-11-19 17:33:28,976 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 2950, loss[loss=0.0729, simple_loss=0.08247, pruned_loss=0.0206, audio_tagging_loss=0.01107, over 14409.00 frames. ], tot_loss[loss=0.08653, simple_loss=0.1059, pruned_loss=0.02337, audio_tagging_loss=0.01022, over 3049816.08 frames. 
], batch size: 54, lr: 7.03e-03, grad_scale: 16.0 2023-11-19 17:33:37,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=741066.6666666666, ans=0.09899494936611666 2023-11-19 17:33:42,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=741133.3333333334, ans=0.125 2023-11-19 17:34:17,258 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111200 2023-11-19 17:34:33,014 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3000, loss[loss=0.1035, simple_loss=0.1328, pruned_loss=0.02608, audio_tagging_loss=0.01103, over 14747.00 frames. ], tot_loss[loss=0.08711, simple_loss=0.1067, pruned_loss=0.02351, audio_tagging_loss=0.01028, over 3055417.15 frames. ], batch size: 57, lr: 7.03e-03, grad_scale: 16.0 2023-11-19 17:34:33,015 INFO [train_asr.py:1285] (2/4) Computing validation loss 2023-11-19 17:35:14,019 INFO [train_asr.py:1294] (2/4) Epoch 10, validation: loss=0.06437, simple_loss=0.0554, pruned_loss=0.006444, audio_tagging_loss=0.03022, over 4681554.00 frames. 2023-11-19 17:35:14,019 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB 2023-11-19 17:35:23,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=741400.0, ans=0.125 2023-11-19 17:35:31,917 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.687e+01 8.390e+01 9.154e+01 1.009e+02 1.642e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 17:35:48,397 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:35:57,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=741600.0, ans=0.2 2023-11-19 17:35:58,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=741600.0, ans=0.2 2023-11-19 17:36:03,177 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111250 2023-11-19 17:36:17,858 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3050, loss[loss=0.08244, simple_loss=0.1032, pruned_loss=0.02233, audio_tagging_loss=0.008499, over 15461.00 frames. ], tot_loss[loss=0.08709, simple_loss=0.1066, pruned_loss=0.02349, audio_tagging_loss=0.0103, over 3053831.80 frames. ], batch size: 58, lr: 7.03e-03, grad_scale: 16.0 2023-11-19 17:36:27,754 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.63 vs. limit=10.0 2023-11-19 17:36:49,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=741866.6666666666, ans=0.2 2023-11-19 17:36:55,117 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 17:37:06,326 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111300 2023-11-19 17:37:17,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=742000.0, ans=0.125 2023-11-19 17:37:21,853 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3100, loss[loss=0.1009, simple_loss=0.12, pruned_loss=0.03065, audio_tagging_loss=0.01023, over 15791.00 frames. ], tot_loss[loss=0.08685, simple_loss=0.106, pruned_loss=0.02343, audio_tagging_loss=0.01045, over 3049497.40 frames. ], batch size: 59, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 17:37:23,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=742066.6666666666, ans=0.0 2023-11-19 17:37:30,776 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:37:40,856 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.012e+01 8.266e+01 9.120e+01 9.877e+01 1.232e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 17:37:43,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=742133.3333333334, ans=0.125 2023-11-19 17:37:43,973 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=15.0 2023-11-19 17:37:45,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=742133.3333333334, ans=0.07 2023-11-19 17:37:51,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=742200.0, ans=0.07 2023-11-19 17:37:52,662 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.43 vs. limit=10.0 2023-11-19 17:37:54,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=742200.0, ans=0.0 2023-11-19 17:38:10,386 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:38:11,349 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111350 2023-11-19 17:38:17,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=742333.3333333334, ans=0.04949747468305833 2023-11-19 17:38:26,717 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3150, loss[loss=0.0744, simple_loss=0.08225, pruned_loss=0.02198, audio_tagging_loss=0.0113, over 14527.00 frames. ], tot_loss[loss=0.08668, simple_loss=0.1056, pruned_loss=0.02331, audio_tagging_loss=0.01058, over 3047360.26 frames. 
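At batch 3000 above, the loop pauses for a validation pass (valid_interval=3000 in the config): the dev set is scored with gradients disabled, the frame-weighted averages are printed ("validation: loss=0.06437 ... over 4681554.00 frames"), and peak CUDA memory is reported. A minimal sketch of that step; the function and its arguments are mine:

    import torch

    def run_validation(model, valid_dl, compute_loss) -> None:
        """Score the dev set without gradients; report the frame-
        weighted average loss and peak GPU memory."""
        model.eval()
        loss_sum, frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_dl:
                loss, num_frames = compute_loss(model, batch)
                loss_sum += float(loss) * num_frames
                frames += num_frames
        model.train()
        print(f"validation: loss={loss_sum / frames:.4g}, "
              f"over {frames:.2f} frames")
        mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
        print(f"Maximum memory allocated so far is {mb}MB")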
], batch size: 56, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 17:39:05,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=742600.0, ans=0.95 2023-11-19 17:39:14,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=742600.0, ans=0.1 2023-11-19 17:39:16,402 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111400 2023-11-19 17:39:17,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=742666.6666666666, ans=0.2 2023-11-19 17:39:21,691 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.44 vs. limit=15.0 2023-11-19 17:39:25,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=742666.6666666666, ans=0.0 2023-11-19 17:39:28,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=742666.6666666666, ans=0.1 2023-11-19 17:39:32,236 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3200, loss[loss=0.1045, simple_loss=0.1219, pruned_loss=0.02978, audio_tagging_loss=0.01374, over 14693.00 frames. ], tot_loss[loss=0.08708, simple_loss=0.1064, pruned_loss=0.0233, audio_tagging_loss=0.0106, over 3049589.28 frames. ], batch size: 54, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 17:39:32,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=742733.3333333334, ans=0.0 2023-11-19 17:39:33,905 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:39:38,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=742733.3333333334, ans=0.0 2023-11-19 17:39:47,455 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.37 vs. limit=15.0 2023-11-19 17:39:50,364 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.737e+01 8.277e+01 9.297e+01 1.012e+02 1.250e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 17:40:10,328 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0 2023-11-19 17:40:22,046 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111450 2023-11-19 17:40:24,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=743000.0, ans=0.125 2023-11-19 17:40:37,268 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3250, loss[loss=0.08191, simple_loss=0.1042, pruned_loss=0.02021, audio_tagging_loss=0.009594, over 15027.00 frames. ], tot_loss[loss=0.08664, simple_loss=0.1058, pruned_loss=0.02311, audio_tagging_loss=0.01065, over 3052813.96 frames. ], batch size: 56, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 17:41:04,812 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.08 vs. 
limit=22.5 2023-11-19 17:41:15,781 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.72 vs. limit=10.0 2023-11-19 17:41:24,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=743266.6666666666, ans=0.125 2023-11-19 17:41:25,904 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111500 2023-11-19 17:41:28,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=743333.3333333334, ans=0.0 2023-11-19 17:41:40,588 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3300, loss[loss=0.092, simple_loss=0.1168, pruned_loss=0.02443, audio_tagging_loss=0.00917, over 15264.00 frames. ], tot_loss[loss=0.08755, simple_loss=0.1069, pruned_loss=0.02344, audio_tagging_loss=0.01068, over 3050576.63 frames. ], batch size: 56, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 17:41:47,551 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.77 vs. limit=15.0 2023-11-19 17:42:00,059 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.519e+01 8.050e+01 8.952e+01 9.862e+01 1.284e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-19 17:42:06,844 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.19 vs. limit=12.0 2023-11-19 17:42:18,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=743600.0, ans=0.125 2023-11-19 17:42:29,523 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111550 2023-11-19 17:42:45,505 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3350, loss[loss=0.1123, simple_loss=0.1376, pruned_loss=0.03433, audio_tagging_loss=0.009152, over 15752.00 frames. ], tot_loss[loss=0.08739, simple_loss=0.1068, pruned_loss=0.02342, audio_tagging_loss=0.01055, over 3057308.46 frames. ], batch size: 58, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 17:42:48,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=743733.3333333334, ans=0.125 2023-11-19 17:43:02,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=743800.0, ans=0.125 2023-11-19 17:43:04,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=743800.0, ans=0.125 2023-11-19 17:43:34,595 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111600 2023-11-19 17:43:41,956 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:43:42,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=744000.0, ans=0.1 2023-11-19 17:43:46,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=744000.0, ans=0.2 2023-11-19 17:43:48,452 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.65 vs. 
limit=15.0 2023-11-19 17:43:50,290 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3400, loss[loss=0.09192, simple_loss=0.1043, pruned_loss=0.02809, audio_tagging_loss=0.01165, over 14522.00 frames. ], tot_loss[loss=0.08676, simple_loss=0.106, pruned_loss=0.02329, audio_tagging_loss=0.01049, over 3056539.51 frames. ], batch size: 56, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 17:44:04,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=744133.3333333334, ans=0.125 2023-11-19 17:44:08,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=744133.3333333334, ans=0.0 2023-11-19 17:44:09,436 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.409e+01 9.014e+01 1.006e+02 1.399e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-19 17:44:18,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=744200.0, ans=0.125 2023-11-19 17:44:24,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=744200.0, ans=0.0 2023-11-19 17:44:29,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=744266.6666666666, ans=0.125 2023-11-19 17:44:31,284 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.90 vs. limit=15.0 2023-11-19 17:44:33,516 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.39 vs. limit=15.0 2023-11-19 17:44:34,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=744266.6666666666, ans=0.05 2023-11-19 17:44:39,207 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111650 2023-11-19 17:44:54,688 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3450, loss[loss=0.0842, simple_loss=0.1119, pruned_loss=0.02166, audio_tagging_loss=0.006581, over 14659.00 frames. ], tot_loss[loss=0.08665, simple_loss=0.1063, pruned_loss=0.02325, audio_tagging_loss=0.01027, over 3046612.91 frames. ], batch size: 55, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 17:44:55,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=744400.0, ans=0.125 2023-11-19 17:45:01,002 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. limit=15.0 2023-11-19 17:45:06,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=744400.0, ans=0.0 2023-11-19 17:45:22,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=744533.3333333334, ans=0.0 2023-11-19 17:45:31,570 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.23 vs. 
limit=5.0 2023-11-19 17:45:41,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=744600.0, ans=0.1 2023-11-19 17:45:43,782 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111700 2023-11-19 17:45:56,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=744666.6666666666, ans=0.125 2023-11-19 17:45:59,671 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3500, loss[loss=0.06945, simple_loss=0.08452, pruned_loss=0.01768, audio_tagging_loss=0.009507, over 15173.00 frames. ], tot_loss[loss=0.08653, simple_loss=0.1063, pruned_loss=0.02323, audio_tagging_loss=0.01017, over 3044752.64 frames. ], batch size: 59, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 17:46:01,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=744733.3333333334, ans=0.07 2023-11-19 17:46:01,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=744733.3333333334, ans=0.0 2023-11-19 17:46:18,194 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.875e+01 8.242e+01 8.864e+01 9.843e+01 1.271e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-19 17:46:31,159 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:46:45,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=744933.3333333334, ans=0.2 2023-11-19 17:46:48,452 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111750 2023-11-19 17:46:52,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=745000.0, ans=0.125 2023-11-19 17:47:03,406 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3550, loss[loss=0.1163, simple_loss=0.1358, pruned_loss=0.0386, audio_tagging_loss=0.009809, over 15091.00 frames. ], tot_loss[loss=0.08566, simple_loss=0.105, pruned_loss=0.02298, audio_tagging_loss=0.01016, over 3035616.87 frames. 
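The model.py:792 heartbeat every 50 batches keeps printing Freeze_encoder: False because this run sets freeze_encoder=False and freeze_encoder_steps=-1; the gate would only flip while the batch index sits below a positive step threshold. A sketch of that logic; the helper name is mine:

    def apply_encoder_freeze(model, batch_idx: int,
                             freeze_encoder_steps: int = -1) -> None:
        """Freeze encoder params for the first freeze_encoder_steps
        batches; -1 (as configured here) disables freezing."""
        freeze = 0 <= batch_idx < freeze_encoder_steps
        for p in model.encoder.parameters():
            p.requires_grad = not freeze
        if batch_idx % 50 == 0:
            print(f"Freeze_encoder: {freeze}; Current batch idx: {batch_idx}")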
], batch size: 54, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 17:47:12,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=745066.6666666666, ans=0.2 2023-11-19 17:47:24,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=745133.3333333334, ans=0.125 2023-11-19 17:47:29,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=745200.0, ans=0.2 2023-11-19 17:47:51,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=745266.6666666666, ans=0.1 2023-11-19 17:47:52,385 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111800 2023-11-19 17:47:52,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=745266.6666666666, ans=0.02 2023-11-19 17:47:52,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=745266.6666666666, ans=0.125 2023-11-19 17:47:57,224 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.02 vs. limit=15.0 2023-11-19 17:48:04,356 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:48:07,835 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3600, loss[loss=0.06534, simple_loss=0.07234, pruned_loss=0.01638, audio_tagging_loss=0.0128, over 15331.00 frames. ], tot_loss[loss=0.08641, simple_loss=0.106, pruned_loss=0.02328, audio_tagging_loss=0.01012, over 3040087.36 frames. ], batch size: 57, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 17:48:22,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=745466.6666666666, ans=0.0 2023-11-19 17:48:27,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=745466.6666666666, ans=10.0 2023-11-19 17:48:28,219 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.581e+01 8.234e+01 9.119e+01 9.988e+01 1.352e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 17:48:35,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=745533.3333333334, ans=0.0 2023-11-19 17:48:39,995 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.09 vs. limit=22.5 2023-11-19 17:48:42,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=745533.3333333334, ans=0.0 2023-11-19 17:48:55,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=745600.0, ans=0.1 2023-11-19 17:48:56,729 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111850 2023-11-19 17:49:13,524 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3650, loss[loss=0.131, simple_loss=0.1592, pruned_loss=0.04072, audio_tagging_loss=0.01063, over 15967.00 frames. ], tot_loss[loss=0.08666, simple_loss=0.1063, pruned_loss=0.0235, audio_tagging_loss=0.01002, over 3037703.30 frames. 
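The scaling.py:1118 entries (WithLoss: name=...self_attn_weights, loss-sum=0.000e+00) report an auxiliary penalty attached to the attention weights: an identity-in-forward autograd op leaves activations untouched while still routing gradient into the penalty, and here the penalty happens to be zero. A simplified sketch of such an op; the real scaling-module version also carries the name used for logging:

    import torch

    class WithLoss(torch.autograd.Function):
        """Return x unchanged, but make backward also feed a unit
        gradient into the attached auxiliary loss (sketch)."""
        @staticmethod
        def forward(ctx, x: torch.Tensor, aux_loss: torch.Tensor):
            ctx.aux_shape = aux_loss.shape
            return x

        @staticmethod
        def backward(ctx, grad_out: torch.Tensor):
            aux_grad = torch.ones(ctx.aux_shape, dtype=grad_out.dtype,
                                  device=grad_out.device)
            return grad_out, aux_grad

    x = torch.randn(4, 8, requires_grad=True)
    aux = (x.softmax(dim=-1) * 0.0).sum()   # a zero penalty, as in the log
    y = WithLoss.apply(x, aux)              # forward output identical to x
    y.sum().backward()                      # aux still receives gradient 1.0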
], batch size: 56, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 17:49:19,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=745733.3333333334, ans=0.1 2023-11-19 17:49:33,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=745800.0, ans=0.2 2023-11-19 17:49:37,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=745866.6666666666, ans=0.125 2023-11-19 17:49:49,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=745933.3333333334, ans=0.125 2023-11-19 17:49:49,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=745933.3333333334, ans=0.125 2023-11-19 17:50:02,620 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111900 2023-11-19 17:50:07,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=746000.0, ans=0.0 2023-11-19 17:50:15,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746000.0, ans=0.1 2023-11-19 17:50:16,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=746066.6666666666, ans=0.125 2023-11-19 17:50:17,570 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3700, loss[loss=0.07879, simple_loss=0.09764, pruned_loss=0.02204, audio_tagging_loss=0.007939, over 15249.00 frames. ], tot_loss[loss=0.08644, simple_loss=0.1062, pruned_loss=0.02334, audio_tagging_loss=0.01002, over 3044142.19 frames. ], batch size: 58, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 17:50:22,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=746066.6666666666, ans=0.0 2023-11-19 17:50:28,190 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2023-11-19 17:50:35,746 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.609e+01 9.395e+01 1.090e+02 1.567e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 17:51:03,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.96 vs. limit=15.0 2023-11-19 17:51:05,381 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 111950 2023-11-19 17:51:05,699 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:51:20,094 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3750, loss[loss=0.08208, simple_loss=0.09442, pruned_loss=0.02133, audio_tagging_loss=0.01354, over 13793.00 frames. ], tot_loss[loss=0.0866, simple_loss=0.1063, pruned_loss=0.02332, audio_tagging_loss=0.01013, over 3045269.30 frames. 
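Just below, the batch index reaches 112000, a multiple of the configured save_every_n=4000, which is when the recipe writes a periodic checkpoint; nothing is echoed at that point here, presumably because saving happens on rank 0 while this log is from rank 2 of 4 (the "(2/4)" prefix). A sketch of that cadence; the exact condition in train_asr.py may differ:

    def should_save_checkpoint(batch_idx_train: int, rank: int,
                               save_every_n: int = 4000) -> bool:
        """Periodic checkpointing: rank 0 writes checkpoint-<N>.pt
        every save_every_n training batches; other ranks stay quiet."""
        return (rank == 0 and batch_idx_train > 0
                and batch_idx_train % save_every_n == 0)

    print(should_save_checkpoint(112000, rank=0))   # True
    print(should_save_checkpoint(112000, rank=2))   # False: nothing logged here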
], batch size: 55, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:51:24,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=746400.0, ans=0.125 2023-11-19 17:51:45,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=746533.3333333334, ans=15.0 2023-11-19 17:51:47,250 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=15.0 2023-11-19 17:52:00,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=746600.0, ans=0.0 2023-11-19 17:52:00,589 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2023-11-19 17:52:03,715 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:52:05,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=746600.0, ans=0.2 2023-11-19 17:52:08,666 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112000 2023-11-19 17:52:23,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=746666.6666666666, ans=0.0 2023-11-19 17:52:28,347 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3800, loss[loss=0.088, simple_loss=0.103, pruned_loss=0.02538, audio_tagging_loss=0.01113, over 15634.00 frames. ], tot_loss[loss=0.08677, simple_loss=0.1062, pruned_loss=0.02338, audio_tagging_loss=0.01028, over 3046383.95 frames. ], batch size: 56, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:52:42,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=746800.0, ans=0.1 2023-11-19 17:52:46,642 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.263e+01 8.531e+01 9.323e+01 1.047e+02 1.478e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-19 17:52:47,316 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.86 vs. 
limit=10.0 2023-11-19 17:52:53,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=746866.6666666666, ans=0.125 2023-11-19 17:53:12,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=746933.3333333334, ans=0.125 2023-11-19 17:53:17,000 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112050 2023-11-19 17:53:20,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=747000.0, ans=0.125 2023-11-19 17:53:25,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=747000.0, ans=0.1 2023-11-19 17:53:31,509 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3850, loss[loss=0.08296, simple_loss=0.1081, pruned_loss=0.01848, audio_tagging_loss=0.01043, over 14941.00 frames. ], tot_loss[loss=0.08655, simple_loss=0.1058, pruned_loss=0.02319, audio_tagging_loss=0.01045, over 3042296.92 frames. ], batch size: 55, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:53:37,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=747066.6666666666, ans=0.125 2023-11-19 17:53:40,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=747066.6666666666, ans=0.0 2023-11-19 17:53:48,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=747133.3333333334, ans=0.125 2023-11-19 17:53:51,973 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.49 vs. limit=22.5 2023-11-19 17:54:01,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=747200.0, ans=0.125 2023-11-19 17:54:13,589 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.10 vs. limit=10.0 2023-11-19 17:54:20,336 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112100 2023-11-19 17:54:30,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=747333.3333333334, ans=0.2 2023-11-19 17:54:35,300 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3900, loss[loss=0.1346, simple_loss=0.1433, pruned_loss=0.05597, audio_tagging_loss=0.007019, over 15115.00 frames. ], tot_loss[loss=0.08608, simple_loss=0.1047, pruned_loss=0.02322, audio_tagging_loss=0.01053, over 3042549.70 frames. ], batch size: 55, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:54:38,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=747400.0, ans=0.125 2023-11-19 17:54:53,368 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.26 vs. 
limit=15.0 2023-11-19 17:54:55,652 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.123e+01 8.433e+01 9.481e+01 1.017e+02 1.565e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-19 17:54:59,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=747466.6666666666, ans=0.0 2023-11-19 17:55:09,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=747533.3333333334, ans=0.125 2023-11-19 17:55:24,489 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112150 2023-11-19 17:55:31,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=747666.6666666666, ans=0.02 2023-11-19 17:55:39,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=747733.3333333334, ans=0.0 2023-11-19 17:55:40,517 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 3950, loss[loss=0.09899, simple_loss=0.1337, pruned_loss=0.02452, audio_tagging_loss=0.007601, over 15895.00 frames. ], tot_loss[loss=0.08556, simple_loss=0.1043, pruned_loss=0.02283, audio_tagging_loss=0.01056, over 3057202.22 frames. ], batch size: 56, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:55:49,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=747733.3333333334, ans=0.125 2023-11-19 17:55:52,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=747800.0, ans=0.1 2023-11-19 17:55:52,763 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.98 vs. limit=10.0 2023-11-19 17:56:04,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=747866.6666666666, ans=0.1 2023-11-19 17:56:07,839 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.39 vs. limit=22.5 2023-11-19 17:56:09,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=747866.6666666666, ans=0.0 2023-11-19 17:56:20,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=747933.3333333334, ans=0.1 2023-11-19 17:56:29,207 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112200 2023-11-19 17:56:34,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=748000.0, ans=0.0 2023-11-19 17:56:37,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=748000.0, ans=0.125 2023-11-19 17:56:43,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=748066.6666666666, ans=0.2 2023-11-19 17:56:44,791 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4000, loss[loss=0.08145, simple_loss=0.09234, pruned_loss=0.02027, audio_tagging_loss=0.01501, over 15140.00 frames. ], tot_loss[loss=0.08651, simple_loss=0.1051, pruned_loss=0.0232, audio_tagging_loss=0.01078, over 3050677.10 frames. 
], batch size: 57, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:56:59,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=748133.3333333334, ans=0.0 2023-11-19 17:57:02,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=748133.3333333334, ans=0.0 2023-11-19 17:57:04,933 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.918e+01 8.458e+01 9.188e+01 1.037e+02 1.473e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-19 17:57:34,112 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112250 2023-11-19 17:57:34,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=748266.6666666666, ans=0.125 2023-11-19 17:57:46,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=748333.3333333334, ans=0.125 2023-11-19 17:57:48,609 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4050, loss[loss=0.0855, simple_loss=0.1035, pruned_loss=0.02458, audio_tagging_loss=0.00918, over 15681.00 frames. ], tot_loss[loss=0.08709, simple_loss=0.1058, pruned_loss=0.02347, audio_tagging_loss=0.01074, over 3047565.74 frames. ], batch size: 57, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 17:57:51,097 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:57:56,493 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2023-11-19 17:58:09,101 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.92 vs. limit=12.0 2023-11-19 17:58:35,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=748600.0, ans=0.125 2023-11-19 17:58:37,386 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112300 2023-11-19 17:58:38,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=748666.6666666666, ans=0.125 2023-11-19 17:58:43,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=748666.6666666666, ans=0.125 2023-11-19 17:58:49,999 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.20 vs. limit=15.0 2023-11-19 17:58:52,496 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4100, loss[loss=0.09098, simple_loss=0.1124, pruned_loss=0.02354, audio_tagging_loss=0.01126, over 15380.00 frames. ], tot_loss[loss=0.08661, simple_loss=0.1053, pruned_loss=0.02324, audio_tagging_loss=0.01074, over 3036278.04 frames. 
], batch size: 57, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 17:58:58,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=748733.3333333334, ans=0.125 2023-11-19 17:59:13,890 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.713e+01 8.246e+01 9.038e+01 9.964e+01 1.289e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 17:59:28,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=748866.6666666666, ans=0.2 2023-11-19 17:59:42,672 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112350 2023-11-19 17:59:51,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=749000.0, ans=0.125 2023-11-19 17:59:57,408 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4150, loss[loss=0.07421, simple_loss=0.08647, pruned_loss=0.02137, audio_tagging_loss=0.009605, over 15915.00 frames. ], tot_loss[loss=0.08602, simple_loss=0.1048, pruned_loss=0.02304, audio_tagging_loss=0.01055, over 3034373.37 frames. ], batch size: 59, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 18:00:07,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=749066.6666666666, ans=0.0 2023-11-19 18:00:17,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=749133.3333333334, ans=0.125 2023-11-19 18:00:43,995 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 18:00:46,525 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112400 2023-11-19 18:01:01,958 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4200, loss[loss=0.06581, simple_loss=0.07808, pruned_loss=0.01593, audio_tagging_loss=0.01084, over 15144.00 frames. ], tot_loss[loss=0.08621, simple_loss=0.1053, pruned_loss=0.0232, audio_tagging_loss=0.01036, over 3030988.24 frames. ], batch size: 59, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 18:01:23,164 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.470e+01 8.967e+01 9.932e+01 1.345e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-19 18:01:50,892 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112450 2023-11-19 18:02:05,492 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4250, loss[loss=0.06291, simple_loss=0.07181, pruned_loss=0.01401, audio_tagging_loss=0.01299, over 16014.00 frames. ], tot_loss[loss=0.08518, simple_loss=0.1041, pruned_loss=0.02276, audio_tagging_loss=0.01036, over 3032640.13 frames. ], batch size: 63, lr: 6.99e-03, grad_scale: 16.0 2023-11-19 18:02:22,586 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. 
limit=15.0 2023-11-19 18:02:30,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=749866.6666666666, ans=0.0 2023-11-19 18:02:51,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=749933.3333333334, ans=15.0 2023-11-19 18:02:54,485 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112500 2023-11-19 18:02:58,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=750000.0, ans=0.0 2023-11-19 18:03:00,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=750000.0, ans=0.0 2023-11-19 18:03:07,381 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.50 vs. limit=6.0 2023-11-19 18:03:10,402 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4300, loss[loss=0.08998, simple_loss=0.1115, pruned_loss=0.02247, audio_tagging_loss=0.01174, over 17138.00 frames. ], tot_loss[loss=0.08597, simple_loss=0.1055, pruned_loss=0.02308, audio_tagging_loss=0.01013, over 3034890.89 frames. ], batch size: 65, lr: 6.99e-03, grad_scale: 16.0 2023-11-19 18:03:14,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=750066.6666666666, ans=0.125 2023-11-19 18:03:16,043 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=15.0 2023-11-19 18:03:18,542 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.74 vs. limit=10.0 2023-11-19 18:03:26,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=750133.3333333334, ans=0.125 2023-11-19 18:03:29,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=750133.3333333334, ans=0.125 2023-11-19 18:03:31,885 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.838e+01 9.432e+01 1.009e+02 1.921e+02, threshold=1.886e+02, percent-clipped=1.0 2023-11-19 18:03:33,853 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0 2023-11-19 18:03:59,309 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112550 2023-11-19 18:04:14,912 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4350, loss[loss=0.1014, simple_loss=0.128, pruned_loss=0.02751, audio_tagging_loss=0.009896, over 15599.00 frames. ], tot_loss[loss=0.08565, simple_loss=0.105, pruned_loss=0.02298, audio_tagging_loss=0.01015, over 3030619.49 frames. 
], batch size: 57, lr: 6.99e-03, grad_scale: 16.0 2023-11-19 18:04:17,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=750400.0, ans=0.125 2023-11-19 18:04:26,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=750466.6666666666, ans=0.1 2023-11-19 18:04:31,492 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.19 vs. limit=15.0 2023-11-19 18:04:39,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=750533.3333333334, ans=0.125 2023-11-19 18:05:03,460 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112600 2023-11-19 18:05:04,165 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.56 vs. limit=10.0 2023-11-19 18:05:11,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=750666.6666666666, ans=0.05 2023-11-19 18:05:18,574 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4400, loss[loss=0.07204, simple_loss=0.08441, pruned_loss=0.01867, audio_tagging_loss=0.01116, over 13634.00 frames. ], tot_loss[loss=0.08573, simple_loss=0.1049, pruned_loss=0.02305, audio_tagging_loss=0.01024, over 3025095.60 frames. ], batch size: 53, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:05:21,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=750733.3333333334, ans=0.125 2023-11-19 18:05:27,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=750733.3333333334, ans=0.2 2023-11-19 18:05:40,671 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.205e+01 8.716e+01 9.862e+01 1.282e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-19 18:05:43,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=750866.6666666666, ans=0.125 2023-11-19 18:05:47,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=750866.6666666666, ans=0.125 2023-11-19 18:05:57,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=750933.3333333334, ans=0.2 2023-11-19 18:06:00,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=750933.3333333334, ans=0.125 2023-11-19 18:06:01,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=750933.3333333334, ans=0.1 2023-11-19 18:06:06,578 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.42 vs. limit=15.0 2023-11-19 18:06:07,251 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112650 2023-11-19 18:06:23,243 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4450, loss[loss=0.1293, simple_loss=0.1655, pruned_loss=0.03889, audio_tagging_loss=0.007636, over 15910.00 frames. 
], tot_loss[loss=0.08589, simple_loss=0.105, pruned_loss=0.02312, audio_tagging_loss=0.01028, over 3032100.05 frames. ], batch size: 54, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:06:27,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=751066.6666666666, ans=0.0 2023-11-19 18:06:36,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=751133.3333333334, ans=0.125 2023-11-19 18:06:39,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=751133.3333333334, ans=0.125 2023-11-19 18:06:53,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=751200.0, ans=0.125 2023-11-19 18:07:12,209 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112700 2023-11-19 18:07:13,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=751333.3333333334, ans=0.125 2023-11-19 18:07:15,310 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.23 vs. limit=22.5 2023-11-19 18:07:26,860 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4500, loss[loss=0.07381, simple_loss=0.08901, pruned_loss=0.01974, audio_tagging_loss=0.009558, over 15890.00 frames. ], tot_loss[loss=0.08597, simple_loss=0.1052, pruned_loss=0.0231, audio_tagging_loss=0.01028, over 3037854.94 frames. ], batch size: 61, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:07:36,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=751400.0, ans=0.0 2023-11-19 18:07:48,353 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.958e+01 8.242e+01 8.889e+01 9.724e+01 1.502e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-19 18:07:48,960 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.31 vs. limit=10.0 2023-11-19 18:07:57,822 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.46 vs. limit=15.0 2023-11-19 18:08:15,542 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112750 2023-11-19 18:08:30,954 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4550, loss[loss=0.08731, simple_loss=0.1094, pruned_loss=0.02178, audio_tagging_loss=0.0108, over 15510.00 frames. ], tot_loss[loss=0.08597, simple_loss=0.1052, pruned_loss=0.02312, audio_tagging_loss=0.01025, over 3037806.20 frames. ], batch size: 57, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:09:09,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=751933.3333333334, ans=0.125 2023-11-19 18:09:13,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=751933.3333333334, ans=0.0 2023-11-19 18:09:16,760 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.16 vs. limit=15.0 2023-11-19 18:09:19,659 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 18:09:19,735 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112800 2023-11-19 18:09:23,114 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=6.0 2023-11-19 18:09:29,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=752000.0, ans=0.0 2023-11-19 18:09:36,074 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4600, loss[loss=0.08629, simple_loss=0.1059, pruned_loss=0.02283, audio_tagging_loss=0.0105, over 14998.00 frames. ], tot_loss[loss=0.08579, simple_loss=0.1049, pruned_loss=0.02302, audio_tagging_loss=0.01031, over 3035553.94 frames. ], batch size: 55, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:09:44,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=752066.6666666666, ans=0.07 2023-11-19 18:09:56,975 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.492e+01 8.205e+01 8.855e+01 9.599e+01 1.553e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-19 18:10:06,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=752200.0, ans=0.125 2023-11-19 18:10:20,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=752266.6666666666, ans=0.125 2023-11-19 18:10:24,875 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112850 2023-11-19 18:10:26,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=752333.3333333334, ans=0.1 2023-11-19 18:10:39,525 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4650, loss[loss=0.09338, simple_loss=0.1069, pruned_loss=0.02772, audio_tagging_loss=0.01223, over 14302.00 frames. ], tot_loss[loss=0.08523, simple_loss=0.1037, pruned_loss=0.02289, audio_tagging_loss=0.01047, over 3041770.78 frames. 
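The WARNING records exclude AudioSet cuts whose placeholder transcript is longer, in tokens, than the utterance is in frames after subsampling (24 tokens vs. 23 frames here): a transducer alignment cannot emit more symbols than it has frames. The 100 -> 23 mapping is consistent with T_out = (T_in - 7) // 4 for the convolutional frontend, a formula inferred from the logged pair rather than quoted from the code. A filter of this shape, assuming a lhotse-style cut and a SentencePiece processor, with hypothetical helper names:

    import logging

    def frames_after_subsampling(num_frames: int) -> int:
        # Inferred from the logged pair: 100 frames in -> 23 frames out.
        return (num_frames - 7) // 4

    def keep_cut(cut, sp) -> bool:
        # Drop cuts with more BPE tokens than post-subsampling frames.
        tokens = sp.encode(cut.supervisions[0].text, out_type=str)
        T = frames_after_subsampling(cut.num_frames)
        if T < len(tokens):
            logging.warning(
                f"Exclude cut with ID {cut.id} from training. "
                f"Frames after subsampling: {T}. Number of tokens: {len(tokens)}"
            )
            return False
        return True

    # applied as, e.g.:  cuts = cuts.filter(lambda c: keep_cut(c, sp))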
], batch size: 57, lr: 6.98e-03, grad_scale: 16.0 2023-11-19 18:10:46,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=752400.0, ans=0.0 2023-11-19 18:10:56,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=752466.6666666666, ans=0.1 2023-11-19 18:11:12,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=752533.3333333334, ans=0.0 2023-11-19 18:11:22,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=752600.0, ans=0.2 2023-11-19 18:11:25,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=752600.0, ans=0.1 2023-11-19 18:11:28,745 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112900 2023-11-19 18:11:31,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=752666.6666666666, ans=0.1 2023-11-19 18:11:43,078 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4700, loss[loss=0.08328, simple_loss=0.1002, pruned_loss=0.02008, audio_tagging_loss=0.01312, over 15603.00 frames. ], tot_loss[loss=0.0853, simple_loss=0.1039, pruned_loss=0.0228, audio_tagging_loss=0.01055, over 3043489.01 frames. ], batch size: 57, lr: 6.97e-03, grad_scale: 16.0 2023-11-19 18:12:06,752 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.509e+01 9.287e+01 1.024e+02 1.353e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-19 18:12:23,680 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2023-11-19 18:12:31,872 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 112950 2023-11-19 18:12:36,743 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.79 vs. limit=15.0 2023-11-19 18:12:49,172 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4750, loss[loss=0.1191, simple_loss=0.1479, pruned_loss=0.03473, audio_tagging_loss=0.01042, over 15206.00 frames. ], tot_loss[loss=0.08481, simple_loss=0.1033, pruned_loss=0.02272, audio_tagging_loss=0.01046, over 3037242.89 frames. ], batch size: 54, lr: 6.97e-03, grad_scale: 16.0 2023-11-19 18:13:07,359 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=12.0 2023-11-19 18:13:37,916 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113000 2023-11-19 18:13:43,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=753333.3333333334, ans=0.1 2023-11-19 18:13:46,394 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.38 vs. 
limit=15.0 2023-11-19 18:13:47,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=753333.3333333334, ans=0.2 2023-11-19 18:13:52,999 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4800, loss[loss=0.07999, simple_loss=0.09618, pruned_loss=0.02014, audio_tagging_loss=0.01177, over 14953.00 frames. ], tot_loss[loss=0.08523, simple_loss=0.1036, pruned_loss=0.02291, audio_tagging_loss=0.01049, over 3039048.01 frames. ], batch size: 58, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:13:56,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=753400.0, ans=0.125 2023-11-19 18:13:57,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=753400.0, ans=0.125 2023-11-19 18:14:00,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=753400.0, ans=0.125 2023-11-19 18:14:15,227 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.505e+01 9.415e+01 1.014e+02 1.501e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-19 18:14:24,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=753533.3333333334, ans=0.0 2023-11-19 18:14:41,517 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113050 2023-11-19 18:14:50,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=753666.6666666666, ans=0.0 2023-11-19 18:14:55,999 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4850, loss[loss=0.06522, simple_loss=0.07026, pruned_loss=0.01764, audio_tagging_loss=0.01245, over 14694.00 frames. ], tot_loss[loss=0.08563, simple_loss=0.1043, pruned_loss=0.02286, audio_tagging_loss=0.01061, over 3041877.61 frames. ], batch size: 56, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:15:01,446 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.06 vs. limit=15.0 2023-11-19 18:15:03,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=753733.3333333334, ans=0.125 2023-11-19 18:15:44,953 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113100 2023-11-19 18:16:01,407 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4900, loss[loss=0.08098, simple_loss=0.1013, pruned_loss=0.01819, audio_tagging_loss=0.01213, over 14938.00 frames. ], tot_loss[loss=0.08579, simple_loss=0.1048, pruned_loss=0.02281, audio_tagging_loss=0.01059, over 3035038.07 frames. ], batch size: 58, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:16:07,099 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.42 vs. 
limit=15.0 2023-11-19 18:16:23,320 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.624e+01 8.196e+01 8.687e+01 9.230e+01 1.120e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-19 18:16:49,913 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113150 2023-11-19 18:16:50,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=754266.6666666666, ans=0.1 2023-11-19 18:16:56,588 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0 2023-11-19 18:17:00,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=754333.3333333334, ans=0.2 2023-11-19 18:17:04,383 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 4950, loss[loss=0.08984, simple_loss=0.1151, pruned_loss=0.02231, audio_tagging_loss=0.009973, over 14977.00 frames. ], tot_loss[loss=0.08569, simple_loss=0.1049, pruned_loss=0.0228, audio_tagging_loss=0.01042, over 3039240.62 frames. ], batch size: 54, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:17:06,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=754400.0, ans=0.95 2023-11-19 18:17:08,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=754400.0, ans=0.2 2023-11-19 18:17:14,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=754400.0, ans=0.1 2023-11-19 18:17:20,951 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.52 vs. limit=15.0 2023-11-19 18:17:38,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=754533.3333333334, ans=0.0 2023-11-19 18:17:46,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=754600.0, ans=0.125 2023-11-19 18:17:52,789 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113200 2023-11-19 18:17:54,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=754666.6666666666, ans=0.0 2023-11-19 18:18:07,775 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5000, loss[loss=0.0748, simple_loss=0.09448, pruned_loss=0.01872, audio_tagging_loss=0.00884, over 16077.00 frames. ], tot_loss[loss=0.08577, simple_loss=0.1054, pruned_loss=0.02279, audio_tagging_loss=0.01029, over 3044968.06 frames. 
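The lr column decays very slowly here (7.01e-03 at batch 3600 down to 6.97e-03 by batch 4900) because both the batch count and the epoch are far past the schedule's knees. This is the behaviour of icefall's Eden schedule; the function below re-types its published formula (ignoring the warm-up factor some versions apply), with base_lr/lr_batches/lr_epochs as placeholder values rather than this run's config.

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Eden: lr shrinks as batch count and epoch grow past the two knees.
        b = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        e = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * b * e

    # With the zipformer recipes' customary base_lr of 0.045, epoch 10 and
    # batch ~113000 give ~6.7e-03: the same order as the logged 6.97e-03.
    print(eden_lr(0.045, 113_000, 10.0))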
], batch size: 61, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:18:14,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=754733.3333333334, ans=0.2 2023-11-19 18:18:31,430 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.773e+01 8.065e+01 8.852e+01 9.668e+01 1.212e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-19 18:18:31,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=754800.0, ans=0.125 2023-11-19 18:18:55,829 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113250 2023-11-19 18:19:00,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=755000.0, ans=0.0 2023-11-19 18:19:04,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=755000.0, ans=0.1 2023-11-19 18:19:11,136 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5050, loss[loss=0.07556, simple_loss=0.1046, pruned_loss=0.01712, audio_tagging_loss=0.006143, over 15300.00 frames. ], tot_loss[loss=0.08594, simple_loss=0.1055, pruned_loss=0.02289, audio_tagging_loss=0.01027, over 3052362.14 frames. ], batch size: 58, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:19:36,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=755200.0, ans=0.125 2023-11-19 18:19:53,134 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.93 vs. limit=15.0 2023-11-19 18:19:59,442 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113300 2023-11-19 18:20:04,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=755333.3333333334, ans=0.125 2023-11-19 18:20:04,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=755333.3333333334, ans=0.125 2023-11-19 18:20:15,871 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5100, loss[loss=0.1111, simple_loss=0.1391, pruned_loss=0.03224, audio_tagging_loss=0.009338, over 16054.00 frames. ], tot_loss[loss=0.08596, simple_loss=0.1057, pruned_loss=0.02292, audio_tagging_loss=0.01019, over 3043915.56 frames. ], batch size: 56, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:20:25,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755400.0, ans=0.1 2023-11-19 18:20:29,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=755466.6666666666, ans=0.125 2023-11-19 18:20:38,100 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.347e+01 7.917e+01 8.813e+01 9.876e+01 1.323e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-19 18:20:43,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=755533.3333333334, ans=0.0 2023-11-19 18:20:49,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=755533.3333333334, ans=0.0 2023-11-19 18:20:49,373 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.78 vs. 
limit=22.5 2023-11-19 18:20:53,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=755600.0, ans=0.0 2023-11-19 18:20:58,128 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.71 vs. limit=15.0 2023-11-19 18:21:01,539 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=22.5 2023-11-19 18:21:02,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=755600.0, ans=0.125 2023-11-19 18:21:04,699 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113350 2023-11-19 18:21:17,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=755666.6666666666, ans=15.0 2023-11-19 18:21:19,276 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5150, loss[loss=0.06879, simple_loss=0.0818, pruned_loss=0.01789, audio_tagging_loss=0.01, over 14647.00 frames. ], tot_loss[loss=0.08514, simple_loss=0.1048, pruned_loss=0.02254, audio_tagging_loss=0.01023, over 3047800.64 frames. ], batch size: 58, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:21:31,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=755800.0, ans=0.125 2023-11-19 18:21:31,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=755800.0, ans=0.125 2023-11-19 18:21:31,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=755800.0, ans=0.0 2023-11-19 18:21:35,777 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.95 vs. limit=15.0 2023-11-19 18:21:53,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=755866.6666666666, ans=0.125 2023-11-19 18:21:58,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=755933.3333333334, ans=0.0 2023-11-19 18:22:07,689 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113400 2023-11-19 18:22:07,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=755933.3333333334, ans=0.125 2023-11-19 18:22:18,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=756000.0, ans=0.2 2023-11-19 18:22:21,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=756066.6666666666, ans=0.125 2023-11-19 18:22:22,824 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5200, loss[loss=0.09976, simple_loss=0.1292, pruned_loss=0.02738, audio_tagging_loss=0.007789, over 15551.00 frames. ], tot_loss[loss=0.08614, simple_loss=0.1058, pruned_loss=0.02303, audio_tagging_loss=0.01023, over 3046093.93 frames. ], batch size: 59, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:22:29,454 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.43 vs. 
limit=15.0 2023-11-19 18:22:44,789 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2023-11-19 18:22:45,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=756133.3333333334, ans=0.1 2023-11-19 18:22:46,626 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 8.495e+01 9.298e+01 1.017e+02 1.203e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 18:22:48,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=756200.0, ans=0.0 2023-11-19 18:23:06,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=756266.6666666666, ans=0.125 2023-11-19 18:23:11,162 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113450 2023-11-19 18:23:27,290 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5250, loss[loss=0.09059, simple_loss=0.114, pruned_loss=0.0238, audio_tagging_loss=0.009804, over 15921.00 frames. ], tot_loss[loss=0.08579, simple_loss=0.1055, pruned_loss=0.02286, audio_tagging_loss=0.01017, over 3044437.60 frames. ], batch size: 59, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:23:40,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756466.6666666666, ans=0.1 2023-11-19 18:23:43,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=756466.6666666666, ans=0.125 2023-11-19 18:23:51,276 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2023-11-19 18:23:55,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=756533.3333333334, ans=0.125 2023-11-19 18:24:14,549 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113500 2023-11-19 18:24:29,869 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5300, loss[loss=0.09387, simple_loss=0.1059, pruned_loss=0.02774, audio_tagging_loss=0.01317, over 14908.00 frames. ], tot_loss[loss=0.08614, simple_loss=0.106, pruned_loss=0.02295, audio_tagging_loss=0.01021, over 3034908.55 frames. ], batch size: 56, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:24:40,341 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.04 vs. limit=15.0 2023-11-19 18:24:53,118 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.322e+01 9.046e+01 9.978e+01 1.366e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-19 18:25:13,982 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.52 vs. 
limit=22.5 2023-11-19 18:25:19,321 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113550 2023-11-19 18:25:19,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=756933.3333333334, ans=0.0 2023-11-19 18:25:31,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=757000.0, ans=0.125 2023-11-19 18:25:33,896 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5350, loss[loss=0.06847, simple_loss=0.08227, pruned_loss=0.01975, audio_tagging_loss=0.007578, over 15106.00 frames. ], tot_loss[loss=0.08605, simple_loss=0.1058, pruned_loss=0.02292, audio_tagging_loss=0.01023, over 3039284.74 frames. ], batch size: 57, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 18:25:36,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=757066.6666666666, ans=0.0 2023-11-19 18:25:39,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=757066.6666666666, ans=0.2 2023-11-19 18:26:07,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=757200.0, ans=0.125 2023-11-19 18:26:15,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=757266.6666666666, ans=0.0 2023-11-19 18:26:22,802 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113600 2023-11-19 18:26:30,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=757333.3333333334, ans=0.125 2023-11-19 18:26:39,007 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5400, loss[loss=0.0925, simple_loss=0.1107, pruned_loss=0.02583, audio_tagging_loss=0.0113, over 14720.00 frames. ], tot_loss[loss=0.08612, simple_loss=0.106, pruned_loss=0.0229, audio_tagging_loss=0.01024, over 3044591.43 frames. 
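The Whitening lines compare a statistic of an activation's channel covariance (metric=...) against a module-specific limit; the module only intervenes, nudging features toward a whiter covariance, when the metric exceeds the limit. One plausible reading of the metric, as the eigenvalue dispersion E[lam^2] / E[lam]^2 of the covariance (exactly 1.0 for perfectly white features), is sketched below; the precise definition lives in scaling.py and may differ in constants.

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels). Returns E[lam^2] / E[lam]^2 over the
        # eigenvalues of the channel covariance: 1.0 when all directions carry
        # equal energy, large when energy concentrates in a few directions.
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)   # real, >= 0 for a covariance
        return float((eigs ** 2).mean() / eigs.mean() ** 2)

    x = torch.randn(1000, 512) @ torch.randn(512, 512)  # correlated channels
    print(whitening_metric(x))  # >> 1.0; the log compares this to limit=15.0 etc.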
], batch size: 57, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 18:26:53,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=757466.6666666666, ans=0.125 2023-11-19 18:26:57,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=757466.6666666666, ans=0.125 2023-11-19 18:27:01,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=757466.6666666666, ans=0.0 2023-11-19 18:27:02,054 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 8.152e+01 8.655e+01 9.837e+01 1.259e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-19 18:27:10,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=757533.3333333334, ans=0.05 2023-11-19 18:27:27,995 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113650 2023-11-19 18:27:36,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=757666.6666666666, ans=0.125 2023-11-19 18:27:41,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=757666.6666666666, ans=0.0 2023-11-19 18:27:43,283 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5450, loss[loss=0.09264, simple_loss=0.1101, pruned_loss=0.0254, audio_tagging_loss=0.01217, over 14338.00 frames. ], tot_loss[loss=0.08675, simple_loss=0.1066, pruned_loss=0.02317, audio_tagging_loss=0.01028, over 3041539.12 frames. ], batch size: 57, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 18:27:43,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=757733.3333333334, ans=0.0 2023-11-19 18:27:50,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=757733.3333333334, ans=0.0 2023-11-19 18:27:57,466 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=15.0 2023-11-19 18:28:21,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=757933.3333333334, ans=0.0 2023-11-19 18:28:32,018 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113700 2023-11-19 18:28:46,393 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5500, loss[loss=0.08685, simple_loss=0.1088, pruned_loss=0.01942, audio_tagging_loss=0.01304, over 14657.00 frames. ], tot_loss[loss=0.08609, simple_loss=0.1056, pruned_loss=0.0229, audio_tagging_loss=0.01037, over 3038101.52 frames. ], batch size: 57, lr: 6.95e-03, grad_scale: 16.0 2023-11-19 18:28:48,349 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.76 vs. 
limit=15.0 2023-11-19 18:28:49,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=758066.6666666666, ans=0.125 2023-11-19 18:28:50,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=758066.6666666666, ans=0.125 2023-11-19 18:28:50,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=758066.6666666666, ans=0.125 2023-11-19 18:29:01,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=758133.3333333334, ans=0.1 2023-11-19 18:29:10,978 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.752e+01 8.267e+01 8.902e+01 9.734e+01 1.914e+02, threshold=1.780e+02, percent-clipped=1.0 2023-11-19 18:29:14,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=758200.0, ans=0.125 2023-11-19 18:29:29,473 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=12.0 2023-11-19 18:29:31,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=758266.6666666666, ans=0.0 2023-11-19 18:29:34,951 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113750 2023-11-19 18:29:39,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=758333.3333333334, ans=0.2 2023-11-19 18:29:47,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=758333.3333333334, ans=0.2 2023-11-19 18:29:48,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=758333.3333333334, ans=0.125 2023-11-19 18:29:51,183 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5550, loss[loss=0.08541, simple_loss=0.1033, pruned_loss=0.02204, audio_tagging_loss=0.01173, over 13639.00 frames. ], tot_loss[loss=0.08573, simple_loss=0.105, pruned_loss=0.02282, audio_tagging_loss=0.01043, over 3036596.32 frames. 
], batch size: 53, lr: 6.95e-03, grad_scale: 16.0 2023-11-19 18:29:51,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=758400.0, ans=0.0 2023-11-19 18:29:55,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=758400.0, ans=0.125 2023-11-19 18:29:55,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=758400.0, ans=0.1 2023-11-19 18:29:56,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=758400.0, ans=0.1 2023-11-19 18:29:59,862 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:30:02,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=758466.6666666666, ans=0.0 2023-11-19 18:30:07,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=758466.6666666666, ans=0.07 2023-11-19 18:30:38,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=758600.0, ans=0.0 2023-11-19 18:30:39,388 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113800 2023-11-19 18:30:39,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=758600.0, ans=0.125 2023-11-19 18:30:51,477 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2023-11-19 18:30:53,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=758733.3333333334, ans=0.025 2023-11-19 18:30:54,384 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5600, loss[loss=0.08621, simple_loss=0.1082, pruned_loss=0.02226, audio_tagging_loss=0.009849, over 15424.00 frames. ], tot_loss[loss=0.08666, simple_loss=0.1062, pruned_loss=0.02309, audio_tagging_loss=0.01049, over 3042381.65 frames. ], batch size: 57, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 18:30:54,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=758733.3333333334, ans=0.125 2023-11-19 18:31:15,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=758800.0, ans=0.0 2023-11-19 18:31:18,669 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.485e+01 9.378e+01 1.023e+02 2.129e+02, threshold=1.876e+02, percent-clipped=1.0 2023-11-19 18:31:22,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=758866.6666666666, ans=0.125 2023-11-19 18:31:29,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=758866.6666666666, ans=0.1 2023-11-19 18:31:35,496 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:31:39,019 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 18:31:43,964 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113850 2023-11-19 18:31:53,949 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0 2023-11-19 18:31:59,320 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5650, loss[loss=0.08442, simple_loss=0.1002, pruned_loss=0.02328, audio_tagging_loss=0.01103, over 14954.00 frames. ], tot_loss[loss=0.08694, simple_loss=0.1064, pruned_loss=0.02318, audio_tagging_loss=0.01058, over 3043135.94 frames. ], batch size: 56, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 18:32:01,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=759066.6666666666, ans=0.125 2023-11-19 18:32:06,256 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.24 vs. limit=22.5 2023-11-19 18:32:06,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=759066.6666666666, ans=0.125 2023-11-19 18:32:29,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=759200.0, ans=0.1 2023-11-19 18:32:40,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=759266.6666666666, ans=0.125 2023-11-19 18:32:43,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=759266.6666666666, ans=0.0 2023-11-19 18:32:48,353 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113900 2023-11-19 18:32:48,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=759266.6666666666, ans=0.0 2023-11-19 18:32:52,591 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.07 vs. limit=22.5 2023-11-19 18:33:04,047 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5700, loss[loss=0.09246, simple_loss=0.1187, pruned_loss=0.02441, audio_tagging_loss=0.008716, over 15648.00 frames. ], tot_loss[loss=0.08652, simple_loss=0.1056, pruned_loss=0.02316, audio_tagging_loss=0.01056, over 3036515.27 frames. 
], batch size: 55, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 18:33:14,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=759400.0, ans=0.1 2023-11-19 18:33:29,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=759533.3333333334, ans=0.0 2023-11-19 18:33:29,967 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.506e+01 8.509e+01 9.324e+01 1.031e+02 1.317e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-19 18:33:53,317 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 113950 2023-11-19 18:33:55,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=759666.6666666666, ans=0.125 2023-11-19 18:34:00,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=759666.6666666666, ans=0.125 2023-11-19 18:34:02,474 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:34:08,415 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5750, loss[loss=0.07944, simple_loss=0.09747, pruned_loss=0.0185, audio_tagging_loss=0.01221, over 13920.00 frames. ], tot_loss[loss=0.08616, simple_loss=0.1053, pruned_loss=0.02302, audio_tagging_loss=0.01049, over 3039667.85 frames. ], batch size: 52, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 18:34:21,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=759800.0, ans=0.125 2023-11-19 18:34:30,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=759800.0, ans=0.125 2023-11-19 18:34:30,638 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=12.0 2023-11-19 18:34:39,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=759866.6666666666, ans=0.125 2023-11-19 18:34:46,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=759933.3333333334, ans=0.1 2023-11-19 18:34:53,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=759933.3333333334, ans=0.125 2023-11-19 18:34:57,385 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114000 2023-11-19 18:35:04,221 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=12.0 2023-11-19 18:35:06,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=760000.0, ans=0.0 2023-11-19 18:35:12,207 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0 2023-11-19 18:35:12,679 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5800, loss[loss=0.05921, simple_loss=0.06518, pruned_loss=0.01507, audio_tagging_loss=0.01156, over 13933.00 frames. ], tot_loss[loss=0.0865, simple_loss=0.1059, pruned_loss=0.02325, audio_tagging_loss=0.01033, over 3038581.58 frames. 
], batch size: 55, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 18:35:28,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=760133.3333333334, ans=0.1 2023-11-19 18:35:30,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=760133.3333333334, ans=0.125 2023-11-19 18:35:30,500 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.54 vs. limit=15.0 2023-11-19 18:35:38,827 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.740e+01 8.284e+01 9.012e+01 9.674e+01 1.297e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-19 18:35:51,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=760266.6666666666, ans=0.125 2023-11-19 18:35:59,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=760266.6666666666, ans=0.125 2023-11-19 18:36:02,334 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114050 2023-11-19 18:36:13,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=760333.3333333334, ans=0.125 2023-11-19 18:36:17,839 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5850, loss[loss=0.0875, simple_loss=0.1043, pruned_loss=0.02342, audio_tagging_loss=0.01194, over 14731.00 frames. ], tot_loss[loss=0.08623, simple_loss=0.1056, pruned_loss=0.02312, audio_tagging_loss=0.01034, over 3034430.96 frames. ], batch size: 54, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 18:36:25,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=760400.0, ans=0.1 2023-11-19 18:36:26,521 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.52 vs. limit=22.5 2023-11-19 18:37:05,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=760600.0, ans=0.125 2023-11-19 18:37:07,041 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114100 2023-11-19 18:37:22,288 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5900, loss[loss=0.07943, simple_loss=0.09987, pruned_loss=0.02189, audio_tagging_loss=0.007606, over 16282.00 frames. ], tot_loss[loss=0.0858, simple_loss=0.1053, pruned_loss=0.02293, audio_tagging_loss=0.01023, over 3040589.55 frames. 
], batch size: 60, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 18:37:34,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=760800.0, ans=0.2 2023-11-19 18:37:44,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=760800.0, ans=0.2 2023-11-19 18:37:47,450 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.904e+01 8.348e+01 9.268e+01 1.091e+02 1.395e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-19 18:38:11,925 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114150 2023-11-19 18:38:14,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=761000.0, ans=0.125 2023-11-19 18:38:26,617 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 5950, loss[loss=0.08436, simple_loss=0.1057, pruned_loss=0.0208, audio_tagging_loss=0.01072, over 16678.00 frames. ], tot_loss[loss=0.08576, simple_loss=0.1054, pruned_loss=0.02291, audio_tagging_loss=0.01016, over 3045208.99 frames. ], batch size: 62, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 18:38:30,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=761066.6666666666, ans=0.0 2023-11-19 18:38:33,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=761066.6666666666, ans=0.1 2023-11-19 18:38:44,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=761133.3333333334, ans=0.07 2023-11-19 18:39:15,915 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114200 2023-11-19 18:39:31,886 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6000, loss[loss=0.09571, simple_loss=0.1291, pruned_loss=0.0223, audio_tagging_loss=0.00885, over 15758.00 frames. ], tot_loss[loss=0.08543, simple_loss=0.1048, pruned_loss=0.02285, audio_tagging_loss=0.0102, over 3039695.78 frames. ], batch size: 58, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 18:39:31,887 INFO [train_asr.py:1285] (2/4) Computing validation loss 2023-11-19 18:40:05,099 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7229, 5.7594, 5.8325, 5.8930], device='cuda:2') 2023-11-19 18:40:07,246 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5194, 2.6641, 3.6872, 3.2167], device='cuda:2') 2023-11-19 18:40:12,640 INFO [train_asr.py:1294] (2/4) Epoch 10, validation: loss=0.06357, simple_loss=0.05534, pruned_loss=0.006382, audio_tagging_loss=0.02952, over 4681554.00 frames. 
2023-11-19 18:40:12,641 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB 2023-11-19 18:40:19,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=761400.0, ans=0.0 2023-11-19 18:40:21,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=761400.0, ans=0.125 2023-11-19 18:40:38,555 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.626e+01 8.242e+01 9.055e+01 9.883e+01 1.211e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-19 18:40:43,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=761533.3333333334, ans=0.2 2023-11-19 18:40:55,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=761600.0, ans=15.0 2023-11-19 18:40:58,310 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 18:41:02,017 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114250 2023-11-19 18:41:09,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=761666.6666666666, ans=0.0 2023-11-19 18:41:17,065 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6050, loss[loss=0.05426, simple_loss=0.05765, pruned_loss=0.01359, audio_tagging_loss=0.01184, over 14905.00 frames. ], tot_loss[loss=0.08516, simple_loss=0.1044, pruned_loss=0.02273, audio_tagging_loss=0.01022, over 3033762.09 frames. ], batch size: 59, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 18:41:18,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=761733.3333333334, ans=0.1 2023-11-19 18:41:18,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=761733.3333333334, ans=0.125 2023-11-19 18:41:40,040 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=12.0 2023-11-19 18:41:53,648 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0 2023-11-19 18:41:59,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=761933.3333333334, ans=0.0 2023-11-19 18:42:02,478 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=11.69 vs. 
limit=12.0 2023-11-19 18:42:04,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=761933.3333333334, ans=0.125 2023-11-19 18:42:06,889 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114300 2023-11-19 18:42:18,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=762000.0, ans=0.1 2023-11-19 18:42:23,027 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6100, loss[loss=0.07318, simple_loss=0.08699, pruned_loss=0.01953, audio_tagging_loss=0.01016, over 14492.00 frames. ], tot_loss[loss=0.08566, simple_loss=0.1051, pruned_loss=0.02288, audio_tagging_loss=0.01024, over 3042047.24 frames. ], batch size: 53, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 18:42:48,700 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.508e+01 9.449e+01 1.032e+02 1.447e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-19 18:43:13,165 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114350 2023-11-19 18:43:26,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=762333.3333333334, ans=0.0 2023-11-19 18:43:28,454 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6150, loss[loss=0.08136, simple_loss=0.09368, pruned_loss=0.02393, audio_tagging_loss=0.01058, over 16103.00 frames. ], tot_loss[loss=0.08549, simple_loss=0.1046, pruned_loss=0.02287, audio_tagging_loss=0.01034, over 3048810.22 frames. ], batch size: 62, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 18:43:30,501 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0 2023-11-19 18:43:36,544 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.22 vs. limit=22.5 2023-11-19 18:43:46,772 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.12 vs. limit=15.0 2023-11-19 18:43:47,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=762466.6666666666, ans=0.125 2023-11-19 18:43:47,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=762466.6666666666, ans=0.0 2023-11-19 18:44:09,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=762600.0, ans=0.125 2023-11-19 18:44:18,108 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114400 2023-11-19 18:44:22,837 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. limit=6.0 2023-11-19 18:44:28,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=762666.6666666666, ans=0.0 2023-11-19 18:44:33,329 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6200, loss[loss=0.1096, simple_loss=0.1411, pruned_loss=0.03284, audio_tagging_loss=0.006195, over 14867.00 frames. ], tot_loss[loss=0.08567, simple_loss=0.1048, pruned_loss=0.0229, audio_tagging_loss=0.01038, over 3048917.22 frames. 
], batch size: 53, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 18:44:34,022 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0 2023-11-19 18:44:37,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=762733.3333333334, ans=0.0 2023-11-19 18:44:43,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=762733.3333333334, ans=0.1 2023-11-19 18:45:00,754 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.346e+01 9.010e+01 9.734e+01 1.303e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-19 18:45:12,621 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.21 vs. limit=10.0 2023-11-19 18:45:23,160 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114450 2023-11-19 18:45:39,183 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6250, loss[loss=0.1001, simple_loss=0.1339, pruned_loss=0.02542, audio_tagging_loss=0.007692, over 15063.00 frames. ], tot_loss[loss=0.08533, simple_loss=0.1042, pruned_loss=0.02276, audio_tagging_loss=0.01047, over 3043798.17 frames. ], batch size: 56, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 18:45:52,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=763133.3333333334, ans=0.1 2023-11-19 18:45:59,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=763133.3333333334, ans=0.1 2023-11-19 18:46:06,941 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.127e-03 2023-11-19 18:46:09,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=763200.0, ans=0.0 2023-11-19 18:46:28,950 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114500 2023-11-19 18:46:34,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=763333.3333333334, ans=0.0 2023-11-19 18:46:44,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=763400.0, ans=0.0 2023-11-19 18:46:45,271 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6300, loss[loss=0.07445, simple_loss=0.09987, pruned_loss=0.01644, audio_tagging_loss=0.008075, over 15251.00 frames. ], tot_loss[loss=0.08549, simple_loss=0.1044, pruned_loss=0.02263, audio_tagging_loss=0.01064, over 3040097.33 frames. ], batch size: 59, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 18:46:54,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=763400.0, ans=0.125 2023-11-19 18:47:07,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=763466.6666666666, ans=0.125 2023-11-19 18:47:09,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.17 vs. 
limit=15.0 2023-11-19 18:47:09,916 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.385e+01 9.179e+01 1.044e+02 1.360e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-19 18:47:26,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=763600.0, ans=0.125 2023-11-19 18:47:28,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=763600.0, ans=0.125 2023-11-19 18:47:29,431 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.00 vs. limit=15.0 2023-11-19 18:47:34,919 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114550 2023-11-19 18:47:35,510 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.25 vs. limit=15.0 2023-11-19 18:47:46,637 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.33 vs. limit=15.0 2023-11-19 18:47:49,846 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6350, loss[loss=0.1084, simple_loss=0.132, pruned_loss=0.03325, audio_tagging_loss=0.009125, over 14550.00 frames. ], tot_loss[loss=0.0858, simple_loss=0.1052, pruned_loss=0.02258, audio_tagging_loss=0.01062, over 3038804.03 frames. ], batch size: 57, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 18:47:56,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=763733.3333333334, ans=0.0 2023-11-19 18:48:10,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=763800.0, ans=0.0 2023-11-19 18:48:18,121 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.72 vs. limit=15.0 2023-11-19 18:48:25,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=763866.6666666666, ans=0.2 2023-11-19 18:48:38,867 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114600 2023-11-19 18:48:46,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=764000.0, ans=0.0 2023-11-19 18:48:53,967 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.53 vs. limit=15.0 2023-11-19 18:48:54,875 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6400, loss[loss=0.08043, simple_loss=0.08808, pruned_loss=0.02157, audio_tagging_loss=0.01482, over 15482.00 frames. ], tot_loss[loss=0.08498, simple_loss=0.1041, pruned_loss=0.02222, audio_tagging_loss=0.01073, over 3035391.40 frames. ], batch size: 59, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 18:49:02,343 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.85 vs. 
limit=6.0 2023-11-19 18:49:15,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=764133.3333333334, ans=0.1 2023-11-19 18:49:21,740 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.802e+01 8.099e+01 8.680e+01 9.158e+01 1.578e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-19 18:49:40,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=764266.6666666666, ans=0.2 2023-11-19 18:49:44,831 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114650 2023-11-19 18:50:01,288 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6450, loss[loss=0.07013, simple_loss=0.08593, pruned_loss=0.01642, audio_tagging_loss=0.01075, over 15152.00 frames. ], tot_loss[loss=0.0848, simple_loss=0.1037, pruned_loss=0.02215, audio_tagging_loss=0.0108, over 3026046.71 frames. ], batch size: 56, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 18:50:11,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=764400.0, ans=0.125 2023-11-19 18:50:50,148 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114700 2023-11-19 18:51:05,826 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6500, loss[loss=0.1118, simple_loss=0.1372, pruned_loss=0.03355, audio_tagging_loss=0.009592, over 15783.00 frames. ], tot_loss[loss=0.08524, simple_loss=0.1043, pruned_loss=0.02232, audio_tagging_loss=0.01077, over 3034694.02 frames. ], batch size: 61, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 18:51:07,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=764733.3333333334, ans=0.0 2023-11-19 18:51:08,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=764733.3333333334, ans=0.1 2023-11-19 18:51:20,453 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=22.5 2023-11-19 18:51:28,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=764800.0, ans=0.125 2023-11-19 18:51:29,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=764800.0, ans=0.1 2023-11-19 18:51:32,085 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.963e+01 8.435e+01 9.152e+01 1.009e+02 1.379e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 18:51:41,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=764866.6666666666, ans=0.125 2023-11-19 18:51:56,217 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114750 2023-11-19 18:51:57,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=765000.0, ans=0.2 2023-11-19 18:51:58,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=765000.0, ans=0.125 2023-11-19 18:52:11,151 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6550, loss[loss=0.09129, simple_loss=0.1066, pruned_loss=0.02832, audio_tagging_loss=0.009679, over 16481.00 frames. 
], tot_loss[loss=0.08467, simple_loss=0.1036, pruned_loss=0.02229, audio_tagging_loss=0.01059, over 3039795.33 frames. ], batch size: 63, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 18:52:48,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=765200.0, ans=0.05 2023-11-19 18:53:00,872 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114800 2023-11-19 18:53:01,365 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=12.0 2023-11-19 18:53:17,425 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6600, loss[loss=0.08674, simple_loss=0.1086, pruned_loss=0.02045, audio_tagging_loss=0.01198, over 14940.00 frames. ], tot_loss[loss=0.08445, simple_loss=0.1034, pruned_loss=0.02217, audio_tagging_loss=0.01055, over 3033787.83 frames. ], batch size: 58, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 18:53:28,697 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.22 vs. limit=22.5 2023-11-19 18:53:40,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=765466.6666666666, ans=0.5 2023-11-19 18:53:42,954 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.143e+01 8.495e+01 9.015e+01 9.763e+01 1.318e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-19 18:53:48,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=765533.3333333334, ans=0.125 2023-11-19 18:54:07,121 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114850 2023-11-19 18:54:22,835 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6650, loss[loss=0.1021, simple_loss=0.1279, pruned_loss=0.03121, audio_tagging_loss=0.006948, over 15275.00 frames. ], tot_loss[loss=0.08455, simple_loss=0.1037, pruned_loss=0.0223, audio_tagging_loss=0.01041, over 3032276.33 frames. ], batch size: 58, lr: 6.92e-03, grad_scale: 16.0 2023-11-19 18:54:45,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=765800.0, ans=0.2 2023-11-19 18:54:55,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=765866.6666666666, ans=0.125 2023-11-19 18:55:05,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=765933.3333333334, ans=0.2 2023-11-19 18:55:12,757 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114900 2023-11-19 18:55:19,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=766000.0, ans=0.1 2023-11-19 18:55:19,503 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0 2023-11-19 18:55:27,726 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6700, loss[loss=0.0711, simple_loss=0.09045, pruned_loss=0.0183, audio_tagging_loss=0.007577, over 14810.00 frames. ], tot_loss[loss=0.08473, simple_loss=0.1041, pruned_loss=0.02238, audio_tagging_loss=0.01031, over 3032799.31 frames. 
], batch size: 57, lr: 6.91e-03, grad_scale: 16.0 2023-11-19 18:55:46,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=766133.3333333334, ans=0.2 2023-11-19 18:55:55,859 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.722e+01 8.369e+01 9.025e+01 9.789e+01 1.375e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-19 18:55:56,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=766200.0, ans=0.0 2023-11-19 18:56:15,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=766266.6666666666, ans=0.125 2023-11-19 18:56:17,021 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2023-11-19 18:56:17,641 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 114950 2023-11-19 18:56:30,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=766333.3333333334, ans=0.1 2023-11-19 18:56:30,983 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.04 vs. limit=15.0 2023-11-19 18:56:34,524 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6750, loss[loss=0.09754, simple_loss=0.126, pruned_loss=0.02569, audio_tagging_loss=0.008862, over 15731.00 frames. ], tot_loss[loss=0.08459, simple_loss=0.1037, pruned_loss=0.0224, audio_tagging_loss=0.01033, over 3030454.69 frames. ], batch size: 59, lr: 6.91e-03, grad_scale: 16.0 2023-11-19 18:56:40,521 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=15.31 vs. limit=15.0 2023-11-19 18:56:54,767 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.47 vs. limit=6.0 2023-11-19 18:57:12,045 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2023-11-19 18:57:14,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=766600.0, ans=0.125 2023-11-19 18:57:24,345 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115000 2023-11-19 18:57:38,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=766733.3333333334, ans=0.125 2023-11-19 18:57:39,590 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6800, loss[loss=0.07011, simple_loss=0.09333, pruned_loss=0.01385, audio_tagging_loss=0.009601, over 16145.00 frames. ], tot_loss[loss=0.0849, simple_loss=0.1039, pruned_loss=0.02256, audio_tagging_loss=0.01039, over 3030559.58 frames. ], batch size: 58, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 18:57:48,273 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.09 vs. 
limit=22.5 2023-11-19 18:57:52,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=766800.0, ans=0.1 2023-11-19 18:58:03,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=766800.0, ans=0.0 2023-11-19 18:58:05,107 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0 2023-11-19 18:58:07,411 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.734e+01 8.232e+01 9.131e+01 9.667e+01 1.376e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-19 18:58:20,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=766933.3333333334, ans=0.0 2023-11-19 18:58:27,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=766933.3333333334, ans=0.125 2023-11-19 18:58:28,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=766933.3333333334, ans=0.125 2023-11-19 18:58:29,257 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115050 2023-11-19 18:58:44,816 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6850, loss[loss=0.07967, simple_loss=0.1073, pruned_loss=0.01842, audio_tagging_loss=0.007593, over 16367.00 frames. ], tot_loss[loss=0.08421, simple_loss=0.103, pruned_loss=0.02231, audio_tagging_loss=0.01038, over 3035278.65 frames. ], batch size: 61, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 18:59:08,647 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.41 vs. limit=22.5 2023-11-19 18:59:34,674 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115100 2023-11-19 18:59:42,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=767333.3333333334, ans=0.2 2023-11-19 18:59:50,458 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6900, loss[loss=0.09456, simple_loss=0.1151, pruned_loss=0.02876, audio_tagging_loss=0.008248, over 14573.00 frames. ], tot_loss[loss=0.08544, simple_loss=0.105, pruned_loss=0.02276, audio_tagging_loss=0.0102, over 3037752.07 frames. ], batch size: 56, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 18:59:53,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=767400.0, ans=0.125 2023-11-19 19:00:17,000 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.004e+01 8.111e+01 8.753e+01 9.460e+01 1.253e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-19 19:00:40,068 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 19:00:40,156 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115150 2023-11-19 19:00:49,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=767666.6666666666, ans=0.125 2023-11-19 19:00:55,496 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 6950, loss[loss=0.1093, simple_loss=0.1427, pruned_loss=0.0316, audio_tagging_loss=0.006378, over 14544.00 frames. ], tot_loss[loss=0.08551, simple_loss=0.1048, pruned_loss=0.02276, audio_tagging_loss=0.01037, over 3035186.45 frames. ], batch size: 52, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 19:01:21,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=767866.6666666666, ans=0.0 2023-11-19 19:01:28,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=767866.6666666666, ans=0.125 2023-11-19 19:01:45,742 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115200 2023-11-19 19:01:56,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=768000.0, ans=0.0 2023-11-19 19:02:00,890 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7000, loss[loss=0.08276, simple_loss=0.1, pruned_loss=0.02229, audio_tagging_loss=0.01047, over 15612.00 frames. ], tot_loss[loss=0.08503, simple_loss=0.1038, pruned_loss=0.02263, audio_tagging_loss=0.01049, over 3036971.17 frames. ], batch size: 59, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:02:04,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=768066.6666666666, ans=0.125 2023-11-19 19:02:27,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=768200.0, ans=0.0 2023-11-19 19:02:29,325 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.682e+01 8.339e+01 9.135e+01 1.019e+02 1.398e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-19 19:02:43,932 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.09 vs. limit=15.0 2023-11-19 19:02:45,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=768266.6666666666, ans=0.1 2023-11-19 19:02:50,581 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115250 2023-11-19 19:03:07,207 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7050, loss[loss=0.08175, simple_loss=0.09342, pruned_loss=0.0234, audio_tagging_loss=0.01164, over 14055.00 frames. ], tot_loss[loss=0.08462, simple_loss=0.1034, pruned_loss=0.0225, audio_tagging_loss=0.01043, over 3034903.99 frames. ], batch size: 55, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:03:25,944 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.45 vs. 
limit=22.5 2023-11-19 19:03:32,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=768533.3333333334, ans=0.2 2023-11-19 19:03:49,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=768600.0, ans=0.0 2023-11-19 19:03:56,501 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115300 2023-11-19 19:04:03,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=768666.6666666666, ans=0.0 2023-11-19 19:04:11,849 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7100, loss[loss=0.0599, simple_loss=0.05772, pruned_loss=0.01188, audio_tagging_loss=0.01917, over 15543.00 frames. ], tot_loss[loss=0.08563, simple_loss=0.1044, pruned_loss=0.02288, audio_tagging_loss=0.01057, over 3039865.32 frames. ], batch size: 62, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:04:18,887 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.47 vs. limit=5.0 2023-11-19 19:04:23,830 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.30 vs. limit=15.0 2023-11-19 19:04:24,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=768800.0, ans=0.1 2023-11-19 19:04:35,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=768866.6666666666, ans=0.125 2023-11-19 19:04:38,352 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.390e+01 9.120e+01 9.831e+01 1.700e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 19:05:01,373 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115350 2023-11-19 19:05:16,486 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7150, loss[loss=0.08332, simple_loss=0.1118, pruned_loss=0.01697, audio_tagging_loss=0.01046, over 14943.00 frames. ], tot_loss[loss=0.08595, simple_loss=0.1045, pruned_loss=0.023, audio_tagging_loss=0.01072, over 3039258.02 frames. ], batch size: 54, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:05:44,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=769200.0, ans=0.125 2023-11-19 19:05:46,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=769200.0, ans=0.0 2023-11-19 19:06:06,701 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115400 2023-11-19 19:06:06,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=769266.6666666666, ans=0.1 2023-11-19 19:06:08,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=769333.3333333334, ans=0.0 2023-11-19 19:06:17,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=769333.3333333334, ans=0.125 2023-11-19 19:06:23,034 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7200, loss[loss=0.06801, simple_loss=0.082, pruned_loss=0.01382, audio_tagging_loss=0.01319, over 15652.00 frames. ], tot_loss[loss=0.08511, simple_loss=0.1034, pruned_loss=0.02266, audio_tagging_loss=0.01076, over 3046403.08 frames. 
], batch size: 60, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:06:50,186 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.178e+01 8.420e+01 9.034e+01 9.720e+01 1.175e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-19 19:07:05,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=769600.0, ans=0.0 2023-11-19 19:07:13,728 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115450 2023-11-19 19:07:29,112 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7250, loss[loss=0.1009, simple_loss=0.1192, pruned_loss=0.03086, audio_tagging_loss=0.01042, over 15353.00 frames. ], tot_loss[loss=0.08497, simple_loss=0.1033, pruned_loss=0.02249, audio_tagging_loss=0.01084, over 3046550.57 frames. ], batch size: 57, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:07:37,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.81 vs. limit=15.0 2023-11-19 19:07:59,691 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.96 vs. limit=22.5 2023-11-19 19:08:02,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=769866.6666666666, ans=0.125 2023-11-19 19:08:18,645 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115500 2023-11-19 19:08:28,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=770000.0, ans=0.2 2023-11-19 19:08:33,604 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7300, loss[loss=0.07136, simple_loss=0.08973, pruned_loss=0.01532, audio_tagging_loss=0.01117, over 15357.00 frames. ], tot_loss[loss=0.08454, simple_loss=0.1033, pruned_loss=0.02222, audio_tagging_loss=0.01068, over 3039866.09 frames. ], batch size: 57, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:08:50,353 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 19:09:02,026 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.990e+01 8.445e+01 8.971e+01 9.866e+01 1.829e+02, threshold=1.794e+02, percent-clipped=1.0 2023-11-19 19:09:23,299 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115550 2023-11-19 19:09:32,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=770333.3333333334, ans=0.0 2023-11-19 19:09:37,903 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7350, loss[loss=0.07209, simple_loss=0.09012, pruned_loss=0.01627, audio_tagging_loss=0.01075, over 14846.00 frames. ], tot_loss[loss=0.08511, simple_loss=0.1042, pruned_loss=0.02252, audio_tagging_loss=0.0105, over 3045788.53 frames. ], batch size: 56, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 19:10:01,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=770466.6666666666, ans=0.125 2023-11-19 19:10:17,503 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2023-11-19 19:10:25,972 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. 
limit=6.0 2023-11-19 19:10:26,768 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115600 2023-11-19 19:10:36,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=770666.6666666666, ans=0.1 2023-11-19 19:10:42,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=770666.6666666666, ans=0.1 2023-11-19 19:10:44,513 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7400, loss[loss=0.07704, simple_loss=0.09279, pruned_loss=0.01771, audio_tagging_loss=0.01293, over 15708.00 frames. ], tot_loss[loss=0.08565, simple_loss=0.105, pruned_loss=0.02273, audio_tagging_loss=0.01042, over 3046032.06 frames. ], batch size: 59, lr: 6.89e-03, grad_scale: 16.0 2023-11-19 19:10:51,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=770733.3333333334, ans=0.0 2023-11-19 19:11:03,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=770800.0, ans=0.125 2023-11-19 19:11:10,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=770866.6666666666, ans=0.0 2023-11-19 19:11:11,822 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.500e+01 9.123e+01 1.022e+02 1.403e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 19:11:34,145 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115650 2023-11-19 19:11:36,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=771000.0, ans=0.2 2023-11-19 19:11:49,063 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7450, loss[loss=0.05897, simple_loss=0.07353, pruned_loss=0.01273, audio_tagging_loss=0.00947, over 15012.00 frames. ], tot_loss[loss=0.08558, simple_loss=0.1048, pruned_loss=0.02281, audio_tagging_loss=0.01038, over 3049740.05 frames. ], batch size: 60, lr: 6.89e-03, grad_scale: 16.0 2023-11-19 19:11:51,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=771066.6666666666, ans=0.035 2023-11-19 19:12:00,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=771133.3333333334, ans=0.125 2023-11-19 19:12:27,124 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.75 vs. limit=15.0 2023-11-19 19:12:33,541 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.92 vs. limit=22.5 2023-11-19 19:12:37,758 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115700 2023-11-19 19:12:41,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=771333.3333333334, ans=0.02 2023-11-19 19:12:49,598 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.41 vs. limit=15.0 2023-11-19 19:12:52,571 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7500, loss[loss=0.0709, simple_loss=0.08749, pruned_loss=0.01813, audio_tagging_loss=0.009035, over 15006.00 frames. 
], tot_loss[loss=0.08588, simple_loss=0.1055, pruned_loss=0.0229, audio_tagging_loss=0.01024, over 3052010.27 frames. ], batch size: 58, lr: 6.89e-03, grad_scale: 16.0 2023-11-19 19:13:11,886 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0 2023-11-19 19:13:22,178 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.878e+01 8.310e+01 8.971e+01 9.844e+01 3.516e+02, threshold=1.794e+02, percent-clipped=1.0 2023-11-19 19:13:40,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=771600.0, ans=0.2 2023-11-19 19:13:41,889 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115750 2023-11-19 19:13:59,056 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7550, loss[loss=0.09144, simple_loss=0.1114, pruned_loss=0.0266, audio_tagging_loss=0.009138, over 13601.00 frames. ], tot_loss[loss=0.08511, simple_loss=0.1044, pruned_loss=0.02265, audio_tagging_loss=0.01025, over 3045947.31 frames. ], batch size: 53, lr: 6.89e-03, grad_scale: 16.0 2023-11-19 19:14:05,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=771733.3333333334, ans=0.1 2023-11-19 19:14:19,645 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.92 vs. limit=22.5 2023-11-19 19:14:23,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.20 vs. limit=22.5 2023-11-19 19:14:34,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=771866.6666666666, ans=0.125 2023-11-19 19:14:48,802 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115800 2023-11-19 19:14:53,361 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-19 19:14:55,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=772000.0, ans=0.125 2023-11-19 19:14:58,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=772000.0, ans=0.1 2023-11-19 19:15:04,060 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7600, loss[loss=0.07242, simple_loss=0.08928, pruned_loss=0.01667, audio_tagging_loss=0.01111, over 15446.00 frames. ], tot_loss[loss=0.085, simple_loss=0.1043, pruned_loss=0.02258, audio_tagging_loss=0.01028, over 3047324.19 frames. ], batch size: 58, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 19:15:27,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=772133.3333333334, ans=0.025 2023-11-19 19:15:32,987 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.786e+01 8.395e+01 8.967e+01 1.032e+02 1.336e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-19 19:15:43,021 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.53 vs. 
limit=15.0 2023-11-19 19:15:53,534 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115850 2023-11-19 19:15:53,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772266.6666666666, ans=0.1 2023-11-19 19:16:07,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=772400.0, ans=0.125 2023-11-19 19:16:08,670 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7650, loss[loss=0.07588, simple_loss=0.08771, pruned_loss=0.02081, audio_tagging_loss=0.01121, over 14854.00 frames. ], tot_loss[loss=0.08543, simple_loss=0.1052, pruned_loss=0.02266, audio_tagging_loss=0.01018, over 3047578.75 frames. ], batch size: 58, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 19:16:24,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=772466.6666666666, ans=0.2 2023-11-19 19:16:57,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=772600.0, ans=0.1 2023-11-19 19:16:58,681 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115900 2023-11-19 19:17:15,214 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7700, loss[loss=0.0921, simple_loss=0.1168, pruned_loss=0.0257, audio_tagging_loss=0.008005, over 15410.00 frames. ], tot_loss[loss=0.08525, simple_loss=0.1051, pruned_loss=0.02255, audio_tagging_loss=0.01016, over 3048812.45 frames. ], batch size: 56, lr: 6.88e-03, grad_scale: 32.0 2023-11-19 19:17:24,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=772733.3333333334, ans=0.04949747468305833 2023-11-19 19:17:43,121 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.843e+01 8.063e+01 8.419e+01 9.189e+01 1.330e+02, threshold=1.684e+02, percent-clipped=0.0 2023-11-19 19:17:43,748 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-19 19:18:04,551 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 115950 2023-11-19 19:18:20,134 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7750, loss[loss=0.09314, simple_loss=0.1096, pruned_loss=0.02854, audio_tagging_loss=0.009827, over 14621.00 frames. ], tot_loss[loss=0.08475, simple_loss=0.1045, pruned_loss=0.02234, audio_tagging_loss=0.01016, over 3045574.91 frames. ], batch size: 54, lr: 6.88e-03, grad_scale: 32.0 2023-11-19 19:18:27,034 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.05 vs. limit=15.0 2023-11-19 19:18:33,459 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.01 vs. limit=15.0 2023-11-19 19:18:37,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=773133.3333333334, ans=0.0 2023-11-19 19:18:58,558 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.13 vs. 
2023-11-19 19:18:58,558 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.13 vs. limit=15.0
2023-11-19 19:19:01,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=773266.6666666666, ans=0.1
2023-11-19 19:19:04,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=773266.6666666666, ans=0.0
2023-11-19 19:19:09,774 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116000
2023-11-19 19:19:11,740 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0
2023-11-19 19:19:28,164 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7800, loss[loss=0.06945, simple_loss=0.07744, pruned_loss=0.01656, audio_tagging_loss=0.01417, over 14597.00 frames. ], tot_loss[loss=0.08472, simple_loss=0.1042, pruned_loss=0.0224, audio_tagging_loss=0.01022, over 3042441.01 frames. ], batch size: 56, lr: 6.88e-03, grad_scale: 32.0
2023-11-19 19:19:30,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=773400.0, ans=0.125
2023-11-19 19:19:34,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=773400.0, ans=0.125
2023-11-19 19:19:58,893 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.729e+01 9.223e+01 9.822e+01 1.484e+02, threshold=1.845e+02, percent-clipped=0.0
2023-11-19 19:20:04,916 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.22 vs. limit=22.5
2023-11-19 19:20:08,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=773600.0, ans=0.2
2023-11-19 19:20:16,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=773600.0, ans=0.125
2023-11-19 19:20:17,773 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116050
2023-11-19 19:20:17,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=773600.0, ans=0.1
2023-11-19 19:20:34,230 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7850, loss[loss=0.06436, simple_loss=0.07564, pruned_loss=0.01386, audio_tagging_loss=0.01268, over 14771.00 frames. ], tot_loss[loss=0.08392, simple_loss=0.1029, pruned_loss=0.02212, audio_tagging_loss=0.01033, over 3036859.26 frames. ], batch size: 56, lr: 6.88e-03, grad_scale: 16.0
2023-11-19 19:20:35,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=773733.3333333334, ans=0.1
2023-11-19 19:21:10,172 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0
2023-11-19 19:21:10,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=773933.3333333334, ans=0.125
2023-11-19 19:21:23,709 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116100
2023-11-19 19:21:23,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=773933.3333333334, ans=0.0
2023-11-19 19:21:32,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=774000.0, ans=0.125
2023-11-19 19:21:34,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=774000.0, ans=0.0
2023-11-19 19:21:36,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=774000.0, ans=0.0
2023-11-19 19:21:37,651 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.97 vs. limit=22.5
2023-11-19 19:21:38,942 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7900, loss[loss=0.08945, simple_loss=0.1171, pruned_loss=0.02152, audio_tagging_loss=0.009362, over 14614.00 frames. ], tot_loss[loss=0.08437, simple_loss=0.1031, pruned_loss=0.02238, audio_tagging_loss=0.01045, over 3034427.28 frames. ], batch size: 57, lr: 6.88e-03, grad_scale: 16.0
2023-11-19 19:21:46,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=774066.6666666666, ans=0.125
2023-11-19 19:21:51,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=774133.3333333334, ans=0.125
2023-11-19 19:22:08,449 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.972e+01 8.408e+01 9.159e+01 1.027e+02 1.292e+02, threshold=1.832e+02, percent-clipped=0.0
2023-11-19 19:22:11,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=774200.0, ans=0.035
2023-11-19 19:22:27,810 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116150
2023-11-19 19:22:31,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=774333.3333333334, ans=0.125
2023-11-19 19:22:43,193 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 7950, loss[loss=0.07631, simple_loss=0.08752, pruned_loss=0.02078, audio_tagging_loss=0.01178, over 15719.00 frames. ], tot_loss[loss=0.08543, simple_loss=0.1042, pruned_loss=0.0228, audio_tagging_loss=0.01054, over 3040864.41 frames. ], batch size: 60, lr: 6.88e-03, grad_scale: 16.0
2023-11-19 19:22:52,455 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.58 vs. limit=22.5
2023-11-19 19:22:57,991 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 19:23:05,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=774466.6666666666, ans=0.0
2023-11-19 19:23:07,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=774466.6666666666, ans=0.05
2023-11-19 19:23:11,734 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0
2023-11-19 19:23:23,618 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.93 vs. limit=6.0
2023-11-19 19:23:31,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=774600.0, ans=0.0
2023-11-19 19:23:32,853 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116200
2023-11-19 19:23:49,425 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8000, loss[loss=0.07181, simple_loss=0.08251, pruned_loss=0.01949, audio_tagging_loss=0.01106, over 14797.00 frames. ], tot_loss[loss=0.08563, simple_loss=0.1046, pruned_loss=0.02278, audio_tagging_loss=0.01056, over 3035445.31 frames. ], batch size: 56, lr: 6.88e-03, grad_scale: 32.0
2023-11-19 19:23:51,354 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.28 vs. limit=15.0
2023-11-19 19:23:57,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=774733.3333333334, ans=0.125
2023-11-19 19:24:00,943 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.74 vs. limit=22.5
2023-11-19 19:24:19,090 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.333e+01 8.445e+01 9.319e+01 1.021e+02 1.426e+02, threshold=1.864e+02, percent-clipped=0.0
2023-11-19 19:24:37,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=774933.3333333334, ans=0.125
2023-11-19 19:24:38,979 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116250
2023-11-19 19:24:42,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=775000.0, ans=0.125
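The "Exclude cut" warnings above all follow the same pattern: 100 feature frames before subsampling, 23 frames after, 24 BPE tokens. A transducer cannot align more output tokens than encoder frames, so such cuts are dropped. A minimal sketch of that filter (the subsampling formula below is an assumption chosen to reproduce the logged 100 -> 23; helper names are illustrative, not from train_asr.py):

    # Hypothetical length filter matching the warnings above.
    def frames_after_subsampling(num_frames: int) -> int:
        # Conv front-end with overall factor ~4 (assumed); maps 100 -> 23.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer cannot emit more tokens than it has encoder frames.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)  # the dummy-text AudioSet cuts are excluded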
2023-11-19 19:24:54,271 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8050, loss[loss=0.09388, simple_loss=0.1283, pruned_loss=0.02149, audio_tagging_loss=0.008224, over 15746.00 frames. ], tot_loss[loss=0.08581, simple_loss=0.1051, pruned_loss=0.02273, audio_tagging_loss=0.01054, over 3041305.44 frames. ], batch size: 55, lr: 6.87e-03, grad_scale: 32.0
2023-11-19 19:24:54,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=775066.6666666666, ans=0.025
2023-11-19 19:24:55,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=775066.6666666666, ans=0.1
2023-11-19 19:24:57,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=775066.6666666666, ans=0.1
2023-11-19 19:24:59,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=775066.6666666666, ans=0.2
2023-11-19 19:25:27,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=775200.0, ans=0.1
2023-11-19 19:25:44,042 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116300
2023-11-19 19:25:44,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=775266.6666666666, ans=0.2
2023-11-19 19:25:44,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=775266.6666666666, ans=15.0
2023-11-19 19:25:52,108 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.82 vs. limit=15.0
2023-11-19 19:25:59,375 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8100, loss[loss=0.0682, simple_loss=0.08091, pruned_loss=0.01557, audio_tagging_loss=0.01217, over 14914.00 frames. ], tot_loss[loss=0.08519, simple_loss=0.1045, pruned_loss=0.02246, audio_tagging_loss=0.01051, over 3046358.09 frames. ], batch size: 57, lr: 6.87e-03, grad_scale: 32.0
2023-11-19 19:26:17,495 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0
2023-11-19 19:26:18,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=775466.6666666666, ans=0.125
2023-11-19 19:26:19,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=775466.6666666666, ans=0.1
2023-11-19 19:26:27,931 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=12.0
2023-11-19 19:26:29,907 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.355e+01 8.902e+01 9.485e+01 1.238e+02, threshold=1.780e+02, percent-clipped=0.0
2023-11-19 19:26:36,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=775533.3333333334, ans=0.2
2023-11-19 19:26:42,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=775600.0, ans=0.5
2023-11-19 19:26:48,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=775600.0, ans=0.125
2023-11-19 19:26:49,875 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116350
2023-11-19 19:26:53,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=775666.6666666666, ans=0.125
2023-11-19 19:27:00,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=775666.6666666666, ans=0.0
2023-11-19 19:27:05,393 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8150, loss[loss=0.06248, simple_loss=0.07067, pruned_loss=0.01787, audio_tagging_loss=0.009279, over 16876.00 frames. ], tot_loss[loss=0.08612, simple_loss=0.1059, pruned_loss=0.02277, audio_tagging_loss=0.01038, over 3051041.07 frames. ], batch size: 66, lr: 6.87e-03, grad_scale: 32.0
2023-11-19 19:27:14,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=775733.3333333334, ans=0.125
2023-11-19 19:27:33,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=775866.6666666666, ans=0.125
2023-11-19 19:27:55,552 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116400
2023-11-19 19:27:55,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=775933.3333333334, ans=0.125
2023-11-19 19:28:10,284 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 19:28:11,456 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8200, loss[loss=0.0772, simple_loss=0.08904, pruned_loss=0.02083, audio_tagging_loss=0.01185, over 15189.00 frames. ], tot_loss[loss=0.08675, simple_loss=0.1071, pruned_loss=0.02297, audio_tagging_loss=0.01025, over 3065716.95 frames. ], batch size: 58, lr: 6.87e-03, grad_scale: 32.0
2023-11-19 19:28:14,480 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=22.5
2023-11-19 19:28:19,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=776066.6666666666, ans=0.2
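In each Clipping_scale line, the five numbers are the min, 25%, median, 75% and max of recent gradient norms, and the logged threshold equals Clipping_scale times the median (e.g. 2.0 x 8.902e+01 = 1.780e+02 above). A sketch of that relationship (illustrative only, not the actual ScaledAdam code):

    import torch

    def clipping_threshold(recent_grad_norms: torch.Tensor,
                           clipping_scale: float = 2.0) -> float:
        # Quartiles in the order logged: min, 25%, median, 75%, max.
        q = torch.quantile(recent_grad_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        return clipping_scale * q[2].item()  # threshold = scale x median

    norms = torch.tensor([67.44, 83.55, 89.02, 94.85, 123.8])
    print(clipping_threshold(norms))  # ~178.0, as in the log line above

percent-clipped then reports how often a batch gradient norm exceeded this threshold.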
2023-11-19 19:28:25,815 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=15.0
2023-11-19 19:28:41,278 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.595e+01 9.342e+01 1.031e+02 1.321e+02, threshold=1.868e+02, percent-clipped=0.0
2023-11-19 19:28:45,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=776200.0, ans=0.125
2023-11-19 19:29:01,290 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116450
2023-11-19 19:29:06,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=776333.3333333334, ans=0.0
2023-11-19 19:29:16,459 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8250, loss[loss=0.06952, simple_loss=0.09716, pruned_loss=0.01409, audio_tagging_loss=0.006846, over 15724.00 frames. ], tot_loss[loss=0.08607, simple_loss=0.1061, pruned_loss=0.02269, audio_tagging_loss=0.0103, over 3063227.85 frames. ], batch size: 61, lr: 6.87e-03, grad_scale: 32.0
2023-11-19 19:29:22,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=776400.0, ans=0.0
2023-11-19 19:29:27,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=776400.0, ans=0.125
2023-11-19 19:29:37,897 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=15.0
2023-11-19 19:29:45,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=776533.3333333334, ans=0.0
2023-11-19 19:29:47,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=776533.3333333334, ans=0.2
2023-11-19 19:29:52,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=776533.3333333334, ans=0.2
2023-11-19 19:29:56,980 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.44 vs. limit=12.0
2023-11-19 19:30:05,255 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.54 vs. limit=22.5
2023-11-19 19:30:05,991 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116500
2023-11-19 19:30:06,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=776600.0, ans=0.1
2023-11-19 19:30:22,195 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8300, loss[loss=0.09462, simple_loss=0.1296, pruned_loss=0.02445, audio_tagging_loss=0.005365, over 15855.00 frames. ], tot_loss[loss=0.08626, simple_loss=0.1064, pruned_loss=0.02281, audio_tagging_loss=0.01026, over 3062040.56 frames. ], batch size: 60, lr: 6.87e-03, grad_scale: 32.0
2023-11-19 19:30:40,178 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.58 vs. limit=15.0
2023-11-19 19:30:47,947 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.28 vs. limit=15.0
2023-11-19 19:30:49,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=776866.6666666666, ans=0.125
2023-11-19 19:30:51,982 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.779e+01 8.250e+01 8.993e+01 9.595e+01 1.317e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-19 19:31:08,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=776933.3333333334, ans=0.025
2023-11-19 19:31:11,738 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116550
2023-11-19 19:31:27,859 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8350, loss[loss=0.0893, simple_loss=0.1109, pruned_loss=0.027, audio_tagging_loss=0.006862, over 15355.00 frames. ], tot_loss[loss=0.08601, simple_loss=0.106, pruned_loss=0.02283, audio_tagging_loss=0.0102, over 3056388.70 frames. ], batch size: 56, lr: 6.86e-03, grad_scale: 32.0
2023-11-19 19:31:28,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=777066.6666666666, ans=0.0
2023-11-19 19:31:34,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=777066.6666666666, ans=0.125
2023-11-19 19:31:49,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=777133.3333333334, ans=0.0
2023-11-19 19:32:00,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=777200.0, ans=10.0
2023-11-19 19:32:17,484 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116600
2023-11-19 19:32:32,799 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8400, loss[loss=0.07079, simple_loss=0.07919, pruned_loss=0.01905, audio_tagging_loss=0.01215, over 14250.00 frames. ], tot_loss[loss=0.0853, simple_loss=0.1053, pruned_loss=0.02255, audio_tagging_loss=0.01009, over 3061673.27 frames. ], batch size: 55, lr: 6.86e-03, grad_scale: 32.0
2023-11-19 19:32:33,530 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.06 vs. limit=12.0
2023-11-19 19:32:56,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=777466.6666666666, ans=0.0
2023-11-19 19:33:03,757 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.528e+01 8.258e+01 9.022e+01 9.929e+01 1.314e+02, threshold=1.804e+02, percent-clipped=0.0
2023-11-19 19:33:07,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=777533.3333333334, ans=0.0
2023-11-19 19:33:11,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=777600.0, ans=0.125
2023-11-19 19:33:12,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=777600.0, ans=0.0
2023-11-19 19:33:22,307 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116650
2023-11-19 19:33:24,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=777666.6666666666, ans=0.125
2023-11-19 19:33:37,667 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8450, loss[loss=0.08696, simple_loss=0.1054, pruned_loss=0.02425, audio_tagging_loss=0.01, over 16302.00 frames. ], tot_loss[loss=0.08506, simple_loss=0.1047, pruned_loss=0.02255, audio_tagging_loss=0.01016, over 3057837.95 frames. ], batch size: 59, lr: 6.86e-03, grad_scale: 32.0
2023-11-19 19:33:50,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=777800.0, ans=0.125
2023-11-19 19:33:52,566 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.84 vs. limit=15.0
2023-11-19 19:34:07,422 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.82 vs. limit=15.0
2023-11-19 19:34:07,530 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0
2023-11-19 19:34:14,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=777866.6666666666, ans=0.5
2023-11-19 19:34:16,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=777933.3333333334, ans=0.1
2023-11-19 19:34:27,537 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116700
2023-11-19 19:34:44,129 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8500, loss[loss=0.06277, simple_loss=0.06967, pruned_loss=0.01318, audio_tagging_loss=0.01475, over 15793.00 frames. ], tot_loss[loss=0.08514, simple_loss=0.1049, pruned_loss=0.02247, audio_tagging_loss=0.01022, over 3053329.19 frames. ], batch size: 60, lr: 6.86e-03, grad_scale: 32.0
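The recurring ScheduledFloat lines record hyperparameters (dropout probabilities, skip rates, balancer limits) whose value "ans" depends on batch_count. Conceptually each one is a piecewise-linear schedule over the batch counter; a minimal sketch of that idea (illustrative; the scaling.py implementation differs in detail):

    # Hypothetical piecewise-linear schedule keyed on batch count.
    def scheduled_float(batch_count: float,
                        points: list[tuple[float, float]]) -> float:
        (x0, y0) = points[0]
        if batch_count <= x0:
            return y0
        for (x1, y1) in points[1:]:
            if batch_count <= x1:
                # Linear interpolation between adjacent breakpoints.
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            x0, y0 = x1, y1
        return y0  # constant after the last breakpoint

    # e.g. a dropout_p decaying from 0.3 to 0.1 over the first 20k batches
    # would read ans=0.1 at the batch counts seen in this log:
    print(scheduled_float(777600.0, [(0.0, 0.3), (20000.0, 0.1)]))  # 0.1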
2023-11-19 19:34:58,667 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.01 vs. limit=15.0
2023-11-19 19:35:13,937 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.558e+01 8.293e+01 8.958e+01 9.904e+01 1.302e+02, threshold=1.792e+02, percent-clipped=0.0
2023-11-19 19:35:16,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=778200.0, ans=0.125
2023-11-19 19:35:33,132 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116750
2023-11-19 19:35:43,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=778333.3333333334, ans=0.2
2023-11-19 19:35:48,121 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8550, loss[loss=0.08351, simple_loss=0.1004, pruned_loss=0.02274, audio_tagging_loss=0.01058, over 14989.00 frames. ], tot_loss[loss=0.08588, simple_loss=0.106, pruned_loss=0.02274, audio_tagging_loss=0.01013, over 3052338.37 frames. ], batch size: 56, lr: 6.86e-03, grad_scale: 16.0
2023-11-19 19:35:53,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=778400.0, ans=0.125
2023-11-19 19:36:05,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=778466.6666666666, ans=0.125
2023-11-19 19:36:12,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=778466.6666666666, ans=0.2
2023-11-19 19:36:30,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=778600.0, ans=0.2
2023-11-19 19:36:37,893 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116800
2023-11-19 19:36:43,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=778666.6666666666, ans=0.0
2023-11-19 19:36:47,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=778666.6666666666, ans=0.1
2023-11-19 19:36:52,901 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8600, loss[loss=0.08013, simple_loss=0.09823, pruned_loss=0.02132, audio_tagging_loss=0.009688, over 14572.00 frames. ], tot_loss[loss=0.08597, simple_loss=0.1059, pruned_loss=0.02288, audio_tagging_loss=0.01016, over 3046635.45 frames. ], batch size: 55, lr: 6.86e-03, grad_scale: 16.0
2023-11-19 19:36:57,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=778733.3333333334, ans=0.125
2023-11-19 19:37:04,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=778733.3333333334, ans=0.0
2023-11-19 19:37:22,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=778866.6666666666, ans=0.125
2023-11-19 19:37:24,753 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 8.439e+01 9.027e+01 1.024e+02 1.472e+02, threshold=1.805e+02, percent-clipped=0.0
2023-11-19 19:37:25,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=778866.6666666666, ans=0.0
2023-11-19 19:37:29,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=778866.6666666666, ans=0.125
2023-11-19 19:37:42,133 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116850
2023-11-19 19:37:59,580 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8650, loss[loss=0.113, simple_loss=0.1371, pruned_loss=0.03658, audio_tagging_loss=0.007845, over 14836.00 frames. ], tot_loss[loss=0.08623, simple_loss=0.1061, pruned_loss=0.02292, audio_tagging_loss=0.01025, over 3044386.02 frames. ], batch size: 55, lr: 6.86e-03, grad_scale: 16.0
2023-11-19 19:38:48,820 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116900
2023-11-19 19:38:50,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=779333.3333333334, ans=0.0
2023-11-19 19:38:50,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=779333.3333333334, ans=0.2
2023-11-19 19:39:03,572 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8700, loss[loss=0.09201, simple_loss=0.1011, pruned_loss=0.02734, audio_tagging_loss=0.01411, over 14107.00 frames. ], tot_loss[loss=0.08607, simple_loss=0.1055, pruned_loss=0.02291, audio_tagging_loss=0.0104, over 3049042.61 frames. ], batch size: 53, lr: 6.85e-03, grad_scale: 16.0
2023-11-19 19:39:04,190 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.99 vs. limit=6.0
2023-11-19 19:39:06,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=779400.0, ans=0.125
2023-11-19 19:39:09,102 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=12.0
2023-11-19 19:39:13,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=779400.0, ans=0.1
2023-11-19 19:39:27,992 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 19:39:28,502 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0
2023-11-19 19:39:35,917 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.849e+01 8.405e+01 9.122e+01 9.859e+01 1.308e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-19 19:39:53,677 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 116950
2023-11-19 19:39:59,492 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0
2023-11-19 19:40:00,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=779666.6666666666, ans=0.125
2023-11-19 19:40:06,888 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.29 vs. limit=6.0
2023-11-19 19:40:08,289 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0
2023-11-19 19:40:08,882 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8750, loss[loss=0.08213, simple_loss=0.09371, pruned_loss=0.02517, audio_tagging_loss=0.01011, over 15265.00 frames. ], tot_loss[loss=0.08638, simple_loss=0.1059, pruned_loss=0.023, audio_tagging_loss=0.01041, over 3055820.85 frames. ], batch size: 60, lr: 6.85e-03, grad_scale: 16.0
2023-11-19 19:40:40,224 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 19:40:53,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=779933.3333333334, ans=0.125
2023-11-19 19:40:58,653 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117000
2023-11-19 19:41:15,645 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8800, loss[loss=0.06539, simple_loss=0.07513, pruned_loss=0.01356, audio_tagging_loss=0.01427, over 14331.00 frames. ], tot_loss[loss=0.08625, simple_loss=0.1056, pruned_loss=0.02292, audio_tagging_loss=0.01054, over 3051300.18 frames. ], batch size: 55, lr: 6.85e-03, grad_scale: 32.0
2023-11-19 19:41:46,212 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 8.517e+01 9.316e+01 1.005e+02 1.428e+02, threshold=1.863e+02, percent-clipped=0.0
2023-11-19 19:42:05,430 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117050
2023-11-19 19:42:20,300 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.94 vs. limit=22.5
2023-11-19 19:42:20,911 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8850, loss[loss=0.09927, simple_loss=0.1233, pruned_loss=0.02768, audio_tagging_loss=0.009927, over 15325.00 frames. ], tot_loss[loss=0.08615, simple_loss=0.1053, pruned_loss=0.02285, audio_tagging_loss=0.01064, over 3054440.44 frames. ], batch size: 57, lr: 6.85e-03, grad_scale: 32.0
2023-11-19 19:42:21,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=780400.0, ans=0.0
2023-11-19 19:42:31,025 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 19:42:32,933 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0
2023-11-19 19:43:01,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=780600.0, ans=0.1
2023-11-19 19:43:06,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=780600.0, ans=0.2
2023-11-19 19:43:10,889 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117100
2023-11-19 19:43:25,653 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8900, loss[loss=0.08761, simple_loss=0.1077, pruned_loss=0.0255, audio_tagging_loss=0.008274, over 15817.00 frames. ], tot_loss[loss=0.08611, simple_loss=0.1058, pruned_loss=0.02284, audio_tagging_loss=0.01036, over 3053725.72 frames. ], batch size: 59, lr: 6.85e-03, grad_scale: 32.0
2023-11-19 19:43:57,710 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 8.524e+01 9.164e+01 1.008e+02 2.519e+02, threshold=1.833e+02, percent-clipped=1.0
2023-11-19 19:44:15,128 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117150
2023-11-19 19:44:30,946 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 8950, loss[loss=0.09902, simple_loss=0.1201, pruned_loss=0.02905, audio_tagging_loss=0.009906, over 14303.00 frames. ], tot_loss[loss=0.08625, simple_loss=0.1063, pruned_loss=0.02285, audio_tagging_loss=0.01025, over 3051003.10 frames. ], batch size: 54, lr: 6.85e-03, grad_scale: 32.0
2023-11-19 19:44:34,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=781066.6666666666, ans=0.07
2023-11-19 19:44:42,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=781066.6666666666, ans=0.0
2023-11-19 19:44:43,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=781133.3333333334, ans=0.125
2023-11-19 19:44:44,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=781133.3333333334, ans=0.125
2023-11-19 19:45:13,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=781266.6666666666, ans=0.0
2023-11-19 19:45:20,442 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117200
2023-11-19 19:45:27,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=781333.3333333334, ans=0.07
2023-11-19 19:45:29,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=781333.3333333334, ans=0.125
2023-11-19 19:45:36,434 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 9000, loss[loss=0.0724, simple_loss=0.0885, pruned_loss=0.01796, audio_tagging_loss=0.01019, over 15879.00 frames. ], tot_loss[loss=0.08657, simple_loss=0.1064, pruned_loss=0.02308, audio_tagging_loss=0.01029, over 3044065.62 frames. ], batch size: 59, lr: 6.85e-03, grad_scale: 16.0
2023-11-19 19:45:36,435 INFO [train_asr.py:1285] (2/4) Computing validation loss
2023-11-19 19:45:56,711 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0405, 4.5234, 5.1728, 4.7438], device='cuda:2')
2023-11-19 19:46:11,707 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3321, 4.9765, 4.7497, 5.1755], device='cuda:2')
2023-11-19 19:46:18,845 INFO [train_asr.py:1294] (2/4) Epoch 10, validation: loss=0.06518, simple_loss=0.05524, pruned_loss=0.006372, audio_tagging_loss=0.03119, over 4681554.00 frames.
2023-11-19 19:46:18,845 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB
2023-11-19 19:46:26,636 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.59 vs. limit=15.0
2023-11-19 19:46:52,447 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.536e+01 8.923e+01 9.790e+01 1.498e+02, threshold=1.785e+02, percent-clipped=0.0
2023-11-19 19:47:08,922 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117250
2023-11-19 19:47:18,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=781666.6666666666, ans=0.07
2023-11-19 19:47:25,435 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 9050, loss[loss=0.09575, simple_loss=0.114, pruned_loss=0.02784, audio_tagging_loss=0.01093, over 14986.00 frames. ], tot_loss[loss=0.08647, simple_loss=0.1065, pruned_loss=0.02304, audio_tagging_loss=0.01018, over 3044943.66 frames. ], batch size: 56, lr: 6.84e-03, grad_scale: 16.0
2023-11-19 19:47:26,058 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.32 vs. limit=22.5
2023-11-19 19:47:30,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=781733.3333333334, ans=0.0
2023-11-19 19:47:34,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=781733.3333333334, ans=0.0
2023-11-19 19:48:07,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=781933.3333333334, ans=0.125
2023-11-19 19:48:15,121 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117300
2023-11-19 19:48:19,483 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.46 vs. limit=15.0
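The Whitening lines compare a per-module whiteness metric against its whitening_limit; the metric is small when the module's output covariance is close to isotropic and grows when a few directions dominate. One natural metric of this kind is the ratio of the mean squared eigenvalue of the covariance to the squared mean eigenvalue (a sketch of the idea only, not the exact scaling.py formula):

    import torch

    def whiteness_metric(feats: torch.Tensor) -> float:
        # feats: (num_frames, num_channels)
        x = feats - feats.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        # 1.0 for an exactly isotropic covariance; larger when unbalanced.
        return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

    torch.manual_seed(0)
    print(whiteness_metric(torch.randn(1000, 384)))  # ~1.4 (sampling noise only)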
2023-11-19 19:48:30,617 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 9100, loss[loss=0.0752, simple_loss=0.092, pruned_loss=0.01915, audio_tagging_loss=0.01005, over 14740.00 frames. ], tot_loss[loss=0.08563, simple_loss=0.1058, pruned_loss=0.02265, audio_tagging_loss=0.0101, over 3040622.84 frames. ], batch size: 56, lr: 6.84e-03, grad_scale: 16.0
2023-11-19 19:48:33,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=782066.6666666666, ans=0.0
2023-11-19 19:48:39,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=782066.6666666666, ans=0.5
2023-11-19 19:48:40,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=782066.6666666666, ans=0.0
2023-11-19 19:48:49,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=782133.3333333334, ans=0.0
2023-11-19 19:48:49,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=782133.3333333334, ans=0.125
2023-11-19 19:48:59,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=782200.0, ans=0.125
2023-11-19 19:49:03,189 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.816e+01 8.111e+01 8.734e+01 9.488e+01 1.224e+02, threshold=1.747e+02, percent-clipped=0.0
2023-11-19 19:49:21,043 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117350
2023-11-19 19:49:25,559 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0
2023-11-19 19:49:35,997 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 9150, loss[loss=0.06596, simple_loss=0.07911, pruned_loss=0.01452, audio_tagging_loss=0.01188, over 15797.00 frames. ], tot_loss[loss=0.08573, simple_loss=0.1057, pruned_loss=0.02275, audio_tagging_loss=0.01011, over 3043634.63 frames. ], batch size: 64, lr: 6.84e-03, grad_scale: 16.0
2023-11-19 19:49:44,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=782400.0, ans=0.125
2023-11-19 19:49:58,217 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=15.0
2023-11-19 19:50:09,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=782533.3333333334, ans=0.2
2023-11-19 19:50:18,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=782600.0, ans=0.1
2023-11-19 19:50:19,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=782600.0, ans=0.125
2023-11-19 19:50:26,040 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117400
2023-11-19 19:50:29,856 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0
2023-11-19 19:50:42,577 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 9200, loss[loss=0.09643, simple_loss=0.1206, pruned_loss=0.02606, audio_tagging_loss=0.01007, over 16038.00 frames. ], tot_loss[loss=0.08576, simple_loss=0.1055, pruned_loss=0.02279, audio_tagging_loss=0.01022, over 3038487.45 frames. ], batch size: 62, lr: 6.84e-03, grad_scale: 32.0
2023-11-19 19:50:45,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=782733.3333333334, ans=0.1
2023-11-19 19:51:07,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=782866.6666666666, ans=0.0
2023-11-19 19:51:15,190 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.014e+01 8.317e+01 9.062e+01 1.049e+02 1.537e+02, threshold=1.812e+02, percent-clipped=0.0
2023-11-19 19:51:33,106 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117450
2023-11-19 19:51:34,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=783000.0, ans=0.125
2023-11-19 19:51:36,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=783000.0, ans=0.125
2023-11-19 19:51:40,681 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=12.0
2023-11-19 19:51:41,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=783000.0, ans=0.125
2023-11-19 19:51:42,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=783000.0, ans=0.0
2023-11-19 19:51:47,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=783066.6666666666, ans=0.0
2023-11-19 19:51:48,871 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 9250, loss[loss=0.08405, simple_loss=0.1039, pruned_loss=0.02134, audio_tagging_loss=0.01076, over 15168.00 frames. ], tot_loss[loss=0.0859, simple_loss=0.1058, pruned_loss=0.02278, audio_tagging_loss=0.01021, over 3045822.82 frames. ], batch size: 57, lr: 6.84e-03, grad_scale: 32.0
2023-11-19 19:51:59,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=783066.6666666666, ans=0.125
2023-11-19 19:52:02,294 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.23 vs. limit=15.0
2023-11-19 19:52:04,829 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.93 vs. limit=12.0
2023-11-19 19:52:16,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=783200.0, ans=0.125
2023-11-19 19:52:38,860 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117500
2023-11-19 19:52:40,818 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.54 vs. limit=15.0
2023-11-19 19:52:54,384 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 9300, loss[loss=0.075, simple_loss=0.09473, pruned_loss=0.01825, audio_tagging_loss=0.009381, over 15703.00 frames. ], tot_loss[loss=0.0863, simple_loss=0.1064, pruned_loss=0.02291, audio_tagging_loss=0.01019, over 3044522.96 frames. ], batch size: 62, lr: 6.84e-03, grad_scale: 32.0
2023-11-19 19:52:54,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=783400.0, ans=0.2
2023-11-19 19:53:05,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=783400.0, ans=0.2
2023-11-19 19:53:11,100 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.48 vs. limit=10.0
2023-11-19 19:53:26,979 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.639e+01 8.312e+01 9.036e+01 9.787e+01 1.162e+02, threshold=1.807e+02, percent-clipped=0.0
2023-11-19 19:53:44,126 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117550
2023-11-19 19:53:59,581 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 9350, loss[loss=0.08097, simple_loss=0.1025, pruned_loss=0.02055, audio_tagging_loss=0.009155, over 16068.00 frames. ], tot_loss[loss=0.08582, simple_loss=0.1056, pruned_loss=0.02274, audio_tagging_loss=0.01028, over 3047838.17 frames. ], batch size: 63, lr: 6.84e-03, grad_scale: 32.0
2023-11-19 19:54:14,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=783800.0, ans=0.02
2023-11-19 19:54:21,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=783800.0, ans=0.125
2023-11-19 19:54:50,048 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117600
2023-11-19 19:54:56,161 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 19:55:05,786 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 9400, loss[loss=0.09249, simple_loss=0.09714, pruned_loss=0.02926, audio_tagging_loss=0.01465, over 14991.00 frames. ], tot_loss[loss=0.08524, simple_loss=0.1047, pruned_loss=0.0225, audio_tagging_loss=0.01039, over 3043481.71 frames. ], batch size: 56, lr: 6.83e-03, grad_scale: 16.0
2023-11-19 19:55:18,284 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.78 vs. limit=22.5
2023-11-19 19:55:22,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=784133.3333333334, ans=0.1
2023-11-19 19:55:27,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=784133.3333333334, ans=0.0
2023-11-19 19:55:30,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=784200.0, ans=0.5
2023-11-19 19:55:39,198 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.476e+01 9.081e+01 1.030e+02 1.355e+02, threshold=1.816e+02, percent-clipped=0.0
2023-11-19 19:55:41,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=784200.0, ans=0.125
2023-11-19 19:55:50,787 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=12.0
2023-11-19 19:55:55,249 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117650
2023-11-19 19:56:06,428 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 19:56:10,536 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.69 vs. limit=15.0
2023-11-19 19:56:10,993 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 9450, loss[loss=0.06816, simple_loss=0.07988, pruned_loss=0.01571, audio_tagging_loss=0.01251, over 14335.00 frames. ], tot_loss[loss=0.08469, simple_loss=0.1037, pruned_loss=0.02224, audio_tagging_loss=0.01058, over 3043748.91 frames. ], batch size: 57, lr: 6.83e-03, grad_scale: 16.0
2023-11-19 19:56:23,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=784466.6666666666, ans=0.125
2023-11-19 19:56:40,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=784533.3333333334, ans=0.0
2023-11-19 19:56:41,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=784533.3333333334, ans=0.2
2023-11-19 19:57:00,649 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117700
2023-11-19 19:57:03,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=784666.6666666666, ans=0.125
2023-11-19 19:57:16,049 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 9500, loss[loss=0.09792, simple_loss=0.1134, pruned_loss=0.02625, audio_tagging_loss=0.01497, over 13627.00 frames. ], tot_loss[loss=0.08431, simple_loss=0.103, pruned_loss=0.02209, audio_tagging_loss=0.0107, over 3048625.25 frames. ], batch size: 52, lr: 6.83e-03, grad_scale: 16.0
2023-11-19 19:57:19,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=784733.3333333334, ans=0.2
2023-11-19 19:57:27,042 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.09 vs. limit=10.0
2023-11-19 19:57:49,739 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.346e+01 9.084e+01 9.988e+01 1.421e+02, threshold=1.817e+02, percent-clipped=0.0
2023-11-19 19:57:58,777 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.56 vs. limit=12.0
2023-11-19 19:58:06,185 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117750
2023-11-19 19:58:21,915 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 9550, loss[loss=0.08468, simple_loss=0.1026, pruned_loss=0.02299, audio_tagging_loss=0.01038, over 14666.00 frames. ], tot_loss[loss=0.08459, simple_loss=0.103, pruned_loss=0.02231, audio_tagging_loss=0.01077, over 3046035.28 frames. ], batch size: 57, lr: 6.83e-03, grad_scale: 16.0
2023-11-19 19:58:26,587 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=15.48 vs. limit=15.0
2023-11-19 19:58:42,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=785133.3333333334, ans=0.1
2023-11-19 19:59:10,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=785266.6666666666, ans=0.125
2023-11-19 19:59:11,457 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117800
2023-11-19 19:59:26,810 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 9600, loss[loss=0.08318, simple_loss=0.1032, pruned_loss=0.01996, audio_tagging_loss=0.01164, over 15707.00 frames. ], tot_loss[loss=0.08572, simple_loss=0.1048, pruned_loss=0.02265, audio_tagging_loss=0.01066, over 3049313.18 frames. ], batch size: 58, lr: 6.83e-03, grad_scale: 32.0
2023-11-19 20:00:01,480 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.979e+01 8.524e+01 9.148e+01 9.988e+01 1.418e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-19 20:00:04,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=785533.3333333334, ans=0.0
2023-11-19 20:00:04,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=785533.3333333334, ans=0.0
2023-11-19 20:00:05,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=785600.0, ans=0.125
2023-11-19 20:00:16,410 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117850
2023-11-19 20:00:16,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=785600.0, ans=0.125
2023-11-19 20:00:32,095 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 9650, loss[loss=0.113, simple_loss=0.1352, pruned_loss=0.03651, audio_tagging_loss=0.008859, over 15643.00 frames. ], tot_loss[loss=0.08538, simple_loss=0.1041, pruned_loss=0.02269, audio_tagging_loss=0.01065, over 3048632.12 frames. ], batch size: 58, lr: 6.83e-03, grad_scale: 32.0
2023-11-19 20:00:48,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=785800.0, ans=0.125
2023-11-19 20:00:50,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=785800.0, ans=0.0
2023-11-19 20:00:59,048 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.40 vs. limit=10.0
2023-11-19 20:01:03,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=785866.6666666666, ans=0.0
2023-11-19 20:01:18,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=785933.3333333334, ans=0.1
2023-11-19 20:01:22,122 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117900
], tot_loss[loss=0.08502, simple_loss=0.1043, pruned_loss=0.02254, audio_tagging_loss=0.01035, over 3051400.47 frames. ], batch size: 57, lr: 6.83e-03, grad_scale: 32.0 2023-11-19 20:01:59,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=786133.3333333334, ans=0.0 2023-11-19 20:02:07,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=786200.0, ans=0.125 2023-11-19 20:02:11,069 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.874e+01 8.357e+01 9.241e+01 1.009e+02 1.482e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 20:02:24,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=786266.6666666666, ans=0.1 2023-11-19 20:02:26,868 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 117950 2023-11-19 20:02:33,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=786333.3333333334, ans=0.025 2023-11-19 20:02:37,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=786333.3333333334, ans=0.125 2023-11-19 20:02:41,781 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 9750, loss[loss=0.09618, simple_loss=0.116, pruned_loss=0.02833, audio_tagging_loss=0.009836, over 14985.00 frames. ], tot_loss[loss=0.08507, simple_loss=0.1046, pruned_loss=0.0225, audio_tagging_loss=0.01025, over 3051355.49 frames. ], batch size: 57, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:02:49,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=786400.0, ans=0.125 2023-11-19 20:02:49,486 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=786400.0, ans=0.1 2023-11-19 20:03:30,749 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118000 2023-11-19 20:03:35,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=786666.6666666666, ans=0.1 2023-11-19 20:03:40,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=786666.6666666666, ans=0.125 2023-11-19 20:03:42,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=786666.6666666666, ans=0.0 2023-11-19 20:03:43,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=786666.6666666666, ans=0.0 2023-11-19 20:03:46,738 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 9800, loss[loss=0.07006, simple_loss=0.07936, pruned_loss=0.02027, audio_tagging_loss=0.01011, over 15418.00 frames. ], tot_loss[loss=0.08528, simple_loss=0.1051, pruned_loss=0.02259, audio_tagging_loss=0.01015, over 3054230.72 frames. 
], batch size: 62, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:03:49,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=786733.3333333334, ans=0.2 2023-11-19 20:03:51,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=786733.3333333334, ans=0.125 2023-11-19 20:04:04,907 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2023-11-19 20:04:11,984 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:04:20,508 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.585e+01 9.415e+01 1.041e+02 1.362e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-19 20:04:22,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=786866.6666666666, ans=0.125 2023-11-19 20:04:23,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=786866.6666666666, ans=0.2 2023-11-19 20:04:23,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=786866.6666666666, ans=0.0 2023-11-19 20:04:36,308 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118050 2023-11-19 20:04:43,143 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:04:52,875 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 9850, loss[loss=0.08873, simple_loss=0.09996, pruned_loss=0.02438, audio_tagging_loss=0.01437, over 14448.00 frames. ], tot_loss[loss=0.08559, simple_loss=0.1053, pruned_loss=0.02283, audio_tagging_loss=0.01012, over 3051838.12 frames. ], batch size: 56, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:04:59,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=787066.6666666666, ans=0.125 2023-11-19 20:05:07,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=787133.3333333334, ans=0.125 2023-11-19 20:05:17,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=787200.0, ans=0.125 2023-11-19 20:05:29,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=787266.6666666666, ans=0.1 2023-11-19 20:05:35,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=787266.6666666666, ans=0.0 2023-11-19 20:05:41,647 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118100 2023-11-19 20:05:48,626 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.53 vs. 
limit=15.0 2023-11-19 20:05:56,645 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 9900, loss[loss=0.103, simple_loss=0.1242, pruned_loss=0.03109, audio_tagging_loss=0.009756, over 15640.00 frames. ], tot_loss[loss=0.08575, simple_loss=0.1057, pruned_loss=0.02287, audio_tagging_loss=0.01005, over 3052557.59 frames. ], batch size: 58, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:05:56,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=787400.0, ans=0.0 2023-11-19 20:06:07,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=787400.0, ans=0.0 2023-11-19 20:06:09,847 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=12.0 2023-11-19 20:06:15,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=787466.6666666666, ans=15.0 2023-11-19 20:06:17,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=787466.6666666666, ans=0.2 2023-11-19 20:06:17,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=787466.6666666666, ans=0.125 2023-11-19 20:06:30,215 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0 2023-11-19 20:06:30,691 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.320e+01 8.268e+01 9.019e+01 9.736e+01 1.319e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 20:06:45,641 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118150 2023-11-19 20:07:00,293 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 9950, loss[loss=0.108, simple_loss=0.1224, pruned_loss=0.03605, audio_tagging_loss=0.0108, over 14417.00 frames. ], tot_loss[loss=0.08591, simple_loss=0.106, pruned_loss=0.02292, audio_tagging_loss=0.01, over 3051239.26 frames. ], batch size: 56, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:07:29,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=787866.6666666666, ans=0.125 2023-11-19 20:07:48,748 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118200 2023-11-19 20:07:56,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=788000.0, ans=0.0 2023-11-19 20:07:58,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=788000.0, ans=0.0 2023-11-19 20:08:05,800 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10000, loss[loss=0.08806, simple_loss=0.1135, pruned_loss=0.02256, audio_tagging_loss=0.008752, over 14537.00 frames. ], tot_loss[loss=0.08568, simple_loss=0.1054, pruned_loss=0.02286, audio_tagging_loss=0.0101, over 3054124.39 frames. 
], batch size: 57, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:08:27,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=788133.3333333334, ans=0.125 2023-11-19 20:08:37,910 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.424e+01 7.892e+01 8.582e+01 9.307e+01 3.708e+02, threshold=1.716e+02, percent-clipped=1.0 2023-11-19 20:08:54,202 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118250 2023-11-19 20:09:09,684 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10050, loss[loss=0.08737, simple_loss=0.109, pruned_loss=0.02275, audio_tagging_loss=0.01015, over 14670.00 frames. ], tot_loss[loss=0.08592, simple_loss=0.1059, pruned_loss=0.02294, audio_tagging_loss=0.01004, over 3051443.07 frames. ], batch size: 58, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:09:10,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=788400.0, ans=0.125 2023-11-19 20:09:22,770 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=12.0 2023-11-19 20:09:33,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=788466.6666666666, ans=0.0 2023-11-19 20:09:50,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=788600.0, ans=0.125 2023-11-19 20:09:58,748 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118300 2023-11-19 20:10:13,514 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10100, loss[loss=0.09234, simple_loss=0.1089, pruned_loss=0.02762, audio_tagging_loss=0.01027, over 14923.00 frames. ], tot_loss[loss=0.08582, simple_loss=0.1059, pruned_loss=0.02282, audio_tagging_loss=0.01003, over 3053986.74 frames. ], batch size: 56, lr: 6.81e-03, grad_scale: 32.0 2023-11-19 20:10:13,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=788733.3333333334, ans=0.0 2023-11-19 20:10:23,338 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.16 vs. limit=15.0 2023-11-19 20:10:24,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=788733.3333333334, ans=0.1 2023-11-19 20:10:26,012 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.24 vs. 
limit=22.5 2023-11-19 20:10:29,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=788800.0, ans=0.125 2023-11-19 20:10:31,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=788800.0, ans=0.0 2023-11-19 20:10:42,702 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=788866.6666666666, ans=0.1 2023-11-19 20:10:47,504 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.462e+01 9.399e+01 1.049e+02 1.408e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-19 20:11:02,488 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118350 2023-11-19 20:11:03,611 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:11:18,938 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10150, loss[loss=0.1031, simple_loss=0.1381, pruned_loss=0.02559, audio_tagging_loss=0.008462, over 14344.00 frames. ], tot_loss[loss=0.08595, simple_loss=0.1062, pruned_loss=0.02279, audio_tagging_loss=0.01005, over 3053245.04 frames. ], batch size: 56, lr: 6.81e-03, grad_scale: 32.0 2023-11-19 20:11:33,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=789133.3333333334, ans=0.125 2023-11-19 20:11:41,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=789133.3333333334, ans=0.0 2023-11-19 20:11:44,646 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.50 vs. limit=15.0 2023-11-19 20:11:46,497 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:12:07,329 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118400 2023-11-19 20:12:15,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=789333.3333333334, ans=0.125 2023-11-19 20:12:19,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=789333.3333333334, ans=0.125 2023-11-19 20:12:23,171 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10200, loss[loss=0.09406, simple_loss=0.1204, pruned_loss=0.02324, audio_tagging_loss=0.0106, over 16597.00 frames. ], tot_loss[loss=0.08541, simple_loss=0.1054, pruned_loss=0.0225, audio_tagging_loss=0.01019, over 3050014.28 frames. 
], batch size: 61, lr: 6.81e-03, grad_scale: 32.0 2023-11-19 20:12:40,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=789466.6666666666, ans=0.125 2023-11-19 20:12:44,378 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:12:50,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=789533.3333333334, ans=0.125 2023-11-19 20:12:57,887 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.646e+01 8.394e+01 8.968e+01 9.888e+01 1.322e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-19 20:13:12,294 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118450 2023-11-19 20:13:27,001 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10250, loss[loss=0.07279, simple_loss=0.08529, pruned_loss=0.01837, audio_tagging_loss=0.01177, over 13669.00 frames. ], tot_loss[loss=0.08489, simple_loss=0.1048, pruned_loss=0.02224, audio_tagging_loss=0.01026, over 3050949.80 frames. ], batch size: 53, lr: 6.81e-03, grad_scale: 16.0 2023-11-19 20:14:11,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=789933.3333333334, ans=0.2 2023-11-19 20:14:16,520 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118500 2023-11-19 20:14:25,258 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0 2023-11-19 20:14:25,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=790000.0, ans=0.125 2023-11-19 20:14:31,739 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10300, loss[loss=0.09215, simple_loss=0.1087, pruned_loss=0.02751, audio_tagging_loss=0.0103, over 14430.00 frames. ], tot_loss[loss=0.08497, simple_loss=0.1046, pruned_loss=0.02227, audio_tagging_loss=0.01042, over 3045187.12 frames. 
], batch size: 54, lr: 6.81e-03, grad_scale: 16.0 2023-11-19 20:14:33,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=790066.6666666666, ans=0.125 2023-11-19 20:14:46,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=790133.3333333334, ans=0.125 2023-11-19 20:15:06,825 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.556e+01 8.163e+01 8.831e+01 9.880e+01 1.200e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-19 20:15:13,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=790266.6666666666, ans=0.125 2023-11-19 20:15:21,129 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118550 2023-11-19 20:15:27,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=790333.3333333334, ans=0.125 2023-11-19 20:15:34,962 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:15:36,632 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.40 vs. limit=15.0 2023-11-19 20:15:37,174 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10350, loss[loss=0.092, simple_loss=0.1111, pruned_loss=0.02612, audio_tagging_loss=0.01031, over 14905.00 frames. ], tot_loss[loss=0.08524, simple_loss=0.1044, pruned_loss=0.02243, audio_tagging_loss=0.01061, over 3040810.56 frames. ], batch size: 56, lr: 6.81e-03, grad_scale: 16.0 2023-11-19 20:15:53,196 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.32 vs. limit=15.0 2023-11-19 20:16:01,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=790533.3333333334, ans=0.2 2023-11-19 20:16:25,925 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118600 2023-11-19 20:16:28,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=790666.6666666666, ans=0.125 2023-11-19 20:16:40,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=790733.3333333334, ans=0.1 2023-11-19 20:16:40,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=790733.3333333334, ans=0.1 2023-11-19 20:16:41,677 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10400, loss[loss=0.0797, simple_loss=0.09627, pruned_loss=0.02107, audio_tagging_loss=0.01049, over 14677.00 frames. ], tot_loss[loss=0.08552, simple_loss=0.1047, pruned_loss=0.0225, audio_tagging_loss=0.01065, over 3042632.58 frames. ], batch size: 55, lr: 6.81e-03, grad_scale: 32.0 2023-11-19 20:16:54,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=790800.0, ans=0.125 2023-11-19 20:17:12,923 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. 
limit=15.0 2023-11-19 20:17:17,251 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.085e+01 8.510e+01 9.248e+01 1.057e+02 2.087e+02, threshold=1.850e+02, percent-clipped=1.0 2023-11-19 20:17:17,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=790866.6666666666, ans=0.2 2023-11-19 20:17:31,953 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118650 2023-11-19 20:17:38,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=791000.0, ans=0.125 2023-11-19 20:17:47,263 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10450, loss[loss=0.07853, simple_loss=0.09388, pruned_loss=0.02382, audio_tagging_loss=0.007773, over 16075.00 frames. ], tot_loss[loss=0.08545, simple_loss=0.1046, pruned_loss=0.02253, audio_tagging_loss=0.01063, over 3047782.00 frames. ], batch size: 61, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:18:35,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=791266.6666666666, ans=0.125 2023-11-19 20:18:36,683 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118700 2023-11-19 20:18:52,550 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10500, loss[loss=0.09408, simple_loss=0.1129, pruned_loss=0.02561, audio_tagging_loss=0.01201, over 14341.00 frames. ], tot_loss[loss=0.08467, simple_loss=0.1036, pruned_loss=0.02232, audio_tagging_loss=0.01057, over 3038462.80 frames. ], batch size: 53, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:18:58,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=791400.0, ans=0.1 2023-11-19 20:19:08,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=791466.6666666666, ans=0.2 2023-11-19 20:19:14,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=791466.6666666666, ans=0.0 2023-11-19 20:19:20,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=791533.3333333334, ans=0.0 2023-11-19 20:19:25,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=791533.3333333334, ans=0.2 2023-11-19 20:19:28,542 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.886e+01 8.049e+01 8.549e+01 9.577e+01 1.136e+02, threshold=1.710e+02, percent-clipped=0.0 2023-11-19 20:19:42,031 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118750 2023-11-19 20:19:42,615 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0 2023-11-19 20:19:52,470 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.11 vs. limit=22.5 2023-11-19 20:19:56,805 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10550, loss[loss=0.0703, simple_loss=0.08889, pruned_loss=0.01495, audio_tagging_loss=0.01091, over 16492.00 frames. ], tot_loss[loss=0.08438, simple_loss=0.1035, pruned_loss=0.02223, audio_tagging_loss=0.01041, over 3039328.74 frames. 
], batch size: 61, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:20:15,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=791800.0, ans=0.1 2023-11-19 20:20:35,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=791933.3333333334, ans=0.125 2023-11-19 20:20:41,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=791933.3333333334, ans=0.125 2023-11-19 20:20:46,387 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118800 2023-11-19 20:21:01,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=792066.6666666666, ans=0.1 2023-11-19 20:21:02,567 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10600, loss[loss=0.0683, simple_loss=0.08229, pruned_loss=0.01668, audio_tagging_loss=0.01047, over 15340.00 frames. ], tot_loss[loss=0.08472, simple_loss=0.104, pruned_loss=0.02242, audio_tagging_loss=0.01031, over 3039892.46 frames. ], batch size: 59, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:21:07,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=792066.6666666666, ans=0.0 2023-11-19 20:21:17,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=792133.3333333334, ans=0.2 2023-11-19 20:21:22,975 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2023-11-19 20:21:27,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=792200.0, ans=0.125 2023-11-19 20:21:32,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=792200.0, ans=0.0 2023-11-19 20:21:37,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=792200.0, ans=0.125 2023-11-19 20:21:38,216 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.354e+01 8.305e+01 8.737e+01 9.475e+01 2.195e+02, threshold=1.747e+02, percent-clipped=1.0 2023-11-19 20:21:45,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=792266.6666666666, ans=0.125 2023-11-19 20:21:49,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=792266.6666666666, ans=0.2 2023-11-19 20:21:51,997 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118850 2023-11-19 20:22:07,717 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10650, loss[loss=0.1023, simple_loss=0.1235, pruned_loss=0.02939, audio_tagging_loss=0.0112, over 15739.00 frames. ], tot_loss[loss=0.08492, simple_loss=0.1044, pruned_loss=0.02249, audio_tagging_loss=0.01025, over 3043944.92 frames. ], batch size: 61, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:22:46,689 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. 
limit=15.0 2023-11-19 20:22:51,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=792600.0, ans=0.2 2023-11-19 20:22:57,323 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118900 2023-11-19 20:23:02,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=792666.6666666666, ans=0.125 2023-11-19 20:23:06,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=792666.6666666666, ans=0.0 2023-11-19 20:23:12,202 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10700, loss[loss=0.07209, simple_loss=0.08919, pruned_loss=0.01784, audio_tagging_loss=0.009656, over 15119.00 frames. ], tot_loss[loss=0.08384, simple_loss=0.1026, pruned_loss=0.02221, audio_tagging_loss=0.01034, over 3049894.32 frames. ], batch size: 55, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:23:30,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=792800.0, ans=0.125 2023-11-19 20:23:34,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=792800.0, ans=0.04949747468305833 2023-11-19 20:23:48,372 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.590e+01 8.344e+01 9.124e+01 9.874e+01 1.194e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 20:23:53,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=792933.3333333334, ans=0.0 2023-11-19 20:24:00,577 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 118950 2023-11-19 20:24:13,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=793000.0, ans=0.125 2023-11-19 20:24:16,644 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10750, loss[loss=0.1202, simple_loss=0.1357, pruned_loss=0.04114, audio_tagging_loss=0.01124, over 16057.00 frames. ], tot_loss[loss=0.08445, simple_loss=0.1035, pruned_loss=0.02239, audio_tagging_loss=0.01032, over 3045073.76 frames. ], batch size: 59, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:24:32,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=793133.3333333334, ans=0.0 2023-11-19 20:24:41,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=793200.0, ans=0.125 2023-11-19 20:24:49,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=793200.0, ans=0.0 2023-11-19 20:24:52,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=793200.0, ans=0.1 2023-11-19 20:25:04,916 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119000 2023-11-19 20:25:16,220 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0 2023-11-19 20:25:21,061 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10800, loss[loss=0.07966, simple_loss=0.08218, pruned_loss=0.02632, audio_tagging_loss=0.01225, over 14445.00 frames. ], tot_loss[loss=0.0836, simple_loss=0.1026, pruned_loss=0.02206, audio_tagging_loss=0.01024, over 3039486.80 frames. 
], batch size: 57, lr: 6.79e-03, grad_scale: 32.0 2023-11-19 20:25:31,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=793400.0, ans=0.1 2023-11-19 20:25:52,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=793533.3333333334, ans=0.0 2023-11-19 20:25:56,701 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.855e+01 8.324e+01 9.413e+01 1.037e+02 1.353e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-19 20:26:10,132 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119050 2023-11-19 20:26:12,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=793666.6666666666, ans=0.125 2023-11-19 20:26:20,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=793666.6666666666, ans=0.0 2023-11-19 20:26:24,870 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10850, loss[loss=0.08795, simple_loss=0.109, pruned_loss=0.0237, audio_tagging_loss=0.009761, over 16465.00 frames. ], tot_loss[loss=0.08364, simple_loss=0.1025, pruned_loss=0.02214, audio_tagging_loss=0.01025, over 3037515.49 frames. ], batch size: 61, lr: 6.79e-03, grad_scale: 32.0 2023-11-19 20:26:29,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=793733.3333333334, ans=0.1 2023-11-19 20:26:45,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=793800.0, ans=0.09899494936611666 2023-11-19 20:27:13,767 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119100 2023-11-19 20:27:22,329 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:27:27,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=794066.6666666666, ans=0.125 2023-11-19 20:27:28,417 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10900, loss[loss=0.0769, simple_loss=0.09811, pruned_loss=0.01944, audio_tagging_loss=0.008401, over 15080.00 frames. ], tot_loss[loss=0.08347, simple_loss=0.1023, pruned_loss=0.02195, audio_tagging_loss=0.01039, over 3041644.54 frames. ], batch size: 55, lr: 6.79e-03, grad_scale: 16.0 2023-11-19 20:27:34,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=794066.6666666666, ans=0.125 2023-11-19 20:27:44,124 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.64 vs. 
limit=15.0 2023-11-19 20:27:59,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=794200.0, ans=0.0 2023-11-19 20:28:05,877 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.516e+01 8.198e+01 8.697e+01 9.317e+01 1.364e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-19 20:28:15,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=794266.6666666666, ans=0.1 2023-11-19 20:28:16,997 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119150 2023-11-19 20:28:21,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=794333.3333333334, ans=0.05 2023-11-19 20:28:22,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=794333.3333333334, ans=0.0 2023-11-19 20:28:33,902 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 10950, loss[loss=0.05735, simple_loss=0.06767, pruned_loss=0.01209, audio_tagging_loss=0.01142, over 15115.00 frames. ], tot_loss[loss=0.0832, simple_loss=0.1019, pruned_loss=0.02179, audio_tagging_loss=0.01045, over 3045380.93 frames. ], batch size: 58, lr: 6.79e-03, grad_scale: 16.0 2023-11-19 20:28:44,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=794400.0, ans=0.125 2023-11-19 20:28:55,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=794466.6666666666, ans=0.125 2023-11-19 20:28:59,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=794533.3333333334, ans=0.0 2023-11-19 20:29:13,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=794600.0, ans=0.0 2023-11-19 20:29:14,636 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.61 vs. limit=22.5 2023-11-19 20:29:23,143 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119200 2023-11-19 20:29:29,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=794666.6666666666, ans=0.125 2023-11-19 20:29:34,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=794666.6666666666, ans=0.0 2023-11-19 20:29:37,886 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11000, loss[loss=0.07357, simple_loss=0.09216, pruned_loss=0.01779, audio_tagging_loss=0.009702, over 14987.00 frames. ], tot_loss[loss=0.0835, simple_loss=0.1024, pruned_loss=0.02184, audio_tagging_loss=0.01045, over 3041145.31 frames. ], batch size: 57, lr: 6.79e-03, grad_scale: 16.0 2023-11-19 20:29:46,434 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 20:29:56,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=794800.0, ans=0.0 2023-11-19 20:30:10,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=794866.6666666666, ans=0.125 2023-11-19 20:30:15,774 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.738e+01 8.172e+01 9.020e+01 9.721e+01 1.365e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 20:30:27,049 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119250 2023-11-19 20:30:41,672 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11050, loss[loss=0.06635, simple_loss=0.07522, pruned_loss=0.01898, audio_tagging_loss=0.009761, over 15606.00 frames. ], tot_loss[loss=0.08416, simple_loss=0.1033, pruned_loss=0.022, audio_tagging_loss=0.01049, over 3049614.38 frames. ], batch size: 58, lr: 6.79e-03, grad_scale: 16.0 2023-11-19 20:30:43,524 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.41 vs. limit=22.5 2023-11-19 20:30:55,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=795133.3333333334, ans=0.125 2023-11-19 20:30:58,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=795133.3333333334, ans=0.125 2023-11-19 20:31:02,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=795133.3333333334, ans=0.0 2023-11-19 20:31:23,074 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.97 vs. limit=15.0 2023-11-19 20:31:29,801 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119300 2023-11-19 20:31:32,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=795333.3333333334, ans=0.1 2023-11-19 20:31:44,865 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11100, loss[loss=0.1015, simple_loss=0.127, pruned_loss=0.02847, audio_tagging_loss=0.009532, over 15324.00 frames. ], tot_loss[loss=0.08505, simple_loss=0.1044, pruned_loss=0.02228, audio_tagging_loss=0.01056, over 3050851.44 frames. ], batch size: 56, lr: 6.79e-03, grad_scale: 16.0 2023-11-19 20:31:50,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=795400.0, ans=0.125 2023-11-19 20:31:50,583 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=22.5 2023-11-19 20:31:51,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=795400.0, ans=10.0 2023-11-19 20:32:21,498 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.097e+01 8.553e+01 9.407e+01 1.079e+02 1.400e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-19 20:32:33,126 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119350 2023-11-19 20:32:49,469 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11150, loss[loss=0.08748, simple_loss=0.1079, pruned_loss=0.0229, audio_tagging_loss=0.01061, over 15788.00 frames. 
], tot_loss[loss=0.08548, simple_loss=0.1048, pruned_loss=0.02248, audio_tagging_loss=0.0106, over 3057550.69 frames. ], batch size: 59, lr: 6.78e-03, grad_scale: 16.0 2023-11-19 20:32:52,623 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.01 vs. limit=15.0 2023-11-19 20:33:01,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=795800.0, ans=0.125 2023-11-19 20:33:19,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=795866.6666666666, ans=0.125 2023-11-19 20:33:22,382 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.71 vs. limit=15.0 2023-11-19 20:33:26,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=795933.3333333334, ans=0.0 2023-11-19 20:33:29,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=795933.3333333334, ans=0.0 2023-11-19 20:33:37,732 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119400 2023-11-19 20:33:43,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=796000.0, ans=0.07 2023-11-19 20:33:50,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=796000.0, ans=0.0 2023-11-19 20:33:52,409 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11200, loss[loss=0.07928, simple_loss=0.105, pruned_loss=0.01756, audio_tagging_loss=0.009197, over 15910.00 frames. ], tot_loss[loss=0.08575, simple_loss=0.1049, pruned_loss=0.02267, audio_tagging_loss=0.01061, over 3055696.58 frames. ], batch size: 59, lr: 6.78e-03, grad_scale: 32.0 2023-11-19 20:33:58,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=796066.6666666666, ans=0.0 2023-11-19 20:34:04,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=796133.3333333334, ans=0.125 2023-11-19 20:34:06,910 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.25 vs. 
limit=22.5 2023-11-19 20:34:24,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=796200.0, ans=0.2 2023-11-19 20:34:30,417 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.715e+01 8.063e+01 8.813e+01 9.513e+01 1.484e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-19 20:34:41,317 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119450 2023-11-19 20:34:45,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=796333.3333333334, ans=0.125 2023-11-19 20:34:51,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=796333.3333333334, ans=0.0 2023-11-19 20:34:51,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=796333.3333333334, ans=0.0 2023-11-19 20:34:56,375 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11250, loss[loss=0.09426, simple_loss=0.1162, pruned_loss=0.03059, audio_tagging_loss=0.005569, over 14903.00 frames. ], tot_loss[loss=0.08511, simple_loss=0.1042, pruned_loss=0.02249, audio_tagging_loss=0.01052, over 3055514.55 frames. ], batch size: 56, lr: 6.78e-03, grad_scale: 32.0 2023-11-19 20:35:29,169 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:35:41,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=796600.0, ans=0.125 2023-11-19 20:35:45,369 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119500 2023-11-19 20:35:55,792 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.16 vs. limit=22.5 2023-11-19 20:36:01,650 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11300, loss[loss=0.09737, simple_loss=0.1242, pruned_loss=0.02553, audio_tagging_loss=0.009714, over 16252.00 frames. ], tot_loss[loss=0.08479, simple_loss=0.104, pruned_loss=0.02233, audio_tagging_loss=0.01044, over 3054030.45 frames. ], batch size: 59, lr: 6.78e-03, grad_scale: 32.0 2023-11-19 20:36:04,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=796733.3333333334, ans=0.2 2023-11-19 20:36:05,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=796733.3333333334, ans=0.0 2023-11-19 20:36:15,883 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.49 vs. limit=15.0 2023-11-19 20:36:27,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=796866.6666666666, ans=0.0 2023-11-19 20:36:30,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=796866.6666666666, ans=0.125 2023-11-19 20:36:38,564 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.085e+01 8.176e+01 8.864e+01 9.663e+01 1.255e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-19 20:36:47,312 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.80 vs. 
limit=15.0 2023-11-19 20:36:48,282 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.91 vs. limit=15.0 2023-11-19 20:36:50,751 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119550 2023-11-19 20:37:05,293 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11350, loss[loss=0.06169, simple_loss=0.07885, pruned_loss=0.01237, audio_tagging_loss=0.009893, over 15764.00 frames. ], tot_loss[loss=0.08494, simple_loss=0.1044, pruned_loss=0.02246, audio_tagging_loss=0.01027, over 3054496.92 frames. ], batch size: 63, lr: 6.78e-03, grad_scale: 32.0 2023-11-19 20:37:06,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=797066.6666666666, ans=0.125 2023-11-19 20:37:15,744 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:37:20,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=797133.3333333334, ans=0.125 2023-11-19 20:37:21,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=797133.3333333334, ans=0.125 2023-11-19 20:37:21,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=797133.3333333334, ans=0.0 2023-11-19 20:37:43,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=797266.6666666666, ans=0.125 2023-11-19 20:37:52,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=797266.6666666666, ans=0.1 2023-11-19 20:37:54,004 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119600 2023-11-19 20:37:59,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=797333.3333333334, ans=0.125 2023-11-19 20:38:09,488 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11400, loss[loss=0.08219, simple_loss=0.09458, pruned_loss=0.02337, audio_tagging_loss=0.01153, over 15198.00 frames. ], tot_loss[loss=0.08502, simple_loss=0.1046, pruned_loss=0.02252, audio_tagging_loss=0.01018, over 3049756.07 frames. ], batch size: 58, lr: 6.78e-03, grad_scale: 32.0 2023-11-19 20:38:14,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=797400.0, ans=0.0 2023-11-19 20:38:27,046 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.06 vs. limit=10.0 2023-11-19 20:38:32,622 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.66 vs. limit=12.0 2023-11-19 20:38:33,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=797466.6666666666, ans=0.0 2023-11-19 20:38:46,569 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.335e+01 9.100e+01 1.007e+02 1.269e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-19 20:38:47,572 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.81 vs. 
limit=15.0 2023-11-19 20:38:48,365 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.67 vs. limit=15.0 2023-11-19 20:38:49,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=797600.0, ans=0.05 2023-11-19 20:38:58,338 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119650 2023-11-19 20:38:58,789 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-19 20:39:07,782 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.29 vs. limit=6.0 2023-11-19 20:39:10,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=797666.6666666666, ans=0.2 2023-11-19 20:39:14,314 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11450, loss[loss=0.08737, simple_loss=0.1124, pruned_loss=0.02156, audio_tagging_loss=0.009613, over 14462.00 frames. ], tot_loss[loss=0.08505, simple_loss=0.1047, pruned_loss=0.02249, audio_tagging_loss=0.01023, over 3045617.52 frames. ], batch size: 58, lr: 6.78e-03, grad_scale: 32.0 2023-11-19 20:39:48,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=797866.6666666666, ans=0.0 2023-11-19 20:39:55,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=797933.3333333334, ans=0.125 2023-11-19 20:39:57,610 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=15.0 2023-11-19 20:40:03,129 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119700 2023-11-19 20:40:09,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=798000.0, ans=0.125 2023-11-19 20:40:14,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=798000.0, ans=0.0 2023-11-19 20:40:18,267 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11500, loss[loss=0.08553, simple_loss=0.09477, pruned_loss=0.02857, audio_tagging_loss=0.009575, over 14560.00 frames. ], tot_loss[loss=0.08488, simple_loss=0.1047, pruned_loss=0.0224, audio_tagging_loss=0.01014, over 3049379.45 frames. 
], batch size: 54, lr: 6.77e-03, grad_scale: 32.0 2023-11-19 20:40:18,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=798066.6666666666, ans=0.125 2023-11-19 20:40:37,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=798133.3333333334, ans=0.125 2023-11-19 20:40:38,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=798133.3333333334, ans=0.125 2023-11-19 20:40:55,541 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.252e+01 9.046e+01 9.695e+01 1.384e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-19 20:41:05,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=798266.6666666666, ans=0.0 2023-11-19 20:41:07,293 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119750 2023-11-19 20:41:09,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=798333.3333333334, ans=15.0 2023-11-19 20:41:22,094 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.25 vs. limit=22.5 2023-11-19 20:41:22,489 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11550, loss[loss=0.07782, simple_loss=0.09204, pruned_loss=0.01927, audio_tagging_loss=0.01253, over 15880.00 frames. ], tot_loss[loss=0.08484, simple_loss=0.1045, pruned_loss=0.02249, audio_tagging_loss=0.01012, over 3045195.51 frames. ], batch size: 62, lr: 6.77e-03, grad_scale: 32.0 2023-11-19 20:41:31,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=798400.0, ans=0.125 2023-11-19 20:41:39,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=798466.6666666666, ans=0.125 2023-11-19 20:41:58,400 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:42:11,173 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119800 2023-11-19 20:42:11,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=798600.0, ans=0.125 2023-11-19 20:42:27,333 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11600, loss[loss=0.08793, simple_loss=0.1138, pruned_loss=0.02337, audio_tagging_loss=0.007681, over 15303.00 frames. ], tot_loss[loss=0.08443, simple_loss=0.104, pruned_loss=0.02216, audio_tagging_loss=0.01029, over 3038399.72 frames. 
], batch size: 56, lr: 6.77e-03, grad_scale: 32.0 2023-11-19 20:43:04,741 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.840e+01 8.658e+01 9.480e+01 1.096e+02 1.560e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-19 20:43:08,085 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:43:15,991 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119850 2023-11-19 20:43:31,177 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11650, loss[loss=0.08646, simple_loss=0.09885, pruned_loss=0.02361, audio_tagging_loss=0.01342, over 15185.00 frames. ], tot_loss[loss=0.08503, simple_loss=0.1047, pruned_loss=0.02236, audio_tagging_loss=0.01032, over 3046392.18 frames. ], batch size: 56, lr: 6.77e-03, grad_scale: 32.0 2023-11-19 20:43:36,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=799066.6666666666, ans=0.125 2023-11-19 20:43:45,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=799133.3333333334, ans=0.125 2023-11-19 20:43:55,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=799200.0, ans=0.125 2023-11-19 20:44:06,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=799200.0, ans=0.125 2023-11-19 20:44:11,883 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.05 vs. limit=15.0 2023-11-19 20:44:19,888 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119900 2023-11-19 20:44:35,533 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11700, loss[loss=0.08683, simple_loss=0.1076, pruned_loss=0.02448, audio_tagging_loss=0.008529, over 14350.00 frames. ], tot_loss[loss=0.08496, simple_loss=0.1047, pruned_loss=0.02227, audio_tagging_loss=0.01034, over 3053897.76 frames. ], batch size: 55, lr: 6.77e-03, grad_scale: 32.0 2023-11-19 20:45:10,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=799533.3333333334, ans=0.0 2023-11-19 20:45:12,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=799533.3333333334, ans=0.0 2023-11-19 20:45:12,562 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2023-11-19 20:45:14,280 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.866e+01 8.190e+01 9.042e+01 9.907e+01 1.390e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 20:45:15,735 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:45:15,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=799600.0, ans=0.1 2023-11-19 20:45:24,731 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 119950 2023-11-19 20:45:34,423 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.22 vs. 
limit=15.0 2023-11-19 20:45:40,601 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11750, loss[loss=0.07733, simple_loss=0.08684, pruned_loss=0.02058, audio_tagging_loss=0.01333, over 14928.00 frames. ], tot_loss[loss=0.084, simple_loss=0.1031, pruned_loss=0.02203, audio_tagging_loss=0.01045, over 3046536.07 frames. ], batch size: 56, lr: 6.77e-03, grad_scale: 32.0 2023-11-19 20:46:09,816 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.852e-02 2023-11-19 20:46:25,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=799933.3333333334, ans=0.1 2023-11-19 20:46:29,881 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120000 2023-11-19 20:46:47,859 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11800, loss[loss=0.06877, simple_loss=0.08385, pruned_loss=0.01361, audio_tagging_loss=0.01324, over 14959.00 frames. ], tot_loss[loss=0.08309, simple_loss=0.1015, pruned_loss=0.02182, audio_tagging_loss=0.01052, over 3043757.10 frames. ], batch size: 58, lr: 6.77e-03, grad_scale: 32.0 2023-11-19 20:47:26,901 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.456e+01 9.093e+01 9.839e+01 1.192e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-19 20:47:34,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=800266.6666666666, ans=0.125 2023-11-19 20:47:36,856 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120050 2023-11-19 20:47:40,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=800333.3333333334, ans=0.0 2023-11-19 20:47:41,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=800333.3333333334, ans=0.0 2023-11-19 20:47:51,948 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11850, loss[loss=0.09194, simple_loss=0.1105, pruned_loss=0.02553, audio_tagging_loss=0.01113, over 15131.00 frames. ], tot_loss[loss=0.0837, simple_loss=0.1023, pruned_loss=0.022, audio_tagging_loss=0.01057, over 3050239.12 frames. ], batch size: 56, lr: 6.76e-03, grad_scale: 32.0 2023-11-19 20:48:30,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800600.0, ans=0.1 2023-11-19 20:48:40,957 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120100 2023-11-19 20:48:54,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=800666.6666666666, ans=0.125 2023-11-19 20:48:57,225 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11900, loss[loss=0.07209, simple_loss=0.09159, pruned_loss=0.01732, audio_tagging_loss=0.00897, over 15389.00 frames. ], tot_loss[loss=0.08387, simple_loss=0.1025, pruned_loss=0.02198, audio_tagging_loss=0.01064, over 3051636.70 frames. 
], batch size: 57, lr: 6.76e-03, grad_scale: 32.0 2023-11-19 20:49:15,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=800800.0, ans=0.125 2023-11-19 20:49:23,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=800866.6666666666, ans=0.125 2023-11-19 20:49:34,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=800933.3333333334, ans=0.1 2023-11-19 20:49:35,958 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.865e+01 8.251e+01 9.063e+01 9.820e+01 1.973e+02, threshold=1.813e+02, percent-clipped=1.0 2023-11-19 20:49:41,610 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.96 vs. limit=15.0 2023-11-19 20:49:45,852 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120150 2023-11-19 20:49:52,518 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.01 vs. limit=22.5 2023-11-19 20:49:58,798 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0 2023-11-19 20:50:00,556 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 11950, loss[loss=0.09294, simple_loss=0.1181, pruned_loss=0.02667, audio_tagging_loss=0.007226, over 15348.00 frames. ], tot_loss[loss=0.08402, simple_loss=0.1028, pruned_loss=0.02202, audio_tagging_loss=0.0106, over 3049078.05 frames. ], batch size: 58, lr: 6.76e-03, grad_scale: 16.0 2023-11-19 20:50:08,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=801066.6666666666, ans=0.1 2023-11-19 20:50:27,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=801200.0, ans=0.0 2023-11-19 20:50:34,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=801200.0, ans=0.1 2023-11-19 20:50:35,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=801200.0, ans=0.1 2023-11-19 20:50:48,465 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120200 2023-11-19 20:51:02,551 INFO [train_asr.py:1262] (2/4) Epoch 10, batch 12000, loss[loss=0.1001, simple_loss=0.1266, pruned_loss=0.02788, audio_tagging_loss=0.008853, over 15801.00 frames. ], tot_loss[loss=0.08451, simple_loss=0.1034, pruned_loss=0.02218, audio_tagging_loss=0.01064, over 3042903.01 frames. ], batch size: 58, lr: 6.76e-03, grad_scale: 32.0 2023-11-19 20:51:02,552 INFO [train_asr.py:1285] (2/4) Computing validation loss 2023-11-19 20:51:28,103 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7641, 5.8190, 5.8427, 5.9065], device='cuda:2') 2023-11-19 20:51:41,796 INFO [train_asr.py:1294] (2/4) Epoch 10, validation: loss=0.06456, simple_loss=0.05518, pruned_loss=0.006322, audio_tagging_loss=0.03065, over 4681554.00 frames. 
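A note on the loss fields in the [train_asr.py:1262] and [train_asr.py:1294] entries: throughout this log the printed totals are consistent, to within rounding, with loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss, i.e. a pruned-transducer objective (the full-lattice "simple" loss down-weighted by 0.5, plus the pruned loss) with the audio-tagging distillation term added at weight 1.0. The sketch below is a minimal reconstruction, not the recipe's code: both weights are inferred from the logged numbers themselves, and combine_losses is a hypothetical helper name.

    def combine_losses(simple_loss: float, pruned_loss: float,
                       audio_tagging_loss: float,
                       simple_loss_scale: float = 0.5,
                       audio_tagging_loss_scale: float = 1.0) -> float:
        # Reassemble the logged total "loss" from its logged components.
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # Epoch 10, batch 11450 train entry: loss=0.08737, simple_loss=0.1124,
    # pruned_loss=0.02156, audio_tagging_loss=0.009613.
    assert abs(combine_losses(0.1124, 0.02156, 0.009613) - 0.08737) < 5e-4
    # Epoch 10 validation entry just above: loss=0.06456.
    assert abs(combine_losses(0.05518, 0.006322, 0.03065) - 0.06456) < 5e-4

The split also shifts at validation time: the audio-tagging term (0.03065) is the single largest contributor to the validation total, whereas in the training averages it sits near 0.010.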
2023-11-19 20:51:41,797 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB 2023-11-19 20:51:45,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=801400.0, ans=0.0 2023-11-19 20:51:50,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=801400.0, ans=0.1 2023-11-19 20:51:51,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=801400.0, ans=0.125 2023-11-19 20:52:04,293 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.07 vs. limit=12.0 2023-11-19 20:52:44,649 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 0, loss[loss=0.09928, simple_loss=0.115, pruned_loss=0.0207, audio_tagging_loss=0.02108, over 15766.00 frames. ], tot_loss[loss=0.09928, simple_loss=0.115, pruned_loss=0.0207, audio_tagging_loss=0.02108, over 15766.00 frames. ], batch size: 58, lr: 6.45e-03, grad_scale: 32.0 2023-11-19 20:52:44,650 INFO [train_asr.py:1285] (2/4) Computing validation loss 2023-11-19 20:53:07,195 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.5379, 3.6231, 4.3555, 3.2266], device='cuda:2') 2023-11-19 20:53:20,004 INFO [train_asr.py:1294] (2/4) Epoch 11, validation: loss=0.06409, simple_loss=0.05518, pruned_loss=0.006264, audio_tagging_loss=0.03024, over 4681554.00 frames. 2023-11-19 20:53:20,005 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB 2023-11-19 20:53:24,377 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.57 vs. limit=15.0 2023-11-19 20:53:30,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=801540.0, ans=0.125 2023-11-19 20:53:32,252 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.455e+01 8.493e+01 9.059e+01 9.664e+01 1.642e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-19 20:53:35,280 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.15 vs. 
limit=15.0 2023-11-19 20:53:41,116 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120250 2023-11-19 20:53:41,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=801606.6666666666, ans=0.125 2023-11-19 20:53:42,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=801606.6666666666, ans=0.125 2023-11-19 20:53:56,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=801673.3333333334, ans=0.025 2023-11-19 20:54:17,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=801806.6666666666, ans=0.125 2023-11-19 20:54:17,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=801806.6666666666, ans=0.0 2023-11-19 20:54:17,613 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.12 vs. limit=15.0 2023-11-19 20:54:23,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=801873.3333333334, ans=0.035 2023-11-19 20:54:24,286 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 50, loss[loss=0.09772, simple_loss=0.1142, pruned_loss=0.02406, audio_tagging_loss=0.01656, over 15048.00 frames. ], tot_loss[loss=0.09441, simple_loss=0.1047, pruned_loss=0.02235, audio_tagging_loss=0.0197, over 684912.81 frames. ], batch size: 56, lr: 6.45e-03, grad_scale: 32.0 2023-11-19 20:54:30,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=801873.3333333334, ans=0.1 2023-11-19 20:54:46,583 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120300 2023-11-19 20:54:46,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=801940.0, ans=0.125 2023-11-19 20:54:53,689 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.29 vs. limit=12.0 2023-11-19 20:55:02,151 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:55:30,007 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 100, loss[loss=0.09349, simple_loss=0.1048, pruned_loss=0.0229, audio_tagging_loss=0.01817, over 14448.00 frames. ], tot_loss[loss=0.09454, simple_loss=0.1054, pruned_loss=0.02258, audio_tagging_loss=0.01925, over 1212714.81 frames. ], batch size: 54, lr: 6.45e-03, grad_scale: 32.0 2023-11-19 20:55:35,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=802206.6666666666, ans=0.125 2023-11-19 20:55:41,310 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0 2023-11-19 20:55:44,018 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.071e+01 8.908e+01 9.605e+01 1.032e+02 1.207e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-19 20:55:44,651 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.46 vs. 
limit=15.0 2023-11-19 20:55:52,813 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120350 2023-11-19 20:56:01,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=802340.0, ans=0.2 2023-11-19 20:56:08,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=802406.6666666666, ans=0.09899494936611666 2023-11-19 20:56:35,847 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 150, loss[loss=0.08418, simple_loss=0.09907, pruned_loss=0.02233, audio_tagging_loss=0.01231, over 15527.00 frames. ], tot_loss[loss=0.09144, simple_loss=0.1046, pruned_loss=0.02201, audio_tagging_loss=0.01713, over 1621144.25 frames. ], batch size: 61, lr: 6.45e-03, grad_scale: 32.0 2023-11-19 20:56:38,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=802540.0, ans=0.0 2023-11-19 20:56:40,217 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.59 vs. limit=15.0 2023-11-19 20:56:52,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=802606.6666666666, ans=0.1 2023-11-19 20:56:57,514 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120400 2023-11-19 20:57:03,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=802673.3333333334, ans=10.0 2023-11-19 20:57:16,864 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.84 vs. limit=15.0 2023-11-19 20:57:19,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=802740.0, ans=0.0 2023-11-19 20:57:20,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=802740.0, ans=0.2 2023-11-19 20:57:40,923 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 200, loss[loss=0.07689, simple_loss=0.09675, pruned_loss=0.01707, audio_tagging_loss=0.01144, over 16133.00 frames. ], tot_loss[loss=0.09093, simple_loss=0.1067, pruned_loss=0.0226, audio_tagging_loss=0.01498, over 1943796.49 frames. ], batch size: 59, lr: 6.45e-03, grad_scale: 32.0 2023-11-19 20:57:48,979 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.99 vs. limit=12.0 2023-11-19 20:57:54,015 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.513e+01 8.370e+01 8.919e+01 1.001e+02 1.772e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-19 20:57:59,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=802940.0, ans=0.125 2023-11-19 20:58:00,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=802940.0, ans=0.2 2023-11-19 20:58:02,719 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120450 2023-11-19 20:58:13,936 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.63 vs. 
limit=22.5 2023-11-19 20:58:29,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=803073.3333333334, ans=0.125 2023-11-19 20:58:46,053 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 250, loss[loss=0.09184, simple_loss=0.1139, pruned_loss=0.02383, audio_tagging_loss=0.01105, over 15290.00 frames. ], tot_loss[loss=0.09058, simple_loss=0.108, pruned_loss=0.02309, audio_tagging_loss=0.01347, over 2192712.54 frames. ], batch size: 57, lr: 6.45e-03, grad_scale: 16.0 2023-11-19 20:58:50,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=803206.6666666666, ans=10.0 2023-11-19 20:58:51,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=803206.6666666666, ans=0.125 2023-11-19 20:58:52,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=803206.6666666666, ans=0.125 2023-11-19 20:58:53,006 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.73 vs. limit=15.0 2023-11-19 20:58:53,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=803206.6666666666, ans=0.2 2023-11-19 20:59:00,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=803273.3333333334, ans=0.125 2023-11-19 20:59:03,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=803273.3333333334, ans=0.07 2023-11-19 20:59:08,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=803273.3333333334, ans=0.0 2023-11-19 20:59:09,469 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120500 2023-11-19 20:59:24,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=803406.6666666666, ans=0.125 2023-11-19 20:59:31,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=803406.6666666666, ans=0.125 2023-11-19 20:59:35,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=803406.6666666666, ans=0.2 2023-11-19 20:59:52,071 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 300, loss[loss=0.1217, simple_loss=0.1628, pruned_loss=0.03397, audio_tagging_loss=0.006295, over 15409.00 frames. ], tot_loss[loss=0.08954, simple_loss=0.1078, pruned_loss=0.02312, audio_tagging_loss=0.01251, over 2378877.51 frames. ], batch size: 54, lr: 6.45e-03, grad_scale: 16.0 2023-11-19 20:59:54,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=803540.0, ans=0.125 2023-11-19 21:00:05,711 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.754e+01 8.387e+01 8.949e+01 9.814e+01 1.274e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-19 21:00:13,393 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120550 2023-11-19 21:00:33,422 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.17 vs. 
limit=15.0 2023-11-19 21:00:52,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=803806.6666666666, ans=0.0 2023-11-19 21:00:55,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=803873.3333333334, ans=0.125 2023-11-19 21:00:56,060 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 350, loss[loss=0.0774, simple_loss=0.09747, pruned_loss=0.01978, audio_tagging_loss=0.008882, over 14400.00 frames. ], tot_loss[loss=0.0885, simple_loss=0.1074, pruned_loss=0.02292, audio_tagging_loss=0.01189, over 2528539.59 frames. ], batch size: 56, lr: 6.44e-03, grad_scale: 16.0 2023-11-19 21:00:59,340 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0 2023-11-19 21:01:03,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=803873.3333333334, ans=0.0 2023-11-19 21:01:10,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=803940.0, ans=0.125 2023-11-19 21:01:15,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=803940.0, ans=0.125 2023-11-19 21:01:16,977 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:01:17,875 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120600 2023-11-19 21:01:19,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=803940.0, ans=0.125 2023-11-19 21:01:55,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=804140.0, ans=0.125 2023-11-19 21:02:01,536 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 400, loss[loss=0.07712, simple_loss=0.09601, pruned_loss=0.02005, audio_tagging_loss=0.009061, over 15232.00 frames. ], tot_loss[loss=0.08764, simple_loss=0.1068, pruned_loss=0.02277, audio_tagging_loss=0.01147, over 2637585.16 frames. ], batch size: 58, lr: 6.44e-03, grad_scale: 32.0 2023-11-19 21:02:15,713 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.819e+01 8.262e+01 8.810e+01 9.660e+01 1.540e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-19 21:02:24,527 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120650 2023-11-19 21:02:29,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=804340.0, ans=0.1 2023-11-19 21:02:35,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=804340.0, ans=0.1 2023-11-19 21:02:59,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=804473.3333333334, ans=0.04949747468305833 2023-11-19 21:03:00,574 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.15 vs. limit=8.0 2023-11-19 21:03:06,996 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 450, loss[loss=0.07987, simple_loss=0.1005, pruned_loss=0.0208, audio_tagging_loss=0.00884, over 15002.00 frames. 
], tot_loss[loss=0.08703, simple_loss=0.1065, pruned_loss=0.02269, audio_tagging_loss=0.01111, over 2726185.71 frames. ], batch size: 56, lr: 6.44e-03, grad_scale: 32.0 2023-11-19 21:03:11,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=804540.0, ans=6.0 2023-11-19 21:03:13,058 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2023-11-19 21:03:24,555 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.20 vs. limit=15.0 2023-11-19 21:03:26,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=804606.6666666666, ans=0.0 2023-11-19 21:03:28,965 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120700 2023-11-19 21:03:39,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=804673.3333333334, ans=0.125 2023-11-19 21:03:51,472 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.22 vs. limit=15.0 2023-11-19 21:03:57,844 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.69 vs. limit=6.0 2023-11-19 21:04:12,492 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 500, loss[loss=0.09162, simple_loss=0.1177, pruned_loss=0.02722, audio_tagging_loss=0.005558, over 14676.00 frames. ], tot_loss[loss=0.08677, simple_loss=0.1063, pruned_loss=0.02272, audio_tagging_loss=0.01092, over 2802283.83 frames. ], batch size: 54, lr: 6.44e-03, grad_scale: 32.0 2023-11-19 21:04:17,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=804873.3333333334, ans=0.2 2023-11-19 21:04:25,803 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.35 vs. 
limit=15.0 2023-11-19 21:04:26,060 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.498e+01 9.308e+01 1.026e+02 1.855e+02, threshold=1.862e+02, percent-clipped=1.0 2023-11-19 21:04:34,133 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120750 2023-11-19 21:04:48,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=805006.6666666666, ans=0.1 2023-11-19 21:04:55,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=805073.3333333334, ans=0.125 2023-11-19 21:04:59,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=805073.3333333334, ans=0.2 2023-11-19 21:05:00,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=805073.3333333334, ans=0.0 2023-11-19 21:05:10,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=805140.0, ans=0.0 2023-11-19 21:05:15,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=805206.6666666666, ans=0.125 2023-11-19 21:05:16,338 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 550, loss[loss=0.1068, simple_loss=0.1376, pruned_loss=0.03075, audio_tagging_loss=0.007311, over 14940.00 frames. ], tot_loss[loss=0.08547, simple_loss=0.1047, pruned_loss=0.02235, audio_tagging_loss=0.01079, over 2850759.51 frames. ], batch size: 54, lr: 6.44e-03, grad_scale: 32.0 2023-11-19 21:05:22,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=805206.6666666666, ans=0.0 2023-11-19 21:05:36,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=805273.3333333334, ans=0.125 2023-11-19 21:05:39,023 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120800 2023-11-19 21:05:57,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=805406.6666666666, ans=0.0 2023-11-19 21:06:03,986 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=15.0 2023-11-19 21:06:19,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=805473.3333333334, ans=0.0 2023-11-19 21:06:21,509 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 600, loss[loss=0.05803, simple_loss=0.06309, pruned_loss=0.01456, audio_tagging_loss=0.01194, over 15124.00 frames. ], tot_loss[loss=0.08546, simple_loss=0.1048, pruned_loss=0.02241, audio_tagging_loss=0.01063, over 2897577.35 frames. 
], batch size: 59, lr: 6.44e-03, grad_scale: 32.0 2023-11-19 21:06:35,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=805606.6666666666, ans=0.125 2023-11-19 21:06:36,956 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.892e+01 8.276e+01 9.038e+01 9.770e+01 1.365e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 21:06:39,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=805606.6666666666, ans=0.2 2023-11-19 21:06:44,469 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120850 2023-11-19 21:07:13,891 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2023-11-19 21:07:14,078 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.77 vs. limit=8.0 2023-11-19 21:07:27,248 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 650, loss[loss=0.06696, simple_loss=0.07477, pruned_loss=0.01591, audio_tagging_loss=0.01367, over 15172.00 frames. ], tot_loss[loss=0.08554, simple_loss=0.1048, pruned_loss=0.02253, audio_tagging_loss=0.01064, over 2930801.11 frames. ], batch size: 58, lr: 6.44e-03, grad_scale: 32.0 2023-11-19 21:07:27,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=805873.3333333334, ans=10.0 2023-11-19 21:07:27,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=805873.3333333334, ans=0.125 2023-11-19 21:07:28,168 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.65 vs. limit=15.0 2023-11-19 21:07:48,670 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120900 2023-11-19 21:07:51,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=806006.6666666666, ans=0.125 2023-11-19 21:07:56,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=806006.6666666666, ans=0.025 2023-11-19 21:08:08,942 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.41 vs. limit=15.0 2023-11-19 21:08:30,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=806206.6666666666, ans=0.0 2023-11-19 21:08:30,871 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 700, loss[loss=0.08775, simple_loss=0.1046, pruned_loss=0.02405, audio_tagging_loss=0.01139, over 15968.00 frames. ], tot_loss[loss=0.0861, simple_loss=0.106, pruned_loss=0.02262, audio_tagging_loss=0.01048, over 2966705.64 frames. 
], batch size: 60, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:08:34,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=806206.6666666666, ans=0.2 2023-11-19 21:08:44,816 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.115e+01 8.069e+01 8.585e+01 9.544e+01 1.162e+02, threshold=1.717e+02, percent-clipped=0.0 2023-11-19 21:08:48,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=806273.3333333334, ans=0.125 2023-11-19 21:08:53,554 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 120950 2023-11-19 21:08:57,588 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2023-11-19 21:08:58,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=806340.0, ans=0.0 2023-11-19 21:09:01,584 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2023-11-19 21:09:09,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=806406.6666666666, ans=0.05 2023-11-19 21:09:15,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=806406.6666666666, ans=0.1 2023-11-19 21:09:15,941 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=12.0 2023-11-19 21:09:35,888 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 750, loss[loss=0.09984, simple_loss=0.1283, pruned_loss=0.02832, audio_tagging_loss=0.007364, over 16493.00 frames. ], tot_loss[loss=0.08572, simple_loss=0.1057, pruned_loss=0.02242, audio_tagging_loss=0.01044, over 2983961.10 frames. ], batch size: 60, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:09:37,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=806540.0, ans=0.125 2023-11-19 21:09:58,084 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121000 2023-11-19 21:10:00,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=806606.6666666666, ans=0.0 2023-11-19 21:10:03,418 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.63 vs. 
limit=15.0 2023-11-19 21:10:04,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=806673.3333333334, ans=0.1 2023-11-19 21:10:19,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=806740.0, ans=0.125 2023-11-19 21:10:22,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=806740.0, ans=0.2 2023-11-19 21:10:25,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=806740.0, ans=0.1 2023-11-19 21:10:38,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=806806.6666666666, ans=0.0 2023-11-19 21:10:40,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=806873.3333333334, ans=0.0 2023-11-19 21:10:41,371 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 800, loss[loss=0.06416, simple_loss=0.06599, pruned_loss=0.01566, audio_tagging_loss=0.01551, over 17118.00 frames. ], tot_loss[loss=0.08633, simple_loss=0.1063, pruned_loss=0.0227, audio_tagging_loss=0.01049, over 3005724.77 frames. ], batch size: 65, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:10:55,544 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.953e+01 8.303e+01 9.154e+01 9.871e+01 1.410e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 21:11:02,910 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121050 2023-11-19 21:11:04,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=806940.0, ans=0.2 2023-11-19 21:11:05,168 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=12.0 2023-11-19 21:11:06,436 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2023-11-19 21:11:18,066 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.73 vs. limit=15.0 2023-11-19 21:11:28,569 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0 2023-11-19 21:11:45,785 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 850, loss[loss=0.0751, simple_loss=0.09468, pruned_loss=0.01483, audio_tagging_loss=0.01293, over 14775.00 frames. ], tot_loss[loss=0.08584, simple_loss=0.1055, pruned_loss=0.02254, audio_tagging_loss=0.01056, over 3008443.07 frames. 
], batch size: 57, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:11:50,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=807206.6666666666, ans=0.125 2023-11-19 21:11:52,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=807206.6666666666, ans=0.0 2023-11-19 21:11:57,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=807273.3333333334, ans=0.0 2023-11-19 21:12:01,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=807273.3333333334, ans=0.125 2023-11-19 21:12:01,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=807273.3333333334, ans=0.09899494936611666 2023-11-19 21:12:01,943 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.77 vs. limit=12.0 2023-11-19 21:12:07,857 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121100 2023-11-19 21:12:08,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=807273.3333333334, ans=0.125 2023-11-19 21:12:13,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=807340.0, ans=0.125 2023-11-19 21:12:20,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=807340.0, ans=0.5 2023-11-19 21:12:40,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=807473.3333333334, ans=0.0 2023-11-19 21:12:50,508 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 900, loss[loss=0.1037, simple_loss=0.1346, pruned_loss=0.0277, audio_tagging_loss=0.008671, over 14831.00 frames. ], tot_loss[loss=0.08527, simple_loss=0.1046, pruned_loss=0.02231, audio_tagging_loss=0.01067, over 3010795.88 frames. ], batch size: 56, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:13:05,607 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 8.167e+01 8.792e+01 9.769e+01 1.364e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-19 21:13:13,273 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121150 2023-11-19 21:13:19,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=807673.3333333334, ans=0.1 2023-11-19 21:13:27,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=807673.3333333334, ans=0.1 2023-11-19 21:13:33,383 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:13:56,695 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 950, loss[loss=0.07505, simple_loss=0.09359, pruned_loss=0.01579, audio_tagging_loss=0.01246, over 14719.00 frames. ], tot_loss[loss=0.08545, simple_loss=0.105, pruned_loss=0.02247, audio_tagging_loss=0.01049, over 3019343.18 frames. ], batch size: 54, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:14:01,041 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.77 vs. 
limit=12.0 2023-11-19 21:14:04,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=807873.3333333334, ans=0.0 2023-11-19 21:14:04,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=807873.3333333334, ans=0.0 2023-11-19 21:14:08,864 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.93 vs. limit=10.0 2023-11-19 21:14:18,442 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121200 2023-11-19 21:14:28,455 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.42 vs. limit=10.0 2023-11-19 21:14:30,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=808006.6666666666, ans=0.125 2023-11-19 21:14:49,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=808140.0, ans=0.125 2023-11-19 21:15:01,052 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1000, loss[loss=0.0737, simple_loss=0.08919, pruned_loss=0.01825, audio_tagging_loss=0.01085, over 14972.00 frames. ], tot_loss[loss=0.08478, simple_loss=0.1041, pruned_loss=0.02233, audio_tagging_loss=0.01038, over 3026150.87 frames. ], batch size: 59, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:15:15,830 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.437e+01 8.149e+01 8.966e+01 9.862e+01 1.248e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-19 21:15:19,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=808273.3333333334, ans=0.0 2023-11-19 21:15:23,332 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121250 2023-11-19 21:15:28,847 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.55 vs. limit=15.0 2023-11-19 21:15:29,313 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 21:15:33,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=808340.0, ans=0.1 2023-11-19 21:15:38,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=808340.0, ans=0.125 2023-11-19 21:15:39,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=808406.6666666666, ans=0.1 2023-11-19 21:15:50,627 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.74 vs. 
limit=15.0 2023-11-19 21:16:01,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=808473.3333333334, ans=0.125 2023-11-19 21:16:05,943 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1050, loss[loss=0.08102, simple_loss=0.1082, pruned_loss=0.01737, audio_tagging_loss=0.009563, over 15270.00 frames. ], tot_loss[loss=0.08429, simple_loss=0.1035, pruned_loss=0.02212, audio_tagging_loss=0.01042, over 3024777.00 frames. ], batch size: 55, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:16:17,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=808540.0, ans=0.2 2023-11-19 21:16:28,216 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121300 2023-11-19 21:17:00,685 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2023-11-19 21:17:07,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=808806.6666666666, ans=0.125 2023-11-19 21:17:09,185 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.628e-03 2023-11-19 21:17:11,263 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1100, loss[loss=0.09381, simple_loss=0.1146, pruned_loss=0.02649, audio_tagging_loss=0.01004, over 15345.00 frames. ], tot_loss[loss=0.08383, simple_loss=0.1031, pruned_loss=0.02198, audio_tagging_loss=0.01032, over 3022777.85 frames. ], batch size: 60, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:17:13,672 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
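Number of tokens: 24

These WARNING [train_asr.py:1506] entries come from a safety filter for the transducer loss: a cut is excluded when it is left with fewer encoder frames after subsampling than it has BPE tokens, since no monotonic alignment exists and the loss would blow up. The dummy AudioSet placeholder cuts trip it constantly: 100 fbank frames shrink to 23 under the conv front-end, against 24 tokens of placeholder text. A minimal sketch of the check, assuming the usual ((T - 7) // 2 + 1) // 2 output-length formula of these zipformer recipes (the formula is inferred from the logged 100 -> 23 reduction; keep_cut is a hypothetical name):

    def frames_after_subsampling(num_frames: int) -> int:
        # 4x convolutional front-end; reproduces the logged 100 -> 23.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Exclude cuts the transducer loss cannot align.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)  # the placeholder cut excluded above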
2023-11-19 21:17:22,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=808940.0, ans=0.09899494936611666 2023-11-19 21:17:23,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=808940.0, ans=0.0 2023-11-19 21:17:23,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=808940.0, ans=0.125 2023-11-19 21:17:25,900 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.226e+01 9.036e+01 9.842e+01 1.440e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-19 21:17:32,121 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121350 2023-11-19 21:17:45,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=809006.6666666666, ans=0.5 2023-11-19 21:17:46,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=809006.6666666666, ans=0.0 2023-11-19 21:17:53,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=809073.3333333334, ans=0.1 2023-11-19 21:18:05,151 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=15.0 2023-11-19 21:18:11,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=809140.0, ans=0.125 2023-11-19 21:18:14,500 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1150, loss[loss=0.07652, simple_loss=0.0975, pruned_loss=0.01729, audio_tagging_loss=0.01048, over 14738.00 frames. ], tot_loss[loss=0.08453, simple_loss=0.1044, pruned_loss=0.02216, audio_tagging_loss=0.0102, over 3029639.29 frames. ], batch size: 56, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:18:37,147 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121400 2023-11-19 21:18:37,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=809273.3333333334, ans=0.0 2023-11-19 21:19:02,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=809406.6666666666, ans=0.125 2023-11-19 21:19:05,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=809473.3333333334, ans=0.1 2023-11-19 21:19:06,203 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2023-11-19 21:19:20,002 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1200, loss[loss=0.06951, simple_loss=0.08153, pruned_loss=0.01381, audio_tagging_loss=0.01494, over 14049.00 frames. ], tot_loss[loss=0.08489, simple_loss=0.1047, pruned_loss=0.02233, audio_tagging_loss=0.01023, over 3027089.75 frames.
], batch size: 56, lr: 6.42e-03, grad_scale: 32.0 2023-11-19 21:19:27,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=809540.0, ans=0.0 2023-11-19 21:19:33,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=809606.6666666666, ans=0.125 2023-11-19 21:19:36,420 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.659e+01 8.249e+01 9.079e+01 9.946e+01 1.270e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-19 21:19:41,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=809606.6666666666, ans=0.125 2023-11-19 21:19:42,790 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121450 2023-11-19 21:19:49,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=809673.3333333334, ans=0.0 2023-11-19 21:19:54,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=809673.3333333334, ans=0.125 2023-11-19 21:20:03,929 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:20:16,615 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.33 vs. limit=22.5 2023-11-19 21:20:17,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=809806.6666666666, ans=0.0 2023-11-19 21:20:20,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=809806.6666666666, ans=0.125 2023-11-19 21:20:25,694 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1250, loss[loss=0.09203, simple_loss=0.1129, pruned_loss=0.02631, audio_tagging_loss=0.009251, over 14723.00 frames. ], tot_loss[loss=0.08384, simple_loss=0.1032, pruned_loss=0.02195, audio_tagging_loss=0.0103, over 3027176.44 frames. ], batch size: 55, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:20:44,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=809940.0, ans=0.1 2023-11-19 21:20:46,384 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121500 2023-11-19 21:21:07,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=810073.3333333334, ans=0.125 2023-11-19 21:21:28,547 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1300, loss[loss=0.1178, simple_loss=0.1387, pruned_loss=0.04055, audio_tagging_loss=0.007867, over 15160.00 frames. ], tot_loss[loss=0.08371, simple_loss=0.1029, pruned_loss=0.02201, audio_tagging_loss=0.01024, over 3027670.12 frames. ], batch size: 55, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:21:45,102 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.756e+01 8.695e+01 9.311e+01 1.032e+02 1.222e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-19 21:21:48,148 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.64 vs. 
limit=6.0 2023-11-19 21:21:50,145 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121550 2023-11-19 21:21:54,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=810340.0, ans=0.125 2023-11-19 21:22:01,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=810340.0, ans=0.0 2023-11-19 21:22:01,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=810340.0, ans=15.0 2023-11-19 21:22:03,890 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:22:04,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=810340.0, ans=0.0 2023-11-19 21:22:21,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=810473.3333333334, ans=0.0 2023-11-19 21:22:29,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=810473.3333333334, ans=0.04949747468305833 2023-11-19 21:22:32,532 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1350, loss[loss=0.04561, simple_loss=0.05134, pruned_loss=0.00791, audio_tagging_loss=0.01203, over 14941.00 frames. ], tot_loss[loss=0.08415, simple_loss=0.1035, pruned_loss=0.02213, audio_tagging_loss=0.01029, over 3034710.35 frames. ], batch size: 57, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:22:35,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=810540.0, ans=0.125 2023-11-19 21:22:35,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=810540.0, ans=0.125 2023-11-19 21:22:52,891 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.89 vs. limit=15.0 2023-11-19 21:22:54,974 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121600 2023-11-19 21:22:56,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=810606.6666666666, ans=0.0 2023-11-19 21:23:07,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=810673.3333333334, ans=0.125 2023-11-19 21:23:18,704 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
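Number of tokens: 24

Besides these exclusion warnings, the recurring diagnostics in this log are the [scaling.py:213] and [scaling.py:1022] traces (current values of scheduled hyperparameters, and activation-whitening metrics checked against their limits) and the [optim.py:476] gradient-clipping summaries. In the latter, the five "grad-norm quartiles" numbers are a min/25%/median/75%/max summary, and the logged threshold consistently equals Clipping_scale times the median, e.g. 2.0 * 9.311e+01 = 1.862e+02 in the entry above. A sketch of that bookkeeping, assuming only the relation inferred from those logged values rather than the optimizer's actual code (summarize_grad_norms is a hypothetical name, not the optimizer's API):

    import torch

    def summarize_grad_norms(grad_norms: torch.Tensor,
                             clipping_scale: float = 2.0):
        # Five-number summary matching the "grad-norm quartiles" line.
        q = torch.quantile(
            grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # Clipping_scale x median
        percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
        return q, threshold, percent_clipped

    # Hypothetical window of recent per-batch gradient norms:
    norms = 65.0 + 70.0 * torch.rand(128)
    quartiles, threshold, pct = summarize_grad_norms(norms)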
Number of tokens: 24 2023-11-19 21:23:25,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=810806.6666666666, ans=0.125 2023-11-19 21:23:26,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=810806.6666666666, ans=0.125 2023-11-19 21:23:30,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=810806.6666666666, ans=0.1 2023-11-19 21:23:31,014 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.69 vs. limit=22.5 2023-11-19 21:23:34,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=810806.6666666666, ans=0.125 2023-11-19 21:23:36,135 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.55 vs. limit=22.5 2023-11-19 21:23:37,687 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1400, loss[loss=0.09563, simple_loss=0.1182, pruned_loss=0.028, audio_tagging_loss=0.008508, over 14309.00 frames. ], tot_loss[loss=0.08471, simple_loss=0.1041, pruned_loss=0.02229, audio_tagging_loss=0.01037, over 3035760.94 frames. ], batch size: 56, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:23:42,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=810873.3333333334, ans=0.125 2023-11-19 21:23:44,394 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0 2023-11-19 21:23:53,614 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.902e+01 8.352e+01 8.972e+01 9.668e+01 1.251e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-19 21:23:58,601 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121650 2023-11-19 21:24:16,813 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=12.0 2023-11-19 21:24:20,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=811073.3333333334, ans=0.0 2023-11-19 21:24:40,435 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1450, loss[loss=0.1067, simple_loss=0.1238, pruned_loss=0.03478, audio_tagging_loss=0.01004, over 15207.00 frames. ], tot_loss[loss=0.08457, simple_loss=0.1038, pruned_loss=0.02226, audio_tagging_loss=0.01041, over 3037901.26 frames. 
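
Note on the train_asr.py loss lines: the bracketed totals decompose as a fixed weighted sum of the logged components. With this run's configuration (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0, CTC disabled), the sketch below reproduces the logged numbers; it is a hand-written reconstruction for clarity, not the icefall code itself.

    def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
                   simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        # Weighted sum matching the tot_loss field of the entries above;
        # no CTC term since use_ctc is False for this run.
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # Batch 1450 running average logged above:
    # 0.5 * 0.1038 + 0.02226 + 1.0 * 0.01041 = 0.08457
    assert abs(total_loss(0.1038, 0.02226, 0.01041) - 0.08457) < 1e-4
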
], batch size: 55, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:24:50,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=811206.6666666666, ans=0.125 2023-11-19 21:24:56,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=811273.3333333334, ans=0.125 2023-11-19 21:25:01,968 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121700 2023-11-19 21:25:04,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=811340.0, ans=0.125 2023-11-19 21:25:23,634 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.36 vs. limit=6.0 2023-11-19 21:25:28,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=811406.6666666666, ans=0.0 2023-11-19 21:25:44,109 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1500, loss[loss=0.06543, simple_loss=0.079, pruned_loss=0.0152, audio_tagging_loss=0.01073, over 16061.00 frames. ], tot_loss[loss=0.08493, simple_loss=0.1045, pruned_loss=0.02237, audio_tagging_loss=0.01034, over 3036249.28 frames. ], batch size: 62, lr: 6.41e-03, grad_scale: 16.0 2023-11-19 21:26:01,425 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.708e+01 8.227e+01 9.077e+01 1.029e+02 1.490e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-19 21:26:03,312 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.06 vs. limit=15.0 2023-11-19 21:26:07,333 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121750 2023-11-19 21:26:13,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=811673.3333333334, ans=0.1 2023-11-19 21:26:14,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=811673.3333333334, ans=0.05 2023-11-19 21:26:38,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=811806.6666666666, ans=0.0 2023-11-19 21:26:41,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=811806.6666666666, ans=0.0 2023-11-19 21:26:48,075 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1550, loss[loss=0.07545, simple_loss=0.08732, pruned_loss=0.02082, audio_tagging_loss=0.01098, over 15704.00 frames. ], tot_loss[loss=0.08511, simple_loss=0.1045, pruned_loss=0.02244, audio_tagging_loss=0.01042, over 3035919.46 frames. ], batch size: 59, lr: 6.41e-03, grad_scale: 16.0 2023-11-19 21:26:54,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=811873.3333333334, ans=0.125 2023-11-19 21:27:08,159 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.16 vs. limit=12.0 2023-11-19 21:27:11,378 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121800 2023-11-19 21:27:11,901 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.83 vs. 
limit=22.5 2023-11-19 21:27:23,347 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.74 vs. limit=12.0 2023-11-19 21:27:35,224 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.20 vs. limit=15.0 2023-11-19 21:27:54,382 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1600, loss[loss=0.08669, simple_loss=0.1033, pruned_loss=0.02344, audio_tagging_loss=0.0116, over 15229.00 frames. ], tot_loss[loss=0.08556, simple_loss=0.1052, pruned_loss=0.02251, audio_tagging_loss=0.01043, over 3038916.17 frames. ], batch size: 56, lr: 6.41e-03, grad_scale: 32.0 2023-11-19 21:28:10,248 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.741e+01 8.357e+01 8.989e+01 9.853e+01 1.199e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 21:28:13,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=812273.3333333334, ans=0.125 2023-11-19 21:28:15,893 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121850 2023-11-19 21:28:33,244 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.70 vs. limit=22.5 2023-11-19 21:28:37,360 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2023-11-19 21:28:50,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=812473.3333333334, ans=0.1 2023-11-19 21:28:57,476 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1650, loss[loss=0.07924, simple_loss=0.09813, pruned_loss=0.01985, audio_tagging_loss=0.01033, over 15182.00 frames. ], tot_loss[loss=0.08463, simple_loss=0.104, pruned_loss=0.02215, audio_tagging_loss=0.01047, over 3038384.57 frames. ], batch size: 58, lr: 6.41e-03, grad_scale: 32.0 2023-11-19 21:29:01,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=812540.0, ans=0.125 2023-11-19 21:29:20,480 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121900 2023-11-19 21:30:01,951 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1700, loss[loss=0.0957, simple_loss=0.1137, pruned_loss=0.02789, audio_tagging_loss=0.01097, over 16139.00 frames. ], tot_loss[loss=0.08391, simple_loss=0.1031, pruned_loss=0.02183, audio_tagging_loss=0.01055, over 3035992.27 frames. ], batch size: 60, lr: 6.41e-03, grad_scale: 32.0 2023-11-19 21:30:03,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=812873.3333333334, ans=0.1 2023-11-19 21:30:11,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=812873.3333333334, ans=0.125 2023-11-19 21:30:19,242 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.41 vs. 
limit=15.0 2023-11-19 21:30:19,799 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 8.451e+01 9.168e+01 1.014e+02 1.661e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-19 21:30:22,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=812940.0, ans=0.125 2023-11-19 21:30:24,717 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 121950 2023-11-19 21:31:07,568 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1750, loss[loss=0.0813, simple_loss=0.09279, pruned_loss=0.02293, audio_tagging_loss=0.01197, over 16445.00 frames. ], tot_loss[loss=0.08365, simple_loss=0.1029, pruned_loss=0.02179, audio_tagging_loss=0.01042, over 3045926.26 frames. ], batch size: 66, lr: 6.41e-03, grad_scale: 32.0 2023-11-19 21:31:09,520 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=15.0 2023-11-19 21:31:26,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=813273.3333333334, ans=0.125 2023-11-19 21:31:28,447 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122000 2023-11-19 21:31:56,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=813406.6666666666, ans=0.1 2023-11-19 21:32:12,125 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1800, loss[loss=0.08308, simple_loss=0.1052, pruned_loss=0.02153, audio_tagging_loss=0.008943, over 15593.00 frames. ], tot_loss[loss=0.08431, simple_loss=0.104, pruned_loss=0.02208, audio_tagging_loss=0.01022, over 3046138.61 frames. ], batch size: 57, lr: 6.41e-03, grad_scale: 32.0 2023-11-19 21:32:16,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=813540.0, ans=0.0 2023-11-19 21:32:21,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=813540.0, ans=0.125 2023-11-19 21:32:22,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=813540.0, ans=0.1 2023-11-19 21:32:28,677 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.061e+01 8.839e+01 9.659e+01 3.662e+02, threshold=1.768e+02, percent-clipped=1.0 2023-11-19 21:32:34,301 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122050 2023-11-19 21:32:38,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=813673.3333333334, ans=0.125 2023-11-19 21:32:53,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=813740.0, ans=0.125 2023-11-19 21:32:54,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=813740.0, ans=0.125 2023-11-19 21:33:04,678 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.78 vs. limit=22.5 2023-11-19 21:33:16,785 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1850, loss[loss=0.08318, simple_loss=0.1118, pruned_loss=0.01917, audio_tagging_loss=0.0081, over 14934.00 frames. 
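
Note on the optim.py:476 entries interleaved here: they summarize the distribution of recently observed gradient norms. Reading the five numbers as the 0/25/50/75/100 percentiles (an assumption), the clipping threshold consistently equals Clipping_scale times the median (for the entry above, 2.0 * 9.168e+01 ≈ 1.834e+02), and percent-clipped is the share of recent batches whose norm exceeded it. A hedged reconstruction under those assumptions:

    import torch

    def clipping_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
        # recent_norms: float tensor of recent per-batch gradient norms.
        # Five percentiles, a threshold of clipping_scale * median, and
        # the percentage of norms above that threshold.
        q = torch.quantile(recent_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]
        percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
        return q, threshold, percent_clipped
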
], tot_loss[loss=0.08406, simple_loss=0.1038, pruned_loss=0.02188, audio_tagging_loss=0.01026, over 3041578.15 frames. ], batch size: 54, lr: 6.40e-03, grad_scale: 16.0 2023-11-19 21:33:23,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=813873.3333333334, ans=0.125 2023-11-19 21:33:38,896 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122100 2023-11-19 21:33:57,646 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.55 vs. limit=12.0 2023-11-19 21:34:00,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=814073.3333333334, ans=0.2 2023-11-19 21:34:02,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=814073.3333333334, ans=0.5 2023-11-19 21:34:09,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=814140.0, ans=0.0 2023-11-19 21:34:21,923 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1900, loss[loss=0.08446, simple_loss=0.1034, pruned_loss=0.02316, audio_tagging_loss=0.009629, over 15805.00 frames. ], tot_loss[loss=0.08395, simple_loss=0.104, pruned_loss=0.02182, audio_tagging_loss=0.01011, over 3042319.66 frames. ], batch size: 62, lr: 6.40e-03, grad_scale: 16.0 2023-11-19 21:34:39,676 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.976e+01 8.118e+01 8.678e+01 9.738e+01 1.673e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-19 21:34:43,608 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122150 2023-11-19 21:35:04,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=814406.6666666666, ans=0.125 2023-11-19 21:35:09,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=814406.6666666666, ans=0.04949747468305833 2023-11-19 21:35:21,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=814473.3333333334, ans=0.2 2023-11-19 21:35:26,494 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 1950, loss[loss=0.07264, simple_loss=0.09008, pruned_loss=0.017, audio_tagging_loss=0.0106, over 15013.00 frames. ], tot_loss[loss=0.08308, simple_loss=0.103, pruned_loss=0.0215, audio_tagging_loss=0.01009, over 3040466.69 frames. ], batch size: 56, lr: 6.40e-03, grad_scale: 16.0 2023-11-19 21:35:48,065 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122200 2023-11-19 21:35:50,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=814606.6666666666, ans=0.125 2023-11-19 21:36:00,110 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.75 vs. limit=15.0 2023-11-19 21:36:08,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=814740.0, ans=0.1 2023-11-19 21:36:31,168 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2000, loss[loss=0.08856, simple_loss=0.1134, pruned_loss=0.0247, audio_tagging_loss=0.007179, over 16298.00 frames. 
], tot_loss[loss=0.08274, simple_loss=0.1023, pruned_loss=0.02148, audio_tagging_loss=0.01011, over 3039240.39 frames. ], batch size: 62, lr: 6.40e-03, grad_scale: 32.0 2023-11-19 21:36:40,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=814873.3333333334, ans=0.125 2023-11-19 21:36:49,643 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.499e+01 9.542e+01 1.090e+02 1.717e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-19 21:36:53,406 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122250 2023-11-19 21:37:16,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=815073.3333333334, ans=0.0 2023-11-19 21:37:22,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=815140.0, ans=0.0 2023-11-19 21:37:36,790 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2050, loss[loss=0.1111, simple_loss=0.1469, pruned_loss=0.02892, audio_tagging_loss=0.008733, over 15424.00 frames. ], tot_loss[loss=0.08311, simple_loss=0.1026, pruned_loss=0.02165, audio_tagging_loss=0.01018, over 3041799.26 frames. ], batch size: 57, lr: 6.40e-03, grad_scale: 32.0 2023-11-19 21:37:37,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815206.6666666666, ans=0.1 2023-11-19 21:37:39,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=815206.6666666666, ans=0.125 2023-11-19 21:37:40,082 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0 2023-11-19 21:37:41,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815206.6666666666, ans=0.1 2023-11-19 21:37:51,291 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.28 vs. limit=22.5 2023-11-19 21:37:58,487 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122300 2023-11-19 21:38:03,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=815340.0, ans=0.125 2023-11-19 21:38:05,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=815340.0, ans=0.2 2023-11-19 21:38:06,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=815340.0, ans=0.125 2023-11-19 21:38:15,338 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:38:24,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815406.6666666666, ans=0.1 2023-11-19 21:38:39,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=815540.0, ans=0.1 2023-11-19 21:38:40,861 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2100, loss[loss=0.08896, simple_loss=0.1161, pruned_loss=0.02283, audio_tagging_loss=0.008083, over 15898.00 frames. 
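
Note on the scaling.py:1022 Whitening entries: each compares an anisotropy metric of a module's channel covariance against a (itself scheduled) whitening_limit, and a corrective penalty applies only when the metric exceeds the limit, which none of the entries above do (e.g. metric=11.17 vs. limit=15.0). One plausible formulation of such a metric, equal to 1.0 for a perfectly white covariance, is sketched below; the exact icefall computation may differ.

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels). Ratio of the mean squared
        # eigenvalue of the channel covariance to the squared mean
        # eigenvalue: 1.0 when isotropic, larger when a few directions
        # dominate the activations.
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs ** 2).mean() / eigs.mean() ** 2
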
], tot_loss[loss=0.08396, simple_loss=0.1037, pruned_loss=0.02198, audio_tagging_loss=0.01014, over 3041322.37 frames. ], batch size: 58, lr: 6.40e-03, grad_scale: 32.0 2023-11-19 21:38:44,635 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2023-11-19 21:38:59,062 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.154e+01 8.161e+01 9.375e+01 1.016e+02 1.346e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-19 21:39:02,947 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122350 2023-11-19 21:39:04,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=815606.6666666666, ans=0.0 2023-11-19 21:39:05,793 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.95 vs. limit=15.0 2023-11-19 21:39:11,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=815673.3333333334, ans=0.125 2023-11-19 21:39:15,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=815673.3333333334, ans=0.125 2023-11-19 21:39:17,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=815673.3333333334, ans=0.0 2023-11-19 21:39:19,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=815740.0, ans=0.2 2023-11-19 21:39:22,034 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.56 vs. limit=15.0 2023-11-19 21:39:37,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=815806.6666666666, ans=0.125 2023-11-19 21:39:37,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=815806.6666666666, ans=0.125 2023-11-19 21:39:44,158 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.64 vs. limit=15.0 2023-11-19 21:39:45,683 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2150, loss[loss=0.05991, simple_loss=0.07379, pruned_loss=0.01043, audio_tagging_loss=0.01258, over 15855.00 frames. ], tot_loss[loss=0.08433, simple_loss=0.104, pruned_loss=0.02215, audio_tagging_loss=0.01019, over 3043970.18 frames. ], batch size: 62, lr: 6.40e-03, grad_scale: 32.0 2023-11-19 21:39:50,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=815873.3333333334, ans=0.0 2023-11-19 21:39:50,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=815873.3333333334, ans=0.125 2023-11-19 21:40:08,133 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122400 2023-11-19 21:40:24,483 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 21:40:51,655 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2200, loss[loss=0.06864, simple_loss=0.09116, pruned_loss=0.01421, audio_tagging_loss=0.00885, over 16292.00 frames. ], tot_loss[loss=0.08398, simple_loss=0.1039, pruned_loss=0.02189, audio_tagging_loss=0.01015, over 3047438.26 frames. ], batch size: 63, lr: 6.40e-03, grad_scale: 32.0 2023-11-19 21:41:08,857 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.683e+01 8.220e+01 9.086e+01 1.022e+02 1.678e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 21:41:09,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=816273.3333333334, ans=0.125 2023-11-19 21:41:10,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=816273.3333333334, ans=0.09899494936611666 2023-11-19 21:41:12,643 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122450 2023-11-19 21:41:18,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=816340.0, ans=0.0 2023-11-19 21:41:48,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=816473.3333333334, ans=0.125 2023-11-19 21:41:55,712 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2250, loss[loss=0.06993, simple_loss=0.08335, pruned_loss=0.01357, audio_tagging_loss=0.01468, over 16023.00 frames. ], tot_loss[loss=0.08449, simple_loss=0.1042, pruned_loss=0.02213, audio_tagging_loss=0.01028, over 3037853.04 frames. ], batch size: 59, lr: 6.39e-03, grad_scale: 32.0 2023-11-19 21:42:08,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=816606.6666666666, ans=0.125 2023-11-19 21:42:17,982 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122500 2023-11-19 21:42:56,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=816806.6666666666, ans=0.125 2023-11-19 21:43:00,635 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2300, loss[loss=0.08227, simple_loss=0.1026, pruned_loss=0.02009, audio_tagging_loss=0.01088, over 15155.00 frames. ], tot_loss[loss=0.084, simple_loss=0.1031, pruned_loss=0.02207, audio_tagging_loss=0.01036, over 3040317.53 frames. ], batch size: 57, lr: 6.39e-03, grad_scale: 32.0 2023-11-19 21:43:09,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=816873.3333333334, ans=0.125 2023-11-19 21:43:19,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=816940.0, ans=0.0 2023-11-19 21:43:19,987 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.000e+01 8.330e+01 8.975e+01 9.637e+01 1.370e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-19 21:43:23,789 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122550 2023-11-19 21:43:46,939 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.59 vs. 
limit=15.0 2023-11-19 21:43:48,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=817073.3333333334, ans=0.1 2023-11-19 21:43:58,423 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 21:44:07,012 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2350, loss[loss=0.07949, simple_loss=0.1053, pruned_loss=0.0191, audio_tagging_loss=0.007748, over 15336.00 frames. ], tot_loss[loss=0.08423, simple_loss=0.1034, pruned_loss=0.02203, audio_tagging_loss=0.01047, over 3042491.44 frames. ], batch size: 56, lr: 6.39e-03, grad_scale: 16.0 2023-11-19 21:44:28,148 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122600 2023-11-19 21:44:33,899 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.64 vs. limit=15.0 2023-11-19 21:44:37,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=817340.0, ans=0.125 2023-11-19 21:45:11,205 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2400, loss[loss=0.07961, simple_loss=0.1004, pruned_loss=0.01694, audio_tagging_loss=0.01245, over 16646.00 frames. ], tot_loss[loss=0.08416, simple_loss=0.1031, pruned_loss=0.022, audio_tagging_loss=0.0106, over 3041222.10 frames. ], batch size: 58, lr: 6.39e-03, grad_scale: 32.0 2023-11-19 21:45:15,653 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.40 vs. limit=15.0 2023-11-19 21:45:27,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=817606.6666666666, ans=0.125 2023-11-19 21:45:30,247 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.655e+01 8.060e+01 8.978e+01 9.886e+01 1.686e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-19 21:45:32,849 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122650 2023-11-19 21:45:37,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=817673.3333333334, ans=0.1 2023-11-19 21:45:44,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=817673.3333333334, ans=0.2 2023-11-19 21:45:47,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=817673.3333333334, ans=0.125 2023-11-19 21:46:08,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=817806.6666666666, ans=0.1 2023-11-19 21:46:15,438 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2450, loss[loss=0.07081, simple_loss=0.07815, pruned_loss=0.01761, audio_tagging_loss=0.01412, over 15668.00 frames. ], tot_loss[loss=0.08433, simple_loss=0.1034, pruned_loss=0.02202, audio_tagging_loss=0.01061, over 3048440.07 frames. 
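
Note on the scaling.py:213 traffic that dominates this log: each line reports a ScheduledFloat, a named hyperparameter (skip rate, balancer probability, dropout rate, ...) whose current value, `ans`, is a function of batch_count. A minimal piecewise-linear sketch of that idea follows; icefall's real ScheduledFloat carries more machinery, and the breakpoints below are invented for illustration.

    class ScheduledFloatSketch:
        def __init__(self, *points):
            # points: (batch_count, value) pairs defining a piecewise-
            # linear schedule, held constant outside the given range.
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    # A skip rate that decays to zero early in training would explain the
    # many `ans=0.0` entries at batch_count ~8.2e5 (hypothetical breakpoints):
    conv_skip_rate = ScheduledFloatSketch((0.0, 0.5), (20000.0, 0.0))
    assert conv_skip_rate(820000.0) == 0.0
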
], batch size: 60, lr: 6.39e-03, grad_scale: 32.0 2023-11-19 21:46:38,569 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122700 2023-11-19 21:46:41,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.72 vs. limit=15.0 2023-11-19 21:46:44,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=818006.6666666666, ans=0.0 2023-11-19 21:47:01,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=818073.3333333334, ans=0.125 2023-11-19 21:47:21,091 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2500, loss[loss=0.07343, simple_loss=0.09063, pruned_loss=0.01793, audio_tagging_loss=0.01018, over 14951.00 frames. ], tot_loss[loss=0.08424, simple_loss=0.1034, pruned_loss=0.022, audio_tagging_loss=0.01054, over 3047298.57 frames. ], batch size: 56, lr: 6.39e-03, grad_scale: 32.0 2023-11-19 21:47:40,052 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.459e+01 8.244e+01 8.954e+01 9.755e+01 1.221e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-19 21:47:42,591 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122750 2023-11-19 21:48:12,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=818473.3333333334, ans=0.0 2023-11-19 21:48:14,089 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.29 vs. limit=15.0 2023-11-19 21:48:25,713 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2550, loss[loss=0.07353, simple_loss=0.0964, pruned_loss=0.01702, audio_tagging_loss=0.008309, over 16832.00 frames. ], tot_loss[loss=0.08396, simple_loss=0.103, pruned_loss=0.022, audio_tagging_loss=0.01045, over 3043552.99 frames. ], batch size: 61, lr: 6.39e-03, grad_scale: 32.0 2023-11-19 21:48:32,557 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.60 vs. limit=22.5 2023-11-19 21:48:47,285 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122800 2023-11-19 21:49:02,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=818673.3333333334, ans=0.125 2023-11-19 21:49:03,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=818673.3333333334, ans=0.5 2023-11-19 21:49:29,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=818873.3333333334, ans=0.0 2023-11-19 21:49:30,330 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2600, loss[loss=0.07955, simple_loss=0.101, pruned_loss=0.01713, audio_tagging_loss=0.01191, over 15884.00 frames. ], tot_loss[loss=0.08343, simple_loss=0.1023, pruned_loss=0.02186, audio_tagging_loss=0.01041, over 3037681.64 frames. ], batch size: 60, lr: 6.39e-03, grad_scale: 16.0 2023-11-19 21:49:36,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=818873.3333333334, ans=0.2 2023-11-19 21:49:44,185 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.67 vs. 
limit=10.0 2023-11-19 21:49:52,167 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.517e+01 9.235e+01 1.002e+02 1.405e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 21:49:53,579 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122850 2023-11-19 21:50:14,404 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.73 vs. limit=15.0 2023-11-19 21:50:17,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=819073.3333333334, ans=0.125 2023-11-19 21:50:21,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=819140.0, ans=0.125 2023-11-19 21:50:35,312 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2650, loss[loss=0.09887, simple_loss=0.1341, pruned_loss=0.02628, audio_tagging_loss=0.005514, over 15315.00 frames. ], tot_loss[loss=0.08373, simple_loss=0.1031, pruned_loss=0.02194, audio_tagging_loss=0.01023, over 3034949.22 frames. ], batch size: 54, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:50:43,795 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=22.5 2023-11-19 21:50:49,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=819273.3333333334, ans=0.1 2023-11-19 21:50:58,165 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122900 2023-11-19 21:51:01,436 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.06 vs. limit=22.5 2023-11-19 21:51:21,235 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:51:35,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=819473.3333333334, ans=0.125 2023-11-19 21:51:38,673 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.74 vs. limit=10.0 2023-11-19 21:51:39,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=819473.3333333334, ans=0.125 2023-11-19 21:51:41,506 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2700, loss[loss=0.1069, simple_loss=0.1339, pruned_loss=0.03113, audio_tagging_loss=0.008879, over 16471.00 frames. ], tot_loss[loss=0.0832, simple_loss=0.1028, pruned_loss=0.02158, audio_tagging_loss=0.01021, over 3042035.43 frames. 
], batch size: 58, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:51:44,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=819540.0, ans=0.125 2023-11-19 21:52:01,453 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.510e+01 9.186e+01 1.041e+02 2.301e+02, threshold=1.837e+02, percent-clipped=1.0 2023-11-19 21:52:03,521 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 122950 2023-11-19 21:52:21,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=819740.0, ans=0.0 2023-11-19 21:52:29,547 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=12.0 2023-11-19 21:52:40,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=819806.6666666666, ans=0.125 2023-11-19 21:52:46,250 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2750, loss[loss=0.07854, simple_loss=0.1023, pruned_loss=0.01923, audio_tagging_loss=0.008141, over 15450.00 frames. ], tot_loss[loss=0.08337, simple_loss=0.1028, pruned_loss=0.02172, audio_tagging_loss=0.01023, over 3046505.17 frames. ], batch size: 60, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:53:08,473 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123000 2023-11-19 21:53:22,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=820006.6666666666, ans=0.125 2023-11-19 21:53:29,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=820073.3333333334, ans=0.2 2023-11-19 21:53:32,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=820073.3333333334, ans=0.2 2023-11-19 21:53:36,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=820073.3333333334, ans=0.1 2023-11-19 21:53:40,873 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 21:53:41,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=820140.0, ans=0.125 2023-11-19 21:53:51,413 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2800, loss[loss=0.0901, simple_loss=0.1126, pruned_loss=0.02384, audio_tagging_loss=0.009937, over 14796.00 frames. ], tot_loss[loss=0.08274, simple_loss=0.1016, pruned_loss=0.02161, audio_tagging_loss=0.01032, over 3040212.29 frames. 
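
Note on the recurring train_asr.py:1506 warnings: they all involve 1-second AudioSet clips carrying the same placeholder transcript, and the logged numbers explain the exclusion. The 100 feature frames shrink to 23 after the 4x subsampling, fewer than the 24 BPE tokens, so with at most one emitted symbol per frame the transducer loss has no valid alignment. The check below is inferred from those numbers, not copied from icefall:

    def keep_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
        # With at most one symbol per frame (as in the modified/pruned
        # transducer), an utterance needs at least as many frames as
        # tokens to be alignable.
        return frames_after_subsampling >= num_tokens

    # The excluded cut above: 23 frames after subsampling vs. 24 tokens.
    assert keep_cut(23, 24) is False
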
], batch size: 54, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:53:55,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=820206.6666666666, ans=0.1 2023-11-19 21:54:06,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=820273.3333333334, ans=0.2 2023-11-19 21:54:07,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=820273.3333333334, ans=0.125 2023-11-19 21:54:09,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=820273.3333333334, ans=0.025 2023-11-19 21:54:14,037 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.993e+01 8.139e+01 8.815e+01 9.780e+01 1.679e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-19 21:54:14,187 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123050 2023-11-19 21:54:18,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=820340.0, ans=0.0 2023-11-19 21:54:21,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=820340.0, ans=0.05 2023-11-19 21:54:28,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=820340.0, ans=0.125 2023-11-19 21:54:34,346 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2023-11-19 21:54:44,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=820473.3333333334, ans=0.1 2023-11-19 21:54:50,115 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.26 vs. limit=15.0 2023-11-19 21:54:52,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=820473.3333333334, ans=0.05 2023-11-19 21:54:56,815 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2850, loss[loss=0.08996, simple_loss=0.1111, pruned_loss=0.02277, audio_tagging_loss=0.01164, over 15066.00 frames. ], tot_loss[loss=0.08329, simple_loss=0.1025, pruned_loss=0.0218, audio_tagging_loss=0.01023, over 3032446.17 frames. 
], batch size: 56, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:55:03,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=820540.0, ans=0.0 2023-11-19 21:55:04,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=820540.0, ans=0.1 2023-11-19 21:55:18,667 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123100 2023-11-19 21:55:30,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=820673.3333333334, ans=0.0 2023-11-19 21:55:46,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=820740.0, ans=0.1 2023-11-19 21:55:57,440 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:56:02,087 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2900, loss[loss=0.07413, simple_loss=0.08262, pruned_loss=0.02256, audio_tagging_loss=0.01026, over 14563.00 frames. ], tot_loss[loss=0.08391, simple_loss=0.1034, pruned_loss=0.02199, audio_tagging_loss=0.01023, over 3033642.04 frames. ], batch size: 57, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:56:15,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=820940.0, ans=0.125 2023-11-19 21:56:23,809 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.958e+01 8.184e+01 8.854e+01 9.488e+01 1.292e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-19 21:56:23,953 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123150 2023-11-19 21:56:50,515 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.38 vs. limit=22.5 2023-11-19 21:57:06,581 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 2950, loss[loss=0.09281, simple_loss=0.1189, pruned_loss=0.02565, audio_tagging_loss=0.007729, over 15015.00 frames. ], tot_loss[loss=0.08437, simple_loss=0.1042, pruned_loss=0.02209, audio_tagging_loss=0.0102, over 3038027.28 frames. ], batch size: 56, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:57:08,477 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. limit=6.0 2023-11-19 21:57:18,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=821206.6666666666, ans=0.07 2023-11-19 21:57:28,836 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123200 2023-11-19 21:57:33,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=821340.0, ans=0.0 2023-11-19 21:57:40,278 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.20 vs. limit=22.5 2023-11-19 21:57:42,522 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.75 vs. 
limit=22.5 2023-11-19 21:57:46,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=821406.6666666666, ans=0.1 2023-11-19 21:57:59,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=821473.3333333334, ans=0.125 2023-11-19 21:58:00,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=821473.3333333334, ans=0.2 2023-11-19 21:58:04,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=821473.3333333334, ans=0.0 2023-11-19 21:58:12,079 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3000, loss[loss=0.06608, simple_loss=0.08312, pruned_loss=0.01533, audio_tagging_loss=0.009182, over 15074.00 frames. ], tot_loss[loss=0.0845, simple_loss=0.1042, pruned_loss=0.02214, audio_tagging_loss=0.01026, over 3049251.65 frames. ], batch size: 57, lr: 6.37e-03, grad_scale: 16.0 2023-11-19 21:58:12,080 INFO [train_asr.py:1285] (2/4) Computing validation loss 2023-11-19 21:58:36,335 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3321, 5.0056, 4.7575, 5.1680], device='cuda:2') 2023-11-19 21:58:36,474 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4037, 3.6431, 2.4405, 3.4929], device='cuda:2') 2023-11-19 21:58:52,219 INFO [train_asr.py:1294] (2/4) Epoch 11, validation: loss=0.06441, simple_loss=0.05497, pruned_loss=0.006219, audio_tagging_loss=0.03071, over 4681554.00 frames. 2023-11-19 21:58:52,220 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB 2023-11-19 21:58:57,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=821540.0, ans=0.125 2023-11-19 21:59:02,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=821540.0, ans=0.0 2023-11-19 21:59:14,433 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.751e+01 8.445e+01 9.049e+01 1.018e+02 1.456e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-19 21:59:14,579 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123250 2023-11-19 21:59:22,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=821673.3333333334, ans=0.125 2023-11-19 21:59:43,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=821806.6666666666, ans=0.125 2023-11-19 21:59:47,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=821806.6666666666, ans=0.125 2023-11-19 21:59:55,782 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3050, loss[loss=0.1093, simple_loss=0.131, pruned_loss=0.03388, audio_tagging_loss=0.009965, over 14782.00 frames. ], tot_loss[loss=0.08434, simple_loss=0.1039, pruned_loss=0.02203, audio_tagging_loss=0.01036, over 3047360.15 frames. 
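
Note on the validation block above: it obeys the same decomposition as the training entries, 0.5 * 0.05497 + 0.006219 + 1.0 * 0.03071 ≈ 0.06441, with the audio-tagging term dominating the validation total while the pruned transducer term is far smaller than in training. The zipformer.py:1873 lines alongside it log attention-weight entropies for selected self-attention modules at validation time, apparently one value per attention head (four values for the four-head stacks shown).
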
], batch size: 55, lr: 6.37e-03, grad_scale: 16.0 2023-11-19 22:00:18,120 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123300 2023-11-19 22:00:19,428 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:00:21,942 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=822006.6666666666, ans=0.125 2023-11-19 22:00:26,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=822006.6666666666, ans=0.125 2023-11-19 22:00:27,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=822006.6666666666, ans=0.125 2023-11-19 22:00:33,961 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:00:43,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=822073.3333333334, ans=0.0 2023-11-19 22:00:44,256 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.68 vs. limit=22.5 2023-11-19 22:01:01,016 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3100, loss[loss=0.1151, simple_loss=0.1407, pruned_loss=0.03641, audio_tagging_loss=0.008334, over 14136.00 frames. ], tot_loss[loss=0.08589, simple_loss=0.106, pruned_loss=0.0227, audio_tagging_loss=0.01021, over 3046127.24 frames. ], batch size: 53, lr: 6.37e-03, grad_scale: 16.0 2023-11-19 22:01:01,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=822206.6666666666, ans=0.1 2023-11-19 22:01:03,095 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0 2023-11-19 22:01:06,464 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.20 vs. limit=22.5 2023-11-19 22:01:22,951 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.148e+01 8.871e+01 9.460e+01 1.235e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-19 22:01:23,100 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123350 2023-11-19 22:01:29,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=822340.0, ans=0.125 2023-11-19 22:01:48,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=822406.6666666666, ans=0.0 2023-11-19 22:02:05,584 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3150, loss[loss=0.0881, simple_loss=0.1071, pruned_loss=0.0211, audio_tagging_loss=0.01343, over 15495.00 frames. ], tot_loss[loss=0.08552, simple_loss=0.1051, pruned_loss=0.02261, audio_tagging_loss=0.01036, over 3044577.15 frames. 
], batch size: 60, lr: 6.37e-03, grad_scale: 16.0 2023-11-19 22:02:08,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=822540.0, ans=0.1 2023-11-19 22:02:22,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=822606.6666666666, ans=0.125 2023-11-19 22:02:23,039 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.05 vs. limit=12.0 2023-11-19 22:02:27,769 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123400 2023-11-19 22:02:46,555 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=22.5 2023-11-19 22:02:50,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=822740.0, ans=0.1 2023-11-19 22:03:07,403 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2023-11-19 22:03:10,328 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3200, loss[loss=0.06978, simple_loss=0.08468, pruned_loss=0.01583, audio_tagging_loss=0.0116, over 14379.00 frames. ], tot_loss[loss=0.0852, simple_loss=0.1047, pruned_loss=0.02241, audio_tagging_loss=0.01044, over 3045924.49 frames. ], batch size: 57, lr: 6.37e-03, grad_scale: 32.0 2023-11-19 22:03:10,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=822873.3333333334, ans=0.125 2023-11-19 22:03:14,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=822873.3333333334, ans=0.0 2023-11-19 22:03:26,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=822940.0, ans=0.1 2023-11-19 22:03:26,991 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.39 vs. limit=22.5 2023-11-19 22:03:32,362 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.734e+01 8.258e+01 8.832e+01 9.801e+01 1.591e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-19 22:03:32,515 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123450 2023-11-19 22:03:38,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=823006.6666666666, ans=0.2 2023-11-19 22:03:48,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=823073.3333333334, ans=0.125 2023-11-19 22:04:01,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=823140.0, ans=0.125 2023-11-19 22:04:06,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=823140.0, ans=0.0 2023-11-19 22:04:15,944 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3250, loss[loss=0.07358, simple_loss=0.08879, pruned_loss=0.01888, audio_tagging_loss=0.0103, over 14788.00 frames. ], tot_loss[loss=0.08548, simple_loss=0.1052, pruned_loss=0.02248, audio_tagging_loss=0.01043, over 3042067.74 frames. 
], batch size: 56, lr: 6.37e-03, grad_scale: 32.0 2023-11-19 22:04:29,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=823273.3333333334, ans=0.125 2023-11-19 22:04:36,916 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123500 2023-11-19 22:04:54,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=823406.6666666666, ans=0.125 2023-11-19 22:05:08,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=823473.3333333334, ans=0.09899494936611666 2023-11-19 22:05:18,909 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3300, loss[loss=0.06536, simple_loss=0.07899, pruned_loss=0.01157, audio_tagging_loss=0.0143, over 15573.00 frames. ], tot_loss[loss=0.08431, simple_loss=0.1037, pruned_loss=0.02193, audio_tagging_loss=0.01052, over 3034342.15 frames. ], batch size: 61, lr: 6.37e-03, grad_scale: 32.0 2023-11-19 22:05:25,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=823540.0, ans=0.125 2023-11-19 22:05:40,804 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.419e+01 8.972e+01 9.610e+01 1.838e+02, threshold=1.794e+02, percent-clipped=1.0 2023-11-19 22:05:40,961 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123550 2023-11-19 22:05:46,496 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.54 vs. limit=22.5 2023-11-19 22:05:52,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=823673.3333333334, ans=10.0 2023-11-19 22:06:06,153 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.29 vs. limit=10.0 2023-11-19 22:06:10,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=823806.6666666666, ans=0.05 2023-11-19 22:06:23,925 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3350, loss[loss=0.09179, simple_loss=0.124, pruned_loss=0.02294, audio_tagging_loss=0.006868, over 15887.00 frames. ], tot_loss[loss=0.08413, simple_loss=0.1035, pruned_loss=0.02188, audio_tagging_loss=0.0105, over 3030149.39 frames. ], batch size: 60, lr: 6.37e-03, grad_scale: 32.0 2023-11-19 22:06:26,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=823873.3333333334, ans=0.125 2023-11-19 22:06:34,062 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=15.0 2023-11-19 22:06:46,361 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123600 2023-11-19 22:07:01,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=824073.3333333334, ans=0.125 2023-11-19 22:07:02,160 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.05 vs. 
limit=15.0 2023-11-19 22:07:25,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=824140.0, ans=0.2 2023-11-19 22:07:29,862 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3400, loss[loss=0.09114, simple_loss=0.1104, pruned_loss=0.02186, audio_tagging_loss=0.01407, over 15105.00 frames. ], tot_loss[loss=0.08393, simple_loss=0.1034, pruned_loss=0.02178, audio_tagging_loss=0.01044, over 3028398.47 frames. ], batch size: 58, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:07:33,070 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.85 vs. limit=12.0 2023-11-19 22:07:36,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=824206.6666666666, ans=0.125 2023-11-19 22:07:50,891 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.806e+01 8.550e+01 9.235e+01 1.051e+02 1.197e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 22:07:51,068 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123650 2023-11-19 22:07:53,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=824340.0, ans=0.125 2023-11-19 22:08:03,043 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.90 vs. limit=15.0 2023-11-19 22:08:13,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=824406.6666666666, ans=0.0 2023-11-19 22:08:14,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=824406.6666666666, ans=0.125 2023-11-19 22:08:27,617 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.77 vs. limit=15.0 2023-11-19 22:08:33,938 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3450, loss[loss=0.08731, simple_loss=0.1109, pruned_loss=0.02115, audio_tagging_loss=0.0107, over 15607.00 frames. ], tot_loss[loss=0.08412, simple_loss=0.1038, pruned_loss=0.02194, audio_tagging_loss=0.01031, over 3031948.01 frames. ], batch size: 59, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:08:45,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=824606.6666666666, ans=0.125 2023-11-19 22:08:45,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=824606.6666666666, ans=0.0 2023-11-19 22:08:48,345 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:08:48,610 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=22.5 2023-11-19 22:08:54,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=824606.6666666666, ans=0.1 2023-11-19 22:08:54,411 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.27 vs. 
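The scaling.py:1022 Whitening records compare a per-module metric against a limit (8.85 vs. 12.0 and 13.90 vs. 15.0 above); while the metric stays below the limit the module leaves the activations alone. One plausible reading of the metric, an assumption rather than the literal scaling.py formula, is a measure of how far the channel covariance of the activations is from a multiple of the identity: exactly 1.0 for perfectly white features, growing with the eigenvalue spread.

```python
import torch


def whitening_metric(x: torch.Tensor) -> float:
    """How non-white the channel covariance of x is (illustrative guess).

    x: (num_frames, num_channels). Returns 1.0 when the covariance is a
    multiple of the identity and grows as the eigenvalue spread widens.
    """
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]          # (C, C) channel covariance
    eigs = torch.linalg.eigvalsh(cov)     # symmetric, so eigvalsh is safe
    return float((eigs ** 2).mean() / eigs.mean() ** 2)


x = torch.randn(1000, 256) @ torch.randn(256, 256)  # correlated channels
print(whitening_metric(x))  # well above 1.0, like the logged metrics
```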
limit=15.0 2023-11-19 22:08:56,281 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123700 2023-11-19 22:09:13,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=824740.0, ans=0.125 2023-11-19 22:09:18,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=824740.0, ans=0.0 2023-11-19 22:09:21,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=824740.0, ans=0.0 2023-11-19 22:09:21,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=824740.0, ans=12.0 2023-11-19 22:09:35,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=824806.6666666666, ans=0.0 2023-11-19 22:09:38,703 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3500, loss[loss=0.06973, simple_loss=0.0874, pruned_loss=0.01805, audio_tagging_loss=0.007976, over 15817.00 frames. ], tot_loss[loss=0.08374, simple_loss=0.1029, pruned_loss=0.022, audio_tagging_loss=0.01028, over 3038913.45 frames. ], batch size: 60, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:09:44,945 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.35 vs. limit=15.0 2023-11-19 22:10:00,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=824940.0, ans=0.125 2023-11-19 22:10:01,200 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.383e+01 8.617e+01 9.360e+01 1.039e+02 1.365e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-19 22:10:01,339 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123750 2023-11-19 22:10:08,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=825006.6666666666, ans=0.125 2023-11-19 22:10:12,167 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:10:22,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=825073.3333333334, ans=0.125 2023-11-19 22:10:24,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=825073.3333333334, ans=0.0 2023-11-19 22:10:25,854 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.02 vs. limit=15.0 2023-11-19 22:10:30,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=825140.0, ans=0.0 2023-11-19 22:10:33,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=825140.0, ans=0.125 2023-11-19 22:10:43,995 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3550, loss[loss=0.08381, simple_loss=0.1092, pruned_loss=0.01834, audio_tagging_loss=0.01086, over 15012.00 frames. 
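The WARNING above shows the guard that drops utterances too short for the transducer loss: after the roughly 4x subsampling of the encoder frontend, the 100-frame AudioSet placeholder cut keeps only 23 frames, fewer than its 24 BPE tokens, so the cut is excluded. A sketch of that check; the subsampling arithmetic is an assumption chosen to reproduce the 100 -> 23 figure in the log.

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Two stride-2 stages; the constants are a guess that reproduces
    # the logged 100 -> 23 mapping, not a quote of the real frontend.
    return ((num_frames - 7) // 2 + 1) // 2


def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Presumably the loss needs at least one encoder frame per token."""
    return frames_after_subsampling(num_frames) >= num_tokens


assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # the dummy AudioSet cuts excluded above
```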
], tot_loss[loss=0.0842, simple_loss=0.1039, pruned_loss=0.02211, audio_tagging_loss=0.01016, over 3042021.40 frames. ], batch size: 54, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:10:54,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=825206.6666666666, ans=0.125 2023-11-19 22:10:59,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=825273.3333333334, ans=0.2 2023-11-19 22:11:04,773 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123800 2023-11-19 22:11:24,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=825406.6666666666, ans=0.5 2023-11-19 22:11:47,376 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3600, loss[loss=0.08225, simple_loss=0.1089, pruned_loss=0.0203, audio_tagging_loss=0.007478, over 15553.00 frames. ], tot_loss[loss=0.08423, simple_loss=0.1043, pruned_loss=0.02204, audio_tagging_loss=0.01006, over 3039120.56 frames. ], batch size: 58, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:11:48,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=825540.0, ans=0.125 2023-11-19 22:11:49,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=825540.0, ans=0.1 2023-11-19 22:11:50,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=825540.0, ans=0.125 2023-11-19 22:12:04,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=825606.6666666666, ans=0.125 2023-11-19 22:12:08,994 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.358e+01 8.243e+01 9.327e+01 1.037e+02 1.432e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-19 22:12:09,132 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123850 2023-11-19 22:12:25,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=825740.0, ans=0.125 2023-11-19 22:12:29,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=825740.0, ans=0.1 2023-11-19 22:12:52,097 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3650, loss[loss=0.07806, simple_loss=0.09545, pruned_loss=0.01866, audio_tagging_loss=0.01167, over 15507.00 frames. ], tot_loss[loss=0.08475, simple_loss=0.1047, pruned_loss=0.02228, audio_tagging_loss=0.0101, over 3045077.48 frames. 
], batch size: 57, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:12:53,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=825873.3333333334, ans=0.0 2023-11-19 22:13:15,366 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123900 2023-11-19 22:13:26,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=826006.6666666666, ans=0.125 2023-11-19 22:13:29,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=826006.6666666666, ans=0.125 2023-11-19 22:13:41,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=826073.3333333334, ans=0.125 2023-11-19 22:13:57,505 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3700, loss[loss=0.08556, simple_loss=0.1082, pruned_loss=0.02297, audio_tagging_loss=0.008496, over 15959.00 frames. ], tot_loss[loss=0.08503, simple_loss=0.1049, pruned_loss=0.02244, audio_tagging_loss=0.01013, over 3047431.52 frames. ], batch size: 59, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:14:16,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=826273.3333333334, ans=0.1 2023-11-19 22:14:18,776 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 8.345e+01 9.061e+01 9.917e+01 1.388e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-19 22:14:18,925 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 123950 2023-11-19 22:14:22,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=826340.0, ans=0.04949747468305833 2023-11-19 22:14:23,267 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2023-11-19 22:14:24,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=826340.0, ans=0.0 2023-11-19 22:14:34,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=826406.6666666666, ans=0.125 2023-11-19 22:14:51,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=826473.3333333334, ans=0.2 2023-11-19 22:14:52,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=826473.3333333334, ans=0.5 2023-11-19 22:14:58,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=826473.3333333334, ans=0.125 2023-11-19 22:15:01,193 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2023-11-19 22:15:01,856 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3750, loss[loss=0.08994, simple_loss=0.106, pruned_loss=0.02517, audio_tagging_loss=0.01176, over 15871.00 frames. ], tot_loss[loss=0.0854, simple_loss=0.1055, pruned_loss=0.02251, audio_tagging_loss=0.01015, over 3052738.48 frames. 
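The optim.py:476 records expose the clipping rule: the five grad-norm numbers are the min, 25%, median, 75%, and max of recently observed gradient norms, and the printed threshold equals Clipping_scale times the median (2.0 * 90.61 = 181.2 in the record above; the same relation holds for every such record in this stretch). percent-clipped then reports how often recent batches exceeded it. A sketch under that reading:

```python
import torch


def clip_threshold(grad_norm_history, grad_norm, clipping_scale=2.0):
    """Adaptive clipping consistent with the optim.py records (sketch):
    threshold = clipping_scale * median of recent gradient norms."""
    norms = torch.tensor(grad_norm_history, dtype=torch.float32)
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]           # e.g. 2.0 * 90.61 = 181.2
    scale = min(1.0, float(threshold / grad_norm))
    return q, float(threshold), scale


history = [68.5, 83.5, 90.6, 99.2, 138.8]       # like the quartiles above
quartiles, threshold, scale = clip_threshold(history, grad_norm=120.0)
print(threshold, scale)  # ~181.2, 1.0: this step would not be clipped
```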
], batch size: 62, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:15:06,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=826540.0, ans=0.125 2023-11-19 22:15:23,441 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124000 2023-11-19 22:15:46,149 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2023-11-19 22:15:50,619 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:16:08,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=826873.3333333334, ans=0.125 2023-11-19 22:16:09,639 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3800, loss[loss=0.08483, simple_loss=0.1094, pruned_loss=0.01979, audio_tagging_loss=0.01035, over 14922.00 frames. ], tot_loss[loss=0.08502, simple_loss=0.1047, pruned_loss=0.02234, audio_tagging_loss=0.01031, over 3050437.07 frames. ], batch size: 56, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:16:32,428 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.419e+01 9.148e+01 9.794e+01 1.250e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 22:16:32,576 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124050 2023-11-19 22:16:40,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=827006.6666666666, ans=0.125 2023-11-19 22:16:41,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=827006.6666666666, ans=0.035 2023-11-19 22:16:57,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=827073.3333333334, ans=0.1 2023-11-19 22:17:13,900 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3850, loss[loss=0.06859, simple_loss=0.0808, pruned_loss=0.01367, audio_tagging_loss=0.01452, over 14350.00 frames. ], tot_loss[loss=0.08476, simple_loss=0.1046, pruned_loss=0.02212, audio_tagging_loss=0.01036, over 3049358.84 frames. ], batch size: 56, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:17:14,621 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.54 vs. limit=15.0 2023-11-19 22:17:20,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=827206.6666666666, ans=0.0 2023-11-19 22:17:22,489 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.87 vs. 
limit=15.0 2023-11-19 22:17:30,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=827273.3333333334, ans=0.07 2023-11-19 22:17:35,533 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124100 2023-11-19 22:17:35,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=827273.3333333334, ans=0.125 2023-11-19 22:17:47,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=827340.0, ans=0.2 2023-11-19 22:17:48,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=827340.0, ans=0.125 2023-11-19 22:18:09,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=827473.3333333334, ans=0.125 2023-11-19 22:18:18,130 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3900, loss[loss=0.08516, simple_loss=0.1021, pruned_loss=0.02445, audio_tagging_loss=0.009673, over 14884.00 frames. ], tot_loss[loss=0.08549, simple_loss=0.1056, pruned_loss=0.02234, audio_tagging_loss=0.01034, over 3042958.59 frames. ], batch size: 55, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:18:36,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=827606.6666666666, ans=0.125 2023-11-19 22:18:39,674 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.879e+01 8.442e+01 9.156e+01 1.005e+02 1.586e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 22:18:39,818 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124150 2023-11-19 22:18:42,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=827673.3333333334, ans=0.0 2023-11-19 22:19:10,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=827806.6666666666, ans=0.125 2023-11-19 22:19:20,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=827806.6666666666, ans=0.0 2023-11-19 22:19:22,055 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 3950, loss[loss=0.06731, simple_loss=0.08563, pruned_loss=0.01298, audio_tagging_loss=0.01152, over 14678.00 frames. ], tot_loss[loss=0.08484, simple_loss=0.1049, pruned_loss=0.02199, audio_tagging_loss=0.01042, over 3045151.91 frames. ], batch size: 55, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:19:29,361 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.92 vs. 
limit=15.0 2023-11-19 22:19:35,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=827940.0, ans=0.125 2023-11-19 22:19:43,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=827940.0, ans=0.1 2023-11-19 22:19:44,124 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124200 2023-11-19 22:19:53,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=828006.6666666666, ans=0.125 2023-11-19 22:20:05,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=828073.3333333334, ans=15.0 2023-11-19 22:20:14,081 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.93 vs. limit=15.0 2023-11-19 22:20:20,276 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.31 vs. limit=15.0 2023-11-19 22:20:22,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.27 vs. limit=15.0 2023-11-19 22:20:24,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=828140.0, ans=0.0 2023-11-19 22:20:27,564 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4000, loss[loss=0.06261, simple_loss=0.0718, pruned_loss=0.01212, audio_tagging_loss=0.01459, over 15723.00 frames. ], tot_loss[loss=0.08472, simple_loss=0.1046, pruned_loss=0.02193, audio_tagging_loss=0.01048, over 3041766.67 frames. ], batch size: 63, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:20:31,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=828206.6666666666, ans=0.0 2023-11-19 22:20:49,745 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.189e+01 8.890e+01 9.727e+01 1.231e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-19 22:20:49,889 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124250 2023-11-19 22:20:50,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=828273.3333333334, ans=0.125 2023-11-19 22:20:53,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=828340.0, ans=0.125 2023-11-19 22:21:13,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=828406.6666666666, ans=0.1 2023-11-19 22:21:14,515 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.98 vs. limit=15.0 2023-11-19 22:21:27,741 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.06 vs. limit=10.0 2023-11-19 22:21:31,908 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4050, loss[loss=0.1022, simple_loss=0.1173, pruned_loss=0.03116, audio_tagging_loss=0.01239, over 15474.00 frames. 
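The printed learning rate creeps down from 6.37e-03 to 6.32e-03 over these few hundred batches, consistent with a schedule that discounts a base rate by both the step index and the epoch. The Eden-style rule below is a sketch: the base rate and constants are assumptions picked to land near the logged values, not settings read from this run.

```python
def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style decay in both step and epoch (illustrative constants)."""
    return (base_lr
            * ((step / lr_batches) ** 2 + 1.0) ** -0.25
            * ((epoch / lr_epochs) ** 2 + 1.0) ** -0.25)


print(eden_lr(0.045, step=123_800, epoch=10.3))  # ~6.3e-03, near the log
```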
], tot_loss[loss=0.08481, simple_loss=0.1045, pruned_loss=0.02196, audio_tagging_loss=0.01059, over 3042130.65 frames. ], batch size: 59, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:21:36,201 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:21:42,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=828540.0, ans=0.0 2023-11-19 22:21:52,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=828606.6666666666, ans=0.125 2023-11-19 22:21:53,960 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124300 2023-11-19 22:21:57,005 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.64 vs. limit=12.0 2023-11-19 22:22:10,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=828740.0, ans=0.125 2023-11-19 22:22:15,317 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.61 vs. limit=12.0 2023-11-19 22:22:26,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=828806.6666666666, ans=0.125 2023-11-19 22:22:36,335 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4100, loss[loss=0.08071, simple_loss=0.09455, pruned_loss=0.01952, audio_tagging_loss=0.01392, over 15096.00 frames. ], tot_loss[loss=0.08436, simple_loss=0.1039, pruned_loss=0.02179, audio_tagging_loss=0.0106, over 3042139.57 frames. ], batch size: 55, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:22:36,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=828873.3333333334, ans=0.0 2023-11-19 22:22:45,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=828873.3333333334, ans=0.125 2023-11-19 22:22:58,270 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.257e+01 8.196e+01 8.855e+01 9.661e+01 1.383e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-19 22:22:58,406 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124350 2023-11-19 22:23:05,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=829006.6666666666, ans=0.125 2023-11-19 22:23:20,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=829073.3333333334, ans=0.125 2023-11-19 22:23:24,915 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.68 vs. 
limit=12.0 2023-11-19 22:23:27,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=829140.0, ans=0.125 2023-11-19 22:23:27,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=829140.0, ans=0.1 2023-11-19 22:23:40,911 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4150, loss[loss=0.0765, simple_loss=0.1045, pruned_loss=0.01699, audio_tagging_loss=0.007263, over 15903.00 frames. ], tot_loss[loss=0.08397, simple_loss=0.1036, pruned_loss=0.02171, audio_tagging_loss=0.01047, over 3044689.51 frames. ], batch size: 59, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:24:02,794 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124400 2023-11-19 22:24:20,936 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.28 vs. limit=10.0 2023-11-19 22:24:22,178 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.21 vs. limit=15.0 2023-11-19 22:24:27,479 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:24:43,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=829473.3333333334, ans=0.0 2023-11-19 22:24:45,689 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4200, loss[loss=0.09218, simple_loss=0.1124, pruned_loss=0.02624, audio_tagging_loss=0.009746, over 15083.00 frames. ], tot_loss[loss=0.0844, simple_loss=0.1043, pruned_loss=0.02198, audio_tagging_loss=0.01026, over 3047139.25 frames. ], batch size: 58, lr: 6.34e-03, grad_scale: 32.0 2023-11-19 22:24:46,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=829540.0, ans=0.025 2023-11-19 22:24:49,952 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.43 vs. 
limit=12.0 2023-11-19 22:25:03,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=829606.6666666666, ans=0.0 2023-11-19 22:25:05,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=829606.6666666666, ans=0.125 2023-11-19 22:25:07,642 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.135e+01 8.898e+01 9.525e+01 1.896e+02, threshold=1.780e+02, percent-clipped=1.0 2023-11-19 22:25:07,794 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124450 2023-11-19 22:25:18,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=829673.3333333334, ans=0.125 2023-11-19 22:25:24,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=829740.0, ans=0.2 2023-11-19 22:25:29,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=829740.0, ans=0.125 2023-11-19 22:25:30,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=829740.0, ans=0.125 2023-11-19 22:25:43,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=829806.6666666666, ans=0.0 2023-11-19 22:25:50,203 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4250, loss[loss=0.08165, simple_loss=0.1058, pruned_loss=0.01739, audio_tagging_loss=0.01134, over 15904.00 frames. ], tot_loss[loss=0.08389, simple_loss=0.1038, pruned_loss=0.02181, audio_tagging_loss=0.0102, over 3044175.15 frames. ], batch size: 58, lr: 6.34e-03, grad_scale: 16.0 2023-11-19 22:25:58,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=829873.3333333334, ans=0.125 2023-11-19 22:25:59,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=829873.3333333334, ans=0.125 2023-11-19 22:26:12,426 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124500 2023-11-19 22:26:13,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=829940.0, ans=0.125 2023-11-19 22:26:44,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=830140.0, ans=0.125 2023-11-19 22:26:46,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=830140.0, ans=0.125 2023-11-19 22:26:55,231 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4300, loss[loss=0.08114, simple_loss=0.0945, pruned_loss=0.02637, audio_tagging_loss=0.007518, over 14700.00 frames. ], tot_loss[loss=0.08494, simple_loss=0.105, pruned_loss=0.02238, audio_tagging_loss=0.01008, over 3047667.96 frames. ], batch size: 57, lr: 6.34e-03, grad_scale: 16.0 2023-11-19 22:27:10,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=830273.3333333334, ans=0.0 2023-11-19 22:27:15,577 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.15 vs. 
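grad_scale is the mixed-precision loss scale, and its movements are visible in the records: 32.0 through batch 4200, down to 16.0 at batch 4250 (an overflowing step halves the scale), and back up to 32.0 by batch 4400 a little further down, once enough finite steps accumulate. A minimal sketch of that dynamic; the growth interval is a guess consistent with the quick recovery seen here.

```python
class GradScale:
    """Loss-scale dynamics as reflected in the grad_scale field (sketch)."""

    def __init__(self, scale: float = 32.0, growth_interval: int = 100):
        self.scale = scale
        # growth_interval is assumed; the log only shows recovery from
        # 16.0 back to 32.0 within a few hundred optimizer steps.
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:                 # overflow: halve and skip the step
            self.scale /= 2.0
            self._good_steps = 0
        else:                         # finite grads: grow after a quiet run
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= 2.0
                self._good_steps = 0
```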
limit=15.0 2023-11-19 22:27:17,389 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124550 2023-11-19 22:27:17,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=830273.3333333334, ans=0.125 2023-11-19 22:27:18,443 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.753e+01 8.093e+01 8.901e+01 9.811e+01 2.323e+02, threshold=1.780e+02, percent-clipped=1.0 2023-11-19 22:27:22,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=830340.0, ans=0.0 2023-11-19 22:27:22,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=830340.0, ans=0.1 2023-11-19 22:27:25,627 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.90 vs. limit=15.0 2023-11-19 22:27:26,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=830340.0, ans=0.125 2023-11-19 22:27:39,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=830406.6666666666, ans=0.125 2023-11-19 22:27:59,310 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4350, loss[loss=0.08523, simple_loss=0.1136, pruned_loss=0.0222, audio_tagging_loss=0.00625, over 14884.00 frames. ], tot_loss[loss=0.08444, simple_loss=0.104, pruned_loss=0.02225, audio_tagging_loss=0.01017, over 3055785.03 frames. ], batch size: 56, lr: 6.34e-03, grad_scale: 16.0 2023-11-19 22:28:10,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=830540.0, ans=0.2 2023-11-19 22:28:11,679 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.74 vs. limit=15.0 2023-11-19 22:28:20,760 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124600 2023-11-19 22:28:55,597 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:28:59,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=830806.6666666666, ans=0.125 2023-11-19 22:29:00,399 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:29:03,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=830873.3333333334, ans=0.125 2023-11-19 22:29:03,778 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4400, loss[loss=0.06276, simple_loss=0.07519, pruned_loss=0.01655, audio_tagging_loss=0.008618, over 14096.00 frames. ], tot_loss[loss=0.0845, simple_loss=0.1044, pruned_loss=0.02225, audio_tagging_loss=0.01006, over 3056087.29 frames. 
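The scaling.py:1118 WithLoss records report an auxiliary penalty attached directly to a tensor, here the self-attention weights, with loss-sum=0.000e+00 meaning the penalty is currently inactive. The usual mechanism for this is a straight-through autograd function: identity in the forward pass, with the penalty's gradient injected in backward. The sketch below is a generic reconstruction of that pattern, not the literal scaling.py code, and the over-limit penalty is an invented example.

```python
import torch


class PenalizeLargeValues(torch.autograd.Function):
    """Identity in forward; in backward, nudge entries whose magnitude
    exceeds `limit` back toward it (a generic WithLoss-style pattern)."""

    @staticmethod
    def forward(ctx, x, limit: float, penalty: float):
        excess = (x.abs() - limit).clamp(min=0.0)
        # Matches the log style: 0.000e+00 when everything is in range.
        print(f"WithLoss: loss-sum={float(excess.sum()):.3e}")
        ctx.save_for_backward(x)
        ctx.limit, ctx.penalty = limit, penalty
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        aux_grad = ctx.penalty * torch.sign(x) * (x.abs() > ctx.limit)
        return grad_out + aux_grad, None, None


attn = torch.randn(4, 4, requires_grad=True)
out = PenalizeLargeValues.apply(attn, 5.0, 1e-3)  # typically loss-sum=0.000e+00
```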
], batch size: 55, lr: 6.34e-03, grad_scale: 32.0 2023-11-19 22:29:23,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=830940.0, ans=0.125 2023-11-19 22:29:26,141 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124650 2023-11-19 22:29:27,908 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.389e+01 8.577e+01 9.158e+01 9.839e+01 1.465e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-19 22:29:58,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=831140.0, ans=0.1 2023-11-19 22:30:09,319 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4450, loss[loss=0.07841, simple_loss=0.09965, pruned_loss=0.01807, audio_tagging_loss=0.01051, over 15273.00 frames. ], tot_loss[loss=0.08481, simple_loss=0.1049, pruned_loss=0.02226, audio_tagging_loss=0.01012, over 3057309.72 frames. ], batch size: 56, lr: 6.34e-03, grad_scale: 32.0 2023-11-19 22:30:31,362 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124700 2023-11-19 22:30:40,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2023-11-19 22:31:13,692 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4500, loss[loss=0.1018, simple_loss=0.1309, pruned_loss=0.03003, audio_tagging_loss=0.006299, over 15313.00 frames. ], tot_loss[loss=0.08486, simple_loss=0.1052, pruned_loss=0.02217, audio_tagging_loss=0.0101, over 3056436.78 frames. ], batch size: 56, lr: 6.34e-03, grad_scale: 32.0 2023-11-19 22:31:19,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=831540.0, ans=0.125 2023-11-19 22:31:35,374 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124750 2023-11-19 22:31:36,463 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.327e+01 8.964e+01 9.884e+01 1.189e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-19 22:31:45,220 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.34 vs. limit=22.5 2023-11-19 22:32:18,451 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4550, loss[loss=0.1008, simple_loss=0.1273, pruned_loss=0.02663, audio_tagging_loss=0.01053, over 15438.00 frames. ], tot_loss[loss=0.08468, simple_loss=0.1049, pruned_loss=0.02215, audio_tagging_loss=0.0101, over 3053009.47 frames. ], batch size: 56, lr: 6.34e-03, grad_scale: 32.0 2023-11-19 22:32:35,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=831940.0, ans=0.2 2023-11-19 22:32:40,905 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124800 2023-11-19 22:32:51,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=832006.6666666666, ans=0.0 2023-11-19 22:32:57,858 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.52 vs. 
limit=15.0 2023-11-19 22:33:00,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=832073.3333333334, ans=0.2 2023-11-19 22:33:07,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=832073.3333333334, ans=0.025 2023-11-19 22:33:08,664 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:33:10,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=832140.0, ans=0.2 2023-11-19 22:33:24,153 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4600, loss[loss=0.09941, simple_loss=0.1156, pruned_loss=0.03006, audio_tagging_loss=0.01154, over 15743.00 frames. ], tot_loss[loss=0.08417, simple_loss=0.1036, pruned_loss=0.02198, audio_tagging_loss=0.01037, over 3045710.75 frames. ], batch size: 57, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:33:24,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=832206.6666666666, ans=10.0 2023-11-19 22:33:46,690 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124850 2023-11-19 22:33:47,805 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 8.306e+01 8.909e+01 9.778e+01 1.815e+02, threshold=1.782e+02, percent-clipped=2.0 2023-11-19 22:33:49,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=832340.0, ans=0.0 2023-11-19 22:34:27,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=832473.3333333334, ans=0.2 2023-11-19 22:34:28,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=832540.0, ans=0.0 2023-11-19 22:34:29,629 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4650, loss[loss=0.08125, simple_loss=0.0909, pruned_loss=0.02192, audio_tagging_loss=0.01389, over 16024.00 frames. ], tot_loss[loss=0.08383, simple_loss=0.1032, pruned_loss=0.02177, audio_tagging_loss=0.01045, over 3043254.21 frames. ], batch size: 60, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:34:51,390 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124900 2023-11-19 22:35:10,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=832740.0, ans=0.2 2023-11-19 22:35:11,243 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.65 vs. limit=15.0 2023-11-19 22:35:23,607 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.67 vs. limit=15.0 2023-11-19 22:35:25,372 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.15 vs. 
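The balancer fields above (min_positive with ans=0.025, max_abs with ans=10.0) name soft constraints on per-channel activation statistics: keep at least a minimum fraction of each channel's values positive, and keep the mean magnitude under a cap. In this family of modules the constraints are enforced through gradient corrections rather than hard clamping; the check itself looks roughly like the sketch below, which is an interpretation of the field names rather than the actual scaling.py code.

```python
import torch


def balancer_violations(x: torch.Tensor,
                        min_positive: float = 0.025,
                        max_abs: float = 10.0) -> dict:
    """Per-channel constraint check matching the logged balancer knobs
    (interpretation only). x: (num_frames, num_channels)."""
    frac_positive = (x > 0).float().mean(dim=0)  # share of positive values
    mean_abs = x.abs().mean(dim=0)               # mean magnitude per channel
    return {
        "too_few_positive": frac_positive < min_positive,
        "too_large": mean_abs > max_abs,
    }


v = balancer_violations(torch.randn(100, 8) * 20.0)
print(v["too_large"])  # channels whose mean |x| exceeds the 10.0 cap
```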
limit=15.0 2023-11-19 22:35:27,825 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.57 vs. limit=10.0 2023-11-19 22:35:34,367 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4700, loss[loss=0.08295, simple_loss=0.1036, pruned_loss=0.01896, audio_tagging_loss=0.01221, over 13749.00 frames. ], tot_loss[loss=0.08428, simple_loss=0.1035, pruned_loss=0.02202, audio_tagging_loss=0.01052, over 3044147.96 frames. ], batch size: 53, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:35:39,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=832873.3333333334, ans=0.125 2023-11-19 22:35:55,640 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 124950 2023-11-19 22:35:56,672 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 8.340e+01 9.099e+01 9.695e+01 1.346e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-19 22:36:02,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=833006.6666666666, ans=0.1 2023-11-19 22:36:16,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=833073.3333333334, ans=0.125 2023-11-19 22:36:18,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=833073.3333333334, ans=0.125 2023-11-19 22:36:20,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=833073.3333333334, ans=0.1 2023-11-19 22:36:38,649 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4750, loss[loss=0.08997, simple_loss=0.1121, pruned_loss=0.02171, audio_tagging_loss=0.01223, over 14801.00 frames. ], tot_loss[loss=0.08443, simple_loss=0.1036, pruned_loss=0.02201, audio_tagging_loss=0.01061, over 3041164.60 frames. ], batch size: 54, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:36:41,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=833206.6666666666, ans=0.125 2023-11-19 22:36:55,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=833273.3333333334, ans=0.09899494936611666 2023-11-19 22:37:00,276 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125000 2023-11-19 22:37:05,269 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.47 vs. limit=15.0 2023-11-19 22:37:05,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=833340.0, ans=0.125 2023-11-19 22:37:14,458 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:37:21,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=833406.6666666666, ans=0.2 2023-11-19 22:37:31,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=833473.3333333334, ans=0.2 2023-11-19 22:37:38,692 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.74 vs. 
limit=15.0 2023-11-19 22:37:42,735 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4800, loss[loss=0.07406, simple_loss=0.08451, pruned_loss=0.01919, audio_tagging_loss=0.01262, over 16646.00 frames. ], tot_loss[loss=0.08426, simple_loss=0.1034, pruned_loss=0.02192, audio_tagging_loss=0.01063, over 3043529.73 frames. ], batch size: 64, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:37:46,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=833540.0, ans=0.125 2023-11-19 22:37:59,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=833606.6666666666, ans=0.125 2023-11-19 22:38:04,949 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125050 2023-11-19 22:38:06,674 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=22.5 2023-11-19 22:38:07,366 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.959e+01 8.107e+01 9.125e+01 1.005e+02 1.234e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 22:38:24,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=833740.0, ans=0.07 2023-11-19 22:38:44,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=833806.6666666666, ans=0.125 2023-11-19 22:38:47,597 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4850, loss[loss=0.06704, simple_loss=0.09334, pruned_loss=0.01119, audio_tagging_loss=0.009188, over 16048.00 frames. ], tot_loss[loss=0.0845, simple_loss=0.1036, pruned_loss=0.02196, audio_tagging_loss=0.01074, over 3048232.29 frames. ], batch size: 61, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:38:57,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=833873.3333333334, ans=0.125 2023-11-19 22:38:57,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=833873.3333333334, ans=0.125 2023-11-19 22:39:06,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=833940.0, ans=0.0 2023-11-19 22:39:09,137 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125100 2023-11-19 22:39:09,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=833940.0, ans=0.125 2023-11-19 22:39:10,939 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2023-11-19 22:39:41,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=834140.0, ans=0.125 2023-11-19 22:39:49,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=834140.0, ans=0.2 2023-11-19 22:39:51,854 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4900, loss[loss=0.0994, simple_loss=0.1219, pruned_loss=0.02841, audio_tagging_loss=0.01003, over 14973.00 frames. ], tot_loss[loss=0.08468, simple_loss=0.1042, pruned_loss=0.02208, audio_tagging_loss=0.0105, over 3044535.36 frames. 
], batch size: 56, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:40:13,356 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125150 2023-11-19 22:40:16,265 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.432e+01 8.847e+01 9.512e+01 1.221e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-19 22:40:37,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=834406.6666666666, ans=0.125 2023-11-19 22:40:38,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=834406.6666666666, ans=0.125 2023-11-19 22:40:40,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=834406.6666666666, ans=0.07 2023-11-19 22:40:41,125 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.79 vs. limit=15.0 2023-11-19 22:40:44,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=834473.3333333334, ans=0.125 2023-11-19 22:40:44,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=834473.3333333334, ans=0.125 2023-11-19 22:40:55,094 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 4950, loss[loss=0.08995, simple_loss=0.1014, pruned_loss=0.02809, audio_tagging_loss=0.01117, over 14397.00 frames. ], tot_loss[loss=0.08442, simple_loss=0.1042, pruned_loss=0.02204, audio_tagging_loss=0.01028, over 3045271.73 frames. ], batch size: 56, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:41:10,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=834606.6666666666, ans=0.125 2023-11-19 22:41:13,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=834606.6666666666, ans=0.0 2023-11-19 22:41:17,928 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125200 2023-11-19 22:41:35,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=834740.0, ans=0.125 2023-11-19 22:41:43,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=834740.0, ans=0.125 2023-11-19 22:41:47,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=834806.6666666666, ans=0.1 2023-11-19 22:41:48,986 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.77 vs. limit=12.0 2023-11-19 22:41:58,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=834806.6666666666, ans=0.125 2023-11-19 22:42:00,383 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5000, loss[loss=0.09379, simple_loss=0.107, pruned_loss=0.0301, audio_tagging_loss=0.0102, over 15200.00 frames. ], tot_loss[loss=0.08378, simple_loss=0.1031, pruned_loss=0.02193, audio_tagging_loss=0.01029, over 3043473.85 frames. 
], batch size: 55, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:42:00,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=834873.3333333334, ans=0.125 2023-11-19 22:42:06,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=834873.3333333334, ans=0.1 2023-11-19 22:42:22,129 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125250 2023-11-19 22:42:24,394 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.897e+01 8.190e+01 8.915e+01 9.653e+01 1.690e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-19 22:42:29,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=835006.6666666666, ans=0.1 2023-11-19 22:42:30,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=835006.6666666666, ans=0.1 2023-11-19 22:42:38,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=835073.3333333334, ans=0.05 2023-11-19 22:42:57,876 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.98 vs. limit=15.0 2023-11-19 22:43:04,421 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5050, loss[loss=0.07081, simple_loss=0.09046, pruned_loss=0.01645, audio_tagging_loss=0.009128, over 14448.00 frames. ], tot_loss[loss=0.08329, simple_loss=0.1027, pruned_loss=0.02172, audio_tagging_loss=0.01023, over 3041807.27 frames. ], batch size: 54, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:43:19,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=835273.3333333334, ans=0.125 2023-11-19 22:43:22,778 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.15 vs. limit=15.0 2023-11-19 22:43:26,373 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125300 2023-11-19 22:44:08,439 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5100, loss[loss=0.07396, simple_loss=0.09721, pruned_loss=0.01705, audio_tagging_loss=0.008306, over 14819.00 frames. ], tot_loss[loss=0.083, simple_loss=0.1028, pruned_loss=0.02143, audio_tagging_loss=0.01018, over 3039299.80 frames. 
], batch size: 55, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:44:23,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=835606.6666666666, ans=0.035 2023-11-19 22:44:30,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=835606.6666666666, ans=0.2 2023-11-19 22:44:31,259 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125350 2023-11-19 22:44:32,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=835606.6666666666, ans=0.0 2023-11-19 22:44:33,599 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.822e+01 8.119e+01 8.827e+01 9.463e+01 1.199e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-19 22:44:35,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=835673.3333333334, ans=0.2 2023-11-19 22:45:03,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=835806.6666666666, ans=0.0 2023-11-19 22:45:14,111 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5150, loss[loss=0.0731, simple_loss=0.08804, pruned_loss=0.02257, audio_tagging_loss=0.006516, over 14528.00 frames. ], tot_loss[loss=0.08359, simple_loss=0.1034, pruned_loss=0.02168, audio_tagging_loss=0.01019, over 3036891.39 frames. ], batch size: 55, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:45:15,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=835873.3333333334, ans=0.2 2023-11-19 22:45:15,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=835873.3333333334, ans=0.125 2023-11-19 22:45:17,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=835873.3333333334, ans=0.125 2023-11-19 22:45:36,419 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125400 2023-11-19 22:46:06,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=836140.0, ans=0.1 2023-11-19 22:46:19,179 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5200, loss[loss=0.09824, simple_loss=0.1238, pruned_loss=0.02727, audio_tagging_loss=0.009088, over 16236.00 frames. ], tot_loss[loss=0.08321, simple_loss=0.1031, pruned_loss=0.02146, audio_tagging_loss=0.01021, over 3039580.73 frames. 
], batch size: 60, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:46:28,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=836206.6666666666, ans=0.09899494936611666 2023-11-19 22:46:31,107 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:46:40,483 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125450 2023-11-19 22:46:43,468 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.292e+01 9.057e+01 9.966e+01 1.254e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-19 22:47:16,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=836473.3333333334, ans=0.025 2023-11-19 22:47:23,305 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5250, loss[loss=0.1093, simple_loss=0.1359, pruned_loss=0.03294, audio_tagging_loss=0.008446, over 14959.00 frames. ], tot_loss[loss=0.08347, simple_loss=0.1032, pruned_loss=0.02164, audio_tagging_loss=0.01025, over 3035358.34 frames. ], batch size: 57, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:47:36,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=836606.6666666666, ans=0.07 2023-11-19 22:47:43,345 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:47:45,644 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125500 2023-11-19 22:47:50,913 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:47:55,077 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:47:55,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=836673.3333333334, ans=0.05 2023-11-19 22:48:15,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=836806.6666666666, ans=0.0 2023-11-19 22:48:24,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=836806.6666666666, ans=0.125 2023-11-19 22:48:26,258 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.99 vs. limit=10.0 2023-11-19 22:48:28,091 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5300, loss[loss=0.08131, simple_loss=0.1078, pruned_loss=0.02015, audio_tagging_loss=0.007246, over 14467.00 frames. ], tot_loss[loss=0.08342, simple_loss=0.1031, pruned_loss=0.02165, audio_tagging_loss=0.0102, over 3036527.14 frames. ], batch size: 54, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:48:47,341 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.17 vs. 
limit=15.0 2023-11-19 22:48:50,373 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125550 2023-11-19 22:48:53,346 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.677e+01 8.460e+01 9.151e+01 1.090e+02 1.553e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 22:49:27,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=837140.0, ans=0.2 2023-11-19 22:49:30,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=837140.0, ans=0.125 2023-11-19 22:49:33,681 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5350, loss[loss=0.07017, simple_loss=0.09169, pruned_loss=0.01533, audio_tagging_loss=0.008994, over 14717.00 frames. ], tot_loss[loss=0.08432, simple_loss=0.1046, pruned_loss=0.02192, audio_tagging_loss=0.01011, over 3042015.79 frames. ], batch size: 55, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:49:36,678 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2023-11-19 22:49:36,980 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.25 vs. limit=15.0 2023-11-19 22:49:50,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=837273.3333333334, ans=0.125 2023-11-19 22:49:54,923 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125600 2023-11-19 22:50:12,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=837406.6666666666, ans=0.1 2023-11-19 22:50:18,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=837406.6666666666, ans=0.125 2023-11-19 22:50:38,019 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5400, loss[loss=0.1121, simple_loss=0.1428, pruned_loss=0.03122, audio_tagging_loss=0.00949, over 16431.00 frames. ], tot_loss[loss=0.08519, simple_loss=0.1057, pruned_loss=0.02227, audio_tagging_loss=0.01007, over 3031896.22 frames. 
], batch size: 59, lr: 6.31e-03, grad_scale: 16.0 2023-11-19 22:50:46,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=837540.0, ans=0.1 2023-11-19 22:50:50,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=837606.6666666666, ans=0.125 2023-11-19 22:50:57,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=837606.6666666666, ans=10.0 2023-11-19 22:51:00,266 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125650 2023-11-19 22:51:03,910 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.803e+01 8.135e+01 8.658e+01 9.508e+01 1.289e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-19 22:51:04,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=837673.3333333334, ans=0.0 2023-11-19 22:51:10,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=837673.3333333334, ans=15.0 2023-11-19 22:51:17,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=837740.0, ans=0.125 2023-11-19 22:51:34,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=837806.6666666666, ans=0.1 2023-11-19 22:51:42,673 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5450, loss[loss=0.09953, simple_loss=0.1305, pruned_loss=0.02683, audio_tagging_loss=0.007423, over 16047.00 frames. ], tot_loss[loss=0.08471, simple_loss=0.1047, pruned_loss=0.02208, audio_tagging_loss=0.0103, over 3036273.73 frames. ], batch size: 56, lr: 6.31e-03, grad_scale: 16.0 2023-11-19 22:51:48,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=837873.3333333334, ans=0.2 2023-11-19 22:52:04,547 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125700 2023-11-19 22:52:06,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=837940.0, ans=0.035 2023-11-19 22:52:06,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=837940.0, ans=0.125 2023-11-19 22:52:07,061 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.62 vs. limit=22.5 2023-11-19 22:52:11,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=838006.6666666666, ans=0.1 2023-11-19 22:52:17,398 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:52:30,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=838073.3333333334, ans=0.125 2023-11-19 22:52:47,683 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5500, loss[loss=0.1043, simple_loss=0.1338, pruned_loss=0.02725, audio_tagging_loss=0.01014, over 15428.00 frames. ], tot_loss[loss=0.0847, simple_loss=0.1048, pruned_loss=0.02215, audio_tagging_loss=0.01018, over 3042261.48 frames. 
], batch size: 56, lr: 6.31e-03, grad_scale: 16.0 2023-11-19 22:52:50,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=838206.6666666666, ans=0.04949747468305833 2023-11-19 22:52:52,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=838206.6666666666, ans=0.0 2023-11-19 22:52:56,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=838206.6666666666, ans=0.125 2023-11-19 22:53:09,220 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125750 2023-11-19 22:53:11,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=838340.0, ans=0.125 2023-11-19 22:53:11,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=838340.0, ans=0.1 2023-11-19 22:53:12,830 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.496e+01 8.990e+01 9.633e+01 1.229e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 22:53:15,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=838340.0, ans=0.07 2023-11-19 22:53:40,922 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=15.0 2023-11-19 22:53:42,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=838473.3333333334, ans=0.125 2023-11-19 22:53:52,462 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5550, loss[loss=0.1192, simple_loss=0.1548, pruned_loss=0.03236, audio_tagging_loss=0.009426, over 15892.00 frames. ], tot_loss[loss=0.08583, simple_loss=0.1063, pruned_loss=0.02243, audio_tagging_loss=0.01026, over 3043528.36 frames. ], batch size: 57, lr: 6.31e-03, grad_scale: 16.0 2023-11-19 22:53:55,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=838540.0, ans=0.2 2023-11-19 22:53:56,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=838540.0, ans=0.0 2023-11-19 22:54:14,289 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125800 2023-11-19 22:54:34,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=838740.0, ans=0.125 2023-11-19 22:54:43,195 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0 2023-11-19 22:54:50,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=838806.6666666666, ans=0.2 2023-11-19 22:54:57,833 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5600, loss[loss=0.09172, simple_loss=0.1101, pruned_loss=0.0261, audio_tagging_loss=0.01056, over 14449.00 frames. ], tot_loss[loss=0.08609, simple_loss=0.1067, pruned_loss=0.02245, audio_tagging_loss=0.0103, over 3046009.27 frames. 
], batch size: 55, lr: 6.31e-03, grad_scale: 32.0 2023-11-19 22:55:07,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=838873.3333333334, ans=0.0 2023-11-19 22:55:12,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=838940.0, ans=0.125 2023-11-19 22:55:19,579 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125850 2023-11-19 22:55:23,387 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.434e+01 8.075e+01 8.698e+01 9.694e+01 1.274e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-19 22:55:44,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=839073.3333333334, ans=0.0 2023-11-19 22:55:46,044 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:55:57,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=839140.0, ans=0.95 2023-11-19 22:55:59,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=839140.0, ans=0.125 2023-11-19 22:56:02,497 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5650, loss[loss=0.09045, simple_loss=0.1197, pruned_loss=0.02192, audio_tagging_loss=0.008668, over 16383.00 frames. ], tot_loss[loss=0.08614, simple_loss=0.107, pruned_loss=0.02231, audio_tagging_loss=0.01034, over 3049842.78 frames. ], batch size: 58, lr: 6.31e-03, grad_scale: 32.0 2023-11-19 22:56:11,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=839206.6666666666, ans=0.0 2023-11-19 22:56:24,953 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125900 2023-11-19 22:56:37,195 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.16 vs. limit=10.0 2023-11-19 22:56:49,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=839406.6666666666, ans=0.07 2023-11-19 22:57:01,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=839473.3333333334, ans=0.1 2023-11-19 22:57:06,839 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5700, loss[loss=0.1195, simple_loss=0.1407, pruned_loss=0.03957, audio_tagging_loss=0.009601, over 14361.00 frames. ], tot_loss[loss=0.0861, simple_loss=0.1068, pruned_loss=0.02231, audio_tagging_loss=0.01038, over 3048021.30 frames. 
], batch size: 54, lr: 6.31e-03, grad_scale: 32.0 2023-11-19 22:57:07,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=839540.0, ans=0.0 2023-11-19 22:57:21,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=839606.6666666666, ans=0.0 2023-11-19 22:57:25,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=839606.6666666666, ans=0.125 2023-11-19 22:57:29,135 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 125950 2023-11-19 22:57:29,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=839606.6666666666, ans=0.125 2023-11-19 22:57:34,474 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.315e+01 7.815e+01 8.868e+01 9.889e+01 1.263e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-19 22:57:34,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=839673.3333333334, ans=0.125 2023-11-19 22:57:34,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=839673.3333333334, ans=0.125 2023-11-19 22:57:44,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=839740.0, ans=0.125 2023-11-19 22:57:48,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=839740.0, ans=0.125 2023-11-19 22:57:56,479 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:58:11,318 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5750, loss[loss=0.08733, simple_loss=0.1064, pruned_loss=0.02359, audio_tagging_loss=0.01052, over 14647.00 frames. ], tot_loss[loss=0.08541, simple_loss=0.1058, pruned_loss=0.02221, audio_tagging_loss=0.0103, over 3049131.71 frames. ], batch size: 55, lr: 6.31e-03, grad_scale: 16.0 2023-11-19 22:58:11,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=839873.3333333334, ans=0.125 2023-11-19 22:58:25,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=839940.0, ans=0.125 2023-11-19 22:58:33,799 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126000 2023-11-19 22:58:42,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=840006.6666666666, ans=0.0 2023-11-19 22:58:52,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=840073.3333333334, ans=0.025 2023-11-19 22:58:54,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=840073.3333333334, ans=0.125 2023-11-19 22:59:15,260 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:59:17,498 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5800, loss[loss=0.08017, simple_loss=0.09931, pruned_loss=0.02176, audio_tagging_loss=0.008762, over 15536.00 frames. 
], tot_loss[loss=0.08504, simple_loss=0.1056, pruned_loss=0.02217, audio_tagging_loss=0.01007, over 3053508.47 frames. ], batch size: 58, lr: 6.30e-03, grad_scale: 16.0 2023-11-19 22:59:34,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=840273.3333333334, ans=0.125 2023-11-19 22:59:39,040 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126050 2023-11-19 22:59:43,818 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.582e+01 8.258e+01 8.850e+01 9.872e+01 1.564e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-19 22:59:56,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=840406.6666666666, ans=0.1 2023-11-19 23:00:22,526 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5850, loss[loss=0.07334, simple_loss=0.0908, pruned_loss=0.01793, audio_tagging_loss=0.01001, over 15214.00 frames. ], tot_loss[loss=0.08469, simple_loss=0.1053, pruned_loss=0.02204, audio_tagging_loss=0.01003, over 3042746.24 frames. ], batch size: 55, lr: 6.30e-03, grad_scale: 16.0 2023-11-19 23:00:26,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=840540.0, ans=0.0 2023-11-19 23:00:37,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=840606.6666666666, ans=0.125 2023-11-19 23:00:38,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=840606.6666666666, ans=0.2 2023-11-19 23:00:39,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=840606.6666666666, ans=0.95 2023-11-19 23:00:39,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=840606.6666666666, ans=0.1 2023-11-19 23:00:44,293 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0 2023-11-19 23:00:44,889 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126100 2023-11-19 23:01:03,423 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.64 vs. limit=15.0 2023-11-19 23:01:06,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=840740.0, ans=0.0 2023-11-19 23:01:14,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=840806.6666666666, ans=0.2 2023-11-19 23:01:23,583 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2023-11-19 23:01:27,264 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5900, loss[loss=0.07672, simple_loss=0.09099, pruned_loss=0.01856, audio_tagging_loss=0.01266, over 14781.00 frames. ], tot_loss[loss=0.08385, simple_loss=0.1039, pruned_loss=0.02186, audio_tagging_loss=0.01002, over 3042807.16 frames. 
], batch size: 54, lr: 6.30e-03, grad_scale: 16.0 2023-11-19 23:01:44,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=840940.0, ans=0.95 2023-11-19 23:01:49,502 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126150 2023-11-19 23:01:54,162 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.488e+01 8.354e+01 9.139e+01 9.974e+01 1.416e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 23:01:58,697 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=12.0 2023-11-19 23:02:08,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=841073.3333333334, ans=0.125 2023-11-19 23:02:22,770 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.67 vs. limit=22.5 2023-11-19 23:02:32,589 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 5950, loss[loss=0.07506, simple_loss=0.09541, pruned_loss=0.01946, audio_tagging_loss=0.007901, over 15746.00 frames. ], tot_loss[loss=0.08379, simple_loss=0.104, pruned_loss=0.02182, audio_tagging_loss=0.009954, over 3047669.68 frames. ], batch size: 57, lr: 6.30e-03, grad_scale: 16.0 2023-11-19 23:02:53,883 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126200 2023-11-19 23:03:14,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=841406.6666666666, ans=0.0 2023-11-19 23:03:31,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=841473.3333333334, ans=0.95 2023-11-19 23:03:36,312 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6000, loss[loss=0.08118, simple_loss=0.1052, pruned_loss=0.02049, audio_tagging_loss=0.008086, over 14261.00 frames. ], tot_loss[loss=0.08264, simple_loss=0.1026, pruned_loss=0.02138, audio_tagging_loss=0.009968, over 3038353.69 frames. ], batch size: 55, lr: 6.30e-03, grad_scale: 32.0 2023-11-19 23:03:36,312 INFO [train_asr.py:1285] (2/4) Computing validation loss 2023-11-19 23:04:00,653 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.6494, 4.1814, 3.6339, 3.2571], device='cuda:2') 2023-11-19 23:04:01,565 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9161, 5.6788, 5.3196, 5.4763], device='cuda:2') 2023-11-19 23:04:05,779 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8883, 4.9630, 5.0131, 4.9228], device='cuda:2') 2023-11-19 23:04:18,354 INFO [train_asr.py:1294] (2/4) Epoch 11, validation: loss=0.06364, simple_loss=0.05477, pruned_loss=0.006179, audio_tagging_loss=0.03008, over 4681554.00 frames. 
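Note on the loss figures: the composite "loss" in these train_asr.py entries is consistent with a fixed weighting of the logged components, with simple_loss halved and pruned_loss and audio_tagging_loss added at full weight; this reproduces the logged figures to rounding (e.g. 0.5 * 0.05477 + 0.006179 + 0.03008 = 0.06364 for the validation entry above). A minimal sketch of just that relationship; the function and parameter names are illustrative, not taken from the training code:

# Sketch: reconstruct the composite "loss" these log entries print from the
# per-component values. The 0.5 weight on simple_loss and the 1.0 weight on
# audio_tagging_loss are inferred from the logged numbers themselves.
def composite_loss(simple_loss: float,
                   pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_scale: float = 0.5,
                   tagging_scale: float = 1.0) -> float:
    return simple_scale * simple_loss + pruned_loss + tagging_scale * audio_tagging_loss

# Validation entry above: loss=0.06364, simple_loss=0.05477,
# pruned_loss=0.006179, audio_tagging_loss=0.03008.
assert abs(composite_loss(0.05477, 0.006179, 0.03008) - 0.06364) < 1e-4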
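Note on the attn_weights_entropy tensors printed during validation: they summarize how peaked each self-attention module is, one value per head group, with higher values meaning flatter attention. A minimal sketch of the underlying quantity, assuming a standard Shannon entropy over each attention row averaged across query positions; the exact reduction zipformer.py uses is not shown in the log:

# Sketch: per-head entropy of attention weights. attn has shape
# (num_heads, T_query, T_key) with each row summing to 1.
import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (num_heads, T_query)
    return ent.mean(dim=-1)                         # one value per head

attn = torch.softmax(torch.randn(4, 10, 10), dim=-1)
print(attn_weights_entropy(attn))  # a length-4 tensor, like the dumps above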
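Note on the [optim.py:476] lines: each prints five quantiles (min, 25%, median, 75%, max) of recently observed gradient norms, and in every entry above the reported threshold equals Clipping_scale times the median (e.g. 2.0 * 8.915e+01 = 1.783e+02). A minimal sketch of that relationship; the window size and where the norms come from inside the real optimizer are assumptions:

# Sketch: derive the clipping threshold the log reports from a window of
# recent gradient norms: threshold = clipping_scale * median(norms).
import torch

def clipping_threshold(recent_grad_norms: torch.Tensor,
                       clipping_scale: float = 2.0) -> float:
    return clipping_scale * recent_grad_norms.median().item()

# Quantiles from the first [optim.py:476] entry in this stretch of the log:
norms = torch.tensor([6.897e+01, 8.190e+01, 8.915e+01, 9.653e+01, 1.690e+02])
print(clipping_threshold(norms))  # 178.3, matching threshold=1.783e+02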
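Note on the WARNING entries: they exclude AudioSet cuts whose BPE token sequence (24 tokens for the dummy placeholder text) is longer than the encoder output; 100 input frames shrink to 23 after the front-end's roughly 4x subsampling, so the transducer loss would be undefined. A minimal sketch of such a filter; the formula below reproduces the logged 100 -> 23, but treating it as the precise check in train_asr.py is an assumption:

# Sketch: drop cuts that are too short for their transcript. The subsampling
# formula mirrors the logged "before subsampling: 100 ... after: 23" counts.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least as many encoder frames as output tokens.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # the excluded dummy-text cuts: 24 tokens > 23 frames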
2023-11-19 23:04:18,354 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB 2023-11-19 23:04:34,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=841606.6666666666, ans=0.125 2023-11-19 23:04:40,344 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126250 2023-11-19 23:04:45,163 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.220e+01 8.140e+01 8.896e+01 9.801e+01 1.425e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-19 23:05:04,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=841740.0, ans=0.04949747468305833 2023-11-19 23:05:07,212 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 23:05:08,036 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.09 vs. limit=15.0 2023-11-19 23:05:11,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=841806.6666666666, ans=0.2 2023-11-19 23:05:11,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=841806.6666666666, ans=0.125 2023-11-19 23:05:23,652 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6050, loss[loss=0.07762, simple_loss=0.09775, pruned_loss=0.01904, audio_tagging_loss=0.009705, over 15176.00 frames. ], tot_loss[loss=0.08215, simple_loss=0.1017, pruned_loss=0.02121, audio_tagging_loss=0.01007, over 3037816.92 frames. ], batch size: 58, lr: 6.30e-03, grad_scale: 32.0 2023-11-19 23:05:31,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=841873.3333333334, ans=0.125 2023-11-19 23:05:45,490 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126300 2023-11-19 23:05:56,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=842006.6666666666, ans=0.0 2023-11-19 23:06:07,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=842073.3333333334, ans=0.1 2023-11-19 23:06:28,648 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6100, loss[loss=0.06797, simple_loss=0.08544, pruned_loss=0.01762, audio_tagging_loss=0.007632, over 15152.00 frames. ], tot_loss[loss=0.08169, simple_loss=0.1011, pruned_loss=0.02107, audio_tagging_loss=0.01008, over 3034985.22 frames. 
], batch size: 56, lr: 6.30e-03, grad_scale: 32.0 2023-11-19 23:06:30,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=842206.6666666666, ans=0.1 2023-11-19 23:06:37,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=842206.6666666666, ans=0.1 2023-11-19 23:06:43,731 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.53 vs. limit=15.0 2023-11-19 23:06:50,078 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126350 2023-11-19 23:06:55,417 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.655e+01 9.472e+01 1.043e+02 1.487e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-19 23:07:29,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=842473.3333333334, ans=0.1 2023-11-19 23:07:32,562 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6150, loss[loss=0.05374, simple_loss=0.06075, pruned_loss=0.01264, audio_tagging_loss=0.01073, over 15323.00 frames. ], tot_loss[loss=0.08197, simple_loss=0.1014, pruned_loss=0.0211, audio_tagging_loss=0.01018, over 3046379.11 frames. ], batch size: 59, lr: 6.30e-03, grad_scale: 32.0 2023-11-19 23:07:55,468 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126400 2023-11-19 23:08:39,211 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6200, loss[loss=0.08289, simple_loss=0.1054, pruned_loss=0.0168, audio_tagging_loss=0.0134, over 14671.00 frames. ], tot_loss[loss=0.08215, simple_loss=0.1016, pruned_loss=0.02113, audio_tagging_loss=0.01024, over 3045604.34 frames. ], batch size: 55, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:08:40,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=842873.3333333334, ans=0.1 2023-11-19 23:08:49,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=842873.3333333334, ans=0.0 2023-11-19 23:09:00,681 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126450 2023-11-19 23:09:05,396 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.350e+01 8.922e+01 9.859e+01 1.209e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-19 23:09:09,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=843006.6666666666, ans=0.0 2023-11-19 23:09:16,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=843073.3333333334, ans=0.125 2023-11-19 23:09:42,307 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6250, loss[loss=0.1168, simple_loss=0.1414, pruned_loss=0.03839, audio_tagging_loss=0.007687, over 16198.00 frames. ], tot_loss[loss=0.08217, simple_loss=0.1011, pruned_loss=0.02125, audio_tagging_loss=0.01036, over 3043348.86 frames. 
], batch size: 57, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:09:42,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=843206.6666666666, ans=0.125 2023-11-19 23:10:04,680 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126500 2023-11-19 23:10:04,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=843273.3333333334, ans=0.0 2023-11-19 23:10:15,946 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.24 vs. limit=15.0 2023-11-19 23:10:32,438 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=15.0 2023-11-19 23:10:47,064 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6300, loss[loss=0.08545, simple_loss=0.1045, pruned_loss=0.01944, audio_tagging_loss=0.01374, over 15409.00 frames. ], tot_loss[loss=0.08319, simple_loss=0.1023, pruned_loss=0.02157, audio_tagging_loss=0.01045, over 3042547.16 frames. ], batch size: 57, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:10:54,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=843540.0, ans=0.125 2023-11-19 23:11:09,963 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126550 2023-11-19 23:11:14,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=843673.3333333334, ans=0.1 2023-11-19 23:11:14,953 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.441e+01 8.231e+01 8.988e+01 9.749e+01 1.273e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 23:11:51,048 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:11:52,591 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6350, loss[loss=0.1132, simple_loss=0.1444, pruned_loss=0.03356, audio_tagging_loss=0.007464, over 16335.00 frames. ], tot_loss[loss=0.08459, simple_loss=0.1043, pruned_loss=0.02204, audio_tagging_loss=0.01043, over 3042962.18 frames. ], batch size: 59, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:11:55,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=843873.3333333334, ans=0.125 2023-11-19 23:12:05,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=843940.0, ans=0.0 2023-11-19 23:12:08,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=843940.0, ans=0.2 2023-11-19 23:12:10,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=843940.0, ans=0.2 2023-11-19 23:12:14,909 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126600 2023-11-19 23:12:26,344 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:12:26,742 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.97 vs. 
limit=12.0 2023-11-19 23:12:28,014 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2023-11-19 23:12:32,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=844073.3333333334, ans=0.125 2023-11-19 23:12:34,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=844073.3333333334, ans=0.125 2023-11-19 23:12:57,579 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6400, loss[loss=0.0755, simple_loss=0.09385, pruned_loss=0.01622, audio_tagging_loss=0.01235, over 16572.00 frames. ], tot_loss[loss=0.08464, simple_loss=0.1042, pruned_loss=0.02208, audio_tagging_loss=0.01045, over 3045434.49 frames. ], batch size: 61, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:13:00,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=844206.6666666666, ans=0.09899494936611666 2023-11-19 23:13:19,173 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126650 2023-11-19 23:13:25,600 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.681e+01 8.306e+01 8.849e+01 9.605e+01 1.260e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-19 23:13:51,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=844473.3333333334, ans=0.0 2023-11-19 23:14:01,783 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6450, loss[loss=0.0865, simple_loss=0.109, pruned_loss=0.02286, audio_tagging_loss=0.009134, over 13849.00 frames. ], tot_loss[loss=0.08305, simple_loss=0.102, pruned_loss=0.02147, audio_tagging_loss=0.0106, over 3033554.24 frames. ], batch size: 53, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:14:08,716 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. 
limit=6.0 2023-11-19 23:14:24,168 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126700 2023-11-19 23:14:26,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=844673.3333333334, ans=0.0 2023-11-19 23:14:26,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=844673.3333333334, ans=0.1 2023-11-19 23:14:26,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=844673.3333333334, ans=0.125 2023-11-19 23:14:27,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=844673.3333333334, ans=0.1 2023-11-19 23:14:32,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=844673.3333333334, ans=0.125 2023-11-19 23:14:35,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=844673.3333333334, ans=0.0 2023-11-19 23:14:48,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=844740.0, ans=0.125 2023-11-19 23:15:00,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=844806.6666666666, ans=0.2 2023-11-19 23:15:06,442 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6500, loss[loss=0.07356, simple_loss=0.07657, pruned_loss=0.02264, audio_tagging_loss=0.01265, over 15188.00 frames. ], tot_loss[loss=0.08261, simple_loss=0.1014, pruned_loss=0.02132, audio_tagging_loss=0.01058, over 3035585.18 frames. ], batch size: 58, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:15:08,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=844873.3333333334, ans=0.025 2023-11-19 23:15:16,390 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=15.0 2023-11-19 23:15:29,780 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126750 2023-11-19 23:15:35,926 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.834e+01 8.115e+01 8.787e+01 9.556e+01 1.431e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-19 23:15:45,384 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.59 vs. limit=15.0 2023-11-19 23:15:54,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=845073.3333333334, ans=0.125 2023-11-19 23:16:07,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=845140.0, ans=0.1 2023-11-19 23:16:07,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=845140.0, ans=0.125 2023-11-19 23:16:12,377 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6550, loss[loss=0.06986, simple_loss=0.08015, pruned_loss=0.02074, audio_tagging_loss=0.009048, over 14757.00 frames. ], tot_loss[loss=0.08345, simple_loss=0.1026, pruned_loss=0.02163, audio_tagging_loss=0.01054, over 3035557.32 frames. 
], batch size: 57, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:16:21,109 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-19 23:16:28,639 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=15.0 2023-11-19 23:16:34,052 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126800 2023-11-19 23:16:39,004 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0 2023-11-19 23:16:42,707 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.22 vs. limit=22.5 2023-11-19 23:17:11,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=845473.3333333334, ans=0.1 2023-11-19 23:17:17,385 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6600, loss[loss=0.07293, simple_loss=0.08511, pruned_loss=0.01999, audio_tagging_loss=0.01038, over 15058.00 frames. ], tot_loss[loss=0.08289, simple_loss=0.1018, pruned_loss=0.02153, audio_tagging_loss=0.01043, over 3027701.82 frames. ], batch size: 58, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:17:32,734 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.41 vs. limit=5.0 2023-11-19 23:17:40,141 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126850 2023-11-19 23:17:46,103 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.644e+01 8.323e+01 8.980e+01 9.690e+01 1.359e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-19 23:17:53,887 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:18:14,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=845806.6666666666, ans=0.1 2023-11-19 23:18:18,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=845806.6666666666, ans=0.95 2023-11-19 23:18:22,255 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6650, loss[loss=0.08861, simple_loss=0.1079, pruned_loss=0.0204, audio_tagging_loss=0.01427, over 14679.00 frames. ], tot_loss[loss=0.08256, simple_loss=0.1014, pruned_loss=0.02146, audio_tagging_loss=0.01039, over 3032100.67 frames. 
], batch size: 54, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:18:26,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=845873.3333333334, ans=0.125 2023-11-19 23:18:43,957 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126900 2023-11-19 23:18:44,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=845940.0, ans=0.0 2023-11-19 23:18:47,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=846006.6666666666, ans=0.0 2023-11-19 23:18:56,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=846006.6666666666, ans=0.0 2023-11-19 23:19:26,938 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6700, loss[loss=0.09319, simple_loss=0.1238, pruned_loss=0.02582, audio_tagging_loss=0.005463, over 14831.00 frames. ], tot_loss[loss=0.08288, simple_loss=0.1019, pruned_loss=0.02163, audio_tagging_loss=0.01028, over 3037499.68 frames. ], batch size: 54, lr: 6.28e-03, grad_scale: 16.0 2023-11-19 23:19:49,154 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 126950 2023-11-19 23:19:50,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=846273.3333333334, ans=0.1 2023-11-19 23:19:57,068 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.564e+01 8.084e+01 8.667e+01 9.126e+01 1.226e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-19 23:19:59,139 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.74 vs. limit=22.5 2023-11-19 23:20:02,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=846340.0, ans=0.125 2023-11-19 23:20:02,843 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2023-11-19 23:20:05,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=846406.6666666666, ans=0.2 2023-11-19 23:20:31,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=846540.0, ans=0.125 2023-11-19 23:20:32,097 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6750, loss[loss=0.1031, simple_loss=0.1333, pruned_loss=0.02876, audio_tagging_loss=0.007645, over 14521.00 frames. ], tot_loss[loss=0.08298, simple_loss=0.102, pruned_loss=0.02176, audio_tagging_loss=0.01021, over 3027550.03 frames. ], batch size: 53, lr: 6.28e-03, grad_scale: 16.0 2023-11-19 23:20:45,559 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.40 vs. limit=22.5 2023-11-19 23:20:53,942 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127000 2023-11-19 23:20:59,698 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.35 vs. 
limit=22.5 2023-11-19 23:21:05,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=846673.3333333334, ans=0.0 2023-11-19 23:21:22,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=846806.6666666666, ans=0.2 2023-11-19 23:21:36,269 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6800, loss[loss=0.07068, simple_loss=0.08419, pruned_loss=0.01842, audio_tagging_loss=0.01016, over 15314.00 frames. ], tot_loss[loss=0.08342, simple_loss=0.1025, pruned_loss=0.02199, audio_tagging_loss=0.01016, over 3040536.64 frames. ], batch size: 59, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:21:46,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=846873.3333333334, ans=0.0 2023-11-19 23:21:57,803 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127050 2023-11-19 23:21:59,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=846940.0, ans=0.0 2023-11-19 23:22:05,924 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.284e+01 8.995e+01 1.009e+02 1.556e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-19 23:22:08,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=847006.6666666666, ans=0.2 2023-11-19 23:22:09,060 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.55 vs. limit=15.0 2023-11-19 23:22:41,143 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6850, loss[loss=0.1155, simple_loss=0.1571, pruned_loss=0.03121, audio_tagging_loss=0.005719, over 15924.00 frames. ], tot_loss[loss=0.0836, simple_loss=0.1028, pruned_loss=0.02206, audio_tagging_loss=0.01015, over 3036012.11 frames. ], batch size: 58, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:22:54,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=847273.3333333334, ans=0.125 2023-11-19 23:23:03,159 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127100 2023-11-19 23:23:05,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=847340.0, ans=0.125 2023-11-19 23:23:28,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=847406.6666666666, ans=0.125 2023-11-19 23:23:32,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=847473.3333333334, ans=0.125 2023-11-19 23:23:36,835 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.49 vs. limit=15.0 2023-11-19 23:23:39,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=847473.3333333334, ans=0.125 2023-11-19 23:23:45,609 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6900, loss[loss=0.08447, simple_loss=0.1106, pruned_loss=0.02046, audio_tagging_loss=0.008724, over 15356.00 frames. ], tot_loss[loss=0.08335, simple_loss=0.1027, pruned_loss=0.02189, audio_tagging_loss=0.01013, over 3034684.90 frames. 
], batch size: 58, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:24:07,957 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127150 2023-11-19 23:24:16,952 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.980e+01 8.238e+01 8.922e+01 9.730e+01 1.552e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-19 23:24:31,821 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.09 vs. limit=10.0 2023-11-19 23:24:37,207 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 23:24:41,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=847806.6666666666, ans=0.1 2023-11-19 23:24:50,538 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 6950, loss[loss=0.105, simple_loss=0.1311, pruned_loss=0.03068, audio_tagging_loss=0.008787, over 16498.00 frames. ], tot_loss[loss=0.08401, simple_loss=0.1042, pruned_loss=0.02192, audio_tagging_loss=0.01, over 3043212.20 frames. ], batch size: 58, lr: 6.28e-03, grad_scale: 16.0 2023-11-19 23:24:55,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=847873.3333333334, ans=0.1 2023-11-19 23:25:06,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=847940.0, ans=0.125 2023-11-19 23:25:09,102 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.54 vs. limit=15.0 2023-11-19 23:25:12,271 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127200 2023-11-19 23:25:13,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=847940.0, ans=0.125 2023-11-19 23:25:20,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=848006.6666666666, ans=0.0 2023-11-19 23:25:52,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=848140.0, ans=0.0 2023-11-19 23:25:53,660 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:25:53,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=848140.0, ans=0.125 2023-11-19 23:25:55,836 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7000, loss[loss=0.1024, simple_loss=0.1283, pruned_loss=0.02979, audio_tagging_loss=0.008452, over 15429.00 frames. ], tot_loss[loss=0.08309, simple_loss=0.1025, pruned_loss=0.02159, audio_tagging_loss=0.01023, over 3042994.44 frames. 
], batch size: 55, lr: 6.27e-03, grad_scale: 16.0 2023-11-19 23:26:05,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=848206.6666666666, ans=0.125 2023-11-19 23:26:17,234 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127250 2023-11-19 23:26:23,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=848340.0, ans=0.125 2023-11-19 23:26:26,384 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.925e+01 8.193e+01 9.050e+01 1.000e+02 1.255e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-19 23:26:44,284 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2023-11-19 23:26:46,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=848473.3333333334, ans=0.125 2023-11-19 23:27:00,070 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7050, loss[loss=0.07807, simple_loss=0.09332, pruned_loss=0.01863, audio_tagging_loss=0.01278, over 13969.00 frames. ], tot_loss[loss=0.08309, simple_loss=0.1023, pruned_loss=0.02156, audio_tagging_loss=0.01036, over 3042023.92 frames. ], batch size: 53, lr: 6.27e-03, grad_scale: 16.0 2023-11-19 23:27:22,399 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127300 2023-11-19 23:27:23,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=848606.6666666666, ans=0.125 2023-11-19 23:27:33,461 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.05 vs. limit=15.0 2023-11-19 23:27:38,353 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.86 vs. limit=15.0 2023-11-19 23:27:53,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=848806.6666666666, ans=0.125 2023-11-19 23:27:58,751 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.01 vs. limit=12.0 2023-11-19 23:28:03,965 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7100, loss[loss=0.07137, simple_loss=0.09523, pruned_loss=0.01503, audio_tagging_loss=0.008721, over 14808.00 frames. ], tot_loss[loss=0.08246, simple_loss=0.1014, pruned_loss=0.02139, audio_tagging_loss=0.01039, over 3042521.57 frames. ], batch size: 55, lr: 6.27e-03, grad_scale: 16.0 2023-11-19 23:28:05,628 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:28:26,314 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127350 2023-11-19 23:28:34,691 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.672e+01 8.042e+01 8.769e+01 9.719e+01 1.215e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-19 23:28:59,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=849140.0, ans=0.0 2023-11-19 23:29:09,327 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7150, loss[loss=0.07779, simple_loss=0.08772, pruned_loss=0.02197, audio_tagging_loss=0.01196, over 16575.00 frames. 
], tot_loss[loss=0.08319, simple_loss=0.1022, pruned_loss=0.02162, audio_tagging_loss=0.01046, over 3048313.97 frames. ], batch size: 63, lr: 6.27e-03, grad_scale: 16.0 2023-11-19 23:29:18,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=849206.6666666666, ans=0.1 2023-11-19 23:29:24,561 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.00 vs. limit=22.5 2023-11-19 23:29:30,734 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127400 2023-11-19 23:29:36,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=849340.0, ans=0.0 2023-11-19 23:29:47,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=849406.6666666666, ans=0.125 2023-11-19 23:30:08,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=849473.3333333334, ans=0.125 2023-11-19 23:30:13,321 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7200, loss[loss=0.09296, simple_loss=0.1169, pruned_loss=0.02603, audio_tagging_loss=0.008474, over 14380.00 frames. ], tot_loss[loss=0.0827, simple_loss=0.1019, pruned_loss=0.02129, audio_tagging_loss=0.01044, over 3051355.50 frames. ], batch size: 53, lr: 6.27e-03, grad_scale: 32.0 2023-11-19 23:30:35,628 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127450 2023-11-19 23:30:45,313 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.591e+01 9.847e+01 1.111e+02 1.455e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-19 23:30:45,603 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:30:47,355 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.68 vs. limit=10.0 2023-11-19 23:31:18,083 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7250, loss[loss=0.07226, simple_loss=0.09121, pruned_loss=0.01664, audio_tagging_loss=0.01001, over 15822.00 frames. ], tot_loss[loss=0.08267, simple_loss=0.1017, pruned_loss=0.02124, audio_tagging_loss=0.0106, over 3052739.41 frames. ], batch size: 59, lr: 6.27e-03, grad_scale: 32.0 2023-11-19 23:31:40,789 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127500 2023-11-19 23:31:43,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=850006.6666666666, ans=0.0 2023-11-19 23:31:58,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=850073.3333333334, ans=0.125 2023-11-19 23:32:12,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=850140.0, ans=0.125 2023-11-19 23:32:23,470 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7300, loss[loss=0.07422, simple_loss=0.09151, pruned_loss=0.01881, audio_tagging_loss=0.009651, over 15589.00 frames. ], tot_loss[loss=0.08317, simple_loss=0.1024, pruned_loss=0.02144, audio_tagging_loss=0.01051, over 3054249.59 frames. 
], batch size: 57, lr: 6.27e-03, grad_scale: 32.0 2023-11-19 23:32:29,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=850206.6666666666, ans=0.1 2023-11-19 23:32:45,019 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127550 2023-11-19 23:32:50,497 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=15.0 2023-11-19 23:32:53,433 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.382e+01 8.309e+01 8.798e+01 9.625e+01 1.232e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-19 23:33:03,432 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.11 vs. limit=15.0 2023-11-19 23:33:05,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=850406.6666666666, ans=0.125 2023-11-19 23:33:20,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=850473.3333333334, ans=0.2 2023-11-19 23:33:27,432 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7350, loss[loss=0.05365, simple_loss=0.06356, pruned_loss=0.01023, audio_tagging_loss=0.01164, over 15145.00 frames. ], tot_loss[loss=0.08323, simple_loss=0.1027, pruned_loss=0.02163, audio_tagging_loss=0.01023, over 3050335.17 frames. ], batch size: 60, lr: 6.27e-03, grad_scale: 32.0 2023-11-19 23:33:31,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=850540.0, ans=0.2 2023-11-19 23:33:35,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=850540.0, ans=0.125 2023-11-19 23:33:43,119 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.02 vs. limit=10.0 2023-11-19 23:33:48,897 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127600 2023-11-19 23:34:02,695 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.94 vs. limit=10.0 2023-11-19 23:34:13,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=850740.0, ans=0.125 2023-11-19 23:34:17,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=850806.6666666666, ans=0.125 2023-11-19 23:34:18,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=850806.6666666666, ans=0.2 2023-11-19 23:34:31,287 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7400, loss[loss=0.08511, simple_loss=0.1, pruned_loss=0.02417, audio_tagging_loss=0.01094, over 14462.00 frames. ], tot_loss[loss=0.08295, simple_loss=0.1025, pruned_loss=0.02145, audio_tagging_loss=0.01028, over 3050240.29 frames. ], batch size: 57, lr: 6.26e-03, grad_scale: 32.0 2023-11-19 23:34:34,330 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.38 vs. 
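Each scaling.py:213 line reports the current value (ans) of a ScheduledFloat: a module hyperparameter (balancer probabilities, skip rates, dropout_p, scale_min, ...) that is interpolated piecewise-linearly in batch_count rather than held fixed. A minimal sketch of such a schedule (illustrative only; the real ScheduledFloat in icefall's scaling.py carries extra machinery such as default values and operator overloads):

    class PiecewiseLinear:
        """Minimal sketch of a batch_count-driven schedule, clamped at the ends."""

        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) pairs

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

    # A balancer prob that relaxes from 0.3 to 0.125 over the first 8000 batches
    # (breakpoints made up for illustration) has long since flattened out here:
    prob = PiecewiseLinear((0.0, 0.3), (8000.0, 0.125))
    print(prob(851540.0))  # 0.125, as in the ans=0.125 entries above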
limit=12.0 2023-11-19 23:34:51,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=850940.0, ans=0.05 2023-11-19 23:34:52,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=850940.0, ans=0.0 2023-11-19 23:34:53,556 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127650 2023-11-19 23:34:53,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=850940.0, ans=0.0 2023-11-19 23:35:02,494 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.769e+01 8.438e+01 8.975e+01 9.970e+01 1.315e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-19 23:35:19,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=851073.3333333334, ans=0.1 2023-11-19 23:35:35,671 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7450, loss[loss=0.08883, simple_loss=0.1049, pruned_loss=0.02762, audio_tagging_loss=0.008775, over 16334.00 frames. ], tot_loss[loss=0.0835, simple_loss=0.1032, pruned_loss=0.02173, audio_tagging_loss=0.01017, over 3046750.85 frames. ], batch size: 62, lr: 6.26e-03, grad_scale: 32.0 2023-11-19 23:35:56,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=851273.3333333334, ans=0.125 2023-11-19 23:35:57,641 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127700 2023-11-19 23:36:03,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=851340.0, ans=0.125 2023-11-19 23:36:09,678 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0 2023-11-19 23:36:31,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=851473.3333333334, ans=0.0 2023-11-19 23:36:37,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=851473.3333333334, ans=0.0 2023-11-19 23:36:40,540 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7500, loss[loss=0.104, simple_loss=0.1356, pruned_loss=0.02699, audio_tagging_loss=0.009155, over 15201.00 frames. ], tot_loss[loss=0.08344, simple_loss=0.103, pruned_loss=0.02172, audio_tagging_loss=0.0102, over 3047418.16 frames. 
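The optim.py:476 lines summarise the optimizer's gradient-norm statistics: the five numbers read as the 0/25/50/75/100th percentiles of recently observed norms, and the clipping threshold is Clipping_scale times the median (2.0 x 8.975e+01 = 1.795e+02 in the entry above), so percent-clipped stays at 0.0 while the norm distribution is tight. A simplified sketch of that scheme (the bookkeeping in icefall's optim.py is more involved):

    import torch

    def median_based_clip_threshold(recent_norms: torch.Tensor,
                                    clipping_scale: float = 2.0):
        """Sketch: derive the clip threshold from the median of recent grad norms."""
        q = torch.quantile(recent_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]          # 2.0 x median
        percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
        return q, threshold, percent_clipped

    # With the quartiles logged above, nothing exceeds 2 x median:
    q, thr, pc = median_based_clip_threshold(
        torch.tensor([67.69, 84.38, 89.75, 99.70, 131.5]))
    # thr == 179.5, pc == 0.0, matching "threshold=1.795e+02, percent-clipped=0.0"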
], batch size: 58, lr: 6.26e-03, grad_scale: 32.0 2023-11-19 23:36:51,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=851540.0, ans=0.125 2023-11-19 23:37:02,029 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127750 2023-11-19 23:37:12,199 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 8.268e+01 8.982e+01 9.702e+01 1.380e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-19 23:37:32,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=851806.6666666666, ans=0.125 2023-11-19 23:37:43,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=851873.3333333334, ans=0.125 2023-11-19 23:37:44,455 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7550, loss[loss=0.08266, simple_loss=0.1017, pruned_loss=0.02121, audio_tagging_loss=0.0106, over 14984.00 frames. ], tot_loss[loss=0.08282, simple_loss=0.1022, pruned_loss=0.02153, audio_tagging_loss=0.01021, over 3046243.76 frames. ], batch size: 57, lr: 6.26e-03, grad_scale: 16.0 2023-11-19 23:37:53,255 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:38:06,329 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127800 2023-11-19 23:38:15,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=852006.6666666666, ans=0.125 2023-11-19 23:38:17,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=852006.6666666666, ans=0.125 2023-11-19 23:38:33,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=852073.3333333334, ans=0.05 2023-11-19 23:38:48,613 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7600, loss[loss=0.07643, simple_loss=0.09843, pruned_loss=0.01892, audio_tagging_loss=0.008294, over 14123.00 frames. ], tot_loss[loss=0.08281, simple_loss=0.1022, pruned_loss=0.02152, audio_tagging_loss=0.01017, over 3047449.01 frames. 
], batch size: 54, lr: 6.26e-03, grad_scale: 32.0 2023-11-19 23:38:48,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=852206.6666666666, ans=0.125 2023-11-19 23:38:55,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=852206.6666666666, ans=0.125 2023-11-19 23:39:01,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=852273.3333333334, ans=0.1 2023-11-19 23:39:10,698 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127850 2023-11-19 23:39:19,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=852340.0, ans=0.2 2023-11-19 23:39:20,832 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.143e+01 8.303e+01 8.868e+01 9.604e+01 1.243e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-19 23:39:33,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=852406.6666666666, ans=0.125 2023-11-19 23:39:38,017 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.57 vs. limit=15.0 2023-11-19 23:39:44,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=852473.3333333334, ans=0.125 2023-11-19 23:39:52,489 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7650, loss[loss=0.1077, simple_loss=0.146, pruned_loss=0.02767, audio_tagging_loss=0.007027, over 14928.00 frames. ], tot_loss[loss=0.08239, simple_loss=0.1018, pruned_loss=0.02129, audio_tagging_loss=0.0102, over 3051839.12 frames. ], batch size: 52, lr: 6.26e-03, grad_scale: 32.0 2023-11-19 23:39:52,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=852540.0, ans=0.1 2023-11-19 23:40:11,160 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:40:12,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=852606.6666666666, ans=0.0 2023-11-19 23:40:14,496 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127900 2023-11-19 23:40:18,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=852673.3333333334, ans=0.0 2023-11-19 23:40:57,032 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7700, loss[loss=0.06906, simple_loss=0.08123, pruned_loss=0.01598, audio_tagging_loss=0.01247, over 14845.00 frames. ], tot_loss[loss=0.08227, simple_loss=0.1016, pruned_loss=0.0212, audio_tagging_loss=0.01029, over 3040626.12 frames. ], batch size: 56, lr: 6.26e-03, grad_scale: 16.0 2023-11-19 23:40:58,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=852873.3333333334, ans=0.125 2023-11-19 23:41:10,004 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. 
limit=15.0 2023-11-19 23:41:19,333 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 127950 2023-11-19 23:41:30,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=853006.6666666666, ans=0.2 2023-11-19 23:41:31,370 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.499e+01 8.407e+01 9.041e+01 9.727e+01 1.362e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 23:41:45,727 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.38 vs. limit=15.0 2023-11-19 23:41:48,832 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:42:01,279 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7750, loss[loss=0.1038, simple_loss=0.1291, pruned_loss=0.02938, audio_tagging_loss=0.00993, over 15331.00 frames. ], tot_loss[loss=0.08231, simple_loss=0.1016, pruned_loss=0.02119, audio_tagging_loss=0.01033, over 3036715.77 frames. ], batch size: 56, lr: 6.26e-03, grad_scale: 8.0 2023-11-19 23:42:22,900 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128000 2023-11-19 23:42:31,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=853340.0, ans=0.125 2023-11-19 23:42:37,076 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.41 vs. limit=22.5 2023-11-19 23:42:40,590 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0 2023-11-19 23:42:47,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=853406.6666666666, ans=0.1 2023-11-19 23:42:52,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=853406.6666666666, ans=0.125 2023-11-19 23:43:04,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=853473.3333333334, ans=0.0 2023-11-19 23:43:09,399 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7800, loss[loss=0.07332, simple_loss=0.08974, pruned_loss=0.01842, audio_tagging_loss=0.01003, over 14848.00 frames. ], tot_loss[loss=0.08206, simple_loss=0.1012, pruned_loss=0.02108, audio_tagging_loss=0.01039, over 3039965.04 frames. ], batch size: 55, lr: 6.25e-03, grad_scale: 8.0 2023-11-19 23:43:09,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=853540.0, ans=0.2 2023-11-19 23:43:15,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=853540.0, ans=0.1 2023-11-19 23:43:21,804 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.56 vs. 
limit=22.5 2023-11-19 23:43:31,556 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128050 2023-11-19 23:43:35,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=853673.3333333334, ans=0.125 2023-11-19 23:43:44,190 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.217e+01 8.897e+01 9.655e+01 1.501e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-19 23:44:14,104 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7850, loss[loss=0.07599, simple_loss=0.0962, pruned_loss=0.01794, audio_tagging_loss=0.009948, over 14098.00 frames. ], tot_loss[loss=0.08222, simple_loss=0.101, pruned_loss=0.02124, audio_tagging_loss=0.01046, over 3039530.04 frames. ], batch size: 53, lr: 6.25e-03, grad_scale: 8.0 2023-11-19 23:44:14,727 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.05 vs. limit=15.0 2023-11-19 23:44:16,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=853873.3333333334, ans=0.2 2023-11-19 23:44:19,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=853873.3333333334, ans=0.125 2023-11-19 23:44:29,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=853940.0, ans=0.125 2023-11-19 23:44:33,571 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=12.0 2023-11-19 23:44:35,542 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128100 2023-11-19 23:44:46,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=854006.6666666666, ans=0.0 2023-11-19 23:44:47,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=854006.6666666666, ans=0.1 2023-11-19 23:44:52,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=854073.3333333334, ans=0.125 2023-11-19 23:44:55,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=854073.3333333334, ans=0.2 2023-11-19 23:44:57,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=854073.3333333334, ans=0.2 2023-11-19 23:45:06,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=854140.0, ans=0.125 2023-11-19 23:45:08,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=854140.0, ans=0.125 2023-11-19 23:45:17,526 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7900, loss[loss=0.07275, simple_loss=0.09033, pruned_loss=0.01607, audio_tagging_loss=0.01151, over 14560.00 frames. ], tot_loss[loss=0.08263, simple_loss=0.1016, pruned_loss=0.02137, audio_tagging_loss=0.01046, over 3038066.33 frames. 
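The grad_scale field is the dynamic fp16 loss-scaling factor, which is why it wanders between 8.0, 16.0 and 32.0 across this stretch: the scaler halves the scale when an overflow is detected and grows it back after a long enough run of finite gradients. The structure of such a step, using PyTorch's stock GradScaler (a sketch; the loop in train_asr.py additionally handles the LR scheduler, clipping and logging):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

    def fp16_step(model, optimizer, batch, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():            # fp16 forward pass
            loss = criterion(model(batch["inputs"]), batch["targets"])
        scaler.scale(loss).backward()              # scale up to keep fp16 grads finite
        scaler.step(optimizer)                     # silently skipped on overflow
        scaler.update()                            # halve on overflow, grow otherwise
        return loss.detach(), scaler.get_scale()   # get_scale() == logged grad_scale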
], batch size: 55, lr: 6.25e-03, grad_scale: 8.0 2023-11-19 23:45:20,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=854206.6666666666, ans=0.125 2023-11-19 23:45:34,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=854273.3333333334, ans=0.125 2023-11-19 23:45:39,333 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128150 2023-11-19 23:45:52,892 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.807e+01 8.209e+01 8.987e+01 9.607e+01 1.593e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-19 23:46:00,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=854406.6666666666, ans=0.125 2023-11-19 23:46:22,312 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 7950, loss[loss=0.08692, simple_loss=0.1003, pruned_loss=0.02648, audio_tagging_loss=0.0103, over 15339.00 frames. ], tot_loss[loss=0.08244, simple_loss=0.1009, pruned_loss=0.02136, audio_tagging_loss=0.01061, over 3037137.52 frames. ], batch size: 59, lr: 6.25e-03, grad_scale: 8.0 2023-11-19 23:46:38,897 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 23:46:44,311 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128200 2023-11-19 23:46:59,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=854673.3333333334, ans=0.125 2023-11-19 23:47:04,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=854740.0, ans=0.2 2023-11-19 23:47:24,679 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.67 vs. limit=22.5 2023-11-19 23:47:26,485 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8000, loss[loss=0.08852, simple_loss=0.1028, pruned_loss=0.02693, audio_tagging_loss=0.0102, over 15056.00 frames. ], tot_loss[loss=0.08251, simple_loss=0.101, pruned_loss=0.02138, audio_tagging_loss=0.01061, over 3035399.79 frames. 
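The WARNING above spells out the exclusion rule: after the factor-4 frame subsampling, this 100-frame AudioSet cut is only 23 frames long, which is fewer than the 24 BPE tokens of its placeholder transcript, and a transducer cannot emit more symbols than it has frames. A sketch of that filter (the exact subsampling formula is an assumption, chosen to be consistent with the logged 100 -> 23):

    def frames_after_subsampling(num_frames: int) -> int:
        # Two stride-2 stages with edge loss; consistent with 100 -> 23.
        return ((num_frames - 7) // 2) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        """Drop cuts whose subsampled length cannot cover the token sequence."""
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)   # the dummy-text AudioSet cuts get excluded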
], batch size: 56, lr: 6.25e-03, grad_scale: 16.0 2023-11-19 23:47:32,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=854873.3333333334, ans=0.1 2023-11-19 23:47:49,105 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128250 2023-11-19 23:48:01,614 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.413e+01 8.250e+01 9.015e+01 9.647e+01 1.325e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-19 23:48:01,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=855006.6666666666, ans=0.0 2023-11-19 23:48:04,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=855073.3333333334, ans=0.0 2023-11-19 23:48:31,471 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8050, loss[loss=0.08421, simple_loss=0.0964, pruned_loss=0.02482, audio_tagging_loss=0.01119, over 14321.00 frames. ], tot_loss[loss=0.08221, simple_loss=0.1006, pruned_loss=0.02128, audio_tagging_loss=0.01065, over 3038265.02 frames. ], batch size: 56, lr: 6.25e-03, grad_scale: 16.0 2023-11-19 23:48:39,997 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.19 vs. limit=15.0 2023-11-19 23:48:43,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=855273.3333333334, ans=0.125 2023-11-19 23:48:53,381 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128300 2023-11-19 23:48:53,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=855273.3333333334, ans=0.09899494936611666 2023-11-19 23:48:53,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=855273.3333333334, ans=0.125 2023-11-19 23:48:58,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=855340.0, ans=0.125 2023-11-19 23:49:12,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=855406.6666666666, ans=0.125 2023-11-19 23:49:22,873 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.99 vs. limit=15.0 2023-11-19 23:49:30,838 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=22.5 2023-11-19 23:49:35,487 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8100, loss[loss=0.08133, simple_loss=0.1008, pruned_loss=0.02081, audio_tagging_loss=0.01012, over 15090.00 frames. ], tot_loss[loss=0.08277, simple_loss=0.1015, pruned_loss=0.02153, audio_tagging_loss=0.01048, over 3038837.35 frames. 
], batch size: 56, lr: 6.25e-03, grad_scale: 16.0 2023-11-19 23:49:56,778 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128350 2023-11-19 23:50:06,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=855673.3333333334, ans=0.0 2023-11-19 23:50:09,351 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.585e+01 8.405e+01 9.063e+01 9.959e+01 1.355e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-19 23:50:15,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=855740.0, ans=0.0 2023-11-19 23:50:22,126 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.82 vs. limit=15.0 2023-11-19 23:50:29,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=855806.6666666666, ans=0.125 2023-11-19 23:50:37,064 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:50:38,131 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8150, loss[loss=0.09487, simple_loss=0.1086, pruned_loss=0.02915, audio_tagging_loss=0.01143, over 15247.00 frames. ], tot_loss[loss=0.08266, simple_loss=0.1017, pruned_loss=0.02146, audio_tagging_loss=0.01036, over 3042784.95 frames. ], batch size: 56, lr: 6.25e-03, grad_scale: 16.0 2023-11-19 23:50:52,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=855940.0, ans=0.125 2023-11-19 23:50:59,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=855940.0, ans=0.125 2023-11-19 23:51:00,911 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128400 2023-11-19 23:51:01,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=855940.0, ans=0.0 2023-11-19 23:51:42,596 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8200, loss[loss=0.09271, simple_loss=0.1181, pruned_loss=0.02662, audio_tagging_loss=0.007027, over 15095.00 frames. ], tot_loss[loss=0.08367, simple_loss=0.1034, pruned_loss=0.02178, audio_tagging_loss=0.01021, over 3034275.92 frames. ], batch size: 56, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:51:45,070 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 23:51:51,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=856206.6666666666, ans=0.125 2023-11-19 23:52:00,808 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. 
limit=6.0 2023-11-19 23:52:05,172 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128450 2023-11-19 23:52:17,391 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 8.248e+01 8.899e+01 9.586e+01 1.451e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-19 23:52:48,572 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8250, loss[loss=0.06859, simple_loss=0.08019, pruned_loss=0.01603, audio_tagging_loss=0.01247, over 14865.00 frames. ], tot_loss[loss=0.08252, simple_loss=0.1018, pruned_loss=0.0214, audio_tagging_loss=0.01022, over 3030972.71 frames. ], batch size: 59, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:52:51,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=856540.0, ans=0.0 2023-11-19 23:53:09,866 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128500 2023-11-19 23:53:24,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=856740.0, ans=0.125 2023-11-19 23:53:35,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=856740.0, ans=0.09899494936611666 2023-11-19 23:53:51,360 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8300, loss[loss=0.0846, simple_loss=0.1005, pruned_loss=0.02301, audio_tagging_loss=0.01137, over 15645.00 frames. ], tot_loss[loss=0.08276, simple_loss=0.102, pruned_loss=0.02149, audio_tagging_loss=0.01028, over 3037939.04 frames. ], batch size: 58, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:53:59,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=856873.3333333334, ans=0.2 2023-11-19 23:54:12,761 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128550 2023-11-19 23:54:27,929 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.671e+01 8.283e+01 8.806e+01 9.666e+01 1.225e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-19 23:54:28,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=857006.6666666666, ans=0.1 2023-11-19 23:54:36,198 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.14 vs. limit=22.5 2023-11-19 23:54:38,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=857073.3333333334, ans=0.2 2023-11-19 23:54:39,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=857073.3333333334, ans=0.125 2023-11-19 23:54:39,483 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0 2023-11-19 23:54:47,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=857140.0, ans=0.125 2023-11-19 23:54:55,433 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8350, loss[loss=0.07824, simple_loss=0.09003, pruned_loss=0.0207, audio_tagging_loss=0.01253, over 14949.00 frames. ], tot_loss[loss=0.08265, simple_loss=0.1019, pruned_loss=0.02131, audio_tagging_loss=0.01037, over 3038203.60 frames. 
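tot_loss is not the current batch's loss but a decayed, frame-weighted running average, which is why every entry also reports the effective frame count: with a decay of 1 - 1/200 and batches of roughly 15e3 frames, the accumulated count settles near the 3.0e6 figures seen here. A tracker in that spirit (the decay horizon is an assumption; icefall aggregates these statistics in a MetricsTracker):

    class RunningLoss:
        """Sketch: exponentially decayed, frame-weighted loss statistic."""

        def __init__(self, reset_interval: int = 200):   # assumed decay horizon
            self.decay = 1.0 - 1.0 / reset_interval
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss_sum: float, batch_frames: float) -> None:
            self.loss_sum = self.decay * self.loss_sum + batch_loss_sum
            self.frames = self.decay * self.frames + batch_frames
            # steady state: frames -> batch_frames * reset_interval (~15e3 * 200 = 3e6)

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)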
], batch size: 57, lr: 6.24e-03, grad_scale: 8.0 2023-11-19 23:54:55,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=857206.6666666666, ans=0.125 2023-11-19 23:54:55,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=857206.6666666666, ans=0.125 2023-11-19 23:54:56,141 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0 2023-11-19 23:55:18,365 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128600 2023-11-19 23:55:21,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=857340.0, ans=0.1 2023-11-19 23:55:33,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=857406.6666666666, ans=0.0 2023-11-19 23:55:37,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=857406.6666666666, ans=0.2 2023-11-19 23:55:47,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=857473.3333333334, ans=0.1 2023-11-19 23:55:51,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=857473.3333333334, ans=0.0 2023-11-19 23:55:55,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=857473.3333333334, ans=0.025 2023-11-19 23:56:00,796 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8400, loss[loss=0.08854, simple_loss=0.1145, pruned_loss=0.02168, audio_tagging_loss=0.009607, over 14826.00 frames. ], tot_loss[loss=0.08331, simple_loss=0.1031, pruned_loss=0.02153, audio_tagging_loss=0.01023, over 3042206.61 frames. ], batch size: 56, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:56:09,480 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.61 vs. limit=15.0 2023-11-19 23:56:10,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=857540.0, ans=0.125 2023-11-19 23:56:16,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=857606.6666666666, ans=0.0 2023-11-19 23:56:22,361 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128650 2023-11-19 23:56:36,144 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.566e+01 8.256e+01 8.921e+01 9.764e+01 1.880e+02, threshold=1.784e+02, percent-clipped=1.0 2023-11-19 23:56:38,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=857740.0, ans=0.125 2023-11-19 23:56:41,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=857740.0, ans=0.2 2023-11-19 23:57:04,613 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8450, loss[loss=0.08274, simple_loss=0.09052, pruned_loss=0.02384, audio_tagging_loss=0.01363, over 16007.00 frames. ], tot_loss[loss=0.08319, simple_loss=0.1027, pruned_loss=0.02157, audio_tagging_loss=0.01029, over 3046949.35 frames. 
], batch size: 60, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:57:26,157 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128700 2023-11-19 23:57:35,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=858006.6666666666, ans=0.2 2023-11-19 23:58:08,031 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8500, loss[loss=0.08657, simple_loss=0.1127, pruned_loss=0.02007, audio_tagging_loss=0.01015, over 14818.00 frames. ], tot_loss[loss=0.08373, simple_loss=0.1033, pruned_loss=0.0218, audio_tagging_loss=0.01028, over 3047959.73 frames. ], batch size: 55, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:58:09,495 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:58:12,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=858206.6666666666, ans=0.0 2023-11-19 23:58:22,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=858273.3333333334, ans=0.1 2023-11-19 23:58:24,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=858273.3333333334, ans=0.125 2023-11-19 23:58:30,645 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128750 2023-11-19 23:58:36,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=858340.0, ans=0.1 2023-11-19 23:58:43,811 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.768e+01 8.248e+01 9.044e+01 1.008e+02 1.243e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-19 23:58:45,316 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:58:47,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=858406.6666666666, ans=0.125 2023-11-19 23:59:05,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=858473.3333333334, ans=0.125 2023-11-19 23:59:12,499 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8550, loss[loss=0.07598, simple_loss=0.09554, pruned_loss=0.01692, audio_tagging_loss=0.01129, over 14835.00 frames. ], tot_loss[loss=0.08315, simple_loss=0.1027, pruned_loss=0.02154, audio_tagging_loss=0.01028, over 3044833.25 frames. 
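The scaling.py:1022 Whitening lines are diagnostics from modules that keep intermediate features close to "white": the metric measures how far the channel covariance is from a multiple of the identity (exactly 1.0 for perfectly white features), and the module only intervenes once the metric exceeds its limit, which is why most entries sit below it. One plausible formulation of that metric (a sketch; the precise definition in scaling.py may differ):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        """Sketch: d * tr(C^2) / tr(C)^2 per group, where C is the channel
        covariance; always >= 1.0, with equality iff C is proportional to I."""
        x = x.reshape(-1, x.shape[-1])                      # (frames, channels)
        n, dim = x.shape
        x = x.reshape(n, num_groups, dim // num_groups).transpose(0, 1)
        cov = x.transpose(1, 2) @ x / n                     # (groups, d, d)
        d = cov.shape[-1]
        trace = cov.diagonal(dim1=-2, dim2=-1).sum(-1)
        frob2 = (cov ** 2).sum(dim=(-2, -1))                # == tr(C^2), C symmetric
        return (d * frob2 / trace ** 2).mean()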
], batch size: 57, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:59:16,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=858540.0, ans=0.025 2023-11-19 23:59:18,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=858540.0, ans=0.1 2023-11-19 23:59:22,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=858540.0, ans=0.2 2023-11-19 23:59:23,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=858540.0, ans=0.2 2023-11-19 23:59:34,428 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128800 2023-11-19 23:59:37,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=858673.3333333334, ans=0.2 2023-11-19 23:59:42,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=858673.3333333334, ans=0.0 2023-11-19 23:59:47,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=858673.3333333334, ans=0.2 2023-11-19 23:59:58,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=858740.0, ans=0.0 2023-11-20 00:00:06,147 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-20 00:00:17,212 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8600, loss[loss=0.08654, simple_loss=0.1069, pruned_loss=0.02217, audio_tagging_loss=0.0109, over 14170.00 frames. ], tot_loss[loss=0.08249, simple_loss=0.1016, pruned_loss=0.02117, audio_tagging_loss=0.0105, over 3040859.92 frames. ], batch size: 53, lr: 6.24e-03, grad_scale: 16.0 2023-11-20 00:00:18,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=858873.3333333334, ans=0.05 2023-11-20 00:00:21,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=858873.3333333334, ans=0.1 2023-11-20 00:00:22,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=858873.3333333334, ans=0.2 2023-11-20 00:00:26,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=858873.3333333334, ans=0.2 2023-11-20 00:00:35,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=858940.0, ans=0.125 2023-11-20 00:00:38,594 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128850 2023-11-20 00:00:52,609 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.469e+01 8.239e+01 8.842e+01 9.457e+01 1.153e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-20 00:01:05,640 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.51 vs. limit=15.0 2023-11-20 00:01:08,220 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.14 vs. 
limit=10.0 2023-11-20 00:01:17,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=859140.0, ans=0.125 2023-11-20 00:01:21,506 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8650, loss[loss=0.09986, simple_loss=0.1154, pruned_loss=0.03172, audio_tagging_loss=0.01042, over 13805.00 frames. ], tot_loss[loss=0.08321, simple_loss=0.1029, pruned_loss=0.02132, audio_tagging_loss=0.01041, over 3046881.59 frames. ], batch size: 53, lr: 6.23e-03, grad_scale: 16.0 2023-11-20 00:01:43,252 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128900 2023-11-20 00:01:51,072 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.54 vs. limit=15.0 2023-11-20 00:01:54,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=859340.0, ans=0.125 2023-11-20 00:01:56,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=859340.0, ans=0.125 2023-11-20 00:02:00,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=859406.6666666666, ans=0.125 2023-11-20 00:02:03,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=859406.6666666666, ans=0.125 2023-11-20 00:02:24,857 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8700, loss[loss=0.09307, simple_loss=0.115, pruned_loss=0.02615, audio_tagging_loss=0.009412, over 14534.00 frames. ], tot_loss[loss=0.08262, simple_loss=0.102, pruned_loss=0.02115, audio_tagging_loss=0.0105, over 3038245.17 frames. ], batch size: 56, lr: 6.23e-03, grad_scale: 16.0 2023-11-20 00:02:25,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=859540.0, ans=0.125 2023-11-20 00:02:25,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=859540.0, ans=0.125 2023-11-20 00:02:47,782 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 128950 2023-11-20 00:02:58,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=859673.3333333334, ans=0.0 2023-11-20 00:03:01,587 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.745e+01 8.243e+01 8.970e+01 9.872e+01 1.298e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-20 00:03:12,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=859740.0, ans=0.0 2023-11-20 00:03:29,209 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8750, loss[loss=0.09503, simple_loss=0.1168, pruned_loss=0.02746, audio_tagging_loss=0.009187, over 15960.00 frames. ], tot_loss[loss=0.08355, simple_loss=0.1031, pruned_loss=0.02156, audio_tagging_loss=0.01043, over 3041787.67 frames. 
], batch size: 58, lr: 6.23e-03, grad_scale: 16.0 2023-11-20 00:03:33,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=859873.3333333334, ans=0.1 2023-11-20 00:03:39,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=859873.3333333334, ans=0.2 2023-11-20 00:03:51,134 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129000 2023-11-20 00:03:56,921 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.28 vs. limit=15.0 2023-11-20 00:04:01,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=860006.6666666666, ans=0.125 2023-11-20 00:04:01,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=860006.6666666666, ans=0.0 2023-11-20 00:04:11,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=860073.3333333334, ans=0.05 2023-11-20 00:04:11,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=860073.3333333334, ans=0.125 2023-11-20 00:04:33,953 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8800, loss[loss=0.09839, simple_loss=0.1158, pruned_loss=0.02689, audio_tagging_loss=0.01358, over 15196.00 frames. ], tot_loss[loss=0.08393, simple_loss=0.1034, pruned_loss=0.02163, audio_tagging_loss=0.01059, over 3044385.45 frames. ], batch size: 55, lr: 6.23e-03, grad_scale: 32.0 2023-11-20 00:04:45,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=860273.3333333334, ans=0.0 2023-11-20 00:04:55,345 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129050 2023-11-20 00:05:09,092 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.191e+01 8.422e+01 9.194e+01 1.008e+02 1.237e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-20 00:05:10,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=860406.6666666666, ans=0.125 2023-11-20 00:05:10,924 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.47 vs. limit=15.0 2023-11-20 00:05:18,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=860406.6666666666, ans=0.04949747468305833 2023-11-20 00:05:29,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=860473.3333333334, ans=0.125 2023-11-20 00:05:36,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=860540.0, ans=0.125 2023-11-20 00:05:37,379 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8850, loss[loss=0.09832, simple_loss=0.1214, pruned_loss=0.02746, audio_tagging_loss=0.01016, over 15717.00 frames. ], tot_loss[loss=0.08463, simple_loss=0.1046, pruned_loss=0.02192, audio_tagging_loss=0.01042, over 3052521.35 frames. ], batch size: 57, lr: 6.23e-03, grad_scale: 32.0 2023-11-20 00:05:52,301 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:05:59,808 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129100 2023-11-20 00:06:08,697 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.84 vs. limit=15.0 2023-11-20 00:06:19,120 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.62 vs. limit=22.5 2023-11-20 00:06:23,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=860740.0, ans=0.0 2023-11-20 00:06:42,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=860873.3333333334, ans=0.0 2023-11-20 00:06:43,078 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8900, loss[loss=0.06275, simple_loss=0.08607, pruned_loss=0.01283, audio_tagging_loss=0.006886, over 15374.00 frames. ], tot_loss[loss=0.08453, simple_loss=0.1046, pruned_loss=0.02201, audio_tagging_loss=0.01021, over 3053776.38 frames. ], batch size: 61, lr: 6.23e-03, grad_scale: 32.0 2023-11-20 00:07:02,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=860940.0, ans=0.125 2023-11-20 00:07:05,296 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129150 2023-11-20 00:07:18,686 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.249e+01 8.942e+01 1.024e+02 1.298e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-20 00:07:47,615 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 8950, loss[loss=0.08119, simple_loss=0.09911, pruned_loss=0.02089, audio_tagging_loss=0.01076, over 14557.00 frames. ], tot_loss[loss=0.08543, simple_loss=0.106, pruned_loss=0.0224, audio_tagging_loss=0.01005, over 3049369.77 frames. ], batch size: 54, lr: 6.23e-03, grad_scale: 32.0 2023-11-20 00:07:47,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=861206.6666666666, ans=0.0 2023-11-20 00:08:09,156 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129200 2023-11-20 00:08:17,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=861340.0, ans=0.09899494936611666 2023-11-20 00:08:28,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=861406.6666666666, ans=0.1 2023-11-20 00:08:36,249 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:08:52,268 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9000, loss[loss=0.09298, simple_loss=0.117, pruned_loss=0.02553, audio_tagging_loss=0.008929, over 15203.00 frames. ], tot_loss[loss=0.08517, simple_loss=0.1057, pruned_loss=0.02232, audio_tagging_loss=0.009998, over 3045785.97 frames. 
], batch size: 57, lr: 6.23e-03, grad_scale: 32.0 2023-11-20 00:08:52,269 INFO [train_asr.py:1285] (2/4) Computing validation loss 2023-11-20 00:09:31,839 INFO [train_asr.py:1294] (2/4) Epoch 11, validation: loss=0.06425, simple_loss=0.05461, pruned_loss=0.006061, audio_tagging_loss=0.03088, over 4681554.00 frames. 2023-11-20 00:09:31,840 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB 2023-11-20 00:09:35,092 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.49 vs. limit=15.0 2023-11-20 00:09:53,909 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129250 2023-11-20 00:10:03,240 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.76 vs. limit=12.0 2023-11-20 00:10:08,222 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.624e+01 8.204e+01 8.877e+01 9.469e+01 1.301e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 00:10:11,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=861740.0, ans=0.1 2023-11-20 00:10:27,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=861806.6666666666, ans=0.0 2023-11-20 00:10:27,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=861806.6666666666, ans=0.1 2023-11-20 00:10:35,644 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9050, loss[loss=0.0959, simple_loss=0.1215, pruned_loss=0.02408, audio_tagging_loss=0.01108, over 15858.00 frames. ], tot_loss[loss=0.08464, simple_loss=0.1053, pruned_loss=0.02203, audio_tagging_loss=0.009952, over 3056052.75 frames. ], batch size: 60, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:10:35,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=861873.3333333334, ans=0.0 2023-11-20 00:10:39,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=861873.3333333334, ans=0.125 2023-11-20 00:10:41,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=861873.3333333334, ans=0.2 2023-11-20 00:10:57,892 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129300 2023-11-20 00:11:29,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=862140.0, ans=0.0 2023-11-20 00:11:37,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=862140.0, ans=0.125 2023-11-20 00:11:39,795 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9100, loss[loss=0.0782, simple_loss=0.09561, pruned_loss=0.02005, audio_tagging_loss=0.01035, over 14516.00 frames. ], tot_loss[loss=0.08413, simple_loss=0.1047, pruned_loss=0.02184, audio_tagging_loss=0.009953, over 3050446.25 frames. 
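The three train_asr.py lines above (1285/1294/1295) show the periodic validation hook: training pauses at a fixed batch interval, the model is switched to eval mode, the loss is averaged over the full dev set (hence "over 4681554.00 frames" rather than one batch), and the peak CUDA memory is reported. A sketch of that hook (compute_validation_loss and the 3000-batch interval are assumptions; names are illustrative):

    import logging
    import torch

    def maybe_validate(model, valid_dl, batch_idx: int,
                       valid_interval: int = 3000) -> None:
        """Sketch: periodic full-dev-set validation plus the memory report."""
        if batch_idx == 0 or batch_idx % valid_interval != 0:
            return
        model.eval()
        with torch.no_grad():
            valid_info = compute_validation_loss(model, valid_dl)  # hypothetical helper
        model.train()
        logging.info(f"validation: {valid_info}")
        mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
        logging.info(f"Maximum memory allocated so far is {mb}MB")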
], batch size: 54, lr: 6.22e-03, grad_scale: 16.0 2023-11-20 00:11:52,428 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:11:57,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=862273.3333333334, ans=0.2 2023-11-20 00:12:02,105 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129350 2023-11-20 00:12:06,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=862340.0, ans=0.1 2023-11-20 00:12:12,515 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.84 vs. limit=15.0 2023-11-20 00:12:17,840 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.243e+01 9.006e+01 9.571e+01 1.391e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-20 00:12:27,547 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.23 vs. limit=5.0 2023-11-20 00:12:44,992 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9150, loss[loss=0.07424, simple_loss=0.08537, pruned_loss=0.01869, audio_tagging_loss=0.01287, over 13748.00 frames. ], tot_loss[loss=0.08391, simple_loss=0.1041, pruned_loss=0.02181, audio_tagging_loss=0.01003, over 3039903.72 frames. ], batch size: 54, lr: 6.22e-03, grad_scale: 16.0 2023-11-20 00:12:50,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=862540.0, ans=0.125 2023-11-20 00:13:06,329 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129400 2023-11-20 00:13:13,734 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:13:28,380 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.08 vs. limit=22.5 2023-11-20 00:13:33,926 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.51 vs. limit=22.5 2023-11-20 00:13:42,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=862806.6666666666, ans=0.1 2023-11-20 00:13:49,204 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9200, loss[loss=0.07803, simple_loss=0.0876, pruned_loss=0.02093, audio_tagging_loss=0.0133, over 16141.00 frames. ], tot_loss[loss=0.08339, simple_loss=0.1032, pruned_loss=0.02162, audio_tagging_loss=0.01015, over 3038020.23 frames. 
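The scaling.py:1118 WithLoss lines track auxiliary penalties that are attached directly to intermediate tensors (here the self-attention weights) rather than added to the printed loss; loss-sum=0.000e+00 simply means the attached penalty is currently zero. The attach-a-loss trick can be sketched as an autograd function whose backward pretends the auxiliary term's sum were part of the objective (illustrative; the real WithLoss in scaling.py may differ in detail):

    import torch

    class WithLoss(torch.autograd.Function):
        """Sketch: return x unchanged, but backpropagate as if aux.sum()
        had been added to the training objective."""

        @staticmethod
        def forward(ctx, x, aux):
            ctx.aux_shape = aux.shape
            return x

        @staticmethod
        def backward(ctx, x_grad):
            # x passes its gradient through untouched; d(loss)/d(aux) == 1
            # everywhere, i.e. aux.sum() is implicitly part of the loss.
            ones = torch.ones(ctx.aux_shape, dtype=x_grad.dtype,
                              device=x_grad.device)
            return x_grad, ones

    # usage sketch: attn_weights = WithLoss.apply(attn_weights, penalty)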
], batch size: 62, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:14:09,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=862940.0, ans=0.125 2023-11-20 00:14:09,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=862940.0, ans=0.125 2023-11-20 00:14:11,576 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129450 2023-11-20 00:14:13,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=862940.0, ans=0.125 2023-11-20 00:14:15,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=863006.6666666666, ans=0.07 2023-11-20 00:14:17,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=863006.6666666666, ans=0.0 2023-11-20 00:14:18,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=863006.6666666666, ans=0.125 2023-11-20 00:14:22,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=863006.6666666666, ans=0.1 2023-11-20 00:14:27,350 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 8.355e+01 9.081e+01 9.853e+01 1.317e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-20 00:14:44,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=863140.0, ans=0.0 2023-11-20 00:14:50,045 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.16 vs. limit=22.5 2023-11-20 00:14:54,746 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9250, loss[loss=0.07662, simple_loss=0.08917, pruned_loss=0.01909, audio_tagging_loss=0.01295, over 16800.00 frames. ], tot_loss[loss=0.08335, simple_loss=0.1032, pruned_loss=0.02165, audio_tagging_loss=0.0101, over 3049322.59 frames. ], batch size: 65, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:14:56,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=863206.6666666666, ans=0.2 2023-11-20 00:14:58,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=863206.6666666666, ans=0.0 2023-11-20 00:15:14,922 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.28 vs. limit=15.0 2023-11-20 00:15:16,856 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129500 2023-11-20 00:15:28,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863340.0, ans=0.1 2023-11-20 00:15:32,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=863406.6666666666, ans=0.2 2023-11-20 00:15:36,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=863406.6666666666, ans=0.125 2023-11-20 00:15:41,936 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. 
2023-11-20 00:15:54,837 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.02 vs. limit=22.5
2023-11-20 00:15:57,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=863473.3333333334, ans=0.2
2023-11-20 00:15:59,612 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9300, loss[loss=0.09174, simple_loss=0.112, pruned_loss=0.02576, audio_tagging_loss=0.009959, over 15208.00 frames. ], tot_loss[loss=0.08337, simple_loss=0.1034, pruned_loss=0.02151, audio_tagging_loss=0.01015, over 3051189.57 frames. ], batch size: 58, lr: 6.22e-03, grad_scale: 32.0
2023-11-20 00:16:16,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=863606.6666666666, ans=0.0
2023-11-20 00:16:19,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=863606.6666666666, ans=0.0
2023-11-20 00:16:21,308 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129550
2023-11-20 00:16:24,205 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.75 vs. limit=15.0
2023-11-20 00:16:28,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863673.3333333334, ans=0.1
2023-11-20 00:16:33,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=863673.3333333334, ans=0.125
2023-11-20 00:16:34,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=863673.3333333334, ans=0.0
2023-11-20 00:16:34,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=863673.3333333334, ans=0.02
2023-11-20 00:16:36,559 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.891e+01 8.083e+01 9.100e+01 9.829e+01 1.304e+02, threshold=1.820e+02, percent-clipped=0.0
2023-11-20 00:16:39,818 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.72 vs. limit=22.5
2023-11-20 00:16:41,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=863740.0, ans=0.125
2023-11-20 00:16:50,909 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.16 vs. limit=22.5
2023-11-20 00:17:03,651 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9350, loss[loss=0.08555, simple_loss=0.1114, pruned_loss=0.02274, audio_tagging_loss=0.007122, over 15933.00 frames. ], tot_loss[loss=0.08332, simple_loss=0.1034, pruned_loss=0.0215, audio_tagging_loss=0.01013, over 3058711.33 frames.
], batch size: 57, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:17:03,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=863873.3333333334, ans=0.2 2023-11-20 00:17:12,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=863873.3333333334, ans=0.0 2023-11-20 00:17:26,557 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129600 2023-11-20 00:17:27,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=863940.0, ans=0.035 2023-11-20 00:17:31,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=864006.6666666666, ans=0.125 2023-11-20 00:17:36,201 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:18:00,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=864140.0, ans=0.2 2023-11-20 00:18:00,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=864140.0, ans=0.0 2023-11-20 00:18:09,217 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9400, loss[loss=0.06668, simple_loss=0.08486, pruned_loss=0.01653, audio_tagging_loss=0.007728, over 15178.00 frames. ], tot_loss[loss=0.08403, simple_loss=0.1041, pruned_loss=0.02186, audio_tagging_loss=0.01013, over 3051951.61 frames. ], batch size: 58, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:18:09,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=864206.6666666666, ans=0.0 2023-11-20 00:18:32,230 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129650 2023-11-20 00:18:35,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=864340.0, ans=0.125 2023-11-20 00:18:38,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=864340.0, ans=0.02 2023-11-20 00:18:46,752 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.773e+01 8.451e+01 9.430e+01 1.021e+02 1.598e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-20 00:18:48,580 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.80 vs. limit=22.5 2023-11-20 00:18:55,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=864406.6666666666, ans=0.0 2023-11-20 00:19:14,313 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9450, loss[loss=0.1152, simple_loss=0.147, pruned_loss=0.0346, audio_tagging_loss=0.007137, over 14505.00 frames. ], tot_loss[loss=0.08481, simple_loss=0.1048, pruned_loss=0.02219, audio_tagging_loss=0.01023, over 3053877.16 frames. ], batch size: 53, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:19:14,368 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 00:19:20,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=864540.0, ans=0.1 2023-11-20 00:19:35,894 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129700 2023-11-20 00:19:39,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=864673.3333333334, ans=0.125 2023-11-20 00:19:43,482 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.48 vs. limit=15.0 2023-11-20 00:19:47,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=864673.3333333334, ans=0.125 2023-11-20 00:19:58,645 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.63 vs. limit=22.5 2023-11-20 00:20:11,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=864806.6666666666, ans=0.125 2023-11-20 00:20:18,802 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9500, loss[loss=0.08491, simple_loss=0.11, pruned_loss=0.02, audio_tagging_loss=0.009905, over 14195.00 frames. ], tot_loss[loss=0.08422, simple_loss=0.104, pruned_loss=0.02187, audio_tagging_loss=0.01036, over 3059551.36 frames. ], batch size: 52, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:20:20,662 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=15.0 2023-11-20 00:20:36,415 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.88 vs. limit=15.0 2023-11-20 00:20:40,541 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129750 2023-11-20 00:20:47,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=865006.6666666666, ans=0.125 2023-11-20 00:20:57,194 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.873e+01 8.143e+01 9.051e+01 9.932e+01 1.802e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-20 00:21:01,552 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.70 vs. limit=15.0 2023-11-20 00:21:15,136 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.70 vs. limit=15.0 2023-11-20 00:21:19,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=865140.0, ans=0.125 2023-11-20 00:21:21,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=865140.0, ans=0.2 2023-11-20 00:21:23,878 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9550, loss[loss=0.08767, simple_loss=0.1146, pruned_loss=0.02156, audio_tagging_loss=0.008837, over 15265.00 frames. ], tot_loss[loss=0.08337, simple_loss=0.1028, pruned_loss=0.02146, audio_tagging_loss=0.01052, over 3061271.62 frames. 
], batch size: 56, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:21:40,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=865273.3333333334, ans=0.1 2023-11-20 00:21:46,582 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129800 2023-11-20 00:21:49,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=865340.0, ans=0.1 2023-11-20 00:21:51,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=865340.0, ans=0.02 2023-11-20 00:21:54,775 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.33 vs. limit=22.5 2023-11-20 00:22:15,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=865473.3333333334, ans=0.0 2023-11-20 00:22:24,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=865473.3333333334, ans=0.0 2023-11-20 00:22:29,123 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9600, loss[loss=0.07872, simple_loss=0.09296, pruned_loss=0.01923, audio_tagging_loss=0.013, over 14401.00 frames. ], tot_loss[loss=0.08381, simple_loss=0.1033, pruned_loss=0.02161, audio_tagging_loss=0.01053, over 3063501.99 frames. ], batch size: 56, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:22:36,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=865540.0, ans=0.0 2023-11-20 00:22:39,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=865540.0, ans=0.125 2023-11-20 00:22:39,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=865540.0, ans=0.1 2023-11-20 00:22:50,665 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129850 2023-11-20 00:23:05,962 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.678e+01 8.225e+01 8.966e+01 9.703e+01 1.238e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-20 00:23:09,244 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.54 vs. limit=15.0 2023-11-20 00:23:16,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=865740.0, ans=0.2 2023-11-20 00:23:19,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2023-11-20 00:23:29,462 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.55 vs. limit=22.5 2023-11-20 00:23:31,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=865806.6666666666, ans=0.1 2023-11-20 00:23:33,762 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9650, loss[loss=0.07554, simple_loss=0.08569, pruned_loss=0.02308, audio_tagging_loss=0.009614, over 14794.00 frames. ], tot_loss[loss=0.08343, simple_loss=0.1028, pruned_loss=0.02147, audio_tagging_loss=0.01055, over 3056496.96 frames. 
], batch size: 56, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:23:35,755 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.38 vs. limit=15.0 2023-11-20 00:23:37,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=865873.3333333334, ans=0.09899494936611666 2023-11-20 00:23:42,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=865873.3333333334, ans=0.0 2023-11-20 00:23:55,433 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129900 2023-11-20 00:23:58,102 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.407e-01 2023-11-20 00:24:16,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=866073.3333333334, ans=0.0 2023-11-20 00:24:21,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=866073.3333333334, ans=0.04949747468305833 2023-11-20 00:24:29,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=866140.0, ans=0.2 2023-11-20 00:24:37,642 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9700, loss[loss=0.09121, simple_loss=0.1207, pruned_loss=0.02485, audio_tagging_loss=0.005997, over 15152.00 frames. ], tot_loss[loss=0.08258, simple_loss=0.1021, pruned_loss=0.02113, audio_tagging_loss=0.01041, over 3054008.65 frames. ], batch size: 57, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:24:59,553 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 129950 2023-11-20 00:25:06,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=866340.0, ans=0.1 2023-11-20 00:25:14,934 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.210e+01 8.274e+01 8.934e+01 1.009e+02 1.297e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 00:25:18,148 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0 2023-11-20 00:25:24,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=866406.6666666666, ans=0.0 2023-11-20 00:25:41,616 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9750, loss[loss=0.0785, simple_loss=0.09759, pruned_loss=0.01857, audio_tagging_loss=0.01114, over 15581.00 frames. ], tot_loss[loss=0.08266, simple_loss=0.1024, pruned_loss=0.02112, audio_tagging_loss=0.01033, over 3049769.21 frames. ], batch size: 59, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:26:04,594 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130000 2023-11-20 00:26:23,705 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2023-11-20 00:26:47,951 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9800, loss[loss=0.06123, simple_loss=0.06484, pruned_loss=0.01461, audio_tagging_loss=0.0142, over 13599.00 frames. ], tot_loss[loss=0.08214, simple_loss=0.1021, pruned_loss=0.02082, audio_tagging_loss=0.01028, over 3043094.46 frames. 
], batch size: 55, lr: 6.21e-03, grad_scale: 16.0 2023-11-20 00:26:57,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=866873.3333333334, ans=0.0 2023-11-20 00:27:06,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=866940.0, ans=0.125 2023-11-20 00:27:09,864 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130050 2023-11-20 00:27:12,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=867006.6666666666, ans=0.125 2023-11-20 00:27:26,360 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.368e+01 8.974e+01 9.697e+01 1.703e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-20 00:27:28,903 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.30 vs. limit=15.0 2023-11-20 00:27:47,315 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:27:52,256 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9850, loss[loss=0.07595, simple_loss=0.07961, pruned_loss=0.02195, audio_tagging_loss=0.0142, over 15664.00 frames. ], tot_loss[loss=0.08213, simple_loss=0.1018, pruned_loss=0.02092, audio_tagging_loss=0.01028, over 3043754.03 frames. ], batch size: 60, lr: 6.21e-03, grad_scale: 16.0 2023-11-20 00:28:09,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=867273.3333333334, ans=0.125 2023-11-20 00:28:11,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=867273.3333333334, ans=0.125 2023-11-20 00:28:14,644 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130100 2023-11-20 00:28:14,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=867273.3333333334, ans=0.5 2023-11-20 00:28:15,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=867273.3333333334, ans=0.04949747468305833 2023-11-20 00:28:18,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=867340.0, ans=0.2 2023-11-20 00:28:20,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=867340.0, ans=0.125 2023-11-20 00:28:47,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=867473.3333333334, ans=0.1 2023-11-20 00:28:49,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=867473.3333333334, ans=0.1 2023-11-20 00:28:52,783 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.00 vs. 
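limit=15.0

Note: the [scaling.py:213] ScheduledFloat records report the current value (ans=...) of hyperparameters such as dropout_p, skip rates, and balancer probabilities as a function of batch_count. A plausible reading, sketched below under the assumption that each schedule is a piecewise-linear function of batch_count; the breakpoints in the example are made up for illustration:

    from bisect import bisect_right

    def scheduled_float(batch_count: float,
                        points: list[tuple[float, float]]) -> float:
        """Piecewise-linear schedule over (batch_count, value) breakpoints.

        Sketch of how values like ans=0.1 for ...dropout_p could be produced;
        the real ScheduledFloat in icefall's scaling.py may differ in detail.
        """
        xs = [p[0] for p in points]
        if batch_count <= xs[0]:
            return points[0][1]
        if batch_count >= xs[-1]:
            return points[-1][1]
        i = bisect_right(xs, batch_count)
        (x0, y0), (x1, y1) = points[i - 1], points[i]
        t = (batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

    # Hypothetical schedule: dropout decays from 0.3 to 0.1 over the first
    # 20k batches, then stays flat -- consistent with the constant ans=0.1
    # logged here at batch_count ~ 8.7e5.
    dropout_schedule = [(0.0, 0.3), (20000.0, 0.1)]
    print(scheduled_float(867473.0, dropout_schedule))  # -> 0.1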
2023-11-20 00:28:57,190 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9900, loss[loss=0.06919, simple_loss=0.07789, pruned_loss=0.01504, audio_tagging_loss=0.0152, over 14855.00 frames. ], tot_loss[loss=0.08218, simple_loss=0.1019, pruned_loss=0.021, audio_tagging_loss=0.01023, over 3044885.22 frames. ], batch size: 57, lr: 6.20e-03, grad_scale: 16.0
2023-11-20 00:29:01,501 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=22.5
2023-11-20 00:29:07,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=867540.0, ans=0.125
2023-11-20 00:29:18,680 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.54 vs. limit=15.0
2023-11-20 00:29:20,367 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130150
2023-11-20 00:29:36,776 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.726e+01 8.221e+01 8.937e+01 9.593e+01 1.338e+02, threshold=1.787e+02, percent-clipped=0.0
2023-11-20 00:29:46,149 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.70 vs. limit=10.0
2023-11-20 00:30:02,327 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 9950, loss[loss=0.07526, simple_loss=0.08534, pruned_loss=0.02228, audio_tagging_loss=0.0103, over 13700.00 frames. ], tot_loss[loss=0.08222, simple_loss=0.1018, pruned_loss=0.02111, audio_tagging_loss=0.0102, over 3045329.46 frames. ], batch size: 52, lr: 6.20e-03, grad_scale: 16.0
2023-11-20 00:30:03,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=867873.3333333334, ans=0.125
2023-11-20 00:30:18,596 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.57 vs. limit=15.0
2023-11-20 00:30:24,655 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130200
2023-11-20 00:31:07,007 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10000, loss[loss=0.1127, simple_loss=0.1471, pruned_loss=0.03247, audio_tagging_loss=0.006659, over 15359.00 frames. ], tot_loss[loss=0.08206, simple_loss=0.1017, pruned_loss=0.02094, audio_tagging_loss=0.01026, over 3041711.33 frames. ], batch size: 56, lr: 6.20e-03, grad_scale: 32.0
2023-11-20 00:31:25,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=868273.3333333334, ans=0.125
2023-11-20 00:31:26,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=868273.3333333334, ans=0.0
2023-11-20 00:31:28,545 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130250
2023-11-20 00:31:40,353 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.42 vs.
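limit=22.5

Note: in the [optim.py:476] records, the clipping threshold tracks the gradient-norm statistics: with Clipping_scale=2.0, the threshold is consistently 2.0 times the logged median grad-norm (e.g. 2.0 * 8.937e+01 ~ 1.787e+02 just above), and percent-clipped counts batches whose norm exceeded it. A hedged sketch of such median-based clipping, assuming a rolling buffer of recent gradient norms; the buffer size and update details here are guesses, not icefall's actual implementation:

    import torch

    def clip_by_median(parameters, recent_norms: list[float],
                       clipping_scale: float = 2.0,
                       buffer_size: int = 128) -> float:
        """Clip gradients to clipping_scale * median of recent grad norms.

        Sketch only: the optim.py records suggest the optimizer does this
        internally; the buffer handling here is a simplification.
        """
        grads = [p.grad for p in parameters if p.grad is not None]
        total_norm = torch.norm(torch.stack([g.norm() for g in grads])).item()

        recent_norms.append(total_norm)
        del recent_norms[:-buffer_size]  # keep only the most recent norms

        threshold = clipping_scale * float(
            torch.median(torch.tensor(recent_norms)))
        if total_norm > threshold:  # "percent-clipped" counts these events
            for g in grads:
                g.mul_(threshold / total_norm)
        return total_norm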
2023-11-20 00:31:43,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=868340.0, ans=0.025
2023-11-20 00:31:45,682 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.042e+01 8.141e+01 8.733e+01 9.527e+01 1.222e+02, threshold=1.747e+02, percent-clipped=0.0
2023-11-20 00:31:47,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=868406.6666666666, ans=0.0
2023-11-20 00:32:02,787 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.30 vs. limit=22.5
2023-11-20 00:32:12,005 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10050, loss[loss=0.06892, simple_loss=0.07873, pruned_loss=0.01808, audio_tagging_loss=0.01147, over 14727.00 frames. ], tot_loss[loss=0.08206, simple_loss=0.1018, pruned_loss=0.02096, audio_tagging_loss=0.01018, over 3045241.00 frames. ], batch size: 56, lr: 6.20e-03, grad_scale: 32.0
2023-11-20 00:32:18,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=868540.0, ans=0.2
2023-11-20 00:32:30,758 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.30 vs. limit=15.0
2023-11-20 00:32:33,902 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130300
2023-11-20 00:32:37,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=868673.3333333334, ans=0.1
2023-11-20 00:32:50,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=868740.0, ans=0.0
2023-11-20 00:32:56,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=868740.0, ans=0.0
2023-11-20 00:33:06,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=868806.6666666666, ans=0.0
2023-11-20 00:33:06,788 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.70 vs. limit=22.5
2023-11-20 00:33:16,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=868873.3333333334, ans=0.125
2023-11-20 00:33:17,431 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10100, loss[loss=0.0817, simple_loss=0.1111, pruned_loss=0.01692, audio_tagging_loss=0.009249, over 15238.00 frames. ], tot_loss[loss=0.08193, simple_loss=0.1016, pruned_loss=0.02088, audio_tagging_loss=0.01025, over 3041017.99 frames. ], batch size: 57, lr: 6.20e-03, grad_scale: 32.0
2023-11-20 00:33:25,528 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.59 vs.
limit=10.0 2023-11-20 00:33:28,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=868940.0, ans=0.0 2023-11-20 00:33:33,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=868940.0, ans=0.125 2023-11-20 00:33:39,583 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130350 2023-11-20 00:33:40,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=868940.0, ans=0.125 2023-11-20 00:33:44,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=869006.6666666666, ans=0.2 2023-11-20 00:33:50,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=869006.6666666666, ans=0.1 2023-11-20 00:33:55,270 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.691e+01 8.143e+01 8.992e+01 9.764e+01 1.668e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-20 00:34:10,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=869140.0, ans=0.0 2023-11-20 00:34:11,670 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:34:15,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=869140.0, ans=0.0 2023-11-20 00:34:21,576 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10150, loss[loss=0.08627, simple_loss=0.111, pruned_loss=0.02201, audio_tagging_loss=0.008751, over 14231.00 frames. ], tot_loss[loss=0.08305, simple_loss=0.103, pruned_loss=0.0213, audio_tagging_loss=0.01026, over 3040287.31 frames. ], batch size: 55, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:34:30,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=869206.6666666666, ans=0.1 2023-11-20 00:34:43,473 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130400 2023-11-20 00:34:43,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=869273.3333333334, ans=0.125 2023-11-20 00:34:44,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=869273.3333333334, ans=10.0 2023-11-20 00:34:51,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=869340.0, ans=0.2 2023-11-20 00:34:53,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=869340.0, ans=0.0 2023-11-20 00:34:54,967 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:35:12,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=869473.3333333334, ans=0.1 2023-11-20 00:35:21,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=869473.3333333334, ans=0.125 2023-11-20 00:35:27,262 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10200, loss[loss=0.1034, simple_loss=0.1214, pruned_loss=0.03267, audio_tagging_loss=0.01005, over 16079.00 frames. ], tot_loss[loss=0.0833, simple_loss=0.1029, pruned_loss=0.02136, audio_tagging_loss=0.01048, over 3037167.06 frames. ], batch size: 61, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:35:48,825 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.62 vs. limit=22.5 2023-11-20 00:35:49,355 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130450 2023-11-20 00:35:50,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=869606.6666666666, ans=0.125 2023-11-20 00:35:53,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=869673.3333333334, ans=0.0 2023-11-20 00:35:54,308 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:35:54,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=869673.3333333334, ans=0.0 2023-11-20 00:35:58,576 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.56 vs. limit=15.0 2023-11-20 00:36:06,674 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.786e+01 8.250e+01 8.852e+01 1.003e+02 1.443e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-20 00:36:14,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=869740.0, ans=0.125 2023-11-20 00:36:19,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=869806.6666666666, ans=0.025 2023-11-20 00:36:32,662 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10250, loss[loss=0.1125, simple_loss=0.1293, pruned_loss=0.03457, audio_tagging_loss=0.01329, over 16024.00 frames. ], tot_loss[loss=0.08422, simple_loss=0.1042, pruned_loss=0.02162, audio_tagging_loss=0.01048, over 3051060.71 frames. 
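], batch size: 60, lr: 6.20e-03, grad_scale: 32.0

Note: the [train_asr.py:1506] WARNING lines above drop AudioSet placeholder cuts because, after the ~4x frame subsampling, the utterance is shorter than its token sequence (23 frames < 24 tokens), which the transducer loss cannot handle. A sketch of that filter, assuming the front-end shrinks T input frames to roughly (T - 8) // 4; that exact arithmetic is an assumption, chosen because it matches the logged 100 -> 23:

    def keep_cut(num_frames: int, num_tokens: int,
                 subsampling_factor: int = 4) -> bool:
        """Return False for cuts that must be excluded from training.

        Assumed frame arithmetic: the convolutional front-end loses ~8 frames
        before dividing by the subsampling factor (100 frames -> 23 here).
        """
        frames_after_subsampling = (num_frames - 8) // subsampling_factor
        # A transducer needs at least one encoder frame per output token.
        return frames_after_subsampling >= num_tokens

    # The excluded cut from the warning above: 100 frames, 24 tokens.
    assert keep_cut(100, 24) is False
    assert keep_cut(400, 24) is True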
2023-11-20 00:36:35,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=869873.3333333334, ans=0.0
2023-11-20 00:36:47,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=869940.0, ans=0.0
2023-11-20 00:36:53,525 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130500
2023-11-20 00:37:04,922 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=15.0
2023-11-20 00:37:05,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=870006.6666666666, ans=0.1
2023-11-20 00:37:06,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=870006.6666666666, ans=0.125
2023-11-20 00:37:28,166 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 00:37:30,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=870140.0, ans=0.0
2023-11-20 00:37:36,723 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10300, loss[loss=0.08454, simple_loss=0.09813, pruned_loss=0.02312, audio_tagging_loss=0.01236, over 15912.00 frames. ], tot_loss[loss=0.08335, simple_loss=0.1029, pruned_loss=0.0213, audio_tagging_loss=0.0106, over 3054649.51 frames. ], batch size: 59, lr: 6.19e-03, grad_scale: 32.0
2023-11-20 00:37:43,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=870206.6666666666, ans=0.0
2023-11-20 00:37:49,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=870273.3333333334, ans=0.125
2023-11-20 00:37:58,712 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.56 vs. limit=10.0
2023-11-20 00:37:59,505 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130550
2023-11-20 00:38:13,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=870340.0, ans=0.1
2023-11-20 00:38:16,980 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.312e+01 8.334e+01 9.071e+01 9.729e+01 1.396e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-20 00:38:24,013 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.62 vs. limit=6.0
2023-11-20 00:38:27,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=870406.6666666666, ans=0.2
2023-11-20 00:38:34,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=870473.3333333334, ans=0.1
2023-11-20 00:38:42,528 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.81 vs.
limit=15.0 2023-11-20 00:38:42,751 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10350, loss[loss=0.07911, simple_loss=0.09375, pruned_loss=0.02029, audio_tagging_loss=0.01194, over 14440.00 frames. ], tot_loss[loss=0.08431, simple_loss=0.104, pruned_loss=0.02166, audio_tagging_loss=0.01066, over 3054505.84 frames. ], batch size: 57, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:38:56,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=870606.6666666666, ans=0.1 2023-11-20 00:38:57,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=870606.6666666666, ans=0.1 2023-11-20 00:39:05,161 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130600 2023-11-20 00:39:30,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=870740.0, ans=0.1 2023-11-20 00:39:41,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=870806.6666666666, ans=0.2 2023-11-20 00:39:47,660 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10400, loss[loss=0.1019, simple_loss=0.1263, pruned_loss=0.02964, audio_tagging_loss=0.009061, over 15422.00 frames. ], tot_loss[loss=0.08368, simple_loss=0.1029, pruned_loss=0.02148, audio_tagging_loss=0.01078, over 3045472.82 frames. ], batch size: 53, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:39:55,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=870873.3333333334, ans=0.125 2023-11-20 00:40:09,398 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130650 2023-11-20 00:40:11,036 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=15.0 2023-11-20 00:40:26,386 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.869e+01 8.319e+01 9.012e+01 9.651e+01 1.388e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-20 00:40:33,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=871073.3333333334, ans=0.125 2023-11-20 00:40:34,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=871073.3333333334, ans=0.0 2023-11-20 00:40:46,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=871140.0, ans=0.125 2023-11-20 00:40:47,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=871140.0, ans=0.0 2023-11-20 00:40:52,083 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10450, loss[loss=0.0998, simple_loss=0.13, pruned_loss=0.02638, audio_tagging_loss=0.00842, over 14633.00 frames. ], tot_loss[loss=0.08382, simple_loss=0.1032, pruned_loss=0.02159, audio_tagging_loss=0.01063, over 3040711.56 frames. 
], batch size: 53, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:40:53,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=871206.6666666666, ans=0.2 2023-11-20 00:40:57,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=871206.6666666666, ans=0.125 2023-11-20 00:41:06,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=871273.3333333334, ans=0.07 2023-11-20 00:41:14,130 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130700 2023-11-20 00:41:22,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=871340.0, ans=0.125 2023-11-20 00:41:30,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=871406.6666666666, ans=0.0 2023-11-20 00:41:31,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=871406.6666666666, ans=0.1 2023-11-20 00:41:34,477 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:41:34,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=871406.6666666666, ans=0.0 2023-11-20 00:41:49,384 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.07 vs. limit=22.5 2023-11-20 00:41:53,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=871473.3333333334, ans=0.0 2023-11-20 00:41:55,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=871540.0, ans=0.2 2023-11-20 00:41:56,258 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.17 vs. limit=12.0 2023-11-20 00:41:56,757 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10500, loss[loss=0.05803, simple_loss=0.06817, pruned_loss=0.01489, audio_tagging_loss=0.009048, over 14308.00 frames. ], tot_loss[loss=0.0832, simple_loss=0.1026, pruned_loss=0.02144, audio_tagging_loss=0.01048, over 3034465.27 frames. ], batch size: 54, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:42:18,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=871606.6666666666, ans=0.0 2023-11-20 00:42:19,551 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130750 2023-11-20 00:42:29,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=871673.3333333334, ans=0.07 2023-11-20 00:42:35,581 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.944e+01 8.359e+01 9.035e+01 1.000e+02 1.181e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-20 00:42:48,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=871806.6666666666, ans=0.0 2023-11-20 00:43:01,876 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10550, loss[loss=0.08874, simple_loss=0.09951, pruned_loss=0.02383, audio_tagging_loss=0.01516, over 15375.00 frames. 
], tot_loss[loss=0.08355, simple_loss=0.1033, pruned_loss=0.02162, audio_tagging_loss=0.01026, over 3029252.42 frames. ], batch size: 60, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:43:03,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=871873.3333333334, ans=0.0 2023-11-20 00:43:16,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=871940.0, ans=0.0 2023-11-20 00:43:17,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=871940.0, ans=0.2 2023-11-20 00:43:23,343 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130800 2023-11-20 00:43:27,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=872006.6666666666, ans=0.2 2023-11-20 00:43:28,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=872006.6666666666, ans=0.035 2023-11-20 00:43:59,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=872140.0, ans=0.2 2023-11-20 00:44:02,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=872140.0, ans=0.125 2023-11-20 00:44:06,390 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10600, loss[loss=0.07445, simple_loss=0.08929, pruned_loss=0.01691, audio_tagging_loss=0.0129, over 15272.00 frames. ], tot_loss[loss=0.08317, simple_loss=0.1032, pruned_loss=0.02148, audio_tagging_loss=0.0101, over 3037052.35 frames. ], batch size: 58, lr: 6.19e-03, grad_scale: 16.0 2023-11-20 00:44:14,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=872206.6666666666, ans=0.125 2023-11-20 00:44:27,773 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130850 2023-11-20 00:44:27,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=872273.3333333334, ans=0.1 2023-11-20 00:44:38,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=872340.0, ans=0.125 2023-11-20 00:44:39,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=872340.0, ans=22.5 2023-11-20 00:44:46,878 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.153e+01 8.252e+01 9.029e+01 9.863e+01 1.267e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-20 00:44:47,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=872406.6666666666, ans=0.0 2023-11-20 00:44:47,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=872406.6666666666, ans=15.0 2023-11-20 00:45:04,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=872473.3333333334, ans=0.0 2023-11-20 00:45:10,727 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10650, loss[loss=0.0881, simple_loss=0.1184, pruned_loss=0.02335, audio_tagging_loss=0.005548, over 14753.00 frames. 
], tot_loss[loss=0.08317, simple_loss=0.1033, pruned_loss=0.0214, audio_tagging_loss=0.01011, over 3042997.61 frames. ], batch size: 56, lr: 6.19e-03, grad_scale: 16.0 2023-11-20 00:45:32,913 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130900 2023-11-20 00:45:40,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=872673.3333333334, ans=0.2 2023-11-20 00:46:08,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=872806.6666666666, ans=0.125 2023-11-20 00:46:14,889 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10700, loss[loss=0.08665, simple_loss=0.1117, pruned_loss=0.02264, audio_tagging_loss=0.008161, over 15362.00 frames. ], tot_loss[loss=0.08316, simple_loss=0.1033, pruned_loss=0.02138, audio_tagging_loss=0.01013, over 3040959.78 frames. ], batch size: 57, lr: 6.19e-03, grad_scale: 16.0 2023-11-20 00:46:37,377 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 130950 2023-11-20 00:46:37,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=872940.0, ans=0.125 2023-11-20 00:46:45,348 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.08 vs. limit=12.0 2023-11-20 00:46:55,061 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.497e+01 8.315e+01 9.053e+01 9.865e+01 1.273e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-20 00:47:02,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=873073.3333333334, ans=0.2 2023-11-20 00:47:02,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=873073.3333333334, ans=0.125 2023-11-20 00:47:02,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=873073.3333333334, ans=0.1 2023-11-20 00:47:10,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=873140.0, ans=0.125 2023-11-20 00:47:20,612 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10750, loss[loss=0.06004, simple_loss=0.06348, pruned_loss=0.01722, audio_tagging_loss=0.01109, over 14744.00 frames. ], tot_loss[loss=0.08352, simple_loss=0.1034, pruned_loss=0.02175, audio_tagging_loss=0.01004, over 3045513.67 frames. ], batch size: 57, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:47:41,969 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131000 2023-11-20 00:47:42,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=873273.3333333334, ans=0.1 2023-11-20 00:48:13,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=873473.3333333334, ans=0.1 2023-11-20 00:48:15,274 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.03 vs. 
limit=12.0 2023-11-20 00:48:17,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=873473.3333333334, ans=0.125 2023-11-20 00:48:24,193 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10800, loss[loss=0.0874, simple_loss=0.1145, pruned_loss=0.02205, audio_tagging_loss=0.008106, over 16576.00 frames. ], tot_loss[loss=0.08296, simple_loss=0.1028, pruned_loss=0.02149, audio_tagging_loss=0.01007, over 3050038.65 frames. ], batch size: 61, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:48:34,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=873540.0, ans=0.2 2023-11-20 00:48:46,058 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131050 2023-11-20 00:48:47,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=873606.6666666666, ans=0.125 2023-11-20 00:48:53,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=873673.3333333334, ans=0.05 2023-11-20 00:48:58,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=873673.3333333334, ans=0.07 2023-11-20 00:49:05,231 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.697e+01 8.209e+01 8.933e+01 9.655e+01 1.364e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 00:49:05,971 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.44 vs. limit=15.0 2023-11-20 00:49:11,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=873740.0, ans=0.0 2023-11-20 00:49:13,162 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=12.0 2023-11-20 00:49:15,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=873806.6666666666, ans=0.035 2023-11-20 00:49:27,999 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10850, loss[loss=0.07255, simple_loss=0.08414, pruned_loss=0.01944, audio_tagging_loss=0.01104, over 15315.00 frames. ], tot_loss[loss=0.08263, simple_loss=0.1023, pruned_loss=0.02131, audio_tagging_loss=0.01017, over 3049006.17 frames. ], batch size: 60, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:49:50,577 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131100 2023-11-20 00:49:58,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=874006.6666666666, ans=0.125 2023-11-20 00:50:32,919 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10900, loss[loss=0.08991, simple_loss=0.1091, pruned_loss=0.0276, audio_tagging_loss=0.007778, over 15251.00 frames. ], tot_loss[loss=0.08229, simple_loss=0.1015, pruned_loss=0.02135, audio_tagging_loss=0.01021, over 3043824.68 frames. ], batch size: 56, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:50:32,957 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:50:36,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=874206.6666666666, ans=0.1 2023-11-20 00:50:42,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=874206.6666666666, ans=0.125 2023-11-20 00:50:52,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=874273.3333333334, ans=0.0 2023-11-20 00:50:54,861 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131150 2023-11-20 00:51:01,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=874340.0, ans=0.125 2023-11-20 00:51:07,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=874340.0, ans=0.125 2023-11-20 00:51:13,345 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.251e+01 8.753e+01 9.767e+01 1.236e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-20 00:51:19,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=874406.6666666666, ans=0.125 2023-11-20 00:51:36,273 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 10950, loss[loss=0.06852, simple_loss=0.09161, pruned_loss=0.01366, audio_tagging_loss=0.009054, over 15332.00 frames. ], tot_loss[loss=0.08228, simple_loss=0.1015, pruned_loss=0.0213, audio_tagging_loss=0.01021, over 3045561.89 frames. ], batch size: 57, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:51:48,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=874606.6666666666, ans=0.2 2023-11-20 00:51:58,456 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131200 2023-11-20 00:52:11,225 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0 2023-11-20 00:52:23,011 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=22.5 2023-11-20 00:52:30,967 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=15.0 2023-11-20 00:52:41,620 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11000, loss[loss=0.07399, simple_loss=0.08702, pruned_loss=0.01902, audio_tagging_loss=0.01146, over 14530.00 frames. ], tot_loss[loss=0.08198, simple_loss=0.1014, pruned_loss=0.021, audio_tagging_loss=0.01029, over 3047404.49 frames. ], batch size: 56, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:52:52,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=874873.3333333334, ans=0.0 2023-11-20 00:52:56,457 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:52:58,349 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0 2023-11-20 00:53:04,400 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131250 2023-11-20 00:53:14,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=875006.6666666666, ans=0.0 2023-11-20 00:53:20,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=875073.3333333334, ans=0.0 2023-11-20 00:53:23,521 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.815e+01 8.049e+01 8.869e+01 9.421e+01 1.178e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-20 00:53:34,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=875140.0, ans=0.125 2023-11-20 00:53:37,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=875140.0, ans=0.0 2023-11-20 00:53:38,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=875140.0, ans=0.125 2023-11-20 00:53:46,579 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11050, loss[loss=0.09679, simple_loss=0.1168, pruned_loss=0.02918, audio_tagging_loss=0.009211, over 15057.00 frames. ], tot_loss[loss=0.08239, simple_loss=0.1018, pruned_loss=0.02115, audio_tagging_loss=0.01036, over 3047351.10 frames. ], batch size: 56, lr: 6.18e-03, grad_scale: 8.0 2023-11-20 00:53:50,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=875206.6666666666, ans=0.0 2023-11-20 00:53:57,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=875206.6666666666, ans=0.0 2023-11-20 00:54:00,188 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.40 vs. limit=5.0 2023-11-20 00:54:04,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=875273.3333333334, ans=0.125 2023-11-20 00:54:08,901 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131300 2023-11-20 00:54:25,482 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.75 vs. limit=15.0 2023-11-20 00:54:27,066 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. 
limit=6.0 2023-11-20 00:54:32,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=875406.6666666666, ans=0.125 2023-11-20 00:54:39,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=875473.3333333334, ans=0.035 2023-11-20 00:54:47,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=875473.3333333334, ans=0.125 2023-11-20 00:54:49,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=875540.0, ans=0.1 2023-11-20 00:54:51,022 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11100, loss[loss=0.07101, simple_loss=0.09126, pruned_loss=0.0154, audio_tagging_loss=0.009984, over 14718.00 frames. ], tot_loss[loss=0.08233, simple_loss=0.1015, pruned_loss=0.02114, audio_tagging_loss=0.01041, over 3046881.22 frames. ], batch size: 56, lr: 6.18e-03, grad_scale: 8.0 2023-11-20 00:55:06,959 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=15.0 2023-11-20 00:55:11,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=875606.6666666666, ans=0.125 2023-11-20 00:55:12,460 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131350 2023-11-20 00:55:33,845 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.519e+01 8.391e+01 9.002e+01 9.833e+01 1.655e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-20 00:55:36,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=875740.0, ans=0.2 2023-11-20 00:55:39,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=875740.0, ans=0.05 2023-11-20 00:55:55,302 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11150, loss[loss=0.07582, simple_loss=0.09363, pruned_loss=0.01892, audio_tagging_loss=0.01009, over 13999.00 frames. ], tot_loss[loss=0.08267, simple_loss=0.102, pruned_loss=0.02123, audio_tagging_loss=0.01045, over 3047735.65 frames. 
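
[Note] The loss[...] / tot_loss[...] fields in the train_asr.py:1262 entries decompose as a weighted sum of the transducer and audio-tagging terms: the totals are consistent with loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, i.e. the configured simple_loss_scale and audio_tagging_loss_scale with CTC disabled. A quick sanity check against the "batch 11150" entry just above (not icefall code):

# Recombine the logged components of "Epoch 11, batch 11150" with the scales.
simple_loss = 0.09363
pruned_loss = 0.01892
audio_tagging_loss = 0.01009
loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
print(loss)  # 0.075825 -> matches the logged loss=0.07582 after display rounding
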
], batch size: 52, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 00:55:56,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=875873.3333333334, ans=0.125 2023-11-20 00:56:17,355 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131400 2023-11-20 00:56:21,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=876006.6666666666, ans=0.125 2023-11-20 00:56:29,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=876006.6666666666, ans=0.0 2023-11-20 00:56:29,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=876006.6666666666, ans=0.125 2023-11-20 00:56:45,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=876073.3333333334, ans=0.125 2023-11-20 00:56:57,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=876140.0, ans=0.125 2023-11-20 00:56:57,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=876140.0, ans=0.1 2023-11-20 00:57:00,006 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11200, loss[loss=0.06269, simple_loss=0.077, pruned_loss=0.01119, audio_tagging_loss=0.013, over 15442.00 frames. ], tot_loss[loss=0.0829, simple_loss=0.1022, pruned_loss=0.02123, audio_tagging_loss=0.01057, over 3055879.65 frames. ], batch size: 57, lr: 6.17e-03, grad_scale: 16.0 2023-11-20 00:57:09,859 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.62 vs. limit=15.0 2023-11-20 00:57:22,046 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131450 2023-11-20 00:57:42,709 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.062e+01 8.697e+01 9.573e+01 1.140e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 00:58:04,928 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11250, loss[loss=0.0738, simple_loss=0.09595, pruned_loss=0.01692, audio_tagging_loss=0.008901, over 15592.00 frames. ], tot_loss[loss=0.08347, simple_loss=0.1029, pruned_loss=0.02153, audio_tagging_loss=0.01047, over 3057453.42 frames. ], batch size: 59, lr: 6.17e-03, grad_scale: 16.0 2023-11-20 00:58:26,617 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131500 2023-11-20 00:58:28,322 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. 
limit=15.0 2023-11-20 00:58:32,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=876673.3333333334, ans=0.1 2023-11-20 00:58:36,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=876673.3333333334, ans=0.125 2023-11-20 00:59:04,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=876806.6666666666, ans=0.025 2023-11-20 00:59:08,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=876873.3333333334, ans=0.2 2023-11-20 00:59:09,637 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11300, loss[loss=0.09622, simple_loss=0.1257, pruned_loss=0.02239, audio_tagging_loss=0.011, over 14285.00 frames. ], tot_loss[loss=0.08381, simple_loss=0.1036, pruned_loss=0.02164, audio_tagging_loss=0.01035, over 3059635.07 frames. ], batch size: 54, lr: 6.17e-03, grad_scale: 16.0 2023-11-20 00:59:31,657 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131550 2023-11-20 00:59:34,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=877006.6666666666, ans=0.125 2023-11-20 00:59:36,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=877006.6666666666, ans=0.0 2023-11-20 00:59:40,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=877006.6666666666, ans=0.1 2023-11-20 00:59:47,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=877073.3333333334, ans=0.2 2023-11-20 00:59:53,571 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.175e+01 8.098e+01 8.659e+01 9.613e+01 1.705e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-20 00:59:59,296 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.45 vs. limit=15.0 2023-11-20 01:00:14,035 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11350, loss[loss=0.09754, simple_loss=0.1283, pruned_loss=0.02214, audio_tagging_loss=0.01124, over 16233.00 frames. ], tot_loss[loss=0.08437, simple_loss=0.1045, pruned_loss=0.02179, audio_tagging_loss=0.0103, over 3063664.17 frames. 
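
[Note] In the optim.py:476 entries, the five grad-norm quartiles appear to be [min, 25%, median, 75%, max] over a window of recent steps, and the logged threshold matches clipping_scale times the median (here 2.0 * 8.659e+01 = 1.732e+02). A minimal sketch under that reading; the rolling window and the use of clip_grad_norm_ are assumptions, not the ScaledAdam internals:

import torch

def clip_to_recent_median(params, recent_norms, clipping_scale=2.0):
    # quartiles as logged: [min, 25%, median, 75%, max] of recent grad norms
    q = torch.quantile(torch.tensor(recent_norms),
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()      # e.g. 2.0 * 8.659e+01
    total = torch.nn.utils.clip_grad_norm_(params, max_norm=threshold)
    return total.item() > threshold               # counts toward percent-clipped
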
], batch size: 61, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:00:15,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=877206.6666666666, ans=0.05 2023-11-20 01:00:35,833 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131600 2023-11-20 01:00:48,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=877340.0, ans=0.125 2023-11-20 01:00:54,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=877406.6666666666, ans=0.0 2023-11-20 01:01:04,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=877406.6666666666, ans=0.0 2023-11-20 01:01:18,981 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11400, loss[loss=0.07502, simple_loss=0.09099, pruned_loss=0.02252, audio_tagging_loss=0.007012, over 14671.00 frames. ], tot_loss[loss=0.08325, simple_loss=0.1031, pruned_loss=0.02153, audio_tagging_loss=0.01017, over 3055596.68 frames. ], batch size: 55, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:01:40,749 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131650 2023-11-20 01:01:47,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=877673.3333333334, ans=0.125 2023-11-20 01:02:02,816 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.241e+01 9.056e+01 1.011e+02 3.989e+02, threshold=1.811e+02, percent-clipped=1.0 2023-11-20 01:02:17,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=877806.6666666666, ans=0.1 2023-11-20 01:02:23,658 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11450, loss[loss=0.09717, simple_loss=0.1268, pruned_loss=0.02685, audio_tagging_loss=0.006938, over 14767.00 frames. ], tot_loss[loss=0.08306, simple_loss=0.1032, pruned_loss=0.02145, audio_tagging_loss=0.01001, over 3056400.57 frames. ], batch size: 55, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:02:39,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=877940.0, ans=15.0 2023-11-20 01:02:45,845 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131700 2023-11-20 01:02:48,924 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.58 vs. limit=15.0 2023-11-20 01:03:25,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=878140.0, ans=0.125 2023-11-20 01:03:27,749 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11500, loss[loss=0.1015, simple_loss=0.1302, pruned_loss=0.02711, audio_tagging_loss=0.009268, over 15312.00 frames. ], tot_loss[loss=0.08354, simple_loss=0.1036, pruned_loss=0.02171, audio_tagging_loss=0.01004, over 3049896.34 frames. 
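
[Note] Each scaling.py:213 line prints a ScheduledFloat: a hyperparameter that varies with batch_count along a piecewise-linear schedule, with ans= reporting its current value. A stand-in implementation with illustrative breakpoints (the real schedules are defined per module, and at batch_count ~8.8e5 most have long since reached their final value):

# Illustrative ScheduledFloat stand-in: piecewise-linear in batch_count.
def scheduled_float(batch_count, points=((0.0, 0.3), (20000.0, 0.125))):
    (x0, y0), (x1, y1) = points
    if batch_count <= x0:
        return y0
    if batch_count >= x1:
        return y1
    frac = (batch_count - x0) / (x1 - x0)
    return y0 + frac * (y1 - y0)

print(scheduled_float(877340.0))  # 0.125, as in the balancer prob entries above
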
], batch size: 55, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:03:48,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=878273.3333333334, ans=0.0 2023-11-20 01:03:49,221 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131750 2023-11-20 01:04:11,078 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.409e+01 8.478e+01 9.308e+01 1.005e+02 1.725e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-20 01:04:20,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=878473.3333333334, ans=0.0 2023-11-20 01:04:22,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=878473.3333333334, ans=0.0 2023-11-20 01:04:31,795 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11550, loss[loss=0.07835, simple_loss=0.09629, pruned_loss=0.02192, audio_tagging_loss=0.008281, over 15164.00 frames. ], tot_loss[loss=0.08372, simple_loss=0.104, pruned_loss=0.02175, audio_tagging_loss=0.009966, over 3051819.25 frames. ], batch size: 56, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:04:43,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=878606.6666666666, ans=0.0 2023-11-20 01:04:48,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=878606.6666666666, ans=0.0 2023-11-20 01:04:54,031 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131800 2023-11-20 01:04:59,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=878673.3333333334, ans=0.125 2023-11-20 01:05:15,153 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 01:05:17,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=878740.0, ans=0.125 2023-11-20 01:05:21,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=878740.0, ans=0.2 2023-11-20 01:05:36,272 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11600, loss[loss=0.1013, simple_loss=0.1356, pruned_loss=0.02466, audio_tagging_loss=0.008776, over 16182.00 frames. ], tot_loss[loss=0.08383, simple_loss=0.1043, pruned_loss=0.0217, audio_tagging_loss=0.00997, over 3048393.92 frames. ], batch size: 57, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:05:40,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=878873.3333333334, ans=0.1 2023-11-20 01:05:48,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=878940.0, ans=0.125 2023-11-20 01:05:58,406 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131850 2023-11-20 01:06:14,387 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.64 vs. 
limit=15.0 2023-11-20 01:06:19,907 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.853e+01 8.090e+01 8.633e+01 9.438e+01 1.426e+02, threshold=1.727e+02, percent-clipped=0.0 2023-11-20 01:06:24,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=879073.3333333334, ans=0.5 2023-11-20 01:06:30,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=879140.0, ans=0.125 2023-11-20 01:06:40,890 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11650, loss[loss=0.1055, simple_loss=0.1283, pruned_loss=0.0295, audio_tagging_loss=0.01183, over 16082.00 frames. ], tot_loss[loss=0.08378, simple_loss=0.104, pruned_loss=0.02171, audio_tagging_loss=0.01005, over 3047731.65 frames. ], batch size: 61, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:06:43,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=879206.6666666666, ans=0.1 2023-11-20 01:07:03,000 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131900 2023-11-20 01:07:12,345 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0 2023-11-20 01:07:33,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=879473.3333333334, ans=0.5 2023-11-20 01:07:35,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=879473.3333333334, ans=0.1 2023-11-20 01:07:36,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=879473.3333333334, ans=0.0 2023-11-20 01:07:43,027 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.02 vs. limit=15.0 2023-11-20 01:07:44,028 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.33 vs. limit=10.0 2023-11-20 01:07:45,862 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11700, loss[loss=0.07287, simple_loss=0.07865, pruned_loss=0.02119, audio_tagging_loss=0.01235, over 15196.00 frames. ], tot_loss[loss=0.08357, simple_loss=0.1036, pruned_loss=0.02166, audio_tagging_loss=0.01012, over 3050431.32 frames. ], batch size: 59, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:08:00,689 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:08:07,437 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 131950 2023-11-20 01:08:13,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=879673.3333333334, ans=0.1 2023-11-20 01:08:30,044 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.705e+01 8.385e+01 9.019e+01 1.002e+02 1.324e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-20 01:08:38,206 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.97 vs. 
limit=15.0 2023-11-20 01:08:41,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=879806.6666666666, ans=0.025 2023-11-20 01:08:44,233 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.88 vs. limit=22.5 2023-11-20 01:08:49,736 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11750, loss[loss=0.07283, simple_loss=0.08693, pruned_loss=0.01589, audio_tagging_loss=0.01348, over 15082.00 frames. ], tot_loss[loss=0.08356, simple_loss=0.1035, pruned_loss=0.02171, audio_tagging_loss=0.01011, over 3042655.10 frames. ], batch size: 60, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:08:54,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=879873.3333333334, ans=0.1 2023-11-20 01:08:54,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=879873.3333333334, ans=0.2 2023-11-20 01:09:04,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=879940.0, ans=0.0 2023-11-20 01:09:07,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=879940.0, ans=0.05 2023-11-20 01:09:12,291 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132000 2023-11-20 01:09:37,462 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:09:44,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=880140.0, ans=0.125 2023-11-20 01:09:58,230 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11800, loss[loss=0.07426, simple_loss=0.08673, pruned_loss=0.02049, audio_tagging_loss=0.0104, over 15380.00 frames. ], tot_loss[loss=0.08356, simple_loss=0.1034, pruned_loss=0.02172, audio_tagging_loss=0.01015, over 3041495.61 frames. ], batch size: 57, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:10:08,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=880206.6666666666, ans=0.125 2023-11-20 01:10:20,451 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132050 2023-11-20 01:10:33,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=880340.0, ans=0.0 2023-11-20 01:10:39,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=880406.6666666666, ans=0.125 2023-11-20 01:10:41,796 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.587e+01 9.267e+01 9.920e+01 1.513e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-20 01:10:42,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=880406.6666666666, ans=0.1 2023-11-20 01:11:03,540 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11850, loss[loss=0.0964, simple_loss=0.1198, pruned_loss=0.02347, audio_tagging_loss=0.01304, over 16549.00 frames. ], tot_loss[loss=0.08383, simple_loss=0.1037, pruned_loss=0.02175, audio_tagging_loss=0.01025, over 3047098.93 frames. 
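
[Note] The scaling.py:1022 Whitening lines are a diagnostic of how non-isotropic a module's output covariance is; 1.0 corresponds to a perfectly white signal, and entries are printed when the metric approaches or exceeds its limit. A sketch of one such metric, eigenvalue dispersion of the per-group feature covariance (the exact computation in scaling.py may differ):

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups).transpose(0, 1)
    cov = x.transpose(1, 2) @ x                # per-group feature covariance
    eigs = torch.linalg.eigvalsh(cov)          # real eigenvalues of symmetric cov
    # E[lambda^2] / E[lambda]^2 == 1.0 iff all eigenvalues are equal ("white")
    return (eigs ** 2).mean() / eigs.mean() ** 2

print(whitening_metric(torch.randn(20000, 64)))  # ~1.0 for white noise
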
], batch size: 61, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:11:07,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=880540.0, ans=0.125 2023-11-20 01:11:09,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=880540.0, ans=0.0 2023-11-20 01:11:09,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=880540.0, ans=0.2 2023-11-20 01:11:24,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=880606.6666666666, ans=0.125 2023-11-20 01:11:24,969 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132100 2023-11-20 01:11:28,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=880673.3333333334, ans=0.95 2023-11-20 01:11:38,922 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2023-11-20 01:12:06,500 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11900, loss[loss=0.07926, simple_loss=0.09351, pruned_loss=0.02151, audio_tagging_loss=0.011, over 16923.00 frames. ], tot_loss[loss=0.08446, simple_loss=0.1048, pruned_loss=0.02179, audio_tagging_loss=0.01029, over 3048306.13 frames. ], batch size: 62, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:12:14,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=880873.3333333334, ans=0.125 2023-11-20 01:12:22,491 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.14 vs. limit=22.5 2023-11-20 01:12:28,406 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132150 2023-11-20 01:12:50,263 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.333e+01 8.992e+01 9.854e+01 1.328e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-20 01:12:55,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=881073.3333333334, ans=0.125 2023-11-20 01:12:59,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=881140.0, ans=0.1 2023-11-20 01:12:59,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=881140.0, ans=0.05 2023-11-20 01:13:02,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=881140.0, ans=0.0 2023-11-20 01:13:02,823 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.15 vs. limit=22.5 2023-11-20 01:13:10,601 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 11950, loss[loss=0.06945, simple_loss=0.08415, pruned_loss=0.01657, audio_tagging_loss=0.01081, over 16157.00 frames. ], tot_loss[loss=0.08363, simple_loss=0.1033, pruned_loss=0.02148, audio_tagging_loss=0.0105, over 3044670.65 frames. 
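
[Note] Within epoch 11 the learning rate creeps from 6.18e-03 down to 6.15e-03, then drops to 5.90e-03 the moment epoch 12 starts just below. Both values are reproduced by an Eden-style schedule with the configured base_lr=0.045, lr_batches=7500 and lr_epochs=3.5, provided the epoch factor counts completed epochs; that offset is an inference from the numbers, not a claim about the scheduler code:

def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, batch=132200, epoch=10))  # ~6.15e-03, late in epoch 11
print(eden_lr(0.045, batch=132250, epoch=11))  # ~5.90e-03, start of epoch 12
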
], batch size: 60, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:13:18,665 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:13:33,298 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132200 2023-11-20 01:13:47,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=881340.0, ans=0.0 2023-11-20 01:13:52,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=881406.6666666666, ans=0.125 2023-11-20 01:14:00,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=881473.3333333334, ans=0.2 2023-11-20 01:14:05,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=881473.3333333334, ans=0.2 2023-11-20 01:14:10,715 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2023-11-20 01:14:13,593 INFO [train_asr.py:1262] (2/4) Epoch 11, batch 12000, loss[loss=0.1138, simple_loss=0.1488, pruned_loss=0.02721, audio_tagging_loss=0.01214, over 16587.00 frames. ], tot_loss[loss=0.08336, simple_loss=0.103, pruned_loss=0.02138, audio_tagging_loss=0.01049, over 3042781.06 frames. ], batch size: 57, lr: 6.15e-03, grad_scale: 32.0 2023-11-20 01:14:13,594 INFO [train_asr.py:1285] (2/4) Computing validation loss 2023-11-20 01:14:51,512 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.5304, 3.7499, 4.3181, 3.4263], device='cuda:2') 2023-11-20 01:14:57,672 INFO [train_asr.py:1294] (2/4) Epoch 11, validation: loss=0.06362, simple_loss=0.05468, pruned_loss=0.006127, audio_tagging_loss=0.03015, over 4681554.00 frames. 2023-11-20 01:14:57,672 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB 2023-11-20 01:15:08,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=881606.6666666666, ans=0.125 2023-11-20 01:15:15,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=881606.6666666666, ans=0.125 2023-11-20 01:15:17,691 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132250 2023-11-20 01:16:05,096 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 0, loss[loss=0.09962, simple_loss=0.1075, pruned_loss=0.02349, audio_tagging_loss=0.02238, over 16024.00 frames. ], tot_loss[loss=0.09962, simple_loss=0.1075, pruned_loss=0.02349, audio_tagging_loss=0.02238, over 16024.00 frames. ], batch size: 57, lr: 5.90e-03, grad_scale: 32.0 2023-11-20 01:16:05,097 INFO [train_asr.py:1285] (2/4) Computing validation loss 2023-11-20 01:16:34,822 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5144, 2.7527, 3.7926, 3.2297], device='cuda:2') 2023-11-20 01:16:42,317 INFO [train_asr.py:1294] (2/4) Epoch 12, validation: loss=0.06246, simple_loss=0.05467, pruned_loss=0.006079, audio_tagging_loss=0.02904, over 4681554.00 frames. 
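
[Note] At batch 12000 and again at the first batch of epoch 12, the trainer pauses for validation: the model is switched to eval mode, the whole dev set (4681554.00 frames) is scored without gradients, one attention-weights entropy diagnostic is printed, and frame-averaged loss components are reported before training resumes. A minimal sketch of that loop; compute_loss and valid_dl are hypothetical stand-ins for the trainer's own helpers:

import torch

def compute_validation_loss(model, valid_dl, compute_loss):
    model.eval()
    totals, tot_frames = {}, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            loss_info, num_frames = compute_loss(model, batch)  # assumed helper
            for name, value in loss_info.items():
                totals[name] = totals.get(name, 0.0) + value * num_frames
            tot_frames += num_frames
    model.train()
    # frame-averaged components, as in "validation: loss=... over ... frames."
    return {name: value / tot_frames for name, value in totals.items()}
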
2023-11-20 01:16:42,318 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB 2023-11-20 01:16:42,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=881720.0, ans=0.0 2023-11-20 01:16:51,495 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.342e+01 8.202e+01 8.941e+01 9.888e+01 1.289e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-20 01:16:59,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=881786.6666666666, ans=0.125 2023-11-20 01:17:16,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=881853.3333333334, ans=0.125 2023-11-20 01:17:34,462 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132300 2023-11-20 01:17:41,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=881986.6666666666, ans=0.125 2023-11-20 01:17:47,172 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 50, loss[loss=0.08975, simple_loss=0.1006, pruned_loss=0.02129, audio_tagging_loss=0.01814, over 16170.00 frames. ], tot_loss[loss=0.09111, simple_loss=0.1005, pruned_loss=0.0205, audio_tagging_loss=0.02035, over 687608.77 frames. ], batch size: 60, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:18:00,926 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:18:10,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=882120.0, ans=0.125 2023-11-20 01:18:30,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=882253.3333333334, ans=0.2 2023-11-20 01:18:39,737 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132350 2023-11-20 01:18:41,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=882320.0, ans=0.2 2023-11-20 01:18:47,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=882320.0, ans=0.125 2023-11-20 01:18:52,567 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 100, loss[loss=0.06883, simple_loss=0.07267, pruned_loss=0.01364, audio_tagging_loss=0.01885, over 13590.00 frames. ], tot_loss[loss=0.09074, simple_loss=0.1014, pruned_loss=0.0208, audio_tagging_loss=0.01924, over 1204208.77 frames. ], batch size: 54, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:18:55,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=882386.6666666666, ans=0.1 2023-11-20 01:18:59,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=882386.6666666666, ans=0.025 2023-11-20 01:19:01,653 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.733e+01 8.794e+01 9.349e+01 1.020e+02 1.692e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-20 01:19:13,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=882453.3333333334, ans=0.125 2023-11-20 01:19:24,091 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.85 vs. 
limit=22.5 2023-11-20 01:19:29,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=882520.0, ans=0.2 2023-11-20 01:19:44,818 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132400 2023-11-20 01:19:57,297 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 150, loss[loss=0.09622, simple_loss=0.1178, pruned_loss=0.02481, audio_tagging_loss=0.01248, over 15501.00 frames. ], tot_loss[loss=0.08978, simple_loss=0.1032, pruned_loss=0.02109, audio_tagging_loss=0.01708, over 1610543.27 frames. ], batch size: 58, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:20:18,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=882786.6666666666, ans=0.04949747468305833 2023-11-20 01:20:27,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=882853.3333333334, ans=0.125 2023-11-20 01:20:39,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=882920.0, ans=0.07 2023-11-20 01:20:49,219 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132450 2023-11-20 01:20:53,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=882986.6666666666, ans=0.125 2023-11-20 01:21:02,283 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 200, loss[loss=0.08302, simple_loss=0.1022, pruned_loss=0.02142, audio_tagging_loss=0.01052, over 15356.00 frames. ], tot_loss[loss=0.08707, simple_loss=0.1022, pruned_loss=0.02089, audio_tagging_loss=0.0151, over 1927910.06 frames. ], batch size: 55, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:21:11,575 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.201e+01 8.761e+01 9.540e+01 1.328e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-20 01:21:32,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=883186.6666666666, ans=0.2 2023-11-20 01:21:48,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=883253.3333333334, ans=0.0 2023-11-20 01:21:50,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=883253.3333333334, ans=0.125 2023-11-20 01:21:54,047 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132500 2023-11-20 01:22:06,768 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 250, loss[loss=0.06424, simple_loss=0.07825, pruned_loss=0.01682, audio_tagging_loss=0.008293, over 14892.00 frames. ], tot_loss[loss=0.08608, simple_loss=0.1027, pruned_loss=0.02112, audio_tagging_loss=0.01361, over 2172340.80 frames. ], batch size: 59, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:22:16,795 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.31 vs. limit=15.0 2023-11-20 01:22:25,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=883453.3333333334, ans=0.2 2023-11-20 01:22:30,379 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. 
limit=6.0 2023-11-20 01:22:49,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=883586.6666666666, ans=0.1 2023-11-20 01:22:50,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=883586.6666666666, ans=0.125 2023-11-20 01:22:58,128 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132550 2023-11-20 01:23:11,536 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 300, loss[loss=0.1128, simple_loss=0.1425, pruned_loss=0.03494, audio_tagging_loss=0.006579, over 15294.00 frames. ], tot_loss[loss=0.08597, simple_loss=0.104, pruned_loss=0.02147, audio_tagging_loss=0.0125, over 2364561.07 frames. ], batch size: 55, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:23:20,184 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.079e+01 8.204e+01 9.028e+01 9.850e+01 1.789e+02, threshold=1.806e+02, percent-clipped=1.0 2023-11-20 01:23:37,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=883853.3333333334, ans=0.5 2023-11-20 01:23:45,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=883853.3333333334, ans=0.125 2023-11-20 01:23:46,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=883853.3333333334, ans=0.0 2023-11-20 01:23:53,394 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.32 vs. limit=15.0 2023-11-20 01:24:00,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=883920.0, ans=0.09899494936611666 2023-11-20 01:24:03,121 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132600 2023-11-20 01:24:16,365 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 350, loss[loss=0.07832, simple_loss=0.08661, pruned_loss=0.02319, audio_tagging_loss=0.01182, over 14496.00 frames. ], tot_loss[loss=0.08513, simple_loss=0.1041, pruned_loss=0.02133, audio_tagging_loss=0.01174, over 2521817.18 frames. ], batch size: 57, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:24:46,026 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.18 vs. limit=15.0 2023-11-20 01:24:58,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=884253.3333333334, ans=0.1 2023-11-20 01:25:08,198 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132650 2023-11-20 01:25:21,030 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 400, loss[loss=0.09574, simple_loss=0.1169, pruned_loss=0.02511, audio_tagging_loss=0.01217, over 14723.00 frames. ], tot_loss[loss=0.08455, simple_loss=0.1041, pruned_loss=0.02127, audio_tagging_loss=0.01125, over 2639297.42 frames. ], batch size: 54, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:25:30,268 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.352e+01 8.151e+01 8.736e+01 9.522e+01 1.340e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-20 01:25:51,406 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.02 vs. 
limit=15.0 2023-11-20 01:26:03,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=884586.6666666666, ans=0.2 2023-11-20 01:26:13,091 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132700 2023-11-20 01:26:26,604 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 450, loss[loss=0.09147, simple_loss=0.1043, pruned_loss=0.0265, audio_tagging_loss=0.01281, over 14732.00 frames. ], tot_loss[loss=0.08408, simple_loss=0.1037, pruned_loss=0.02122, audio_tagging_loss=0.011, over 2728543.27 frames. ], batch size: 55, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:26:33,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=884720.0, ans=0.125 2023-11-20 01:26:54,382 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.20 vs. limit=10.0 2023-11-20 01:27:17,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=884986.6666666666, ans=0.025 2023-11-20 01:27:18,805 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132750 2023-11-20 01:27:24,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=884986.6666666666, ans=0.125 2023-11-20 01:27:25,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=884986.6666666666, ans=0.0 2023-11-20 01:27:31,599 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 500, loss[loss=0.0642, simple_loss=0.07894, pruned_loss=0.01529, audio_tagging_loss=0.009439, over 15782.00 frames. ], tot_loss[loss=0.08395, simple_loss=0.1039, pruned_loss=0.02121, audio_tagging_loss=0.01078, over 2806867.68 frames. ], batch size: 60, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:27:34,706 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.56 vs. limit=22.5 2023-11-20 01:27:40,020 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.508e+01 8.200e+01 8.678e+01 9.366e+01 1.155e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-20 01:27:57,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=885186.6666666666, ans=0.0 2023-11-20 01:27:59,034 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:28:21,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=885253.3333333334, ans=0.125 2023-11-20 01:28:23,584 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132800 2023-11-20 01:28:36,886 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 550, loss[loss=0.06548, simple_loss=0.08268, pruned_loss=0.01479, audio_tagging_loss=0.00935, over 14316.00 frames. ], tot_loss[loss=0.08374, simple_loss=0.1037, pruned_loss=0.02121, audio_tagging_loss=0.01066, over 2861871.36 frames. ], batch size: 55, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:28:39,807 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.62 vs. 
limit=15.0 2023-11-20 01:28:47,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=885386.6666666666, ans=0.1 2023-11-20 01:28:58,451 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:29:01,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=885520.0, ans=0.125 2023-11-20 01:29:06,722 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.59 vs. limit=15.0 2023-11-20 01:29:28,464 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132850 2023-11-20 01:29:36,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=885653.3333333334, ans=0.2 2023-11-20 01:29:39,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=885653.3333333334, ans=0.035 2023-11-20 01:29:41,438 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 600, loss[loss=0.07844, simple_loss=0.09352, pruned_loss=0.01811, audio_tagging_loss=0.01357, over 16110.00 frames. ], tot_loss[loss=0.08288, simple_loss=0.1026, pruned_loss=0.02098, audio_tagging_loss=0.0106, over 2895126.36 frames. ], batch size: 62, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:29:50,450 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.266e+01 9.047e+01 9.748e+01 1.324e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-20 01:30:06,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=885853.3333333334, ans=15.0 2023-11-20 01:30:24,924 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=12.0 2023-11-20 01:30:25,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=885920.0, ans=0.125 2023-11-20 01:30:32,894 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132900 2023-11-20 01:30:33,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=885986.6666666666, ans=0.0 2023-11-20 01:30:44,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=886053.3333333334, ans=0.2 2023-11-20 01:30:45,505 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 650, loss[loss=0.06865, simple_loss=0.07736, pruned_loss=0.01798, audio_tagging_loss=0.01199, over 15347.00 frames. ], tot_loss[loss=0.08287, simple_loss=0.1025, pruned_loss=0.02105, audio_tagging_loss=0.01055, over 2934872.79 frames. 
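
[Note] The scaling.py:1118 WithLoss entries report an auxiliary penalty attached to attention weights; loss-sum=0.000e+00 means the penalty was inactive on that batch. One way such an attachment can work, sketched with a custom autograd function (the actual scaling.py mechanics may differ):

import torch

class AttachLoss(torch.autograd.Function):
    """Pass x through unchanged while folding aux_loss into the objective."""
    @staticmethod
    def forward(ctx, x, aux_loss, name):
        ctx.loss_shape = aux_loss.shape
        print(f"WithLoss: name={name}, loss-sum={aux_loss.sum().item():.3e}")
        return x

    @staticmethod
    def backward(ctx, grad_out):
        # Gradient 1.0 w.r.t. aux_loss adds the penalty to the training loss.
        return grad_out, torch.ones(ctx.loss_shape, device=grad_out.device), None

# attn = AttachLoss.apply(attn, penalty, "encoder...self_attn_weights")
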
], batch size: 58, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:30:47,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=886053.3333333334, ans=0.025 2023-11-20 01:31:15,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=886186.6666666666, ans=0.0 2023-11-20 01:31:33,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=886253.3333333334, ans=0.125 2023-11-20 01:31:34,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=886253.3333333334, ans=0.0 2023-11-20 01:31:34,802 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=15.0 2023-11-20 01:31:38,604 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 132950 2023-11-20 01:31:43,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=886320.0, ans=0.125 2023-11-20 01:31:46,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=886320.0, ans=0.015 2023-11-20 01:31:51,412 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 700, loss[loss=0.07737, simple_loss=0.09821, pruned_loss=0.01954, audio_tagging_loss=0.008728, over 14107.00 frames. ], tot_loss[loss=0.08277, simple_loss=0.1024, pruned_loss=0.0211, audio_tagging_loss=0.01048, over 2957129.45 frames. ], batch size: 57, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:31:59,868 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.44 vs. limit=6.0 2023-11-20 01:32:00,262 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.620e+01 8.108e+01 8.721e+01 9.361e+01 1.160e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-20 01:32:07,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=886453.3333333334, ans=0.0 2023-11-20 01:32:13,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=886453.3333333334, ans=0.09899494936611666 2023-11-20 01:32:24,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=886520.0, ans=0.0 2023-11-20 01:32:25,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=886520.0, ans=0.125 2023-11-20 01:32:41,780 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0 2023-11-20 01:32:43,725 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133000 2023-11-20 01:32:56,794 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 750, loss[loss=0.0893, simple_loss=0.1181, pruned_loss=0.02013, audio_tagging_loss=0.01015, over 14984.00 frames. ], tot_loss[loss=0.08295, simple_loss=0.1029, pruned_loss=0.02111, audio_tagging_loss=0.01041, over 2971283.17 frames. 
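
[Note] grad_scale in the loss lines is the fp16 loss-scaling factor: it is halved when a step overflows and doubled again after a stretch of stable steps, consistent with the 16 -> 8 -> 16 -> 32 movement over epoch 11 above. The standard PyTorch pattern (constructor values here are illustrative, not the trainer's):

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def fp16_step(optimizer, loss):
    optimizer.zero_grad()
    scaler.scale(loss).backward()  # scale up so fp16 gradients don't underflow
    scaler.step(optimizer)         # skips the update if inf/nan grads are found
    scaler.update()                # halve on overflow, double after stable runs
    return scaler.get_scale()      # the value logged as grad_scale
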
], batch size: 54, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:33:02,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=886720.0, ans=0.07 2023-11-20 01:33:03,772 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.08 vs. limit=22.5 2023-11-20 01:33:09,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=886786.6666666666, ans=0.125 2023-11-20 01:33:15,689 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.60 vs. limit=15.0 2023-11-20 01:33:25,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=886853.3333333334, ans=0.1 2023-11-20 01:33:31,226 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.08 vs. limit=12.0 2023-11-20 01:33:48,638 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133050 2023-11-20 01:34:00,837 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 800, loss[loss=0.08039, simple_loss=0.102, pruned_loss=0.02002, audio_tagging_loss=0.009375, over 15260.00 frames. ], tot_loss[loss=0.0832, simple_loss=0.1033, pruned_loss=0.02112, audio_tagging_loss=0.0104, over 2992528.04 frames. ], batch size: 58, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:34:03,621 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.506e-01 2023-11-20 01:34:10,153 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.686e+01 8.461e+01 9.039e+01 1.027e+02 1.682e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-20 01:34:25,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=887186.6666666666, ans=0.0 2023-11-20 01:34:36,726 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.35 vs. limit=15.0 2023-11-20 01:34:52,121 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133100 2023-11-20 01:35:05,333 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 850, loss[loss=0.08045, simple_loss=0.1098, pruned_loss=0.01924, audio_tagging_loss=0.006314, over 15976.00 frames. ], tot_loss[loss=0.08359, simple_loss=0.1036, pruned_loss=0.02133, audio_tagging_loss=0.01048, over 3004942.39 frames. ], batch size: 58, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:35:28,186 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.43 vs. 
limit=15.0 2023-11-20 01:35:35,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=887520.0, ans=0.0 2023-11-20 01:35:37,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=887520.0, ans=0.125 2023-11-20 01:35:41,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=887520.0, ans=0.0 2023-11-20 01:35:43,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=887586.6666666666, ans=0.0 2023-11-20 01:35:48,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=887586.6666666666, ans=0.125 2023-11-20 01:35:57,668 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133150 2023-11-20 01:35:57,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=887653.3333333334, ans=0.125 2023-11-20 01:35:58,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=887653.3333333334, ans=0.125 2023-11-20 01:36:01,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=887653.3333333334, ans=0.125 2023-11-20 01:36:10,457 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 900, loss[loss=0.07791, simple_loss=0.1019, pruned_loss=0.01856, audio_tagging_loss=0.008373, over 13938.00 frames. ], tot_loss[loss=0.08381, simple_loss=0.1034, pruned_loss=0.02144, audio_tagging_loss=0.01067, over 3009369.42 frames. ], batch size: 53, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:36:10,662 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:36:12,446 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.53 vs. limit=15.0 2023-11-20 01:36:18,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=887720.0, ans=0.0 2023-11-20 01:36:19,050 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.555e+01 8.224e+01 8.986e+01 9.963e+01 2.180e+02, threshold=1.797e+02, percent-clipped=1.0 2023-11-20 01:36:19,346 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:36:47,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=887920.0, ans=0.125 2023-11-20 01:37:01,706 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133200 2023-11-20 01:37:14,465 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 950, loss[loss=0.0666, simple_loss=0.0855, pruned_loss=0.01602, audio_tagging_loss=0.007825, over 14408.00 frames. ], tot_loss[loss=0.08427, simple_loss=0.1041, pruned_loss=0.02175, audio_tagging_loss=0.01045, over 3014632.55 frames. 
], batch size: 55, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:37:14,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=888053.3333333334, ans=0.0 2023-11-20 01:37:25,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=888053.3333333334, ans=0.0 2023-11-20 01:37:26,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=888120.0, ans=0.07 2023-11-20 01:37:53,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=888253.3333333334, ans=0.1 2023-11-20 01:38:05,989 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133250 2023-11-20 01:38:14,572 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.53 vs. limit=15.0 2023-11-20 01:38:16,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=888320.0, ans=0.125 2023-11-20 01:38:19,476 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1000, loss[loss=0.08631, simple_loss=0.1007, pruned_loss=0.02212, audio_tagging_loss=0.01383, over 14842.00 frames. ], tot_loss[loss=0.08419, simple_loss=0.1043, pruned_loss=0.0217, audio_tagging_loss=0.01032, over 3025906.96 frames. ], batch size: 55, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:38:28,772 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.591e+01 8.070e+01 8.953e+01 9.480e+01 1.441e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-20 01:38:33,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=888453.3333333334, ans=0.05 2023-11-20 01:38:33,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=888453.3333333334, ans=0.0 2023-11-20 01:38:46,581 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 01:38:46,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=888520.0, ans=0.125 2023-11-20 01:39:12,092 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133300 2023-11-20 01:39:17,589 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.63 vs. limit=15.0 2023-11-20 01:39:24,750 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1050, loss[loss=0.1098, simple_loss=0.121, pruned_loss=0.03818, audio_tagging_loss=0.01112, over 14610.00 frames. ], tot_loss[loss=0.08463, simple_loss=0.105, pruned_loss=0.02192, audio_tagging_loss=0.01023, over 3034804.48 frames. 
], batch size: 56, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:39:58,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=888853.3333333334, ans=0.125 2023-11-20 01:40:13,339 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.16 vs. limit=10.0 2023-11-20 01:40:17,241 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133350 2023-11-20 01:40:29,551 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1100, loss[loss=0.09809, simple_loss=0.1209, pruned_loss=0.02615, audio_tagging_loss=0.01149, over 15252.00 frames. ], tot_loss[loss=0.08403, simple_loss=0.1041, pruned_loss=0.0218, audio_tagging_loss=0.01019, over 3033408.77 frames. ], batch size: 54, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:40:29,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=889053.3333333334, ans=0.2 2023-11-20 01:40:29,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=889053.3333333334, ans=0.2 2023-11-20 01:40:32,077 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 01:40:33,579 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:40:38,581 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.052e+01 8.709e+01 9.479e+01 1.259e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 01:40:41,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=889120.0, ans=0.125 2023-11-20 01:40:46,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=889120.0, ans=0.0 2023-11-20 01:41:02,821 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.35 vs. 
limit=15.0 2023-11-20 01:41:14,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=889253.3333333334, ans=0.09899494936611666 2023-11-20 01:41:17,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=889253.3333333334, ans=0.0 2023-11-20 01:41:17,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=889253.3333333334, ans=0.0 2023-11-20 01:41:21,061 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133400 2023-11-20 01:41:24,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=889320.0, ans=0.0 2023-11-20 01:41:28,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=889320.0, ans=0.125 2023-11-20 01:41:34,528 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1150, loss[loss=0.1063, simple_loss=0.1387, pruned_loss=0.02823, audio_tagging_loss=0.00877, over 16031.00 frames. ], tot_loss[loss=0.08384, simple_loss=0.1039, pruned_loss=0.02166, audio_tagging_loss=0.01022, over 3038907.16 frames. ], batch size: 55, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:41:46,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=889453.3333333334, ans=10.0 2023-11-20 01:41:50,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=889453.3333333334, ans=0.125 2023-11-20 01:41:55,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=889453.3333333334, ans=0.125 2023-11-20 01:42:04,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=889520.0, ans=0.125 2023-11-20 01:42:10,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=889520.0, ans=0.2 2023-11-20 01:42:25,922 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133450 2023-11-20 01:42:39,271 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1200, loss[loss=0.06949, simple_loss=0.08274, pruned_loss=0.01615, audio_tagging_loss=0.01197, over 15093.00 frames. ], tot_loss[loss=0.08366, simple_loss=0.1041, pruned_loss=0.02158, audio_tagging_loss=0.01005, over 3042267.56 frames. ], batch size: 58, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:42:48,415 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.434e+01 8.251e+01 9.001e+01 9.736e+01 1.493e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-20 01:42:53,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=889786.6666666666, ans=0.2 2023-11-20 01:42:54,023 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.00 vs. limit=22.5 2023-11-20 01:43:11,995 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.38 vs. 
limit=15.0 2023-11-20 01:43:19,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=889920.0, ans=0.125 2023-11-20 01:43:31,710 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133500 2023-11-20 01:43:43,770 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1250, loss[loss=0.07878, simple_loss=0.09288, pruned_loss=0.02017, audio_tagging_loss=0.01217, over 14735.00 frames. ], tot_loss[loss=0.08372, simple_loss=0.1042, pruned_loss=0.02157, audio_tagging_loss=0.01003, over 3037497.29 frames. ], batch size: 54, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:43:44,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=890053.3333333334, ans=0.125 2023-11-20 01:44:35,408 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133550 2023-11-20 01:44:48,086 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1300, loss[loss=0.09702, simple_loss=0.1139, pruned_loss=0.02642, audio_tagging_loss=0.01363, over 15291.00 frames. ], tot_loss[loss=0.08365, simple_loss=0.1041, pruned_loss=0.0216, audio_tagging_loss=0.009986, over 3032134.77 frames. ], batch size: 58, lr: 5.87e-03, grad_scale: 64.0 2023-11-20 01:44:56,905 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.26 vs. limit=15.0 2023-11-20 01:44:57,230 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.375e+01 9.143e+01 9.896e+01 1.258e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-20 01:45:10,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=890453.3333333334, ans=0.1 2023-11-20 01:45:39,246 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133600 2023-11-20 01:45:46,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=890653.3333333334, ans=0.035 2023-11-20 01:45:52,907 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1350, loss[loss=0.08041, simple_loss=0.1005, pruned_loss=0.02088, audio_tagging_loss=0.009257, over 16564.00 frames. ], tot_loss[loss=0.08339, simple_loss=0.1038, pruned_loss=0.02147, audio_tagging_loss=0.01004, over 3037657.49 frames. ], batch size: 59, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:45:55,098 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.87 vs. limit=22.5 2023-11-20 01:45:56,012 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.89 vs. limit=22.5 2023-11-20 01:46:09,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=890786.6666666666, ans=12.0 2023-11-20 01:46:21,982 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.45 vs. 
limit=12.0 2023-11-20 01:46:24,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=890853.3333333334, ans=0.125 2023-11-20 01:46:32,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=890920.0, ans=0.125 2023-11-20 01:46:35,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=890920.0, ans=0.0 2023-11-20 01:46:35,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=890920.0, ans=0.0 2023-11-20 01:46:41,271 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 01:46:45,177 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133650 2023-11-20 01:46:45,736 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0 2023-11-20 01:46:58,779 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1400, loss[loss=0.07005, simple_loss=0.08563, pruned_loss=0.01567, audio_tagging_loss=0.01157, over 14795.00 frames. ], tot_loss[loss=0.08308, simple_loss=0.1036, pruned_loss=0.02127, audio_tagging_loss=0.01003, over 3036985.00 frames. ], batch size: 53, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:47:08,577 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.444e+01 7.927e+01 8.547e+01 9.494e+01 1.207e+02, threshold=1.709e+02, percent-clipped=0.0 2023-11-20 01:47:13,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=891120.0, ans=0.125 2023-11-20 01:47:49,908 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133700 2023-11-20 01:48:02,913 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1450, loss[loss=0.08772, simple_loss=0.11, pruned_loss=0.02171, audio_tagging_loss=0.01098, over 15735.00 frames. ], tot_loss[loss=0.08392, simple_loss=0.1043, pruned_loss=0.02165, audio_tagging_loss=0.01011, over 3040206.37 frames. ], batch size: 60, lr: 5.86e-03, grad_scale: 16.0 2023-11-20 01:48:39,431 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.178e-03 2023-11-20 01:48:45,485 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.29 vs. limit=10.0 2023-11-20 01:48:54,319 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133750 2023-11-20 01:48:56,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=891653.3333333334, ans=0.0 2023-11-20 01:49:02,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=891653.3333333334, ans=0.125 2023-11-20 01:49:05,337 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.24 vs. 
limit=15.0 2023-11-20 01:49:07,033 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1500, loss[loss=0.07754, simple_loss=0.1027, pruned_loss=0.01819, audio_tagging_loss=0.007996, over 14958.00 frames. ], tot_loss[loss=0.0829, simple_loss=0.1029, pruned_loss=0.02124, audio_tagging_loss=0.01019, over 3033793.22 frames. ], batch size: 56, lr: 5.86e-03, grad_scale: 16.0 2023-11-20 01:49:11,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=891720.0, ans=0.0 2023-11-20 01:49:18,850 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 5.912e+01 7.823e+01 8.560e+01 9.381e+01 1.533e+02, threshold=1.712e+02, percent-clipped=0.0 2023-11-20 01:49:32,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=891853.3333333334, ans=0.125 2023-11-20 01:49:42,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=891853.3333333334, ans=0.0 2023-11-20 01:49:43,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=891853.3333333334, ans=0.1 2023-11-20 01:49:49,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=891920.0, ans=0.125 2023-11-20 01:49:59,065 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133800 2023-11-20 01:50:03,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=891986.6666666666, ans=0.125 2023-11-20 01:50:11,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=892053.3333333334, ans=0.1 2023-11-20 01:50:12,773 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1550, loss[loss=0.07941, simple_loss=0.09955, pruned_loss=0.02161, audio_tagging_loss=0.008019, over 16065.00 frames. ], tot_loss[loss=0.08289, simple_loss=0.103, pruned_loss=0.02109, audio_tagging_loss=0.0103, over 3036913.81 frames. ], batch size: 60, lr: 5.86e-03, grad_scale: 16.0 2023-11-20 01:50:24,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=892120.0, ans=0.0 2023-11-20 01:50:33,443 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2023-11-20 01:51:00,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=892253.3333333334, ans=0.07 2023-11-20 01:51:04,192 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133850 2023-11-20 01:51:16,377 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1600, loss[loss=0.07811, simple_loss=0.08924, pruned_loss=0.02285, audio_tagging_loss=0.01064, over 14805.00 frames. ], tot_loss[loss=0.08307, simple_loss=0.103, pruned_loss=0.02114, audio_tagging_loss=0.01044, over 3045417.51 frames. 
], batch size: 55, lr: 5.86e-03, grad_scale: 32.0 2023-11-20 01:51:28,056 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.791e+01 8.075e+01 8.775e+01 9.622e+01 1.213e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-20 01:51:34,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=892453.3333333334, ans=10.0 2023-11-20 01:51:39,353 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.19 vs. limit=12.0 2023-11-20 01:51:44,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=892520.0, ans=0.1 2023-11-20 01:52:09,105 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133900 2023-11-20 01:52:13,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=892653.3333333334, ans=0.125 2023-11-20 01:52:21,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=892720.0, ans=0.125 2023-11-20 01:52:22,025 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1650, loss[loss=0.07881, simple_loss=0.09592, pruned_loss=0.02114, audio_tagging_loss=0.009712, over 15931.00 frames. ], tot_loss[loss=0.08351, simple_loss=0.1038, pruned_loss=0.02122, audio_tagging_loss=0.01037, over 3057207.61 frames. ], batch size: 59, lr: 5.86e-03, grad_scale: 32.0 2023-11-20 01:52:37,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=892786.6666666666, ans=0.0 2023-11-20 01:52:48,553 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.06 vs. limit=15.0 2023-11-20 01:52:58,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=892853.3333333334, ans=0.1 2023-11-20 01:53:13,621 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 133950 2023-11-20 01:53:18,953 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=12.0 2023-11-20 01:53:26,558 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1700, loss[loss=0.08083, simple_loss=0.1053, pruned_loss=0.01786, audio_tagging_loss=0.0103, over 13995.00 frames. ], tot_loss[loss=0.08253, simple_loss=0.1026, pruned_loss=0.02076, audio_tagging_loss=0.01046, over 3052462.67 frames. 
], batch size: 54, lr: 5.86e-03, grad_scale: 32.0 2023-11-20 01:53:29,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=893053.3333333334, ans=0.0 2023-11-20 01:53:38,252 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.118e+01 8.650e+01 9.250e+01 1.178e+02, threshold=1.730e+02, percent-clipped=0.0 2023-11-20 01:54:02,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=893186.6666666666, ans=0.2 2023-11-20 01:54:03,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=893186.6666666666, ans=0.0 2023-11-20 01:54:06,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=893253.3333333334, ans=0.125 2023-11-20 01:54:18,434 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134000 2023-11-20 01:54:25,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=893320.0, ans=0.125 2023-11-20 01:54:31,571 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1750, loss[loss=0.09583, simple_loss=0.1221, pruned_loss=0.02724, audio_tagging_loss=0.007517, over 15485.00 frames. ], tot_loss[loss=0.08205, simple_loss=0.1022, pruned_loss=0.02063, audio_tagging_loss=0.01032, over 3055822.89 frames. ], batch size: 56, lr: 5.86e-03, grad_scale: 32.0 2023-11-20 01:54:31,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=893386.6666666666, ans=0.125 2023-11-20 01:54:41,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=893386.6666666666, ans=0.125 2023-11-20 01:54:52,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=893453.3333333334, ans=0.125 2023-11-20 01:55:10,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=893586.6666666666, ans=0.125 2023-11-20 01:55:14,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=893586.6666666666, ans=0.125 2023-11-20 01:55:23,913 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134050 2023-11-20 01:55:32,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=893653.3333333334, ans=0.2 2023-11-20 01:55:34,350 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.52 vs. limit=15.0 2023-11-20 01:55:36,242 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1800, loss[loss=0.07669, simple_loss=0.1012, pruned_loss=0.01616, audio_tagging_loss=0.009919, over 15449.00 frames. ], tot_loss[loss=0.08274, simple_loss=0.1031, pruned_loss=0.02097, audio_tagging_loss=0.0102, over 3055064.85 frames. 
], batch size: 56, lr: 5.86e-03, grad_scale: 32.0 2023-11-20 01:55:47,812 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.798e+01 8.080e+01 8.622e+01 9.512e+01 1.674e+02, threshold=1.724e+02, percent-clipped=0.0 2023-11-20 01:55:53,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=893786.6666666666, ans=0.0 2023-11-20 01:56:03,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=893853.3333333334, ans=0.0 2023-11-20 01:56:04,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=893853.3333333334, ans=0.125 2023-11-20 01:56:14,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=893920.0, ans=0.125 2023-11-20 01:56:25,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=893920.0, ans=0.0 2023-11-20 01:56:28,513 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134100 2023-11-20 01:56:34,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=893986.6666666666, ans=0.0 2023-11-20 01:56:36,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=893986.6666666666, ans=0.0 2023-11-20 01:56:41,482 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1850, loss[loss=0.08224, simple_loss=0.1081, pruned_loss=0.01618, audio_tagging_loss=0.012, over 15270.00 frames. ], tot_loss[loss=0.08283, simple_loss=0.1034, pruned_loss=0.02102, audio_tagging_loss=0.01013, over 3051091.87 frames. ], batch size: 57, lr: 5.86e-03, grad_scale: 32.0 2023-11-20 01:56:51,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=894053.3333333334, ans=0.1 2023-11-20 01:56:54,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=894120.0, ans=0.125 2023-11-20 01:57:22,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=894253.3333333334, ans=0.0 2023-11-20 01:57:24,681 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.48 vs. limit=15.0 2023-11-20 01:57:33,252 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134150 2023-11-20 01:57:33,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=894320.0, ans=0.125 2023-11-20 01:57:45,559 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1900, loss[loss=0.08284, simple_loss=0.1024, pruned_loss=0.02197, audio_tagging_loss=0.009669, over 14281.00 frames. ], tot_loss[loss=0.08266, simple_loss=0.1032, pruned_loss=0.02098, audio_tagging_loss=0.0101, over 3053206.27 frames. 
], batch size: 55, lr: 5.85e-03, grad_scale: 32.0 2023-11-20 01:57:59,166 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.176e+01 8.935e+01 9.422e+01 1.185e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 01:58:01,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=894453.3333333334, ans=0.0 2023-11-20 01:58:19,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=894520.0, ans=0.1 2023-11-20 01:58:33,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=894586.6666666666, ans=0.125 2023-11-20 01:58:34,714 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.57 vs. limit=22.5 2023-11-20 01:58:38,365 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134200 2023-11-20 01:58:42,055 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2023-11-20 01:58:49,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=894653.3333333334, ans=0.0 2023-11-20 01:58:51,026 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 1950, loss[loss=0.07716, simple_loss=0.09415, pruned_loss=0.0185, audio_tagging_loss=0.01159, over 14353.00 frames. ], tot_loss[loss=0.08243, simple_loss=0.1028, pruned_loss=0.02097, audio_tagging_loss=0.01006, over 3044282.55 frames. ], batch size: 53, lr: 5.85e-03, grad_scale: 16.0 2023-11-20 01:59:05,877 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.62 vs. limit=15.0 2023-11-20 01:59:16,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=894853.3333333334, ans=0.0 2023-11-20 01:59:24,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=894853.3333333334, ans=0.07 2023-11-20 01:59:24,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=894853.3333333334, ans=0.0 2023-11-20 01:59:37,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=894920.0, ans=0.125 2023-11-20 01:59:39,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=894920.0, ans=0.1 2023-11-20 01:59:42,049 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134250 2023-11-20 01:59:47,792 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.28 vs. limit=15.0 2023-11-20 01:59:54,811 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2000, loss[loss=0.06253, simple_loss=0.06679, pruned_loss=0.01179, audio_tagging_loss=0.01734, over 14316.00 frames. ], tot_loss[loss=0.08176, simple_loss=0.1016, pruned_loss=0.02081, audio_tagging_loss=0.01014, over 3048005.83 frames. 
], batch size: 56, lr: 5.85e-03, grad_scale: 32.0 2023-11-20 02:00:08,119 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.403e+01 7.896e+01 8.593e+01 9.449e+01 1.399e+02, threshold=1.719e+02, percent-clipped=0.0 2023-11-20 02:00:24,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=895186.6666666666, ans=0.0 2023-11-20 02:00:47,365 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134300 2023-11-20 02:00:59,929 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2050, loss[loss=0.06667, simple_loss=0.07632, pruned_loss=0.01991, audio_tagging_loss=0.008608, over 13269.00 frames. ], tot_loss[loss=0.08252, simple_loss=0.1026, pruned_loss=0.02114, audio_tagging_loss=0.01009, over 3050293.47 frames. ], batch size: 52, lr: 5.85e-03, grad_scale: 32.0 2023-11-20 02:01:10,922 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.58 vs. limit=15.0 2023-11-20 02:01:17,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=895453.3333333334, ans=0.0 2023-11-20 02:01:20,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=895453.3333333334, ans=0.0 2023-11-20 02:01:30,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=895520.0, ans=0.125 2023-11-20 02:01:36,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=895520.0, ans=0.125 2023-11-20 02:01:51,616 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134350 2023-11-20 02:02:03,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=895653.3333333334, ans=0.0 2023-11-20 02:02:05,142 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2100, loss[loss=0.06547, simple_loss=0.07827, pruned_loss=0.01632, audio_tagging_loss=0.01001, over 16090.00 frames. ], tot_loss[loss=0.08278, simple_loss=0.1033, pruned_loss=0.02116, audio_tagging_loss=0.009963, over 3044163.52 frames. ], batch size: 63, lr: 5.85e-03, grad_scale: 16.0 2023-11-20 02:02:12,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=895720.0, ans=0.95 2023-11-20 02:02:14,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=895720.0, ans=0.025 2023-11-20 02:02:19,145 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.798e+01 8.312e+01 8.947e+01 9.682e+01 1.152e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-20 02:02:19,878 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.87 vs. 
limit=15.0 2023-11-20 02:02:24,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=895786.6666666666, ans=0.125 2023-11-20 02:02:56,470 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134400 2023-11-20 02:02:57,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=895986.6666666666, ans=0.125 2023-11-20 02:03:09,591 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2150, loss[loss=0.1128, simple_loss=0.146, pruned_loss=0.0325, audio_tagging_loss=0.007274, over 14670.00 frames. ], tot_loss[loss=0.08284, simple_loss=0.1033, pruned_loss=0.02116, audio_tagging_loss=0.01003, over 3045144.34 frames. ], batch size: 54, lr: 5.85e-03, grad_scale: 16.0 2023-11-20 02:03:21,055 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2023-11-20 02:03:33,300 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.75 vs. limit=15.0 2023-11-20 02:03:34,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=896186.6666666666, ans=0.025 2023-11-20 02:03:48,931 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:04:00,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=896320.0, ans=0.125 2023-11-20 02:04:01,699 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134450 2023-11-20 02:04:14,506 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2200, loss[loss=0.07166, simple_loss=0.08995, pruned_loss=0.01543, audio_tagging_loss=0.01126, over 15522.00 frames. ], tot_loss[loss=0.08287, simple_loss=0.1037, pruned_loss=0.02106, audio_tagging_loss=0.009948, over 3047177.49 frames. ], batch size: 58, lr: 5.85e-03, grad_scale: 16.0 2023-11-20 02:04:23,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=896386.6666666666, ans=0.0 2023-11-20 02:04:23,817 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.63 vs. 
limit=15.0 2023-11-20 02:04:27,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=896453.3333333334, ans=0.125 2023-11-20 02:04:28,770 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.245e+01 8.199e+01 8.937e+01 9.521e+01 1.153e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 02:04:42,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=896520.0, ans=0.125 2023-11-20 02:04:52,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=896586.6666666666, ans=0.1 2023-11-20 02:05:06,379 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134500 2023-11-20 02:05:18,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=896720.0, ans=0.0 2023-11-20 02:05:19,206 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2250, loss[loss=0.09096, simple_loss=0.1106, pruned_loss=0.02536, audio_tagging_loss=0.0103, over 14831.00 frames. ], tot_loss[loss=0.08248, simple_loss=0.1032, pruned_loss=0.02096, audio_tagging_loss=0.009931, over 3044634.25 frames. ], batch size: 57, lr: 5.85e-03, grad_scale: 16.0 2023-11-20 02:05:20,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=896720.0, ans=0.125 2023-11-20 02:05:24,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=896720.0, ans=0.125 2023-11-20 02:05:35,682 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=12.0 2023-11-20 02:05:51,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=896853.3333333334, ans=0.125 2023-11-20 02:06:10,901 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134550 2023-11-20 02:06:19,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=896986.6666666666, ans=0.125 2023-11-20 02:06:24,384 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2300, loss[loss=0.09374, simple_loss=0.1173, pruned_loss=0.0257, audio_tagging_loss=0.009404, over 14432.00 frames. ], tot_loss[loss=0.08202, simple_loss=0.1023, pruned_loss=0.02087, audio_tagging_loss=0.01001, over 3047241.31 frames. ], batch size: 55, lr: 5.85e-03, grad_scale: 16.0 2023-11-20 02:06:24,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=897053.3333333334, ans=0.0 2023-11-20 02:06:38,518 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.171e+01 8.997e+01 9.810e+01 1.855e+02, threshold=1.799e+02, percent-clipped=1.0 2023-11-20 02:06:42,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=897120.0, ans=0.125 2023-11-20 02:06:46,608 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.29 vs. limit=15.0 2023-11-20 02:06:51,126 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.94 vs. 
limit=12.0 2023-11-20 02:07:15,457 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134600 2023-11-20 02:07:20,759 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:07:24,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=897320.0, ans=0.0 2023-11-20 02:07:26,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=897320.0, ans=0.125 2023-11-20 02:07:28,691 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2350, loss[loss=0.0965, simple_loss=0.1248, pruned_loss=0.02459, audio_tagging_loss=0.009521, over 15548.00 frames. ], tot_loss[loss=0.08245, simple_loss=0.1028, pruned_loss=0.02099, audio_tagging_loss=0.01005, over 3054545.08 frames. ], batch size: 56, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:08:06,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=897586.6666666666, ans=0.0 2023-11-20 02:08:19,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=897653.3333333334, ans=0.125 2023-11-20 02:08:20,701 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134650 2023-11-20 02:08:22,453 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0 2023-11-20 02:08:29,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=897653.3333333334, ans=0.1 2023-11-20 02:08:33,547 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2400, loss[loss=0.07218, simple_loss=0.08814, pruned_loss=0.01468, audio_tagging_loss=0.01343, over 14829.00 frames. ], tot_loss[loss=0.08167, simple_loss=0.1015, pruned_loss=0.02064, audio_tagging_loss=0.01029, over 3049870.91 frames. ], batch size: 57, lr: 5.84e-03, grad_scale: 32.0 2023-11-20 02:08:33,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=897720.0, ans=0.1 2023-11-20 02:08:35,556 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.00 vs. limit=12.0 2023-11-20 02:08:36,931 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. 
limit=15.0 2023-11-20 02:08:39,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=897720.0, ans=0.04949747468305833 2023-11-20 02:08:46,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=897786.6666666666, ans=0.1 2023-11-20 02:08:47,721 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.416e+01 8.072e+01 8.719e+01 9.774e+01 1.313e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-20 02:08:53,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=897786.6666666666, ans=0.125 2023-11-20 02:09:01,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=897853.3333333334, ans=0.125 2023-11-20 02:09:15,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=897920.0, ans=0.0 2023-11-20 02:09:24,709 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134700 2023-11-20 02:09:26,022 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:09:28,983 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.02 vs. limit=22.5 2023-11-20 02:09:36,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=898053.3333333334, ans=0.5 2023-11-20 02:09:37,392 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2450, loss[loss=0.07801, simple_loss=0.08802, pruned_loss=0.02246, audio_tagging_loss=0.01155, over 14967.00 frames. ], tot_loss[loss=0.08283, simple_loss=0.1028, pruned_loss=0.02107, audio_tagging_loss=0.01034, over 3050292.49 frames. ], batch size: 57, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:09:43,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=898053.3333333334, ans=10.0 2023-11-20 02:09:46,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=898053.3333333334, ans=0.0 2023-11-20 02:10:08,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=898186.6666666666, ans=0.2 2023-11-20 02:10:18,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=898253.3333333334, ans=0.025 2023-11-20 02:10:25,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=898253.3333333334, ans=0.1 2023-11-20 02:10:29,979 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134750 2023-11-20 02:10:35,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=898320.0, ans=15.0 2023-11-20 02:10:36,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=898320.0, ans=0.1 2023-11-20 02:10:42,742 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2500, loss[loss=0.08228, simple_loss=0.1093, pruned_loss=0.01783, audio_tagging_loss=0.009777, over 15370.00 frames. 
], tot_loss[loss=0.08249, simple_loss=0.1024, pruned_loss=0.02085, audio_tagging_loss=0.01044, over 3052125.87 frames. ], batch size: 56, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:10:45,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=898386.6666666666, ans=0.1 2023-11-20 02:10:50,590 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.96 vs. limit=15.0 2023-11-20 02:10:56,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=898453.3333333334, ans=0.125 2023-11-20 02:10:57,312 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.817e+01 7.993e+01 8.796e+01 9.570e+01 1.207e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 02:11:13,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=898520.0, ans=0.0 2023-11-20 02:11:34,157 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134800 2023-11-20 02:11:35,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=898653.3333333334, ans=0.07 2023-11-20 02:11:37,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=898653.3333333334, ans=0.125 2023-11-20 02:11:46,895 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2550, loss[loss=0.07395, simple_loss=0.09703, pruned_loss=0.01559, audio_tagging_loss=0.009849, over 15707.00 frames. ], tot_loss[loss=0.08215, simple_loss=0.1021, pruned_loss=0.02085, audio_tagging_loss=0.01024, over 3049043.20 frames. ], batch size: 59, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:12:39,677 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134850 2023-11-20 02:12:52,590 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2600, loss[loss=0.08734, simple_loss=0.1145, pruned_loss=0.02281, audio_tagging_loss=0.007266, over 15546.00 frames. ], tot_loss[loss=0.0816, simple_loss=0.1015, pruned_loss=0.02074, audio_tagging_loss=0.01009, over 3046872.80 frames. ], batch size: 54, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:13:06,089 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.89 vs. 
limit=15.0 2023-11-20 02:13:08,523 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.034e+01 8.150e+01 8.781e+01 9.502e+01 1.826e+02, threshold=1.756e+02, percent-clipped=1.0 2023-11-20 02:13:16,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=899120.0, ans=0.0 2023-11-20 02:13:16,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=899120.0, ans=0.2 2023-11-20 02:13:28,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=899186.6666666666, ans=0.0 2023-11-20 02:13:32,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=899253.3333333334, ans=0.125 2023-11-20 02:13:41,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=899253.3333333334, ans=0.0 2023-11-20 02:13:44,909 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134900 2023-11-20 02:13:58,381 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2650, loss[loss=0.04206, simple_loss=0.0366, pruned_loss=0.007012, audio_tagging_loss=0.01675, over 15309.00 frames. ], tot_loss[loss=0.08112, simple_loss=0.1008, pruned_loss=0.02055, audio_tagging_loss=0.01017, over 3049233.41 frames. ], batch size: 61, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:14:10,360 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.69 vs. limit=8.0 2023-11-20 02:14:49,804 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 134950 2023-11-20 02:14:58,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=899653.3333333334, ans=0.125 2023-11-20 02:15:02,144 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2700, loss[loss=0.07768, simple_loss=0.09893, pruned_loss=0.02119, audio_tagging_loss=0.007017, over 13642.00 frames. ], tot_loss[loss=0.08123, simple_loss=0.101, pruned_loss=0.02066, audio_tagging_loss=0.01008, over 3042769.54 frames. ], batch size: 57, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:15:08,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=899720.0, ans=0.125 2023-11-20 02:15:18,165 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.493e+01 8.480e+01 9.136e+01 9.839e+01 1.399e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-20 02:15:19,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=899786.6666666666, ans=0.1 2023-11-20 02:15:37,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=899853.3333333334, ans=0.07 2023-11-20 02:15:53,143 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.31 vs. limit=15.0 2023-11-20 02:15:54,270 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135000 2023-11-20 02:16:07,434 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2750, loss[loss=0.1092, simple_loss=0.1462, pruned_loss=0.02856, audio_tagging_loss=0.007566, over 16546.00 frames. 
], tot_loss[loss=0.08172, simple_loss=0.1016, pruned_loss=0.02082, audio_tagging_loss=0.01011, over 3042560.61 frames. ], batch size: 58, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:16:07,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=900053.3333333334, ans=0.05 2023-11-20 02:16:11,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=900053.3333333334, ans=0.125 2023-11-20 02:16:11,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=900053.3333333334, ans=0.0 2023-11-20 02:16:11,819 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.23 vs. limit=15.0 2023-11-20 02:16:27,786 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0 2023-11-20 02:16:34,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=900186.6666666666, ans=0.125 2023-11-20 02:16:37,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=900186.6666666666, ans=0.0 2023-11-20 02:16:42,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=900186.6666666666, ans=0.0 2023-11-20 02:16:53,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=900253.3333333334, ans=0.125 2023-11-20 02:16:59,872 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135050 2023-11-20 02:17:03,535 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:17:04,141 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0 2023-11-20 02:17:04,202 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.93 vs. limit=15.0 2023-11-20 02:17:12,193 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2800, loss[loss=0.05928, simple_loss=0.06405, pruned_loss=0.01417, audio_tagging_loss=0.01308, over 15424.00 frames. ], tot_loss[loss=0.08105, simple_loss=0.1006, pruned_loss=0.02055, audio_tagging_loss=0.0102, over 3038082.82 frames. 
], batch size: 60, lr: 5.83e-03, grad_scale: 32.0 2023-11-20 02:17:20,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=900386.6666666666, ans=0.07 2023-11-20 02:17:25,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=900453.3333333334, ans=0.125 2023-11-20 02:17:28,019 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 8.003e+01 8.693e+01 9.427e+01 1.214e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 02:17:28,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=900453.3333333334, ans=0.07 2023-11-20 02:17:30,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=900453.3333333334, ans=0.125 2023-11-20 02:18:04,411 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135100 2023-11-20 02:18:10,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=900653.3333333334, ans=0.0 2023-11-20 02:18:17,026 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.68 vs. limit=12.0 2023-11-20 02:18:17,509 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2850, loss[loss=0.08087, simple_loss=0.09535, pruned_loss=0.02339, audio_tagging_loss=0.009808, over 14414.00 frames. ], tot_loss[loss=0.08132, simple_loss=0.1009, pruned_loss=0.02066, audio_tagging_loss=0.0102, over 3032639.15 frames. ], batch size: 53, lr: 5.83e-03, grad_scale: 32.0 2023-11-20 02:18:46,674 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.23 vs. limit=12.0 2023-11-20 02:19:07,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=900920.0, ans=0.2 2023-11-20 02:19:08,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=900986.6666666666, ans=0.1 2023-11-20 02:19:09,309 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135150 2023-11-20 02:19:22,087 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2900, loss[loss=0.07986, simple_loss=0.0889, pruned_loss=0.02014, audio_tagging_loss=0.01526, over 14765.00 frames. ], tot_loss[loss=0.08168, simple_loss=0.1013, pruned_loss=0.02082, audio_tagging_loss=0.01022, over 3031625.49 frames. 
], batch size: 56, lr: 5.83e-03, grad_scale: 32.0 2023-11-20 02:19:22,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=901053.3333333334, ans=0.1 2023-11-20 02:19:26,635 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:19:30,454 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:19:37,586 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.047e+01 8.169e+01 9.046e+01 9.875e+01 2.052e+02, threshold=1.809e+02, percent-clipped=1.0 2023-11-20 02:20:13,518 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135200 2023-11-20 02:20:26,774 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 2950, loss[loss=0.08262, simple_loss=0.1141, pruned_loss=0.01863, audio_tagging_loss=0.006922, over 15460.00 frames. ], tot_loss[loss=0.08214, simple_loss=0.1018, pruned_loss=0.02099, audio_tagging_loss=0.01024, over 3038731.60 frames. ], batch size: 56, lr: 5.83e-03, grad_scale: 32.0 2023-11-20 02:20:41,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=901453.3333333334, ans=0.0 2023-11-20 02:21:10,711 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0 2023-11-20 02:21:15,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=901586.6666666666, ans=0.1 2023-11-20 02:21:16,793 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.45 vs. limit=15.0 2023-11-20 02:21:18,793 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135250 2023-11-20 02:21:30,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=901720.0, ans=0.125 2023-11-20 02:21:31,815 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3000, loss[loss=0.07374, simple_loss=0.08945, pruned_loss=0.0156, audio_tagging_loss=0.01342, over 14129.00 frames. ], tot_loss[loss=0.08206, simple_loss=0.1014, pruned_loss=0.02099, audio_tagging_loss=0.01036, over 3039186.25 frames. ], batch size: 54, lr: 5.83e-03, grad_scale: 16.0 2023-11-20 02:21:31,816 INFO [train_asr.py:1285] (2/4) Computing validation loss 2023-11-20 02:21:56,135 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.4199, 6.0534, 6.4026, 5.9604], device='cuda:2') 2023-11-20 02:22:13,527 INFO [train_asr.py:1294] (2/4) Epoch 12, validation: loss=0.0631, simple_loss=0.05442, pruned_loss=0.006068, audio_tagging_loss=0.02982, over 4681554.00 frames. 2023-11-20 02:22:13,528 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB 2023-11-20 02:22:22,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=901720.0, ans=0.05 2023-11-20 02:22:24,926 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.41 vs. 
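limit=15.0

The optim.py:476 records report quartiles of recent gradient norms together with the clipping threshold. Throughout this section the threshold equals Clipping_scale times the median quartile (in the record above, 2.0 * 9.046e+01 = 1.809e+02), so the clipper appears to track a running median of grad norms; percent-clipped reports how often the threshold bites. A sketch under that reading; the window size is an assumption, and in icefall this bookkeeping lives inside ScaledAdam rather than a standalone class:

```python
import collections
import statistics

# Sketch (assumption-labelled, not icefall's ScaledAdam): track recent
# gradient norms and clip at clipping_scale * median, matching the pattern
# in the optim.py log lines (e.g. 2.0 * 9.046e+01 ~= 1.809e+02).
class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = collections.deque(maxlen=window)  # recent grad norms

    def threshold(self) -> float:
        return self.clipping_scale * statistics.median(self.norms)

    def clip(self, grad_norm: float) -> float:
        """Return the scale factor to apply to the gradient (1.0 = unclipped)."""
        self.norms.append(grad_norm)
        return min(1.0, self.threshold() / grad_norm)
```
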
2023-11-20 02:22:29,864 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.617e+01 8.337e+01 8.935e+01 9.823e+01 2.024e+02, threshold=1.787e+02, percent-clipped=1.0 2023-11-20 02:22:36,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=901786.6666666666, ans=0.125 2023-11-20 02:22:43,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=901853.3333333334, ans=0.2 2023-11-20 02:23:05,052 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135300 2023-11-20 02:23:11,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=901986.6666666666, ans=0.2 2023-11-20 02:23:17,190 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3050, loss[loss=0.08362, simple_loss=0.1104, pruned_loss=0.02182, audio_tagging_loss=0.006589, over 14836.00 frames. ], tot_loss[loss=0.08253, simple_loss=0.1022, pruned_loss=0.02121, audio_tagging_loss=0.01024, over 3037287.72 frames. ], batch size: 56, lr: 5.83e-03, grad_scale: 16.0 2023-11-20 02:23:41,441 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:23:56,511 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:23:57,203 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2023-11-20 02:23:58,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=902253.3333333334, ans=0.125 2023-11-20 02:24:03,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=902253.3333333334, ans=0.0 2023-11-20 02:24:09,912 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135350 2023-11-20 02:24:22,081 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3100, loss[loss=0.09382, simple_loss=0.1147, pruned_loss=0.02445, audio_tagging_loss=0.01199, over 15461.00 frames. ], tot_loss[loss=0.08312, simple_loss=0.103, pruned_loss=0.02136, audio_tagging_loss=0.01025, over 3040477.40 frames. ], batch size: 57, lr: 5.83e-03, grad_scale: 16.0 2023-11-20 02:24:39,886 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.062e+01 8.132e+01 9.002e+01 1.001e+02 1.327e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-20 02:24:48,097 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.64 vs.
limit=15.0 2023-11-20 02:24:50,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=902520.0, ans=0.2 2023-11-20 02:24:52,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=902520.0, ans=0.125 2023-11-20 02:25:04,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=902586.6666666666, ans=0.125 2023-11-20 02:25:08,573 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.91 vs. limit=10.0 2023-11-20 02:25:14,138 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135400 2023-11-20 02:25:27,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=902720.0, ans=0.125 2023-11-20 02:25:27,972 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3150, loss[loss=0.09137, simple_loss=0.1192, pruned_loss=0.02433, audio_tagging_loss=0.007445, over 16075.00 frames. ], tot_loss[loss=0.08352, simple_loss=0.1036, pruned_loss=0.02143, audio_tagging_loss=0.01029, over 3044470.00 frames. ], batch size: 59, lr: 5.83e-03, grad_scale: 16.0 2023-11-20 02:25:35,682 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.91 vs. limit=10.0 2023-11-20 02:26:03,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=902853.3333333334, ans=0.2 2023-11-20 02:26:19,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=902920.0, ans=15.0 2023-11-20 02:26:20,995 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135450 2023-11-20 02:26:22,711 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=15.0 2023-11-20 02:26:31,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=902986.6666666666, ans=0.125 2023-11-20 02:26:33,304 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3200, loss[loss=0.08644, simple_loss=0.1037, pruned_loss=0.02435, audio_tagging_loss=0.01024, over 14773.00 frames. ], tot_loss[loss=0.08452, simple_loss=0.1048, pruned_loss=0.02181, audio_tagging_loss=0.0103, over 3053268.65 frames. 
], batch size: 55, lr: 5.83e-03, grad_scale: 32.0 2023-11-20 02:26:40,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=903053.3333333334, ans=0.1 2023-11-20 02:26:45,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=903120.0, ans=15.0 2023-11-20 02:26:49,825 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.506e+01 8.472e+01 8.960e+01 9.869e+01 1.272e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-20 02:26:51,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=903120.0, ans=0.125 2023-11-20 02:26:58,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=903186.6666666666, ans=0.125 2023-11-20 02:26:58,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=903186.6666666666, ans=0.0 2023-11-20 02:26:59,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=903186.6666666666, ans=0.1 2023-11-20 02:27:09,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=903186.6666666666, ans=0.125 2023-11-20 02:27:25,472 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135500 2023-11-20 02:27:25,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=903320.0, ans=0.1 2023-11-20 02:27:31,827 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.07 vs. limit=15.0 2023-11-20 02:27:33,682 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:27:38,273 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3250, loss[loss=0.08772, simple_loss=0.1152, pruned_loss=0.02272, audio_tagging_loss=0.007403, over 16981.00 frames. ], tot_loss[loss=0.0843, simple_loss=0.1046, pruned_loss=0.02171, audio_tagging_loss=0.01028, over 3056324.64 frames. ], batch size: 63, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:27:42,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=903386.6666666666, ans=0.09899494936611666 2023-11-20 02:27:49,880 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.59 vs. 
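limit=15.0

Each scaling.py:213 record reports a ScheduledFloat by name, the global batch_count, and its current value (ans). In icefall's scaling.py a ScheduledFloat is piecewise-linear in the batch count, which is why quantities such as skip rates and dropout probabilities drift as training proceeds. A sketch of the idea; the breakpoints below are invented for illustration, since each module in the log defines its own schedule:

```python
# Sketch of the ScheduledFloat idea from icefall's scaling.py: a value that
# is piecewise-linear in the global batch count. The breakpoints below are
# made up for illustration.
class ScheduledFloatSketch:
    def __init__(self, *points: tuple[float, float]):
        # points: (batch_count, value) pairs
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                frac = (batch_count - x0) / (x1 - x0)
                return y0 + frac * (y1 - y0)

# e.g. a dropout that anneals from 0.3 to 0.1 over the first 20k batches,
# then stays flat; at the batch counts logged here (~9e5) it reports 0.1.
sched = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
print(sched.value(903120.0))  # -> 0.1
```
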
2023-11-20 02:28:05,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=903520.0, ans=0.0 2023-11-20 02:28:10,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=903520.0, ans=0.0 2023-11-20 02:28:10,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=903520.0, ans=0.1 2023-11-20 02:28:11,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=903520.0, ans=0.0 2023-11-20 02:28:14,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=903520.0, ans=0.0 2023-11-20 02:28:23,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=903586.6666666666, ans=0.0 2023-11-20 02:28:30,492 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135550 2023-11-20 02:28:43,516 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3300, loss[loss=0.1007, simple_loss=0.1349, pruned_loss=0.02618, audio_tagging_loss=0.007028, over 15625.00 frames. ], tot_loss[loss=0.08401, simple_loss=0.1043, pruned_loss=0.02158, audio_tagging_loss=0.01029, over 3055702.95 frames. ], batch size: 56, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:28:54,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=903720.0, ans=0.125 2023-11-20 02:29:00,609 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.558e+01 8.302e+01 8.686e+01 9.682e+01 1.210e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 02:29:11,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=903853.3333333334, ans=0.1 2023-11-20 02:29:19,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=903853.3333333334, ans=0.0 2023-11-20 02:29:34,505 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.44 vs. limit=22.5 2023-11-20 02:29:36,507 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135600 2023-11-20 02:29:48,789 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3350, loss[loss=0.08545, simple_loss=0.1039, pruned_loss=0.0229, audio_tagging_loss=0.0106, over 15260.00 frames. ], tot_loss[loss=0.08442, simple_loss=0.1049, pruned_loss=0.02186, audio_tagging_loss=0.01013, over 3058338.16 frames. ], batch size: 57, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:29:51,741 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=22.5 2023-11-20 02:30:24,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=904186.6666666666, ans=0.2 2023-11-20 02:30:39,798 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135650 2023-11-20 02:30:52,701 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3400, loss[loss=0.08658, simple_loss=0.1185, pruned_loss=0.01818, audio_tagging_loss=0.009151, over 15583.00 frames. ], tot_loss[loss=0.08395, simple_loss=0.1044, pruned_loss=0.02176, audio_tagging_loss=0.01001, over 3052947.65 frames.
], batch size: 56, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:30:54,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=904386.6666666666, ans=0.05 2023-11-20 02:30:54,677 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0 2023-11-20 02:31:09,372 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.968e+01 8.145e+01 8.862e+01 9.550e+01 1.351e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-20 02:31:12,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=904453.3333333334, ans=0.0 2023-11-20 02:31:22,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=904520.0, ans=0.0 2023-11-20 02:31:29,702 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=904520.0, ans=0.125 2023-11-20 02:31:31,143 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.55 vs. limit=22.5 2023-11-20 02:31:40,320 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.02 vs. limit=22.5 2023-11-20 02:31:44,813 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135700 2023-11-20 02:31:52,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=904653.3333333334, ans=0.1 2023-11-20 02:31:54,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=904653.3333333334, ans=0.1 2023-11-20 02:31:54,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=904653.3333333334, ans=0.07 2023-11-20 02:31:57,664 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3450, loss[loss=0.09477, simple_loss=0.1209, pruned_loss=0.02711, audio_tagging_loss=0.00721, over 15719.00 frames. ], tot_loss[loss=0.08308, simple_loss=0.1035, pruned_loss=0.02141, audio_tagging_loss=0.009917, over 3049417.29 frames. ], batch size: 56, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:32:17,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=904786.6666666666, ans=0.05 2023-11-20 02:32:19,430 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2023-11-20 02:32:46,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=904920.0, ans=0.1 2023-11-20 02:32:49,592 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135750 2023-11-20 02:33:03,094 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3500, loss[loss=0.07784, simple_loss=0.09683, pruned_loss=0.01907, audio_tagging_loss=0.01036, over 15184.00 frames. ], tot_loss[loss=0.08213, simple_loss=0.1022, pruned_loss=0.0211, audio_tagging_loss=0.009956, over 3043946.15 frames. 
], batch size: 56, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:33:08,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=905053.3333333334, ans=0.0 2023-11-20 02:33:19,500 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.042e+01 8.771e+01 9.579e+01 1.310e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 02:33:36,436 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:33:51,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=905253.3333333334, ans=0.125 2023-11-20 02:33:55,116 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135800 2023-11-20 02:34:08,294 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3550, loss[loss=0.08179, simple_loss=0.1017, pruned_loss=0.0232, audio_tagging_loss=0.007758, over 15516.00 frames. ], tot_loss[loss=0.08243, simple_loss=0.1026, pruned_loss=0.02122, audio_tagging_loss=0.00992, over 3048791.60 frames. ], batch size: 57, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:34:31,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=905453.3333333334, ans=0.125 2023-11-20 02:34:59,950 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135850 2023-11-20 02:35:12,752 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3600, loss[loss=0.07002, simple_loss=0.09134, pruned_loss=0.01619, audio_tagging_loss=0.008161, over 15621.00 frames. ], tot_loss[loss=0.08215, simple_loss=0.1024, pruned_loss=0.02103, audio_tagging_loss=0.009914, over 3042014.09 frames. ], batch size: 58, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:35:29,269 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.874e+01 8.123e+01 9.173e+01 1.010e+02 1.525e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-20 02:35:48,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=905853.3333333334, ans=0.125 2023-11-20 02:36:04,352 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135900 2023-11-20 02:36:15,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=905986.6666666666, ans=0.2 2023-11-20 02:36:17,252 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3650, loss[loss=0.08979, simple_loss=0.1188, pruned_loss=0.02415, audio_tagging_loss=0.006214, over 16057.00 frames. ], tot_loss[loss=0.08227, simple_loss=0.1026, pruned_loss=0.02108, audio_tagging_loss=0.009888, over 3045676.61 frames. ], batch size: 56, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:36:22,562 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=15.38 vs. 
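limit=15.0

The recurring WARNING records (train_asr.py:1506) drop AudioSet placeholder cuts whose encoder output would be shorter than their token sequence: 100 input frames subsample to 23, which cannot align to 24 tokens under the transducer loss. The subsampling arithmetic below reproduces the logged numbers and matches the usual icefall zipformer convention, though the helper is our reconstruction of the filter, not verified source:

```python
# Sketch of the filter behind the "Exclude cut" warnings. The formula
# reproduces the numbers in the log (100 frames -> 23); treat the helper
# names as ours, not icefall's.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A cut is unusable for the transducer loss if the encoder output is
    # shorter than the token sequence it must align to.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert keep_cut(100, 24) is False  # exactly the case warned about here
```
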
2023-11-20 02:36:25,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=906053.3333333334, ans=0.0 2023-11-20 02:36:42,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=906186.6666666666, ans=0.0 2023-11-20 02:36:55,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=906253.3333333334, ans=0.0 2023-11-20 02:37:07,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=906253.3333333334, ans=0.125 2023-11-20 02:37:08,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=906320.0, ans=0.1 2023-11-20 02:37:09,368 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 135950 2023-11-20 02:37:22,738 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3700, loss[loss=0.09363, simple_loss=0.1183, pruned_loss=0.02397, audio_tagging_loss=0.01052, over 15732.00 frames. ], tot_loss[loss=0.08238, simple_loss=0.1026, pruned_loss=0.0212, audio_tagging_loss=0.009872, over 3046139.71 frames. ], batch size: 58, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:37:24,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=906386.6666666666, ans=0.0 2023-11-20 02:37:31,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=906386.6666666666, ans=0.125 2023-11-20 02:37:38,494 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.777e+01 8.262e+01 8.837e+01 9.420e+01 1.280e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-20 02:37:40,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=906453.3333333334, ans=0.125 2023-11-20 02:37:47,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=906520.0, ans=0.2 2023-11-20 02:38:11,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=906586.6666666666, ans=0.125 2023-11-20 02:38:13,895 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136000 2023-11-20 02:38:14,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=906653.3333333334, ans=0.0 2023-11-20 02:38:29,423 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3750, loss[loss=0.07093, simple_loss=0.08728, pruned_loss=0.01676, audio_tagging_loss=0.01053, over 14668.00 frames. ], tot_loss[loss=0.08302, simple_loss=0.1031, pruned_loss=0.02149, audio_tagging_loss=0.00999, over 3044424.11 frames.
], batch size: 57, lr: 5.81e-03, grad_scale: 16.0 2023-11-20 02:38:36,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=906720.0, ans=0.2 2023-11-20 02:38:56,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=906853.3333333334, ans=0.1 2023-11-20 02:39:08,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=906920.0, ans=0.0 2023-11-20 02:39:16,267 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:39:19,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=906920.0, ans=0.09899494936611666 2023-11-20 02:39:21,896 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136050 2023-11-20 02:39:34,042 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.35 vs. limit=15.0 2023-11-20 02:39:34,602 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3800, loss[loss=0.07321, simple_loss=0.08712, pruned_loss=0.01932, audio_tagging_loss=0.01033, over 14084.00 frames. ], tot_loss[loss=0.08226, simple_loss=0.1022, pruned_loss=0.02107, audio_tagging_loss=0.01009, over 3046050.27 frames. ], batch size: 55, lr: 5.81e-03, grad_scale: 16.0 2023-11-20 02:39:52,989 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.871e+01 8.426e+01 8.980e+01 9.690e+01 1.284e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-20 02:39:53,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=907120.0, ans=0.125 2023-11-20 02:39:55,009 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.04 vs. limit=15.0 2023-11-20 02:40:06,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=907186.6666666666, ans=0.0 2023-11-20 02:40:17,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=907253.3333333334, ans=0.125 2023-11-20 02:40:26,352 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136100 2023-11-20 02:40:26,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=907320.0, ans=0.125 2023-11-20 02:40:39,685 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3850, loss[loss=0.1042, simple_loss=0.129, pruned_loss=0.03, audio_tagging_loss=0.009719, over 15409.00 frames. ], tot_loss[loss=0.08253, simple_loss=0.1022, pruned_loss=0.0211, audio_tagging_loss=0.01033, over 3048557.07 frames. 
], batch size: 57, lr: 5.81e-03, grad_scale: 16.0 2023-11-20 02:40:43,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=907386.6666666666, ans=0.125 2023-11-20 02:40:46,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=907386.6666666666, ans=0.125 2023-11-20 02:41:10,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=907520.0, ans=0.125 2023-11-20 02:41:10,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=907520.0, ans=0.025 2023-11-20 02:41:22,038 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.52 vs. limit=15.0 2023-11-20 02:41:25,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=907586.6666666666, ans=0.125 2023-11-20 02:41:31,411 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136150 2023-11-20 02:41:42,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=907720.0, ans=0.1 2023-11-20 02:41:43,658 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3900, loss[loss=0.08365, simple_loss=0.1058, pruned_loss=0.01821, audio_tagging_loss=0.01252, over 15665.00 frames. ], tot_loss[loss=0.08248, simple_loss=0.1024, pruned_loss=0.021, audio_tagging_loss=0.01029, over 3048515.81 frames. ], batch size: 58, lr: 5.81e-03, grad_scale: 16.0 2023-11-20 02:41:43,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=907720.0, ans=0.0 2023-11-20 02:42:02,319 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.823e+01 8.198e+01 8.910e+01 9.669e+01 1.262e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-20 02:42:12,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=907853.3333333334, ans=0.125 2023-11-20 02:42:15,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=907853.3333333334, ans=0.125 2023-11-20 02:42:21,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=907853.3333333334, ans=0.09899494936611666 2023-11-20 02:42:25,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=907920.0, ans=0.125 2023-11-20 02:42:31,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=907920.0, ans=0.1 2023-11-20 02:42:35,727 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136200 2023-11-20 02:42:41,749 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.99 vs. limit=10.0 2023-11-20 02:42:49,269 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 3950, loss[loss=0.07565, simple_loss=0.09923, pruned_loss=0.01694, audio_tagging_loss=0.00909, over 15481.00 frames. ], tot_loss[loss=0.08211, simple_loss=0.1021, pruned_loss=0.02067, audio_tagging_loss=0.0104, over 3052850.63 frames. 
], batch size: 58, lr: 5.81e-03, grad_scale: 16.0 2023-11-20 02:42:50,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=908053.3333333334, ans=0.0 2023-11-20 02:43:10,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=908120.0, ans=0.125 2023-11-20 02:43:12,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=908120.0, ans=0.0 2023-11-20 02:43:28,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=908253.3333333334, ans=0.125 2023-11-20 02:43:31,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=908253.3333333334, ans=0.1 2023-11-20 02:43:37,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=908253.3333333334, ans=0.1 2023-11-20 02:43:40,669 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136250 2023-11-20 02:43:43,579 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=15.0 2023-11-20 02:43:51,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=908386.6666666666, ans=0.1 2023-11-20 02:43:52,742 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4000, loss[loss=0.1271, simple_loss=0.1544, pruned_loss=0.04104, audio_tagging_loss=0.008827, over 15420.00 frames. ], tot_loss[loss=0.0826, simple_loss=0.1025, pruned_loss=0.02094, audio_tagging_loss=0.0104, over 3050572.58 frames. ], batch size: 58, lr: 5.81e-03, grad_scale: 32.0 2023-11-20 02:43:53,423 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0 2023-11-20 02:44:03,304 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.02 vs. limit=10.0 2023-11-20 02:44:03,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=908386.6666666666, ans=0.125 2023-11-20 02:44:06,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=908453.3333333334, ans=0.025 2023-11-20 02:44:11,155 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.441e+01 8.168e+01 8.877e+01 9.706e+01 2.567e+02, threshold=1.775e+02, percent-clipped=1.0 2023-11-20 02:44:21,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=908520.0, ans=0.0 2023-11-20 02:44:22,140 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.17 vs. 
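limit=15.0

The lr field decays slowly across this stretch (5.84e-03 down toward 5.79e-03), consistent with icefall's Eden schedule, which scales the base learning rate by ((batch/lr_batches)^2 + 1)^-0.25 * ((epoch/lr_epochs)^2 + 1)^-0.25. With this run's configured base_lr=0.045, lr_batches=7500, lr_epochs=3.5, and assuming the scheduler's epoch counter stands at 11 at this point (the run was resumed from epoch 10), the logged values are reproduced; the epoch-counter value is our assumption, not read from the log:

```python
# Sketch of icefall's Eden learning-rate schedule, checked against the
# logged lr. base_lr/lr_batches/lr_epochs are this run's configured values;
# epoch=11 is an assumption (the run was resumed from epoch 10).
def eden_lr(batch: float, epoch: float,
            base_lr: float = 0.045,
            lr_batches: float = 7500.0,
            lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch / lr_batches) ** 2 + 1) ** -0.25
    epoch_factor = ((epoch / lr_epochs) ** 2 + 1) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(batch=136300, epoch=11):.2e}")  # -> 5.81e-03, as logged here
```
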
2023-11-20 02:44:23,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=908520.0, ans=0.95 2023-11-20 02:44:27,621 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2023-11-20 02:44:44,757 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136300 2023-11-20 02:44:44,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=908653.3333333334, ans=0.0 2023-11-20 02:44:46,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=908653.3333333334, ans=0.1 2023-11-20 02:44:52,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=908653.3333333334, ans=0.0 2023-11-20 02:44:57,466 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4050, loss[loss=0.0841, simple_loss=0.107, pruned_loss=0.022, audio_tagging_loss=0.008593, over 15234.00 frames. ], tot_loss[loss=0.08251, simple_loss=0.1025, pruned_loss=0.02079, audio_tagging_loss=0.01045, over 3053887.09 frames. ], batch size: 56, lr: 5.81e-03, grad_scale: 32.0 2023-11-20 02:44:57,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=908720.0, ans=0.125 2023-11-20 02:44:59,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=908720.0, ans=15.0 2023-11-20 02:45:01,196 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:45:08,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=908720.0, ans=0.125 2023-11-20 02:45:29,705 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:45:49,237 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136350 2023-11-20 02:45:49,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=908986.6666666666, ans=0.0 2023-11-20 02:46:01,979 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4100, loss[loss=0.07027, simple_loss=0.08631, pruned_loss=0.01334, audio_tagging_loss=0.01378, over 15168.00 frames. ], tot_loss[loss=0.08256, simple_loss=0.1027, pruned_loss=0.02083, audio_tagging_loss=0.01039, over 3053593.83 frames.
], batch size: 58, lr: 5.81e-03, grad_scale: 32.0 2023-11-20 02:46:19,908 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.593e+01 8.318e+01 8.893e+01 9.801e+01 1.256e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-20 02:46:20,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=909120.0, ans=0.125 2023-11-20 02:46:35,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=909186.6666666666, ans=0.09899494936611666 2023-11-20 02:46:48,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=909253.3333333334, ans=0.125 2023-11-20 02:46:54,586 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136400 2023-11-20 02:46:56,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=909320.0, ans=0.1 2023-11-20 02:47:00,665 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2023-11-20 02:47:07,350 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4150, loss[loss=0.09761, simple_loss=0.1254, pruned_loss=0.02829, audio_tagging_loss=0.006606, over 15194.00 frames. ], tot_loss[loss=0.0821, simple_loss=0.1022, pruned_loss=0.02079, audio_tagging_loss=0.01023, over 3045507.91 frames. ], batch size: 57, lr: 5.81e-03, grad_scale: 32.0 2023-11-20 02:47:15,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=909386.6666666666, ans=0.125 2023-11-20 02:47:16,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=909386.6666666666, ans=0.1 2023-11-20 02:47:39,540 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.70 vs. limit=15.0 2023-11-20 02:47:45,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=909586.6666666666, ans=0.125 2023-11-20 02:47:50,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=909586.6666666666, ans=0.125 2023-11-20 02:47:54,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=909586.6666666666, ans=0.125 2023-11-20 02:47:55,217 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:47:59,045 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136450 2023-11-20 02:48:12,030 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4200, loss[loss=0.08816, simple_loss=0.1131, pruned_loss=0.02313, audio_tagging_loss=0.008497, over 15996.00 frames. ], tot_loss[loss=0.08195, simple_loss=0.102, pruned_loss=0.02078, audio_tagging_loss=0.01016, over 3043973.58 frames. 
], batch size: 59, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:48:14,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=909720.0, ans=0.2 2023-11-20 02:48:23,792 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.15 vs. limit=15.0 2023-11-20 02:48:28,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=909786.6666666666, ans=0.125 2023-11-20 02:48:30,388 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.105e+01 8.354e+01 9.339e+01 1.014e+02 1.353e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-20 02:48:36,730 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.07 vs. limit=15.0 2023-11-20 02:48:57,912 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.18 vs. limit=15.0 2023-11-20 02:49:04,002 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136500 2023-11-20 02:49:16,881 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4250, loss[loss=0.0812, simple_loss=0.104, pruned_loss=0.02191, audio_tagging_loss=0.007278, over 15140.00 frames. ], tot_loss[loss=0.08202, simple_loss=0.1027, pruned_loss=0.02068, audio_tagging_loss=0.009963, over 3047546.57 frames. ], batch size: 57, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:49:33,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=910120.0, ans=0.125 2023-11-20 02:49:34,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=910120.0, ans=0.07 2023-11-20 02:49:56,039 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0 2023-11-20 02:49:59,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=910253.3333333334, ans=0.125 2023-11-20 02:50:02,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=910253.3333333334, ans=0.0 2023-11-20 02:50:07,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=910320.0, ans=0.125 2023-11-20 02:50:08,487 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136550 2023-11-20 02:50:13,308 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.15 vs. limit=12.0 2023-11-20 02:50:21,277 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4300, loss[loss=0.09149, simple_loss=0.1103, pruned_loss=0.02465, audio_tagging_loss=0.0117, over 15068.00 frames. ], tot_loss[loss=0.08336, simple_loss=0.1043, pruned_loss=0.0213, audio_tagging_loss=0.009906, over 3048108.27 frames. 
], batch size: 57, lr: 5.80e-03, grad_scale: 16.0 2023-11-20 02:50:40,219 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.438e+01 9.081e+01 9.991e+01 1.404e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-20 02:51:07,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=910586.6666666666, ans=0.5 2023-11-20 02:51:12,542 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136600 2023-11-20 02:51:25,603 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4350, loss[loss=0.09661, simple_loss=0.1208, pruned_loss=0.02699, audio_tagging_loss=0.009208, over 14938.00 frames. ], tot_loss[loss=0.08278, simple_loss=0.1033, pruned_loss=0.02114, audio_tagging_loss=0.009977, over 3044377.64 frames. ], batch size: 55, lr: 5.80e-03, grad_scale: 16.0 2023-11-20 02:51:58,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=910853.3333333334, ans=0.125 2023-11-20 02:52:08,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=910920.0, ans=0.125 2023-11-20 02:52:12,966 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.00 vs. limit=10.0 2023-11-20 02:52:13,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=910920.0, ans=0.0 2023-11-20 02:52:17,056 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136650 2023-11-20 02:52:22,442 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.72 vs. limit=12.0 2023-11-20 02:52:30,324 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4400, loss[loss=0.08731, simple_loss=0.1133, pruned_loss=0.02398, audio_tagging_loss=0.006667, over 15867.00 frames. ], tot_loss[loss=0.08296, simple_loss=0.1035, pruned_loss=0.02123, audio_tagging_loss=0.009999, over 3040604.34 frames. ], batch size: 57, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:52:49,551 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.680e+01 7.985e+01 8.679e+01 9.460e+01 1.350e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-20 02:52:53,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=911120.0, ans=0.1 2023-11-20 02:52:54,192 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.00 vs. limit=22.5 2023-11-20 02:53:09,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=911253.3333333334, ans=0.2 2023-11-20 02:53:21,560 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136700 2023-11-20 02:53:25,184 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. 
limit=6.0 2023-11-20 02:53:28,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=911320.0, ans=0.2 2023-11-20 02:53:30,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=911320.0, ans=0.0 2023-11-20 02:53:32,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=911320.0, ans=0.125 2023-11-20 02:53:32,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=911320.0, ans=0.0 2023-11-20 02:53:34,337 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4450, loss[loss=0.08652, simple_loss=0.1039, pruned_loss=0.0245, audio_tagging_loss=0.01006, over 15680.00 frames. ], tot_loss[loss=0.08284, simple_loss=0.1032, pruned_loss=0.02123, audio_tagging_loss=0.01004, over 3034322.09 frames. ], batch size: 58, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:54:01,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=911520.0, ans=0.2 2023-11-20 02:54:14,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=911586.6666666666, ans=0.1 2023-11-20 02:54:19,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=911586.6666666666, ans=0.2 2023-11-20 02:54:25,010 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.78 vs. limit=15.0 2023-11-20 02:54:25,570 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136750 2023-11-20 02:54:38,257 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4500, loss[loss=0.1015, simple_loss=0.1299, pruned_loss=0.0301, audio_tagging_loss=0.006438, over 15040.00 frames. ], tot_loss[loss=0.08265, simple_loss=0.1032, pruned_loss=0.02104, audio_tagging_loss=0.01, over 3041816.06 frames. ], batch size: 55, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:54:48,708 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.82 vs. limit=12.0 2023-11-20 02:54:53,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=911786.6666666666, ans=0.1 2023-11-20 02:54:57,902 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.729e+01 8.079e+01 8.708e+01 9.513e+01 1.325e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 02:55:12,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=911853.3333333334, ans=0.125 2023-11-20 02:55:23,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=911920.0, ans=0.5 2023-11-20 02:55:29,818 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136800 2023-11-20 02:55:34,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=911986.6666666666, ans=0.0 2023-11-20 02:55:43,009 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4550, loss[loss=0.1122, simple_loss=0.1391, pruned_loss=0.03354, audio_tagging_loss=0.009085, over 16655.00 frames. 
], tot_loss[loss=0.08284, simple_loss=0.1037, pruned_loss=0.02096, audio_tagging_loss=0.01002, over 3046678.09 frames. ], batch size: 60, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:56:04,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=912120.0, ans=0.0 2023-11-20 02:56:11,668 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.20 vs. limit=22.5 2023-11-20 02:56:13,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=912186.6666666666, ans=0.125 2023-11-20 02:56:32,697 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:56:34,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=912320.0, ans=0.0 2023-11-20 02:56:35,234 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136850 2023-11-20 02:56:38,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=912320.0, ans=0.05 2023-11-20 02:56:40,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=912320.0, ans=0.125 2023-11-20 02:56:46,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=912320.0, ans=0.1 2023-11-20 02:56:48,558 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4600, loss[loss=0.06425, simple_loss=0.08009, pruned_loss=0.01123, audio_tagging_loss=0.01298, over 15094.00 frames. ], tot_loss[loss=0.08238, simple_loss=0.1027, pruned_loss=0.02076, audio_tagging_loss=0.01027, over 3044759.94 frames. ], batch size: 57, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:56:56,520 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.19 vs. 
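limit=15.0

The Whitening records (scaling.py:1022) compare a per-module statistic against a scheduled limit; when the metric exceeds the limit, scaling.py nudges the module's activations back toward a whiter (more isotropic) channel covariance. The eigenvalue-ratio metric below is our illustration of that idea, equal to 1.0 for a perfectly white covariance and growing with anisotropy; it is not icefall's exact formula:

```python
import torch

# Illustration (assumption: not icefall's exact formula) of a whitening
# metric that is ~1.0 for a white covariance and grows with anisotropy,
# in the spirit of the "metric=... vs. limit=..." lines.
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues, ascending
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

white = torch.randn(10000, 384)                   # ~identity covariance
print(whitening_metric(white))                    # ~1.0
print(whitening_metric(white * torch.rand(384)))  # noticeably > 1.0
```
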
2023-11-20 02:56:57,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=912386.6666666666, ans=0.0 2023-11-20 02:57:03,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=912453.3333333334, ans=0.1 2023-11-20 02:57:03,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=912453.3333333334, ans=0.1 2023-11-20 02:57:03,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=912453.3333333334, ans=0.0 2023-11-20 02:57:07,908 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.584e+01 7.967e+01 8.569e+01 9.519e+01 1.814e+02, threshold=1.714e+02, percent-clipped=1.0 2023-11-20 02:57:41,177 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136900 2023-11-20 02:57:43,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=912653.3333333334, ans=0.0 2023-11-20 02:57:53,872 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4650, loss[loss=0.0633, simple_loss=0.07508, pruned_loss=0.01327, audio_tagging_loss=0.01249, over 13776.00 frames. ], tot_loss[loss=0.08268, simple_loss=0.1028, pruned_loss=0.02093, audio_tagging_loss=0.01033, over 3045480.25 frames. ], batch size: 54, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:57:59,511 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.51 vs. limit=15.0 2023-11-20 02:58:10,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=912786.6666666666, ans=0.0 2023-11-20 02:58:46,007 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 136950 2023-11-20 02:58:51,604 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.27 vs. limit=15.0 2023-11-20 02:58:53,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=912986.6666666666, ans=0.125 2023-11-20 02:58:58,289 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4700, loss[loss=0.06023, simple_loss=0.06877, pruned_loss=0.01076, audio_tagging_loss=0.01509, over 14651.00 frames. ], tot_loss[loss=0.08217, simple_loss=0.1019, pruned_loss=0.02082, audio_tagging_loss=0.0104, over 3050072.74 frames. ], batch size: 59, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 02:59:07,003 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.52 vs.
limit=15.0 2023-11-20 02:59:17,788 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.469e+01 8.087e+01 8.546e+01 9.453e+01 1.405e+02, threshold=1.709e+02, percent-clipped=0.0 2023-11-20 02:59:21,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=913120.0, ans=0.125 2023-11-20 02:59:30,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=913186.6666666666, ans=0.0 2023-11-20 02:59:49,515 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137000 2023-11-20 02:59:49,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=913320.0, ans=0.125 2023-11-20 03:00:03,308 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4750, loss[loss=0.07176, simple_loss=0.09328, pruned_loss=0.01501, audio_tagging_loss=0.01011, over 14893.00 frames. ], tot_loss[loss=0.08175, simple_loss=0.1012, pruned_loss=0.02065, audio_tagging_loss=0.01049, over 3047418.58 frames. ], batch size: 57, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:00:29,037 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.57 vs. limit=15.0 2023-11-20 03:00:54,739 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137050 2023-11-20 03:01:07,329 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4800, loss[loss=0.06879, simple_loss=0.08348, pruned_loss=0.01547, audio_tagging_loss=0.01158, over 14015.00 frames. ], tot_loss[loss=0.08185, simple_loss=0.1013, pruned_loss=0.0207, audio_tagging_loss=0.01051, over 3048433.95 frames. ], batch size: 54, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:01:13,974 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=22.5 2023-11-20 03:01:14,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=913720.0, ans=0.125 2023-11-20 03:01:23,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=913786.6666666666, ans=0.5 2023-11-20 03:01:25,562 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.051e+01 8.703e+01 9.476e+01 1.263e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-20 03:01:48,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=913920.0, ans=0.0 2023-11-20 03:01:58,931 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137100 2023-11-20 03:02:00,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=913986.6666666666, ans=0.125 2023-11-20 03:02:11,287 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4850, loss[loss=0.06545, simple_loss=0.08446, pruned_loss=0.01484, audio_tagging_loss=0.008387, over 14745.00 frames. ], tot_loss[loss=0.08193, simple_loss=0.1012, pruned_loss=0.02066, audio_tagging_loss=0.01066, over 3045962.85 frames. 
], batch size: 58, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:02:40,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=914186.6666666666, ans=0.125 2023-11-20 03:02:47,485 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.15 vs. limit=15.0 2023-11-20 03:02:51,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=914253.3333333334, ans=0.5 2023-11-20 03:03:00,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=914253.3333333334, ans=0.0 2023-11-20 03:03:02,587 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137150 2023-11-20 03:03:05,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=914320.0, ans=0.2 2023-11-20 03:03:15,886 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4900, loss[loss=0.06084, simple_loss=0.06436, pruned_loss=0.01501, audio_tagging_loss=0.01366, over 14809.00 frames. ], tot_loss[loss=0.08197, simple_loss=0.1014, pruned_loss=0.02073, audio_tagging_loss=0.01052, over 3042312.39 frames. ], batch size: 57, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:03:24,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=914386.6666666666, ans=0.125 2023-11-20 03:03:35,180 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.551e+01 8.052e+01 8.825e+01 9.558e+01 1.326e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-20 03:04:02,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=914586.6666666666, ans=0.2 2023-11-20 03:04:07,021 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137200 2023-11-20 03:04:20,313 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 4950, loss[loss=0.09954, simple_loss=0.1271, pruned_loss=0.02926, audio_tagging_loss=0.006724, over 15193.00 frames. ], tot_loss[loss=0.0824, simple_loss=0.1023, pruned_loss=0.02097, audio_tagging_loss=0.01027, over 3038683.19 frames. ], batch size: 57, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:04:22,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=914720.0, ans=0.1 2023-11-20 03:04:28,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=914720.0, ans=0.1 2023-11-20 03:04:41,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=914786.6666666666, ans=0.0 2023-11-20 03:04:55,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=914853.3333333334, ans=0.125 2023-11-20 03:05:05,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=914920.0, ans=0.125 2023-11-20 03:05:12,453 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137250 2023-11-20 03:05:24,392 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5000, loss[loss=0.09339, simple_loss=0.1105, pruned_loss=0.02852, audio_tagging_loss=0.009613, over 16622.00 frames. 
], tot_loss[loss=0.08166, simple_loss=0.1017, pruned_loss=0.02072, audio_tagging_loss=0.01009, over 3032002.43 frames. ], batch size: 61, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:05:40,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=915120.0, ans=0.0 2023-11-20 03:05:43,920 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.699e+01 7.904e+01 8.718e+01 9.618e+01 1.428e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-20 03:05:48,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=915120.0, ans=0.1 2023-11-20 03:06:01,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=915186.6666666666, ans=0.0 2023-11-20 03:06:01,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=915186.6666666666, ans=0.2 2023-11-20 03:06:15,469 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137300 2023-11-20 03:06:15,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=915320.0, ans=0.0 2023-11-20 03:06:28,271 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5050, loss[loss=0.08236, simple_loss=0.09576, pruned_loss=0.02305, audio_tagging_loss=0.01143, over 16906.00 frames. ], tot_loss[loss=0.0819, simple_loss=0.1021, pruned_loss=0.02078, audio_tagging_loss=0.01005, over 3030527.95 frames. ], batch size: 63, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:06:43,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=915453.3333333334, ans=0.1 2023-11-20 03:06:52,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=915453.3333333334, ans=0.2 2023-11-20 03:06:53,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=915520.0, ans=0.125 2023-11-20 03:07:20,306 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137350 2023-11-20 03:07:24,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=915653.3333333334, ans=0.2 2023-11-20 03:07:25,639 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.56 vs. limit=15.0 2023-11-20 03:07:25,913 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.48 vs. limit=10.0 2023-11-20 03:07:30,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=915653.3333333334, ans=0.125 2023-11-20 03:07:32,313 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5100, loss[loss=0.0808, simple_loss=0.09428, pruned_loss=0.01911, audio_tagging_loss=0.01455, over 15055.00 frames. ], tot_loss[loss=0.08156, simple_loss=0.1015, pruned_loss=0.02069, audio_tagging_loss=0.01011, over 3039699.33 frames. ], batch size: 57, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:07:49,203 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.32 vs. 
limit=8.0 2023-11-20 03:07:51,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=915786.6666666666, ans=0.0 2023-11-20 03:07:51,979 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.683e+01 7.866e+01 8.491e+01 9.250e+01 1.522e+02, threshold=1.698e+02, percent-clipped=0.0 2023-11-20 03:08:02,816 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.110e-01 2023-11-20 03:08:24,153 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137400 2023-11-20 03:08:37,468 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5150, loss[loss=0.06829, simple_loss=0.09385, pruned_loss=0.01087, audio_tagging_loss=0.0105, over 14341.00 frames. ], tot_loss[loss=0.08136, simple_loss=0.1014, pruned_loss=0.02054, audio_tagging_loss=0.01012, over 3038926.29 frames. ], batch size: 55, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:08:48,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=916053.3333333334, ans=0.125 2023-11-20 03:08:56,108 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.13 vs. limit=22.5 2023-11-20 03:09:28,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=916320.0, ans=0.0 2023-11-20 03:09:29,574 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137450 2023-11-20 03:09:30,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=916320.0, ans=0.125 2023-11-20 03:09:33,461 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:09:37,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=916320.0, ans=0.1 2023-11-20 03:09:42,211 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5200, loss[loss=0.08842, simple_loss=0.118, pruned_loss=0.02183, audio_tagging_loss=0.007581, over 15934.00 frames. ], tot_loss[loss=0.08071, simple_loss=0.1005, pruned_loss=0.02036, audio_tagging_loss=0.01011, over 3040268.87 frames. ], batch size: 57, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:10:01,277 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.268e+01 8.182e+01 8.792e+01 9.724e+01 1.387e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-20 03:10:09,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=916520.0, ans=0.1 2023-11-20 03:10:09,535 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.26 vs. 
limit=15.0 2023-11-20 03:10:11,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=916520.0, ans=0.125 2023-11-20 03:10:34,172 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137500 2023-11-20 03:10:38,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=916653.3333333334, ans=0.1 2023-11-20 03:10:45,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=916720.0, ans=0.125 2023-11-20 03:10:46,179 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.49 vs. limit=15.0 2023-11-20 03:10:46,732 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5250, loss[loss=0.09071, simple_loss=0.1056, pruned_loss=0.0297, audio_tagging_loss=0.008209, over 14837.00 frames. ], tot_loss[loss=0.08204, simple_loss=0.102, pruned_loss=0.02093, audio_tagging_loss=0.01011, over 3041463.94 frames. ], batch size: 57, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:11:07,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=916786.6666666666, ans=0.0 2023-11-20 03:11:09,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=916786.6666666666, ans=0.09899494936611666 2023-11-20 03:11:23,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=916853.3333333334, ans=0.0 2023-11-20 03:11:31,610 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.40 vs. limit=22.5 2023-11-20 03:11:38,220 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137550 2023-11-20 03:11:44,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=916986.6666666666, ans=0.0 2023-11-20 03:11:51,370 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5300, loss[loss=0.09651, simple_loss=0.1332, pruned_loss=0.02268, audio_tagging_loss=0.007206, over 16138.00 frames. ], tot_loss[loss=0.08206, simple_loss=0.1019, pruned_loss=0.02101, audio_tagging_loss=0.01009, over 3035983.94 frames. ], batch size: 58, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:11:51,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=917053.3333333334, ans=0.0 2023-11-20 03:12:00,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=917053.3333333334, ans=0.0 2023-11-20 03:12:10,533 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 8.318e+01 9.072e+01 9.912e+01 1.242e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-20 03:12:11,159 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2023-11-20 03:12:12,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=917120.0, ans=0.07 2023-11-20 03:12:25,311 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.09 vs. 
limit=15.0 2023-11-20 03:12:30,192 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.39 vs. limit=10.0 2023-11-20 03:12:36,767 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.45 vs. limit=22.5 2023-11-20 03:12:37,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=917253.3333333334, ans=0.125 2023-11-20 03:12:37,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=917253.3333333334, ans=15.0 2023-11-20 03:12:43,191 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137600 2023-11-20 03:12:56,090 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5350, loss[loss=0.07609, simple_loss=0.09931, pruned_loss=0.01994, audio_tagging_loss=0.006491, over 15035.00 frames. ], tot_loss[loss=0.08133, simple_loss=0.1006, pruned_loss=0.02074, audio_tagging_loss=0.01027, over 3039408.79 frames. ], batch size: 56, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:12:59,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=917386.6666666666, ans=0.125 2023-11-20 03:13:01,492 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2023-11-20 03:13:09,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=917453.3333333334, ans=0.125 2023-11-20 03:13:17,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=917453.3333333334, ans=0.04949747468305833 2023-11-20 03:13:18,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=917453.3333333334, ans=0.035 2023-11-20 03:13:37,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=917586.6666666666, ans=0.0 2023-11-20 03:13:42,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=917586.6666666666, ans=0.125 2023-11-20 03:13:47,649 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137650 2023-11-20 03:13:54,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=917653.3333333334, ans=0.0 2023-11-20 03:13:57,604 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. limit=6.0 2023-11-20 03:14:00,426 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5400, loss[loss=0.07528, simple_loss=0.09503, pruned_loss=0.01778, audio_tagging_loss=0.00998, over 14130.00 frames. ], tot_loss[loss=0.08167, simple_loss=0.1015, pruned_loss=0.02073, audio_tagging_loss=0.01019, over 3042454.12 frames. 
], batch size: 54, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:14:19,184 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.643e+01 8.272e+01 8.874e+01 9.617e+01 1.716e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 03:14:29,172 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.44 vs. limit=12.0 2023-11-20 03:14:29,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=917853.3333333334, ans=0.125 2023-11-20 03:14:51,540 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137700 2023-11-20 03:15:00,183 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2023-11-20 03:15:04,138 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5450, loss[loss=0.08111, simple_loss=0.1025, pruned_loss=0.01843, audio_tagging_loss=0.01142, over 15173.00 frames. ], tot_loss[loss=0.08187, simple_loss=0.1015, pruned_loss=0.02078, audio_tagging_loss=0.01032, over 3042093.74 frames. ], batch size: 55, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:15:14,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=918053.3333333334, ans=0.0 2023-11-20 03:15:17,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=918120.0, ans=0.2 2023-11-20 03:15:19,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=918120.0, ans=0.125 2023-11-20 03:15:28,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=918186.6666666666, ans=0.125 2023-11-20 03:15:47,338 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=12.0 2023-11-20 03:15:48,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=918253.3333333334, ans=0.125 2023-11-20 03:15:55,385 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137750 2023-11-20 03:16:02,124 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.83 vs. limit=10.0 2023-11-20 03:16:08,290 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5500, loss[loss=0.08274, simple_loss=0.1057, pruned_loss=0.02134, audio_tagging_loss=0.008549, over 15523.00 frames. ], tot_loss[loss=0.08144, simple_loss=0.1011, pruned_loss=0.02065, audio_tagging_loss=0.01026, over 3042896.32 frames. ], batch size: 58, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:16:14,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=918386.6666666666, ans=0.0 2023-11-20 03:16:27,774 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 8.238e+01 9.064e+01 9.896e+01 2.099e+02, threshold=1.813e+02, percent-clipped=1.0 2023-11-20 03:17:00,400 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137800 2023-11-20 03:17:10,526 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.96 vs. 
limit=15.0 2023-11-20 03:17:13,452 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5550, loss[loss=0.09683, simple_loss=0.1191, pruned_loss=0.02629, audio_tagging_loss=0.01098, over 15489.00 frames. ], tot_loss[loss=0.08279, simple_loss=0.1026, pruned_loss=0.02117, audio_tagging_loss=0.01033, over 3048823.85 frames. ], batch size: 54, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:17:26,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=918786.6666666666, ans=0.0 2023-11-20 03:17:39,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=918853.3333333334, ans=0.0 2023-11-20 03:17:39,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=918853.3333333334, ans=0.125 2023-11-20 03:18:04,344 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137850 2023-11-20 03:18:11,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=918986.6666666666, ans=0.125 2023-11-20 03:18:16,839 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5600, loss[loss=0.08016, simple_loss=0.09381, pruned_loss=0.02124, audio_tagging_loss=0.01201, over 14564.00 frames. ], tot_loss[loss=0.08294, simple_loss=0.1031, pruned_loss=0.02105, audio_tagging_loss=0.01037, over 3048517.50 frames. ], batch size: 56, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:18:24,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=919053.3333333334, ans=0.0 2023-11-20 03:18:25,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=919053.3333333334, ans=0.1 2023-11-20 03:18:35,341 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.491e+01 8.234e+01 9.061e+01 1.016e+02 1.381e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-20 03:19:01,848 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=15.0 2023-11-20 03:19:02,527 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 03:19:03,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=919253.3333333334, ans=0.2 2023-11-20 03:19:07,348 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137900 2023-11-20 03:19:13,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=919320.0, ans=0.125 2023-11-20 03:19:19,192 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5650, loss[loss=0.07253, simple_loss=0.09758, pruned_loss=0.01225, audio_tagging_loss=0.01149, over 14866.00 frames. ], tot_loss[loss=0.08283, simple_loss=0.1032, pruned_loss=0.02092, audio_tagging_loss=0.01032, over 3062266.28 frames. 
], batch size: 55, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:19:25,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=919386.6666666666, ans=0.0 2023-11-20 03:20:09,961 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 137950 2023-11-20 03:20:13,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=919653.3333333334, ans=0.125 2023-11-20 03:20:23,361 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5700, loss[loss=0.06383, simple_loss=0.07641, pruned_loss=0.01462, audio_tagging_loss=0.011, over 15448.00 frames. ], tot_loss[loss=0.08253, simple_loss=0.1027, pruned_loss=0.02083, audio_tagging_loss=0.01035, over 3058293.44 frames. ], batch size: 60, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:20:23,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=919720.0, ans=0.2 2023-11-20 03:20:25,087 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=22.5 2023-11-20 03:20:42,398 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.375e+01 8.900e+01 9.766e+01 1.297e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-20 03:20:52,619 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.102e-01 2023-11-20 03:21:15,204 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138000 2023-11-20 03:21:28,267 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5750, loss[loss=0.1016, simple_loss=0.1296, pruned_loss=0.02827, audio_tagging_loss=0.008575, over 15449.00 frames. ], tot_loss[loss=0.0823, simple_loss=0.1025, pruned_loss=0.02083, audio_tagging_loss=0.01024, over 3056564.38 frames. ], batch size: 56, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:21:32,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=920053.3333333334, ans=0.125 2023-11-20 03:21:59,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=920186.6666666666, ans=0.0 2023-11-20 03:22:04,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=920186.6666666666, ans=0.125 2023-11-20 03:22:10,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=920253.3333333334, ans=0.1 2023-11-20 03:22:13,562 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.52 vs. limit=6.0 2023-11-20 03:22:19,754 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138050 2023-11-20 03:22:31,823 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5800, loss[loss=0.06595, simple_loss=0.06671, pruned_loss=0.01819, audio_tagging_loss=0.0144, over 14109.00 frames. ], tot_loss[loss=0.08221, simple_loss=0.1025, pruned_loss=0.02082, audio_tagging_loss=0.01014, over 3054011.01 frames. 
], batch size: 55, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:22:48,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=920453.3333333334, ans=0.1 2023-11-20 03:22:49,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=920453.3333333334, ans=0.125 2023-11-20 03:22:52,125 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.779e+01 8.315e+01 8.891e+01 9.653e+01 1.172e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-20 03:23:00,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=920520.0, ans=0.125 2023-11-20 03:23:07,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=920520.0, ans=0.1 2023-11-20 03:23:23,078 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138100 2023-11-20 03:23:33,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=920653.3333333334, ans=0.125 2023-11-20 03:23:36,489 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5850, loss[loss=0.09158, simple_loss=0.1096, pruned_loss=0.02255, audio_tagging_loss=0.01421, over 14519.00 frames. ], tot_loss[loss=0.08169, simple_loss=0.1021, pruned_loss=0.02062, audio_tagging_loss=0.01003, over 3055596.83 frames. ], batch size: 55, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:23:38,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=920720.0, ans=0.025 2023-11-20 03:23:55,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=920786.6666666666, ans=0.125 2023-11-20 03:24:25,948 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.43 vs. limit=22.5 2023-11-20 03:24:27,838 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138150 2023-11-20 03:24:39,929 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5900, loss[loss=0.08105, simple_loss=0.1009, pruned_loss=0.0225, audio_tagging_loss=0.008083, over 14736.00 frames. ], tot_loss[loss=0.0818, simple_loss=0.102, pruned_loss=0.02087, audio_tagging_loss=0.009924, over 3051990.11 frames. ], batch size: 54, lr: 5.77e-03, grad_scale: 16.0 2023-11-20 03:24:44,974 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=15.0 2023-11-20 03:24:52,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=921120.0, ans=10.0 2023-11-20 03:24:54,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=921120.0, ans=0.125 2023-11-20 03:24:56,873 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.55 vs. 
limit=15.0 2023-11-20 03:24:59,814 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.588e+01 8.195e+01 8.943e+01 1.006e+02 1.652e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-20 03:25:08,196 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:25:25,294 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.94 vs. limit=22.5 2023-11-20 03:25:31,452 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138200 2023-11-20 03:25:31,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=921320.0, ans=0.125 2023-11-20 03:25:39,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=921320.0, ans=0.125 2023-11-20 03:25:43,840 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 5950, loss[loss=0.08266, simple_loss=0.1003, pruned_loss=0.02215, audio_tagging_loss=0.01038, over 15229.00 frames. ], tot_loss[loss=0.08153, simple_loss=0.1017, pruned_loss=0.02072, audio_tagging_loss=0.009959, over 3050813.54 frames. ], batch size: 58, lr: 5.77e-03, grad_scale: 16.0 2023-11-20 03:25:45,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=921386.6666666666, ans=0.125 2023-11-20 03:26:08,118 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.50 vs. limit=15.0 2023-11-20 03:26:14,322 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.92 vs. limit=15.0 2023-11-20 03:26:21,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=921586.6666666666, ans=0.09899494936611666 2023-11-20 03:26:23,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=921586.6666666666, ans=0.125 2023-11-20 03:26:29,429 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0 2023-11-20 03:26:34,909 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138250 2023-11-20 03:26:47,503 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6000, loss[loss=0.08614, simple_loss=0.1037, pruned_loss=0.02419, audio_tagging_loss=0.01013, over 15500.00 frames. ], tot_loss[loss=0.08219, simple_loss=0.1028, pruned_loss=0.02091, audio_tagging_loss=0.009881, over 3043642.52 frames. ], batch size: 57, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:26:47,503 INFO [train_asr.py:1285] (2/4) Computing validation loss 2023-11-20 03:27:28,658 INFO [train_asr.py:1294] (2/4) Epoch 12, validation: loss=0.06387, simple_loss=0.05435, pruned_loss=0.006012, audio_tagging_loss=0.03068, over 4681554.00 frames. 
2023-11-20 03:27:28,659 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB 2023-11-20 03:27:38,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=921720.0, ans=0.1 2023-11-20 03:27:48,129 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.723e+01 8.205e+01 8.900e+01 1.006e+02 1.555e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-20 03:27:56,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=921853.3333333334, ans=0.2 2023-11-20 03:28:13,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=921920.0, ans=0.1 2023-11-20 03:28:16,467 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 03:28:20,250 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138300 2023-11-20 03:28:25,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=921986.6666666666, ans=0.125 2023-11-20 03:28:28,156 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0 2023-11-20 03:28:32,672 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6050, loss[loss=0.04996, simple_loss=0.05635, pruned_loss=0.01033, audio_tagging_loss=0.01146, over 13427.00 frames. ], tot_loss[loss=0.0819, simple_loss=0.1023, pruned_loss=0.02083, audio_tagging_loss=0.009944, over 3044604.52 frames. ], batch size: 54, lr: 5.77e-03, grad_scale: 16.0 2023-11-20 03:28:33,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=922053.3333333334, ans=0.2 2023-11-20 03:28:42,463 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.82 vs. limit=6.0 2023-11-20 03:29:12,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=922253.3333333334, ans=0.0 2023-11-20 03:29:24,349 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138350 2023-11-20 03:29:27,190 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:29:31,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=922320.0, ans=0.2 2023-11-20 03:29:37,779 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6100, loss[loss=0.08036, simple_loss=0.09828, pruned_loss=0.02011, audio_tagging_loss=0.01111, over 14604.00 frames. ], tot_loss[loss=0.0821, simple_loss=0.1023, pruned_loss=0.02094, audio_tagging_loss=0.01003, over 3040979.25 frames. 
], batch size: 58, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:30:00,022 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.566e+01 7.861e+01 8.501e+01 9.321e+01 2.317e+02, threshold=1.700e+02, percent-clipped=1.0 2023-11-20 03:30:06,951 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.71 vs. limit=15.0 2023-11-20 03:30:24,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=922586.6666666666, ans=0.2 2023-11-20 03:30:26,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=922586.6666666666, ans=0.025 2023-11-20 03:30:30,146 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138400 2023-11-20 03:30:43,108 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6150, loss[loss=0.09134, simple_loss=0.1059, pruned_loss=0.02685, audio_tagging_loss=0.01152, over 14685.00 frames. ], tot_loss[loss=0.08236, simple_loss=0.1023, pruned_loss=0.02107, audio_tagging_loss=0.01014, over 3048900.14 frames. ], batch size: 54, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:31:21,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=922920.0, ans=0.0 2023-11-20 03:31:27,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=922920.0, ans=0.125 2023-11-20 03:31:34,749 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138450 2023-11-20 03:31:35,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=922986.6666666666, ans=0.1 2023-11-20 03:31:41,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=922986.6666666666, ans=0.0 2023-11-20 03:31:47,110 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6200, loss[loss=0.08404, simple_loss=0.09971, pruned_loss=0.02386, audio_tagging_loss=0.01032, over 14588.00 frames. ], tot_loss[loss=0.08207, simple_loss=0.102, pruned_loss=0.02091, audio_tagging_loss=0.01013, over 3050934.10 frames. ], batch size: 55, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:31:47,432 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:32:05,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=923120.0, ans=0.125 2023-11-20 03:32:09,313 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.126e+01 8.412e+01 9.324e+01 1.012e+02 1.364e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-20 03:32:16,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=923186.6666666666, ans=0.09899494936611666 2023-11-20 03:32:18,376 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.35 vs. 
limit=22.5 2023-11-20 03:32:23,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=923186.6666666666, ans=0.1 2023-11-20 03:32:39,054 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138500 2023-11-20 03:32:46,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=923320.0, ans=0.125 2023-11-20 03:32:52,008 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6250, loss[loss=0.06602, simple_loss=0.07695, pruned_loss=0.01486, audio_tagging_loss=0.01268, over 13634.00 frames. ], tot_loss[loss=0.08137, simple_loss=0.1009, pruned_loss=0.0205, audio_tagging_loss=0.01041, over 3052194.51 frames. ], batch size: 53, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:33:12,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=923453.3333333334, ans=0.0 2023-11-20 03:33:36,318 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2023-11-20 03:33:38,995 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.38 vs. limit=15.0 2023-11-20 03:33:41,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=923586.6666666666, ans=0.1 2023-11-20 03:33:42,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=923653.3333333334, ans=0.04949747468305833 2023-11-20 03:33:43,697 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138550 2023-11-20 03:33:44,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=923653.3333333334, ans=0.125 2023-11-20 03:33:55,746 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6300, loss[loss=0.06619, simple_loss=0.08179, pruned_loss=0.01148, audio_tagging_loss=0.01382, over 14751.00 frames. ], tot_loss[loss=0.0814, simple_loss=0.101, pruned_loss=0.02048, audio_tagging_loss=0.01044, over 3046724.63 frames. 
], batch size: 54, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:34:01,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=923720.0, ans=0.0 2023-11-20 03:34:09,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=923786.6666666666, ans=0.0 2023-11-20 03:34:17,724 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.678e+01 8.557e+01 9.163e+01 1.006e+02 1.411e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-20 03:34:25,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=923853.3333333334, ans=0.0 2023-11-20 03:34:25,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=923853.3333333334, ans=0.1 2023-11-20 03:34:30,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=923853.3333333334, ans=0.125 2023-11-20 03:34:31,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=923853.3333333334, ans=0.125 2023-11-20 03:34:48,359 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138600 2023-11-20 03:34:49,063 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2023-11-20 03:35:01,692 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6350, loss[loss=0.07668, simple_loss=0.1018, pruned_loss=0.01692, audio_tagging_loss=0.008832, over 14631.00 frames. ], tot_loss[loss=0.08066, simple_loss=0.09998, pruned_loss=0.02015, audio_tagging_loss=0.01053, over 3051877.09 frames. ], batch size: 54, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:35:01,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=924053.3333333334, ans=0.0 2023-11-20 03:35:16,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=924120.0, ans=0.0 2023-11-20 03:35:16,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=924120.0, ans=0.125 2023-11-20 03:35:17,107 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.31 vs. limit=5.0 2023-11-20 03:35:27,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=924186.6666666666, ans=0.2 2023-11-20 03:35:46,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=924253.3333333334, ans=0.125 2023-11-20 03:35:53,794 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138650 2023-11-20 03:36:06,529 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6400, loss[loss=0.08066, simple_loss=0.1017, pruned_loss=0.02005, audio_tagging_loss=0.00974, over 15416.00 frames. ], tot_loss[loss=0.08029, simple_loss=0.09918, pruned_loss=0.02004, audio_tagging_loss=0.01066, over 3048027.65 frames. 
], batch size: 61, lr: 5.76e-03, grad_scale: 32.0 2023-11-20 03:36:28,551 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.492e+01 8.195e+01 8.907e+01 9.891e+01 1.303e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-20 03:36:47,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=924586.6666666666, ans=0.125 2023-11-20 03:36:58,939 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138700 2023-11-20 03:37:05,213 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=8.175e-03 2023-11-20 03:37:11,078 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6450, loss[loss=0.09775, simple_loss=0.1192, pruned_loss=0.02654, audio_tagging_loss=0.01159, over 15114.00 frames. ], tot_loss[loss=0.08048, simple_loss=0.09919, pruned_loss=0.02011, audio_tagging_loss=0.01078, over 3045782.52 frames. ], batch size: 55, lr: 5.76e-03, grad_scale: 32.0 2023-11-20 03:37:13,042 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.70 vs. limit=15.0 2023-11-20 03:37:18,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=924720.0, ans=0.07 2023-11-20 03:37:42,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=924853.3333333334, ans=0.125 2023-11-20 03:37:46,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=924853.3333333334, ans=0.2 2023-11-20 03:38:02,390 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138750 2023-11-20 03:38:15,263 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6500, loss[loss=0.08242, simple_loss=0.09818, pruned_loss=0.02422, audio_tagging_loss=0.009103, over 15772.00 frames. ], tot_loss[loss=0.0814, simple_loss=0.1005, pruned_loss=0.02052, audio_tagging_loss=0.01061, over 3046509.52 frames. ], batch size: 59, lr: 5.76e-03, grad_scale: 32.0 2023-11-20 03:38:18,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=925053.3333333334, ans=0.125 2023-11-20 03:38:24,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=925053.3333333334, ans=0.1 2023-11-20 03:38:27,484 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.46 vs. limit=6.0 2023-11-20 03:38:28,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=925120.0, ans=0.0 2023-11-20 03:38:37,926 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.023e+01 8.283e+01 9.037e+01 9.701e+01 1.555e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-20 03:38:44,777 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.73 vs. limit=22.5 2023-11-20 03:38:47,144 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.37 vs. 
limit=12.0 2023-11-20 03:38:48,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=925186.6666666666, ans=0.0 2023-11-20 03:39:04,379 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:39:06,601 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138800 2023-11-20 03:39:17,492 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:39:20,291 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6550, loss[loss=0.09504, simple_loss=0.1128, pruned_loss=0.03017, audio_tagging_loss=0.008494, over 15253.00 frames. ], tot_loss[loss=0.08174, simple_loss=0.1015, pruned_loss=0.02061, audio_tagging_loss=0.01038, over 3049918.57 frames. ], batch size: 60, lr: 5.76e-03, grad_scale: 32.0 2023-11-20 03:39:24,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=925386.6666666666, ans=0.0 2023-11-20 03:39:27,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=925386.6666666666, ans=0.95 2023-11-20 03:39:32,908 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.84 vs. limit=15.0 2023-11-20 03:39:38,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=925453.3333333334, ans=0.1 2023-11-20 03:39:51,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=925520.0, ans=0.125 2023-11-20 03:40:05,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=925586.6666666666, ans=0.0 2023-11-20 03:40:12,485 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138850 2023-11-20 03:40:21,005 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.70 vs. limit=15.0 2023-11-20 03:40:25,147 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6600, loss[loss=0.06894, simple_loss=0.08126, pruned_loss=0.01594, audio_tagging_loss=0.01237, over 14584.00 frames. ], tot_loss[loss=0.08132, simple_loss=0.101, pruned_loss=0.02048, audio_tagging_loss=0.01033, over 3049002.68 frames. ], batch size: 57, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:40:28,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=925720.0, ans=0.2 2023-11-20 03:40:46,611 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.736e+01 8.079e+01 8.710e+01 9.676e+01 1.211e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 03:41:17,215 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138900 2023-11-20 03:41:29,865 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6650, loss[loss=0.09497, simple_loss=0.1192, pruned_loss=0.02742, audio_tagging_loss=0.007926, over 15289.00 frames. ], tot_loss[loss=0.08095, simple_loss=0.1008, pruned_loss=0.02038, audio_tagging_loss=0.0102, over 3048214.03 frames. 
], batch size: 56, lr: 5.75e-03, grad_scale: 32.0
2023-11-20 03:41:48,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=926120.0, ans=0.1
2023-11-20 03:41:52,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=926120.0, ans=0.125
2023-11-20 03:42:01,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=926186.6666666666, ans=0.1
2023-11-20 03:42:05,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=926186.6666666666, ans=0.125
2023-11-20 03:42:21,836 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 138950
2023-11-20 03:42:23,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=926320.0, ans=0.5
2023-11-20 03:42:34,587 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6700, loss[loss=0.07183, simple_loss=0.09802, pruned_loss=0.01463, audio_tagging_loss=0.008184, over 14695.00 frames. ], tot_loss[loss=0.08, simple_loss=0.09958, pruned_loss=0.02012, audio_tagging_loss=0.01009, over 3041347.09 frames. ], batch size: 56, lr: 5.75e-03, grad_scale: 32.0
2023-11-20 03:42:56,607 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.827e+01 8.182e+01 8.627e+01 9.432e+01 1.236e+02, threshold=1.725e+02, percent-clipped=0.0
2023-11-20 03:43:13,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=926586.6666666666, ans=0.1
2023-11-20 03:43:21,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=926586.6666666666, ans=0.025
2023-11-20 03:43:26,342 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139000
2023-11-20 03:43:32,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=926653.3333333334, ans=0.1
2023-11-20 03:43:39,262 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6750, loss[loss=0.08936, simple_loss=0.09651, pruned_loss=0.02926, audio_tagging_loss=0.01184, over 15655.00 frames. ], tot_loss[loss=0.08004, simple_loss=0.09945, pruned_loss=0.02017, audio_tagging_loss=0.01015, over 3040176.61 frames. ], batch size: 61, lr: 5.75e-03, grad_scale: 32.0
2023-11-20 03:43:41,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=926720.0, ans=0.125
2023-11-20 03:43:54,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=926786.6666666666, ans=0.125
2023-11-20 03:43:57,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=926786.6666666666, ans=0.0
2023-11-20 03:44:29,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=926986.6666666666, ans=0.07
2023-11-20 03:44:30,366 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139050
2023-11-20 03:44:31,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=926986.6666666666, ans=0.0
2023-11-20 03:44:42,791 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6800, loss[loss=0.07116, simple_loss=0.08667, pruned_loss=0.01898, audio_tagging_loss=0.008852, over 14782.00 frames. ], tot_loss[loss=0.08155, simple_loss=0.1016, pruned_loss=0.02072, audio_tagging_loss=0.01004, over 3049075.67 frames. ], batch size: 55, lr: 5.75e-03, grad_scale: 32.0
2023-11-20 03:44:58,646 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.16 vs. limit=22.5
2023-11-20 03:45:06,174 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.226e+01 8.844e+01 9.966e+01 1.208e+02, threshold=1.769e+02, percent-clipped=0.0
2023-11-20 03:45:34,553 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139100
2023-11-20 03:45:36,315 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.61 vs. limit=15.0
2023-11-20 03:45:46,934 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6850, loss[loss=0.0633, simple_loss=0.07545, pruned_loss=0.01474, audio_tagging_loss=0.01084, over 15636.00 frames. ], tot_loss[loss=0.08129, simple_loss=0.1013, pruned_loss=0.02072, audio_tagging_loss=0.009913, over 3041088.55 frames. ], batch size: 58, lr: 5.75e-03, grad_scale: 32.0
2023-11-20 03:46:23,413 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0
2023-11-20 03:46:26,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=927586.6666666666, ans=0.0
2023-11-20 03:46:38,582 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139150
2023-11-20 03:46:52,488 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6900, loss[loss=0.1061, simple_loss=0.1384, pruned_loss=0.03032, audio_tagging_loss=0.00658, over 15812.00 frames. ], tot_loss[loss=0.08104, simple_loss=0.1016, pruned_loss=0.02051, audio_tagging_loss=0.00974, over 3045637.47 frames. ], batch size: 58, lr: 5.75e-03, grad_scale: 32.0
2023-11-20 03:47:08,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=927786.6666666666, ans=0.125
2023-11-20 03:47:09,513 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.90 vs. limit=22.5
2023-11-20 03:47:15,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=927786.6666666666, ans=0.125
2023-11-20 03:47:16,179 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.075e+01 8.808e+01 9.662e+01 1.235e+02, threshold=1.762e+02, percent-clipped=0.0
2023-11-20 03:47:24,442 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.45 vs. limit=15.0
2023-11-20 03:47:25,101 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 03:47:42,940 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 03:47:44,314 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139200
2023-11-20 03:47:54,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=927986.6666666666, ans=0.125
2023-11-20 03:47:57,532 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 6950, loss[loss=0.08056, simple_loss=0.1052, pruned_loss=0.01841, audio_tagging_loss=0.009535, over 15331.00 frames. ], tot_loss[loss=0.08108, simple_loss=0.1013, pruned_loss=0.02059, audio_tagging_loss=0.00983, over 3044295.68 frames. ], batch size: 56, lr: 5.75e-03, grad_scale: 16.0
2023-11-20 03:47:57,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=928053.3333333334, ans=0.07
2023-11-20 03:47:59,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=928053.3333333334, ans=0.0
2023-11-20 03:48:02,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=928053.3333333334, ans=0.125
2023-11-20 03:48:21,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=928186.6666666666, ans=0.125
2023-11-20 03:48:34,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=928186.6666666666, ans=0.2
2023-11-20 03:48:34,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=928186.6666666666, ans=0.0
2023-11-20 03:48:48,910 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139250
2023-11-20 03:49:01,000 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7000, loss[loss=0.06662, simple_loss=0.07256, pruned_loss=0.01845, audio_tagging_loss=0.01189, over 15219.00 frames. ], tot_loss[loss=0.08114, simple_loss=0.1014, pruned_loss=0.02057, audio_tagging_loss=0.009871, over 3044061.79 frames. ], batch size: 57, lr: 5.75e-03, grad_scale: 16.0
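[Annotation] The "Exclude cut" WARNING above drops a 1-second AudioSet clip whose transcript is placeholder text: after subsampling, its 100 input frames shrink to 23, fewer than its 24 BPE tokens, so the transducer loss cannot align one encoder frame per token. A minimal sketch of such a length check follows; the frontend border arithmetic is an assumption here, chosen only so that 100 frames map to the logged 23, and the helper name is illustrative rather than icefall's exact code.

    # Hypothetical sketch of the length filter behind the "Exclude cut"
    # warnings; names and border arithmetic are assumptions.
    def keep_cut(num_frames: int, num_tokens: int, subsampling: int = 4) -> bool:
        # Assumed frontend reduction: conv borders trim ~7 frames before
        # the 4x subsampling, so 100 input frames -> (100 - 7) // 4 = 23.
        frames_after = (num_frames - 7) // subsampling
        # The transducer needs at least one encoder frame per token, so a
        # cut with 23 frames but 24 tokens must be dropped.
        return frames_after >= num_tokens

    assert keep_cut(100, 24) is False  # matches the excluded cut above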
2023-11-20 03:49:06,759 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.51 vs. limit=10.0
2023-11-20 03:49:25,778 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.837e+01 8.169e+01 8.824e+01 9.776e+01 1.242e+02, threshold=1.765e+02, percent-clipped=0.0
2023-11-20 03:49:34,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=928520.0, ans=0.125
2023-11-20 03:49:35,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=928520.0, ans=0.125
2023-11-20 03:49:35,664 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.11 vs. limit=15.0
2023-11-20 03:49:52,142 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139300
2023-11-20 03:50:00,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=928653.3333333334, ans=0.0
2023-11-20 03:50:04,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=928720.0, ans=0.125
2023-11-20 03:50:05,511 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7050, loss[loss=0.06881, simple_loss=0.08006, pruned_loss=0.01759, audio_tagging_loss=0.01119, over 14482.00 frames. ], tot_loss[loss=0.08151, simple_loss=0.1019, pruned_loss=0.02066, audio_tagging_loss=0.009884, over 3049351.76 frames. ], batch size: 56, lr: 5.75e-03, grad_scale: 16.0
2023-11-20 03:50:31,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=928853.3333333334, ans=0.0
2023-11-20 03:50:56,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=928986.6666666666, ans=0.125
2023-11-20 03:50:57,609 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139350
2023-11-20 03:51:00,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=928986.6666666666, ans=0.0
2023-11-20 03:51:04,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=928986.6666666666, ans=0.0
2023-11-20 03:51:10,698 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7100, loss[loss=0.09532, simple_loss=0.1213, pruned_loss=0.02528, audio_tagging_loss=0.009376, over 16335.00 frames. ], tot_loss[loss=0.08211, simple_loss=0.103, pruned_loss=0.0207, audio_tagging_loss=0.009911, over 3057727.23 frames. ], batch size: 60, lr: 5.74e-03, grad_scale: 16.0
2023-11-20 03:51:34,673 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.602e+01 8.104e+01 8.912e+01 9.574e+01 1.346e+02, threshold=1.782e+02, percent-clipped=0.0
2023-11-20 03:51:35,382 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.69 vs. limit=15.0
2023-11-20 03:51:56,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=929253.3333333334, ans=0.125
2023-11-20 03:51:57,068 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0
2023-11-20 03:51:58,206 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-20 03:52:03,521 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139400
2023-11-20 03:52:04,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=929320.0, ans=0.125
2023-11-20 03:52:14,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=929386.6666666666, ans=0.0
2023-11-20 03:52:15,904 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7150, loss[loss=0.07793, simple_loss=0.09504, pruned_loss=0.01805, audio_tagging_loss=0.01236, over 15256.00 frames. ], tot_loss[loss=0.08172, simple_loss=0.102, pruned_loss=0.02058, audio_tagging_loss=0.01015, over 3047649.63 frames. ], batch size: 59, lr: 5.74e-03, grad_scale: 16.0
2023-11-20 03:52:20,261 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0
2023-11-20 03:52:38,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=929453.3333333334, ans=0.1
2023-11-20 03:52:46,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=929520.0, ans=0.1
2023-11-20 03:52:54,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=929586.6666666666, ans=0.1
2023-11-20 03:53:07,765 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139450
2023-11-20 03:53:10,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=929653.3333333334, ans=0.125
2023-11-20 03:53:21,253 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7200, loss[loss=0.08891, simple_loss=0.1055, pruned_loss=0.02427, audio_tagging_loss=0.01186, over 14649.00 frames. ], tot_loss[loss=0.08167, simple_loss=0.1017, pruned_loss=0.02052, audio_tagging_loss=0.01031, over 3046332.01 frames. ], batch size: 56, lr: 5.74e-03, grad_scale: 32.0
2023-11-20 03:53:28,225 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.32 vs. limit=15.0
2023-11-20 03:53:45,898 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 8.353e+01 8.991e+01 9.790e+01 1.410e+02, threshold=1.798e+02, percent-clipped=0.0
2023-11-20 03:53:47,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=929853.3333333334, ans=0.2
2023-11-20 03:53:59,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=929920.0, ans=0.1
2023-11-20 03:54:13,124 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139500
2023-11-20 03:54:15,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=929986.6666666666, ans=0.07
2023-11-20 03:54:19,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=929986.6666666666, ans=0.125
2023-11-20 03:54:21,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=929986.6666666666, ans=0.125
2023-11-20 03:54:25,371 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7250, loss[loss=0.07648, simple_loss=0.1027, pruned_loss=0.01745, audio_tagging_loss=0.007665, over 14839.00 frames. ], tot_loss[loss=0.08205, simple_loss=0.102, pruned_loss=0.02065, audio_tagging_loss=0.01038, over 3052082.56 frames. ], batch size: 53, lr: 5.74e-03, grad_scale: 16.0
2023-11-20 03:54:48,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=930120.0, ans=0.2
2023-11-20 03:55:05,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=930253.3333333334, ans=0.125
2023-11-20 03:55:17,504 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139550
2023-11-20 03:55:21,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=930320.0, ans=0.2
2023-11-20 03:55:26,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=930320.0, ans=0.0
2023-11-20 03:55:30,948 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7300, loss[loss=0.08736, simple_loss=0.1059, pruned_loss=0.02383, audio_tagging_loss=0.0106, over 15547.00 frames. ], tot_loss[loss=0.08231, simple_loss=0.1024, pruned_loss=0.02081, audio_tagging_loss=0.01029, over 3053234.19 frames. ], batch size: 58, lr: 5.74e-03, grad_scale: 16.0
2023-11-20 03:55:31,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=930386.6666666666, ans=0.0
2023-11-20 03:55:31,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=930386.6666666666, ans=0.025
2023-11-20 03:55:39,057 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.72 vs. limit=10.0
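[Annotation] The [optim.py:476] lines report the min/25%/50%/75%/max of recent gradient norms. The logged numbers are consistent with the clipping threshold being Clipping_scale times the running median (e.g. 2.0 x 8.991e+01 ~ 1.798e+02 just above), with batches whose norm exceeds the threshold counted in percent-clipped. A hedged sketch of that bookkeeping, not icefall's exact implementation:

    import torch

    # Sketch of quartile-based clipping consistent with the logged numbers
    # (threshold = Clipping_scale * median of recent grad norms).
    class QuartileClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
            self.scale = clipping_scale
            self.window = window
            self.norms: list = []

        def clip_(self, parameters) -> float:
            params = [p for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
            self.norms = (self.norms + [norm])[-self.window:]
            q = torch.quantile(
                torch.tensor(self.norms),
                torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
            )  # the five "grad-norm quartiles" in the log
            threshold = self.scale * q[2].item()  # scale * median
            if norm > threshold:  # contributes to "percent-clipped"
                for p in params:
                    p.grad.mul_(threshold / norm)
            return threshold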
2023-11-20 03:55:45,650 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.90 vs. limit=12.0
2023-11-20 03:55:46,438 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 03:55:56,598 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 5.831e+01 8.147e+01 8.750e+01 9.433e+01 1.159e+02, threshold=1.750e+02, percent-clipped=0.0
2023-11-20 03:56:00,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=930520.0, ans=0.125
2023-11-20 03:56:05,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=930520.0, ans=0.1
2023-11-20 03:56:22,043 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139600
2023-11-20 03:56:23,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=930653.3333333334, ans=0.0
2023-11-20 03:56:35,336 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7350, loss[loss=0.08822, simple_loss=0.1082, pruned_loss=0.02431, audio_tagging_loss=0.0098, over 15087.00 frames. ], tot_loss[loss=0.08324, simple_loss=0.104, pruned_loss=0.02121, audio_tagging_loss=0.01005, over 3049482.82 frames. ], batch size: 58, lr: 5.74e-03, grad_scale: 16.0
2023-11-20 03:56:45,203 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=12.0
2023-11-20 03:57:04,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=930853.3333333334, ans=0.125
2023-11-20 03:57:11,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=930853.3333333334, ans=0.07
2023-11-20 03:57:17,206 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.57 vs. limit=15.0
2023-11-20 03:57:19,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=930920.0, ans=0.125
2023-11-20 03:57:27,633 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139650
2023-11-20 03:57:29,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=930986.6666666666, ans=0.0
2023-11-20 03:57:32,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=930986.6666666666, ans=0.2
2023-11-20 03:57:32,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=930986.6666666666, ans=0.125
2023-11-20 03:57:36,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=930986.6666666666, ans=0.125
2023-11-20 03:57:37,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=930986.6666666666, ans=0.0
2023-11-20 03:57:38,968 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=12.0
2023-11-20 03:57:39,591 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7400, loss[loss=0.08091, simple_loss=0.1058, pruned_loss=0.01939, audio_tagging_loss=0.008626, over 14961.00 frames. ], tot_loss[loss=0.08359, simple_loss=0.1048, pruned_loss=0.02127, audio_tagging_loss=0.009906, over 3052144.85 frames. ], batch size: 54, lr: 5.74e-03, grad_scale: 16.0
2023-11-20 03:57:52,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=931120.0, ans=0.125
2023-11-20 03:58:05,076 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.423e+01 7.809e+01 8.517e+01 9.487e+01 1.228e+02, threshold=1.703e+02, percent-clipped=0.0
2023-11-20 03:58:10,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=931186.6666666666, ans=0.0
2023-11-20 03:58:16,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=931186.6666666666, ans=0.0
2023-11-20 03:58:30,989 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139700
2023-11-20 03:58:38,195 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0
2023-11-20 03:58:44,238 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7450, loss[loss=0.06938, simple_loss=0.08053, pruned_loss=0.01948, audio_tagging_loss=0.009632, over 14873.00 frames. ], tot_loss[loss=0.08388, simple_loss=0.1049, pruned_loss=0.02158, audio_tagging_loss=0.009857, over 3051906.62 frames. ], batch size: 55, lr: 5.74e-03, grad_scale: 16.0
2023-11-20 03:58:48,613 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0
2023-11-20 03:58:50,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=931386.6666666666, ans=0.125
2023-11-20 03:59:03,919 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.59 vs. limit=10.0
2023-11-20 03:59:04,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=931453.3333333334, ans=0.125
2023-11-20 03:59:09,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=931520.0, ans=0.125
2023-11-20 03:59:35,528 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139750
2023-11-20 03:59:41,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=931653.3333333334, ans=0.0
2023-11-20 03:59:47,765 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7500, loss[loss=0.113, simple_loss=0.1418, pruned_loss=0.03418, audio_tagging_loss=0.007931, over 15767.00 frames. ], tot_loss[loss=0.08344, simple_loss=0.1043, pruned_loss=0.02145, audio_tagging_loss=0.009851, over 3056502.78 frames. ], batch size: 57, lr: 5.74e-03, grad_scale: 16.0
2023-11-20 04:00:05,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=931786.6666666666, ans=0.0
2023-11-20 04:00:09,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=931786.6666666666, ans=15.0
2023-11-20 04:00:13,506 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.250e+01 8.176e+01 8.772e+01 9.671e+01 2.176e+02, threshold=1.754e+02, percent-clipped=1.0
2023-11-20 04:00:16,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=931853.3333333334, ans=0.125
2023-11-20 04:00:21,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=931853.3333333334, ans=0.0
2023-11-20 04:00:24,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=931853.3333333334, ans=0.0
2023-11-20 04:00:25,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=931920.0, ans=0.0
2023-11-20 04:00:33,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=931920.0, ans=0.125
2023-11-20 04:00:39,794 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139800
2023-11-20 04:00:44,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=931986.6666666666, ans=0.1
2023-11-20 04:00:52,979 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7550, loss[loss=0.05453, simple_loss=0.06464, pruned_loss=0.01317, audio_tagging_loss=0.009038, over 16078.00 frames. ], tot_loss[loss=0.08307, simple_loss=0.1038, pruned_loss=0.02134, audio_tagging_loss=0.009817, over 3051344.54 frames. ], batch size: 62, lr: 5.73e-03, grad_scale: 16.0
2023-11-20 04:00:56,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=932053.3333333334, ans=0.1
2023-11-20 04:01:10,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=932120.0, ans=0.0
2023-11-20 04:01:11,517 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.92 vs. limit=10.0
2023-11-20 04:01:38,117 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.43 vs. limit=15.0
2023-11-20 04:01:43,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=932320.0, ans=0.125
2023-11-20 04:01:44,723 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139850
2023-11-20 04:01:54,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=932320.0, ans=0.125
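[Annotation] The many [scaling.py:213] entries track ScheduledFloat values: regularization constants (dropout_p, skip rates, balancer bounds) whose current value, logged as ans, is a function of batch_count. A sketch of one plausible piecewise-linear schedule follows; the breakpoints below are invented, and the real icefall class carries more machinery than shown:

    # Sketch of a batch-count-keyed schedule like the logged ScheduledFloat
    # values; the breakpoints are made-up illustrations.
    class ScheduledFloat:
        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) pairs
            self.batch_count = 0.0

        def __float__(self) -> float:
            x = self.batch_count
            (x0, y0), *rest = self.points
            if x <= x0:
                return y0
            for x1, y1 in rest:
                if x <= x1:  # linearly interpolate between breakpoints
                    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
                x0, y0 = x1, y1
            return y0  # past the last breakpoint, hold the final value

    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    dropout_p.batch_count = 926120.0
    print(float(dropout_p))  # 0.1, matching a logged "ans=0.1"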
2023-11-20 04:01:57,664 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7600, loss[loss=0.1148, simple_loss=0.142, pruned_loss=0.03513, audio_tagging_loss=0.00866, over 17034.00 frames. ], tot_loss[loss=0.0829, simple_loss=0.1037, pruned_loss=0.02126, audio_tagging_loss=0.009801, over 3055135.47 frames. ], batch size: 61, lr: 5.73e-03, grad_scale: 32.0
2023-11-20 04:02:06,268 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.67 vs. limit=22.5
2023-11-20 04:02:10,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=932453.3333333334, ans=0.2
2023-11-20 04:02:14,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=932453.3333333334, ans=0.125
2023-11-20 04:02:23,912 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.765e+01 8.108e+01 8.753e+01 9.539e+01 1.294e+02, threshold=1.751e+02, percent-clipped=0.0
2023-11-20 04:02:44,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=932586.6666666666, ans=0.2
2023-11-20 04:02:49,272 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139900
2023-11-20 04:03:02,135 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7650, loss[loss=0.06645, simple_loss=0.08818, pruned_loss=0.01196, audio_tagging_loss=0.01041, over 15206.00 frames. ], tot_loss[loss=0.08227, simple_loss=0.1027, pruned_loss=0.02104, audio_tagging_loss=0.009852, over 3053891.05 frames. ], batch size: 62, lr: 5.73e-03, grad_scale: 32.0
2023-11-20 04:03:11,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=932720.0, ans=0.0
2023-11-20 04:03:38,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=932853.3333333334, ans=0.1
2023-11-20 04:03:41,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=932920.0, ans=0.125
2023-11-20 04:03:53,558 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 139950
2023-11-20 04:04:03,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=932986.6666666666, ans=0.1
2023-11-20 04:04:03,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=932986.6666666666, ans=0.07
2023-11-20 04:04:06,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=933053.3333333334, ans=0.125
2023-11-20 04:04:07,054 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7700, loss[loss=0.08819, simple_loss=0.1035, pruned_loss=0.02648, audio_tagging_loss=0.009964, over 14101.00 frames. ], tot_loss[loss=0.08267, simple_loss=0.1035, pruned_loss=0.02106, audio_tagging_loss=0.009866, over 3052638.00 frames. ], batch size: 51, lr: 5.73e-03, grad_scale: 32.0
2023-11-20 04:04:17,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=933053.3333333334, ans=0.125
2023-11-20 04:04:18,846 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0
2023-11-20 04:04:28,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=933120.0, ans=0.125
2023-11-20 04:04:32,002 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.410e+01 8.189e+01 8.877e+01 9.506e+01 1.213e+02, threshold=1.775e+02, percent-clipped=0.0
2023-11-20 04:04:34,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=933186.6666666666, ans=0.2
2023-11-20 04:04:55,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=933253.3333333334, ans=0.05
2023-11-20 04:04:58,454 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140000
2023-11-20 04:05:09,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=933320.0, ans=0.1
2023-11-20 04:05:15,103 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7750, loss[loss=0.0589, simple_loss=0.07237, pruned_loss=0.01302, audio_tagging_loss=0.009694, over 14310.00 frames. ], tot_loss[loss=0.08156, simple_loss=0.1019, pruned_loss=0.02059, audio_tagging_loss=0.01001, over 3047571.84 frames. ], batch size: 54, lr: 5.73e-03, grad_scale: 32.0
2023-11-20 04:05:28,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=933453.3333333334, ans=0.125
2023-11-20 04:05:39,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=933520.0, ans=0.0
2023-11-20 04:05:59,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=933586.6666666666, ans=0.2
2023-11-20 04:06:06,669 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140050
2023-11-20 04:06:10,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=933653.3333333334, ans=0.0
2023-11-20 04:06:12,322 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.51 vs. limit=15.0
2023-11-20 04:06:19,735 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7800, loss[loss=0.07088, simple_loss=0.08555, pruned_loss=0.01751, audio_tagging_loss=0.0106, over 14545.00 frames. ], tot_loss[loss=0.08243, simple_loss=0.103, pruned_loss=0.02085, audio_tagging_loss=0.01006, over 3048234.12 frames. ], batch size: 54, lr: 5.73e-03, grad_scale: 16.0
2023-11-20 04:06:46,654 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.169e+01 8.821e+01 9.790e+01 1.228e+02, threshold=1.764e+02, percent-clipped=0.0
2023-11-20 04:07:00,226 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.05 vs. limit=22.5
2023-11-20 04:07:11,528 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140100
2023-11-20 04:07:15,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=933986.6666666666, ans=0.125
2023-11-20 04:07:24,821 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7850, loss[loss=0.06666, simple_loss=0.08321, pruned_loss=0.01556, audio_tagging_loss=0.009495, over 14699.00 frames. ], tot_loss[loss=0.08262, simple_loss=0.1037, pruned_loss=0.02079, audio_tagging_loss=0.009959, over 3044815.00 frames. ], batch size: 59, lr: 5.73e-03, grad_scale: 16.0
2023-11-20 04:07:27,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=934053.3333333334, ans=0.125
2023-11-20 04:07:28,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=934053.3333333334, ans=0.0
2023-11-20 04:07:31,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=934053.3333333334, ans=0.125
2023-11-20 04:07:50,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=934186.6666666666, ans=0.0
2023-11-20 04:07:52,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=934186.6666666666, ans=0.025
2023-11-20 04:08:08,953 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0
2023-11-20 04:08:16,472 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140150
2023-11-20 04:08:19,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=934320.0, ans=0.025
2023-11-20 04:08:22,046 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.11 vs. limit=10.0
2023-11-20 04:08:28,677 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7900, loss[loss=0.05592, simple_loss=0.06434, pruned_loss=0.01332, audio_tagging_loss=0.01043, over 16458.00 frames. ], tot_loss[loss=0.08253, simple_loss=0.1036, pruned_loss=0.02064, audio_tagging_loss=0.0101, over 3045247.85 frames. ], batch size: 65, lr: 5.73e-03, grad_scale: 16.0
2023-11-20 04:08:35,042 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. limit=6.0
2023-11-20 04:08:56,538 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.338e+01 8.229e+01 9.116e+01 9.916e+01 1.318e+02, threshold=1.823e+02, percent-clipped=0.0
2023-11-20 04:09:18,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=934586.6666666666, ans=0.1
2023-11-20 04:09:21,242 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140200
2023-11-20 04:09:33,528 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 7950, loss[loss=0.07867, simple_loss=0.09126, pruned_loss=0.01581, audio_tagging_loss=0.01723, over 15076.00 frames. ], tot_loss[loss=0.08274, simple_loss=0.1034, pruned_loss=0.02078, audio_tagging_loss=0.01025, over 3046057.94 frames. ], batch size: 56, lr: 5.73e-03, grad_scale: 16.0
2023-11-20 04:09:45,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=934786.6666666666, ans=0.125
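[Annotation] Each [train_asr.py:1262] line reports the per-batch loss[...] and a running tot_loss[...], broken into the same components. The logged numbers are consistent with the total being 0.5 x simple_loss + pruned_loss + audio_tagging_loss; the 0.5 weight is inferred from the values alone, so treat it as an assumption rather than the script's confirmed formula:

    # Inferred combination of the logged loss components; the 0.5 weight on
    # simple_loss is deduced from the numbers, not read from the code.
    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_scale=0.5, audio_tagging_scale=1.0):
        return (simple_scale * simple_loss
                + pruned_loss
                + audio_tagging_scale * audio_tagging_loss)

    # Check against the "batch 7900" tot_loss above:
    print(combined_loss(0.1036, 0.02064, 0.0101))  # ~0.08254 vs. logged 0.08253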
2023-11-20 04:09:50,702 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 04:09:52,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=934786.6666666666, ans=0.125
2023-11-20 04:09:53,844 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.14 vs. limit=15.0
2023-11-20 04:10:02,582 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.65 vs. limit=22.5
2023-11-20 04:10:24,411 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140250
2023-11-20 04:10:31,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=934986.6666666666, ans=0.1
2023-11-20 04:10:38,696 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8000, loss[loss=0.1032, simple_loss=0.128, pruned_loss=0.03099, audio_tagging_loss=0.008171, over 16344.00 frames. ], tot_loss[loss=0.08239, simple_loss=0.1028, pruned_loss=0.02064, audio_tagging_loss=0.01035, over 3043167.45 frames. ], batch size: 57, lr: 5.73e-03, grad_scale: 32.0
2023-11-20 04:10:39,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=935053.3333333334, ans=0.09899494936611666
2023-11-20 04:11:05,193 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.205e+01 8.859e+01 1.003e+02 1.525e+02, threshold=1.772e+02, percent-clipped=0.0
2023-11-20 04:11:07,053 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.51 vs. limit=10.0
2023-11-20 04:11:30,813 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140300
2023-11-20 04:11:33,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=935320.0, ans=0.125
2023-11-20 04:11:38,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=935320.0, ans=0.0
2023-11-20 04:11:40,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=935320.0, ans=0.07
2023-11-20 04:11:40,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=935320.0, ans=0.02
2023-11-20 04:11:42,852 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8050, loss[loss=0.06714, simple_loss=0.07983, pruned_loss=0.01561, audio_tagging_loss=0.01162, over 13743.00 frames. ], tot_loss[loss=0.08264, simple_loss=0.1029, pruned_loss=0.02077, audio_tagging_loss=0.01043, over 3039348.22 frames. ], batch size: 55, lr: 5.72e-03, grad_scale: 32.0
2023-11-20 04:11:43,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=935386.6666666666, ans=0.125
2023-11-20 04:12:26,079 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0
2023-11-20 04:12:31,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=935586.6666666666, ans=0.1
2023-11-20 04:12:33,889 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140350
2023-11-20 04:12:46,657 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8100, loss[loss=0.1101, simple_loss=0.1385, pruned_loss=0.03504, audio_tagging_loss=0.005792, over 14186.00 frames. ], tot_loss[loss=0.08235, simple_loss=0.1026, pruned_loss=0.02083, audio_tagging_loss=0.01025, over 3036326.17 frames. ], batch size: 53, lr: 5.72e-03, grad_scale: 32.0
2023-11-20 04:12:49,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=935720.0, ans=0.0
2023-11-20 04:12:51,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=935720.0, ans=10.0
2023-11-20 04:13:02,694 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.30 vs. limit=15.0
2023-11-20 04:13:10,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=935786.6666666666, ans=0.0
2023-11-20 04:13:13,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=935853.3333333334, ans=0.0
2023-11-20 04:13:14,180 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.312e+01 8.397e+01 8.892e+01 9.736e+01 1.286e+02, threshold=1.778e+02, percent-clipped=0.0
2023-11-20 04:13:15,137 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.63 vs. limit=10.0
2023-11-20 04:13:21,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=935853.3333333334, ans=0.1
2023-11-20 04:13:38,145 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140400
2023-11-20 04:13:41,571 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=15.0
2023-11-20 04:13:51,053 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8150, loss[loss=0.1021, simple_loss=0.1375, pruned_loss=0.02619, audio_tagging_loss=0.007229, over 15070.00 frames. ], tot_loss[loss=0.08255, simple_loss=0.1029, pruned_loss=0.02098, audio_tagging_loss=0.01014, over 3042836.88 frames. ], batch size: 57, lr: 5.72e-03, grad_scale: 32.0
2023-11-20 04:14:07,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=936120.0, ans=0.125
2023-11-20 04:14:09,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=936120.0, ans=0.1
2023-11-20 04:14:26,537 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.617e-01
2023-11-20 04:14:27,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=936186.6666666666, ans=0.0
2023-11-20 04:14:43,645 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140450
2023-11-20 04:14:48,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=936320.0, ans=0.0
2023-11-20 04:14:56,553 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8200, loss[loss=0.09498, simple_loss=0.1241, pruned_loss=0.02535, audio_tagging_loss=0.007605, over 15447.00 frames. ], tot_loss[loss=0.08289, simple_loss=0.1038, pruned_loss=0.021, audio_tagging_loss=0.009982, over 3051889.67 frames. ], batch size: 55, lr: 5.72e-03, grad_scale: 32.0
2023-11-20 04:14:57,842 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 04:15:00,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=936386.6666666666, ans=0.2
2023-11-20 04:15:01,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=936386.6666666666, ans=0.125
2023-11-20 04:15:23,155 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 8.334e+01 8.915e+01 9.605e+01 1.213e+02, threshold=1.783e+02, percent-clipped=0.0
2023-11-20 04:15:35,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=936586.6666666666, ans=0.2
2023-11-20 04:15:47,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=936653.3333333334, ans=0.1
2023-11-20 04:15:48,655 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140500
2023-11-20 04:15:48,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=936653.3333333334, ans=0.0
2023-11-20 04:15:53,601 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.61 vs. limit=15.0
2023-11-20 04:15:56,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=936653.3333333334, ans=0.07
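[Annotation] The [scaling.py:1022] Whitening lines compare a statistic of a module's activations against a limit; most entries stay below it, and occasional excursions (e.g. metric=22.90 vs. limit=22.5 earlier) are when a corrective penalty would engage. Below is one plausible proxy for such a metric, with 1.0 meaning perfectly decorrelated channels and the channel count as the worst case; the exact formula used by icefall may differ:

    import torch

    # Hedged proxy for a whitening metric: how concentrated the channel
    # covariance spectrum is (near 1.0 = white, up to num_channels if a
    # single direction dominates). Not icefall's exact formula.
    def whitening_metric(x: torch.Tensor) -> float:
        x = x - x.mean(dim=0)                # (frames, channels), centered
        cov = (x.T @ x) / x.shape[0]         # channel covariance
        eigs = torch.linalg.eigvalsh(cov)    # eigenvalue spectrum
        flatness = eigs.sum() ** 2 / (eigs ** 2).sum()  # effective rank
        return float(x.shape[1] / flatness)

    x = torch.randn(2000, 384)   # near-white activations
    print(whitening_metric(x))   # close to 1, far below a limit like 15.0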
2023-11-20 04:16:01,647 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8250, loss[loss=0.08351, simple_loss=0.1098, pruned_loss=0.02266, audio_tagging_loss=0.005941, over 14801.00 frames. ], tot_loss[loss=0.08269, simple_loss=0.1034, pruned_loss=0.02107, audio_tagging_loss=0.00991, over 3051466.86 frames. ], batch size: 55, lr: 5.72e-03, grad_scale: 32.0
2023-11-20 04:16:10,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=936720.0, ans=0.125
2023-11-20 04:16:20,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=936786.6666666666, ans=0.1
2023-11-20 04:16:42,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=936920.0, ans=0.2
2023-11-20 04:16:52,930 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140550
2023-11-20 04:16:54,557 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=15.0
2023-11-20 04:17:05,488 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8300, loss[loss=0.1007, simple_loss=0.1182, pruned_loss=0.02935, audio_tagging_loss=0.01223, over 15149.00 frames. ], tot_loss[loss=0.08254, simple_loss=0.1033, pruned_loss=0.02098, audio_tagging_loss=0.00992, over 3053261.25 frames. ], batch size: 55, lr: 5.72e-03, grad_scale: 16.0
2023-11-20 04:17:10,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=937053.3333333334, ans=0.125
2023-11-20 04:17:11,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=937053.3333333334, ans=0.1
2023-11-20 04:17:32,150 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.60 vs. limit=22.5
2023-11-20 04:17:33,793 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.558e+01 8.113e+01 8.924e+01 9.899e+01 1.160e+02, threshold=1.785e+02, percent-clipped=0.0
2023-11-20 04:17:37,164 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.96 vs. limit=15.0
2023-11-20 04:17:54,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=937253.3333333334, ans=0.125
2023-11-20 04:17:56,396 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140600
2023-11-20 04:18:01,256 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.18 vs. limit=22.5
2023-11-20 04:18:09,913 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8350, loss[loss=0.05534, simple_loss=0.07111, pruned_loss=0.008457, audio_tagging_loss=0.01132, over 15128.00 frames. ], tot_loss[loss=0.08213, simple_loss=0.1029, pruned_loss=0.02075, audio_tagging_loss=0.009952, over 3051703.56 frames. ], batch size: 59, lr: 5.72e-03, grad_scale: 16.0
2023-11-20 04:18:11,350 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 04:18:13,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=937386.6666666666, ans=0.1
2023-11-20 04:18:30,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=937453.3333333334, ans=0.125
2023-11-20 04:18:58,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=937586.6666666666, ans=0.0
2023-11-20 04:18:59,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=937586.6666666666, ans=0.0
2023-11-20 04:19:02,273 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140650
2023-11-20 04:19:02,784 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0
2023-11-20 04:19:06,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=937653.3333333334, ans=15.0
2023-11-20 04:19:15,092 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8400, loss[loss=0.07245, simple_loss=0.0914, pruned_loss=0.01735, audio_tagging_loss=0.009402, over 15790.00 frames. ], tot_loss[loss=0.08199, simple_loss=0.1028, pruned_loss=0.02065, audio_tagging_loss=0.009939, over 3051715.44 frames. ], batch size: 64, lr: 5.72e-03, grad_scale: 32.0
2023-11-20 04:19:20,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=937720.0, ans=0.1
2023-11-20 04:19:20,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=937720.0, ans=0.125
2023-11-20 04:19:43,612 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.796e+01 7.950e+01 8.532e+01 9.547e+01 1.131e+02, threshold=1.706e+02, percent-clipped=0.0
2023-11-20 04:20:07,393 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140700
2023-11-20 04:20:19,672 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8450, loss[loss=0.08535, simple_loss=0.1044, pruned_loss=0.02098, audio_tagging_loss=0.01215, over 15227.00 frames. ], tot_loss[loss=0.08265, simple_loss=0.1036, pruned_loss=0.02101, audio_tagging_loss=0.009852, over 3055101.81 frames. ], batch size: 57, lr: 5.72e-03, grad_scale: 32.0
2023-11-20 04:20:43,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=938120.0, ans=0.125
2023-11-20 04:20:49,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=938186.6666666666, ans=0.035
2023-11-20 04:20:53,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=938186.6666666666, ans=0.125
2023-11-20 04:20:54,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=938186.6666666666, ans=0.1
2023-11-20 04:21:00,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=938253.3333333334, ans=0.0
2023-11-20 04:21:12,273 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140750
2023-11-20 04:21:24,945 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8500, loss[loss=0.09978, simple_loss=0.1359, pruned_loss=0.02436, audio_tagging_loss=0.007481, over 15631.00 frames. ], tot_loss[loss=0.08354, simple_loss=0.1046, pruned_loss=0.02133, audio_tagging_loss=0.009924, over 3049063.49 frames. ], batch size: 58, lr: 5.72e-03, grad_scale: 32.0
2023-11-20 04:21:25,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=938386.6666666666, ans=0.1
2023-11-20 04:21:25,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=938386.6666666666, ans=0.1
2023-11-20 04:21:25,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=938386.6666666666, ans=0.125
2023-11-20 04:21:42,737 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.90 vs. limit=15.0
2023-11-20 04:21:53,282 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.458e+01 8.139e+01 8.950e+01 9.720e+01 1.235e+02, threshold=1.790e+02, percent-clipped=0.0
2023-11-20 04:22:06,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=938586.6666666666, ans=0.0
2023-11-20 04:22:16,784 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140800
2023-11-20 04:22:18,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=938653.3333333334, ans=15.0
2023-11-20 04:22:19,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=938653.3333333334, ans=0.125
2023-11-20 04:22:19,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=938653.3333333334, ans=10.0
2023-11-20 04:22:27,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=938653.3333333334, ans=0.125
2023-11-20 04:22:30,071 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8550, loss[loss=0.1076, simple_loss=0.13, pruned_loss=0.03267, audio_tagging_loss=0.009968, over 15265.00 frames. ], tot_loss[loss=0.08379, simple_loss=0.1047, pruned_loss=0.02149, audio_tagging_loss=0.009945, over 3046185.32 frames. ], batch size: 57, lr: 5.71e-03, grad_scale: 32.0
2023-11-20 04:22:49,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=938786.6666666666, ans=0.125
2023-11-20 04:22:51,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=938786.6666666666, ans=0.125
2023-11-20 04:22:57,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=938853.3333333334, ans=0.1
2023-11-20 04:22:59,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=938853.3333333334, ans=0.125
2023-11-20 04:23:01,142 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 04:23:17,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=938920.0, ans=0.0
2023-11-20 04:23:21,917 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140850
2023-11-20 04:23:28,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=938986.6666666666, ans=0.0
2023-11-20 04:23:29,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=938986.6666666666, ans=0.125
2023-11-20 04:23:34,031 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8600, loss[loss=0.05845, simple_loss=0.06302, pruned_loss=0.01191, audio_tagging_loss=0.01503, over 15551.00 frames. ], tot_loss[loss=0.08348, simple_loss=0.104, pruned_loss=0.0214, audio_tagging_loss=0.01007, over 3053796.54 frames. ], batch size: 59, lr: 5.71e-03, grad_scale: 16.0
2023-11-20 04:23:36,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=939053.3333333334, ans=0.125
2023-11-20 04:23:55,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=939120.0, ans=0.125
2023-11-20 04:24:04,335 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.979e+01 8.151e+01 8.812e+01 9.579e+01 1.857e+02, threshold=1.762e+02, percent-clipped=1.0
2023-11-20 04:24:05,019 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5
2023-11-20 04:24:16,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=939253.3333333334, ans=0.0
2023-11-20 04:24:16,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=939253.3333333334, ans=0.0
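[Annotation] tot_loss[...] is reported "over" roughly 3.0 million frames throughout this stretch, which suggests a decayed, frame-weighted running average rather than a whole-epoch mean: with ~15k frames per batch and a decay of about 0.995, the effective window settles near 15000 / 0.005 = 3.0e6 frames. A sketch of that bookkeeping; the decay constant and mechanism are assumptions:

    # Sketch of a frame-weighted running average consistent with the roughly
    # constant "over ~3.0e6 frames" in tot_loss; decay is an assumption.
    class RunningLoss:
        def __init__(self, decay: float = 0.995):
            self.decay = decay
            self.weighted_loss = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> float:
            # older batches decay geometrically, so the frame count
            # plateaus near batch_frames / (1 - decay) ~= 3.0e6
            self.weighted_loss = (self.decay * self.weighted_loss
                                  + batch_loss * batch_frames)
            self.frames = self.decay * self.frames + batch_frames
            return self.weighted_loss / self.frames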
2023-11-20 04:24:18,100 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0
2023-11-20 04:24:18,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=939253.3333333334, ans=0.1
2023-11-20 04:24:26,062 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140900
2023-11-20 04:24:26,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=939320.0, ans=0.1
2023-11-20 04:24:37,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=939320.0, ans=0.125
2023-11-20 04:24:39,294 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8650, loss[loss=0.09118, simple_loss=0.1206, pruned_loss=0.02193, audio_tagging_loss=0.008938, over 16297.00 frames. ], tot_loss[loss=0.08385, simple_loss=0.1047, pruned_loss=0.02145, audio_tagging_loss=0.01005, over 3056011.66 frames. ], batch size: 60, lr: 5.71e-03, grad_scale: 16.0
2023-11-20 04:24:43,756 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.47 vs. limit=22.5
2023-11-20 04:24:48,694 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.17 vs. limit=10.0
2023-11-20 04:24:54,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=939453.3333333334, ans=0.125
2023-11-20 04:25:04,341 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.55 vs. limit=15.0
2023-11-20 04:25:05,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=939520.0, ans=0.015
2023-11-20 04:25:07,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=939520.0, ans=0.125
2023-11-20 04:25:20,854 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0
2023-11-20 04:25:30,771 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 140950
2023-11-20 04:25:33,611 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.62 vs. limit=22.5
2023-11-20 04:25:43,345 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8700, loss[loss=0.0892, simple_loss=0.1162, pruned_loss=0.02437, audio_tagging_loss=0.006735, over 14770.00 frames. ], tot_loss[loss=0.08327, simple_loss=0.1039, pruned_loss=0.02113, audio_tagging_loss=0.01018, over 3053899.38 frames. ], batch size: 55, lr: 5.71e-03, grad_scale: 16.0
2023-11-20 04:26:13,595 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.908e+01 8.317e+01 9.149e+01 9.990e+01 1.361e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-20 04:26:35,194 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141000
2023-11-20 04:26:35,852 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=12.0
2023-11-20 04:26:38,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=939986.6666666666, ans=0.125
2023-11-20 04:26:47,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=940053.3333333334, ans=0.125
2023-11-20 04:26:48,349 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8750, loss[loss=0.06228, simple_loss=0.07241, pruned_loss=0.01423, audio_tagging_loss=0.01184, over 16311.00 frames. ], tot_loss[loss=0.08261, simple_loss=0.1031, pruned_loss=0.02081, audio_tagging_loss=0.01023, over 3050096.60 frames. ], batch size: 61, lr: 5.71e-03, grad_scale: 16.0
2023-11-20 04:26:50,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=940053.3333333334, ans=0.125
2023-11-20 04:27:08,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=940120.0, ans=0.125
2023-11-20 04:27:40,245 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141050
2023-11-20 04:27:46,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=940320.0, ans=0.1
2023-11-20 04:27:47,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=940320.0, ans=0.1
2023-11-20 04:27:53,777 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8800, loss[loss=0.05713, simple_loss=0.07201, pruned_loss=0.01136, audio_tagging_loss=0.00977, over 15444.00 frames. ], tot_loss[loss=0.08308, simple_loss=0.1035, pruned_loss=0.02104, audio_tagging_loss=0.01031, over 3050339.06 frames. ], batch size: 60, lr: 5.71e-03, grad_scale: 32.0
2023-11-20 04:27:53,989 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 04:28:12,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=940453.3333333334, ans=0.09899494936611666
2023-11-20 04:28:20,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=940520.0, ans=0.2
2023-11-20 04:28:24,227 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.494e+01 8.276e+01 8.994e+01 9.890e+01 1.240e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-20 04:28:41,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=940586.6666666666, ans=0.95
2023-11-20 04:28:45,009 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141100
2023-11-20 04:28:48,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=940653.3333333334, ans=0.125
2023-11-20 04:28:55,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=940653.3333333334, ans=0.125
2023-11-20 04:28:58,022 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8850, loss[loss=0.07092, simple_loss=0.07741, pruned_loss=0.01758, audio_tagging_loss=0.01463, over 15657.00 frames. ], tot_loss[loss=0.0825, simple_loss=0.1026, pruned_loss=0.02087, audio_tagging_loss=0.01036, over 3050687.32 frames. ], batch size: 62, lr: 5.71e-03, grad_scale: 16.0
2023-11-20 04:29:10,831 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 04:29:10,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=940786.6666666666, ans=0.1
2023-11-20 04:29:12,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=940786.6666666666, ans=0.0
2023-11-20 04:29:13,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=940786.6666666666, ans=0.125
2023-11-20 04:29:22,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=940853.3333333334, ans=0.125
2023-11-20 04:29:27,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=940853.3333333334, ans=0.1
2023-11-20 04:29:33,260 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0
2023-11-20 04:29:49,303 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141150
2023-11-20 04:29:49,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=940986.6666666666, ans=0.2
2023-11-20 04:30:01,933 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8900, loss[loss=0.07102, simple_loss=0.09651, pruned_loss=0.01478, audio_tagging_loss=0.007986, over 15113.00 frames. ], tot_loss[loss=0.08211, simple_loss=0.1022, pruned_loss=0.02073, audio_tagging_loss=0.01027, over 3049771.03 frames.
], batch size: 58, lr: 5.71e-03, grad_scale: 8.0 2023-11-20 04:30:05,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=941053.3333333334, ans=0.0 2023-11-20 04:30:30,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=941186.6666666666, ans=0.125 2023-11-20 04:30:34,471 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.851e+01 8.057e+01 8.835e+01 9.972e+01 1.678e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-20 04:30:39,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=941253.3333333334, ans=0.125 2023-11-20 04:30:47,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=941253.3333333334, ans=0.125 2023-11-20 04:30:54,000 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141200 2023-11-20 04:30:54,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=941320.0, ans=0.09899494936611666 2023-11-20 04:31:04,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=941320.0, ans=0.125 2023-11-20 04:31:06,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=941386.6666666666, ans=0.0 2023-11-20 04:31:07,532 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 8950, loss[loss=0.1014, simple_loss=0.121, pruned_loss=0.03205, audio_tagging_loss=0.008854, over 15725.00 frames. ], tot_loss[loss=0.08231, simple_loss=0.1031, pruned_loss=0.02069, audio_tagging_loss=0.0101, over 3057690.94 frames. ], batch size: 57, lr: 5.71e-03, grad_scale: 8.0 2023-11-20 04:31:29,467 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.55 vs. limit=22.5 2023-11-20 04:31:32,094 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.11 vs. limit=15.0 2023-11-20 04:31:46,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=941586.6666666666, ans=0.1 2023-11-20 04:31:47,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=941586.6666666666, ans=0.125 2023-11-20 04:31:58,581 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141250 2023-11-20 04:32:10,730 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9000, loss[loss=0.1032, simple_loss=0.1265, pruned_loss=0.02816, audio_tagging_loss=0.01179, over 15828.00 frames. ], tot_loss[loss=0.08323, simple_loss=0.1042, pruned_loss=0.02112, audio_tagging_loss=0.01002, over 3057338.54 frames. ], batch size: 59, lr: 5.71e-03, grad_scale: 8.0 2023-11-20 04:32:10,731 INFO [train_asr.py:1285] (2/4) Computing validation loss 2023-11-20 04:32:53,271 INFO [train_asr.py:1294] (2/4) Epoch 12, validation: loss=0.06397, simple_loss=0.05412, pruned_loss=0.005869, audio_tagging_loss=0.03104, over 4681554.00 frames. 
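
Two quantities in the entries above can be re-derived directly from the printed numbers. The tot_loss lines are consistent with a weighted sum of the three loss terms (0.5 × simple_loss + pruned_loss + 1.0 × audio_tagging_loss), and each optim.py line's threshold equals Clipping_scale times the median of the printed grad-norm quartiles. The sketch below checks both relations against entries from this segment; the weights and the median rule are assumptions inferred from the log, not read from train_asr.py or optim.py.

```python
# Hedged sketch: re-derive two numbers printed in this log segment.
# The 0.5 / 1.0 weights and the "threshold = scale * median" rule are
# inferred from the printed values, not taken from train_asr.py / optim.py.

def total_loss(simple: float, pruned: float, audio_tagging: float) -> float:
    # tot_loss lines fit: loss = 0.5 * simple + pruned + 1.0 * audio_tagging
    return 0.5 * simple + pruned + 1.0 * audio_tagging

# Epoch 12, batch 8650: tot_loss[loss=0.08385, simple_loss=0.1047,
# pruned_loss=0.02145, audio_tagging_loss=0.01005]
assert abs(total_loss(0.1047, 0.02145, 0.01005) - 0.08385) < 5e-4  # printed values are rounded

def clip_threshold(median_grad_norm: float, clipping_scale: float = 2.0) -> float:
    # optim.py prints quartiles q0..q4; the threshold matches clipping_scale * q2
    return clipping_scale * median_grad_norm

# 04:30:34 entry: quartiles 6.851e+01 8.057e+01 8.835e+01 9.972e+01 1.678e+02,
# threshold=1.767e+02
assert abs(clip_threshold(8.835e1) - 1.767e2) < 1e-1
```

The grad_scale printed at the end of each tot_loss line (32.0 → 16.0 → 8.0 and back up) behaves like dynamic mixed-precision loss scaling, which halves the scale after an overflowing step and doubles it back after a run of clean steps.
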
2023-11-20 04:32:53,272 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB 2023-11-20 04:32:56,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=941720.0, ans=0.2 2023-11-20 04:33:25,866 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.562e+01 8.268e+01 8.688e+01 9.407e+01 1.162e+02, threshold=1.738e+02, percent-clipped=0.0 2023-11-20 04:33:41,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=941920.0, ans=0.0 2023-11-20 04:33:45,394 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141300 2023-11-20 04:33:53,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=941986.6666666666, ans=0.125 2023-11-20 04:33:55,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=941986.6666666666, ans=0.1 2023-11-20 04:33:58,659 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9050, loss[loss=0.07791, simple_loss=0.1103, pruned_loss=0.0156, audio_tagging_loss=0.007151, over 15225.00 frames. ], tot_loss[loss=0.08283, simple_loss=0.1039, pruned_loss=0.02098, audio_tagging_loss=0.009886, over 3051269.52 frames. ], batch size: 56, lr: 5.70e-03, grad_scale: 8.0 2023-11-20 04:33:58,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=942053.3333333334, ans=0.2 2023-11-20 04:34:01,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=942053.3333333334, ans=0.1 2023-11-20 04:34:14,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=942120.0, ans=0.125 2023-11-20 04:34:47,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=942253.3333333334, ans=0.04949747468305833 2023-11-20 04:34:48,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=942253.3333333334, ans=0.1 2023-11-20 04:34:50,693 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141350 2023-11-20 04:35:02,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=942386.6666666666, ans=0.0 2023-11-20 04:35:03,402 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9100, loss[loss=0.06713, simple_loss=0.08541, pruned_loss=0.0162, audio_tagging_loss=0.008219, over 15179.00 frames. ], tot_loss[loss=0.08184, simple_loss=0.103, pruned_loss=0.02048, audio_tagging_loss=0.009858, over 3058316.86 frames. 
], batch size: 57, lr: 5.70e-03, grad_scale: 8.0 2023-11-20 04:35:04,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=942386.6666666666, ans=0.2 2023-11-20 04:35:30,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=942520.0, ans=0.0 2023-11-20 04:35:36,020 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.481e+01 8.086e+01 8.915e+01 9.526e+01 1.275e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-20 04:35:41,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=942586.6666666666, ans=15.0 2023-11-20 04:35:55,763 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141400 2023-11-20 04:36:08,573 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9150, loss[loss=0.05287, simple_loss=0.05306, pruned_loss=0.01198, audio_tagging_loss=0.01436, over 13961.00 frames. ], tot_loss[loss=0.08164, simple_loss=0.1027, pruned_loss=0.02053, audio_tagging_loss=0.009732, over 3053493.97 frames. ], batch size: 59, lr: 5.70e-03, grad_scale: 8.0 2023-11-20 04:36:13,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=942720.0, ans=0.125 2023-11-20 04:36:38,471 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0 2023-11-20 04:37:00,731 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141450 2023-11-20 04:37:10,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=942986.6666666666, ans=0.125 2023-11-20 04:37:14,217 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9200, loss[loss=0.08301, simple_loss=0.09693, pruned_loss=0.02309, audio_tagging_loss=0.01146, over 15373.00 frames. ], tot_loss[loss=0.08078, simple_loss=0.1017, pruned_loss=0.02019, audio_tagging_loss=0.00973, over 3053268.94 frames. ], batch size: 59, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:37:21,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=943053.3333333334, ans=0.0 2023-11-20 04:37:23,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=943053.3333333334, ans=0.2 2023-11-20 04:37:32,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=943120.0, ans=0.125 2023-11-20 04:37:43,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=943186.6666666666, ans=0.1 2023-11-20 04:37:45,719 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.820e+01 8.352e+01 9.147e+01 9.950e+01 1.226e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-20 04:37:50,127 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.83 vs. 
limit=15.0 2023-11-20 04:37:55,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=943253.3333333334, ans=0.1 2023-11-20 04:38:02,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=943253.3333333334, ans=0.125 2023-11-20 04:38:06,663 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141500 2023-11-20 04:38:12,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=943320.0, ans=15.0 2023-11-20 04:38:19,628 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9250, loss[loss=0.05751, simple_loss=0.06538, pruned_loss=0.009252, audio_tagging_loss=0.01557, over 15574.00 frames. ], tot_loss[loss=0.08096, simple_loss=0.1017, pruned_loss=0.0203, audio_tagging_loss=0.00981, over 3054496.47 frames. ], batch size: 60, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:38:29,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=943386.6666666666, ans=0.125 2023-11-20 04:38:48,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=943520.0, ans=0.125 2023-11-20 04:38:59,027 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.68 vs. limit=12.0 2023-11-20 04:39:11,433 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141550 2023-11-20 04:39:14,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=943653.3333333334, ans=0.0 2023-11-20 04:39:14,519 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.39 vs. limit=15.0 2023-11-20 04:39:21,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=943653.3333333334, ans=0.125 2023-11-20 04:39:23,867 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9300, loss[loss=0.09365, simple_loss=0.113, pruned_loss=0.02568, audio_tagging_loss=0.01148, over 14856.00 frames. ], tot_loss[loss=0.08082, simple_loss=0.1015, pruned_loss=0.02022, audio_tagging_loss=0.009865, over 3062766.56 frames. ], batch size: 55, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:39:31,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=943720.0, ans=0.125 2023-11-20 04:39:44,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=943786.6666666666, ans=0.0 2023-11-20 04:39:57,019 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.234e+01 8.210e+01 8.770e+01 9.348e+01 1.167e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 04:40:14,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=943986.6666666666, ans=0.125 2023-11-20 04:40:15,586 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141600 2023-11-20 04:40:28,812 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9350, loss[loss=0.1156, simple_loss=0.1484, pruned_loss=0.03281, audio_tagging_loss=0.008543, over 16485.00 frames. ], tot_loss[loss=0.08182, simple_loss=0.1027, pruned_loss=0.02064, audio_tagging_loss=0.009844, over 3060425.36 frames. 
], batch size: 60, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:40:38,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=944053.3333333334, ans=0.0 2023-11-20 04:40:44,992 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.04 vs. limit=22.5 2023-11-20 04:40:52,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=944120.0, ans=0.1 2023-11-20 04:40:58,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=944186.6666666666, ans=0.2 2023-11-20 04:41:06,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=944253.3333333334, ans=0.125 2023-11-20 04:41:21,500 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141650 2023-11-20 04:41:27,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=944320.0, ans=0.125 2023-11-20 04:41:33,797 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9400, loss[loss=0.07273, simple_loss=0.09257, pruned_loss=0.01746, audio_tagging_loss=0.008996, over 15313.00 frames. ], tot_loss[loss=0.08218, simple_loss=0.1031, pruned_loss=0.0207, audio_tagging_loss=0.009907, over 3057309.24 frames. ], batch size: 57, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:41:43,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=944386.6666666666, ans=0.0 2023-11-20 04:42:05,744 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.961e+01 8.348e+01 8.869e+01 9.935e+01 1.327e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-20 04:42:18,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=944586.6666666666, ans=0.0 2023-11-20 04:42:23,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=944586.6666666666, ans=0.125 2023-11-20 04:42:26,290 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141700 2023-11-20 04:42:37,250 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:42:38,432 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9450, loss[loss=0.08739, simple_loss=0.1219, pruned_loss=0.02057, audio_tagging_loss=0.005865, over 15215.00 frames. ], tot_loss[loss=0.08221, simple_loss=0.1032, pruned_loss=0.02062, audio_tagging_loss=0.009989, over 3058509.61 frames. 
], batch size: 57, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:43:15,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=944853.3333333334, ans=0.0 2023-11-20 04:43:22,209 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.26 vs. limit=15.0 2023-11-20 04:43:30,230 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141750 2023-11-20 04:43:30,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=944986.6666666666, ans=0.0 2023-11-20 04:43:31,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=944986.6666666666, ans=0.125 2023-11-20 04:43:31,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=944986.6666666666, ans=10.0 2023-11-20 04:43:42,794 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9500, loss[loss=0.09082, simple_loss=0.1115, pruned_loss=0.02599, audio_tagging_loss=0.00906, over 15250.00 frames. ], tot_loss[loss=0.08185, simple_loss=0.1023, pruned_loss=0.02054, audio_tagging_loss=0.01014, over 3055469.79 frames. ], batch size: 55, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:43:58,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=945120.0, ans=0.0 2023-11-20 04:44:06,055 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.65 vs. limit=15.0 2023-11-20 04:44:15,578 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.845e+01 8.327e+01 9.041e+01 9.892e+01 1.668e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-20 04:44:25,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=945253.3333333334, ans=0.125 2023-11-20 04:44:34,998 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141800 2023-11-20 04:44:39,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=945320.0, ans=0.125 2023-11-20 04:44:41,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=945320.0, ans=0.0 2023-11-20 04:44:48,685 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9550, loss[loss=0.08497, simple_loss=0.1027, pruned_loss=0.02488, audio_tagging_loss=0.008718, over 15565.00 frames. ], tot_loss[loss=0.08175, simple_loss=0.1019, pruned_loss=0.0206, audio_tagging_loss=0.01021, over 3054269.15 frames. ], batch size: 61, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:44:49,215 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.70 vs. 
limit=22.5 2023-11-20 04:44:51,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=945386.6666666666, ans=0.1 2023-11-20 04:44:51,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=945386.6666666666, ans=0.125 2023-11-20 04:45:04,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=945453.3333333334, ans=0.1 2023-11-20 04:45:14,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=945520.0, ans=0.0 2023-11-20 04:45:40,914 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141850 2023-11-20 04:45:54,315 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9600, loss[loss=0.06332, simple_loss=0.07336, pruned_loss=0.01742, audio_tagging_loss=0.009209, over 13710.00 frames. ], tot_loss[loss=0.08163, simple_loss=0.1015, pruned_loss=0.0205, audio_tagging_loss=0.01036, over 3052627.93 frames. ], batch size: 53, lr: 5.69e-03, grad_scale: 32.0 2023-11-20 04:46:01,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=945720.0, ans=0.0 2023-11-20 04:46:05,992 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 04:46:21,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=945853.3333333334, ans=0.125 2023-11-20 04:46:26,478 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.916e+01 8.231e+01 8.901e+01 9.790e+01 1.400e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-20 04:46:29,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=945853.3333333334, ans=0.125 2023-11-20 04:46:31,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=945853.3333333334, ans=0.125 2023-11-20 04:46:34,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=945920.0, ans=0.2 2023-11-20 04:46:43,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=945920.0, ans=0.125 2023-11-20 04:46:44,210 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0 2023-11-20 04:46:46,310 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141900 2023-11-20 04:46:55,223 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0 2023-11-20 04:46:58,316 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9650, loss[loss=0.1078, simple_loss=0.1256, pruned_loss=0.03324, audio_tagging_loss=0.0117, over 15074.00 frames. ], tot_loss[loss=0.08204, simple_loss=0.1022, pruned_loss=0.02062, audio_tagging_loss=0.01031, over 3052579.67 frames. ], batch size: 54, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:47:23,382 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.51 vs. 
limit=22.5 2023-11-20 04:47:25,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=946186.6666666666, ans=0.2 2023-11-20 04:47:45,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=946253.3333333334, ans=0.0 2023-11-20 04:47:49,976 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0 2023-11-20 04:47:50,335 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 141950 2023-11-20 04:47:54,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=946320.0, ans=0.0 2023-11-20 04:48:03,357 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9700, loss[loss=0.06607, simple_loss=0.06574, pruned_loss=0.01506, audio_tagging_loss=0.01814, over 14280.00 frames. ], tot_loss[loss=0.08173, simple_loss=0.1018, pruned_loss=0.02062, audio_tagging_loss=0.01022, over 3053025.92 frames. ], batch size: 54, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:48:22,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=946453.3333333334, ans=0.0 2023-11-20 04:48:36,944 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.885e+01 8.155e+01 8.941e+01 9.505e+01 1.207e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-20 04:48:37,943 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.04 vs. limit=15.0 2023-11-20 04:48:42,162 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0 2023-11-20 04:48:55,407 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142000 2023-11-20 04:49:03,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=946653.3333333334, ans=0.04949747468305833 2023-11-20 04:49:08,455 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9750, loss[loss=0.05756, simple_loss=0.0586, pruned_loss=0.01135, audio_tagging_loss=0.0169, over 15306.00 frames. ], tot_loss[loss=0.08159, simple_loss=0.1016, pruned_loss=0.02066, audio_tagging_loss=0.01015, over 3050752.35 frames. ], batch size: 58, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:49:42,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=946853.3333333334, ans=0.0 2023-11-20 04:50:00,599 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142050 2023-11-20 04:50:09,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=946986.6666666666, ans=0.125 2023-11-20 04:50:12,902 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9800, loss[loss=0.1205, simple_loss=0.154, pruned_loss=0.03619, audio_tagging_loss=0.007361, over 16597.00 frames. ], tot_loss[loss=0.08217, simple_loss=0.1022, pruned_loss=0.02101, audio_tagging_loss=0.01006, over 3043119.32 frames. ], batch size: 61, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:50:35,939 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.85 vs. 
limit=15.0 2023-11-20 04:50:40,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=947186.6666666666, ans=0.0 2023-11-20 04:50:47,338 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 8.314e+01 8.707e+01 9.702e+01 1.155e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-20 04:51:04,553 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142100 2023-11-20 04:51:11,782 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:51:17,962 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9850, loss[loss=0.07938, simple_loss=0.0967, pruned_loss=0.02034, audio_tagging_loss=0.01069, over 15487.00 frames. ], tot_loss[loss=0.08184, simple_loss=0.1019, pruned_loss=0.02086, audio_tagging_loss=0.01003, over 3041053.25 frames. ], batch size: 56, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:51:18,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=947386.6666666666, ans=0.2 2023-11-20 04:51:21,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=947386.6666666666, ans=0.125 2023-11-20 04:51:33,259 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.88 vs. limit=12.0 2023-11-20 04:51:51,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=947520.0, ans=0.1 2023-11-20 04:51:56,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=947586.6666666666, ans=0.1 2023-11-20 04:51:58,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=947586.6666666666, ans=0.0 2023-11-20 04:52:09,666 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142150 2023-11-20 04:52:22,421 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9900, loss[loss=0.06638, simple_loss=0.08417, pruned_loss=0.01581, audio_tagging_loss=0.008479, over 13935.00 frames. ], tot_loss[loss=0.08104, simple_loss=0.1011, pruned_loss=0.02044, audio_tagging_loss=0.01003, over 3040528.93 frames. 
], batch size: 52, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:52:27,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=947720.0, ans=0.125 2023-11-20 04:52:29,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=947720.0, ans=0.0 2023-11-20 04:52:30,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=947720.0, ans=0.125 2023-11-20 04:52:48,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=947853.3333333334, ans=0.0 2023-11-20 04:52:56,050 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.295e+01 8.098e+01 8.892e+01 9.710e+01 1.368e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-20 04:52:59,156 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0 2023-11-20 04:53:14,525 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142200 2023-11-20 04:53:27,281 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 9950, loss[loss=0.09879, simple_loss=0.1364, pruned_loss=0.02208, audio_tagging_loss=0.008529, over 15740.00 frames. ], tot_loss[loss=0.08033, simple_loss=0.1, pruned_loss=0.02029, audio_tagging_loss=0.01001, over 3045788.99 frames. ], batch size: 55, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:53:44,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=948120.0, ans=0.035 2023-11-20 04:53:47,476 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2023-11-20 04:53:59,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=948186.6666666666, ans=0.125 2023-11-20 04:54:08,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=948253.3333333334, ans=0.125 2023-11-20 04:54:18,913 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142250 2023-11-20 04:54:20,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=948320.0, ans=10.0 2023-11-20 04:54:32,596 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10000, loss[loss=0.06241, simple_loss=0.08836, pruned_loss=0.0103, audio_tagging_loss=0.00793, over 15058.00 frames. ], tot_loss[loss=0.07971, simple_loss=0.09942, pruned_loss=0.01998, audio_tagging_loss=0.01002, over 3046998.31 frames. ], batch size: 55, lr: 5.69e-03, grad_scale: 32.0 2023-11-20 04:54:48,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=948453.3333333334, ans=0.125 2023-11-20 04:55:05,758 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.530e+01 8.242e+01 9.183e+01 1.026e+02 1.433e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-20 04:55:08,735 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.16 vs. 
limit=22.5 2023-11-20 04:55:23,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=948653.3333333334, ans=0.0 2023-11-20 04:55:24,385 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142300 2023-11-20 04:55:29,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=948653.3333333334, ans=0.0 2023-11-20 04:55:37,138 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10050, loss[loss=0.08641, simple_loss=0.1114, pruned_loss=0.02411, audio_tagging_loss=0.006604, over 15747.00 frames. ], tot_loss[loss=0.07984, simple_loss=0.0996, pruned_loss=0.02007, audio_tagging_loss=0.009967, over 3046309.65 frames. ], batch size: 56, lr: 5.68e-03, grad_scale: 32.0 2023-11-20 04:55:58,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=948786.6666666666, ans=0.0 2023-11-20 04:55:58,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=948786.6666666666, ans=0.125 2023-11-20 04:56:28,241 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142350 2023-11-20 04:56:41,022 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10100, loss[loss=0.1058, simple_loss=0.1319, pruned_loss=0.03136, audio_tagging_loss=0.008481, over 16818.00 frames. ], tot_loss[loss=0.08078, simple_loss=0.1006, pruned_loss=0.02037, audio_tagging_loss=0.01013, over 3048671.58 frames. ], batch size: 61, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 04:56:45,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=949053.3333333334, ans=0.0 2023-11-20 04:56:48,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=949053.3333333334, ans=0.125 2023-11-20 04:56:51,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=949053.3333333334, ans=0.1 2023-11-20 04:57:06,540 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2023-11-20 04:57:09,823 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 04:57:12,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=949186.6666666666, ans=0.125 2023-11-20 04:57:14,380 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.42 vs. limit=10.0 2023-11-20 04:57:16,166 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.858e+01 8.344e+01 8.796e+01 9.668e+01 1.145e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 04:57:32,851 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 04:57:32,887 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142400 2023-11-20 04:57:46,697 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10150, loss[loss=0.08603, simple_loss=0.1047, pruned_loss=0.02236, audio_tagging_loss=0.0113, over 14628.00 frames. ], tot_loss[loss=0.08138, simple_loss=0.1014, pruned_loss=0.02048, audio_tagging_loss=0.01019, over 3051386.03 frames. ], batch size: 55, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 04:57:54,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=949386.6666666666, ans=0.125 2023-11-20 04:58:16,904 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:58:17,201 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 04:58:20,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=949520.0, ans=0.0 2023-11-20 04:58:31,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=949586.6666666666, ans=0.1 2023-11-20 04:58:34,032 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.84 vs. limit=15.0 2023-11-20 04:58:38,555 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142450 2023-11-20 04:58:42,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=949653.3333333334, ans=0.125 2023-11-20 04:58:51,120 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10200, loss[loss=0.0834, simple_loss=0.09652, pruned_loss=0.02343, audio_tagging_loss=0.01171, over 14867.00 frames. ], tot_loss[loss=0.08183, simple_loss=0.1021, pruned_loss=0.02062, audio_tagging_loss=0.01017, over 3056954.85 frames. ], batch size: 54, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 04:58:51,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=949720.0, ans=0.2 2023-11-20 04:58:53,812 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.572e-03 2023-11-20 04:58:53,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=949720.0, ans=0.125 2023-11-20 04:59:15,478 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 04:59:22,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=949853.3333333334, ans=0.125 2023-11-20 04:59:23,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=949853.3333333334, ans=0.2 2023-11-20 04:59:26,224 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.455e+01 8.468e+01 8.830e+01 9.466e+01 1.234e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-20 04:59:42,868 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142500 2023-11-20 04:59:54,963 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10250, loss[loss=0.08307, simple_loss=0.1112, pruned_loss=0.01921, audio_tagging_loss=0.008273, over 15517.00 frames. ], tot_loss[loss=0.08212, simple_loss=0.1024, pruned_loss=0.02068, audio_tagging_loss=0.01022, over 3058618.88 frames. ], batch size: 57, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 04:59:57,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=950053.3333333334, ans=0.2 2023-11-20 05:00:06,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=950120.0, ans=0.125 2023-11-20 05:00:37,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=950253.3333333334, ans=0.0 2023-11-20 05:00:41,105 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.91 vs. limit=22.5 2023-11-20 05:00:43,243 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:00:46,645 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142550 2023-11-20 05:00:58,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=950320.0, ans=0.07 2023-11-20 05:01:00,105 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10300, loss[loss=0.06042, simple_loss=0.07287, pruned_loss=0.01007, audio_tagging_loss=0.01391, over 14816.00 frames. ], tot_loss[loss=0.08223, simple_loss=0.1028, pruned_loss=0.02062, audio_tagging_loss=0.01023, over 3058554.61 frames. ], batch size: 55, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 05:01:07,548 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2023-11-20 05:01:12,500 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.19 vs. 
limit=10.0 2023-11-20 05:01:33,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=950520.0, ans=0.0 2023-11-20 05:01:34,325 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.101e+01 8.687e+01 9.429e+01 1.201e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 05:01:38,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=950586.6666666666, ans=0.125 2023-11-20 05:01:52,452 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142600 2023-11-20 05:01:53,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=950653.3333333334, ans=0.125 2023-11-20 05:01:53,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=950653.3333333334, ans=0.2 2023-11-20 05:01:56,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=950653.3333333334, ans=0.1 2023-11-20 05:02:04,868 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10350, loss[loss=0.08128, simple_loss=0.08567, pruned_loss=0.02224, audio_tagging_loss=0.01621, over 15006.00 frames. ], tot_loss[loss=0.08223, simple_loss=0.1024, pruned_loss=0.02073, audio_tagging_loss=0.01033, over 3053918.03 frames. ], batch size: 58, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 05:02:30,806 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2023-11-20 05:02:57,461 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142650 2023-11-20 05:03:00,175 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:03:09,714 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10400, loss[loss=0.1097, simple_loss=0.1408, pruned_loss=0.02953, audio_tagging_loss=0.009786, over 16151.00 frames. ], tot_loss[loss=0.08249, simple_loss=0.1026, pruned_loss=0.02075, audio_tagging_loss=0.01042, over 3054469.64 frames. ], batch size: 56, lr: 5.68e-03, grad_scale: 32.0 2023-11-20 05:03:19,172 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0 2023-11-20 05:03:42,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=951186.6666666666, ans=0.0 2023-11-20 05:03:45,417 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.768e+01 8.746e+01 9.185e+01 9.994e+01 1.378e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-20 05:03:53,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=951253.3333333334, ans=0.125 2023-11-20 05:04:01,618 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142700 2023-11-20 05:04:05,726 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.78 vs. 
limit=15.0 2023-11-20 05:04:06,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=951320.0, ans=0.5 2023-11-20 05:04:11,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=951320.0, ans=0.125 2023-11-20 05:04:14,410 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10450, loss[loss=0.079, simple_loss=0.09522, pruned_loss=0.01918, audio_tagging_loss=0.01222, over 14815.00 frames. ], tot_loss[loss=0.08221, simple_loss=0.1025, pruned_loss=0.02062, audio_tagging_loss=0.01034, over 3064244.77 frames. ], batch size: 56, lr: 5.68e-03, grad_scale: 32.0 2023-11-20 05:04:19,410 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=22.5 2023-11-20 05:04:28,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=951453.3333333334, ans=0.125 2023-11-20 05:04:32,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=951453.3333333334, ans=0.0 2023-11-20 05:04:34,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=951453.3333333334, ans=0.125 2023-11-20 05:05:06,121 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142750 2023-11-20 05:05:07,312 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.73 vs. limit=15.0 2023-11-20 05:05:18,698 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10500, loss[loss=0.07891, simple_loss=0.09971, pruned_loss=0.02103, audio_tagging_loss=0.008029, over 14292.00 frames. ], tot_loss[loss=0.08216, simple_loss=0.1025, pruned_loss=0.02071, audio_tagging_loss=0.01019, over 3061619.62 frames. ], batch size: 54, lr: 5.68e-03, grad_scale: 32.0 2023-11-20 05:05:32,141 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.68 vs. limit=15.0 2023-11-20 05:05:33,062 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.527e-02 2023-11-20 05:05:43,119 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. 
limit=6.0 2023-11-20 05:05:45,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=951853.3333333334, ans=0.05 2023-11-20 05:05:47,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=951853.3333333334, ans=0.125 2023-11-20 05:05:52,959 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.079e+01 8.976e+01 9.766e+01 1.332e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-20 05:05:57,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=951920.0, ans=0.125 2023-11-20 05:06:02,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=951920.0, ans=0.5 2023-11-20 05:06:10,143 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142800 2023-11-20 05:06:13,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=951986.6666666666, ans=0.0 2023-11-20 05:06:22,639 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10550, loss[loss=0.06843, simple_loss=0.07621, pruned_loss=0.01694, audio_tagging_loss=0.01339, over 15351.00 frames. ], tot_loss[loss=0.08235, simple_loss=0.1027, pruned_loss=0.02093, audio_tagging_loss=0.01008, over 3053622.01 frames. ], batch size: 58, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:06:29,361 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2023-11-20 05:06:33,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=952120.0, ans=0.0 2023-11-20 05:06:38,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=952120.0, ans=0.025 2023-11-20 05:06:47,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=952186.6666666666, ans=0.1 2023-11-20 05:06:49,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=952186.6666666666, ans=0.0 2023-11-20 05:06:49,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=952186.6666666666, ans=0.0 2023-11-20 05:07:00,887 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:07:04,940 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=12.0 2023-11-20 05:07:14,213 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142850 2023-11-20 05:07:26,286 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10600, loss[loss=0.07763, simple_loss=0.1004, pruned_loss=0.01947, audio_tagging_loss=0.007947, over 14488.00 frames. ], tot_loss[loss=0.08232, simple_loss=0.1027, pruned_loss=0.021, audio_tagging_loss=0.009958, over 3043716.93 frames. 
], batch size: 55, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:07:29,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=952386.6666666666, ans=10.0 2023-11-20 05:07:48,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=952453.3333333334, ans=0.0 2023-11-20 05:08:01,774 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.830e+01 8.408e+01 9.197e+01 1.017e+02 1.438e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-20 05:08:18,689 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142900 2023-11-20 05:08:31,753 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10650, loss[loss=0.06996, simple_loss=0.08659, pruned_loss=0.01702, audio_tagging_loss=0.009644, over 15117.00 frames. ], tot_loss[loss=0.08149, simple_loss=0.1016, pruned_loss=0.02074, audio_tagging_loss=0.009968, over 3034796.08 frames. ], batch size: 58, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:08:38,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=952720.0, ans=0.2 2023-11-20 05:08:43,977 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.81 vs. limit=15.0 2023-11-20 05:08:48,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=952786.6666666666, ans=0.125 2023-11-20 05:09:09,772 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.49 vs. limit=22.5 2023-11-20 05:09:20,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=952920.0, ans=0.025 2023-11-20 05:09:23,754 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 142950 2023-11-20 05:09:27,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=952986.6666666666, ans=0.07 2023-11-20 05:09:36,517 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10700, loss[loss=0.08456, simple_loss=0.1078, pruned_loss=0.02019, audio_tagging_loss=0.0105, over 15560.00 frames. ], tot_loss[loss=0.08151, simple_loss=0.1018, pruned_loss=0.02068, audio_tagging_loss=0.009945, over 3037667.46 frames. 
], batch size: 58, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:09:49,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=953120.0, ans=0.95 2023-11-20 05:10:02,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=953186.6666666666, ans=0.125 2023-11-20 05:10:06,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=953186.6666666666, ans=0.125 2023-11-20 05:10:11,509 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.004e+01 8.248e+01 8.993e+01 9.641e+01 1.206e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-20 05:10:28,374 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143000 2023-11-20 05:10:38,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=953320.0, ans=0.1 2023-11-20 05:10:39,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=953386.6666666666, ans=0.0 2023-11-20 05:10:40,855 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10750, loss[loss=0.08843, simple_loss=0.1213, pruned_loss=0.02001, audio_tagging_loss=0.007757, over 15754.00 frames. ], tot_loss[loss=0.08206, simple_loss=0.1025, pruned_loss=0.02088, audio_tagging_loss=0.00991, over 3038605.69 frames. ], batch size: 56, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:11:24,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=953586.6666666666, ans=0.125 2023-11-20 05:11:32,726 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143050 2023-11-20 05:11:45,910 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10800, loss[loss=0.08304, simple_loss=0.1092, pruned_loss=0.01952, audio_tagging_loss=0.008909, over 15120.00 frames. ], tot_loss[loss=0.0812, simple_loss=0.1017, pruned_loss=0.02044, audio_tagging_loss=0.009923, over 3036178.27 frames. ], batch size: 55, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:11:59,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=953786.6666666666, ans=0.04949747468305833 2023-11-20 05:12:07,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=953786.6666666666, ans=0.125 2023-11-20 05:12:20,215 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.594e+01 7.886e+01 8.544e+01 9.371e+01 1.667e+02, threshold=1.709e+02, percent-clipped=0.0 2023-11-20 05:12:26,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=953920.0, ans=0.125 2023-11-20 05:12:32,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=953920.0, ans=0.125 2023-11-20 05:12:32,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=953920.0, ans=0.125 2023-11-20 05:12:37,717 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143100 2023-11-20 05:12:50,260 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10850, loss[loss=0.06827, simple_loss=0.07605, pruned_loss=0.01802, audio_tagging_loss=0.01222, over 16924.00 frames. 
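The loss breakdowns above are consistent with the total being a weighted sum of its parts, loss ≈ 0.5 * simple_loss + pruned_loss + audio_tagging_loss; the 0.5 weight on simple_loss is inferred from the logged numbers, not taken from the code. A one-line check against the batch 10750 totals:

# tot_loss[...] at batch 10750: loss=0.08206, simple_loss=0.1025,
# pruned_loss=0.02088, audio_tagging_loss=0.00991
simple, pruned, tagging = 0.1025, 0.02088, 0.00991
print(round(0.5 * simple + pruned + tagging, 5))  # 0.08204, vs. logged 0.08206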
], tot_loss[loss=0.08143, simple_loss=0.1017, pruned_loss=0.02048, audio_tagging_loss=0.01009, over 3047465.73 frames. ], batch size: 68, lr: 5.67e-03, grad_scale: 16.0 2023-11-20 05:12:52,366 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.92 vs. limit=22.5 2023-11-20 05:12:52,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=954053.3333333334, ans=0.0 2023-11-20 05:13:16,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=954186.6666666666, ans=0.95 2023-11-20 05:13:20,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=954186.6666666666, ans=0.125 2023-11-20 05:13:25,284 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.24 vs. limit=22.5 2023-11-20 05:13:28,616 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0 2023-11-20 05:13:41,527 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143150 2023-11-20 05:13:44,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=954320.0, ans=0.125 2023-11-20 05:13:50,147 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 05:13:52,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=954386.6666666666, ans=0.125 2023-11-20 05:13:53,700 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10900, loss[loss=0.06324, simple_loss=0.07123, pruned_loss=0.0142, audio_tagging_loss=0.01342, over 15838.00 frames. ], tot_loss[loss=0.08191, simple_loss=0.1022, pruned_loss=0.02065, audio_tagging_loss=0.01014, over 3049530.23 frames. ], batch size: 63, lr: 5.67e-03, grad_scale: 16.0 2023-11-20 05:14:07,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=954453.3333333334, ans=0.0 2023-11-20 05:14:20,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=954520.0, ans=0.125 2023-11-20 05:14:24,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=954520.0, ans=0.2 2023-11-20 05:14:29,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=954520.0, ans=0.2 2023-11-20 05:14:30,502 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.839e+01 8.320e+01 9.175e+01 1.028e+02 1.481e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-20 05:14:31,546 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.07 vs. 
limit=10.0 2023-11-20 05:14:34,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=954586.6666666666, ans=0.125 2023-11-20 05:14:34,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=954586.6666666666, ans=0.125 2023-11-20 05:14:45,536 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143200 2023-11-20 05:14:47,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=954653.3333333334, ans=0.125 2023-11-20 05:14:49,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=954653.3333333334, ans=0.125 2023-11-20 05:14:59,431 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 10950, loss[loss=0.09176, simple_loss=0.1164, pruned_loss=0.02235, audio_tagging_loss=0.01121, over 16591.00 frames. ], tot_loss[loss=0.0817, simple_loss=0.1017, pruned_loss=0.02058, audio_tagging_loss=0.01027, over 3044893.30 frames. ], batch size: 58, lr: 5.67e-03, grad_scale: 16.0 2023-11-20 05:15:27,369 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.93 vs. limit=15.0 2023-11-20 05:15:42,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=954920.0, ans=0.0 2023-11-20 05:15:42,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=954920.0, ans=0.0 2023-11-20 05:15:48,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=954920.0, ans=0.125 2023-11-20 05:15:51,653 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143250 2023-11-20 05:15:54,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=954986.6666666666, ans=0.0 2023-11-20 05:15:54,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=954986.6666666666, ans=0.125 2023-11-20 05:15:56,033 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.63 vs. limit=15.0 2023-11-20 05:16:02,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=954986.6666666666, ans=0.2 2023-11-20 05:16:04,524 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11000, loss[loss=0.08198, simple_loss=0.0983, pruned_loss=0.02261, audio_tagging_loss=0.01022, over 15049.00 frames. ], tot_loss[loss=0.08198, simple_loss=0.1022, pruned_loss=0.02063, audio_tagging_loss=0.01027, over 3044931.51 frames. ], batch size: 57, lr: 5.67e-03, grad_scale: 16.0 2023-11-20 05:16:07,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=955053.3333333334, ans=0.125 2023-11-20 05:16:15,005 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
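The Whitening lines compare a measured statistic of a module's activations against a limit (metric=9.07 vs. limit=10.0 above); the constraint only becomes active when the metric exceeds the limit. One natural whitening metric is E[lambda^2] / (E[lambda])^2 over the eigenvalues of the feature covariance, which equals 1.0 for perfectly white features and grows as channels become correlated; the sketch below assumes that form, which is not necessarily the exact scaling.py metric:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eig_mean = torch.diagonal(cov).mean()            # E[lambda] = trace / N
    eig_sq_mean = (cov * cov).sum() / cov.shape[0]   # E[lambda^2] = ||cov||_F^2 / N
    return eig_sq_mean / eig_mean ** 2

x = torch.randn(10000, 384)
print(float(whitening_metric(x)))                    # near 1.0: white features
print(float(whitening_metric(x @ torch.randn(384, 384))))  # much larger: correlated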
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 05:16:17,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=955120.0, ans=0.125 2023-11-20 05:16:18,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=955120.0, ans=0.2 2023-11-20 05:16:39,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=955186.6666666666, ans=0.2 2023-11-20 05:16:40,604 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.224e+01 8.941e+01 1.006e+02 1.362e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-20 05:16:55,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=955320.0, ans=0.0 2023-11-20 05:16:56,672 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143300 2023-11-20 05:16:56,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=955320.0, ans=0.125 2023-11-20 05:17:03,478 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.21 vs. limit=15.0 2023-11-20 05:17:05,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=955320.0, ans=0.1 2023-11-20 05:17:06,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=955320.0, ans=0.2 2023-11-20 05:17:08,878 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11050, loss[loss=0.09116, simple_loss=0.1041, pruned_loss=0.0254, audio_tagging_loss=0.01368, over 14582.00 frames. ], tot_loss[loss=0.08197, simple_loss=0.1024, pruned_loss=0.02059, audio_tagging_loss=0.01021, over 3047311.45 frames. ], batch size: 55, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:17:20,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=955453.3333333334, ans=0.125 2023-11-20 05:17:27,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=955453.3333333334, ans=0.125 2023-11-20 05:17:53,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=955586.6666666666, ans=0.125 2023-11-20 05:17:56,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=955586.6666666666, ans=0.125 2023-11-20 05:17:57,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=955586.6666666666, ans=0.0 2023-11-20 05:18:00,875 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143350 2023-11-20 05:18:14,384 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11100, loss[loss=0.0914, simple_loss=0.1046, pruned_loss=0.02772, audio_tagging_loss=0.01139, over 14644.00 frames. ], tot_loss[loss=0.08257, simple_loss=0.1028, pruned_loss=0.02091, audio_tagging_loss=0.01028, over 3048377.41 frames. 
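The WARNING entries drop the AudioSet placeholder cuts because they are too short to align: 100 input frames become 23 frames after subsampling, which is fewer than the 24 BPE tokens, and a transducer-style loss needs at least one output frame per token. A sketch of the filter, assuming the subsampling formula T_out = ((T_in - 7) // 2 + 1) // 2, an assumption that does reproduce the logged 100 -> 23 mapping:

def frames_after_subsampling(t_in: int) -> int:
    return ((t_in - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False: excluded, as in the log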
], batch size: 57, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:18:26,181 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.07 vs. limit=10.0 2023-11-20 05:18:34,280 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.84 vs. limit=22.5 2023-11-20 05:18:49,754 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.290e+01 8.982e+01 9.769e+01 2.008e+02, threshold=1.796e+02, percent-clipped=1.0 2023-11-20 05:19:05,724 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143400 2023-11-20 05:19:18,866 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11150, loss[loss=0.08127, simple_loss=0.1107, pruned_loss=0.01964, audio_tagging_loss=0.006272, over 14964.00 frames. ], tot_loss[loss=0.08207, simple_loss=0.1019, pruned_loss=0.02068, audio_tagging_loss=0.01042, over 3050820.58 frames. ], batch size: 56, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:19:26,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=956053.3333333334, ans=0.0 2023-11-20 05:19:29,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=956053.3333333334, ans=0.02 2023-11-20 05:19:41,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=956120.0, ans=0.125 2023-11-20 05:19:47,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=956186.6666666666, ans=0.125 2023-11-20 05:20:09,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=956320.0, ans=0.0 2023-11-20 05:20:10,313 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143450 2023-11-20 05:20:14,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=956320.0, ans=0.0 2023-11-20 05:20:23,233 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11200, loss[loss=0.06531, simple_loss=0.07395, pruned_loss=0.01473, audio_tagging_loss=0.01361, over 14838.00 frames. ], tot_loss[loss=0.08203, simple_loss=0.102, pruned_loss=0.02054, audio_tagging_loss=0.01051, over 3052627.69 frames. ], batch size: 56, lr: 5.66e-03, grad_scale: 32.0 2023-11-20 05:20:46,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=956453.3333333334, ans=0.125 2023-11-20 05:20:59,554 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.186e+01 8.107e+01 8.782e+01 9.452e+01 1.606e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 05:21:14,933 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143500 2023-11-20 05:21:27,545 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11250, loss[loss=0.07588, simple_loss=0.09823, pruned_loss=0.01826, audio_tagging_loss=0.008505, over 15048.00 frames. ], tot_loss[loss=0.08179, simple_loss=0.1017, pruned_loss=0.02052, audio_tagging_loss=0.01042, over 3047869.26 frames. 
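The grad_scale field is the dynamic loss scale used for fp16 training: it was halved from 32.0 to 16.0 around batch 10850 (a sign of an overflowing gradient) and grew back to 32.0 by batch 11200 after a run of finite updates. This matches the behavior of a standard dynamic gradient scaler; the sketch below uses torch.cuda.amp.GradScaler with PyTorch's default growth/backoff settings, which are not necessarily what train_asr.py uses:

import torch

model = torch.nn.Linear(10, 10).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,      # the grad_scale value seen in the log
    backoff_factor=0.5,   # halved when inf/nan gradients appear
    growth_factor=2.0,    # doubled after growth_interval clean steps
    growth_interval=2000,
)
x = torch.randn(8, 10, device="cuda")
with torch.cuda.amp.autocast():
    loss = model(x).pow(2).mean()
scaler.scale(loss).backward()
scaler.step(opt)    # the step is skipped if gradients overflowed
scaler.update()     # grows or backs off the scale
print(scaler.get_scale())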
], batch size: 57, lr: 5.66e-03, grad_scale: 32.0 2023-11-20 05:21:52,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=956853.3333333334, ans=0.1 2023-11-20 05:22:11,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=956920.0, ans=0.1 2023-11-20 05:22:19,686 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143550 2023-11-20 05:22:24,355 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0 2023-11-20 05:22:32,451 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11300, loss[loss=0.09159, simple_loss=0.1101, pruned_loss=0.02511, audio_tagging_loss=0.01144, over 14838.00 frames. ], tot_loss[loss=0.08275, simple_loss=0.1035, pruned_loss=0.02077, audio_tagging_loss=0.01024, over 3048158.93 frames. ], batch size: 57, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:22:41,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=957053.3333333334, ans=0.07 2023-11-20 05:22:43,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=957053.3333333334, ans=0.125 2023-11-20 05:23:09,784 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2023-11-20 05:23:10,354 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.444e+01 8.144e+01 8.778e+01 9.554e+01 1.373e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 05:23:24,904 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143600 2023-11-20 05:23:32,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=957320.0, ans=0.1 2023-11-20 05:23:37,249 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11350, loss[loss=0.08332, simple_loss=0.1034, pruned_loss=0.02218, audio_tagging_loss=0.009438, over 15554.00 frames. ], tot_loss[loss=0.08203, simple_loss=0.1026, pruned_loss=0.02059, audio_tagging_loss=0.01012, over 3038170.66 frames. ], batch size: 58, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:24:02,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=957520.0, ans=0.0 2023-11-20 05:24:29,274 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143650 2023-11-20 05:24:42,576 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11400, loss[loss=0.0941, simple_loss=0.1094, pruned_loss=0.0256, audio_tagging_loss=0.0138, over 15725.00 frames. ], tot_loss[loss=0.08148, simple_loss=0.1021, pruned_loss=0.02038, audio_tagging_loss=0.01003, over 3033968.80 frames. 
], batch size: 58, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:25:08,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=957853.3333333334, ans=0.125 2023-11-20 05:25:12,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=957853.3333333334, ans=0.0 2023-11-20 05:25:19,460 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.590e+01 8.041e+01 8.720e+01 9.566e+01 1.309e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-20 05:25:25,986 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.36 vs. limit=22.5 2023-11-20 05:25:34,558 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143700 2023-11-20 05:25:46,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=958053.3333333334, ans=0.1 2023-11-20 05:25:47,445 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11450, loss[loss=0.08305, simple_loss=0.1062, pruned_loss=0.02078, audio_tagging_loss=0.009174, over 15793.00 frames. ], tot_loss[loss=0.08088, simple_loss=0.1012, pruned_loss=0.02019, audio_tagging_loss=0.01008, over 3039307.16 frames. ], batch size: 59, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:25:53,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=958053.3333333334, ans=0.05 2023-11-20 05:25:58,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=958053.3333333334, ans=0.1 2023-11-20 05:26:27,321 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0 2023-11-20 05:26:38,718 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143750 2023-11-20 05:26:51,476 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11500, loss[loss=0.06618, simple_loss=0.0831, pruned_loss=0.01329, audio_tagging_loss=0.01134, over 15315.00 frames. ], tot_loss[loss=0.08153, simple_loss=0.1019, pruned_loss=0.02054, audio_tagging_loss=0.01004, over 3037518.27 frames. ], batch size: 58, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:27:16,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=958520.0, ans=15.0 2023-11-20 05:27:20,294 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:27:21,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=958520.0, ans=0.1 2023-11-20 05:27:29,099 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 7.975e+01 8.821e+01 9.410e+01 1.240e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-20 05:27:35,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=958586.6666666666, ans=0.1 2023-11-20 05:27:42,806 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143800 2023-11-20 05:27:56,025 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11550, loss[loss=0.1039, simple_loss=0.1293, pruned_loss=0.02935, audio_tagging_loss=0.009883, over 14968.00 frames. 
], tot_loss[loss=0.08228, simple_loss=0.1029, pruned_loss=0.02076, audio_tagging_loss=0.01005, over 3037214.43 frames. ], batch size: 57, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:28:08,921 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0 2023-11-20 05:28:09,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=958786.6666666666, ans=0.125 2023-11-20 05:28:28,200 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.52 vs. limit=10.0 2023-11-20 05:28:28,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=958853.3333333334, ans=0.0 2023-11-20 05:28:36,104 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 05:28:39,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=958920.0, ans=0.125 2023-11-20 05:28:44,737 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.22 vs. limit=6.0 2023-11-20 05:28:48,355 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143850 2023-11-20 05:28:53,902 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=12.0 2023-11-20 05:29:01,032 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11600, loss[loss=0.08929, simple_loss=0.1125, pruned_loss=0.0246, audio_tagging_loss=0.008442, over 16206.00 frames. ], tot_loss[loss=0.08218, simple_loss=0.1027, pruned_loss=0.02079, audio_tagging_loss=0.01005, over 3041232.87 frames. ], batch size: 61, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:29:06,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=959053.3333333334, ans=0.125 2023-11-20 05:29:10,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=959053.3333333334, ans=0.035 2023-11-20 05:29:15,417 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:29:22,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=959120.0, ans=0.0 2023-11-20 05:29:30,453 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.24 vs. 
limit=22.5 2023-11-20 05:29:38,410 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.843e+01 8.207e+01 8.809e+01 9.519e+01 1.148e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-20 05:29:52,841 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143900 2023-11-20 05:29:59,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=959320.0, ans=0.0 2023-11-20 05:30:05,601 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11650, loss[loss=0.04789, simple_loss=0.04637, pruned_loss=0.01295, audio_tagging_loss=0.01175, over 14553.00 frames. ], tot_loss[loss=0.08185, simple_loss=0.1019, pruned_loss=0.02078, audio_tagging_loss=0.01014, over 3039151.30 frames. ], batch size: 56, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:30:13,655 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0 2023-11-20 05:30:30,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=959520.0, ans=0.125 2023-11-20 05:30:40,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=959520.0, ans=0.1 2023-11-20 05:30:45,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=959586.6666666666, ans=0.125 2023-11-20 05:30:57,367 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 143950 2023-11-20 05:31:09,579 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11700, loss[loss=0.08478, simple_loss=0.1102, pruned_loss=0.01904, audio_tagging_loss=0.01066, over 14691.00 frames. ], tot_loss[loss=0.08214, simple_loss=0.1021, pruned_loss=0.02098, audio_tagging_loss=0.0101, over 3037979.22 frames. ], batch size: 53, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:31:23,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=959786.6666666666, ans=0.125 2023-11-20 05:31:46,951 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.116e+01 8.169e+01 8.905e+01 9.543e+01 1.392e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-20 05:31:52,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=959920.0, ans=0.2 2023-11-20 05:32:00,440 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144000 2023-11-20 05:32:13,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=959986.6666666666, ans=0.05 2023-11-20 05:32:17,070 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11750, loss[loss=0.06907, simple_loss=0.09016, pruned_loss=0.01718, audio_tagging_loss=0.006808, over 15742.00 frames. ], tot_loss[loss=0.08155, simple_loss=0.1015, pruned_loss=0.02066, audio_tagging_loss=0.01014, over 3046758.70 frames. 
], batch size: 58, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:32:21,083 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.206e-02 2023-11-20 05:32:26,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=960053.3333333334, ans=0.125 2023-11-20 05:32:26,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=960053.3333333334, ans=0.2 2023-11-20 05:32:31,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=960120.0, ans=0.125 2023-11-20 05:32:32,109 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.94 vs. limit=12.0 2023-11-20 05:32:47,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=960186.6666666666, ans=0.04949747468305833 2023-11-20 05:33:08,922 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144050 2023-11-20 05:33:15,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=960320.0, ans=0.0 2023-11-20 05:33:19,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=960320.0, ans=0.125 2023-11-20 05:33:21,607 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11800, loss[loss=0.1047, simple_loss=0.1321, pruned_loss=0.0286, audio_tagging_loss=0.01004, over 15501.00 frames. ], tot_loss[loss=0.08118, simple_loss=0.1009, pruned_loss=0.02045, audio_tagging_loss=0.01027, over 3035670.44 frames. ], batch size: 58, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:33:27,132 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.71 vs. limit=12.0 2023-11-20 05:33:59,963 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.293e+01 8.347e+01 8.824e+01 9.407e+01 1.316e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-20 05:34:08,318 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:34:11,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=960653.3333333334, ans=0.125 2023-11-20 05:34:12,998 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144100 2023-11-20 05:34:20,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=960653.3333333334, ans=0.125 2023-11-20 05:34:25,237 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11850, loss[loss=0.05936, simple_loss=0.06779, pruned_loss=0.01626, audio_tagging_loss=0.009214, over 15695.00 frames. ], tot_loss[loss=0.08178, simple_loss=0.1019, pruned_loss=0.02064, audio_tagging_loss=0.0102, over 3042005.54 frames. 
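The WithLoss entries track an auxiliary penalty attached to the attention-weight tensors; its sum is 0.0 on most batches and only occasionally nonzero (loss-sum=2.206e-02 above), i.e. the regularizer fires only when the monitored statistic drifts out of bounds. A generic sketch of the pattern, passing a tensor through unchanged while injecting the gradient of a penalty on it during backward; this illustrates the idea only and is not the scaling.py code:

import torch

class WithLossSketch(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, penalty_scale):
        ctx.save_for_backward(x)
        ctx.penalty_scale = penalty_scale
        return x.view_as(x)  # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        with torch.enable_grad():
            xd = x.detach().requires_grad_(True)
            # example penalty: mean squared excess of |x| over 1.0,
            # zero (like most logged loss-sums) while x stays in bounds
            penalty = (xd.abs() - 1.0).clamp(min=0.0).pow(2).mean()
            (grad_pen,) = torch.autograd.grad(penalty, xd)
        return grad_out + ctx.penalty_scale * grad_pen, None

x = torch.randn(4, 8, requires_grad=True)
WithLossSketch.apply(x, 0.1).sum().backward()
print(x.grad.shape)  # torch.Size([4, 8])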
], batch size: 60, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:34:36,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=960786.6666666666, ans=0.1 2023-11-20 05:34:49,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=960786.6666666666, ans=0.125 2023-11-20 05:34:57,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=960853.3333333334, ans=0.1 2023-11-20 05:35:02,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=960853.3333333334, ans=0.2 2023-11-20 05:35:09,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=960920.0, ans=0.2 2023-11-20 05:35:16,555 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144150 2023-11-20 05:35:26,606 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.23 vs. limit=15.0 2023-11-20 05:35:28,667 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0 2023-11-20 05:35:29,886 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11900, loss[loss=0.08335, simple_loss=0.09804, pruned_loss=0.02253, audio_tagging_loss=0.0118, over 16028.00 frames. ], tot_loss[loss=0.08224, simple_loss=0.1026, pruned_loss=0.02074, audio_tagging_loss=0.01022, over 3042093.26 frames. ], batch size: 62, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:35:38,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=961053.3333333334, ans=0.125 2023-11-20 05:35:57,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=961186.6666666666, ans=0.125 2023-11-20 05:35:58,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=961186.6666666666, ans=0.0 2023-11-20 05:36:07,727 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.202e+01 8.808e+01 9.616e+01 1.545e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-20 05:36:10,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=961253.3333333334, ans=0.125 2023-11-20 05:36:21,915 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144200 2023-11-20 05:36:35,087 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 11950, loss[loss=0.08639, simple_loss=0.1107, pruned_loss=0.02342, audio_tagging_loss=0.007605, over 15098.00 frames. ], tot_loss[loss=0.08236, simple_loss=0.1024, pruned_loss=0.02086, audio_tagging_loss=0.0103, over 3047111.13 frames. 
], batch size: 56, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:36:45,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=961386.6666666666, ans=0.125 2023-11-20 05:36:50,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=961453.3333333334, ans=0.125 2023-11-20 05:37:05,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=961520.0, ans=0.125 2023-11-20 05:37:14,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=961586.6666666666, ans=0.125 2023-11-20 05:37:25,338 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144250 2023-11-20 05:37:28,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=961653.3333333334, ans=0.2 2023-11-20 05:37:35,497 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:37:37,572 INFO [train_asr.py:1262] (2/4) Epoch 12, batch 12000, loss[loss=0.06879, simple_loss=0.08416, pruned_loss=0.01706, audio_tagging_loss=0.009654, over 16308.00 frames. ], tot_loss[loss=0.0819, simple_loss=0.1018, pruned_loss=0.02064, audio_tagging_loss=0.01035, over 3045554.77 frames. ], batch size: 62, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:37:37,573 INFO [train_asr.py:1285] (2/4) Computing validation loss 2023-11-20 05:38:18,774 INFO [train_asr.py:1294] (2/4) Epoch 12, validation: loss=0.06309, simple_loss=0.0542, pruned_loss=0.005937, audio_tagging_loss=0.03005, over 4681554.00 frames. 2023-11-20 05:38:18,775 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB 2023-11-20 05:38:21,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=961720.0, ans=0.04949747468305833 2023-11-20 05:38:39,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=12.0 2023-11-20 05:39:27,283 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 0, loss[loss=0.0932, simple_loss=0.09366, pruned_loss=0.01867, audio_tagging_loss=0.0277, over 14840.00 frames. ], tot_loss[loss=0.0932, simple_loss=0.09366, pruned_loss=0.01867, audio_tagging_loss=0.0277, over 14840.00 frames. ], batch size: 57, lr: 5.43e-03, grad_scale: 32.0 2023-11-20 05:39:27,284 INFO [train_asr.py:1285] (2/4) Computing validation loss 2023-11-20 05:40:04,330 INFO [train_asr.py:1294] (2/4) Epoch 13, validation: loss=0.06272, simple_loss=0.05429, pruned_loss=0.006071, audio_tagging_loss=0.02951, over 4681554.00 frames. 2023-11-20 05:40:04,331 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB 2023-11-20 05:40:10,456 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.395e+01 8.159e+01 8.856e+01 9.666e+01 1.294e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-20 05:40:11,422 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.39 vs. 
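At batch 12000 training pauses for validation, and shortly afterwards epoch 13 begins; note that at Epoch 13, batch 0 the tot_loss[...] equals the single batch's loss[...] and the frame count resets to that one batch. The logged "over N frames" counts then grow quickly at first and settle near a constant, which is consistent with an exponentially decayed running sum rather than a plain per-epoch average. A sketch under that assumption; the decay form and the reset_interval=200 window are inferred from the logged frame counts, and the names are ours, not icefall's tracker:

class DecayedTracker:
    def __init__(self, reset_interval: int = 200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: int):
        self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / self.frames

t = DecayedTracker()
t.update(0.0932, 14840)        # Epoch 13, batch 0
print(t.tot_loss, t.frames)    # 0.0932 over 14840 frames, as logged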
limit=15.0 2023-11-20 05:40:20,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=961953.3333333334, ans=0.125 2023-11-20 05:40:23,167 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144300 2023-11-20 05:40:35,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=962020.0, ans=0.125 2023-11-20 05:40:54,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=962086.6666666666, ans=0.0 2023-11-20 05:40:55,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=962153.3333333334, ans=0.0 2023-11-20 05:41:09,222 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 50, loss[loss=0.08819, simple_loss=0.08658, pruned_loss=0.02228, audio_tagging_loss=0.02262, over 14456.00 frames. ], tot_loss[loss=0.08859, simple_loss=0.09879, pruned_loss=0.01973, audio_tagging_loss=0.01947, over 686692.25 frames. ], batch size: 57, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:41:10,081 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.56 vs. limit=10.0 2023-11-20 05:41:15,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=962220.0, ans=0.125 2023-11-20 05:41:20,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=962286.6666666666, ans=0.125 2023-11-20 05:41:24,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=962286.6666666666, ans=0.2 2023-11-20 05:41:28,962 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144350 2023-11-20 05:41:34,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=962353.3333333334, ans=0.125 2023-11-20 05:41:55,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=962420.0, ans=0.0 2023-11-20 05:42:00,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=962486.6666666666, ans=0.125 2023-11-20 05:42:04,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=962486.6666666666, ans=0.5 2023-11-20 05:42:12,681 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 100, loss[loss=0.08615, simple_loss=0.1004, pruned_loss=0.01897, audio_tagging_loss=0.01697, over 15683.00 frames. ], tot_loss[loss=0.09014, simple_loss=0.103, pruned_loss=0.02033, audio_tagging_loss=0.0183, over 1211322.85 frames. 
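The learning rate steps from 5.65e-03 late in epoch 12 down to 5.43e-03/5.42e-03 at the start of epoch 13, consistent with an Eden-style schedule that decays in both step count and epoch count. The sketch below assumes base_lr=0.045, lr_batches=7500, lr_epochs=3.5, and an epoch argument that lags the displayed epoch by one; under those assumptions it reproduces the logged values, but it is a reconstruction, not the scheduler code:

def eden_lr(step, epoch, base_lr=0.045, lr_batches=7500, lr_epochs=3.5):
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(144000, 11):.2e}")  # ~5.65e-03, late epoch 12 in the log
print(f"{eden_lr(144300, 12):.2e}")  # ~5.42e-03, early epoch 13 in the log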
], batch size: 59, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:42:19,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=962553.3333333334, ans=0.2 2023-11-20 05:42:21,357 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 8.954e+01 9.518e+01 1.027e+02 1.327e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-20 05:42:22,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=962553.3333333334, ans=0.125 2023-11-20 05:42:33,887 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144400 2023-11-20 05:42:40,645 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:42:45,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=962686.6666666666, ans=0.015 2023-11-20 05:42:45,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=962686.6666666666, ans=0.1 2023-11-20 05:42:50,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=962686.6666666666, ans=0.125 2023-11-20 05:42:59,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=962753.3333333334, ans=0.125 2023-11-20 05:43:01,013 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0 2023-11-20 05:43:01,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=962753.3333333334, ans=0.035 2023-11-20 05:43:12,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=962820.0, ans=0.07 2023-11-20 05:43:14,403 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=12.0 2023-11-20 05:43:15,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=962820.0, ans=0.125 2023-11-20 05:43:18,745 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 150, loss[loss=0.08199, simple_loss=0.1027, pruned_loss=0.01971, audio_tagging_loss=0.01094, over 14524.00 frames. ], tot_loss[loss=0.08736, simple_loss=0.1019, pruned_loss=0.01992, audio_tagging_loss=0.0165, over 1619986.80 frames. ], batch size: 53, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:43:38,527 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144450 2023-11-20 05:43:41,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=962953.3333333334, ans=0.125 2023-11-20 05:43:48,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=963020.0, ans=0.125 2023-11-20 05:43:49,042 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.47 vs. 
limit=22.5 2023-11-20 05:44:20,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=963153.3333333334, ans=0.125 2023-11-20 05:44:24,353 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 200, loss[loss=0.08622, simple_loss=0.1093, pruned_loss=0.0219, audio_tagging_loss=0.00966, over 15161.00 frames. ], tot_loss[loss=0.08513, simple_loss=0.1014, pruned_loss=0.01988, audio_tagging_loss=0.01456, over 1936018.65 frames. ], batch size: 57, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:44:31,993 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 8.317e+01 9.168e+01 9.939e+01 1.407e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-20 05:44:36,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=963286.6666666666, ans=0.125 2023-11-20 05:44:43,899 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144500 2023-11-20 05:45:15,909 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.75 vs. limit=22.5 2023-11-20 05:45:26,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=963486.6666666666, ans=0.125 2023-11-20 05:45:28,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=963553.3333333334, ans=0.125 2023-11-20 05:45:28,849 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 250, loss[loss=0.08769, simple_loss=0.1027, pruned_loss=0.02648, audio_tagging_loss=0.009869, over 16437.00 frames. ], tot_loss[loss=0.08425, simple_loss=0.1015, pruned_loss=0.02021, audio_tagging_loss=0.01328, over 2172793.76 frames. ], batch size: 61, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:45:39,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=963553.3333333334, ans=0.0 2023-11-20 05:45:45,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=963620.0, ans=0.2 2023-11-20 05:45:49,378 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144550 2023-11-20 05:45:57,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=963686.6666666666, ans=0.125 2023-11-20 05:45:57,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=963686.6666666666, ans=0.125 2023-11-20 05:46:05,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=963686.6666666666, ans=0.125 2023-11-20 05:46:08,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=963753.3333333334, ans=0.1 2023-11-20 05:46:10,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=963753.3333333334, ans=0.125 2023-11-20 05:46:16,813 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. 
limit=6.0 2023-11-20 05:46:29,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=963820.0, ans=0.0 2023-11-20 05:46:30,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=963820.0, ans=0.125 2023-11-20 05:46:34,390 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 300, loss[loss=0.09721, simple_loss=0.1248, pruned_loss=0.02571, audio_tagging_loss=0.009125, over 15113.00 frames. ], tot_loss[loss=0.08397, simple_loss=0.1024, pruned_loss=0.02046, audio_tagging_loss=0.01229, over 2374780.46 frames. ], batch size: 54, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:46:42,590 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.680e+01 8.488e+01 9.150e+01 9.824e+01 1.478e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-20 05:46:44,747 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.16 vs. limit=10.0 2023-11-20 05:46:51,477 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.89 vs. limit=22.5 2023-11-20 05:46:54,360 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144600 2023-11-20 05:47:22,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=964086.6666666666, ans=0.1 2023-11-20 05:47:28,925 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.75 vs. limit=15.0 2023-11-20 05:47:40,290 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 350, loss[loss=0.09438, simple_loss=0.1294, pruned_loss=0.02145, audio_tagging_loss=0.008248, over 15518.00 frames. ], tot_loss[loss=0.0841, simple_loss=0.1036, pruned_loss=0.0207, audio_tagging_loss=0.01162, over 2528388.11 frames. ], batch size: 56, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:47:45,814 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2023-11-20 05:47:58,766 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144650 2023-11-20 05:48:09,257 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.12 vs. limit=15.0 2023-11-20 05:48:11,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=964353.3333333334, ans=0.125 2023-11-20 05:48:20,897 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0 2023-11-20 05:48:26,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=964420.0, ans=0.0 2023-11-20 05:48:44,442 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 400, loss[loss=0.08202, simple_loss=0.09341, pruned_loss=0.02292, audio_tagging_loss=0.01239, over 14290.00 frames. ], tot_loss[loss=0.08306, simple_loss=0.1024, pruned_loss=0.02049, audio_tagging_loss=0.01136, over 2640736.52 frames. 
], batch size: 55, lr: 5.42e-03, grad_scale: 32.0 2023-11-20 05:48:47,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=964553.3333333334, ans=0.0 2023-11-20 05:48:48,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=964553.3333333334, ans=0.125 2023-11-20 05:48:52,436 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.722e+01 8.150e+01 8.876e+01 9.638e+01 1.255e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 05:48:52,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=964553.3333333334, ans=10.0 2023-11-20 05:48:57,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=964620.0, ans=0.0 2023-11-20 05:49:04,235 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144700 2023-11-20 05:49:18,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=964686.6666666666, ans=10.0 2023-11-20 05:49:20,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=964686.6666666666, ans=0.1 2023-11-20 05:49:27,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=964753.3333333334, ans=0.125 2023-11-20 05:49:36,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=964820.0, ans=0.0 2023-11-20 05:49:36,727 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0 2023-11-20 05:49:49,700 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 450, loss[loss=0.08366, simple_loss=0.1047, pruned_loss=0.01998, audio_tagging_loss=0.01131, over 15647.00 frames. ], tot_loss[loss=0.08244, simple_loss=0.102, pruned_loss=0.02042, audio_tagging_loss=0.01101, over 2729640.92 frames. ], batch size: 59, lr: 5.42e-03, grad_scale: 32.0 2023-11-20 05:50:02,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=964953.3333333334, ans=0.125 2023-11-20 05:50:05,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=964953.3333333334, ans=0.0 2023-11-20 05:50:08,938 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144750 2023-11-20 05:50:34,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=965086.6666666666, ans=0.125 2023-11-20 05:50:50,172 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.47 vs. limit=15.0 2023-11-20 05:50:54,205 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 500, loss[loss=0.084, simple_loss=0.107, pruned_loss=0.0222, audio_tagging_loss=0.008271, over 15401.00 frames. ], tot_loss[loss=0.08222, simple_loss=0.1021, pruned_loss=0.02038, audio_tagging_loss=0.01081, over 2805327.01 frames. 
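The Balancer parameters seen throughout (min_positive, max_positive, min_abs, max_abs, prob) bound per-channel activation statistics, the fraction of positive values and their mean absolute value, with prob being the probability the constraint is applied on a given step (the ans=0.125 values above). A small sketch that just measures such violations on a tensor; the names and thresholds are illustrative, not the scaling.py Balancer:

import torch

def balancer_violations(x, min_positive=0.05, max_abs=10.0):
    # x: (num_frames, num_channels)
    pos_frac = (x > 0).float().mean(dim=0)   # fraction of positive entries
    mean_abs = x.abs().mean(dim=0)           # mean |activation| per channel
    return {
        "channels_too_rarely_positive": int((pos_frac < min_positive).sum()),
        "channels_too_large": int((mean_abs > max_abs).sum()),
    }

x = torch.randn(1000, 256)
print(balancer_violations(x))  # healthy activations: no violations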
], batch size: 57, lr: 5.42e-03, grad_scale: 32.0 2023-11-20 05:50:55,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=965220.0, ans=0.125 2023-11-20 05:51:02,238 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.903e+01 8.170e+01 8.806e+01 9.563e+01 1.167e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-20 05:51:13,707 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144800 2023-11-20 05:51:19,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=965353.3333333334, ans=0.0 2023-11-20 05:51:19,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=965353.3333333334, ans=0.2 2023-11-20 05:51:24,051 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:51:36,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=965420.0, ans=0.0 2023-11-20 05:51:38,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=965420.0, ans=0.125 2023-11-20 05:51:43,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=965420.0, ans=0.1 2023-11-20 05:51:43,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=965420.0, ans=0.0 2023-11-20 05:51:46,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=965486.6666666666, ans=0.125 2023-11-20 05:51:50,717 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.87 vs. limit=22.5 2023-11-20 05:51:59,449 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 550, loss[loss=0.06763, simple_loss=0.07402, pruned_loss=0.01734, audio_tagging_loss=0.01328, over 15273.00 frames. ], tot_loss[loss=0.08128, simple_loss=0.101, pruned_loss=0.02016, audio_tagging_loss=0.01063, over 2854493.52 frames. ], batch size: 60, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:52:19,372 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144850 2023-11-20 05:52:31,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=965686.6666666666, ans=0.0 2023-11-20 05:52:37,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=965753.3333333334, ans=0.05 2023-11-20 05:52:38,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=965753.3333333334, ans=0.1 2023-11-20 05:52:40,027 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.88 vs. limit=6.0 2023-11-20 05:53:04,627 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 600, loss[loss=0.0812, simple_loss=0.09905, pruned_loss=0.0218, audio_tagging_loss=0.009878, over 14615.00 frames. ], tot_loss[loss=0.08163, simple_loss=0.1018, pruned_loss=0.02024, audio_tagging_loss=0.0105, over 2901489.02 frames. 
], batch size: 55, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:53:12,759 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.448e+01 8.106e+01 8.665e+01 9.302e+01 1.312e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-20 05:53:13,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=965886.6666666666, ans=0.125 2023-11-20 05:53:14,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=965886.6666666666, ans=10.0 2023-11-20 05:53:24,021 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144900 2023-11-20 05:53:26,017 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.49 vs. limit=15.0 2023-11-20 05:53:37,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=966020.0, ans=0.0 2023-11-20 05:53:43,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=966086.6666666666, ans=0.0 2023-11-20 05:53:48,055 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:53:52,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=966086.6666666666, ans=0.125 2023-11-20 05:54:03,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=966153.3333333334, ans=0.2 2023-11-20 05:54:10,279 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 650, loss[loss=0.07215, simple_loss=0.09343, pruned_loss=0.01443, audio_tagging_loss=0.01101, over 15791.00 frames. ], tot_loss[loss=0.08144, simple_loss=0.1016, pruned_loss=0.02028, audio_tagging_loss=0.01034, over 2924668.14 frames. ], batch size: 61, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:54:29,384 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 144950 2023-11-20 05:54:37,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=966353.3333333334, ans=0.125 2023-11-20 05:54:38,015 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2023-11-20 05:54:47,496 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.18 vs. limit=15.0 2023-11-20 05:55:02,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=966486.6666666666, ans=0.125 2023-11-20 05:55:03,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=966486.6666666666, ans=0.1 2023-11-20 05:55:14,354 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 700, loss[loss=0.0779, simple_loss=0.1054, pruned_loss=0.01784, audio_tagging_loss=0.007349, over 15376.00 frames. ], tot_loss[loss=0.08216, simple_loss=0.1029, pruned_loss=0.02049, audio_tagging_loss=0.01022, over 2951522.11 frames. 
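The optim.py:476 records summarize gradient norms over a window of recent batches as five order statistics that read as min/25%/50%/75%/max, and in every record in this section the threshold equals Clipping_scale times the median (e.g. 2.0 * 8.665e+01 = 1.733e+02 just above). A sketch of that bookkeeping; the window size and exact estimator are assumptions:

```python
import torch

def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # grad_norms: recent per-batch gradient norms (window size is an
    # assumption). The five logged numbers read as min/25%/50%/75%/max,
    # and the threshold equals clipping_scale * median in these records.
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()
    pct_clipped = 100.0 * (grad_norms > threshold).float().mean().item()
    return q.tolist(), threshold, pct_clipped

norms = torch.tensor([64.48, 81.06, 86.65, 93.02, 131.2])
print(clipping_stats(norms))  # threshold 173.3, percent-clipped 0.0 as above
```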
], batch size: 58, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:55:22,275 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 8.027e+01 8.645e+01 9.342e+01 1.133e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-20 05:55:34,924 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145000 2023-11-20 05:55:42,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=966686.6666666666, ans=0.0 2023-11-20 05:55:45,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=966686.6666666666, ans=0.0 2023-11-20 05:55:47,593 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=15.0 2023-11-20 05:55:59,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=966753.3333333334, ans=0.125 2023-11-20 05:56:17,689 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2023-11-20 05:56:21,168 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 750, loss[loss=0.08206, simple_loss=0.1027, pruned_loss=0.01963, audio_tagging_loss=0.0111, over 16064.00 frames. ], tot_loss[loss=0.0814, simple_loss=0.1019, pruned_loss=0.02021, audio_tagging_loss=0.01024, over 2977103.86 frames. ], batch size: 60, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:56:36,183 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.37 vs. limit=10.0 2023-11-20 05:56:40,404 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145050 2023-11-20 05:56:49,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=967020.0, ans=0.125 2023-11-20 05:56:53,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=967020.0, ans=0.0 2023-11-20 05:57:04,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=967086.6666666666, ans=0.0 2023-11-20 05:57:06,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=967086.6666666666, ans=0.0 2023-11-20 05:57:22,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=967153.3333333334, ans=0.125 2023-11-20 05:57:25,798 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 800, loss[loss=0.07486, simple_loss=0.1036, pruned_loss=0.01393, audio_tagging_loss=0.009111, over 15683.00 frames. ], tot_loss[loss=0.08101, simple_loss=0.101, pruned_loss=0.02011, audio_tagging_loss=0.01042, over 2985674.41 frames. 
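model.py:792 prints "Freeze_encoder: False; Current batch idx: N" every 50 batches, a periodic check that encoder freezing is (still) disabled in this run. A hypothetical sketch of such a toggle; the function and parameter names below are not model.py's actual API:

```python
import torch

def maybe_freeze_encoder(encoder: torch.nn.Module, freeze_encoder: bool,
                         batch_idx: int, freeze_encoder_steps: int = -1) -> bool:
    # Hypothetical toggle: freeze while explicitly requested, or during the
    # first freeze_encoder_steps batches; -1 disables step-based freezing.
    freeze = freeze_encoder or (0 <= batch_idx < freeze_encoder_steps)
    for p in encoder.parameters():
        p.requires_grad_(not freeze)
    return freeze

encoder = torch.nn.Linear(8, 8)
print(maybe_freeze_encoder(encoder, False, 145000))  # False, as logged
```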
], batch size: 58, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:57:31,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=967220.0, ans=0.125 2023-11-20 05:57:33,181 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.560e+01 8.396e+01 9.088e+01 9.762e+01 1.189e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-20 05:57:44,469 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145100 2023-11-20 05:57:57,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=967353.3333333334, ans=0.125 2023-11-20 05:58:11,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=967420.0, ans=0.125 2023-11-20 05:58:16,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=967486.6666666666, ans=0.05 2023-11-20 05:58:25,388 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.22 vs. limit=15.0 2023-11-20 05:58:29,591 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 850, loss[loss=0.08128, simple_loss=0.09987, pruned_loss=0.02002, audio_tagging_loss=0.01133, over 14601.00 frames. ], tot_loss[loss=0.08153, simple_loss=0.1016, pruned_loss=0.02031, audio_tagging_loss=0.01041, over 3003350.95 frames. ], batch size: 53, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:58:29,973 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:58:49,709 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145150 2023-11-20 05:58:49,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=967620.0, ans=0.1 2023-11-20 05:59:00,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=967686.6666666666, ans=0.125 2023-11-20 05:59:05,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=967686.6666666666, ans=0.1 2023-11-20 05:59:10,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=967753.3333333334, ans=0.125 2023-11-20 05:59:18,223 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.83 vs. limit=10.0 2023-11-20 05:59:26,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=967820.0, ans=0.2 2023-11-20 05:59:34,785 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 900, loss[loss=0.06036, simple_loss=0.06995, pruned_loss=0.01324, audio_tagging_loss=0.01215, over 14951.00 frames. ], tot_loss[loss=0.0815, simple_loss=0.1015, pruned_loss=0.0203, audio_tagging_loss=0.01043, over 3008347.38 frames. ], batch size: 56, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:59:36,818 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.07 vs. 
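The Whitening records compare a per-module statistic against a limit (metric=9.22 vs. limit=15.0 above) and presumably constrain the module once the limit is exceeded. One plausible statistic with that behaviour is sketched below, normalized so an isotropic ("white") output covariance scores 1.0; this is an assumption about the idea, not scaling.py's exact formula:

```python
import torch

def whitening_metric(feats: torch.Tensor) -> float:
    # feats: (num_frames, num_channels). Measures how far the output
    # covariance is from a multiple of the identity: 1.0 when perfectly
    # white, large when energy concentrates in a few directions.
    x = feats - feats.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eig = torch.linalg.eigvalsh(cov)
    n = eig.numel()
    return (n * (eig ** 2).sum() / eig.sum() ** 2).item()

feats = torch.randn(1000, 384)
print(whitening_metric(feats))  # ~1.4 for random features; the log flags
                                # modules whose metric approaches the limit
```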
limit=15.0 2023-11-20 05:59:42,655 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 5.961e+01 8.143e+01 8.772e+01 9.521e+01 1.429e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 05:59:54,362 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145200 2023-11-20 06:00:10,227 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0 2023-11-20 06:00:17,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=968086.6666666666, ans=0.125 2023-11-20 06:00:40,237 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 950, loss[loss=0.08985, simple_loss=0.1275, pruned_loss=0.0203, audio_tagging_loss=0.005811, over 14888.00 frames. ], tot_loss[loss=0.08138, simple_loss=0.1015, pruned_loss=0.0203, audio_tagging_loss=0.01033, over 3019996.35 frames. ], batch size: 56, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 06:00:53,922 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 06:00:58,581 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145250 2023-11-20 06:01:09,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=968353.3333333334, ans=0.1 2023-11-20 06:01:11,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=968353.3333333334, ans=0.0 2023-11-20 06:01:31,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=968486.6666666666, ans=0.1 2023-11-20 06:01:43,365 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1000, loss[loss=0.07384, simple_loss=0.09404, pruned_loss=0.01662, audio_tagging_loss=0.01021, over 15841.00 frames. ], tot_loss[loss=0.08129, simple_loss=0.1015, pruned_loss=0.02038, audio_tagging_loss=0.01013, over 3023852.10 frames. ], batch size: 59, lr: 5.41e-03, grad_scale: 16.0 2023-11-20 06:01:44,923 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=8.000e-02 2023-11-20 06:01:51,990 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.711e+01 7.989e+01 8.669e+01 9.928e+01 1.196e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-20 06:01:55,230 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.77 vs. limit=10.0 2023-11-20 06:01:59,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=968620.0, ans=0.0 2023-11-20 06:02:02,739 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145300 2023-11-20 06:02:10,074 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
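The WARNING above excludes a 1-second AudioSet clip ("unbalanced/...") whose placeholder transcript tokenizes to 24 BPE tokens while its 100 feature frames shrink to 23 encoder frames, leaving the transducer loss with no valid alignment (it needs at least one output frame per token). A sketch of that admission check; the subsampling arithmetic below is an assumption that happens to reproduce 100 -> 23:

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Admission check implied by the warning: the encoder output must be
    # at least as long as the token sequence. The subsampling arithmetic
    # below is an assumption that reproduces 100 -> 23.
    frames_after_subsampling = (num_frames - 7) // 4
    return frames_after_subsampling >= num_tokens

print(keep_cut(100, 24))  # False -> the cut is excluded from training
```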
Number of tokens: 24 2023-11-20 06:02:15,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=968686.6666666666, ans=0.1 2023-11-20 06:02:35,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=968820.0, ans=0.125 2023-11-20 06:02:47,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=968886.6666666666, ans=0.125 2023-11-20 06:02:48,449 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1050, loss[loss=0.09584, simple_loss=0.1284, pruned_loss=0.02416, audio_tagging_loss=0.007508, over 15066.00 frames. ], tot_loss[loss=0.08064, simple_loss=0.1007, pruned_loss=0.0202, audio_tagging_loss=0.01009, over 3021190.10 frames. ], batch size: 53, lr: 5.41e-03, grad_scale: 16.0 2023-11-20 06:03:09,051 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145350 2023-11-20 06:03:11,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=968953.3333333334, ans=0.125 2023-11-20 06:03:25,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=969020.0, ans=0.125 2023-11-20 06:03:34,639 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2023-11-20 06:03:54,986 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1100, loss[loss=0.08172, simple_loss=0.1075, pruned_loss=0.01879, audio_tagging_loss=0.009187, over 15308.00 frames. ], tot_loss[loss=0.08099, simple_loss=0.101, pruned_loss=0.02039, audio_tagging_loss=0.01009, over 3025414.97 frames. ], batch size: 57, lr: 5.40e-03, grad_scale: 16.0 2023-11-20 06:03:57,475 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 06:04:03,629 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.580e+01 8.161e+01 8.676e+01 9.552e+01 1.363e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-20 06:04:13,741 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145400 2023-11-20 06:04:31,467 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.21 vs. limit=22.5 2023-11-20 06:04:41,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=969420.0, ans=0.125 2023-11-20 06:04:46,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=969486.6666666666, ans=0.5 2023-11-20 06:04:59,688 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1150, loss[loss=0.09236, simple_loss=0.1192, pruned_loss=0.02498, audio_tagging_loss=0.00778, over 15703.00 frames. ], tot_loss[loss=0.08106, simple_loss=0.1012, pruned_loss=0.02044, audio_tagging_loss=0.01002, over 3023898.54 frames. 
], batch size: 58, lr: 5.40e-03, grad_scale: 16.0 2023-11-20 06:05:01,611 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.79 vs. limit=15.0 2023-11-20 06:05:18,970 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145450 2023-11-20 06:05:35,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=969686.6666666666, ans=0.0 2023-11-20 06:05:38,365 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 06:05:48,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=969753.3333333334, ans=0.125 2023-11-20 06:05:49,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=969753.3333333334, ans=0.125 2023-11-20 06:05:59,524 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.09 vs. limit=15.0 2023-11-20 06:06:00,674 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.48 vs. limit=10.0 2023-11-20 06:06:02,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=969886.6666666666, ans=0.125 2023-11-20 06:06:03,739 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1200, loss[loss=0.06105, simple_loss=0.07316, pruned_loss=0.01222, audio_tagging_loss=0.01225, over 15274.00 frames. ], tot_loss[loss=0.0807, simple_loss=0.1007, pruned_loss=0.02032, audio_tagging_loss=0.01001, over 3025952.82 frames. ], batch size: 61, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:06:07,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=969886.6666666666, ans=0.125 2023-11-20 06:06:13,706 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.636e+01 8.393e+01 8.955e+01 9.874e+01 3.263e+02, threshold=1.791e+02, percent-clipped=1.0 2023-11-20 06:06:22,657 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=15.0 2023-11-20 06:06:24,513 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145500 2023-11-20 06:06:59,172 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.36 vs. limit=12.0 2023-11-20 06:07:05,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=970153.3333333334, ans=0.2 2023-11-20 06:07:06,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=970153.3333333334, ans=0.95 2023-11-20 06:07:09,450 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1250, loss[loss=0.06422, simple_loss=0.08457, pruned_loss=0.01285, audio_tagging_loss=0.009094, over 14683.00 frames. ], tot_loss[loss=0.081, simple_loss=0.1014, pruned_loss=0.0204, audio_tagging_loss=0.009914, over 3028835.14 frames. 
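Note the grad_scale field across this epoch: 32.0 for most batches, 16.0 from roughly batch 1000 through 1150, then 32.0 again from batch 1200. That is the signature of dynamic loss scaling in fp16 training: the scale is halved when a step produces inf/nan gradients and grown back after a run of clean steps. A minimal sketch with torch's GradScaler; the constructor values are illustrative assumptions:

```python
import torch

model = torch.nn.Linear(4, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,       # matches the grad_scale logged here (assumption)
    backoff_factor=0.5,    # 32.0 -> 16.0 on an overflowing step
    growth_factor=2.0,     # 16.0 -> 32.0 after growth_interval clean steps
    growth_interval=200,   # illustrative
    enabled=torch.cuda.is_available())

loss = model(torch.randn(8, 4)).pow(2).mean()
scaler.scale(loss).backward()
scaler.step(opt)    # skipped internally if the scaled grads overflowed
scaler.update()     # the scale is halved or grown at this point
print(scaler.get_scale())  # 32.0 (1.0 if scaling is disabled on CPU)
```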
], batch size: 56, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:07:28,462 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145550 2023-11-20 06:08:06,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=970486.6666666666, ans=0.2 2023-11-20 06:08:13,559 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1300, loss[loss=0.06511, simple_loss=0.07481, pruned_loss=0.01603, audio_tagging_loss=0.01167, over 14105.00 frames. ], tot_loss[loss=0.08069, simple_loss=0.1011, pruned_loss=0.02032, audio_tagging_loss=0.009843, over 3032927.14 frames. ], batch size: 53, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:08:22,201 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.703e+01 8.020e+01 8.850e+01 9.786e+01 1.232e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-20 06:08:32,733 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145600 2023-11-20 06:08:55,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=970753.3333333334, ans=0.125 2023-11-20 06:09:17,780 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1350, loss[loss=0.08162, simple_loss=0.09558, pruned_loss=0.02387, audio_tagging_loss=0.009962, over 14833.00 frames. ], tot_loss[loss=0.0809, simple_loss=0.1013, pruned_loss=0.02034, audio_tagging_loss=0.009905, over 3037618.62 frames. ], batch size: 55, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:09:37,910 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145650 2023-11-20 06:09:43,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=971020.0, ans=0.125 2023-11-20 06:10:03,351 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 06:10:12,138 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.27 vs. limit=10.0 2023-11-20 06:10:14,440 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.68 vs. limit=10.0 2023-11-20 06:10:21,341 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.43 vs. limit=22.5 2023-11-20 06:10:22,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=971220.0, ans=0.2 2023-11-20 06:10:22,949 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1400, loss[loss=0.05567, simple_loss=0.06087, pruned_loss=0.01356, audio_tagging_loss=0.01168, over 15349.00 frames. ], tot_loss[loss=0.0802, simple_loss=0.09998, pruned_loss=0.02008, audio_tagging_loss=0.01013, over 3037158.61 frames. 
], batch size: 58, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:10:24,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=971220.0, ans=15.0 2023-11-20 06:10:32,219 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.085e+01 8.864e+01 9.771e+01 1.469e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-20 06:10:36,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=971286.6666666666, ans=0.125 2023-11-20 06:10:42,871 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145700 2023-11-20 06:10:51,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=971353.3333333334, ans=0.125 2023-11-20 06:11:01,423 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.87 vs. limit=10.0 2023-11-20 06:11:28,558 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1450, loss[loss=0.09255, simple_loss=0.1165, pruned_loss=0.02284, audio_tagging_loss=0.01148, over 14673.00 frames. ], tot_loss[loss=0.08051, simple_loss=0.1002, pruned_loss=0.02024, audio_tagging_loss=0.01018, over 3038580.38 frames. ], batch size: 54, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:11:32,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=971553.3333333334, ans=0.2 2023-11-20 06:11:46,886 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145750 2023-11-20 06:11:58,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=971686.6666666666, ans=0.035 2023-11-20 06:12:12,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=971753.3333333334, ans=0.2 2023-11-20 06:12:32,271 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1500, loss[loss=0.07279, simple_loss=0.09864, pruned_loss=0.0144, audio_tagging_loss=0.009073, over 15703.00 frames. ], tot_loss[loss=0.08096, simple_loss=0.1006, pruned_loss=0.02042, audio_tagging_loss=0.01026, over 3038105.02 frames. ], batch size: 58, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:12:41,376 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.185e+01 8.333e+01 8.756e+01 9.459e+01 1.276e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-20 06:12:51,851 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145800 2023-11-20 06:13:02,785 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.70 vs. limit=15.0 2023-11-20 06:13:14,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=972086.6666666666, ans=0.125 2023-11-20 06:13:22,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=972153.3333333334, ans=0.125 2023-11-20 06:13:37,321 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1550, loss[loss=0.09176, simple_loss=0.1177, pruned_loss=0.0248, audio_tagging_loss=0.008103, over 15225.00 frames. ], tot_loss[loss=0.08078, simple_loss=0.1001, pruned_loss=0.02038, audio_tagging_loss=0.01035, over 3030652.08 frames. 
], batch size: 55, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:13:45,894 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.22 vs. limit=15.0 2023-11-20 06:13:56,441 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145850 2023-11-20 06:13:59,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=972286.6666666666, ans=0.125 2023-11-20 06:14:13,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=972353.3333333334, ans=0.125 2023-11-20 06:14:14,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=972420.0, ans=0.2 2023-11-20 06:14:38,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=972486.6666666666, ans=0.0 2023-11-20 06:14:40,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=972486.6666666666, ans=0.1 2023-11-20 06:14:42,118 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1600, loss[loss=0.0807, simple_loss=0.09466, pruned_loss=0.02305, audio_tagging_loss=0.01032, over 16211.00 frames. ], tot_loss[loss=0.0813, simple_loss=0.1007, pruned_loss=0.02054, audio_tagging_loss=0.0104, over 3039518.94 frames. ], batch size: 63, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:14:42,858 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.48 vs. limit=15.0 2023-11-20 06:14:51,237 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.806e+01 8.279e+01 8.833e+01 9.772e+01 1.298e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-20 06:15:01,489 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145900 2023-11-20 06:15:17,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=972686.6666666666, ans=0.0 2023-11-20 06:15:17,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=972686.6666666666, ans=0.125 2023-11-20 06:15:27,172 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0 2023-11-20 06:15:36,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=972820.0, ans=0.125 2023-11-20 06:15:46,796 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1650, loss[loss=0.06334, simple_loss=0.06885, pruned_loss=0.01551, audio_tagging_loss=0.0134, over 14813.00 frames. ], tot_loss[loss=0.08075, simple_loss=0.1001, pruned_loss=0.02024, audio_tagging_loss=0.01048, over 3038878.32 frames. ], batch size: 58, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:15:47,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=972886.6666666666, ans=0.0 2023-11-20 06:16:00,461 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. 
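The lr field decays very slowly here (5.42e-03 early in the epoch, 5.39e-03 by batch ~1700) because at roughly 146k optimizer steps both decay terms of the schedule are long past their knees. A sketch of an Eden-style schedule that lands in the same range; base_lr, lr_batches and lr_epochs below are assumptions:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden-style schedule: polynomial decay in both the optimizer-step
    # count and the (fractional) epoch count. All constants here are
    # assumptions, chosen to land in the logged range.
    b = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    e = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * b * e

print(eden_lr(0.045, batch=145500, epoch=12.0))  # ~5.40e-03, as logged
```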
limit=15.0 2023-11-20 06:16:06,548 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 145950 2023-11-20 06:16:19,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=973020.0, ans=0.125 2023-11-20 06:16:44,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=973153.3333333334, ans=0.0 2023-11-20 06:16:51,536 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1700, loss[loss=0.07923, simple_loss=0.1026, pruned_loss=0.01882, audio_tagging_loss=0.009086, over 15483.00 frames. ], tot_loss[loss=0.08065, simple_loss=0.1002, pruned_loss=0.02008, audio_tagging_loss=0.01048, over 3038733.49 frames. ], batch size: 56, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:17:02,510 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.052e+01 8.811e+01 9.587e+01 1.278e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-20 06:17:11,385 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146000 2023-11-20 06:17:16,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=973353.3333333334, ans=0.125 2023-11-20 06:17:38,969 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=22.5 2023-11-20 06:17:54,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=973486.6666666666, ans=0.0 2023-11-20 06:17:56,476 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1750, loss[loss=0.06203, simple_loss=0.07287, pruned_loss=0.01106, audio_tagging_loss=0.01453, over 15146.00 frames. ], tot_loss[loss=0.08127, simple_loss=0.1012, pruned_loss=0.02033, audio_tagging_loss=0.01035, over 3046492.14 frames. ], batch size: 57, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:18:11,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=973620.0, ans=0.0 2023-11-20 06:18:15,700 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146050 2023-11-20 06:18:38,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=973753.3333333334, ans=0.0 2023-11-20 06:18:46,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=973753.3333333334, ans=0.2 2023-11-20 06:18:59,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=973886.6666666666, ans=0.1 2023-11-20 06:19:00,633 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1800, loss[loss=0.08397, simple_loss=0.1125, pruned_loss=0.01968, audio_tagging_loss=0.00805, over 14754.00 frames. ], tot_loss[loss=0.08102, simple_loss=0.1008, pruned_loss=0.02036, audio_tagging_loss=0.01026, over 3042925.13 frames. 
], batch size: 56, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:19:11,424 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.956e+01 8.340e+01 8.928e+01 9.770e+01 2.074e+02, threshold=1.786e+02, percent-clipped=1.0 2023-11-20 06:19:21,350 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146100 2023-11-20 06:19:25,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=973953.3333333334, ans=0.0 2023-11-20 06:19:34,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=974020.0, ans=0.5 2023-11-20 06:19:40,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=974086.6666666666, ans=0.0 2023-11-20 06:19:47,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=974086.6666666666, ans=0.1 2023-11-20 06:19:55,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=974153.3333333334, ans=0.05 2023-11-20 06:20:06,118 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1850, loss[loss=0.09238, simple_loss=0.117, pruned_loss=0.02573, audio_tagging_loss=0.008129, over 16383.00 frames. ], tot_loss[loss=0.0813, simple_loss=0.1013, pruned_loss=0.02051, audio_tagging_loss=0.01015, over 3041936.64 frames. ], batch size: 59, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:20:14,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=974220.0, ans=0.0 2023-11-20 06:20:18,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=974286.6666666666, ans=0.07 2023-11-20 06:20:25,778 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146150 2023-11-20 06:20:28,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=974286.6666666666, ans=0.125 2023-11-20 06:20:45,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=974420.0, ans=0.2 2023-11-20 06:20:49,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=974420.0, ans=0.125 2023-11-20 06:21:09,906 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.38 vs. limit=22.5 2023-11-20 06:21:11,572 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1900, loss[loss=0.1102, simple_loss=0.1417, pruned_loss=0.0301, audio_tagging_loss=0.00924, over 16509.00 frames. ], tot_loss[loss=0.08122, simple_loss=0.1014, pruned_loss=0.02042, audio_tagging_loss=0.01013, over 3049645.61 frames. ], batch size: 60, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:21:16,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=974553.3333333334, ans=0.125 2023-11-20 06:21:20,760 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. 
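Each batch record's second bracket, tot_loss[... over N frames.], aggregates over the epoch so far, weighted by the number of frames in each batch (about 3M frames by batch 1850 above). A sketch of that bookkeeping; the fractional cumulative frame counts in these records suggest an additional decay or cross-worker averaging whose details are not recoverable from the log:

```python
class RunningLoss:
    # Frame-weighted running average matching the second bracket of the
    # batch records (pure per-frame weighting is an assumption).
    def __init__(self):
        self.total = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.total += batch_loss * batch_frames
        self.frames += batch_frames
        return self.total / self.frames

avg = RunningLoss()
print(avg.update(0.09238, 16383))  # batch 1850's own loss enters the mean
```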
limit=15.0 2023-11-20 06:21:21,308 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.109e+01 8.191e+01 8.714e+01 9.670e+01 1.123e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-20 06:21:30,192 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146200 2023-11-20 06:21:31,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=974620.0, ans=0.025 2023-11-20 06:21:33,195 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.133e-01 2023-11-20 06:21:42,354 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.40 vs. limit=15.0 2023-11-20 06:22:02,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=974820.0, ans=0.125 2023-11-20 06:22:16,072 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 1950, loss[loss=0.07034, simple_loss=0.09497, pruned_loss=0.01355, audio_tagging_loss=0.009305, over 14692.00 frames. ], tot_loss[loss=0.08042, simple_loss=0.1005, pruned_loss=0.02002, audio_tagging_loss=0.01013, over 3050421.76 frames. ], batch size: 55, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:22:17,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=974886.6666666666, ans=0.1 2023-11-20 06:22:18,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=974886.6666666666, ans=0.0 2023-11-20 06:22:29,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=974953.3333333334, ans=0.125 2023-11-20 06:22:35,955 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146250 2023-11-20 06:22:37,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=974953.3333333334, ans=0.2 2023-11-20 06:22:45,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=975020.0, ans=0.125 2023-11-20 06:22:47,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=975020.0, ans=0.1 2023-11-20 06:22:47,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=975020.0, ans=0.1 2023-11-20 06:23:03,567 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.36 vs. limit=10.0 2023-11-20 06:23:21,430 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2000, loss[loss=0.08402, simple_loss=0.1122, pruned_loss=0.01918, audio_tagging_loss=0.00875, over 15483.00 frames. ], tot_loss[loss=0.08061, simple_loss=0.1005, pruned_loss=0.02019, audio_tagging_loss=0.01019, over 3049935.00 frames. 
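The WithLoss records attach a scalar penalty to particular attention-weight tensors and report its accumulated value, which is 0.000e+00 almost everywhere but occasionally nonzero (1.133e-01 for layers.3.self_attn_weights above), suggesting the penalty only fires when the weights leave some allowed range. A generic sketch of that pattern; the cap-style penalty below is purely an assumption, not scaling.py's actual formula:

```python
import torch

class WithLossSketch(torch.nn.Module):
    # Wraps a module, computes a penalty on its output only where values
    # exceed a cap, and accumulates the penalty sum for periodic logging.
    def __init__(self, module: torch.nn.Module, cap: float = 25.0):
        super().__init__()
        self.module, self.cap = module, cap
        self.loss_sum = 0.0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.module(x)
        penalty = (y.abs() - self.cap).clamp(min=0.0).mean()
        self.loss_sum += float(penalty)  # the value the log reports
        # In training, the penalty would feed back into the loss (e.g. via
        # a gradient hook or auxiliary term); omitted in this sketch.
        return y

m = WithLossSketch(torch.nn.Identity())
m(torch.randn(4, 8))
print(f"loss-sum={m.loss_sum:.3e}")  # 0.000e+00 while values stay in range
```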
], batch size: 58, lr: 5.39e-03, grad_scale: 32.0 2023-11-20 06:23:25,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=975220.0, ans=0.1 2023-11-20 06:23:29,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=975220.0, ans=0.125 2023-11-20 06:23:31,797 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 7.759e+01 8.483e+01 9.476e+01 1.626e+02, threshold=1.697e+02, percent-clipped=0.0 2023-11-20 06:23:38,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=975286.6666666666, ans=0.125 2023-11-20 06:23:41,238 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146300 2023-11-20 06:24:19,486 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.78 vs. limit=15.0 2023-11-20 06:24:21,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=975486.6666666666, ans=0.1 2023-11-20 06:24:26,433 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2050, loss[loss=0.08968, simple_loss=0.1069, pruned_loss=0.02555, audio_tagging_loss=0.0107, over 14630.00 frames. ], tot_loss[loss=0.0808, simple_loss=0.1008, pruned_loss=0.02029, audio_tagging_loss=0.01008, over 3044143.80 frames. ], batch size: 57, lr: 5.39e-03, grad_scale: 32.0 2023-11-20 06:24:27,953 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 06:24:28,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=975553.3333333334, ans=0.125 2023-11-20 06:24:30,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=975553.3333333334, ans=0.125 2023-11-20 06:24:36,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=975553.3333333334, ans=0.125 2023-11-20 06:24:42,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=975620.0, ans=0.0 2023-11-20 06:24:42,749 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 06:24:45,181 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146350 2023-11-20 06:24:46,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=975620.0, ans=0.125 2023-11-20 06:25:03,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=975753.3333333334, ans=0.125 2023-11-20 06:25:19,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=975820.0, ans=0.125 2023-11-20 06:25:30,034 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2100, loss[loss=0.06271, simple_loss=0.0778, pruned_loss=0.01433, audio_tagging_loss=0.00948, over 14848.00 frames. ], tot_loss[loss=0.08114, simple_loss=0.1014, pruned_loss=0.02036, audio_tagging_loss=0.01006, over 3037147.81 frames. 
], batch size: 56, lr: 5.39e-03, grad_scale: 32.0 2023-11-20 06:25:39,700 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.798e+01 8.216e+01 8.927e+01 9.679e+01 1.244e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-20 06:25:48,957 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146400 2023-11-20 06:25:58,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=976020.0, ans=0.0 2023-11-20 06:26:20,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=976153.3333333334, ans=0.125 2023-11-20 06:26:21,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=976153.3333333334, ans=0.0 2023-11-20 06:26:26,027 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2023-11-20 06:26:33,946 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2150, loss[loss=0.08618, simple_loss=0.1182, pruned_loss=0.01867, audio_tagging_loss=0.008423, over 14386.00 frames. ], tot_loss[loss=0.08037, simple_loss=0.1006, pruned_loss=0.02002, audio_tagging_loss=0.01006, over 3039961.98 frames. ], batch size: 54, lr: 5.39e-03, grad_scale: 32.0 2023-11-20 06:26:37,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=976220.0, ans=0.05 2023-11-20 06:26:43,223 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0 2023-11-20 06:26:55,017 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146450 2023-11-20 06:27:07,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=976353.3333333334, ans=0.2 2023-11-20 06:27:11,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=976353.3333333334, ans=0.02 2023-11-20 06:27:12,198 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 06:27:17,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=976420.0, ans=12.0 2023-11-20 06:27:38,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=976553.3333333334, ans=0.125 2023-11-20 06:27:39,868 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2200, loss[loss=0.0863, simple_loss=0.1157, pruned_loss=0.02024, audio_tagging_loss=0.008229, over 15741.00 frames. ], tot_loss[loss=0.0809, simple_loss=0.1015, pruned_loss=0.02008, audio_tagging_loss=0.01005, over 3039730.33 frames. 
], batch size: 63, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:27:49,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=976553.3333333334, ans=0.125 2023-11-20 06:27:50,518 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.751e+01 8.222e+01 8.987e+01 9.495e+01 1.215e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-20 06:27:53,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=976620.0, ans=0.0 2023-11-20 06:27:59,385 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146500 2023-11-20 06:28:03,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=976620.0, ans=0.025 2023-11-20 06:28:24,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=976753.3333333334, ans=0.125 2023-11-20 06:28:38,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=976820.0, ans=0.125 2023-11-20 06:28:39,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=976820.0, ans=0.125 2023-11-20 06:28:44,333 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2250, loss[loss=0.09804, simple_loss=0.121, pruned_loss=0.0284, audio_tagging_loss=0.009126, over 15912.00 frames. ], tot_loss[loss=0.08115, simple_loss=0.1017, pruned_loss=0.02026, audio_tagging_loss=0.01006, over 3042980.98 frames. ], batch size: 58, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:28:45,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=976886.6666666666, ans=0.0 2023-11-20 06:28:54,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=976886.6666666666, ans=0.0 2023-11-20 06:29:03,380 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146550 2023-11-20 06:29:05,206 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.09 vs. limit=22.5 2023-11-20 06:29:14,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=977020.0, ans=0.0 2023-11-20 06:29:48,061 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2300, loss[loss=0.09568, simple_loss=0.1169, pruned_loss=0.02729, audio_tagging_loss=0.00991, over 16875.00 frames. ], tot_loss[loss=0.08099, simple_loss=0.1012, pruned_loss=0.02029, audio_tagging_loss=0.01011, over 3044801.57 frames. ], batch size: 62, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:29:50,057 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.88 vs. 
limit=10.0 2023-11-20 06:29:58,633 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.545e+01 8.135e+01 8.852e+01 9.695e+01 1.259e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-20 06:30:07,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=977286.6666666666, ans=0.0 2023-11-20 06:30:08,677 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146600 2023-11-20 06:30:13,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=977286.6666666666, ans=0.0 2023-11-20 06:30:20,230 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.92 vs. limit=15.0 2023-11-20 06:30:33,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=977420.0, ans=0.125 2023-11-20 06:30:37,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=977420.0, ans=0.125 2023-11-20 06:30:44,341 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 06:30:53,521 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2350, loss[loss=0.07254, simple_loss=0.09259, pruned_loss=0.01665, audio_tagging_loss=0.009598, over 16536.00 frames. ], tot_loss[loss=0.08151, simple_loss=0.1018, pruned_loss=0.02043, audio_tagging_loss=0.01019, over 3042680.82 frames. ], batch size: 64, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:31:08,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=977620.0, ans=0.125 2023-11-20 06:31:13,197 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146650 2023-11-20 06:31:38,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=977753.3333333334, ans=0.0 2023-11-20 06:31:41,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=977753.3333333334, ans=0.125 2023-11-20 06:31:58,154 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2400, loss[loss=0.04553, simple_loss=0.05788, pruned_loss=0.007761, audio_tagging_loss=0.008829, over 14160.00 frames. ], tot_loss[loss=0.0819, simple_loss=0.1023, pruned_loss=0.02052, audio_tagging_loss=0.01022, over 3048279.15 frames. ], batch size: 55, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:32:02,576 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.81 vs. limit=22.5 2023-11-20 06:32:09,030 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.339e+01 8.246e+01 8.786e+01 9.716e+01 1.266e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-20 06:32:13,251 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. 
limit=6.0 2023-11-20 06:32:15,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=977953.3333333334, ans=0.125 2023-11-20 06:32:16,312 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146700 2023-11-20 06:32:26,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=978020.0, ans=0.0 2023-11-20 06:32:40,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=978086.6666666666, ans=0.125 2023-11-20 06:32:47,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=978086.6666666666, ans=0.0 2023-11-20 06:32:57,282 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0 2023-11-20 06:33:01,533 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2450, loss[loss=0.07516, simple_loss=0.0894, pruned_loss=0.02132, audio_tagging_loss=0.009133, over 14658.00 frames. ], tot_loss[loss=0.08237, simple_loss=0.1032, pruned_loss=0.02063, audio_tagging_loss=0.01016, over 3049614.02 frames. ], batch size: 57, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:33:13,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=978286.6666666666, ans=0.0 2023-11-20 06:33:21,621 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146750 2023-11-20 06:33:24,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=978286.6666666666, ans=0.125 2023-11-20 06:33:44,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=978420.0, ans=0.09899494936611666 2023-11-20 06:33:48,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=978420.0, ans=0.125 2023-11-20 06:34:06,462 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2500, loss[loss=0.08897, simple_loss=0.1173, pruned_loss=0.02382, audio_tagging_loss=0.006506, over 14633.00 frames. ], tot_loss[loss=0.08307, simple_loss=0.1039, pruned_loss=0.02094, audio_tagging_loss=0.01017, over 3049836.36 frames. 
], batch size: 54, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:34:18,578 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.294e+01 8.313e+01 9.053e+01 9.691e+01 1.783e+02, threshold=1.811e+02, percent-clipped=1.0 2023-11-20 06:34:22,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=978620.0, ans=0.125 2023-11-20 06:34:26,054 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146800 2023-11-20 06:34:26,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=978620.0, ans=0.04949747468305833 2023-11-20 06:34:29,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=978620.0, ans=0.2 2023-11-20 06:34:38,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=978686.6666666666, ans=0.125 2023-11-20 06:34:42,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=978686.6666666666, ans=0.0 2023-11-20 06:34:46,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=978753.3333333334, ans=0.125 2023-11-20 06:34:50,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=978753.3333333334, ans=0.5 2023-11-20 06:35:10,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=978886.6666666666, ans=0.125 2023-11-20 06:35:11,792 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2550, loss[loss=0.08346, simple_loss=0.1095, pruned_loss=0.01888, audio_tagging_loss=0.00981, over 15089.00 frames. ], tot_loss[loss=0.08212, simple_loss=0.1029, pruned_loss=0.02063, audio_tagging_loss=0.01005, over 3045628.64 frames. ], batch size: 55, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:35:19,954 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.42 vs. limit=12.0 2023-11-20 06:35:29,930 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146850 2023-11-20 06:35:38,590 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.78 vs. limit=10.0 2023-11-20 06:35:44,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=979020.0, ans=0.0 2023-11-20 06:35:50,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=979086.6666666666, ans=0.125 2023-11-20 06:36:03,395 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.97 vs. limit=15.0 2023-11-20 06:36:09,743 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.77 vs. limit=22.5 2023-11-20 06:36:15,067 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2600, loss[loss=0.06862, simple_loss=0.08853, pruned_loss=0.01678, audio_tagging_loss=0.007577, over 14875.00 frames. ], tot_loss[loss=0.08159, simple_loss=0.1026, pruned_loss=0.0204, audio_tagging_loss=0.009886, over 3050216.69 frames. 
2023-11-20 06:36:15,067 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2600, loss[loss=0.06862, simple_loss=0.08853, pruned_loss=0.01678, audio_tagging_loss=0.007577, over 14875.00 frames. ], tot_loss[loss=0.08159, simple_loss=0.1026, pruned_loss=0.0204, audio_tagging_loss=0.009886, over 3050216.69 frames. ], batch size: 56, lr: 5.38e-03, grad_scale: 32.0
2023-11-20 06:36:26,629 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.215e+01 8.794e+01 9.573e+01 1.201e+02, threshold=1.759e+02, percent-clipped=0.0
2023-11-20 06:36:27,404 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.62 vs. limit=10.0
2023-11-20 06:36:34,893 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146900
2023-11-20 06:36:45,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=979353.3333333334, ans=0.125
2023-11-20 06:36:54,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=979420.0, ans=0.2
2023-11-20 06:36:59,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=979420.0, ans=0.125
2023-11-20 06:37:10,092 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0
2023-11-20 06:37:10,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=979486.6666666666, ans=0.2
2023-11-20 06:37:16,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=979486.6666666666, ans=0.125
2023-11-20 06:37:20,160 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2650, loss[loss=0.07643, simple_loss=0.09696, pruned_loss=0.01743, audio_tagging_loss=0.01053, over 15278.00 frames. ], tot_loss[loss=0.08177, simple_loss=0.1028, pruned_loss=0.02052, audio_tagging_loss=0.009858, over 3047859.18 frames. ], batch size: 56, lr: 5.38e-03, grad_scale: 16.0
2023-11-20 06:37:27,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=979553.3333333334, ans=0.125
2023-11-20 06:37:32,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=979620.0, ans=0.05
2023-11-20 06:37:34,504 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 06:37:39,809 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 146950
2023-11-20 06:37:53,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=979686.6666666666, ans=0.125
2023-11-20 06:37:54,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=979686.6666666666, ans=0.0
2023-11-20 06:38:03,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=979753.3333333334, ans=0.0
2023-11-20 06:38:06,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=979753.3333333334, ans=0.0
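
The scaling.py:1022 Whitening lines (num_groups, num_channels, metric vs. limit) track how far a module's activations are from being white, i.e. how unevenly the energy is spread over the eigenvalues of the per-group channel covariance; a corrective penalty applies only while the metric exceeds the limit. A sketch of one plausible whiteness proxy, normalized so a perfectly white signal scores 1.0; this is an assumption about the metric's shape, not necessarily the exact icefall formula:

    import torch

    def whitening_metric(x, num_groups=1):
        # x: (frames, channels); eigenvalue spread of the per-group covariance.
        b, c = x.shape
        x = x.reshape(b, num_groups, c // num_groups).transpose(0, 1)
        cov = x.transpose(1, 2) @ x / b    # (groups, c/g, c/g), symmetric
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs ** 2).mean() / (eigs.mean() ** 2)  # 1.0 if all eigs equal

    print(whitening_metric(torch.randn(2000, 128), num_groups=4))  # ~1.0
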
2023-11-20 06:38:18,667 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.69 vs. limit=22.5
2023-11-20 06:38:21,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=979820.0, ans=0.125
2023-11-20 06:38:24,558 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2700, loss[loss=0.09594, simple_loss=0.1215, pruned_loss=0.02631, audio_tagging_loss=0.008869, over 14566.00 frames. ], tot_loss[loss=0.08139, simple_loss=0.1023, pruned_loss=0.02039, audio_tagging_loss=0.009866, over 3046414.52 frames. ], batch size: 55, lr: 5.38e-03, grad_scale: 16.0
2023-11-20 06:38:37,513 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.654e+01 8.126e+01 8.706e+01 9.526e+01 1.459e+02, threshold=1.741e+02, percent-clipped=0.0
2023-11-20 06:38:40,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=979953.3333333334, ans=0.035
2023-11-20 06:38:43,836 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147000
2023-11-20 06:38:46,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=979953.3333333334, ans=0.125
2023-11-20 06:38:49,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=980020.0, ans=0.125
2023-11-20 06:39:19,448 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.64 vs. limit=15.0
2023-11-20 06:39:29,317 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2750, loss[loss=0.0774, simple_loss=0.09582, pruned_loss=0.02004, audio_tagging_loss=0.009452, over 15323.00 frames. ], tot_loss[loss=0.08138, simple_loss=0.1023, pruned_loss=0.02051, audio_tagging_loss=0.00974, over 3040985.90 frames. ], batch size: 56, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:39:37,489 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.09 vs. limit=22.5
2023-11-20 06:39:49,310 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147050
2023-11-20 06:40:15,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=980420.0, ans=0.125
2023-11-20 06:40:16,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=980420.0, ans=0.125
2023-11-20 06:40:22,977 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0
2023-11-20 06:40:23,478 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 06:40:28,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=980486.6666666666, ans=0.125
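
The WARNING above is the training-data filter firing on one of the unbalanced-set tagging cuts: these carry the dummy transcript quoted in the message, and a one-second cut yields only 23 encoder frames after the 4x subsampling, fewer than its 24 BPE tokens, so the transducer loss presumably cannot align it and the cut is dropped. A sketch of the check; the subsampling arithmetic is an assumption chosen to reproduce the logged 100 -> 23, not the exact train_asr.py code:

    # Hedged sketch of the "Exclude cut" check.
    def keep_cut(num_frames_before, num_tokens):
        num_frames_after = (num_frames_before - 7) // 4  # assumed: 100 -> 23
        if num_frames_after < num_tokens:
            print(f"Exclude cut: {num_frames_after} frames < {num_tokens} tokens")
            return False
        return True

    keep_cut(100, 24)  # -> False, matching the warning above
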
2023-11-20 06:40:33,950 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2800, loss[loss=0.08345, simple_loss=0.1033, pruned_loss=0.02356, audio_tagging_loss=0.008265, over 15204.00 frames. ], tot_loss[loss=0.08125, simple_loss=0.1019, pruned_loss=0.02043, audio_tagging_loss=0.009863, over 3034662.48 frames. ], batch size: 56, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:40:48,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.373e+01 8.295e+01 8.926e+01 1.019e+02 1.823e+02, threshold=1.785e+02, percent-clipped=1.0
2023-11-20 06:40:49,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=980620.0, ans=0.05
2023-11-20 06:40:49,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=980620.0, ans=0.2
2023-11-20 06:40:51,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=980620.0, ans=0.125
2023-11-20 06:40:53,885 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147100
2023-11-20 06:41:10,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=980686.6666666666, ans=0.05
2023-11-20 06:41:14,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=980753.3333333334, ans=0.125
2023-11-20 06:41:30,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=980820.0, ans=0.125
2023-11-20 06:41:30,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=980820.0, ans=0.1
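
grad_scale in the batch records is the dynamic loss scale of mixed-precision training: it was 32.0 through batch 2600, is 16.0 from batch 2650 on, and elsewhere in this log it dips to 8.0 and recovers to 32.0. Halving when a step overflows and growing back after a run of stable steps is exactly what PyTorch's stock GradScaler does; a sketch with that scaler (icefall wraps the same mechanism in its own training loop):

    import torch

    model = torch.nn.Linear(80, 512).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.045)
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    x = torch.randn(8, 80, device="cuda")
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(opt)               # skipped if any grad overflowed
    scaler.update()                # halve on overflow, else grow periodically
    print(scaler.get_scale())      # the number logged as grad_scale
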
2023-11-20 06:41:39,274 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2850, loss[loss=0.08079, simple_loss=0.09599, pruned_loss=0.02166, audio_tagging_loss=0.01113, over 15075.00 frames. ], tot_loss[loss=0.08078, simple_loss=0.1014, pruned_loss=0.02019, audio_tagging_loss=0.00989, over 3036475.40 frames. ], batch size: 55, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:41:58,403 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147150
2023-11-20 06:41:58,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=980953.3333333334, ans=0.125
2023-11-20 06:42:08,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=981020.0, ans=0.125
2023-11-20 06:42:12,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=981020.0, ans=0.125
2023-11-20 06:42:12,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=981020.0, ans=0.125
2023-11-20 06:42:18,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=981086.6666666666, ans=0.0
2023-11-20 06:42:23,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=981086.6666666666, ans=0.0
2023-11-20 06:42:26,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=981086.6666666666, ans=0.1
2023-11-20 06:42:28,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=981086.6666666666, ans=0.0
2023-11-20 06:42:33,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=981153.3333333334, ans=0.0
2023-11-20 06:42:40,411 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.17 vs. limit=15.0
2023-11-20 06:42:43,734 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2900, loss[loss=0.0904, simple_loss=0.1058, pruned_loss=0.02679, audio_tagging_loss=0.01068, over 15033.00 frames. ], tot_loss[loss=0.08085, simple_loss=0.1017, pruned_loss=0.02021, audio_tagging_loss=0.009817, over 3041881.11 frames. ], batch size: 58, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:42:57,847 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.790e+01 8.308e+01 8.941e+01 9.842e+01 1.369e+02, threshold=1.788e+02, percent-clipped=0.0
2023-11-20 06:43:02,752 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147200
2023-11-20 06:43:32,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=981420.0, ans=0.0
2023-11-20 06:43:36,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=981486.6666666666, ans=0.125
2023-11-20 06:43:38,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=981486.6666666666, ans=0.07
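
The loss[...] bracket in each batch record decomposes the objective, and the printed total matches 0.5 * simple_loss + pruned_loss + audio_tagging_loss up to display rounding; for batch 2850 above, 0.5 * 0.09599 + 0.02166 + 0.01113 = 0.080785, logged as 0.08079. So at this stage the run weights the transducer's simple (linear-lattice) loss at 0.5 and the audio-tagging distillation loss at 1.0, with no CTC term, while tot_loss[...] is the smoothed counterpart over roughly 3M recent frames. A sketch of the combination only, not of the pruned-RNN-T computation itself:

    # Reproduces the logged totals from the logged components.
    def combine(simple_loss, pruned_loss, audio_tagging_loss,
                simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    print(combine(0.09599, 0.02166, 0.01113))  # 0.080785 -> logged as 0.08079
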
2023-11-20 06:43:48,111 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 2950, loss[loss=0.08164, simple_loss=0.1054, pruned_loss=0.01855, audio_tagging_loss=0.01039, over 15545.00 frames. ], tot_loss[loss=0.08099, simple_loss=0.1019, pruned_loss=0.02023, audio_tagging_loss=0.009788, over 3037233.04 frames. ], batch size: 57, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:43:49,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=981553.3333333334, ans=0.0
2023-11-20 06:43:52,000 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 06:44:07,693 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147250
2023-11-20 06:44:09,411 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.02 vs. limit=15.0
2023-11-20 06:44:29,779 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.80 vs. limit=12.0
2023-11-20 06:44:45,680 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.00 vs. limit=15.0
2023-11-20 06:44:46,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=981820.0, ans=0.0
2023-11-20 06:44:53,068 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3000, loss[loss=0.0938, simple_loss=0.1181, pruned_loss=0.02277, audio_tagging_loss=0.01198, over 14731.00 frames. ], tot_loss[loss=0.0817, simple_loss=0.1028, pruned_loss=0.02046, audio_tagging_loss=0.009851, over 3037766.81 frames. ], batch size: 56, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:44:53,069 INFO [train_asr.py:1285] (2/4) Computing validation loss
2023-11-20 06:45:16,482 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.9351, 2.9628, 3.5527, 2.8577, 3.8353, 3.7049, 3.4120, 3.1744], device='cuda:2')
2023-11-20 06:45:31,946 INFO [train_asr.py:1294] (2/4) Epoch 13, validation: loss=0.06242, simple_loss=0.05394, pruned_loss=0.005804, audio_tagging_loss=0.02964, over 4681554.00 frames.
2023-11-20 06:45:31,947 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB
2023-11-20 06:45:33,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=981886.6666666666, ans=0.0
2023-11-20 06:45:36,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=981886.6666666666, ans=0.2
2023-11-20 06:45:44,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=981953.3333333334, ans=0.125
2023-11-20 06:45:46,810 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.395e+01 8.201e+01 8.897e+01 9.903e+01 1.229e+02, threshold=1.779e+02, percent-clipped=0.0
2023-11-20 06:45:52,533 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147300
2023-11-20 06:46:32,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=982153.3333333334, ans=0.0
2023-11-20 06:46:35,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=982153.3333333334, ans=0.1
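
Batch 3000 above triggers the periodic dev-set pass ("Computing validation loss"): the same loss components are accumulated over about 4.68M validation frames without gradients (note the much smaller pruned_loss but larger audio_tagging_loss than on training batches), one attention module's weight entropy is dumped for diagnostics, and peak GPU memory is reported. A hedged sketch of such a loop, assuming a model(batch) interface that returns the frame-summed loss and the frame count:

    import torch

    def run_validation(model, valid_loader, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():  # no gradients for the dev pass
            for batch in valid_loader:
                loss, num_frames = model(batch)  # assumed interface
                tot_loss += loss.item()
                tot_frames += num_frames
        model.train()
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={tot_loss / tot_frames:.5f}")
        print(f"Maximum memory allocated so far is {peak_mb}MB")
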
2023-11-20 06:46:37,627 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3050, loss[loss=0.07011, simple_loss=0.09353, pruned_loss=0.01535, audio_tagging_loss=0.007994, over 14138.00 frames. ], tot_loss[loss=0.08083, simple_loss=0.1013, pruned_loss=0.02021, audio_tagging_loss=0.009961, over 3041108.66 frames. ], batch size: 53, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:46:37,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=982220.0, ans=0.125
2023-11-20 06:46:40,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=982220.0, ans=0.0
2023-11-20 06:46:48,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=982220.0, ans=0.125
2023-11-20 06:46:57,062 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147350
2023-11-20 06:47:03,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=982353.3333333334, ans=0.125
2023-11-20 06:47:13,008 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 06:47:18,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=982420.0, ans=0.125
2023-11-20 06:47:28,138 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-20 06:47:35,069 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 06:47:42,641 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3100, loss[loss=0.08203, simple_loss=0.1091, pruned_loss=0.01985, audio_tagging_loss=0.007644, over 15170.00 frames. ], tot_loss[loss=0.0815, simple_loss=0.1022, pruned_loss=0.02038, audio_tagging_loss=0.01003, over 3044512.56 frames. ], batch size: 57, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:47:46,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=982553.3333333334, ans=0.2
2023-11-20 06:47:50,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=982553.3333333334, ans=0.0
2023-11-20 06:47:50,613 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.82 vs.
limit=15.0 2023-11-20 06:47:55,961 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.032e+01 8.672e+01 9.635e+01 1.172e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-20 06:48:00,926 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147400 2023-11-20 06:48:05,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=982620.0, ans=0.125 2023-11-20 06:48:17,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=982686.6666666666, ans=0.2 2023-11-20 06:48:21,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=982753.3333333334, ans=10.0 2023-11-20 06:48:21,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=982753.3333333334, ans=0.125 2023-11-20 06:48:25,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=982753.3333333334, ans=0.125 2023-11-20 06:48:47,329 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3150, loss[loss=0.1017, simple_loss=0.1395, pruned_loss=0.02514, audio_tagging_loss=0.006782, over 15571.00 frames. ], tot_loss[loss=0.08171, simple_loss=0.1027, pruned_loss=0.02029, audio_tagging_loss=0.01006, over 3044799.59 frames. ], batch size: 54, lr: 5.37e-03, grad_scale: 16.0 2023-11-20 06:48:48,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=982886.6666666666, ans=0.2 2023-11-20 06:49:00,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=982953.3333333334, ans=0.0 2023-11-20 06:49:06,555 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147450 2023-11-20 06:49:30,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=983086.6666666666, ans=0.125 2023-11-20 06:49:35,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=983086.6666666666, ans=0.0 2023-11-20 06:49:49,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=983153.3333333334, ans=0.1 2023-11-20 06:49:52,025 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3200, loss[loss=0.06899, simple_loss=0.09039, pruned_loss=0.01283, audio_tagging_loss=0.01097, over 14688.00 frames. ], tot_loss[loss=0.08199, simple_loss=0.103, pruned_loss=0.02046, audio_tagging_loss=0.01005, over 3045140.26 frames. 
], batch size: 54, lr: 5.37e-03, grad_scale: 32.0 2023-11-20 06:50:01,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=983220.0, ans=0.125 2023-11-20 06:50:06,130 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.614e+01 8.160e+01 8.999e+01 9.638e+01 2.555e+02, threshold=1.800e+02, percent-clipped=1.0 2023-11-20 06:50:11,817 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147500 2023-11-20 06:50:23,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=983353.3333333334, ans=0.125 2023-11-20 06:50:56,503 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3250, loss[loss=0.09901, simple_loss=0.1327, pruned_loss=0.02333, audio_tagging_loss=0.009326, over 14559.00 frames. ], tot_loss[loss=0.08146, simple_loss=0.1023, pruned_loss=0.02017, audio_tagging_loss=0.01013, over 3045722.89 frames. ], batch size: 53, lr: 5.37e-03, grad_scale: 32.0 2023-11-20 06:51:12,400 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.78 vs. limit=12.0 2023-11-20 06:51:15,558 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147550 2023-11-20 06:51:44,796 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2023-11-20 06:52:00,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=983886.6666666666, ans=0.0 2023-11-20 06:52:00,907 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3300, loss[loss=0.06142, simple_loss=0.06887, pruned_loss=0.01353, audio_tagging_loss=0.01345, over 14729.00 frames. ], tot_loss[loss=0.08071, simple_loss=0.1009, pruned_loss=0.01987, audio_tagging_loss=0.01037, over 3047733.93 frames. ], batch size: 58, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 06:52:03,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=983886.6666666666, ans=12.0 2023-11-20 06:52:14,569 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.228e+01 8.367e+01 9.193e+01 1.015e+02 1.401e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-20 06:52:20,245 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147600 2023-11-20 06:52:20,832 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.39 vs. limit=15.0 2023-11-20 06:52:34,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=984020.0, ans=0.125 2023-11-20 06:53:05,444 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3350, loss[loss=0.08869, simple_loss=0.1183, pruned_loss=0.02134, audio_tagging_loss=0.008176, over 16755.00 frames. ], tot_loss[loss=0.08073, simple_loss=0.1009, pruned_loss=0.01999, audio_tagging_loss=0.01029, over 3052541.27 frames. 
], batch size: 59, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 06:53:26,293 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147650 2023-11-20 06:53:26,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=984286.6666666666, ans=0.125 2023-11-20 06:53:45,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=984420.0, ans=0.125 2023-11-20 06:54:11,605 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3400, loss[loss=0.06543, simple_loss=0.07634, pruned_loss=0.01598, audio_tagging_loss=0.01129, over 15949.00 frames. ], tot_loss[loss=0.0806, simple_loss=0.1005, pruned_loss=0.02014, audio_tagging_loss=0.01021, over 3049913.16 frames. ], batch size: 60, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 06:54:19,089 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2023-11-20 06:54:25,754 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.861e+01 8.407e+01 9.126e+01 1.002e+02 1.896e+02, threshold=1.825e+02, percent-clipped=1.0 2023-11-20 06:54:31,031 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147700 2023-11-20 06:54:42,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=984686.6666666666, ans=0.125 2023-11-20 06:54:46,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=984686.6666666666, ans=0.0 2023-11-20 06:55:16,232 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3450, loss[loss=0.06645, simple_loss=0.08941, pruned_loss=0.01258, audio_tagging_loss=0.009176, over 15727.00 frames. ], tot_loss[loss=0.0804, simple_loss=0.1004, pruned_loss=0.02007, audio_tagging_loss=0.01011, over 3051306.02 frames. ], batch size: 59, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 06:55:18,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=984886.6666666666, ans=0.0 2023-11-20 06:55:35,429 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147750 2023-11-20 06:55:38,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=984953.3333333334, ans=0.1 2023-11-20 06:55:53,209 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. 
limit=6.0 2023-11-20 06:56:00,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=985086.6666666666, ans=0.1 2023-11-20 06:56:11,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=985153.3333333334, ans=0.5 2023-11-20 06:56:16,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=985153.3333333334, ans=0.125 2023-11-20 06:56:18,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=985220.0, ans=0.1 2023-11-20 06:56:19,930 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3500, loss[loss=0.06346, simple_loss=0.07651, pruned_loss=0.01462, audio_tagging_loss=0.01058, over 15136.00 frames. ], tot_loss[loss=0.08076, simple_loss=0.1011, pruned_loss=0.0202, audio_tagging_loss=0.01003, over 3042122.59 frames. ], batch size: 57, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 06:56:34,556 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.197e+01 8.889e+01 1.015e+02 2.808e+02, threshold=1.778e+02, percent-clipped=1.0 2023-11-20 06:56:40,132 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147800 2023-11-20 06:56:41,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=985286.6666666666, ans=0.125 2023-11-20 06:56:52,853 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 06:56:55,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=985353.3333333334, ans=0.0 2023-11-20 06:56:55,995 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.23 vs. limit=10.0 2023-11-20 06:57:12,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=985486.6666666666, ans=0.1 2023-11-20 06:57:24,515 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3550, loss[loss=0.0762, simple_loss=0.1018, pruned_loss=0.01516, audio_tagging_loss=0.01014, over 15636.00 frames. ], tot_loss[loss=0.0808, simple_loss=0.1013, pruned_loss=0.02019, audio_tagging_loss=0.009947, over 3045262.32 frames. 
], batch size: 57, lr: 5.36e-03, grad_scale: 16.0 2023-11-20 06:57:24,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=985553.3333333334, ans=0.0 2023-11-20 06:57:27,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=985553.3333333334, ans=0.125 2023-11-20 06:57:33,513 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 06:57:33,902 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.85 vs. limit=22.5 2023-11-20 06:57:42,229 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2023-11-20 06:57:44,153 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147850 2023-11-20 06:57:45,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=985620.0, ans=0.2 2023-11-20 06:57:45,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=985620.0, ans=0.125 2023-11-20 06:58:04,226 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.47 vs. limit=15.0 2023-11-20 06:58:29,808 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3600, loss[loss=0.08428, simple_loss=0.1069, pruned_loss=0.02223, audio_tagging_loss=0.008614, over 14089.00 frames. ], tot_loss[loss=0.08155, simple_loss=0.1027, pruned_loss=0.02038, audio_tagging_loss=0.009825, over 3053954.39 frames. ], batch size: 52, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 06:58:35,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=985886.6666666666, ans=0.125 2023-11-20 06:58:44,591 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.078e+01 8.812e+01 9.931e+01 2.962e+02, threshold=1.762e+02, percent-clipped=1.0 2023-11-20 06:58:49,013 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147900 2023-11-20 06:58:50,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=985953.3333333334, ans=0.0 2023-11-20 06:58:51,856 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=12.0 2023-11-20 06:59:33,138 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3650, loss[loss=0.0785, simple_loss=0.09481, pruned_loss=0.02066, audio_tagging_loss=0.01043, over 14525.00 frames. ], tot_loss[loss=0.08163, simple_loss=0.1028, pruned_loss=0.02039, audio_tagging_loss=0.00983, over 3055081.27 frames. ], batch size: 56, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 06:59:52,945 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 147950 2023-11-20 07:00:09,267 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.79 vs. 
limit=15.0 2023-11-20 07:00:12,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=986420.0, ans=0.125 2023-11-20 07:00:21,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=986420.0, ans=0.0 2023-11-20 07:00:38,021 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3700, loss[loss=0.05679, simple_loss=0.06112, pruned_loss=0.01182, audio_tagging_loss=0.01441, over 14982.00 frames. ], tot_loss[loss=0.08248, simple_loss=0.1039, pruned_loss=0.02077, audio_tagging_loss=0.009791, over 3064104.95 frames. ], batch size: 59, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 07:00:45,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=986553.3333333334, ans=0.07 2023-11-20 07:00:53,835 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.585e+01 7.922e+01 8.709e+01 9.261e+01 1.390e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 07:00:57,665 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148000 2023-11-20 07:01:14,464 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.15 vs. limit=15.0 2023-11-20 07:01:15,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=986686.6666666666, ans=0.2 2023-11-20 07:01:47,072 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3750, loss[loss=0.08898, simple_loss=0.117, pruned_loss=0.02255, audio_tagging_loss=0.007921, over 16367.00 frames. ], tot_loss[loss=0.08223, simple_loss=0.1034, pruned_loss=0.02074, audio_tagging_loss=0.009788, over 3061008.42 frames. ], batch size: 57, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 07:01:47,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=986886.6666666666, ans=0.125 2023-11-20 07:01:48,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=986886.6666666666, ans=0.125 2023-11-20 07:01:48,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=986886.6666666666, ans=0.1 2023-11-20 07:01:52,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=986886.6666666666, ans=0.0 2023-11-20 07:02:05,709 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148050 2023-11-20 07:02:13,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=987020.0, ans=0.125 2023-11-20 07:02:29,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=987086.6666666666, ans=0.125 2023-11-20 07:02:30,755 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 07:02:41,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=987153.3333333334, ans=0.2 2023-11-20 07:02:51,366 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3800, loss[loss=0.06737, simple_loss=0.07682, pruned_loss=0.01832, audio_tagging_loss=0.01064, over 15153.00 frames. ], tot_loss[loss=0.08216, simple_loss=0.1031, pruned_loss=0.02066, audio_tagging_loss=0.009953, over 3061137.56 frames. ], batch size: 57, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 07:02:57,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=987220.0, ans=0.125 2023-11-20 07:03:07,203 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.518e+01 8.071e+01 8.832e+01 9.688e+01 1.208e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-20 07:03:10,946 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148100 2023-11-20 07:03:14,604 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 07:03:16,559 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2023-11-20 07:03:51,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=987486.6666666666, ans=0.0 2023-11-20 07:03:55,676 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3850, loss[loss=0.09077, simple_loss=0.1153, pruned_loss=0.02281, audio_tagging_loss=0.01031, over 14203.00 frames. ], tot_loss[loss=0.08146, simple_loss=0.102, pruned_loss=0.02034, audio_tagging_loss=0.01012, over 3057518.95 frames. ], batch size: 55, lr: 5.35e-03, grad_scale: 32.0 2023-11-20 07:03:55,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=987553.3333333334, ans=0.125 2023-11-20 07:04:00,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=987553.3333333334, ans=0.1 2023-11-20 07:04:02,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=987553.3333333334, ans=0.0 2023-11-20 07:04:15,195 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148150 2023-11-20 07:04:16,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=987620.0, ans=0.125 2023-11-20 07:04:16,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=987620.0, ans=0.0 2023-11-20 07:04:17,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=987620.0, ans=0.1 2023-11-20 07:04:21,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=987686.6666666666, ans=0.125 2023-11-20 07:05:00,264 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3900, loss[loss=0.06521, simple_loss=0.07594, pruned_loss=0.01594, audio_tagging_loss=0.0113, over 14703.00 frames. ], tot_loss[loss=0.08185, simple_loss=0.1024, pruned_loss=0.02049, audio_tagging_loss=0.01018, over 3050280.10 frames. 
], batch size: 57, lr: 5.35e-03, grad_scale: 32.0 2023-11-20 07:05:08,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=987886.6666666666, ans=0.1 2023-11-20 07:05:15,596 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.880e+01 8.310e+01 9.082e+01 9.917e+01 1.355e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-20 07:05:19,493 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148200 2023-11-20 07:05:35,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=988020.0, ans=0.1 2023-11-20 07:06:05,389 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 3950, loss[loss=0.09421, simple_loss=0.1154, pruned_loss=0.02661, audio_tagging_loss=0.009893, over 15040.00 frames. ], tot_loss[loss=0.08131, simple_loss=0.1015, pruned_loss=0.02024, audio_tagging_loss=0.01033, over 3045348.02 frames. ], batch size: 57, lr: 5.35e-03, grad_scale: 32.0 2023-11-20 07:06:11,055 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2023-11-20 07:06:25,459 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148250 2023-11-20 07:06:31,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=988353.3333333334, ans=0.1 2023-11-20 07:06:37,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=988353.3333333334, ans=0.0 2023-11-20 07:06:38,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=988353.3333333334, ans=0.125 2023-11-20 07:06:47,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=988420.0, ans=0.0 2023-11-20 07:06:47,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=988420.0, ans=0.0 2023-11-20 07:06:56,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=988486.6666666666, ans=0.0 2023-11-20 07:07:00,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=988486.6666666666, ans=0.125 2023-11-20 07:07:10,800 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4000, loss[loss=0.07913, simple_loss=0.1045, pruned_loss=0.01901, audio_tagging_loss=0.00786, over 14060.00 frames. ], tot_loss[loss=0.08076, simple_loss=0.1003, pruned_loss=0.02011, audio_tagging_loss=0.01049, over 3044779.49 frames. ], batch size: 54, lr: 5.35e-03, grad_scale: 32.0 2023-11-20 07:07:26,771 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.419e+01 9.137e+01 9.996e+01 1.272e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-20 07:07:27,462 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.08 vs. 
limit=15.0 2023-11-20 07:07:30,570 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148300 2023-11-20 07:07:30,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=988620.0, ans=0.0 2023-11-20 07:07:33,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=988620.0, ans=0.0 2023-11-20 07:07:38,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=988686.6666666666, ans=0.125 2023-11-20 07:07:47,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=988686.6666666666, ans=0.125 2023-11-20 07:07:49,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=988753.3333333334, ans=0.1 2023-11-20 07:08:01,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=988820.0, ans=0.1 2023-11-20 07:08:03,322 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.55 vs. limit=22.5 2023-11-20 07:08:16,234 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4050, loss[loss=0.08313, simple_loss=0.1001, pruned_loss=0.02107, audio_tagging_loss=0.01202, over 15007.00 frames. ], tot_loss[loss=0.08087, simple_loss=0.1006, pruned_loss=0.0201, audio_tagging_loss=0.01045, over 3042133.12 frames. ], batch size: 55, lr: 5.35e-03, grad_scale: 32.0 2023-11-20 07:08:18,719 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 07:08:23,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=988886.6666666666, ans=0.125 2023-11-20 07:08:24,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=988886.6666666666, ans=0.0 2023-11-20 07:08:34,946 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148350 2023-11-20 07:08:35,602 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.91 vs. limit=15.0 2023-11-20 07:09:08,028 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.67 vs. limit=15.0 2023-11-20 07:09:20,355 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4100, loss[loss=0.09036, simple_loss=0.1197, pruned_loss=0.01948, audio_tagging_loss=0.01104, over 16789.00 frames. ], tot_loss[loss=0.08129, simple_loss=0.1015, pruned_loss=0.02015, audio_tagging_loss=0.01039, over 3046823.55 frames. ], batch size: 61, lr: 5.35e-03, grad_scale: 16.0 2023-11-20 07:09:23,288 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.21 vs. 
limit=15.0 2023-11-20 07:09:37,914 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.747e+01 7.925e+01 8.541e+01 9.436e+01 1.128e+02, threshold=1.708e+02, percent-clipped=0.0 2023-11-20 07:09:39,243 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148400 2023-11-20 07:10:17,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=989486.6666666666, ans=0.0 2023-11-20 07:10:23,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=989553.3333333334, ans=0.125 2023-11-20 07:10:24,270 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4150, loss[loss=0.06704, simple_loss=0.07392, pruned_loss=0.01997, audio_tagging_loss=0.01012, over 15085.00 frames. ], tot_loss[loss=0.08055, simple_loss=0.1009, pruned_loss=0.01993, audio_tagging_loss=0.01017, over 3048646.02 frames. ], batch size: 60, lr: 5.35e-03, grad_scale: 8.0 2023-11-20 07:10:44,164 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148450 2023-11-20 07:11:03,686 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0 2023-11-20 07:11:05,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=989753.3333333334, ans=0.125 2023-11-20 07:11:09,848 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 07:11:28,771 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4200, loss[loss=0.07056, simple_loss=0.09411, pruned_loss=0.01613, audio_tagging_loss=0.007374, over 15976.00 frames. ], tot_loss[loss=0.08022, simple_loss=0.1009, pruned_loss=0.0198, audio_tagging_loss=0.009993, over 3047085.98 frames. ], batch size: 58, lr: 5.35e-03, grad_scale: 8.0 2023-11-20 07:11:46,392 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.989e+01 8.148e+01 8.675e+01 9.910e+01 1.292e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-20 07:11:47,741 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148500 2023-11-20 07:11:55,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=990020.0, ans=15.0 2023-11-20 07:12:02,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=990020.0, ans=0.0 2023-11-20 07:12:32,963 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4250, loss[loss=0.1046, simple_loss=0.1302, pruned_loss=0.02996, audio_tagging_loss=0.009528, over 16240.00 frames. ], tot_loss[loss=0.08047, simple_loss=0.1015, pruned_loss=0.01988, audio_tagging_loss=0.009833, over 3061585.51 frames. 
], batch size: 58, lr: 5.35e-03, grad_scale: 8.0 2023-11-20 07:12:35,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=990220.0, ans=0.125 2023-11-20 07:12:46,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=990286.6666666666, ans=0.1 2023-11-20 07:12:52,618 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148550 2023-11-20 07:13:22,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=990420.0, ans=0.1 2023-11-20 07:13:38,312 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4300, loss[loss=0.0917, simple_loss=0.1188, pruned_loss=0.02362, audio_tagging_loss=0.008669, over 14693.00 frames. ], tot_loss[loss=0.08145, simple_loss=0.1025, pruned_loss=0.02032, audio_tagging_loss=0.009877, over 3057783.95 frames. ], batch size: 55, lr: 5.35e-03, grad_scale: 8.0 2023-11-20 07:13:42,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=990553.3333333334, ans=0.125 2023-11-20 07:13:56,610 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.158e+01 8.001e+01 8.705e+01 9.608e+01 1.261e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-20 07:13:57,923 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148600 2023-11-20 07:14:21,101 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.90 vs. limit=15.0 2023-11-20 07:14:29,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=990820.0, ans=0.0 2023-11-20 07:14:35,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=990820.0, ans=0.05 2023-11-20 07:14:43,241 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4350, loss[loss=0.07089, simple_loss=0.09047, pruned_loss=0.01449, audio_tagging_loss=0.01116, over 14810.00 frames. ], tot_loss[loss=0.08154, simple_loss=0.1027, pruned_loss=0.02041, audio_tagging_loss=0.009778, over 3059244.49 frames. ], batch size: 56, lr: 5.35e-03, grad_scale: 8.0 2023-11-20 07:14:43,890 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.68 vs. limit=22.5 2023-11-20 07:15:02,449 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148650 2023-11-20 07:15:47,539 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=12.0 2023-11-20 07:15:48,011 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4400, loss[loss=0.101, simple_loss=0.1357, pruned_loss=0.02669, audio_tagging_loss=0.006484, over 15332.00 frames. ], tot_loss[loss=0.08189, simple_loss=0.1034, pruned_loss=0.02054, audio_tagging_loss=0.009665, over 3060477.07 frames. ], batch size: 53, lr: 5.34e-03, grad_scale: 16.0 2023-11-20 07:16:05,284 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.92 vs. 
limit=12.0 2023-11-20 07:16:05,878 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.660e+01 8.299e+01 8.915e+01 9.764e+01 1.212e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-20 07:16:07,191 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148700 2023-11-20 07:16:14,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=991353.3333333334, ans=0.1 2023-11-20 07:16:20,144 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2023-11-20 07:16:51,715 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.12 vs. limit=22.5 2023-11-20 07:16:52,182 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4450, loss[loss=0.09525, simple_loss=0.1211, pruned_loss=0.02529, audio_tagging_loss=0.009432, over 16264.00 frames. ], tot_loss[loss=0.08198, simple_loss=0.1035, pruned_loss=0.02051, audio_tagging_loss=0.009728, over 3070663.63 frames. ], batch size: 62, lr: 5.34e-03, grad_scale: 16.0 2023-11-20 07:17:12,438 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148750 2023-11-20 07:17:32,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=991753.3333333334, ans=0.125 2023-11-20 07:17:33,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=991753.3333333334, ans=0.0 2023-11-20 07:17:40,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=991753.3333333334, ans=0.125 2023-11-20 07:17:42,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=991753.3333333334, ans=0.1 2023-11-20 07:17:44,702 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=991820.0, ans=0.07 2023-11-20 07:17:44,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=991820.0, ans=0.2 2023-11-20 07:17:57,394 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2023-11-20 07:17:58,004 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4500, loss[loss=0.05731, simple_loss=0.06923, pruned_loss=0.01192, audio_tagging_loss=0.01077, over 15856.00 frames. ], tot_loss[loss=0.08203, simple_loss=0.1036, pruned_loss=0.02052, audio_tagging_loss=0.009694, over 3073626.79 frames. ], batch size: 61, lr: 5.34e-03, grad_scale: 16.0 2023-11-20 07:18:03,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=991886.6666666666, ans=0.125 2023-11-20 07:18:15,423 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.48 vs. 
limit=22.5 2023-11-20 07:18:15,711 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.390e+01 8.916e+01 9.990e+01 1.363e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-20 07:18:17,059 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148800 2023-11-20 07:18:23,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=992020.0, ans=0.125 2023-11-20 07:18:25,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=992020.0, ans=0.07 2023-11-20 07:18:31,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=992020.0, ans=0.125 2023-11-20 07:19:02,818 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4550, loss[loss=0.07827, simple_loss=0.09817, pruned_loss=0.01914, audio_tagging_loss=0.01004, over 14021.00 frames. ], tot_loss[loss=0.08251, simple_loss=0.1042, pruned_loss=0.02068, audio_tagging_loss=0.009722, over 3062134.09 frames. ], batch size: 53, lr: 5.34e-03, grad_scale: 16.0 2023-11-20 07:19:04,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=992220.0, ans=0.0 2023-11-20 07:19:06,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=992220.0, ans=0.5 2023-11-20 07:19:21,515 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148850 2023-11-20 07:19:21,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=992286.6666666666, ans=0.1 2023-11-20 07:19:42,655 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.60 vs. limit=22.5 2023-11-20 07:19:44,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=992420.0, ans=0.2 2023-11-20 07:19:48,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=992420.0, ans=0.1 2023-11-20 07:19:49,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=992420.0, ans=0.125 2023-11-20 07:19:51,861 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 07:20:04,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=992486.6666666666, ans=0.125 2023-11-20 07:20:06,660 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4600, loss[loss=0.06703, simple_loss=0.08355, pruned_loss=0.01512, audio_tagging_loss=0.01014, over 14892.00 frames. ], tot_loss[loss=0.08119, simple_loss=0.1019, pruned_loss=0.02024, audio_tagging_loss=0.009994, over 3056312.74 frames. 
2023-11-20 07:20:07,180 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0
2023-11-20 07:20:25,855 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.427e+01 8.263e+01 8.928e+01 9.927e+01 1.394e+02, threshold=1.786e+02, percent-clipped=0.0
2023-11-20 07:20:27,169 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148900
2023-11-20 07:20:53,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=992753.3333333334, ans=0.125
2023-11-20 07:21:01,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=992820.0, ans=0.125
2023-11-20 07:21:07,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=992820.0, ans=0.1
2023-11-20 07:21:11,559 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4650, loss[loss=0.083, simple_loss=0.1009, pruned_loss=0.02307, audio_tagging_loss=0.00949, over 15272.00 frames. ], tot_loss[loss=0.08161, simple_loss=0.1024, pruned_loss=0.02031, audio_tagging_loss=0.01009, over 3052931.35 frames. ], batch size: 59, lr: 5.34e-03, grad_scale: 16.0
2023-11-20 07:21:20,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=992886.6666666666, ans=0.2
2023-11-20 07:21:31,217 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 148950
2023-11-20 07:21:42,857 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.47 vs. limit=15.0
2023-11-20 07:21:44,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=993020.0, ans=0.125
2023-11-20 07:21:55,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=993086.6666666666, ans=0.0
2023-11-20 07:21:56,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=993086.6666666666, ans=0.0
2023-11-20 07:22:10,067 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.61 vs. limit=15.0
2023-11-20 07:22:14,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=993153.3333333334, ans=0.125
2023-11-20 07:22:16,906 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4700, loss[loss=0.06927, simple_loss=0.08433, pruned_loss=0.01559, audio_tagging_loss=0.01151, over 15075.00 frames. ], tot_loss[loss=0.0814, simple_loss=0.1021, pruned_loss=0.02017, audio_tagging_loss=0.01017, over 3052055.81 frames. ], batch size: 58, lr: 5.34e-03, grad_scale: 16.0
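The [scaling.py:213] ScheduledFloat entries record hyperparameters (dropout_p, skip rates, balancer probs, scale_min, whitening limits) that are not constants but piecewise-linear functions of batch_count; ans= is the value in effect at that point of training. A minimal re-implementation of the idea, assuming linear interpolation between (batch_count, value) breakpoints and constant tails; the breakpoint values below are illustrative, not this recipe's actual schedules:

    import bisect

    class ScheduledFloat:
        """Sketch: value is a piecewise-linear function of batch_count."""
        def __init__(self, *points):
            # points: (batch_count, value) pairs in increasing batch_count order
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # By batch_count ~991k every schedule here is long past its final
    # breakpoint, which is why the logged ans= values sit at constants
    # such as 0.1, 0.125, or 0.0.
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p.value(991353.33) == 0.1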
2023-11-20 07:22:27,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=993220.0, ans=0.09899494936611666
2023-11-20 07:22:34,028 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 8.015e+01 8.729e+01 9.415e+01 1.258e+02, threshold=1.746e+02, percent-clipped=0.0
2023-11-20 07:22:35,412 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149000
2023-11-20 07:22:42,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=993353.3333333334, ans=0.125
2023-11-20 07:22:43,134 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.92 vs. limit=15.0
2023-11-20 07:22:50,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=993353.3333333334, ans=0.125
2023-11-20 07:22:55,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=993420.0, ans=0.125
2023-11-20 07:23:05,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=993420.0, ans=0.125
2023-11-20 07:23:17,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=993486.6666666666, ans=0.1
2023-11-20 07:23:18,453 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.51 vs. limit=22.5
2023-11-20 07:23:21,485 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4750, loss[loss=0.06995, simple_loss=0.0964, pruned_loss=0.01261, audio_tagging_loss=0.00914, over 15547.00 frames. ], tot_loss[loss=0.08205, simple_loss=0.103, pruned_loss=0.02042, audio_tagging_loss=0.01014, over 3047286.68 frames. ], batch size: 59, lr: 5.34e-03, grad_scale: 16.0
2023-11-20 07:23:27,020 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.35 vs. limit=15.0
2023-11-20 07:23:35,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=993620.0, ans=0.125
2023-11-20 07:23:40,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=993620.0, ans=0.0
2023-11-20 07:23:41,195 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149050
2023-11-20 07:23:58,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=993686.6666666666, ans=0.0
2023-11-20 07:24:18,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=993820.0, ans=0.0
2023-11-20 07:24:25,482 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4800, loss[loss=0.08101, simple_loss=0.0985, pruned_loss=0.02118, audio_tagging_loss=0.01057, over 15966.00 frames. ], tot_loss[loss=0.0813, simple_loss=0.1015, pruned_loss=0.02021, audio_tagging_loss=0.01034, over 3045650.90 frames. ], batch size: 57, lr: 5.34e-03, grad_scale: 32.0
2023-11-20 07:24:41,636 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.09 vs. limit=15.0
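The [scaling.py:1022] Whitening lines come from modules that push activations toward a "white" covariance (eigenvalues roughly equal); the logged metric measures how uneven the covariance spectrum is, and a penalty gradient is applied only when metric exceeds limit (the limits are themselves ScheduledFloat values, as the whitening_limit entries elsewhere in this log show). One standard whiteness measure with the same qualitative behaviour, offered as a sketch only since icefall's exact formula may differ:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """Sketch: mean(eig^2) / mean(eig)^2 of the feature covariance,
        computed per channel group; equals 1.0 for perfectly white
        features and grows as the spectrum becomes more lopsided."""
        (n, c) = x.shape
        x = x.reshape(n, num_groups, c // num_groups)
        metrics = []
        for g in range(num_groups):
            xg = x[:, g, :]
            xg = xg - xg.mean(dim=0)
            cov = (xg.T @ xg) / n
            eigs = torch.linalg.eigvalsh(cov)
            metrics.append((eigs ** 2).mean() / eigs.mean() ** 2)
        return float(torch.stack(metrics).mean())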
2023-11-20 07:24:44,658 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.552e+01 8.333e+01 8.832e+01 9.618e+01 1.335e+02, threshold=1.766e+02, percent-clipped=0.0
2023-11-20 07:24:46,058 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149100
2023-11-20 07:24:48,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=993953.3333333334, ans=22.5
2023-11-20 07:24:51,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=994020.0, ans=0.04949747468305833
2023-11-20 07:25:31,590 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4850, loss[loss=0.06312, simple_loss=0.07271, pruned_loss=0.01613, audio_tagging_loss=0.01064, over 15288.00 frames. ], tot_loss[loss=0.08134, simple_loss=0.1014, pruned_loss=0.02024, audio_tagging_loss=0.0104, over 3047034.48 frames. ], batch size: 58, lr: 5.34e-03, grad_scale: 32.0
2023-11-20 07:25:31,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=994220.0, ans=0.125
2023-11-20 07:25:39,451 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.29 vs. limit=22.5
2023-11-20 07:25:49,756 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149150
2023-11-20 07:25:49,880 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 07:26:03,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=994353.3333333334, ans=0.125
2023-11-20 07:26:11,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=994420.0, ans=0.0
2023-11-20 07:26:31,754 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.90 vs. limit=15.0
2023-11-20 07:26:34,933 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4900, loss[loss=0.07011, simple_loss=0.09045, pruned_loss=0.01636, audio_tagging_loss=0.008525, over 14155.00 frames. ], tot_loss[loss=0.08117, simple_loss=0.101, pruned_loss=0.0203, audio_tagging_loss=0.01035, over 3040766.39 frames. ], batch size: 56, lr: 5.34e-03, grad_scale: 32.0
2023-11-20 07:26:37,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=994553.3333333334, ans=0.125
2023-11-20 07:26:44,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=994553.3333333334, ans=0.0
2023-11-20 07:26:52,658 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.754e+01 8.440e+01 9.252e+01 1.012e+02 1.571e+02, threshold=1.850e+02, percent-clipped=0.0
2023-11-20 07:26:54,705 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149200
2023-11-20 07:27:06,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=994686.6666666666, ans=0.0
2023-11-20 07:27:11,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=994686.6666666666, ans=0.1
2023-11-20 07:27:39,746 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 4950, loss[loss=0.08529, simple_loss=0.1083, pruned_loss=0.02345, audio_tagging_loss=0.007681, over 14763.00 frames. ], tot_loss[loss=0.08129, simple_loss=0.1013, pruned_loss=0.02042, audio_tagging_loss=0.01021, over 3040847.14 frames. ], batch size: 55, lr: 5.33e-03, grad_scale: 32.0
2023-11-20 07:27:59,565 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149250
2023-11-20 07:28:01,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=994953.3333333334, ans=0.125
2023-11-20 07:28:12,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=995020.0, ans=0.125
2023-11-20 07:28:28,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=995086.6666666666, ans=0.2
2023-11-20 07:28:33,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=995153.3333333334, ans=0.0
2023-11-20 07:28:34,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=995153.3333333334, ans=0.125
2023-11-20 07:28:45,189 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5000, loss[loss=0.06717, simple_loss=0.07273, pruned_loss=0.01534, audio_tagging_loss=0.01546, over 14379.00 frames. ], tot_loss[loss=0.08045, simple_loss=0.1003, pruned_loss=0.02013, audio_tagging_loss=0.01018, over 3042032.65 frames. ], batch size: 55, lr: 5.33e-03, grad_scale: 16.0
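The loss[...] / tot_loss[...] brackets in the [train_asr.py:1262] lines decompose the training objective: simple_loss and pruned_loss are the two stages of k2's pruned RNN-T (a cheap lattice loss that fixes the pruning bounds, then the exact loss inside those bounds), and audio_tagging_loss is the audio-tagging distillation term. The logged totals are consistent with a weighting of 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, matching this run's simple_loss_scale and audio_tagging_loss_scale; a quick check against the batch 5000 entry directly above:

    # Hedged check of the apparent combination rule, using logged values:
    simple_loss, pruned_loss, audio_tagging_loss = 0.07273, 0.01534, 0.01546
    loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
    assert abs(loss - 0.06717) < 5e-5  # matches loss[...] for batch 5000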
2023-11-20 07:28:58,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=995286.6666666666, ans=0.1
2023-11-20 07:29:04,095 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.403e+01 7.937e+01 8.592e+01 9.230e+01 1.200e+02, threshold=1.718e+02, percent-clipped=0.0
2023-11-20 07:29:04,245 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149300
2023-11-20 07:29:12,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=995353.3333333334, ans=0.2
2023-11-20 07:29:47,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=995486.6666666666, ans=0.125
2023-11-20 07:29:49,407 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5050, loss[loss=0.1028, simple_loss=0.1354, pruned_loss=0.0292, audio_tagging_loss=0.005916, over 15665.00 frames. ], tot_loss[loss=0.08083, simple_loss=0.101, pruned_loss=0.02034, audio_tagging_loss=0.01, over 3042221.99 frames. ], batch size: 55, lr: 5.33e-03, grad_scale: 16.0
2023-11-20 07:30:05,266 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0
2023-11-20 07:30:08,514 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149350
2023-11-20 07:30:08,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=995620.0, ans=0.125
2023-11-20 07:30:23,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=995686.6666666666, ans=0.1
2023-11-20 07:30:30,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=995753.3333333334, ans=0.0
2023-11-20 07:30:40,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=995820.0, ans=0.2
2023-11-20 07:30:54,243 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5100, loss[loss=0.1007, simple_loss=0.132, pruned_loss=0.02567, audio_tagging_loss=0.009063, over 14976.00 frames. ], tot_loss[loss=0.08095, simple_loss=0.1012, pruned_loss=0.02042, audio_tagging_loss=0.009938, over 3034146.24 frames. ], batch size: 55, lr: 5.33e-03, grad_scale: 16.0
2023-11-20 07:30:54,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=995886.6666666666, ans=0.125
2023-11-20 07:31:09,737 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.31 vs. limit=22.5
2023-11-20 07:31:13,985 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+01 8.152e+01 8.992e+01 9.912e+01 1.301e+02, threshold=1.798e+02, percent-clipped=0.0
2023-11-20 07:31:14,141 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149400
2023-11-20 07:31:17,611 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.25 vs. limit=15.0
2023-11-20 07:31:21,222 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.93 vs. limit=15.0
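grad_scale in the per-batch lines is the dynamic loss-scaling factor of fp16 mixed-precision training: it grows after a run of overflow-free steps and is halved when an inf/nan gradient is detected, which is why it oscillates between 8.0, 16.0 and 32.0 across this stretch of the log. The standard PyTorch pattern looks as follows (icefall wraps its own GradScaler variant around the same idea; the model and hyperparameters below are placeholders):

    import torch
    from torch.cuda.amp import GradScaler, autocast

    model = torch.nn.Linear(10, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = GradScaler(init_scale=16.0, growth_interval=2000)  # illustrative values

    for _ in range(10):
        x = torch.randn(4, 10, device="cuda")
        optimizer.zero_grad()
        with autocast(dtype=torch.float16):
            loss = model(x).pow(2).mean()
        scaler.scale(loss).backward()  # backward through the scaled loss
        scaler.step(optimizer)         # unscales grads; skips step on inf/nan
        scaler.update()                # grow after clean steps, halve on overflow
        print(scaler.get_scale())      # the grad_scale value logged per batch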
2023-11-20 07:31:25,743 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0
2023-11-20 07:31:31,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=996020.0, ans=0.0
2023-11-20 07:31:59,925 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5150, loss[loss=0.08048, simple_loss=0.1085, pruned_loss=0.01862, audio_tagging_loss=0.007591, over 15863.00 frames. ], tot_loss[loss=0.08055, simple_loss=0.1006, pruned_loss=0.0203, audio_tagging_loss=0.00996, over 3026522.96 frames. ], batch size: 58, lr: 5.33e-03, grad_scale: 16.0
2023-11-20 07:32:00,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=996220.0, ans=0.2
2023-11-20 07:32:08,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=996220.0, ans=0.0
2023-11-20 07:32:17,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=996286.6666666666, ans=0.125
2023-11-20 07:32:18,921 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149450
2023-11-20 07:32:26,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=996353.3333333334, ans=0.1
2023-11-20 07:32:44,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=996420.0, ans=0.2
2023-11-20 07:32:52,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=996486.6666666666, ans=0.0
2023-11-20 07:33:03,254 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5200, loss[loss=0.08086, simple_loss=0.09369, pruned_loss=0.02353, audio_tagging_loss=0.01048, over 15918.00 frames. ], tot_loss[loss=0.08105, simple_loss=0.1014, pruned_loss=0.02048, audio_tagging_loss=0.009855, over 3028903.31 frames. ], batch size: 59, lr: 5.33e-03, grad_scale: 32.0
2023-11-20 07:33:06,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=996553.3333333334, ans=0.125
2023-11-20 07:33:06,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=996553.3333333334, ans=0.2
2023-11-20 07:33:11,946 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.34 vs. limit=15.0
2023-11-20 07:33:22,280 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.41 vs. limit=15.0
2023-11-20 07:33:22,722 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+01 8.127e+01 8.748e+01 9.361e+01 1.233e+02, threshold=1.750e+02, percent-clipped=0.0
2023-11-20 07:33:22,889 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149500
2023-11-20 07:33:29,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=996686.6666666666, ans=0.0
2023-11-20 07:33:32,832 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.49 vs. limit=15.0
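The slowly decaying lr: field (5.34e-03 at the top of this stretch, 5.30e-03 by its end) comes from icefall's Eden schedule, a product of a batch-dependent and an epoch-dependent decay factor around base_lr. A hedged sketch of the usual form; the recipe may count steps and epochs slightly differently (e.g. rescaling by ref_duration), so treat the exact inputs as assumptions:

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Sketch of the Eden schedule used by Zipformer recipes:
        two ~x**-0.25 decay factors, one in batches and one in epochs."""
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # With step counts in the range of this log (~149k batches, ~12 epochs
    # completed) this lands near the logged values:
    print(eden_lr(0.045, 149000.0, 12.0))  # ~5.3e-03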
2023-11-20 07:33:52,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=996753.3333333334, ans=0.0
2023-11-20 07:33:57,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=996820.0, ans=0.125
2023-11-20 07:34:07,805 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5250, loss[loss=0.07734, simple_loss=0.09575, pruned_loss=0.02092, audio_tagging_loss=0.008548, over 13876.00 frames. ], tot_loss[loss=0.08116, simple_loss=0.1015, pruned_loss=0.02061, audio_tagging_loss=0.009807, over 3033580.76 frames. ], batch size: 55, lr: 5.33e-03, grad_scale: 16.0
2023-11-20 07:34:14,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=996886.6666666666, ans=0.07
2023-11-20 07:34:28,166 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149550
2023-11-20 07:34:30,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=996953.3333333334, ans=0.125
2023-11-20 07:34:32,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=996953.3333333334, ans=0.0
2023-11-20 07:35:09,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=997153.3333333334, ans=0.125
2023-11-20 07:35:13,110 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5300, loss[loss=0.08473, simple_loss=0.1031, pruned_loss=0.02202, audio_tagging_loss=0.01116, over 15272.00 frames. ], tot_loss[loss=0.08182, simple_loss=0.1026, pruned_loss=0.02071, audio_tagging_loss=0.009811, over 3042624.03 frames. ], batch size: 58, lr: 5.33e-03, grad_scale: 16.0
2023-11-20 07:35:32,195 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149600
2023-11-20 07:35:33,250 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.168e+01 7.882e+01 8.704e+01 9.363e+01 1.429e+02, threshold=1.741e+02, percent-clipped=0.0
2023-11-20 07:35:43,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=997353.3333333334, ans=0.125
2023-11-20 07:35:48,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=997353.3333333334, ans=0.0
2023-11-20 07:36:03,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=997420.0, ans=0.035
2023-11-20 07:36:18,206 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5350, loss[loss=0.08886, simple_loss=0.1081, pruned_loss=0.02641, audio_tagging_loss=0.008424, over 15366.00 frames. ], tot_loss[loss=0.08164, simple_loss=0.1027, pruned_loss=0.02049, audio_tagging_loss=0.009817, over 3046040.37 frames. ], batch size: 57, lr: 5.33e-03, grad_scale: 16.0
2023-11-20 07:36:20,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=997553.3333333334, ans=0.0
2023-11-20 07:36:22,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=997553.3333333334, ans=0.05
2023-11-20 07:36:28,503 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.45 vs. limit=22.5
2023-11-20 07:36:37,669 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149650
2023-11-20 07:36:42,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=997686.6666666666, ans=0.1
2023-11-20 07:37:13,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=997820.0, ans=0.125
2023-11-20 07:37:17,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=997820.0, ans=0.125
2023-11-20 07:37:21,726 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5400, loss[loss=0.07124, simple_loss=0.08582, pruned_loss=0.01544, audio_tagging_loss=0.01289, over 14569.00 frames. ], tot_loss[loss=0.08208, simple_loss=0.1032, pruned_loss=0.02067, audio_tagging_loss=0.00983, over 3047388.62 frames. ], batch size: 56, lr: 5.33e-03, grad_scale: 16.0
2023-11-20 07:37:22,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=997886.6666666666, ans=0.125
2023-11-20 07:37:41,823 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149700
2023-11-20 07:37:42,884 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.237e+01 8.118e+01 8.667e+01 9.522e+01 1.475e+02, threshold=1.733e+02, percent-clipped=0.0
2023-11-20 07:37:43,307 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 07:38:11,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=998153.3333333334, ans=0.1
2023-11-20 07:38:20,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=998153.3333333334, ans=0.0
2023-11-20 07:38:22,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=998153.3333333334, ans=0.125
2023-11-20 07:38:25,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=998220.0, ans=0.1
2023-11-20 07:38:26,848 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5450, loss[loss=0.101, simple_loss=0.1224, pruned_loss=0.03075, audio_tagging_loss=0.009063, over 16480.00 frames. ], tot_loss[loss=0.0823, simple_loss=0.1031, pruned_loss=0.02075, audio_tagging_loss=0.009998, over 3051083.18 frames. ], batch size: 62, lr: 5.33e-03, grad_scale: 8.0
2023-11-20 07:38:27,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=998220.0, ans=0.125
2023-11-20 07:38:28,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=998220.0, ans=0.0
2023-11-20 07:38:30,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=998220.0, ans=0.0
2023-11-20 07:38:32,760 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.17 vs. limit=15.0
2023-11-20 07:38:37,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=998220.0, ans=0.1
2023-11-20 07:38:45,960 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149750
2023-11-20 07:39:15,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=998420.0, ans=0.0
2023-11-20 07:39:18,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=998486.6666666666, ans=0.015
2023-11-20 07:39:18,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=998486.6666666666, ans=0.125
2023-11-20 07:39:31,008 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5500, loss[loss=0.07988, simple_loss=0.09649, pruned_loss=0.02114, audio_tagging_loss=0.01049, over 16840.00 frames. ], tot_loss[loss=0.08203, simple_loss=0.1028, pruned_loss=0.0205, audio_tagging_loss=0.01013, over 3051892.86 frames. ], batch size: 62, lr: 5.32e-03, grad_scale: 8.0
2023-11-20 07:39:32,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=998553.3333333334, ans=0.05
2023-11-20 07:39:49,359 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149800
2023-11-20 07:39:52,074 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.523e+01 8.049e+01 8.675e+01 9.506e+01 1.252e+02, threshold=1.735e+02, percent-clipped=0.0
2023-11-20 07:39:57,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=998686.6666666666, ans=0.125
2023-11-20 07:39:59,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=998686.6666666666, ans=0.0
2023-11-20 07:40:19,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=998753.3333333334, ans=0.1
2023-11-20 07:40:29,161 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.79 vs. limit=22.5
2023-11-20 07:40:30,076 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.41 vs. limit=15.0
2023-11-20 07:40:34,410 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5550, loss[loss=0.09235, simple_loss=0.114, pruned_loss=0.0245, audio_tagging_loss=0.01086, over 15953.00 frames. ], tot_loss[loss=0.08181, simple_loss=0.1026, pruned_loss=0.02034, audio_tagging_loss=0.01018, over 3047250.11 frames. ], batch size: 57, lr: 5.32e-03, grad_scale: 8.0
2023-11-20 07:40:43,173 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.98 vs. limit=15.0
2023-11-20 07:40:48,779 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0
2023-11-20 07:40:52,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=998953.3333333334, ans=0.0
2023-11-20 07:40:54,604 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149850
2023-11-20 07:40:56,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=998953.3333333334, ans=0.1
2023-11-20 07:41:00,156 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0
2023-11-20 07:41:10,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=999020.0, ans=0.2
2023-11-20 07:41:26,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=999153.3333333334, ans=0.125
2023-11-20 07:41:40,103 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5600, loss[loss=0.08412, simple_loss=0.1081, pruned_loss=0.01911, audio_tagging_loss=0.01094, over 15451.00 frames. ], tot_loss[loss=0.08216, simple_loss=0.1033, pruned_loss=0.02033, audio_tagging_loss=0.01017, over 3045806.20 frames. ], batch size: 58, lr: 5.32e-03, grad_scale: 16.0
2023-11-20 07:41:45,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=999220.0, ans=0.09899494936611666
2023-11-20 07:41:59,446 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149900
2023-11-20 07:42:01,823 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.809e+01 8.057e+01 8.712e+01 9.430e+01 1.236e+02, threshold=1.742e+02, percent-clipped=0.0
2023-11-20 07:42:08,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=999353.3333333334, ans=0.0
2023-11-20 07:42:14,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=999353.3333333334, ans=0.1
2023-11-20 07:42:15,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=999353.3333333334, ans=0.125
2023-11-20 07:42:22,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=999420.0, ans=0.125
2023-11-20 07:42:24,342 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 07:42:36,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=999486.6666666666, ans=0.0
2023-11-20 07:42:44,602 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5650, loss[loss=0.08374, simple_loss=0.1032, pruned_loss=0.02145, audio_tagging_loss=0.01071, over 14451.00 frames. ], tot_loss[loss=0.0815, simple_loss=0.1022, pruned_loss=0.02016, audio_tagging_loss=0.01026, over 3042665.39 frames. ], batch size: 54, lr: 5.32e-03, grad_scale: 16.0
2023-11-20 07:42:51,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=999553.3333333334, ans=0.2
2023-11-20 07:42:51,555 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.13 vs. limit=15.0
2023-11-20 07:43:00,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=999620.0, ans=0.125
2023-11-20 07:43:03,289 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 149950
2023-11-20 07:43:04,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=999620.0, ans=0.125
2023-11-20 07:43:25,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=999753.3333333334, ans=0.2
2023-11-20 07:43:30,640 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.53 vs. limit=6.0
2023-11-20 07:43:46,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=999820.0, ans=0.0
2023-11-20 07:43:48,686 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5700, loss[loss=0.1199, simple_loss=0.1554, pruned_loss=0.0335, audio_tagging_loss=0.008746, over 14929.00 frames. ], tot_loss[loss=0.08124, simple_loss=0.1016, pruned_loss=0.0201, audio_tagging_loss=0.01033, over 3036756.21 frames. ], batch size: 56, lr: 5.32e-03, grad_scale: 16.0
2023-11-20 07:44:08,590 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150000
2023-11-20 07:44:11,210 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.782e+01 7.942e+01 8.498e+01 9.233e+01 1.252e+02, threshold=1.700e+02, percent-clipped=0.0
2023-11-20 07:44:26,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1000086.6666666666, ans=0.125
2023-11-20 07:44:30,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1000086.6666666666, ans=0.125
2023-11-20 07:44:53,178 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5750, loss[loss=0.09376, simple_loss=0.1319, pruned_loss=0.02038, audio_tagging_loss=0.007412, over 15280.00 frames. ], tot_loss[loss=0.08094, simple_loss=0.1013, pruned_loss=0.02011, audio_tagging_loss=0.01018, over 3036184.40 frames. ], batch size: 54, lr: 5.32e-03, grad_scale: 16.0
2023-11-20 07:45:00,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1000220.0, ans=0.1
2023-11-20 07:45:03,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1000220.0, ans=0.0
2023-11-20 07:45:03,635 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0
2023-11-20 07:45:13,447 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150050
2023-11-20 07:45:17,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1000286.6666666666, ans=0.0
2023-11-20 07:45:38,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1000420.0, ans=0.1
2023-11-20 07:45:38,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1000420.0, ans=0.07
2023-11-20 07:45:45,474 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.98 vs. limit=10.0
2023-11-20 07:45:58,338 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5800, loss[loss=0.07237, simple_loss=0.08882, pruned_loss=0.01787, audio_tagging_loss=0.01009, over 14237.00 frames. ], tot_loss[loss=0.08101, simple_loss=0.1013, pruned_loss=0.02021, audio_tagging_loss=0.01013, over 3041515.36 frames. ], batch size: 56, lr: 5.32e-03, grad_scale: 16.0
2023-11-20 07:45:58,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1000553.3333333334, ans=0.125
2023-11-20 07:46:04,786 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 07:46:12,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1000620.0, ans=0.125
2023-11-20 07:46:13,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1000620.0, ans=0.125
2023-11-20 07:46:16,865 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150100
2023-11-20 07:46:19,213 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.622e+01 8.139e+01 8.776e+01 9.398e+01 1.793e+02, threshold=1.755e+02, percent-clipped=1.0
2023-11-20 07:46:36,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1000753.3333333334, ans=0.125
2023-11-20 07:46:58,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1000820.0, ans=0.125
2023-11-20 07:47:01,814 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5850, loss[loss=0.08214, simple_loss=0.1166, pruned_loss=0.01641, audio_tagging_loss=0.007437, over 14665.00 frames. ], tot_loss[loss=0.0812, simple_loss=0.1019, pruned_loss=0.02024, audio_tagging_loss=0.01002, over 3043224.61 frames. ], batch size: 53, lr: 5.32e-03, grad_scale: 16.0
2023-11-20 07:47:05,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1000886.6666666666, ans=0.0
2023-11-20 07:47:20,807 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150150
2023-11-20 07:47:22,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1000953.3333333334, ans=0.125
2023-11-20 07:47:41,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1001086.6666666666, ans=0.1
2023-11-20 07:47:55,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1001153.3333333334, ans=0.1
2023-11-20 07:48:05,908 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5900, loss[loss=0.09657, simple_loss=0.1079, pruned_loss=0.03113, audio_tagging_loss=0.01148, over 16010.00 frames. ], tot_loss[loss=0.08182, simple_loss=0.1027, pruned_loss=0.02048, audio_tagging_loss=0.01001, over 3045116.54 frames. ], batch size: 59, lr: 5.32e-03, grad_scale: 16.0
2023-11-20 07:48:08,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1001220.0, ans=0.125
2023-11-20 07:48:14,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1001220.0, ans=10.0
2023-11-20 07:48:15,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1001220.0, ans=0.125
2023-11-20 07:48:25,990 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150200
2023-11-20 07:48:28,744 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.681e+01 8.240e+01 8.857e+01 9.692e+01 1.369e+02, threshold=1.771e+02, percent-clipped=0.0
2023-11-20 07:48:30,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1001286.6666666666, ans=0.0
2023-11-20 07:48:55,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1001420.0, ans=0.1
2023-11-20 07:49:10,180 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 5950, loss[loss=0.09149, simple_loss=0.1098, pruned_loss=0.02763, audio_tagging_loss=0.008948, over 14200.00 frames. ], tot_loss[loss=0.08177, simple_loss=0.1028, pruned_loss=0.02039, audio_tagging_loss=0.009993, over 3050499.17 frames. ], batch size: 57, lr: 5.32e-03, grad_scale: 16.0
2023-11-20 07:49:12,499 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0
2023-11-20 07:49:17,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1001553.3333333334, ans=0.125
2023-11-20 07:49:27,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1001620.0, ans=0.125
2023-11-20 07:49:27,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=1001620.0, ans=0.1
2023-11-20 07:49:29,868 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150250
2023-11-20 07:49:31,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1001620.0, ans=0.125
2023-11-20 07:50:13,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1001886.6666666666, ans=0.0
2023-11-20 07:50:14,428 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6000, loss[loss=0.08492, simple_loss=0.1115, pruned_loss=0.01915, audio_tagging_loss=0.01004, over 14903.00 frames. ], tot_loss[loss=0.0808, simple_loss=0.1013, pruned_loss=0.02012, audio_tagging_loss=0.01005, over 3050493.92 frames. ], batch size: 56, lr: 5.32e-03, grad_scale: 32.0
2023-11-20 07:50:14,429 INFO [train_asr.py:1285] (2/4) Computing validation loss
2023-11-20 07:50:55,341 INFO [train_asr.py:1294] (2/4) Epoch 13, validation: loss=0.06203, simple_loss=0.05394, pruned_loss=0.00581, audio_tagging_loss=0.02925, over 4681554.00 frames.
2023-11-20 07:50:55,342 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB
2023-11-20 07:50:55,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1001886.6666666666, ans=0.125
2023-11-20 07:51:01,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1001886.6666666666, ans=0.09899494936611666
2023-11-20 07:51:06,985 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.15 vs. limit=22.5
2023-11-20 07:51:07,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1001953.3333333334, ans=0.125
2023-11-20 07:51:07,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1001953.3333333334, ans=0.1
2023-11-20 07:51:10,852 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.81 vs. limit=6.0
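The Computing validation loss / validation: entries above show the periodic held-out evaluation (every valid_interval batches) together with the peak-memory report; note that audio_tagging_loss is markedly larger on the validation set than in the running training averages. A sketch of that step under the usual PyTorch conventions; the per-batch model interface here is an assumption, not train_asr.py's actual signature:

    import torch

    def validate(model, valid_loader, device="cuda:2"):
        """Sketch of a periodic validation pass plus the memory report."""
        model.eval()
        tot, n = 0.0, 0
        with torch.no_grad():                    # no autograd state kept
            for batch in valid_loader:
                loss, num_frames = model(batch)  # assumed per-batch interface
                tot += loss.item() * num_frames
                n += num_frames
        model.train()
        mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={tot / n:.5f}; "
              f"Maximum memory allocated so far is {mb}MB")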
2023-11-20 07:51:12,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1001953.3333333334, ans=0.2
2023-11-20 07:51:15,063 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150300
2023-11-20 07:51:17,453 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.658e+01 7.953e+01 8.643e+01 9.262e+01 1.038e+02, threshold=1.729e+02, percent-clipped=0.0
2023-11-20 07:51:22,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1002020.0, ans=0.125
2023-11-20 07:51:31,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1002020.0, ans=0.125
2023-11-20 07:51:31,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1002020.0, ans=0.09899494936611666
2023-11-20 07:51:34,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1002086.6666666666, ans=0.125
2023-11-20 07:51:36,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1002086.6666666666, ans=0.025
2023-11-20 07:51:41,295 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 07:51:59,911 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6050, loss[loss=0.07569, simple_loss=0.09252, pruned_loss=0.01955, audio_tagging_loss=0.009875, over 15383.00 frames. ], tot_loss[loss=0.08081, simple_loss=0.1012, pruned_loss=0.02017, audio_tagging_loss=0.01004, over 3052956.92 frames. ], batch size: 57, lr: 5.32e-03, grad_scale: 16.0
2023-11-20 07:52:11,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1002286.6666666666, ans=0.0
2023-11-20 07:52:15,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1002286.6666666666, ans=0.0
2023-11-20 07:52:18,914 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150350
2023-11-20 07:52:26,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1002353.3333333334, ans=0.0
2023-11-20 07:52:43,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1002420.0, ans=0.5
2023-11-20 07:53:03,899 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6100, loss[loss=0.05748, simple_loss=0.06096, pruned_loss=0.01213, audio_tagging_loss=0.01487, over 15800.00 frames. ], tot_loss[loss=0.08014, simple_loss=0.1002, pruned_loss=0.02004, audio_tagging_loss=0.009994, over 3050713.32 frames. ], batch size: 61, lr: 5.31e-03, grad_scale: 16.0
2023-11-20 07:53:12,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1002553.3333333334, ans=0.0
2023-11-20 07:53:23,647 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150400
2023-11-20 07:53:27,491 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.228e+01 8.482e+01 9.288e+01 1.055e+02 1.647e+02, threshold=1.858e+02, percent-clipped=0.0
2023-11-20 07:53:35,508 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.45 vs. limit=22.5
2023-11-20 07:54:08,374 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6150, loss[loss=0.06895, simple_loss=0.07666, pruned_loss=0.01597, audio_tagging_loss=0.01465, over 15290.00 frames. ], tot_loss[loss=0.08039, simple_loss=0.1007, pruned_loss=0.02009, audio_tagging_loss=0.009967, over 3046194.99 frames. ], batch size: 59, lr: 5.31e-03, grad_scale: 16.0
2023-11-20 07:54:25,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1002953.3333333334, ans=0.125
2023-11-20 07:54:28,031 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150450
2023-11-20 07:54:46,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1003086.6666666666, ans=0.1
2023-11-20 07:54:48,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1003086.6666666666, ans=0.5
2023-11-20 07:55:12,838 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6200, loss[loss=0.0857, simple_loss=0.108, pruned_loss=0.01941, audio_tagging_loss=0.01229, over 15927.00 frames. ], tot_loss[loss=0.08048, simple_loss=0.1008, pruned_loss=0.02002, audio_tagging_loss=0.01004, over 3046625.84 frames. ], batch size: 57, lr: 5.31e-03, grad_scale: 16.0
2023-11-20 07:55:14,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1003220.0, ans=0.2
2023-11-20 07:55:27,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1003286.6666666666, ans=0.125
2023-11-20 07:55:32,257 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150500
2023-11-20 07:55:35,789 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.566e+01 8.052e+01 8.616e+01 9.415e+01 1.229e+02, threshold=1.723e+02, percent-clipped=0.0
2023-11-20 07:55:52,394 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.46 vs. limit=6.0
2023-11-20 07:55:56,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1003420.0, ans=0.2
2023-11-20 07:56:17,649 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6250, loss[loss=0.08388, simple_loss=0.1044, pruned_loss=0.02292, audio_tagging_loss=0.008761, over 16207.00 frames. ], tot_loss[loss=0.07992, simple_loss=0.0998, pruned_loss=0.01984, audio_tagging_loss=0.01018, over 3043806.04 frames. ], batch size: 59, lr: 5.31e-03, grad_scale: 16.0
2023-11-20 07:56:37,381 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150550
2023-11-20 07:57:07,673 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.35 vs. limit=15.0
2023-11-20 07:57:07,749 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.20 vs. limit=22.5
2023-11-20 07:57:21,479 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6300, loss[loss=0.1092, simple_loss=0.1375, pruned_loss=0.03014, audio_tagging_loss=0.01028, over 16158.00 frames. ], tot_loss[loss=0.08064, simple_loss=0.1008, pruned_loss=0.02, audio_tagging_loss=0.01024, over 3047002.62 frames. ], batch size: 57, lr: 5.31e-03, grad_scale: 16.0
2023-11-20 07:57:41,449 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150600
2023-11-20 07:57:45,549 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.750e+01 8.528e+01 9.126e+01 1.017e+02 1.272e+02, threshold=1.825e+02, percent-clipped=0.0
2023-11-20 07:57:49,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1004020.0, ans=0.1
2023-11-20 07:58:01,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1004086.6666666666, ans=0.125
2023-11-20 07:58:27,044 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6350, loss[loss=0.07986, simple_loss=0.1047, pruned_loss=0.01892, audio_tagging_loss=0.008595, over 14784.00 frames. ], tot_loss[loss=0.08061, simple_loss=0.1007, pruned_loss=0.0199, audio_tagging_loss=0.01034, over 3044001.58 frames. ], batch size: 56, lr: 5.31e-03, grad_scale: 16.0
2023-11-20 07:58:27,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1004220.0, ans=0.125
2023-11-20 07:58:35,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1004220.0, ans=10.0
2023-11-20 07:58:39,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1004286.6666666666, ans=0.0
2023-11-20 07:58:42,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1004286.6666666666, ans=0.05
2023-11-20 07:58:46,476 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150650
2023-11-20 07:59:02,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1004353.3333333334, ans=0.0
2023-11-20 07:59:02,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1004353.3333333334, ans=0.0
2023-11-20 07:59:16,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1004420.0, ans=0.0
2023-11-20 07:59:25,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1004486.6666666666, ans=0.0
2023-11-20 07:59:32,165 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6400, loss[loss=0.09215, simple_loss=0.1151, pruned_loss=0.02092, audio_tagging_loss=0.01369, over 15946.00 frames. ], tot_loss[loss=0.08065, simple_loss=0.101, pruned_loss=0.0198, audio_tagging_loss=0.01037, over 3041352.04 frames. ], batch size: 60, lr: 5.31e-03, grad_scale: 32.0
2023-11-20 07:59:36,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1004553.3333333334, ans=0.125
2023-11-20 07:59:46,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1004620.0, ans=0.125
2023-11-20 07:59:51,879 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150700
2023-11-20 07:59:56,013 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.650e+01 8.100e+01 8.750e+01 9.554e+01 1.310e+02, threshold=1.750e+02, percent-clipped=0.0
2023-11-20 08:00:05,297 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=22.5
2023-11-20 08:00:05,586 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=22.5
2023-11-20 08:00:21,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1004753.3333333334, ans=0.125
2023-11-20 08:00:37,048 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6450, loss[loss=0.09028, simple_loss=0.1212, pruned_loss=0.02144, audio_tagging_loss=0.008265, over 14919.00 frames. ], tot_loss[loss=0.08079, simple_loss=0.1009, pruned_loss=0.01993, audio_tagging_loss=0.01038, over 3036620.23 frames. ], batch size: 54, lr: 5.31e-03, grad_scale: 32.0
2023-11-20 08:00:38,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1004886.6666666666, ans=0.0
2023-11-20 08:00:39,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1004886.6666666666, ans=0.125
2023-11-20 08:00:40,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=1004886.6666666666, ans=12.0
2023-11-20 08:00:42,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1004886.6666666666, ans=0.1
2023-11-20 08:00:53,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1004953.3333333334, ans=0.125
2023-11-20 08:00:56,811 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150750
2023-11-20 08:01:00,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1004953.3333333334, ans=0.125
2023-11-20 08:01:06,618 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=12.0
2023-11-20 08:01:38,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1005153.3333333334, ans=0.0
2023-11-20 08:01:39,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1005153.3333333334, ans=0.1
2023-11-20 08:01:42,378 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6500, loss[loss=0.08409, simple_loss=0.118, pruned_loss=0.01832, audio_tagging_loss=0.006749, over 15512.00 frames. ], tot_loss[loss=0.07965, simple_loss=0.09929, pruned_loss=0.01961, audio_tagging_loss=0.01039, over 3042379.54 frames. ], batch size: 56, lr: 5.31e-03, grad_scale: 32.0
2023-11-20 08:01:45,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1005220.0, ans=0.0
2023-11-20 08:02:01,565 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150800
2023-11-20 08:02:02,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1005286.6666666666, ans=0.125
2023-11-20 08:02:05,446 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.846e+01 8.112e+01 8.845e+01 9.450e+01 1.236e+02, threshold=1.769e+02, percent-clipped=0.0
2023-11-20 08:02:15,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1005353.3333333334, ans=0.125
2023-11-20 08:02:20,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1005420.0, ans=0.07
2023-11-20 08:02:29,227 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.13 vs. limit=15.0
2023-11-20 08:02:30,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1005420.0, ans=0.0
2023-11-20 08:02:34,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1005486.6666666666, ans=0.0
2023-11-20 08:02:43,082 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0
2023-11-20 08:02:47,592 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6550, loss[loss=0.09779, simple_loss=0.134, pruned_loss=0.02072, audio_tagging_loss=0.01005, over 15809.00 frames. ], tot_loss[loss=0.07988, simple_loss=0.09975, pruned_loss=0.01968, audio_tagging_loss=0.01033, over 3043584.18 frames. ], batch size: 56, lr: 5.31e-03, grad_scale: 32.0
], batch size: 56, lr: 5.31e-03, grad_scale: 32.0 2023-11-20 08:02:56,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1005553.3333333334, ans=0.125 2023-11-20 08:03:06,851 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150850 2023-11-20 08:03:09,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1005620.0, ans=0.1 2023-11-20 08:03:14,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1005686.6666666666, ans=0.125 2023-11-20 08:03:20,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1005686.6666666666, ans=0.0 2023-11-20 08:03:34,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1005753.3333333334, ans=0.125 2023-11-20 08:03:51,430 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6600, loss[loss=0.04716, simple_loss=0.05691, pruned_loss=0.0112, audio_tagging_loss=0.007511, over 15326.00 frames. ], tot_loss[loss=0.08017, simple_loss=0.1005, pruned_loss=0.01986, audio_tagging_loss=0.01004, over 3041725.93 frames. ], batch size: 61, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 08:04:12,212 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150900 2023-11-20 08:04:14,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1005953.3333333334, ans=0.125 2023-11-20 08:04:16,941 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.805e+01 8.131e+01 8.865e+01 9.580e+01 1.182e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-20 08:04:26,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1006020.0, ans=0.125 2023-11-20 08:04:57,527 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6650, loss[loss=0.06035, simple_loss=0.06916, pruned_loss=0.01168, audio_tagging_loss=0.01409, over 14469.00 frames. ], tot_loss[loss=0.08001, simple_loss=0.1003, pruned_loss=0.01986, audio_tagging_loss=0.009996, over 3043995.92 frames. ], batch size: 56, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:05:16,590 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 150950 2023-11-20 08:05:35,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1006420.0, ans=0.0 2023-11-20 08:05:41,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1006420.0, ans=0.1 2023-11-20 08:06:01,387 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6700, loss[loss=0.08629, simple_loss=0.1159, pruned_loss=0.01676, audio_tagging_loss=0.01157, over 13904.00 frames. ], tot_loss[loss=0.07951, simple_loss=0.09997, pruned_loss=0.01965, audio_tagging_loss=0.009871, over 3032341.59 frames. 
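
The tot_loss[...] summaries decompose the objective into simple_loss (the linear transducer branch), pruned_loss (the pruned RNN-T branch), and audio_tagging_loss (the audio-tagging head). A minimal sketch of how the reported loss could be assembled is below; the scale values 0.5 and 1.0 are assumptions chosen because they reproduce the logged numbers, not a quote of train_asr.py (whose weighting also varies during warm-up):

def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
               simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    # Weighted sum of the three objectives; past warm-up the simple
    # (linear) branch is down-weighted relative to the pruned branch.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Batch 6600 above: 0.5 * 0.1005 + 0.01986 + 1.0 * 0.01004 = 0.08015,
# which agrees with the logged loss=0.08017 up to rounding of the
# individually reported terms.
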
], batch size: 52, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:06:01,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1006553.3333333334, ans=0.1 2023-11-20 08:06:04,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1006553.3333333334, ans=0.1 2023-11-20 08:06:09,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1006553.3333333334, ans=10.0 2023-11-20 08:06:11,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1006553.3333333334, ans=0.1 2023-11-20 08:06:19,942 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151000 2023-11-20 08:06:25,051 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.540e+01 7.744e+01 8.706e+01 9.471e+01 1.164e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-20 08:06:30,297 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:06:32,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1006686.6666666666, ans=0.1 2023-11-20 08:06:57,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1006820.0, ans=0.0 2023-11-20 08:07:05,449 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6750, loss[loss=0.06432, simple_loss=0.08232, pruned_loss=0.01304, audio_tagging_loss=0.01012, over 14465.00 frames. ], tot_loss[loss=0.07955, simple_loss=0.0999, pruned_loss=0.0197, audio_tagging_loss=0.009901, over 3035754.64 frames. ], batch size: 54, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:07:11,125 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.25 vs. limit=15.0 2023-11-20 08:07:12,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1006886.6666666666, ans=0.0 2023-11-20 08:07:25,130 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151050 2023-11-20 08:08:10,114 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6800, loss[loss=0.08853, simple_loss=0.1095, pruned_loss=0.0233, audio_tagging_loss=0.01046, over 15653.00 frames. ], tot_loss[loss=0.07971, simple_loss=0.1002, pruned_loss=0.0198, audio_tagging_loss=0.009822, over 3032096.91 frames. ], batch size: 58, lr: 5.30e-03, grad_scale: 32.0 2023-11-20 08:08:29,246 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151100 2023-11-20 08:08:29,855 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.46 vs. limit=10.0 2023-11-20 08:08:31,088 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.45 vs. 
limit=10.0 2023-11-20 08:08:33,970 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.708e+01 8.041e+01 8.769e+01 9.846e+01 1.398e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 08:08:40,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1007353.3333333334, ans=0.125 2023-11-20 08:08:48,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1007420.0, ans=0.125 2023-11-20 08:09:13,915 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6850, loss[loss=0.09433, simple_loss=0.1127, pruned_loss=0.02768, audio_tagging_loss=0.01029, over 15357.00 frames. ], tot_loss[loss=0.07952, simple_loss=0.1, pruned_loss=0.01974, audio_tagging_loss=0.009772, over 3030759.85 frames. ], batch size: 57, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:09:28,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1007620.0, ans=0.125 2023-11-20 08:09:32,372 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151150 2023-11-20 08:09:33,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1007620.0, ans=0.125 2023-11-20 08:09:37,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1007686.6666666666, ans=0.125 2023-11-20 08:09:39,064 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2023-11-20 08:09:39,098 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2023-11-20 08:09:53,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1007753.3333333334, ans=0.0 2023-11-20 08:09:58,265 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.38 vs. limit=22.5 2023-11-20 08:10:16,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1007886.6666666666, ans=0.1 2023-11-20 08:10:16,989 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.63 vs. limit=15.0 2023-11-20 08:10:17,698 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6900, loss[loss=0.0854, simple_loss=0.1052, pruned_loss=0.02193, audio_tagging_loss=0.01086, over 14681.00 frames. ], tot_loss[loss=0.07972, simple_loss=0.1002, pruned_loss=0.0198, audio_tagging_loss=0.009809, over 3030678.64 frames. 
], batch size: 54, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:10:37,654 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151200 2023-11-20 08:10:44,629 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.370e+01 9.169e+01 1.032e+02 1.396e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-20 08:10:58,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1008086.6666666666, ans=0.0 2023-11-20 08:11:03,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1008086.6666666666, ans=0.1 2023-11-20 08:11:08,955 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 08:11:22,934 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 6950, loss[loss=0.07653, simple_loss=0.09962, pruned_loss=0.01706, audio_tagging_loss=0.009662, over 14544.00 frames. ], tot_loss[loss=0.08046, simple_loss=0.1012, pruned_loss=0.02002, audio_tagging_loss=0.009841, over 3026876.63 frames. ], batch size: 56, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:11:39,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1008286.6666666666, ans=0.1 2023-11-20 08:11:43,517 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151250 2023-11-20 08:11:46,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1008286.6666666666, ans=0.0 2023-11-20 08:12:01,057 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.69 vs. limit=22.5 2023-11-20 08:12:18,085 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0 2023-11-20 08:12:27,876 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7000, loss[loss=0.07837, simple_loss=0.09406, pruned_loss=0.02014, audio_tagging_loss=0.0112, over 14241.00 frames. ], tot_loss[loss=0.08116, simple_loss=0.1021, pruned_loss=0.02031, audio_tagging_loss=0.009788, over 3032332.55 frames. 
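
The WARNING above drops an AudioSet cut because only 23 encoder frames survive subsampling while its dummy transcript has 24 BPE tokens, and a transducer alignment needs at least one encoder frame per output token. A sketch of that sanity filter follows; the frontend arithmetic is an assumption chosen to reproduce the logged 100 -> 23 reduction, the real computation being the encoder's subsampling module:

def keep_cut(num_frames_before_subsampling, num_tokens):
    # Assumed Conv2d frontend arithmetic (reproduces 100 -> 23);
    # treat as illustrative only.
    num_frames_after = (num_frames_before_subsampling - 7) // 4
    # The transducer loss is undefined when the label sequence is
    # longer than the encoder output, so such cuts are excluded.
    return num_frames_after >= num_tokens

print(keep_cut(100, 24))  # False -> excluded, as in the warning above
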
], batch size: 54, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:12:28,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1008553.3333333334, ans=0.125 2023-11-20 08:12:46,745 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151300 2023-11-20 08:12:50,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1008620.0, ans=0.125 2023-11-20 08:12:52,723 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.717e+01 8.209e+01 8.895e+01 9.636e+01 1.347e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-20 08:13:10,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1008753.3333333334, ans=0.1 2023-11-20 08:13:20,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1008820.0, ans=0.0 2023-11-20 08:13:32,216 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7050, loss[loss=0.0832, simple_loss=0.09927, pruned_loss=0.02107, audio_tagging_loss=0.0125, over 15453.00 frames. ], tot_loss[loss=0.08134, simple_loss=0.1022, pruned_loss=0.0203, audio_tagging_loss=0.009964, over 3036503.33 frames. ], batch size: 58, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:13:43,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1008953.3333333334, ans=0.125 2023-11-20 08:13:51,364 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0 2023-11-20 08:13:51,867 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151350 2023-11-20 08:13:51,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1008953.3333333334, ans=0.0 2023-11-20 08:13:52,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1008953.3333333334, ans=0.125 2023-11-20 08:14:00,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1009020.0, ans=0.015 2023-11-20 08:14:11,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1009086.6666666666, ans=0.0 2023-11-20 08:14:14,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1009086.6666666666, ans=0.125 2023-11-20 08:14:14,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1009086.6666666666, ans=0.0 2023-11-20 08:14:15,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1009086.6666666666, ans=0.0 2023-11-20 08:14:16,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1009086.6666666666, ans=0.0 2023-11-20 08:14:36,092 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7100, loss[loss=0.1145, simple_loss=0.1494, pruned_loss=0.03306, audio_tagging_loss=0.00676, over 15037.00 frames. ], tot_loss[loss=0.08182, simple_loss=0.1028, pruned_loss=0.02043, audio_tagging_loss=0.009994, over 3037929.95 frames. 
], batch size: 54, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:14:42,953 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.50 vs. limit=6.0 2023-11-20 08:14:46,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1009220.0, ans=0.0 2023-11-20 08:14:48,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1009286.6666666666, ans=0.04949747468305833 2023-11-20 08:14:56,656 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151400 2023-11-20 08:15:03,702 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.148e+01 8.789e+01 9.448e+01 1.213e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-20 08:15:06,846 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2023-11-20 08:15:08,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1009353.3333333334, ans=0.125 2023-11-20 08:15:21,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1009420.0, ans=0.2 2023-11-20 08:15:29,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1009486.6666666666, ans=0.0 2023-11-20 08:15:37,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1009486.6666666666, ans=0.125 2023-11-20 08:15:41,837 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7150, loss[loss=0.07421, simple_loss=0.08891, pruned_loss=0.0176, audio_tagging_loss=0.01215, over 15860.00 frames. ], tot_loss[loss=0.08183, simple_loss=0.103, pruned_loss=0.02033, audio_tagging_loss=0.01001, over 3040155.41 frames. ], batch size: 62, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:15:56,214 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.78 vs. limit=22.5 2023-11-20 08:15:58,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1009620.0, ans=0.04949747468305833 2023-11-20 08:16:01,901 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151450 2023-11-20 08:16:09,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1009686.6666666666, ans=0.125 2023-11-20 08:16:10,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1009686.6666666666, ans=0.035 2023-11-20 08:16:14,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1009686.6666666666, ans=0.2 2023-11-20 08:16:37,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1009820.0, ans=0.125 2023-11-20 08:16:47,320 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7200, loss[loss=0.1021, simple_loss=0.1277, pruned_loss=0.02872, audio_tagging_loss=0.009542, over 16182.00 frames. ], tot_loss[loss=0.08135, simple_loss=0.1021, pruned_loss=0.02014, audio_tagging_loss=0.01015, over 3038558.93 frames. 
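
Each Clipping_scale=2.0 record from optim.py prints a five-point summary (min, 25%, median, 75%, max) of recently observed gradient norms, and in every entry in this section the threshold equals twice the logged median (here 2 x 8.789e+01 = 1.758e+02). The bookkeeping plausibly looks like the sketch below; the clipping_scale-times-median rule is inferred from the logged values, not quoted from optim.py:

import torch

def grad_norm_report(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Five-point summary of the recent gradient-norm history.
    quartiles = torch.quantile(
        recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]  # 2.0 x median
    percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped

With the quartiles above, even the largest recent norm (1.213e+02) sits below the 1.758e+02 threshold, hence percent-clipped=0.0.
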
], batch size: 61, lr: 5.30e-03, grad_scale: 32.0 2023-11-20 08:17:00,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1009953.3333333334, ans=0.125 2023-11-20 08:17:06,122 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151500 2023-11-20 08:17:08,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1009953.3333333334, ans=0.125 2023-11-20 08:17:13,442 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.859e+01 8.360e+01 9.161e+01 1.018e+02 1.279e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-20 08:17:22,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1010020.0, ans=0.125 2023-11-20 08:17:47,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1010153.3333333334, ans=0.2 2023-11-20 08:17:50,983 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7250, loss[loss=0.07585, simple_loss=0.09735, pruned_loss=0.01496, audio_tagging_loss=0.01222, over 15223.00 frames. ], tot_loss[loss=0.08134, simple_loss=0.102, pruned_loss=0.02025, audio_tagging_loss=0.0101, over 3041441.88 frames. ], batch size: 56, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:18:10,767 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151550 2023-11-20 08:18:17,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1010353.3333333334, ans=0.125 2023-11-20 08:18:17,985 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0 2023-11-20 08:18:44,921 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.49 vs. limit=6.0 2023-11-20 08:18:51,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1010486.6666666666, ans=0.0 2023-11-20 08:18:55,902 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7300, loss[loss=0.08255, simple_loss=0.1128, pruned_loss=0.01728, audio_tagging_loss=0.008896, over 14604.00 frames. ], tot_loss[loss=0.08104, simple_loss=0.1018, pruned_loss=0.02022, audio_tagging_loss=0.00994, over 3034223.72 frames. ], batch size: 54, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:19:08,509 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.89 vs. limit=22.5 2023-11-20 08:19:11,511 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.37 vs. 
limit=12.0 2023-11-20 08:19:15,958 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151600 2023-11-20 08:19:20,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1010620.0, ans=0.125 2023-11-20 08:19:22,267 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.295e+01 8.871e+01 9.520e+01 1.560e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-20 08:19:29,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1010686.6666666666, ans=0.2 2023-11-20 08:19:30,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1010686.6666666666, ans=0.125 2023-11-20 08:19:44,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1010753.3333333334, ans=0.04949747468305833 2023-11-20 08:19:44,878 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.53 vs. limit=10.0 2023-11-20 08:19:46,406 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0 2023-11-20 08:19:50,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1010820.0, ans=0.125 2023-11-20 08:19:53,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1010820.0, ans=0.05 2023-11-20 08:20:01,266 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7350, loss[loss=0.07286, simple_loss=0.08796, pruned_loss=0.01706, audio_tagging_loss=0.01183, over 13480.00 frames. ], tot_loss[loss=0.08046, simple_loss=0.101, pruned_loss=0.02006, audio_tagging_loss=0.009925, over 3036568.40 frames. ], batch size: 53, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:20:02,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1010886.6666666666, ans=0.95 2023-11-20 08:20:19,880 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151650 2023-11-20 08:20:39,882 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.99 vs. limit=10.0 2023-11-20 08:21:05,197 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7400, loss[loss=0.07508, simple_loss=0.09738, pruned_loss=0.01617, audio_tagging_loss=0.01022, over 16881.00 frames. ], tot_loss[loss=0.0805, simple_loss=0.1007, pruned_loss=0.02014, audio_tagging_loss=0.009997, over 3041948.42 frames. 
], batch size: 66, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:21:06,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1011220.0, ans=0.0 2023-11-20 08:21:25,014 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151700 2023-11-20 08:21:30,865 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.761e+01 8.067e+01 8.750e+01 9.547e+01 1.451e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 08:21:42,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1011353.3333333334, ans=0.2 2023-11-20 08:22:09,934 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7450, loss[loss=0.09496, simple_loss=0.1174, pruned_loss=0.02513, audio_tagging_loss=0.01113, over 14312.00 frames. ], tot_loss[loss=0.08099, simple_loss=0.1016, pruned_loss=0.02028, audio_tagging_loss=0.009924, over 3043439.82 frames. ], batch size: 53, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:22:28,520 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0 2023-11-20 08:22:29,157 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151750 2023-11-20 08:22:45,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1011686.6666666666, ans=0.125 2023-11-20 08:22:58,970 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2023-11-20 08:23:14,432 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7500, loss[loss=0.08559, simple_loss=0.1097, pruned_loss=0.02263, audio_tagging_loss=0.00812, over 15316.00 frames. ], tot_loss[loss=0.08111, simple_loss=0.1015, pruned_loss=0.02043, audio_tagging_loss=0.009948, over 3035660.76 frames. ], batch size: 56, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:23:17,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1011886.6666666666, ans=0.1 2023-11-20 08:23:30,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1011953.3333333334, ans=0.09899494936611666 2023-11-20 08:23:33,522 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151800 2023-11-20 08:23:40,383 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.995e+01 8.501e+01 9.262e+01 1.002e+02 1.307e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-20 08:24:19,074 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7550, loss[loss=0.1056, simple_loss=0.1442, pruned_loss=0.02584, audio_tagging_loss=0.007667, over 15534.00 frames. ], tot_loss[loss=0.0811, simple_loss=0.1017, pruned_loss=0.02031, audio_tagging_loss=0.009957, over 3042946.32 frames. 
], batch size: 54, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:24:36,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1012286.6666666666, ans=0.125 2023-11-20 08:24:38,753 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151850 2023-11-20 08:24:50,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1012353.3333333334, ans=0.0 2023-11-20 08:25:04,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1012420.0, ans=0.125 2023-11-20 08:25:09,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1012486.6666666666, ans=0.0 2023-11-20 08:25:20,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1012486.6666666666, ans=0.0 2023-11-20 08:25:21,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1012486.6666666666, ans=0.05 2023-11-20 08:25:23,853 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7600, loss[loss=0.07786, simple_loss=0.1039, pruned_loss=0.01922, audio_tagging_loss=0.006697, over 16278.00 frames. ], tot_loss[loss=0.08068, simple_loss=0.1015, pruned_loss=0.02007, audio_tagging_loss=0.00988, over 3045105.86 frames. ], batch size: 60, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:25:24,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=1012553.3333333334, ans=0.02 2023-11-20 08:25:42,985 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151900 2023-11-20 08:25:44,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1012620.0, ans=0.05 2023-11-20 08:25:45,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1012620.0, ans=0.0 2023-11-20 08:25:48,979 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.336e+01 8.168e+01 8.744e+01 9.399e+01 1.403e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-20 08:25:58,084 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.00 vs. limit=22.5 2023-11-20 08:26:07,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1012753.3333333334, ans=0.125 2023-11-20 08:26:16,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1012820.0, ans=0.0 2023-11-20 08:26:25,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1012820.0, ans=0.5 2023-11-20 08:26:28,794 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7650, loss[loss=0.1003, simple_loss=0.1346, pruned_loss=0.02118, audio_tagging_loss=0.01183, over 15663.00 frames. ], tot_loss[loss=0.08086, simple_loss=0.1018, pruned_loss=0.02014, audio_tagging_loss=0.009799, over 3039731.40 frames. 
], batch size: 59, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:26:40,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1012953.3333333334, ans=0.1 2023-11-20 08:26:42,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1012953.3333333334, ans=0.1 2023-11-20 08:26:47,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1012953.3333333334, ans=0.125 2023-11-20 08:26:48,132 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 151950 2023-11-20 08:26:56,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1013020.0, ans=0.125 2023-11-20 08:27:13,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1013086.6666666666, ans=0.125 2023-11-20 08:27:19,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1013086.6666666666, ans=0.0 2023-11-20 08:27:33,345 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7700, loss[loss=0.08361, simple_loss=0.1072, pruned_loss=0.02037, audio_tagging_loss=0.009617, over 15197.00 frames. ], tot_loss[loss=0.08016, simple_loss=0.101, pruned_loss=0.01979, audio_tagging_loss=0.009895, over 3032683.25 frames. ], batch size: 55, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:27:48,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1013286.6666666666, ans=0.0 2023-11-20 08:27:53,674 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152000 2023-11-20 08:28:00,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1013286.6666666666, ans=0.0 2023-11-20 08:28:03,797 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.653e+01 7.881e+01 8.536e+01 9.329e+01 1.282e+02, threshold=1.707e+02, percent-clipped=0.0 2023-11-20 08:28:06,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1013353.3333333334, ans=0.95 2023-11-20 08:28:19,749 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=12.0 2023-11-20 08:28:20,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1013420.0, ans=0.125 2023-11-20 08:28:42,942 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7750, loss[loss=0.08293, simple_loss=0.1115, pruned_loss=0.01899, audio_tagging_loss=0.00817, over 14819.00 frames. ], tot_loss[loss=0.08033, simple_loss=0.1008, pruned_loss=0.01992, audio_tagging_loss=0.009993, over 3035444.88 frames. 
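
The many ScheduledFloat entries record regularization constants (dropout probabilities, skip rates, balancer probabilities) that are functions of the global batch count rather than fixed hyperparameters; ans is the value in effect at that batch_count. A minimal sketch of such a schedule, piecewise-linear in batch count, is given below as an illustration of the idea, not as scaling.py itself:

class ScheduledFloatSketch:
    """A float whose value is piecewise-linear in the training batch count."""

    def __init__(self, *points):
        # points: (batch_count, value) knots, e.g. ((0.0, 0.1), (20000.0, 0.0))
        self.points = sorted(points)

    def __call__(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                # Linear interpolation between the surrounding knots.
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# A skip rate decaying from 0.1 to 0.0 over the first 20k batches would
# read ans=0.0 at batch_count ~1e6, like the conv_skip_rate entries above.
skip_rate = ScheduledFloatSketch((0.0, 0.1), (20000.0, 0.0))
assert skip_rate(1016753.0) == 0.0
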
], batch size: 55, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:28:51,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1013553.3333333334, ans=0.07 2023-11-20 08:29:01,689 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152050 2023-11-20 08:29:01,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1013620.0, ans=0.125 2023-11-20 08:29:19,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1013753.3333333334, ans=0.125 2023-11-20 08:29:31,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1013753.3333333334, ans=0.125 2023-11-20 08:29:35,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1013820.0, ans=0.0 2023-11-20 08:29:35,733 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.23 vs. limit=22.5 2023-11-20 08:29:36,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1013820.0, ans=0.025 2023-11-20 08:29:46,250 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7800, loss[loss=0.08973, simple_loss=0.1187, pruned_loss=0.02214, audio_tagging_loss=0.008241, over 15314.00 frames. ], tot_loss[loss=0.08115, simple_loss=0.1022, pruned_loss=0.0202, audio_tagging_loss=0.009863, over 3034702.72 frames. ], batch size: 55, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:29:46,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1013886.6666666666, ans=0.1 2023-11-20 08:29:51,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1013886.6666666666, ans=0.2 2023-11-20 08:30:01,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1013953.3333333334, ans=0.0 2023-11-20 08:30:05,402 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152100 2023-11-20 08:30:05,499 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:30:12,162 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.723e+01 8.195e+01 8.711e+01 9.199e+01 1.205e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 08:30:51,125 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7850, loss[loss=0.06312, simple_loss=0.0648, pruned_loss=0.01701, audio_tagging_loss=0.01371, over 15312.00 frames. ], tot_loss[loss=0.08062, simple_loss=0.1013, pruned_loss=0.02002, audio_tagging_loss=0.009966, over 3042673.40 frames. 
], batch size: 60, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:31:00,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1014220.0, ans=0.125 2023-11-20 08:31:11,652 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152150 2023-11-20 08:31:14,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1014286.6666666666, ans=0.125 2023-11-20 08:31:28,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1014353.3333333334, ans=0.0 2023-11-20 08:31:44,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1014486.6666666666, ans=0.125 2023-11-20 08:31:45,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1014486.6666666666, ans=0.0 2023-11-20 08:31:56,452 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7900, loss[loss=0.08335, simple_loss=0.1063, pruned_loss=0.02167, audio_tagging_loss=0.00853, over 16039.00 frames. ], tot_loss[loss=0.08056, simple_loss=0.101, pruned_loss=0.02, audio_tagging_loss=0.01008, over 3056104.34 frames. ], batch size: 59, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:32:15,253 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152200 2023-11-20 08:32:22,233 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.716e+01 8.112e+01 8.963e+01 9.750e+01 1.229e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-20 08:32:30,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1014686.6666666666, ans=0.125 2023-11-20 08:33:00,845 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 7950, loss[loss=0.1014, simple_loss=0.1268, pruned_loss=0.03078, audio_tagging_loss=0.007232, over 15428.00 frames. ], tot_loss[loss=0.08096, simple_loss=0.1014, pruned_loss=0.02009, audio_tagging_loss=0.01018, over 3056892.89 frames. ], batch size: 59, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:33:07,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1014886.6666666666, ans=0.125 2023-11-20 08:33:17,555 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 08:33:19,992 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152250 2023-11-20 08:33:21,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1014953.3333333334, ans=0.2 2023-11-20 08:33:24,172 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.45 vs. 
limit=22.5 2023-11-20 08:33:33,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1015020.0, ans=0.125 2023-11-20 08:33:44,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1015086.6666666666, ans=0.0 2023-11-20 08:33:46,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1015086.6666666666, ans=0.1 2023-11-20 08:33:47,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1015086.6666666666, ans=0.07 2023-11-20 08:34:04,852 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8000, loss[loss=0.09556, simple_loss=0.1135, pruned_loss=0.02685, audio_tagging_loss=0.01197, over 16283.00 frames. ], tot_loss[loss=0.08058, simple_loss=0.1006, pruned_loss=0.02001, audio_tagging_loss=0.01028, over 3052181.52 frames. ], batch size: 60, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:34:10,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1015220.0, ans=0.125 2023-11-20 08:34:12,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1015220.0, ans=0.1 2023-11-20 08:34:24,109 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152300 2023-11-20 08:34:30,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.971e+01 8.267e+01 8.804e+01 9.546e+01 1.251e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-20 08:34:36,496 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.45 vs. limit=15.0 2023-11-20 08:34:41,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1015353.3333333334, ans=0.125 2023-11-20 08:35:08,907 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8050, loss[loss=0.07879, simple_loss=0.109, pruned_loss=0.01723, audio_tagging_loss=0.007065, over 16312.00 frames. ], tot_loss[loss=0.08046, simple_loss=0.1004, pruned_loss=0.01993, audio_tagging_loss=0.01033, over 3050344.03 frames. ], batch size: 60, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:35:18,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1015553.3333333334, ans=0.1 2023-11-20 08:35:29,132 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152350 2023-11-20 08:35:50,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1015753.3333333334, ans=0.125 2023-11-20 08:35:56,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1015753.3333333334, ans=0.125 2023-11-20 08:36:09,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1015820.0, ans=0.125 2023-11-20 08:36:11,332 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.98 vs. 
limit=22.5 2023-11-20 08:36:14,450 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8100, loss[loss=0.09531, simple_loss=0.122, pruned_loss=0.02732, audio_tagging_loss=0.007018, over 14684.00 frames. ], tot_loss[loss=0.08064, simple_loss=0.1009, pruned_loss=0.02003, audio_tagging_loss=0.01015, over 3049285.57 frames. ], batch size: 57, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:36:21,527 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.53 vs. limit=15.0 2023-11-20 08:36:32,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1015953.3333333334, ans=0.125 2023-11-20 08:36:33,608 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152400 2023-11-20 08:36:39,865 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.032e+01 8.730e+01 9.665e+01 1.175e+02, threshold=1.746e+02, percent-clipped=0.0 2023-11-20 08:37:13,840 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:37:15,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1016153.3333333334, ans=0.0 2023-11-20 08:37:18,477 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8150, loss[loss=0.1104, simple_loss=0.1423, pruned_loss=0.02928, audio_tagging_loss=0.00999, over 15864.00 frames. ], tot_loss[loss=0.08107, simple_loss=0.1017, pruned_loss=0.02023, audio_tagging_loss=0.01001, over 3052570.48 frames. ], batch size: 59, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:37:25,265 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. limit=6.0 2023-11-20 08:37:33,744 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.54 vs. 
limit=22.5 2023-11-20 08:37:34,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1016286.6666666666, ans=10.0 2023-11-20 08:37:36,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1016286.6666666666, ans=0.125 2023-11-20 08:37:37,882 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152450 2023-11-20 08:37:38,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1016286.6666666666, ans=0.125 2023-11-20 08:37:38,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1016286.6666666666, ans=0.07 2023-11-20 08:37:56,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1016420.0, ans=0.125 2023-11-20 08:37:58,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1016420.0, ans=0.05 2023-11-20 08:38:00,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1016420.0, ans=0.125 2023-11-20 08:38:14,257 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.20 vs. limit=15.0 2023-11-20 08:38:22,216 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8200, loss[loss=0.08344, simple_loss=0.1036, pruned_loss=0.02202, audio_tagging_loss=0.009636, over 15419.00 frames. ], tot_loss[loss=0.08038, simple_loss=0.101, pruned_loss=0.01993, audio_tagging_loss=0.009945, over 3048051.48 frames. ], batch size: 58, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:38:23,451 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 08:38:24,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1016553.3333333334, ans=0.125 2023-11-20 08:38:36,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1016620.0, ans=0.125 2023-11-20 08:38:42,197 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152500 2023-11-20 08:38:43,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1016620.0, ans=0.125 2023-11-20 08:38:49,926 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 7.999e+01 8.746e+01 9.539e+01 1.196e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-20 08:39:02,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1016753.3333333334, ans=0.07 2023-11-20 08:39:04,041 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.44 vs. 
limit=15.0 2023-11-20 08:39:07,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1016753.3333333334, ans=0.1 2023-11-20 08:39:13,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1016820.0, ans=0.035 2023-11-20 08:39:23,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1016820.0, ans=0.1 2023-11-20 08:39:26,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1016886.6666666666, ans=0.125 2023-11-20 08:39:27,139 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8250, loss[loss=0.07927, simple_loss=0.1085, pruned_loss=0.01928, audio_tagging_loss=0.005751, over 15647.00 frames. ], tot_loss[loss=0.08022, simple_loss=0.1009, pruned_loss=0.01992, audio_tagging_loss=0.009861, over 3047960.97 frames. ], batch size: 56, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:39:28,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1016886.6666666666, ans=0.125 2023-11-20 08:39:46,097 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152550 2023-11-20 08:40:29,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1017153.3333333334, ans=0.125 2023-11-20 08:40:31,467 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8300, loss[loss=0.06912, simple_loss=0.0846, pruned_loss=0.01605, audio_tagging_loss=0.01077, over 14828.00 frames. ], tot_loss[loss=0.08032, simple_loss=0.101, pruned_loss=0.01991, audio_tagging_loss=0.009934, over 3043851.76 frames. ], batch size: 57, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:40:49,665 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152600 2023-11-20 08:40:49,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1017286.6666666666, ans=0.2 2023-11-20 08:40:57,998 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.507e+01 8.110e+01 8.922e+01 9.710e+01 1.456e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-20 08:41:16,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1017420.0, ans=0.2 2023-11-20 08:41:35,103 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8350, loss[loss=0.06629, simple_loss=0.07749, pruned_loss=0.01372, audio_tagging_loss=0.01383, over 14342.00 frames. ], tot_loss[loss=0.08047, simple_loss=0.1012, pruned_loss=0.01999, audio_tagging_loss=0.009861, over 3047542.65 frames. 
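
The Whitening entries from scaling.py compare a per-module statistic of the activation covariance against a limit, and the constraint only engages when the metric exceeds that limit, which is why nearly every entry here reads metric=X vs. limit=Y with X below Y. The statistic sketched below (the second moment of the covariance eigenvalues over the squared mean eigenvalue, which is 1 for perfectly white features and grows with anisotropy) is a hedged reconstruction of the idea, not the exact formula in scaling.py:

import torch

def whiteness_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels) activations of one module."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues, ascending
    # Equals 1.0 iff all eigenvalues are equal (fully "white"); larger
    # values mean energy concentrated in fewer directions.
    return (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)

def whitening_active(x: torch.Tensor, limit: float) -> bool:
    # Mirror the "metric=... vs. limit=..." comparison in the log: only
    # push the module toward whiter features when the metric is too large.
    return bool(whiteness_metric(x) > limit)
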
], batch size: 54, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:41:39,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1017553.3333333334, ans=0.125 2023-11-20 08:41:54,604 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152650 2023-11-20 08:42:24,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1017753.3333333334, ans=0.125 2023-11-20 08:42:24,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1017753.3333333334, ans=0.0 2023-11-20 08:42:39,804 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8400, loss[loss=0.09938, simple_loss=0.129, pruned_loss=0.02821, audio_tagging_loss=0.006658, over 16393.00 frames. ], tot_loss[loss=0.08029, simple_loss=0.1012, pruned_loss=0.01987, audio_tagging_loss=0.009799, over 3051452.47 frames. ], batch size: 59, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:42:49,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1017886.6666666666, ans=0.1 2023-11-20 08:42:59,257 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152700 2023-11-20 08:43:06,582 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.274e+01 8.016e+01 8.880e+01 9.653e+01 1.299e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-20 08:43:16,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1018086.6666666666, ans=0.125 2023-11-20 08:43:44,757 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8450, loss[loss=0.09057, simple_loss=0.1146, pruned_loss=0.02223, audio_tagging_loss=0.01102, over 15856.00 frames. ], tot_loss[loss=0.0798, simple_loss=0.1001, pruned_loss=0.01978, audio_tagging_loss=0.009952, over 3060543.96 frames. ], batch size: 59, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:43:52,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1018220.0, ans=0.125 2023-11-20 08:44:01,488 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-20 08:44:03,250 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152750 2023-11-20 08:44:24,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1018420.0, ans=0.2 2023-11-20 08:44:31,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1018420.0, ans=0.125 2023-11-20 08:44:43,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1018486.6666666666, ans=0.125 2023-11-20 08:44:48,086 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8500, loss[loss=0.07243, simple_loss=0.0952, pruned_loss=0.01602, audio_tagging_loss=0.008815, over 15270.00 frames. ], tot_loss[loss=0.08029, simple_loss=0.1008, pruned_loss=0.0199, audio_tagging_loss=0.009986, over 3052720.79 frames. 
], batch size: 60, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:44:49,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1018553.3333333334, ans=0.2 2023-11-20 08:45:07,846 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152800 2023-11-20 08:45:10,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1018620.0, ans=0.0 2023-11-20 08:45:15,427 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.959e+01 8.066e+01 8.928e+01 9.740e+01 1.439e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-20 08:45:22,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1018686.6666666666, ans=0.125 2023-11-20 08:45:45,826 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.64 vs. limit=15.0 2023-11-20 08:45:48,169 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2023-11-20 08:45:53,043 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8550, loss[loss=0.07492, simple_loss=0.09222, pruned_loss=0.01695, audio_tagging_loss=0.01186, over 15892.00 frames. ], tot_loss[loss=0.08039, simple_loss=0.1008, pruned_loss=0.01993, audio_tagging_loss=0.01005, over 3058794.95 frames. ], batch size: 63, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:45:57,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1018886.6666666666, ans=0.0 2023-11-20 08:46:00,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1018886.6666666666, ans=0.125 2023-11-20 08:46:12,910 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152850 2023-11-20 08:46:23,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1019020.0, ans=0.125 2023-11-20 08:46:31,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1019086.6666666666, ans=0.0 2023-11-20 08:46:33,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1019086.6666666666, ans=0.5 2023-11-20 08:46:36,271 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=15.0 2023-11-20 08:46:47,073 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.39 vs. limit=15.0 2023-11-20 08:46:57,784 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8600, loss[loss=0.08732, simple_loss=0.1159, pruned_loss=0.01917, audio_tagging_loss=0.01019, over 15202.00 frames. ], tot_loss[loss=0.08101, simple_loss=0.1016, pruned_loss=0.0202, audio_tagging_loss=0.01003, over 3059021.64 frames. 
], batch size: 56, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:47:09,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1019286.6666666666, ans=0.025 2023-11-20 08:47:12,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1019286.6666666666, ans=0.1 2023-11-20 08:47:16,755 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152900 2023-11-20 08:47:19,811 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0 2023-11-20 08:47:24,684 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.691e+01 7.921e+01 8.657e+01 9.489e+01 1.169e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-20 08:47:40,334 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:47:45,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1019420.0, ans=0.125 2023-11-20 08:47:51,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1019486.6666666666, ans=0.2 2023-11-20 08:47:55,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1019486.6666666666, ans=0.125 2023-11-20 08:48:02,344 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8650, loss[loss=0.07672, simple_loss=0.09785, pruned_loss=0.01731, audio_tagging_loss=0.01048, over 16295.00 frames. ], tot_loss[loss=0.08166, simple_loss=0.1028, pruned_loss=0.02022, audio_tagging_loss=0.01006, over 3056757.30 frames. ], batch size: 63, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:48:22,138 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 152950 2023-11-20 08:48:26,265 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.55 vs. limit=15.0 2023-11-20 08:48:27,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1019686.6666666666, ans=0.2 2023-11-20 08:48:34,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1019686.6666666666, ans=0.125 2023-11-20 08:48:47,063 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.42 vs. limit=15.0 2023-11-20 08:49:03,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1019820.0, ans=0.1 2023-11-20 08:49:06,033 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8700, loss[loss=0.09709, simple_loss=0.1217, pruned_loss=0.02676, audio_tagging_loss=0.009488, over 14502.00 frames. ], tot_loss[loss=0.08235, simple_loss=0.1034, pruned_loss=0.02054, audio_tagging_loss=0.01013, over 3053674.08 frames. 
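The optim.py:476 lines admit a direct reading: the five numbers after "grad-norm quartiles" are the min/25%/50%/75%/max of recently observed gradient norms, and in every entry in this span the reported threshold equals Clipping_scale times the median (e.g. 2.0 * 8.657e+01 = 1.731e+02 just above). percent-clipped then reports how often the norm exceeded that adaptive threshold. A sketch of such a median-based clipper follows; icefall's ScaledAdam integrates this into the optimizer itself, so the class below is only an illustration of the arithmetic.

    from collections import deque
    import torch

    class AdaptiveGradClipper:
        """Clip gradients to clipping_scale * median of recent norms,
        matching the log's threshold arithmetic."""
        def __init__(self, clipping_scale=2.0, history=128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=history)

        def clip_(self, parameters):
            # parameters: a list of Tensors (iterated more than once).
            norm = torch.nn.utils.clip_grad_norm_(parameters, float("inf"))
            self.norms.append(float(norm))
            q = torch.quantile(
                torch.tensor(list(self.norms)),
                torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0], dtype=torch.float64),
            )
            threshold = self.clipping_scale * q[2].item()  # 2.0 * median
            if float(norm) > threshold:
                for p in parameters:
                    if p.grad is not None:
                        p.grad.mul_(threshold / float(norm))
            return q, threshold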
], batch size: 54, lr: 5.27e-03, grad_scale: 16.0 2023-11-20 08:49:25,909 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153000 2023-11-20 08:49:35,422 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.857e+01 8.289e+01 9.010e+01 9.707e+01 1.388e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-20 08:49:43,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1020020.0, ans=0.0 2023-11-20 08:50:05,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1020153.3333333334, ans=0.125 2023-11-20 08:50:11,245 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8750, loss[loss=0.0783, simple_loss=0.09476, pruned_loss=0.01902, audio_tagging_loss=0.0119, over 15193.00 frames. ], tot_loss[loss=0.08211, simple_loss=0.1028, pruned_loss=0.02053, audio_tagging_loss=0.01016, over 3044004.37 frames. ], batch size: 55, lr: 5.27e-03, grad_scale: 16.0 2023-11-20 08:50:30,885 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153050 2023-11-20 08:50:40,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1020353.3333333334, ans=0.125 2023-11-20 08:51:07,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1020486.6666666666, ans=0.125 2023-11-20 08:51:16,257 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8800, loss[loss=0.06385, simple_loss=0.0753, pruned_loss=0.01543, audio_tagging_loss=0.01076, over 15043.00 frames. ], tot_loss[loss=0.08199, simple_loss=0.1031, pruned_loss=0.02031, audio_tagging_loss=0.0101, over 3043075.69 frames. ], batch size: 59, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:51:31,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1020620.0, ans=0.125 2023-11-20 08:51:35,319 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153100 2023-11-20 08:51:44,282 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.487e+01 8.394e+01 9.116e+01 1.011e+02 1.329e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-20 08:51:48,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1020686.6666666666, ans=0.1 2023-11-20 08:51:53,436 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.96 vs. limit=15.0 2023-11-20 08:52:04,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1020753.3333333334, ans=0.125 2023-11-20 08:52:20,904 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8850, loss[loss=0.07183, simple_loss=0.09116, pruned_loss=0.01453, audio_tagging_loss=0.01171, over 16807.00 frames. ], tot_loss[loss=0.08229, simple_loss=0.1033, pruned_loss=0.02049, audio_tagging_loss=0.01016, over 3053316.02 frames. ], batch size: 62, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:52:33,733 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 08:52:39,968 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153150 2023-11-20 08:52:40,131 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:53:13,528 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2023-11-20 08:53:13,797 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.23 vs. limit=15.0 2023-11-20 08:53:22,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=1021153.3333333334, ans=0.5 2023-11-20 08:53:26,050 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8900, loss[loss=0.05236, simple_loss=0.06083, pruned_loss=0.01139, audio_tagging_loss=0.01056, over 14993.00 frames. ], tot_loss[loss=0.08187, simple_loss=0.1029, pruned_loss=0.0204, audio_tagging_loss=0.01003, over 3055224.79 frames. ], batch size: 62, lr: 5.27e-03, grad_scale: 16.0 2023-11-20 08:53:33,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1021220.0, ans=0.1 2023-11-20 08:53:36,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1021220.0, ans=0.125 2023-11-20 08:53:39,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1021286.6666666666, ans=0.125 2023-11-20 08:53:45,915 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153200 2023-11-20 08:53:53,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1021353.3333333334, ans=0.0 2023-11-20 08:53:55,905 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.066e+01 8.130e+01 8.663e+01 9.532e+01 1.311e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-20 08:54:21,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1021486.6666666666, ans=0.2 2023-11-20 08:54:31,118 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 8950, loss[loss=0.08821, simple_loss=0.1109, pruned_loss=0.02585, audio_tagging_loss=0.006906, over 14199.00 frames. ], tot_loss[loss=0.08095, simple_loss=0.1021, pruned_loss=0.02009, audio_tagging_loss=0.009819, over 3053406.92 frames. ], batch size: 53, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 08:54:32,057 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.88 vs. limit=5.0 2023-11-20 08:54:43,451 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.35 vs. 
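The WARNING above shows the data-side sanity filter at work: AudioSet clips carry the placeholder transcript "Dummy text added as a place holder. ...", which tokenizes to 24 BPE pieces, while a 1-second cut has only 100 feature frames, i.e. 23 after the roughly 4x convolutional subsampling. A transducer alignment cannot emit more tokens than it has frames, so the cut is excluded. A sketch of the check, assuming the common icefall frontend arithmetic T' = ((T - 7) // 2 + 1) // 2 (the exact formula depends on the Conv2dSubsampling variant in use):

    def frames_after_subsampling(num_frames: int) -> int:
        # Assumed arithmetic for a ~4x convolutional frontend;
        # ((100 - 7) // 2 + 1) // 2 == 23, matching the log.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer needs at least one frame per output token.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)  # the excluded AudioSet cut above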
limit=15.0 2023-11-20 08:54:44,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1021620.0, ans=0.1 2023-11-20 08:54:50,590 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153250 2023-11-20 08:55:26,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1021820.0, ans=0.0 2023-11-20 08:55:32,064 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:55:36,034 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9000, loss[loss=0.0701, simple_loss=0.08903, pruned_loss=0.01349, audio_tagging_loss=0.0121, over 16792.00 frames. ], tot_loss[loss=0.08078, simple_loss=0.1021, pruned_loss=0.02, audio_tagging_loss=0.009716, over 3057492.18 frames. ], batch size: 64, lr: 5.26e-03, grad_scale: 8.0 2023-11-20 08:55:36,035 INFO [train_asr.py:1285] (2/4) Computing validation loss 2023-11-20 08:56:18,745 INFO [train_asr.py:1294] (2/4) Epoch 13, validation: loss=0.06245, simple_loss=0.0538, pruned_loss=0.005768, audio_tagging_loss=0.02978, over 4681554.00 frames. 2023-11-20 08:56:18,746 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB 2023-11-20 08:56:22,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1021886.6666666666, ans=0.0 2023-11-20 08:56:36,984 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153300 2023-11-20 08:56:48,914 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.462e+01 8.276e+01 8.856e+01 9.604e+01 3.298e+02, threshold=1.771e+02, percent-clipped=1.0 2023-11-20 08:56:57,590 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=22.5 2023-11-20 08:56:58,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1022086.6666666666, ans=0.0 2023-11-20 08:56:59,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1022086.6666666666, ans=0.125 2023-11-20 08:57:17,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1022153.3333333334, ans=0.2 2023-11-20 08:57:17,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1022153.3333333334, ans=0.1 2023-11-20 08:57:20,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1022153.3333333334, ans=0.125 2023-11-20 08:57:22,141 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9050, loss[loss=0.08528, simple_loss=0.09671, pruned_loss=0.02279, audio_tagging_loss=0.01414, over 15440.00 frames. ], tot_loss[loss=0.08065, simple_loss=0.1016, pruned_loss=0.02002, audio_tagging_loss=0.00983, over 3063192.28 frames. 
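The block above is one of the periodic validation passes (every valid_interval=3000 batches in this run): training pauses, a per-frame validation loss is computed over the whole validation set (0.06245 over ~4.68M frames here), and the peak CUDA allocation is reported. A minimal version of such a pass is sketched below; model, valid_loader and compute_loss are placeholders, with compute_loss assumed to return (summed_loss, num_frames) for one batch.

    import torch

    def validate(model, valid_loader, compute_loss, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = compute_loss(model, batch, device)
                tot_loss += float(loss)
                tot_frames += num_frames
        model.train()
        avg = tot_loss / max(tot_frames, 1)  # per-frame loss, as logged
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        return avg, peak_mb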
], batch size: 57, lr: 5.26e-03, grad_scale: 8.0 2023-11-20 08:57:41,896 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153350 2023-11-20 08:57:47,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1022353.3333333334, ans=0.125 2023-11-20 08:57:55,442 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.89 vs. limit=12.0 2023-11-20 08:57:56,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1022353.3333333334, ans=0.125 2023-11-20 08:58:04,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1022420.0, ans=0.125 2023-11-20 08:58:07,153 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.08 vs. limit=10.0 2023-11-20 08:58:19,463 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2023-11-20 08:58:24,818 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=12.0 2023-11-20 08:58:26,699 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9100, loss[loss=0.1034, simple_loss=0.1344, pruned_loss=0.02737, audio_tagging_loss=0.008848, over 14484.00 frames. ], tot_loss[loss=0.08075, simple_loss=0.1019, pruned_loss=0.02001, audio_tagging_loss=0.009806, over 3064131.37 frames. ], batch size: 56, lr: 5.26e-03, grad_scale: 8.0 2023-11-20 08:58:27,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1022553.3333333334, ans=0.125 2023-11-20 08:58:46,280 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153400 2023-11-20 08:58:55,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1022686.6666666666, ans=0.2 2023-11-20 08:58:57,443 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.670e+01 8.168e+01 8.794e+01 9.522e+01 1.542e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 08:59:00,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1022686.6666666666, ans=0.09899494936611666 2023-11-20 08:59:07,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1022753.3333333334, ans=0.0 2023-11-20 08:59:15,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1022753.3333333334, ans=0.0 2023-11-20 08:59:15,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1022753.3333333334, ans=0.2 2023-11-20 08:59:19,119 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.19 vs. limit=15.0 2023-11-20 08:59:31,262 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9150, loss[loss=0.06758, simple_loss=0.09002, pruned_loss=0.0118, audio_tagging_loss=0.01077, over 13955.00 frames. ], tot_loss[loss=0.08112, simple_loss=0.102, pruned_loss=0.02023, audio_tagging_loss=0.009878, over 3053736.01 frames. 
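The scaling.py:1022 "Whitening" lines compare a measured anisotropy metric of a layer's activations against a per-module limit (6.0 for attention keys, 10.0/12.0/15.0/22.5 elsewhere); only when the metric exceeds its limit does the whitening penalty push gradients to decorrelate the channels, so entries like "metric=2.08 vs. limit=6.0" indicate headroom. The exact statistic is defined in icefall's scaling.py; the sketch below uses one plausible choice, the ratio of the mean squared eigenvalue of the channel covariance to the squared mean eigenvalue (1.0 for perfectly white features, larger for anisotropic ones), purely for illustration.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """Anisotropy of (frames, channels) features, per channel group:
        mean(eig^2) / mean(eig)^2 of the covariance; 1.0 means white."""
        n, c = x.shape
        assert c % num_groups == 0
        x = x.reshape(n, num_groups, c // num_groups)
        metrics = []
        for g in range(num_groups):
            xg = x[:, g, :]
            xg = xg - xg.mean(dim=0, keepdim=True)
            cov = (xg.t() @ xg) / n
            eigs = torch.linalg.eigvalsh(cov)
            metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
        return sum(metrics) / len(metrics)

    white = torch.randn(10000, 128)
    assert abs(whitening_metric(white) - 1.0) < 0.1  # ~1.0 for white noise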
], batch size: 54, lr: 5.26e-03, grad_scale: 8.0 2023-11-20 08:59:32,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1022886.6666666666, ans=0.0 2023-11-20 08:59:40,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1022886.6666666666, ans=0.0 2023-11-20 08:59:50,136 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153450 2023-11-20 09:00:10,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1023086.6666666666, ans=0.0 2023-11-20 09:00:13,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1023086.6666666666, ans=0.125 2023-11-20 09:00:18,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1023086.6666666666, ans=0.2 2023-11-20 09:00:28,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1023153.3333333334, ans=0.2 2023-11-20 09:00:35,467 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9200, loss[loss=0.1044, simple_loss=0.1371, pruned_loss=0.02694, audio_tagging_loss=0.008952, over 16059.00 frames. ], tot_loss[loss=0.08055, simple_loss=0.1014, pruned_loss=0.02001, audio_tagging_loss=0.009845, over 3051667.67 frames. ], batch size: 54, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:00:55,649 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153500 2023-11-20 09:01:07,210 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.430e+01 8.062e+01 8.603e+01 9.204e+01 1.228e+02, threshold=1.721e+02, percent-clipped=0.0 2023-11-20 09:01:18,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1023420.0, ans=0.1 2023-11-20 09:01:34,721 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.15 vs. limit=10.0 2023-11-20 09:01:39,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1023553.3333333334, ans=0.125 2023-11-20 09:01:40,790 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9250, loss[loss=0.1118, simple_loss=0.1484, pruned_loss=0.03197, audio_tagging_loss=0.005698, over 14467.00 frames. ], tot_loss[loss=0.08103, simple_loss=0.1018, pruned_loss=0.02027, audio_tagging_loss=0.009864, over 3061274.22 frames. 
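The grad_scale field tracks the dynamic loss scale of mixed-precision training (use_fp16=True): in this span it sits at 8.0, grows back to 16.0 and later 32.0 after enough consecutive successful steps, and is halved whenever a step produces inf/nan gradients. PyTorch's stock GradScaler implements exactly this doubling/halving policy; whether this recipe uses the stock class or its own variant, the mechanics look like the sketch below (the growth_interval here is illustrative, and visibly shorter than the stock default of 2000, since the log grows 8 -> 16 within ~50 batches).

    import torch
    from torch.cuda.amp import GradScaler, autocast

    scaler = GradScaler(init_scale=8.0,      # matches the log's low point
                        growth_factor=2.0,   # 8 -> 16 -> 32 on success
                        backoff_factor=0.5,  # halve on inf/nan gradients
                        growth_interval=100) # illustrative; default is 2000

    def train_step(model, batch, optimizer, compute_loss):
        optimizer.zero_grad()
        with autocast():                      # fp16 forward pass
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()         # backward on the scaled loss
        scaler.step(optimizer)                # unscales; skips step on inf/nan
        scaler.update()                       # grow or back off the scale
        return loss.detach(), scaler.get_scale()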
], batch size: 54, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:01:41,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1023553.3333333334, ans=0.125 2023-11-20 09:01:47,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1023553.3333333334, ans=0.0 2023-11-20 09:02:00,766 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153550 2023-11-20 09:02:00,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1023620.0, ans=0.125 2023-11-20 09:02:06,369 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:02:09,518 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.80 vs. limit=10.0 2023-11-20 09:02:40,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1023820.0, ans=0.2 2023-11-20 09:02:46,033 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9300, loss[loss=0.06686, simple_loss=0.07455, pruned_loss=0.01832, audio_tagging_loss=0.01126, over 16025.00 frames. ], tot_loss[loss=0.08061, simple_loss=0.1013, pruned_loss=0.02002, audio_tagging_loss=0.009948, over 3059168.33 frames. ], batch size: 60, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:02:50,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1023886.6666666666, ans=0.0 2023-11-20 09:02:55,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1023886.6666666666, ans=0.125 2023-11-20 09:02:58,672 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.52 vs. 
limit=22.5 2023-11-20 09:03:02,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1023953.3333333334, ans=0.125 2023-11-20 09:03:03,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1023953.3333333334, ans=0.1 2023-11-20 09:03:05,348 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153600 2023-11-20 09:03:17,349 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.484e+01 8.349e+01 9.030e+01 1.018e+02 1.384e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-20 09:03:20,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1024020.0, ans=0.125 2023-11-20 09:03:22,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1024020.0, ans=0.0 2023-11-20 09:03:25,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1024086.6666666666, ans=0.04949747468305833 2023-11-20 09:03:33,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1024086.6666666666, ans=0.1 2023-11-20 09:03:35,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1024086.6666666666, ans=0.125 2023-11-20 09:03:51,229 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9350, loss[loss=0.06972, simple_loss=0.08615, pruned_loss=0.01388, audio_tagging_loss=0.01276, over 15008.00 frames. ], tot_loss[loss=0.08081, simple_loss=0.1017, pruned_loss=0.02005, audio_tagging_loss=0.009902, over 3058588.95 frames. ], batch size: 57, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:04:03,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1024286.6666666666, ans=0.0 2023-11-20 09:04:06,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1024286.6666666666, ans=0.125 2023-11-20 09:04:10,023 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153650 2023-11-20 09:04:23,514 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.05 vs. limit=12.0 2023-11-20 09:04:25,789 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=22.5 2023-11-20 09:04:31,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1024420.0, ans=0.125 2023-11-20 09:04:33,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1024420.0, ans=0.125 2023-11-20 09:04:37,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1024420.0, ans=0.0 2023-11-20 09:04:51,087 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.47 vs. 
limit=15.0 2023-11-20 09:04:54,545 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9400, loss[loss=0.07041, simple_loss=0.08341, pruned_loss=0.01885, audio_tagging_loss=0.009861, over 14915.00 frames. ], tot_loss[loss=0.08064, simple_loss=0.1016, pruned_loss=0.01996, audio_tagging_loss=0.009899, over 3055064.89 frames. ], batch size: 59, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:05:14,396 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153700 2023-11-20 09:05:19,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1024620.0, ans=0.125 2023-11-20 09:05:20,609 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2023-11-20 09:05:22,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1024686.6666666666, ans=0.125 2023-11-20 09:05:26,176 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.189e+01 8.740e+01 9.691e+01 1.507e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-20 09:05:51,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1024820.0, ans=0.2 2023-11-20 09:05:55,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1024820.0, ans=0.125 2023-11-20 09:05:57,740 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:05:58,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1024886.6666666666, ans=0.1 2023-11-20 09:05:59,520 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9450, loss[loss=0.08173, simple_loss=0.1034, pruned_loss=0.01804, audio_tagging_loss=0.01199, over 14881.00 frames. ], tot_loss[loss=0.0813, simple_loss=0.1023, pruned_loss=0.02012, audio_tagging_loss=0.01005, over 3058884.48 frames. ], batch size: 54, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:06:18,714 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153750 2023-11-20 09:06:28,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1025020.0, ans=0.125 2023-11-20 09:06:39,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1025086.6666666666, ans=0.0 2023-11-20 09:06:51,245 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:06:53,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1025153.3333333334, ans=0.1 2023-11-20 09:07:04,154 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9500, loss[loss=0.0953, simple_loss=0.1308, pruned_loss=0.02122, audio_tagging_loss=0.008691, over 16291.00 frames. 
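The scaling.py:1118 "WithLoss" entries report an auxiliary penalty attached to the self-attention weights; loss-sum=0.000e+00 throughout this span means the penalty is currently contributing nothing to the gradient. Mechanically, this kind of hook can be built from a pass-through autograd function that injects an extra gradient in the backward; the toy below is not icefall's implementation, and its x.sum() penalty is a stand-in.

    import torch

    class WithAuxLoss(torch.autograd.Function):
        """Identity in the forward; adds the gradient of
        aux_scale * x.sum() in the backward. With aux_scale = 0.0 it is
        inert, matching the logged loss-sum=0.000e+00."""
        @staticmethod
        def forward(ctx, x, aux_scale):
            ctx.aux_scale = aux_scale
            return x

        @staticmethod
        def backward(ctx, grad_out):
            # d(aux_scale * x.sum())/dx == aux_scale, elementwise.
            aux_grad = torch.full_like(grad_out, ctx.aux_scale)
            return grad_out + aux_grad, None

    x = torch.randn(4, 4, requires_grad=True)
    y = WithAuxLoss.apply(x, 0.0)  # inert: identical to a plain backward
    y.sum().backward()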
], tot_loss[loss=0.08237, simple_loss=0.1037, pruned_loss=0.02051, audio_tagging_loss=0.01004, over 3052113.05 frames. ], batch size: 59, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:07:06,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1025220.0, ans=0.0 2023-11-20 09:07:21,989 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.06 vs. limit=15.0 2023-11-20 09:07:23,688 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153800 2023-11-20 09:07:35,600 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.164e+01 8.858e+01 9.394e+01 2.637e+02, threshold=1.772e+02, percent-clipped=1.0 2023-11-20 09:07:44,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1025420.0, ans=0.125 2023-11-20 09:07:59,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1025486.6666666666, ans=0.0 2023-11-20 09:08:09,384 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9550, loss[loss=0.08533, simple_loss=0.1073, pruned_loss=0.01845, audio_tagging_loss=0.01325, over 15072.00 frames. ], tot_loss[loss=0.08186, simple_loss=0.1028, pruned_loss=0.02027, audio_tagging_loss=0.01018, over 3049541.05 frames. ], batch size: 54, lr: 5.25e-03, grad_scale: 16.0 2023-11-20 09:08:16,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1025553.3333333334, ans=0.2 2023-11-20 09:08:17,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1025553.3333333334, ans=0.125 2023-11-20 09:08:18,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1025553.3333333334, ans=0.125 2023-11-20 09:08:29,314 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153850 2023-11-20 09:08:31,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1025620.0, ans=0.125 2023-11-20 09:08:49,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1025753.3333333334, ans=0.125 2023-11-20 09:09:05,222 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.56 vs. limit=15.0 2023-11-20 09:09:12,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1025820.0, ans=0.125 2023-11-20 09:09:14,888 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9600, loss[loss=0.07401, simple_loss=0.08783, pruned_loss=0.0199, audio_tagging_loss=0.0102, over 15364.00 frames. ], tot_loss[loss=0.08133, simple_loss=0.1019, pruned_loss=0.02011, audio_tagging_loss=0.01027, over 3051831.77 frames. 
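The slowly decaying lr column (5.28e-03 down to 5.24e-03 across this span) is consistent with icefall's Eden schedule, lr = base_lr * ((step/lr_batches)^2 + 1)^-0.25 * ((epoch/lr_epochs)^2 + 1)^-0.25, given this run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5. A hedged numerical check, treating the logged "Current batch idx" as the step and counting 12 completed epochs during epoch 13:

    def eden_lr(base_lr, step, epoch, lr_batches=7500, lr_epochs=3.5):
        # Assumed form of icefall's Eden schedule (optim.py).
        return (base_lr
                * ((step / lr_batches) ** 2 + 1) ** -0.25
                * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

    # batch idx ~153,900 in epoch 13 (12 epochs completed):
    print(round(eden_lr(0.045, 153900, 12), 5))  # ~0.00525, i.e. 5.25e-03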
], batch size: 59, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:09:16,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1025886.6666666666, ans=0.1 2023-11-20 09:09:34,181 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153900 2023-11-20 09:09:44,975 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.678e+01 7.953e+01 8.946e+01 9.937e+01 1.277e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-20 09:09:45,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1026020.0, ans=0.1 2023-11-20 09:09:56,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1026086.6666666666, ans=0.0 2023-11-20 09:10:12,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1026153.3333333334, ans=0.125 2023-11-20 09:10:19,602 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9650, loss[loss=0.08749, simple_loss=0.1154, pruned_loss=0.02104, audio_tagging_loss=0.008735, over 15008.00 frames. ], tot_loss[loss=0.08118, simple_loss=0.102, pruned_loss=0.02004, audio_tagging_loss=0.01014, over 3057547.71 frames. ], batch size: 56, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:10:30,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1026286.6666666666, ans=0.125 2023-11-20 09:10:38,854 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 153950 2023-11-20 09:10:59,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1026420.0, ans=0.1 2023-11-20 09:11:00,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1026420.0, ans=0.1 2023-11-20 09:11:04,101 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2023-11-20 09:11:23,342 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9700, loss[loss=0.07478, simple_loss=0.09796, pruned_loss=0.01634, audio_tagging_loss=0.009455, over 15725.00 frames. ], tot_loss[loss=0.08149, simple_loss=0.1024, pruned_loss=0.02022, audio_tagging_loss=0.01008, over 3051603.11 frames. ], batch size: 58, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:11:43,180 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154000 2023-11-20 09:11:48,866 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.80 vs. 
limit=15.0 2023-11-20 09:11:54,851 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.479e+01 8.116e+01 8.846e+01 9.566e+01 1.154e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-20 09:11:59,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1026686.6666666666, ans=0.0 2023-11-20 09:12:04,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1026753.3333333334, ans=0.0 2023-11-20 09:12:09,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1026753.3333333334, ans=0.1 2023-11-20 09:12:16,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1026820.0, ans=0.1 2023-11-20 09:12:16,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1026820.0, ans=0.125 2023-11-20 09:12:22,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1026820.0, ans=0.125 2023-11-20 09:12:27,847 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9750, loss[loss=0.0609, simple_loss=0.07432, pruned_loss=0.01157, audio_tagging_loss=0.01217, over 14993.00 frames. ], tot_loss[loss=0.08078, simple_loss=0.1017, pruned_loss=0.01999, audio_tagging_loss=0.009967, over 3049528.48 frames. ], batch size: 58, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:12:36,424 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.80 vs. limit=15.0 2023-11-20 09:12:39,123 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.70 vs. limit=22.5 2023-11-20 09:12:48,289 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154050 2023-11-20 09:12:49,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1026953.3333333334, ans=0.2 2023-11-20 09:13:10,489 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.94 vs. limit=22.5 2023-11-20 09:13:19,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1027153.3333333334, ans=0.1 2023-11-20 09:13:32,816 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9800, loss[loss=0.061, simple_loss=0.07579, pruned_loss=0.01109, audio_tagging_loss=0.01202, over 14484.00 frames. ], tot_loss[loss=0.08037, simple_loss=0.1007, pruned_loss=0.02, audio_tagging_loss=0.01, over 3045519.29 frames. ], batch size: 58, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:13:47,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1027286.6666666666, ans=0.125 2023-11-20 09:13:51,961 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154100 2023-11-20 09:13:57,912 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.89 vs. 
limit=22.5 2023-11-20 09:13:58,232 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.62 vs. limit=15.0 2023-11-20 09:13:58,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1027353.3333333334, ans=0.125 2023-11-20 09:14:03,357 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.195e+01 8.924e+01 9.693e+01 1.492e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-20 09:14:03,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1027353.3333333334, ans=0.05 2023-11-20 09:14:08,221 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0 2023-11-20 09:14:09,703 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.82 vs. limit=15.0 2023-11-20 09:14:14,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1027420.0, ans=0.2 2023-11-20 09:14:15,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1027420.0, ans=0.035 2023-11-20 09:14:19,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1027420.0, ans=0.125 2023-11-20 09:14:20,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1027420.0, ans=0.125 2023-11-20 09:14:23,952 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=15.0 2023-11-20 09:14:30,952 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:14:37,026 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9850, loss[loss=0.04386, simple_loss=0.04872, pruned_loss=0.006836, audio_tagging_loss=0.01266, over 15225.00 frames. ], tot_loss[loss=0.08028, simple_loss=0.1005, pruned_loss=0.02006, audio_tagging_loss=0.009997, over 3045579.34 frames. ], batch size: 62, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:14:37,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1027553.3333333334, ans=0.0 2023-11-20 09:14:38,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1027553.3333333334, ans=0.125 2023-11-20 09:14:42,479 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.66 vs. 
limit=10.0 2023-11-20 09:14:48,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1027620.0, ans=0.1 2023-11-20 09:14:53,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1027620.0, ans=0.125 2023-11-20 09:14:56,565 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154150 2023-11-20 09:15:15,982 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.98 vs. limit=22.5 2023-11-20 09:15:16,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1027753.3333333334, ans=0.125 2023-11-20 09:15:19,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1027753.3333333334, ans=0.0 2023-11-20 09:15:24,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1027753.3333333334, ans=0.2 2023-11-20 09:15:33,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1027820.0, ans=0.125 2023-11-20 09:15:41,458 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9900, loss[loss=0.0659, simple_loss=0.07739, pruned_loss=0.01467, audio_tagging_loss=0.01254, over 15815.00 frames. ], tot_loss[loss=0.08013, simple_loss=0.1005, pruned_loss=0.01989, audio_tagging_loss=0.01001, over 3038165.35 frames. ], batch size: 58, lr: 5.25e-03, grad_scale: 16.0 2023-11-20 09:16:01,302 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154200 2023-11-20 09:16:08,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1028020.0, ans=0.125 2023-11-20 09:16:11,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1028020.0, ans=0.1 2023-11-20 09:16:14,639 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.724e+01 8.225e+01 8.765e+01 9.379e+01 1.572e+02, threshold=1.753e+02, percent-clipped=0.0 2023-11-20 09:16:30,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1028086.6666666666, ans=0.125 2023-11-20 09:16:39,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1028153.3333333334, ans=0.0 2023-11-20 09:16:47,256 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 9950, loss[loss=0.07562, simple_loss=0.09145, pruned_loss=0.01994, audio_tagging_loss=0.009951, over 15276.00 frames. ], tot_loss[loss=0.0791, simple_loss=0.09898, pruned_loss=0.01952, audio_tagging_loss=0.01009, over 3042598.14 frames. 
], batch size: 57, lr: 5.25e-03, grad_scale: 16.0 2023-11-20 09:16:48,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1028220.0, ans=0.04949747468305833 2023-11-20 09:16:59,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1028286.6666666666, ans=0.125 2023-11-20 09:17:06,172 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154250 2023-11-20 09:17:08,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1028286.6666666666, ans=10.0 2023-11-20 09:17:08,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1028286.6666666666, ans=0.09899494936611666 2023-11-20 09:17:10,326 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.43 vs. limit=10.0 2023-11-20 09:17:24,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1028420.0, ans=0.125 2023-11-20 09:17:26,200 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0 2023-11-20 09:17:27,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1028420.0, ans=0.0 2023-11-20 09:17:34,262 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:17:36,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1028420.0, ans=0.2 2023-11-20 09:17:45,328 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0 2023-11-20 09:17:46,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1028486.6666666666, ans=0.2 2023-11-20 09:17:51,772 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10000, loss[loss=0.09307, simple_loss=0.1231, pruned_loss=0.02001, audio_tagging_loss=0.01151, over 16025.00 frames. ], tot_loss[loss=0.07924, simple_loss=0.09943, pruned_loss=0.01942, audio_tagging_loss=0.0101, over 3041145.29 frames. 
], batch size: 58, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:18:10,884 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154300 2023-11-20 09:18:21,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1028686.6666666666, ans=0.125 2023-11-20 09:18:23,633 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.484e+01 8.210e+01 8.752e+01 9.474e+01 1.370e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 09:18:38,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1028753.3333333334, ans=0.125 2023-11-20 09:18:46,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1028820.0, ans=0.2 2023-11-20 09:18:56,658 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10050, loss[loss=0.06972, simple_loss=0.09227, pruned_loss=0.01494, audio_tagging_loss=0.008639, over 14433.00 frames. ], tot_loss[loss=0.07974, simple_loss=0.1, pruned_loss=0.01967, audio_tagging_loss=0.01005, over 3034298.09 frames. ], batch size: 54, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:19:09,026 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0 2023-11-20 09:19:16,397 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154350 2023-11-20 09:19:34,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1029086.6666666666, ans=0.125 2023-11-20 09:19:46,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1029086.6666666666, ans=0.0 2023-11-20 09:20:01,393 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10100, loss[loss=0.0835, simple_loss=0.1035, pruned_loss=0.02079, audio_tagging_loss=0.01095, over 15418.00 frames. ], tot_loss[loss=0.08045, simple_loss=0.1009, pruned_loss=0.01995, audio_tagging_loss=0.01004, over 3043844.32 frames. ], batch size: 56, lr: 5.25e-03, grad_scale: 16.0 2023-11-20 09:20:11,478 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2023-11-20 09:20:11,481 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.05 vs. limit=12.0 2023-11-20 09:20:20,388 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154400 2023-11-20 09:20:31,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1029353.3333333334, ans=0.125 2023-11-20 09:20:35,221 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.925e+01 8.287e+01 9.301e+01 1.019e+02 1.504e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-20 09:20:53,517 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 09:20:56,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1029486.6666666666, ans=0.0 2023-11-20 09:20:57,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1029486.6666666666, ans=0.0 2023-11-20 09:21:05,868 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10150, loss[loss=0.103, simple_loss=0.1317, pruned_loss=0.02778, audio_tagging_loss=0.009369, over 16195.00 frames. ], tot_loss[loss=0.08082, simple_loss=0.1016, pruned_loss=0.01997, audio_tagging_loss=0.01007, over 3045575.87 frames. ], batch size: 58, lr: 5.24e-03, grad_scale: 16.0 2023-11-20 09:21:24,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1029620.0, ans=0.125 2023-11-20 09:21:25,610 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154450 2023-11-20 09:21:36,029 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:22:10,484 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10200, loss[loss=0.06794, simple_loss=0.09073, pruned_loss=0.01223, audio_tagging_loss=0.01034, over 15130.00 frames. ], tot_loss[loss=0.08135, simple_loss=0.1022, pruned_loss=0.02026, audio_tagging_loss=0.009995, over 3041588.65 frames. ], batch size: 58, lr: 5.24e-03, grad_scale: 16.0 2023-11-20 09:22:29,566 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154500 2023-11-20 09:22:34,900 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:22:37,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1030020.0, ans=0.1 2023-11-20 09:22:41,604 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.62 vs. 
limit=15.0 2023-11-20 09:22:43,346 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.371e+01 8.118e+01 8.919e+01 9.960e+01 1.274e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-20 09:22:43,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1030020.0, ans=0.09899494936611666 2023-11-20 09:22:53,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1030086.6666666666, ans=0.125 2023-11-20 09:22:53,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=1030086.6666666666, ans=0.1 2023-11-20 09:23:13,983 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10250, loss[loss=0.07166, simple_loss=0.08121, pruned_loss=0.01853, audio_tagging_loss=0.01253, over 15224.00 frames. ], tot_loss[loss=0.08156, simple_loss=0.1025, pruned_loss=0.02034, audio_tagging_loss=0.009984, over 3042312.90 frames. ], batch size: 57, lr: 5.24e-03, grad_scale: 16.0 2023-11-20 09:23:16,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1030220.0, ans=0.2 2023-11-20 09:23:21,756 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0 2023-11-20 09:23:28,737 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2023-11-20 09:23:33,186 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154550 2023-11-20 09:23:49,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1030353.3333333334, ans=0.2 2023-11-20 09:24:06,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1030486.6666666666, ans=0.0 2023-11-20 09:24:16,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1030486.6666666666, ans=0.015 2023-11-20 09:24:19,140 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10300, loss[loss=0.0754, simple_loss=0.0853, pruned_loss=0.01835, audio_tagging_loss=0.0144, over 14892.00 frames. ], tot_loss[loss=0.08183, simple_loss=0.1028, pruned_loss=0.02035, audio_tagging_loss=0.01006, over 3052805.07 frames. ], batch size: 57, lr: 5.24e-03, grad_scale: 16.0 2023-11-20 09:24:21,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1030553.3333333334, ans=0.2 2023-11-20 09:24:38,886 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154600 2023-11-20 09:24:47,831 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.54 vs. 
limit=22.5 2023-11-20 09:24:52,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1030686.6666666666, ans=0.125 2023-11-20 09:24:53,304 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 8.311e+01 8.875e+01 9.603e+01 1.202e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 09:24:55,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1030686.6666666666, ans=0.0 2023-11-20 09:25:04,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1030753.3333333334, ans=0.1 2023-11-20 09:25:17,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1030820.0, ans=0.0 2023-11-20 09:25:24,169 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10350, loss[loss=0.08206, simple_loss=0.1129, pruned_loss=0.01665, audio_tagging_loss=0.00897, over 18106.00 frames. ], tot_loss[loss=0.08121, simple_loss=0.1019, pruned_loss=0.02009, audio_tagging_loss=0.01018, over 3048994.84 frames. ], batch size: 65, lr: 5.24e-03, grad_scale: 16.0 2023-11-20 09:25:42,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1030953.3333333334, ans=0.125 2023-11-20 09:25:43,767 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154650 2023-11-20 09:26:10,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1031086.6666666666, ans=0.125 2023-11-20 09:26:24,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1031153.3333333334, ans=0.125 2023-11-20 09:26:29,375 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10400, loss[loss=0.08554, simple_loss=0.1109, pruned_loss=0.02191, audio_tagging_loss=0.008162, over 16865.00 frames. ], tot_loss[loss=0.08051, simple_loss=0.1009, pruned_loss=0.0198, audio_tagging_loss=0.01025, over 3045444.05 frames. ], batch size: 62, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:26:48,732 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154700 2023-11-20 09:27:03,088 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.158e+01 8.127e+01 8.781e+01 9.645e+01 1.274e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 09:27:34,488 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10450, loss[loss=0.08729, simple_loss=0.1124, pruned_loss=0.02075, audio_tagging_loss=0.01035, over 15904.00 frames. ], tot_loss[loss=0.08031, simple_loss=0.1009, pruned_loss=0.01971, audio_tagging_loss=0.01013, over 3046571.70 frames. 
], batch size: 59, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:27:43,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1031553.3333333334, ans=0.2 2023-11-20 09:27:46,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1031620.0, ans=0.125 2023-11-20 09:27:53,808 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154750 2023-11-20 09:27:59,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1031686.6666666666, ans=0.1 2023-11-20 09:27:59,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1031686.6666666666, ans=0.125 2023-11-20 09:28:02,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=1031686.6666666666, ans=15.0 2023-11-20 09:28:20,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1031753.3333333334, ans=0.125 2023-11-20 09:28:38,634 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10500, loss[loss=0.06909, simple_loss=0.0799, pruned_loss=0.01681, audio_tagging_loss=0.01233, over 14010.00 frames. ], tot_loss[loss=0.0804, simple_loss=0.101, pruned_loss=0.01994, audio_tagging_loss=0.009969, over 3045195.24 frames. ], batch size: 55, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:28:48,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1031886.6666666666, ans=0.125 2023-11-20 09:28:51,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1031953.3333333334, ans=0.125 2023-11-20 09:28:52,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1031953.3333333334, ans=0.1 2023-11-20 09:28:54,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1031953.3333333334, ans=0.2 2023-11-20 09:28:59,036 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154800 2023-11-20 09:29:10,474 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.69 vs. limit=15.0 2023-11-20 09:29:13,388 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.926e+01 8.208e+01 9.112e+01 1.062e+02 1.393e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-20 09:29:13,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1032020.0, ans=0.0 2023-11-20 09:29:22,604 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.54 vs. 
limit=6.0 2023-11-20 09:29:32,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1032153.3333333334, ans=0.0 2023-11-20 09:29:32,940 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:29:45,119 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10550, loss[loss=0.09997, simple_loss=0.1317, pruned_loss=0.02586, audio_tagging_loss=0.008229, over 16265.00 frames. ], tot_loss[loss=0.07979, simple_loss=0.1005, pruned_loss=0.01966, audio_tagging_loss=0.009888, over 3039233.37 frames. ], batch size: 56, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:30:02,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1032286.6666666666, ans=0.0 2023-11-20 09:30:04,324 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154850 2023-11-20 09:30:06,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1032286.6666666666, ans=0.1 2023-11-20 09:30:14,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1032353.3333333334, ans=0.125 2023-11-20 09:30:49,029 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10600, loss[loss=0.08032, simple_loss=0.1015, pruned_loss=0.01898, audio_tagging_loss=0.01061, over 15540.00 frames. ], tot_loss[loss=0.0799, simple_loss=0.1007, pruned_loss=0.01973, audio_tagging_loss=0.009806, over 3042009.77 frames. ], batch size: 59, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:30:49,859 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2023-11-20 09:30:55,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1032553.3333333334, ans=0.0 2023-11-20 09:31:05,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1032620.0, ans=0.09899494936611666 2023-11-20 09:31:08,137 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154900 2023-11-20 09:31:18,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1032686.6666666666, ans=0.07 2023-11-20 09:31:19,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1032686.6666666666, ans=0.125 2023-11-20 09:31:21,990 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.175e+01 8.791e+01 9.542e+01 1.185e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-20 09:31:29,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1032753.3333333334, ans=0.125 2023-11-20 09:31:32,595 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.88 vs. limit=10.0 2023-11-20 09:31:46,486 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.24 vs. 
limit=22.5 2023-11-20 09:31:52,012 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10650, loss[loss=0.1021, simple_loss=0.1262, pruned_loss=0.02913, audio_tagging_loss=0.009868, over 14807.00 frames. ], tot_loss[loss=0.07982, simple_loss=0.1006, pruned_loss=0.01973, audio_tagging_loss=0.009778, over 3049247.39 frames. ], batch size: 53, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:32:12,485 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 154950 2023-11-20 09:32:34,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1033086.6666666666, ans=0.125 2023-11-20 09:32:48,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1033153.3333333334, ans=0.0 2023-11-20 09:32:55,109 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.65 vs. limit=15.0 2023-11-20 09:32:56,689 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10700, loss[loss=0.08342, simple_loss=0.1054, pruned_loss=0.02214, audio_tagging_loss=0.008601, over 15707.00 frames. ], tot_loss[loss=0.07995, simple_loss=0.1006, pruned_loss=0.01974, audio_tagging_loss=0.009884, over 3049114.46 frames. ], batch size: 58, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:32:58,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1033220.0, ans=0.125 2023-11-20 09:33:00,442 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=12.0 2023-11-20 09:33:03,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1033220.0, ans=0.025 2023-11-20 09:33:05,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1033220.0, ans=0.95 2023-11-20 09:33:16,893 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155000 2023-11-20 09:33:30,431 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 8.303e+01 8.834e+01 9.458e+01 1.451e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-20 09:34:02,494 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10750, loss[loss=0.08634, simple_loss=0.1108, pruned_loss=0.02339, audio_tagging_loss=0.007552, over 15030.00 frames. ], tot_loss[loss=0.07933, simple_loss=0.09965, pruned_loss=0.01959, audio_tagging_loss=0.009921, over 3043603.67 frames. ], batch size: 57, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:34:02,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1033553.3333333334, ans=0.1 2023-11-20 09:34:12,956 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.62 vs. limit=10.0 2023-11-20 09:34:21,032 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155050 2023-11-20 09:34:24,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1033620.0, ans=0.0 2023-11-20 09:34:41,454 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.48 vs. 
limit=15.0 2023-11-20 09:34:48,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1033753.3333333334, ans=0.0 2023-11-20 09:34:51,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1033753.3333333334, ans=0.0 2023-11-20 09:35:01,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1033820.0, ans=0.125 2023-11-20 09:35:06,221 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10800, loss[loss=0.08046, simple_loss=0.1044, pruned_loss=0.01923, audio_tagging_loss=0.009048, over 14398.00 frames. ], tot_loss[loss=0.07887, simple_loss=0.09909, pruned_loss=0.01942, audio_tagging_loss=0.009898, over 3041180.00 frames. ], batch size: 53, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:35:26,113 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155100 2023-11-20 09:35:29,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1033953.3333333334, ans=0.0 2023-11-20 09:35:39,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1034020.0, ans=0.0 2023-11-20 09:35:40,781 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.539e+01 8.046e+01 8.532e+01 9.175e+01 1.216e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-20 09:35:53,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1034086.6666666666, ans=0.2 2023-11-20 09:36:11,290 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10850, loss[loss=0.09045, simple_loss=0.1149, pruned_loss=0.02233, audio_tagging_loss=0.01065, over 15569.00 frames. ], tot_loss[loss=0.07896, simple_loss=0.09903, pruned_loss=0.01945, audio_tagging_loss=0.009994, over 3043171.93 frames. ], batch size: 58, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:36:32,058 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155150 2023-11-20 09:36:58,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1034420.0, ans=0.0 2023-11-20 09:37:00,540 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0 2023-11-20 09:37:05,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1034486.6666666666, ans=0.125 2023-11-20 09:37:06,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1034486.6666666666, ans=0.125 2023-11-20 09:37:12,659 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:37:16,323 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10900, loss[loss=0.0958, simple_loss=0.1223, pruned_loss=0.02439, audio_tagging_loss=0.01025, over 15563.00 frames. 
], tot_loss[loss=0.07975, simple_loss=0.09996, pruned_loss=0.01978, audio_tagging_loss=0.009998, over 3046621.99 frames. ], batch size: 57, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:37:35,751 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155200 2023-11-20 09:37:39,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1034620.0, ans=0.125 2023-11-20 09:37:50,144 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.033e+01 8.200e+01 8.772e+01 9.722e+01 1.243e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 09:37:50,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1034686.6666666666, ans=0.04949747468305833 2023-11-20 09:38:02,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1034753.3333333334, ans=0.05 2023-11-20 09:38:12,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1034820.0, ans=0.125 2023-11-20 09:38:20,709 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 10950, loss[loss=0.07667, simple_loss=0.0927, pruned_loss=0.02088, audio_tagging_loss=0.009439, over 15783.00 frames. ], tot_loss[loss=0.07987, simple_loss=0.1002, pruned_loss=0.01975, audio_tagging_loss=0.01005, over 3041633.92 frames. ], batch size: 61, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:38:36,856 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.23 vs. limit=15.0 2023-11-20 09:38:39,906 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155250 2023-11-20 09:38:48,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1035020.0, ans=0.0 2023-11-20 09:38:51,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1035020.0, ans=0.025 2023-11-20 09:39:06,881 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.23 vs. limit=15.0 2023-11-20 09:39:17,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1035153.3333333334, ans=0.0 2023-11-20 09:39:24,993 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11000, loss[loss=0.0877, simple_loss=0.1036, pruned_loss=0.02392, audio_tagging_loss=0.01197, over 13728.00 frames. ], tot_loss[loss=0.08029, simple_loss=0.1008, pruned_loss=0.0198, audio_tagging_loss=0.01007, over 3035618.02 frames. ], batch size: 55, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:39:29,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1035220.0, ans=0.2 2023-11-20 09:39:35,547 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 09:39:44,331 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155300 2023-11-20 09:39:55,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1035353.3333333334, ans=0.0 2023-11-20 09:39:59,018 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.120e+01 8.817e+01 9.505e+01 1.234e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-20 09:40:11,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1035420.0, ans=0.125 2023-11-20 09:40:14,101 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0 2023-11-20 09:40:24,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1035486.6666666666, ans=0.125 2023-11-20 09:40:29,854 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11050, loss[loss=0.06811, simple_loss=0.08719, pruned_loss=0.01349, audio_tagging_loss=0.01102, over 16006.00 frames. ], tot_loss[loss=0.08058, simple_loss=0.1014, pruned_loss=0.0199, audio_tagging_loss=0.009976, over 3038273.98 frames. ], batch size: 62, lr: 5.23e-03, grad_scale: 16.0 2023-11-20 09:40:42,704 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:40:49,284 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.24 vs. limit=15.0 2023-11-20 09:40:49,945 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155350 2023-11-20 09:41:15,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1035753.3333333334, ans=0.125 2023-11-20 09:41:28,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1035820.0, ans=0.0 2023-11-20 09:41:34,380 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0 2023-11-20 09:41:34,887 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11100, loss[loss=0.07039, simple_loss=0.08405, pruned_loss=0.01641, audio_tagging_loss=0.01195, over 15352.00 frames. ], tot_loss[loss=0.08061, simple_loss=0.1013, pruned_loss=0.01995, audio_tagging_loss=0.01, over 3047446.91 frames. ], batch size: 57, lr: 5.23e-03, grad_scale: 16.0 2023-11-20 09:41:54,090 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155400 2023-11-20 09:42:00,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1036020.0, ans=0.125 2023-11-20 09:42:00,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1036020.0, ans=0.125 2023-11-20 09:42:09,689 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.254e+01 9.028e+01 9.759e+01 1.162e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-20 09:42:39,720 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11150, loss[loss=0.09433, simple_loss=0.1126, pruned_loss=0.02697, audio_tagging_loss=0.01104, over 14428.00 frames. 
], tot_loss[loss=0.08055, simple_loss=0.1011, pruned_loss=0.01983, audio_tagging_loss=0.01018, over 3053685.46 frames. ], batch size: 52, lr: 5.23e-03, grad_scale: 16.0 2023-11-20 09:42:58,728 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155450 2023-11-20 09:43:16,072 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:43:26,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1036420.0, ans=0.0 2023-11-20 09:43:44,256 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11200, loss[loss=0.09814, simple_loss=0.1184, pruned_loss=0.02501, audio_tagging_loss=0.01393, over 15562.00 frames. ], tot_loss[loss=0.08028, simple_loss=0.1009, pruned_loss=0.01966, audio_tagging_loss=0.01017, over 3060164.53 frames. ], batch size: 56, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:43:53,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1036553.3333333334, ans=0.125 2023-11-20 09:44:03,203 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155500 2023-11-20 09:44:05,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1036620.0, ans=0.0 2023-11-20 09:44:19,034 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.839e+01 8.019e+01 8.498e+01 9.323e+01 1.224e+02, threshold=1.700e+02, percent-clipped=0.0 2023-11-20 09:44:32,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1036753.3333333334, ans=0.125 2023-11-20 09:44:48,551 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11250, loss[loss=0.07998, simple_loss=0.105, pruned_loss=0.0199, audio_tagging_loss=0.007607, over 14497.00 frames. ], tot_loss[loss=0.08014, simple_loss=0.1005, pruned_loss=0.01976, audio_tagging_loss=0.01014, over 3064602.34 frames. ], batch size: 54, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:44:57,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1036886.6666666666, ans=0.125 2023-11-20 09:45:05,284 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.15 vs. limit=15.0 2023-11-20 09:45:08,246 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155550 2023-11-20 09:45:12,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1036953.3333333334, ans=0.125 2023-11-20 09:45:14,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1037020.0, ans=10.0 2023-11-20 09:45:27,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1037086.6666666666, ans=0.1 2023-11-20 09:45:53,989 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11300, loss[loss=0.08567, simple_loss=0.106, pruned_loss=0.02304, audio_tagging_loss=0.009614, over 15229.00 frames. ], tot_loss[loss=0.0799, simple_loss=0.1004, pruned_loss=0.01971, audio_tagging_loss=0.01, over 3060227.30 frames. 
], batch size: 57, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:45:59,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1037220.0, ans=0.2 2023-11-20 09:46:13,185 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155600 2023-11-20 09:46:21,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1037353.3333333334, ans=0.125 2023-11-20 09:46:28,937 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 8.236e+01 9.129e+01 9.698e+01 1.564e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-20 09:46:52,753 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:46:59,334 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11350, loss[loss=0.07231, simple_loss=0.09767, pruned_loss=0.01363, audio_tagging_loss=0.009847, over 14644.00 frames. ], tot_loss[loss=0.07996, simple_loss=0.1006, pruned_loss=0.01986, audio_tagging_loss=0.009825, over 3056643.24 frames. ], batch size: 55, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:47:16,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1037620.0, ans=0.0 2023-11-20 09:47:18,672 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155650 2023-11-20 09:47:21,343 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:47:21,798 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0 2023-11-20 09:47:40,330 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.98 vs. limit=22.5 2023-11-20 09:47:48,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1037753.3333333334, ans=0.125 2023-11-20 09:48:01,605 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.09 vs. limit=15.0 2023-11-20 09:48:02,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1037820.0, ans=0.1 2023-11-20 09:48:04,513 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11400, loss[loss=0.1, simple_loss=0.1186, pruned_loss=0.0327, audio_tagging_loss=0.008023, over 14578.00 frames. ], tot_loss[loss=0.07962, simple_loss=0.1002, pruned_loss=0.01973, audio_tagging_loss=0.009795, over 3041620.08 frames. ], batch size: 54, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:48:08,994 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2023-11-20 09:48:13,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1037886.6666666666, ans=0.0 2023-11-20 09:48:24,553 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155700 2023-11-20 09:48:28,870 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.16 vs. 
limit=15.0 2023-11-20 09:48:39,689 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.071e+01 7.956e+01 8.738e+01 9.892e+01 2.201e+02, threshold=1.748e+02, percent-clipped=1.0 2023-11-20 09:48:47,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1038086.6666666666, ans=0.125 2023-11-20 09:48:49,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1038086.6666666666, ans=0.0 2023-11-20 09:48:50,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1038086.6666666666, ans=0.2 2023-11-20 09:49:09,133 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11450, loss[loss=0.07166, simple_loss=0.09389, pruned_loss=0.01491, audio_tagging_loss=0.009808, over 15955.00 frames. ], tot_loss[loss=0.07947, simple_loss=0.1002, pruned_loss=0.01961, audio_tagging_loss=0.009751, over 3042526.35 frames. ], batch size: 60, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:49:23,293 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2023-11-20 09:49:24,204 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-11-20 09:49:27,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1038286.6666666666, ans=0.0 2023-11-20 09:49:28,748 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155750 2023-11-20 09:49:31,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1038286.6666666666, ans=0.0 2023-11-20 09:49:40,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1038353.3333333334, ans=0.125 2023-11-20 09:50:04,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1038486.6666666666, ans=0.2 2023-11-20 09:50:13,897 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11500, loss[loss=0.07168, simple_loss=0.0937, pruned_loss=0.01507, audio_tagging_loss=0.009767, over 15838.00 frames. ], tot_loss[loss=0.07984, simple_loss=0.1007, pruned_loss=0.01976, audio_tagging_loss=0.009705, over 3041231.60 frames. ], batch size: 58, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:50:18,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1038553.3333333334, ans=0.125 2023-11-20 09:50:20,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1038553.3333333334, ans=0.1 2023-11-20 09:50:22,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1038553.3333333334, ans=0.125 2023-11-20 09:50:31,162 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.62 vs. 
limit=15.0 2023-11-20 09:50:32,905 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155800 2023-11-20 09:50:37,236 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:50:47,690 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.235e+01 8.586e+01 9.090e+01 1.242e+02, threshold=1.717e+02, percent-clipped=0.0 2023-11-20 09:50:59,097 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.38 vs. limit=12.0 2023-11-20 09:51:18,153 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11550, loss[loss=0.08734, simple_loss=0.1067, pruned_loss=0.02067, audio_tagging_loss=0.01334, over 15450.00 frames. ], tot_loss[loss=0.08015, simple_loss=0.101, pruned_loss=0.01986, audio_tagging_loss=0.009802, over 3039812.93 frames. ], batch size: 57, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:51:24,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1038886.6666666666, ans=0.2 2023-11-20 09:51:30,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1038953.3333333334, ans=0.1 2023-11-20 09:51:36,545 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155850 2023-11-20 09:51:51,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1039020.0, ans=0.125 2023-11-20 09:51:54,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1039020.0, ans=0.2 2023-11-20 09:51:56,506 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:52:00,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1039086.6666666666, ans=0.125 2023-11-20 09:52:00,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1039086.6666666666, ans=0.09899494936611666 2023-11-20 09:52:05,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1039086.6666666666, ans=0.125 2023-11-20 09:52:05,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1039086.6666666666, ans=0.125 2023-11-20 09:52:14,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1039153.3333333334, ans=0.125 2023-11-20 09:52:21,328 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11600, loss[loss=0.07588, simple_loss=0.08209, pruned_loss=0.02182, audio_tagging_loss=0.01302, over 14388.00 frames. ], tot_loss[loss=0.07994, simple_loss=0.1005, pruned_loss=0.01979, audio_tagging_loss=0.009905, over 3045193.78 frames. 
], batch size: 57, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:52:21,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1039220.0, ans=0.1 2023-11-20 09:52:22,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1039220.0, ans=0.125 2023-11-20 09:52:41,571 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155900 2023-11-20 09:52:45,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1039286.6666666666, ans=0.125 2023-11-20 09:52:56,403 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.551e+01 8.203e+01 8.943e+01 9.744e+01 1.251e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-20 09:53:24,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1039553.3333333334, ans=0.0 2023-11-20 09:53:25,674 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11650, loss[loss=0.0841, simple_loss=0.101, pruned_loss=0.02051, audio_tagging_loss=0.01309, over 14448.00 frames. ], tot_loss[loss=0.07998, simple_loss=0.1005, pruned_loss=0.01976, audio_tagging_loss=0.009992, over 3047260.47 frames. ], batch size: 55, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:53:29,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1039553.3333333334, ans=0.125 2023-11-20 09:53:33,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1039553.3333333334, ans=0.125 2023-11-20 09:53:45,328 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 155950 2023-11-20 09:53:56,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1039686.6666666666, ans=0.0 2023-11-20 09:54:03,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1039753.3333333334, ans=0.0 2023-11-20 09:54:24,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1039820.0, ans=0.125 2023-11-20 09:54:30,840 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11700, loss[loss=0.0777, simple_loss=0.1093, pruned_loss=0.01527, audio_tagging_loss=0.007808, over 15717.00 frames. ], tot_loss[loss=0.07968, simple_loss=0.1003, pruned_loss=0.0196, audio_tagging_loss=0.009939, over 3048002.62 frames. 
], batch size: 58, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:54:38,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1039886.6666666666, ans=0.125 2023-11-20 09:54:46,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1039953.3333333334, ans=0.125 2023-11-20 09:54:49,305 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156000 2023-11-20 09:54:58,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1040020.0, ans=0.0 2023-11-20 09:55:09,545 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.408e+01 8.314e+01 9.144e+01 1.029e+02 1.424e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-20 09:55:19,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1040086.6666666666, ans=0.0 2023-11-20 09:55:33,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1040153.3333333334, ans=0.2 2023-11-20 09:55:38,365 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11750, loss[loss=0.08623, simple_loss=0.1011, pruned_loss=0.02648, audio_tagging_loss=0.009196, over 14085.00 frames. ], tot_loss[loss=0.0797, simple_loss=0.09998, pruned_loss=0.01974, audio_tagging_loss=0.009979, over 3045079.64 frames. ], batch size: 53, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:55:41,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1040220.0, ans=0.0 2023-11-20 09:55:49,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1040220.0, ans=0.0 2023-11-20 09:55:58,509 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156050 2023-11-20 09:56:09,077 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.10 vs. limit=10.0 2023-11-20 09:56:42,572 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11800, loss[loss=0.0669, simple_loss=0.07463, pruned_loss=0.01868, audio_tagging_loss=0.0109, over 14901.00 frames. ], tot_loss[loss=0.07966, simple_loss=0.1001, pruned_loss=0.01962, audio_tagging_loss=0.009988, over 3049054.76 frames. ], batch size: 56, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:56:45,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1040553.3333333334, ans=0.1 2023-11-20 09:56:57,754 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.45 vs. limit=15.0 2023-11-20 09:57:02,653 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156100 2023-11-20 09:57:17,069 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.890e+01 8.238e+01 8.781e+01 9.493e+01 1.182e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 09:57:46,289 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11850, loss[loss=0.07368, simple_loss=0.09231, pruned_loss=0.01684, audio_tagging_loss=0.01068, over 15958.00 frames. ], tot_loss[loss=0.08004, simple_loss=0.1005, pruned_loss=0.01972, audio_tagging_loss=0.01005, over 3046179.52 frames. 
], batch size: 60, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:58:05,472 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156150 2023-11-20 09:58:20,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1041020.0, ans=0.125 2023-11-20 09:58:38,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1041153.3333333334, ans=0.2 2023-11-20 09:58:50,141 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11900, loss[loss=0.07702, simple_loss=0.09312, pruned_loss=0.01782, audio_tagging_loss=0.01263, over 15791.00 frames. ], tot_loss[loss=0.07947, simple_loss=0.09926, pruned_loss=0.01956, audio_tagging_loss=0.01027, over 3048049.53 frames. ], batch size: 58, lr: 5.21e-03, grad_scale: 32.0 2023-11-20 09:59:09,374 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156200 2023-11-20 09:59:19,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1041353.3333333334, ans=0.5 2023-11-20 09:59:24,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1041353.3333333334, ans=0.125 2023-11-20 09:59:25,443 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.397e+01 8.078e+01 8.558e+01 9.294e+01 1.166e+02, threshold=1.712e+02, percent-clipped=0.0 2023-11-20 09:59:38,025 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0 2023-11-20 09:59:51,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1041486.6666666666, ans=0.125 2023-11-20 09:59:54,109 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 11950, loss[loss=0.09741, simple_loss=0.1216, pruned_loss=0.02655, audio_tagging_loss=0.01004, over 15602.00 frames. ], tot_loss[loss=0.07971, simple_loss=0.09946, pruned_loss=0.01956, audio_tagging_loss=0.01041, over 3051864.54 frames. ], batch size: 56, lr: 5.21e-03, grad_scale: 32.0 2023-11-20 10:00:05,172 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2023-11-20 10:00:07,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1041620.0, ans=0.07 2023-11-20 10:00:14,334 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156250 2023-11-20 10:00:29,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1041686.6666666666, ans=0.125 2023-11-20 10:00:42,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1041753.3333333334, ans=0.1 2023-11-20 10:00:46,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1041820.0, ans=0.05 2023-11-20 10:00:47,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1041820.0, ans=0.125 2023-11-20 10:00:56,372 INFO [train_asr.py:1262] (2/4) Epoch 13, batch 12000, loss[loss=0.04711, simple_loss=0.0542, pruned_loss=0.01007, audio_tagging_loss=0.009939, over 15631.00 frames. 
], tot_loss[loss=0.07919, simple_loss=0.09912, pruned_loss=0.01918, audio_tagging_loss=0.01044, over 3054050.80 frames. ], batch size: 62, lr: 5.21e-03, grad_scale: 32.0 2023-11-20 10:00:56,372 INFO [train_asr.py:1285] (2/4) Computing validation loss 2023-11-20 10:01:16,390 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8088, 4.4368, 3.8340, 4.3912], device='cuda:2') 2023-11-20 10:01:28,103 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1697, 4.1492, 4.3860, 4.4725], device='cuda:2') 2023-11-20 10:01:36,809 INFO [train_asr.py:1294] (2/4) Epoch 13, validation: loss=0.0624, simple_loss=0.05383, pruned_loss=0.00582, audio_tagging_loss=0.02967, over 4681554.00 frames. 2023-11-20 10:01:36,810 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB 2023-11-20 10:01:43,897 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:01:45,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1041886.6666666666, ans=0.05 2023-11-20 10:01:54,251 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156300 2023-11-20 10:02:02,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1042020.0, ans=0.2 2023-11-20 10:02:41,377 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 0, loss[loss=0.09807, simple_loss=0.1098, pruned_loss=0.02074, audio_tagging_loss=0.02242, over 13947.00 frames. ], tot_loss[loss=0.09807, simple_loss=0.1098, pruned_loss=0.02074, audio_tagging_loss=0.02242, over 13947.00 frames. ], batch size: 53, lr: 5.02e-03, grad_scale: 32.0 2023-11-20 10:02:41,378 INFO [train_asr.py:1285] (2/4) Computing validation loss 2023-11-20 10:03:13,590 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9991, 3.1962, 2.9207, 3.1029, 3.4925, 2.7068, 3.3921, 2.6817], device='cuda:2') 2023-11-20 10:03:18,489 INFO [train_asr.py:1294] (2/4) Epoch 14, validation: loss=0.0621, simple_loss=0.05383, pruned_loss=0.005845, audio_tagging_loss=0.02934, over 4681554.00 frames. 
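Note on the attn_weights_entropy tensors dumped by zipformer.py during the two validation passes above: they are a per-head diagnostic of how peaked each self-attention distribution is. A uniform distribution over N source positions has entropy ln N, so values around 4 to 5 nats suggest heads attending broadly over on the order of a hundred positions. A minimal sketch of such a diagnostic, assuming attention weights of shape (num_heads, batch, tgt_len, src_len) whose rows sum to 1 (the exact reduction zipformer.py applies may differ):

    import torch

    def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
        """Entropy in nats of each attention row, averaged over batch and
        target positions, returning one value per head (matching the 4- or
        8-entry tensors printed above for 4- and 8-head layers)."""
        eps = 1.0e-20  # guard against log(0) on fully masked rows
        ent = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
        return ent.mean(dim=(1, 2))

The validation records above report the same component losses as the training records (simple_loss, pruned_loss, audio_tagging_loss), but accumulated over the full 4681554-frame validation set rather than the running aggregate of roughly 3M frames shown in tot_loss.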
2023-11-20 10:03:18,490 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB 2023-11-20 10:03:22,230 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.506e+01 8.326e+01 8.983e+01 9.877e+01 1.645e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-20 10:03:22,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1042046.6666666666, ans=0.2 2023-11-20 10:03:48,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1042180.0, ans=0.125 2023-11-20 10:04:03,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1042246.6666666666, ans=0.125 2023-11-20 10:04:12,145 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156350 2023-11-20 10:04:13,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1042313.3333333334, ans=0.09899494936611666 2023-11-20 10:04:23,735 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 50, loss[loss=0.08417, simple_loss=0.09571, pruned_loss=0.01498, audio_tagging_loss=0.02133, over 16026.00 frames. ], tot_loss[loss=0.09133, simple_loss=0.1038, pruned_loss=0.02046, audio_tagging_loss=0.01898, over 687728.82 frames. ], batch size: 58, lr: 5.02e-03, grad_scale: 32.0 2023-11-20 10:04:27,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1042380.0, ans=0.0 2023-11-20 10:04:42,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1042446.6666666666, ans=0.125 2023-11-20 10:04:55,389 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2023-11-20 10:05:03,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1042580.0, ans=0.1 2023-11-20 10:05:07,371 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.41 vs. limit=8.0 2023-11-20 10:05:16,911 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156400 2023-11-20 10:05:18,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1042646.6666666666, ans=0.0 2023-11-20 10:05:24,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1042646.6666666666, ans=0.125 2023-11-20 10:05:29,531 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 100, loss[loss=0.08327, simple_loss=0.1032, pruned_loss=0.01487, audio_tagging_loss=0.01679, over 14629.00 frames. ], tot_loss[loss=0.09089, simple_loss=0.1037, pruned_loss=0.02057, audio_tagging_loss=0.01849, over 1210991.10 frames. 
], batch size: 54, lr: 5.02e-03, grad_scale: 32.0 2023-11-20 10:05:33,222 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.681e+01 9.274e+01 1.011e+02 1.384e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-20 10:05:33,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1042713.3333333334, ans=0.125 2023-11-20 10:06:03,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1042846.6666666666, ans=0.125 2023-11-20 10:06:21,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1042980.0, ans=0.09899494936611666 2023-11-20 10:06:22,027 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156450 2023-11-20 10:06:33,121 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 150, loss[loss=0.1231, simple_loss=0.1486, pruned_loss=0.03467, audio_tagging_loss=0.0141, over 15426.00 frames. ], tot_loss[loss=0.08784, simple_loss=0.1019, pruned_loss=0.02023, audio_tagging_loss=0.01665, over 1616175.65 frames. ], batch size: 58, lr: 5.02e-03, grad_scale: 32.0 2023-11-20 10:06:41,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=1043046.6666666666, ans=0.025 2023-11-20 10:07:08,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1043180.0, ans=0.125 2023-11-20 10:07:09,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1043180.0, ans=0.125 2023-11-20 10:07:12,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1043246.6666666666, ans=0.125 2023-11-20 10:07:25,487 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.22 vs. limit=15.0 2023-11-20 10:07:27,245 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156500 2023-11-20 10:07:35,089 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.48 vs. limit=22.5 2023-11-20 10:07:36,377 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.95 vs. limit=6.0 2023-11-20 10:07:38,217 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 200, loss[loss=0.05854, simple_loss=0.07486, pruned_loss=0.01055, audio_tagging_loss=0.01057, over 15578.00 frames. ], tot_loss[loss=0.08503, simple_loss=0.1007, pruned_loss=0.01992, audio_tagging_loss=0.01477, over 1938504.14 frames. 
], batch size: 58, lr: 5.02e-03, grad_scale: 32.0 2023-11-20 10:07:42,547 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.224e+01 9.022e+01 9.818e+01 1.305e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-20 10:08:14,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1043513.3333333334, ans=0.0 2023-11-20 10:08:19,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1043580.0, ans=0.0 2023-11-20 10:08:27,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1043580.0, ans=0.125 2023-11-20 10:08:31,961 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156550 2023-11-20 10:08:35,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1043646.6666666666, ans=0.125 2023-11-20 10:08:43,615 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 250, loss[loss=0.09529, simple_loss=0.1198, pruned_loss=0.02422, audio_tagging_loss=0.01118, over 14511.00 frames. ], tot_loss[loss=0.08428, simple_loss=0.102, pruned_loss=0.02004, audio_tagging_loss=0.01322, over 2186232.04 frames. ], batch size: 54, lr: 5.02e-03, grad_scale: 16.0 2023-11-20 10:09:07,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1043780.0, ans=0.125 2023-11-20 10:09:09,271 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.08 vs. limit=22.5 2023-11-20 10:09:14,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.17 vs. limit=15.0 2023-11-20 10:09:15,390 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2023-11-20 10:09:21,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1043913.3333333334, ans=0.0 2023-11-20 10:09:35,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1043980.0, ans=0.0 2023-11-20 10:09:37,017 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156600 2023-11-20 10:09:48,909 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 300, loss[loss=0.06354, simple_loss=0.07896, pruned_loss=0.01202, audio_tagging_loss=0.01205, over 14665.00 frames. ], tot_loss[loss=0.08295, simple_loss=0.1015, pruned_loss=0.01989, audio_tagging_loss=0.0123, over 2374087.84 frames. 
], batch size: 56, lr: 5.02e-03, grad_scale: 16.0 2023-11-20 10:09:54,297 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.950e+01 8.223e+01 8.932e+01 9.585e+01 1.475e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-20 10:10:01,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1044113.3333333334, ans=0.0 2023-11-20 10:10:18,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1044180.0, ans=0.0 2023-11-20 10:10:42,176 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156650 2023-11-20 10:10:46,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1044313.3333333334, ans=10.0 2023-11-20 10:10:53,861 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 350, loss[loss=0.1092, simple_loss=0.1455, pruned_loss=0.02642, audio_tagging_loss=0.01002, over 14734.00 frames. ], tot_loss[loss=0.08274, simple_loss=0.1022, pruned_loss=0.02004, audio_tagging_loss=0.01162, over 2526514.89 frames. ], batch size: 54, lr: 5.02e-03, grad_scale: 4.0 2023-11-20 10:11:06,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1044446.6666666666, ans=0.125 2023-11-20 10:11:13,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1044446.6666666666, ans=0.125 2023-11-20 10:11:18,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1044513.3333333334, ans=0.125 2023-11-20 10:11:20,392 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.31 vs. limit=15.0 2023-11-20 10:11:21,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1044513.3333333334, ans=0.125 2023-11-20 10:11:30,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1044513.3333333334, ans=0.07 2023-11-20 10:11:30,442 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:11:46,514 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156700 2023-11-20 10:11:58,254 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 400, loss[loss=0.08776, simple_loss=0.1093, pruned_loss=0.02328, audio_tagging_loss=0.009823, over 15417.00 frames. ], tot_loss[loss=0.08226, simple_loss=0.1022, pruned_loss=0.02, audio_tagging_loss=0.01116, over 2640657.18 frames. 
], batch size: 55, lr: 5.02e-03, grad_scale: 8.0 2023-11-20 10:12:06,229 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.716e+01 8.326e+01 8.879e+01 9.512e+01 2.019e+02, threshold=1.776e+02, percent-clipped=1.0 2023-11-20 10:12:14,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1044780.0, ans=0.125 2023-11-20 10:12:14,568 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:12:15,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1044780.0, ans=0.125 2023-11-20 10:12:33,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1044846.6666666666, ans=0.0 2023-11-20 10:12:45,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1044913.3333333334, ans=10.0 2023-11-20 10:12:45,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1044913.3333333334, ans=0.125 2023-11-20 10:12:51,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1044980.0, ans=0.025 2023-11-20 10:12:52,211 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156750 2023-11-20 10:12:57,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1044980.0, ans=0.2 2023-11-20 10:13:03,932 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 450, loss[loss=0.0835, simple_loss=0.1151, pruned_loss=0.01772, audio_tagging_loss=0.008251, over 15123.00 frames. ], tot_loss[loss=0.08235, simple_loss=0.1028, pruned_loss=0.02013, audio_tagging_loss=0.01083, over 2728389.58 frames. ], batch size: 55, lr: 5.02e-03, grad_scale: 8.0 2023-11-20 10:13:08,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1045046.6666666666, ans=0.07 2023-11-20 10:13:41,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1045180.0, ans=0.1 2023-11-20 10:13:46,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1045246.6666666666, ans=0.5 2023-11-20 10:13:57,626 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156800 2023-11-20 10:14:09,376 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 500, loss[loss=0.05526, simple_loss=0.06625, pruned_loss=0.01104, audio_tagging_loss=0.0111, over 15442.00 frames. ], tot_loss[loss=0.08174, simple_loss=0.1021, pruned_loss=0.02009, audio_tagging_loss=0.01061, over 2801848.55 frames. 
], batch size: 58, lr: 5.02e-03, grad_scale: 8.0 2023-11-20 10:14:16,734 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 8.307e+01 8.961e+01 9.765e+01 1.460e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-20 10:14:34,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1045513.3333333334, ans=0.125 2023-11-20 10:14:36,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1045513.3333333334, ans=0.1 2023-11-20 10:14:47,860 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2023-11-20 10:14:51,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1045580.0, ans=0.125 2023-11-20 10:15:02,748 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156850 2023-11-20 10:15:02,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1045646.6666666666, ans=0.09899494936611666 2023-11-20 10:15:14,090 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2023-11-20 10:15:14,481 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 550, loss[loss=0.08346, simple_loss=0.1091, pruned_loss=0.02002, audio_tagging_loss=0.008885, over 16646.00 frames. ], tot_loss[loss=0.0811, simple_loss=0.1013, pruned_loss=0.01994, audio_tagging_loss=0.01049, over 2850677.43 frames. ], batch size: 63, lr: 5.01e-03, grad_scale: 8.0 2023-11-20 10:15:18,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1045713.3333333334, ans=0.125 2023-11-20 10:16:04,780 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.27 vs. limit=15.0 2023-11-20 10:16:08,717 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156900 2023-11-20 10:16:17,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1045980.0, ans=0.125 2023-11-20 10:16:19,730 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 600, loss[loss=0.06097, simple_loss=0.07248, pruned_loss=0.01151, audio_tagging_loss=0.01323, over 14710.00 frames. ], tot_loss[loss=0.08015, simple_loss=0.1001, pruned_loss=0.01963, audio_tagging_loss=0.01047, over 2896595.65 frames. 
], batch size: 55, lr: 5.01e-03, grad_scale: 8.0 2023-11-20 10:16:21,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1046046.6666666666, ans=0.5 2023-11-20 10:16:27,236 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 7.933e+01 8.592e+01 9.443e+01 1.249e+02, threshold=1.718e+02, percent-clipped=0.0 2023-11-20 10:16:27,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1046046.6666666666, ans=0.2 2023-11-20 10:16:27,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1046046.6666666666, ans=0.0 2023-11-20 10:16:38,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1046113.3333333334, ans=0.2 2023-11-20 10:16:39,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1046113.3333333334, ans=0.125 2023-11-20 10:17:13,201 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 156950 2023-11-20 10:17:20,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1046313.3333333334, ans=0.09899494936611666 2023-11-20 10:17:24,843 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 650, loss[loss=0.09775, simple_loss=0.1164, pruned_loss=0.02759, audio_tagging_loss=0.01198, over 15962.00 frames. ], tot_loss[loss=0.08001, simple_loss=0.1002, pruned_loss=0.01951, audio_tagging_loss=0.01039, over 2936665.54 frames. ], batch size: 58, lr: 5.01e-03, grad_scale: 8.0 2023-11-20 10:17:54,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1046513.3333333334, ans=0.125 2023-11-20 10:18:08,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1046580.0, ans=0.1 2023-11-20 10:18:10,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1046580.0, ans=0.1 2023-11-20 10:18:11,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1046580.0, ans=0.125 2023-11-20 10:18:16,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1046646.6666666666, ans=0.125 2023-11-20 10:18:18,344 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157000 2023-11-20 10:18:30,371 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 700, loss[loss=0.05937, simple_loss=0.07506, pruned_loss=0.0133, audio_tagging_loss=0.008538, over 15376.00 frames. ], tot_loss[loss=0.08032, simple_loss=0.1006, pruned_loss=0.01971, audio_tagging_loss=0.01032, over 2954156.11 frames. 
], batch size: 59, lr: 5.01e-03, grad_scale: 8.0 2023-11-20 10:18:38,468 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.004e+01 8.472e+01 9.225e+01 1.029e+02 2.197e+02, threshold=1.845e+02, percent-clipped=1.0 2023-11-20 10:18:42,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1046780.0, ans=0.125 2023-11-20 10:18:45,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1046780.0, ans=0.04949747468305833 2023-11-20 10:18:48,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1046780.0, ans=0.125 2023-11-20 10:18:49,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1046780.0, ans=0.125 2023-11-20 10:18:52,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1046780.0, ans=0.125 2023-11-20 10:19:23,869 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157050 2023-11-20 10:19:27,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1046980.0, ans=0.125 2023-11-20 10:19:35,563 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 750, loss[loss=0.08217, simple_loss=0.1031, pruned_loss=0.018, audio_tagging_loss=0.01263, over 14386.00 frames. ], tot_loss[loss=0.08054, simple_loss=0.1008, pruned_loss=0.01978, audio_tagging_loss=0.01036, over 2984609.79 frames. ], batch size: 54, lr: 5.01e-03, grad_scale: 8.0 2023-11-20 10:19:39,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1047046.6666666666, ans=0.125 2023-11-20 10:19:42,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1047046.6666666666, ans=0.0 2023-11-20 10:19:44,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1047046.6666666666, ans=0.0 2023-11-20 10:20:21,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1047246.6666666666, ans=0.125 2023-11-20 10:20:29,174 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157100 2023-11-20 10:20:40,827 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 800, loss[loss=0.06665, simple_loss=0.08736, pruned_loss=0.01403, audio_tagging_loss=0.008937, over 14695.00 frames. ], tot_loss[loss=0.08029, simple_loss=0.1007, pruned_loss=0.01965, audio_tagging_loss=0.01028, over 2997813.63 frames. 
], batch size: 55, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:20:49,080 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.237e+01 8.953e+01 9.687e+01 1.353e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-20 10:21:08,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1047513.3333333334, ans=0.125 2023-11-20 10:21:13,240 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:21:34,763 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157150 2023-11-20 10:21:44,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1047646.6666666666, ans=0.2 2023-11-20 10:21:46,960 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 850, loss[loss=0.08094, simple_loss=0.09918, pruned_loss=0.02102, audio_tagging_loss=0.01033, over 15618.00 frames. ], tot_loss[loss=0.08053, simple_loss=0.1011, pruned_loss=0.01969, audio_tagging_loss=0.01031, over 3008113.73 frames. ], batch size: 59, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:22:05,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1047780.0, ans=0.125 2023-11-20 10:22:16,033 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:22:22,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1047846.6666666666, ans=0.1 2023-11-20 10:22:28,491 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=12.0 2023-11-20 10:22:39,782 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157200 2023-11-20 10:22:39,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1047980.0, ans=0.0 2023-11-20 10:22:46,548 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.47 vs. limit=15.0 2023-11-20 10:22:48,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1047980.0, ans=0.0 2023-11-20 10:22:48,830 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.88 vs. limit=15.0 2023-11-20 10:22:51,806 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 900, loss[loss=0.06247, simple_loss=0.06125, pruned_loss=0.01562, audio_tagging_loss=0.01622, over 16541.00 frames. ], tot_loss[loss=0.08035, simple_loss=0.1004, pruned_loss=0.01968, audio_tagging_loss=0.01045, over 3017604.26 frames. 
], batch size: 64, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:22:59,170 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.565e+01 8.407e+01 9.404e+01 1.035e+02 1.329e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-20 10:23:33,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1048246.6666666666, ans=0.125 2023-11-20 10:23:40,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1048246.6666666666, ans=0.125 2023-11-20 10:23:45,681 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157250 2023-11-20 10:23:45,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1048313.3333333334, ans=0.1 2023-11-20 10:23:52,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1048313.3333333334, ans=0.125 2023-11-20 10:23:57,289 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 950, loss[loss=0.06405, simple_loss=0.0767, pruned_loss=0.0155, audio_tagging_loss=0.0102, over 16006.00 frames. ], tot_loss[loss=0.08021, simple_loss=0.1005, pruned_loss=0.01963, audio_tagging_loss=0.01031, over 3026515.03 frames. ], batch size: 62, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:23:59,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1048380.0, ans=0.125 2023-11-20 10:24:05,566 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.06 vs. limit=8.0 2023-11-20 10:24:06,552 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.55 vs. limit=15.0 2023-11-20 10:24:07,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1048380.0, ans=0.09899494936611666 2023-11-20 10:24:12,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1048446.6666666666, ans=0.0 2023-11-20 10:24:15,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.62 vs. limit=15.0 2023-11-20 10:24:31,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1048513.3333333334, ans=0.1 2023-11-20 10:24:35,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1048580.0, ans=0.0 2023-11-20 10:24:50,741 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157300 2023-11-20 10:24:57,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1048646.6666666667, ans=0.0 2023-11-20 10:25:02,398 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1000, loss[loss=0.0791, simple_loss=0.106, pruned_loss=0.01986, audio_tagging_loss=0.006232, over 16027.00 frames. ], tot_loss[loss=0.07897, simple_loss=0.0992, pruned_loss=0.01925, audio_tagging_loss=0.01012, over 3029118.78 frames. 
], batch size: 60, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:25:06,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1048713.3333333333, ans=0.2 2023-11-20 10:25:08,322 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:25:09,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1048713.3333333333, ans=0.125 2023-11-20 10:25:10,467 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.129e+01 7.769e+01 8.550e+01 9.087e+01 1.228e+02, threshold=1.710e+02, percent-clipped=0.0 2023-11-20 10:25:11,372 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.48 vs. limit=22.5 2023-11-20 10:25:12,483 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.19 vs. limit=12.0 2023-11-20 10:25:17,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1048780.0, ans=0.2 2023-11-20 10:25:17,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1048780.0, ans=0.1 2023-11-20 10:25:29,936 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 10:25:35,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1048846.6666666667, ans=0.1 2023-11-20 10:25:42,500 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.12 vs. limit=10.0 2023-11-20 10:25:49,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1048913.3333333333, ans=0.0 2023-11-20 10:25:52,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1048913.3333333333, ans=0.09899494936611666 2023-11-20 10:25:56,417 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157350 2023-11-20 10:26:08,088 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1050, loss[loss=0.05967, simple_loss=0.06808, pruned_loss=0.01722, audio_tagging_loss=0.008403, over 14685.00 frames. ], tot_loss[loss=0.07904, simple_loss=0.09943, pruned_loss=0.01928, audio_tagging_loss=0.01004, over 3028166.63 frames. ], batch size: 56, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:26:17,554 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. 
limit=15.0 2023-11-20 10:26:39,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1049180.0, ans=0.125 2023-11-20 10:26:43,652 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=15.0 2023-11-20 10:26:52,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1049246.6666666667, ans=0.125 2023-11-20 10:27:01,908 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157400 2023-11-20 10:27:13,309 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1100, loss[loss=0.07563, simple_loss=0.09893, pruned_loss=0.01787, audio_tagging_loss=0.008294, over 15572.00 frames. ], tot_loss[loss=0.0788, simple_loss=0.09889, pruned_loss=0.01932, audio_tagging_loss=0.01004, over 3029023.63 frames. ], batch size: 61, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:27:17,877 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 10:27:21,535 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.626e+01 8.064e+01 8.696e+01 9.715e+01 1.230e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 10:27:40,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1049513.3333333333, ans=0.0 2023-11-20 10:27:58,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1049580.0, ans=0.125 2023-11-20 10:28:07,901 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157450 2023-11-20 10:28:19,096 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1150, loss[loss=0.08064, simple_loss=0.1056, pruned_loss=0.02081, audio_tagging_loss=0.00703, over 15394.00 frames. ], tot_loss[loss=0.07897, simple_loss=0.09928, pruned_loss=0.01937, audio_tagging_loss=0.009964, over 3030127.45 frames. ], batch size: 59, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:28:21,092 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.13 vs. limit=15.0 2023-11-20 10:28:37,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1049780.0, ans=0.0 2023-11-20 10:28:46,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1049846.6666666667, ans=0.1 2023-11-20 10:29:14,208 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157500 2023-11-20 10:29:26,551 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1200, loss[loss=0.05577, simple_loss=0.06146, pruned_loss=0.01132, audio_tagging_loss=0.01372, over 15342.00 frames. ], tot_loss[loss=0.07925, simple_loss=0.09954, pruned_loss=0.0195, audio_tagging_loss=0.009971, over 3031407.95 frames. 
], batch size: 58, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:29:33,805 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.234e+01 8.296e+01 8.818e+01 9.703e+01 1.332e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-20 10:29:41,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1050113.3333333333, ans=0.0 2023-11-20 10:29:48,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1050113.3333333333, ans=0.0 2023-11-20 10:30:06,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1050246.6666666667, ans=0.2 2023-11-20 10:30:07,791 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.51 vs. limit=15.0 2023-11-20 10:30:10,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1050246.6666666667, ans=0.125 2023-11-20 10:30:19,791 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157550 2023-11-20 10:30:31,202 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1250, loss[loss=0.06964, simple_loss=0.0857, pruned_loss=0.01608, audio_tagging_loss=0.0107, over 15721.00 frames. ], tot_loss[loss=0.07945, simple_loss=0.09965, pruned_loss=0.01963, audio_tagging_loss=0.009995, over 3034460.44 frames. ], batch size: 60, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:30:47,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1050446.6666666667, ans=0.125 2023-11-20 10:30:52,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1050446.6666666667, ans=0.125 2023-11-20 10:30:57,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1050513.3333333333, ans=0.0 2023-11-20 10:31:17,132 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2023-11-20 10:31:21,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1050646.6666666667, ans=0.125 2023-11-20 10:31:24,016 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157600 2023-11-20 10:31:26,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1050646.6666666667, ans=0.125 2023-11-20 10:31:28,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1050646.6666666667, ans=0.2 2023-11-20 10:31:30,405 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=15.0 2023-11-20 10:31:35,680 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1300, loss[loss=0.0799, simple_loss=0.1022, pruned_loss=0.02086, audio_tagging_loss=0.00795, over 15363.00 frames. ], tot_loss[loss=0.07907, simple_loss=0.09947, pruned_loss=0.01944, audio_tagging_loss=0.009901, over 3030754.99 frames. 
], batch size: 56, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:31:35,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1050713.3333333333, ans=0.0 2023-11-20 10:31:36,373 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2023-11-20 10:31:43,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.152e+01 8.044e+01 8.712e+01 9.271e+01 1.350e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 10:31:53,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1050780.0, ans=0.1 2023-11-20 10:31:53,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1050780.0, ans=0.125 2023-11-20 10:31:53,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1050780.0, ans=0.125 2023-11-20 10:32:14,278 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=22.5 2023-11-20 10:32:29,177 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157650 2023-11-20 10:32:37,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1050980.0, ans=0.125 2023-11-20 10:32:40,755 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1350, loss[loss=0.07862, simple_loss=0.08367, pruned_loss=0.02336, audio_tagging_loss=0.01342, over 15889.00 frames. ], tot_loss[loss=0.07905, simple_loss=0.09938, pruned_loss=0.01941, audio_tagging_loss=0.009951, over 3037734.21 frames. ], batch size: 65, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:32:50,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1051046.6666666667, ans=0.1 2023-11-20 10:32:51,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1051046.6666666667, ans=0.025 2023-11-20 10:33:00,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1051113.3333333333, ans=0.09899494936611666 2023-11-20 10:33:26,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1051246.6666666667, ans=0.2 2023-11-20 10:33:28,894 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 10:33:34,523 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157700 2023-11-20 10:33:46,192 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1400, loss[loss=0.095, simple_loss=0.1158, pruned_loss=0.02525, audio_tagging_loss=0.01187, over 15235.00 frames. ], tot_loss[loss=0.07908, simple_loss=0.09933, pruned_loss=0.01939, audio_tagging_loss=0.01003, over 3036507.03 frames. 
], batch size: 57, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:33:48,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1051380.0, ans=0.125 2023-11-20 10:33:55,354 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.246e+01 8.256e+01 8.950e+01 9.563e+01 1.336e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-20 10:33:56,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1051380.0, ans=0.2 2023-11-20 10:34:07,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1051446.6666666667, ans=0.2 2023-11-20 10:34:09,727 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0 2023-11-20 10:34:21,476 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0 2023-11-20 10:34:35,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1051580.0, ans=0.125 2023-11-20 10:34:38,678 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157750 2023-11-20 10:34:39,259 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.32 vs. limit=6.0 2023-11-20 10:34:45,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1051646.6666666667, ans=0.2 2023-11-20 10:34:49,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1051713.3333333333, ans=0.0 2023-11-20 10:34:50,303 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1450, loss[loss=0.08004, simple_loss=0.09141, pruned_loss=0.02401, audio_tagging_loss=0.01032, over 15452.00 frames. ], tot_loss[loss=0.07957, simple_loss=0.09967, pruned_loss=0.01961, audio_tagging_loss=0.01012, over 3038534.80 frames. ], batch size: 57, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:35:05,156 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.98 vs. limit=22.5 2023-11-20 10:35:10,685 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.34 vs. limit=15.0 2023-11-20 10:35:24,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1051846.6666666667, ans=0.2 2023-11-20 10:35:32,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1051913.3333333333, ans=0.125 2023-11-20 10:35:43,453 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157800 2023-11-20 10:35:43,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1051980.0, ans=0.0 2023-11-20 10:35:43,942 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2023-11-20 10:35:55,199 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.40 vs. 
limit=15.0 2023-11-20 10:35:55,545 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1500, loss[loss=0.08072, simple_loss=0.1, pruned_loss=0.01931, audio_tagging_loss=0.01138, over 15311.00 frames. ], tot_loss[loss=0.07975, simple_loss=0.1001, pruned_loss=0.01955, audio_tagging_loss=0.01016, over 3038003.72 frames. ], batch size: 57, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:36:04,961 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.891e+01 8.201e+01 8.884e+01 9.746e+01 1.400e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-20 10:36:09,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1052113.3333333333, ans=0.07 2023-11-20 10:36:11,213 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=12.0 2023-11-20 10:36:13,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1052113.3333333333, ans=0.125 2023-11-20 10:36:15,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1052113.3333333333, ans=0.2 2023-11-20 10:36:24,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1052180.0, ans=0.125 2023-11-20 10:36:27,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1052180.0, ans=0.1 2023-11-20 10:36:38,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1052246.6666666667, ans=0.125 2023-11-20 10:36:50,127 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157850 2023-11-20 10:36:52,214 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0 2023-11-20 10:37:01,345 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1550, loss[loss=0.08762, simple_loss=0.1126, pruned_loss=0.02078, audio_tagging_loss=0.01057, over 13763.00 frames. ], tot_loss[loss=0.07994, simple_loss=0.1002, pruned_loss=0.01966, audio_tagging_loss=0.01016, over 3040072.00 frames. ], batch size: 53, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:37:20,607 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.05 vs. 
limit=22.5 2023-11-20 10:37:35,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1052513.3333333333, ans=0.1 2023-11-20 10:37:53,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1052646.6666666667, ans=0.125 2023-11-20 10:37:55,495 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157900 2023-11-20 10:37:58,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1052646.6666666667, ans=0.2 2023-11-20 10:38:03,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1052646.6666666667, ans=0.125 2023-11-20 10:38:07,072 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1600, loss[loss=0.08387, simple_loss=0.09394, pruned_loss=0.0206, audio_tagging_loss=0.0163, over 13614.00 frames. ], tot_loss[loss=0.07975, simple_loss=0.09986, pruned_loss=0.0196, audio_tagging_loss=0.01022, over 3039096.26 frames. ], batch size: 56, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:38:15,676 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.476e+01 8.170e+01 8.883e+01 9.558e+01 1.180e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-20 10:38:35,045 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0 2023-11-20 10:38:40,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1052846.6666666667, ans=0.2 2023-11-20 10:38:41,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1052846.6666666667, ans=0.125 2023-11-20 10:38:48,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1052913.3333333333, ans=0.125 2023-11-20 10:39:00,484 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 157950 2023-11-20 10:39:12,303 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1650, loss[loss=0.08227, simple_loss=0.1015, pruned_loss=0.02304, audio_tagging_loss=0.008504, over 15061.00 frames. ], tot_loss[loss=0.08021, simple_loss=0.1003, pruned_loss=0.01986, audio_tagging_loss=0.01021, over 3038358.11 frames. ], batch size: 57, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:39:17,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1053046.6666666667, ans=0.125 2023-11-20 10:39:49,892 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.80 vs. limit=15.0 2023-11-20 10:39:58,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1053246.6666666667, ans=0.125 2023-11-20 10:40:02,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1053246.6666666667, ans=0.04949747468305833 2023-11-20 10:40:06,102 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158000 2023-11-20 10:40:17,621 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1700, loss[loss=0.06558, simple_loss=0.07784, pruned_loss=0.01536, audio_tagging_loss=0.01129, over 15669.00 frames. 
], tot_loss[loss=0.08036, simple_loss=0.1007, pruned_loss=0.01987, audio_tagging_loss=0.01015, over 3039393.34 frames. ], batch size: 60, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:40:24,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1053380.0, ans=0.125 2023-11-20 10:40:26,720 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.487e+01 8.158e+01 8.615e+01 9.243e+01 1.140e+02, threshold=1.723e+02, percent-clipped=0.0 2023-11-20 10:41:00,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1053580.0, ans=0.125 2023-11-20 10:41:10,247 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158050 2023-11-20 10:41:12,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1053646.6666666667, ans=0.1 2023-11-20 10:41:22,432 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1750, loss[loss=0.06035, simple_loss=0.0805, pruned_loss=0.01256, audio_tagging_loss=0.007545, over 14307.00 frames. ], tot_loss[loss=0.08013, simple_loss=0.1006, pruned_loss=0.01978, audio_tagging_loss=0.01007, over 3047001.81 frames. ], batch size: 55, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:41:30,008 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:41:37,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1053780.0, ans=0.1 2023-11-20 10:42:13,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1053980.0, ans=0.2 2023-11-20 10:42:15,703 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158100 2023-11-20 10:42:27,194 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1800, loss[loss=0.07516, simple_loss=0.09615, pruned_loss=0.01925, audio_tagging_loss=0.007844, over 15072.00 frames. ], tot_loss[loss=0.07967, simple_loss=0.1005, pruned_loss=0.01949, audio_tagging_loss=0.00993, over 3059067.84 frames. ], batch size: 56, lr: 4.99e-03, grad_scale: 16.0 2023-11-20 10:42:29,283 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.62 vs. limit=12.0 2023-11-20 10:42:37,567 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.937e+01 8.074e+01 8.907e+01 9.490e+01 1.284e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-20 10:42:55,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1054180.0, ans=0.125 2023-11-20 10:43:01,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1054180.0, ans=0.1 2023-11-20 10:43:10,067 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.04 vs. 
limit=15.0 2023-11-20 10:43:20,190 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158150 2023-11-20 10:43:24,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=1054313.3333333333, ans=15.0 2023-11-20 10:43:31,658 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1850, loss[loss=0.0739, simple_loss=0.09441, pruned_loss=0.01635, audio_tagging_loss=0.01034, over 15467.00 frames. ], tot_loss[loss=0.08047, simple_loss=0.1011, pruned_loss=0.01994, audio_tagging_loss=0.009956, over 3061553.21 frames. ], batch size: 56, lr: 4.99e-03, grad_scale: 16.0 2023-11-20 10:43:51,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1054446.6666666667, ans=0.1 2023-11-20 10:43:53,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1054446.6666666667, ans=0.0 2023-11-20 10:43:57,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1054513.3333333333, ans=0.125 2023-11-20 10:44:09,706 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:44:15,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1054580.0, ans=0.2 2023-11-20 10:44:25,141 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158200 2023-11-20 10:44:37,036 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1900, loss[loss=0.07439, simple_loss=0.09505, pruned_loss=0.019, audio_tagging_loss=0.007859, over 14193.00 frames. ], tot_loss[loss=0.08004, simple_loss=0.1006, pruned_loss=0.0198, audio_tagging_loss=0.00994, over 3058101.80 frames. ], batch size: 53, lr: 4.99e-03, grad_scale: 16.0 2023-11-20 10:44:41,760 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:44:47,498 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.693e+01 8.196e+01 8.901e+01 9.660e+01 1.214e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-20 10:45:07,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1054846.6666666667, ans=0.125 2023-11-20 10:45:13,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1054846.6666666667, ans=0.0 2023-11-20 10:45:30,919 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158250 2023-11-20 10:45:37,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1054980.0, ans=0.125 2023-11-20 10:45:42,677 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 1950, loss[loss=0.07352, simple_loss=0.09138, pruned_loss=0.01572, audio_tagging_loss=0.01211, over 14566.00 frames. ], tot_loss[loss=0.07972, simple_loss=0.1002, pruned_loss=0.01964, audio_tagging_loss=0.009978, over 3048827.33 frames. 
], batch size: 56, lr: 4.99e-03, grad_scale: 16.0 2023-11-20 10:45:46,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1055046.6666666667, ans=0.125 2023-11-20 10:45:57,469 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:46:26,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1055246.6666666667, ans=0.0 2023-11-20 10:46:36,378 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158300 2023-11-20 10:46:45,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1055313.3333333333, ans=0.125 2023-11-20 10:46:47,904 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2000, loss[loss=0.08157, simple_loss=0.09944, pruned_loss=0.02276, audio_tagging_loss=0.009085, over 15966.00 frames. ], tot_loss[loss=0.07888, simple_loss=0.09906, pruned_loss=0.01931, audio_tagging_loss=0.01004, over 3056892.68 frames. ], batch size: 58, lr: 4.99e-03, grad_scale: 32.0 2023-11-20 10:46:57,729 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.488e+01 8.029e+01 8.442e+01 9.197e+01 1.092e+02, threshold=1.688e+02, percent-clipped=0.0 2023-11-20 10:47:20,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1055513.3333333333, ans=0.125 2023-11-20 10:47:28,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1055580.0, ans=0.125 2023-11-20 10:47:36,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1055580.0, ans=0.125 2023-11-20 10:47:41,016 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158350 2023-11-20 10:47:52,038 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2050, loss[loss=0.09074, simple_loss=0.1226, pruned_loss=0.02329, audio_tagging_loss=0.006147, over 15453.00 frames. ], tot_loss[loss=0.07907, simple_loss=0.09929, pruned_loss=0.01942, audio_tagging_loss=0.01, over 3048838.85 frames. ], batch size: 56, lr: 4.99e-03, grad_scale: 32.0 2023-11-20 10:48:29,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1055846.6666666667, ans=0.125 2023-11-20 10:48:44,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1055980.0, ans=0.0 2023-11-20 10:48:45,948 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158400 2023-11-20 10:48:46,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1055980.0, ans=0.125 2023-11-20 10:48:47,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1055980.0, ans=0.2 2023-11-20 10:48:58,055 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2100, loss[loss=0.08101, simple_loss=0.09535, pruned_loss=0.02065, audio_tagging_loss=0.01268, over 14360.00 frames. ], tot_loss[loss=0.07931, simple_loss=0.09978, pruned_loss=0.01955, audio_tagging_loss=0.009872, over 3044136.25 frames. 
], batch size: 53, lr: 4.99e-03, grad_scale: 32.0
2023-11-20 10:49:08,624 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.360e+01 8.882e+01 9.714e+01 1.219e+02, threshold=1.776e+02, percent-clipped=0.0
2023-11-20 10:49:35,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1056246.6666666667, ans=0.0
2023-11-20 10:49:51,945 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158450
2023-11-20 10:50:02,837 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2150, loss[loss=0.08605, simple_loss=0.09818, pruned_loss=0.02412, audio_tagging_loss=0.01283, over 15575.00 frames. ], tot_loss[loss=0.07957, simple_loss=0.1001, pruned_loss=0.01967, audio_tagging_loss=0.00986, over 3045362.24 frames. ], batch size: 58, lr: 4.99e-03, grad_scale: 32.0
2023-11-20 10:50:12,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1056380.0, ans=0.0
2023-11-20 10:50:34,636 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.02 vs. limit=15.0
2023-11-20 10:50:42,483 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 10:50:50,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.31 vs. limit=15.0
2023-11-20 10:50:54,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1056646.6666666667, ans=0.09899494936611666
2023-11-20 10:50:56,677 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158500
2023-11-20 10:51:07,673 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2200, loss[loss=0.05337, simple_loss=0.0624, pruned_loss=0.01131, audio_tagging_loss=0.01086, over 14147.00 frames. ], tot_loss[loss=0.07968, simple_loss=0.1002, pruned_loss=0.01972, audio_tagging_loss=0.00986, over 3053812.55 frames. ], batch size: 54, lr: 4.99e-03, grad_scale: 32.0
2023-11-20 10:51:19,259 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 8.263e+01 8.832e+01 9.449e+01 1.423e+02, threshold=1.766e+02, percent-clipped=0.0
2023-11-20 10:51:19,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1056780.0, ans=0.0
2023-11-20 10:51:24,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1056780.0, ans=0.2
2023-11-20 10:51:35,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1056846.6666666667, ans=0.125
2023-11-20 10:51:35,529 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=15.0
2023-11-20 10:51:40,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1056846.6666666667, ans=0.0
2023-11-20 10:51:45,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1056913.3333333333, ans=0.125
2023-11-20 10:51:49,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1056913.3333333333, ans=0.125
2023-11-20 10:51:55,747 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=22.5
2023-11-20 10:52:00,275 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158550
2023-11-20 10:52:12,217 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2250, loss[loss=0.07377, simple_loss=0.08287, pruned_loss=0.02001, audio_tagging_loss=0.01232, over 16185.00 frames. ], tot_loss[loss=0.08024, simple_loss=0.1009, pruned_loss=0.01995, audio_tagging_loss=0.009852, over 3058561.22 frames. ], batch size: 61, lr: 4.99e-03, grad_scale: 8.0
2023-11-20 10:52:12,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1057046.6666666667, ans=0.2
2023-11-20 10:52:15,335 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.03 vs. limit=15.0
2023-11-20 10:52:36,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1057113.3333333333, ans=0.2
2023-11-20 10:52:45,702 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1057180.0, ans=0.0
2023-11-20 10:52:50,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1057246.6666666667, ans=0.125
2023-11-20 10:52:56,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1057246.6666666667, ans=0.125
2023-11-20 10:53:06,443 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158600
2023-11-20 10:53:12,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1057313.3333333333, ans=0.0
2023-11-20 10:53:18,559 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2300, loss[loss=0.1094, simple_loss=0.1183, pruned_loss=0.03532, audio_tagging_loss=0.01491, over 15748.00 frames. ], tot_loss[loss=0.08068, simple_loss=0.1011, pruned_loss=0.0201, audio_tagging_loss=0.01003, over 3059162.00 frames. ], batch size: 61, lr: 4.99e-03, grad_scale: 8.0
2023-11-20 10:53:19,093 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.10 vs. limit=10.0
2023-11-20 10:53:28,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1057380.0, ans=0.125
2023-11-20 10:53:31,779 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.567e+01 8.157e+01 8.586e+01 9.219e+01 1.150e+02, threshold=1.717e+02, percent-clipped=0.0
2023-11-20 10:53:37,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1057446.6666666667, ans=0.125
2023-11-20 10:53:59,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1057580.0, ans=0.0
2023-11-20 10:54:13,242 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158650
2023-11-20 10:54:16,854 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 10:54:24,145 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2350, loss[loss=0.08954, simple_loss=0.1189, pruned_loss=0.02197, audio_tagging_loss=0.008108, over 16460.00 frames. ], tot_loss[loss=0.08027, simple_loss=0.1006, pruned_loss=0.01991, audio_tagging_loss=0.01005, over 3062710.07 frames. ], batch size: 59, lr: 4.99e-03, grad_scale: 8.0
2023-11-20 10:54:28,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1057713.3333333333, ans=0.125
2023-11-20 10:54:39,132 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.39 vs. limit=6.0
2023-11-20 10:54:39,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1057780.0, ans=0.025
2023-11-20 10:54:53,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1057846.6666666667, ans=0.0
2023-11-20 10:55:03,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1057913.3333333333, ans=0.5
2023-11-20 10:55:13,669 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.84 vs. limit=15.0
2023-11-20 10:55:18,060 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158700
2023-11-20 10:55:29,609 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2400, loss[loss=0.07092, simple_loss=0.08955, pruned_loss=0.01611, audio_tagging_loss=0.01004, over 13882.00 frames. ], tot_loss[loss=0.08073, simple_loss=0.1016, pruned_loss=0.01993, audio_tagging_loss=0.009984, over 3064641.60 frames. ], batch size: 53, lr: 4.99e-03, grad_scale: 16.0
2023-11-20 10:55:42,636 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.027e+01 8.593e+01 9.391e+01 2.644e+02, threshold=1.719e+02, percent-clipped=1.0
2023-11-20 10:55:49,132 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.23 vs. limit=22.5
2023-11-20 10:55:54,595 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.93 vs. limit=12.0
2023-11-20 10:56:10,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1058246.6666666667, ans=0.1
2023-11-20 10:56:23,283 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158750
2023-11-20 10:56:35,189 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2450, loss[loss=0.07446, simple_loss=0.09512, pruned_loss=0.01549, audio_tagging_loss=0.01142, over 14802.00 frames. ], tot_loss[loss=0.08064, simple_loss=0.1016, pruned_loss=0.01976, audio_tagging_loss=0.01006, over 3060513.58 frames. ], batch size: 55, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 10:57:22,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1058580.0, ans=0.0
2023-11-20 10:57:24,137 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.50 vs. limit=15.0
2023-11-20 10:57:25,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1058580.0, ans=0.09899494936611666
2023-11-20 10:57:29,302 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158800
2023-11-20 10:57:41,056 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2500, loss[loss=0.09039, simple_loss=0.1145, pruned_loss=0.02506, audio_tagging_loss=0.008083, over 16269.00 frames. ], tot_loss[loss=0.08059, simple_loss=0.1013, pruned_loss=0.01978, audio_tagging_loss=0.01014, over 3061876.60 frames. ], batch size: 58, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 10:57:51,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1058713.3333333333, ans=0.2
2023-11-20 10:57:54,000 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.094e+01 8.721e+01 9.744e+01 1.305e+02, threshold=1.744e+02, percent-clipped=0.0
2023-11-20 10:58:29,923 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.61 vs. limit=15.0
2023-11-20 10:58:34,498 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158850
2023-11-20 10:58:46,285 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2550, loss[loss=0.08659, simple_loss=0.1126, pruned_loss=0.02234, audio_tagging_loss=0.007954, over 14819.00 frames. ], tot_loss[loss=0.08002, simple_loss=0.1005, pruned_loss=0.01966, audio_tagging_loss=0.0101, over 3055472.33 frames. ], batch size: 53, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 10:59:14,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1059180.0, ans=0.125
2023-11-20 10:59:35,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1059246.6666666667, ans=0.1
2023-11-20 10:59:36,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1059246.6666666667, ans=0.0
2023-11-20 10:59:37,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1059313.3333333333, ans=0.0
2023-11-20 10:59:39,210 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.39 vs. limit=22.5
2023-11-20 10:59:39,874 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158900
2023-11-20 10:59:51,248 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2600, loss[loss=0.05737, simple_loss=0.06419, pruned_loss=0.01393, audio_tagging_loss=0.01134, over 16545.00 frames. ], tot_loss[loss=0.07965, simple_loss=0.1004, pruned_loss=0.01955, audio_tagging_loss=0.009883, over 3058125.61 frames. ], batch size: 64, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 10:59:56,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1059380.0, ans=0.0
2023-11-20 11:00:04,926 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.842e+01 8.545e+01 8.985e+01 9.606e+01 1.560e+02, threshold=1.797e+02, percent-clipped=0.0
2023-11-20 11:00:20,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1059513.3333333333, ans=0.1
2023-11-20 11:00:22,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1059513.3333333333, ans=0.2
2023-11-20 11:00:44,939 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 158950
2023-11-20 11:00:50,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1059646.6666666667, ans=0.125
2023-11-20 11:00:57,132 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2650, loss[loss=0.05249, simple_loss=0.062, pruned_loss=0.009346, audio_tagging_loss=0.01215, over 13872.00 frames. ], tot_loss[loss=0.0792, simple_loss=0.09998, pruned_loss=0.01942, audio_tagging_loss=0.009792, over 3055041.34 frames. ], batch size: 54, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 11:00:59,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1059713.3333333333, ans=0.05
2023-11-20 11:01:08,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1059780.0, ans=0.05
2023-11-20 11:01:16,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1059780.0, ans=0.125
2023-11-20 11:01:17,022 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0
2023-11-20 11:01:50,652 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159000
2023-11-20 11:01:58,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1059980.0, ans=0.125
2023-11-20 11:02:02,634 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2700, loss[loss=0.1136, simple_loss=0.1529, pruned_loss=0.03249, audio_tagging_loss=0.004662, over 15665.00 frames. ], tot_loss[loss=0.07958, simple_loss=0.1009, pruned_loss=0.01944, audio_tagging_loss=0.009691, over 3052134.24 frames. ], batch size: 55, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 11:02:15,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.786e+01 7.991e+01 8.664e+01 9.430e+01 1.129e+02, threshold=1.733e+02, percent-clipped=0.0
2023-11-20 11:02:24,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1060113.3333333333, ans=0.125
2023-11-20 11:02:33,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1060180.0, ans=0.0
2023-11-20 11:02:56,837 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159050
2023-11-20 11:03:08,526 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2750, loss[loss=0.08696, simple_loss=0.1103, pruned_loss=0.02267, audio_tagging_loss=0.009122, over 14463.00 frames. ], tot_loss[loss=0.07849, simple_loss=0.09918, pruned_loss=0.0191, audio_tagging_loss=0.009799, over 3043731.02 frames. ], batch size: 54, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 11:03:11,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1060380.0, ans=0.125
2023-11-20 11:03:24,780 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.48 vs. limit=15.0
2023-11-20 11:03:27,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1060446.6666666667, ans=0.0
2023-11-20 11:03:30,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1060446.6666666667, ans=0.0
2023-11-20 11:03:33,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1060513.3333333333, ans=0.125
2023-11-20 11:03:57,229 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:04:01,985 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159100
2023-11-20 11:04:04,409 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 11:04:13,751 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2800, loss[loss=0.07349, simple_loss=0.0954, pruned_loss=0.01521, audio_tagging_loss=0.01059, over 15061.00 frames. ], tot_loss[loss=0.07897, simple_loss=0.09978, pruned_loss=0.0193, audio_tagging_loss=0.009781, over 3040360.60 frames. ], batch size: 57, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:04:26,643 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.546e+01 8.183e+01 8.895e+01 9.590e+01 1.282e+02, threshold=1.779e+02, percent-clipped=0.0
2023-11-20 11:04:48,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1060846.6666666667, ans=0.125
2023-11-20 11:05:07,162 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159150
2023-11-20 11:05:18,781 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2850, loss[loss=0.0906, simple_loss=0.1126, pruned_loss=0.02368, audio_tagging_loss=0.01061, over 15682.00 frames. ], tot_loss[loss=0.0794, simple_loss=0.1004, pruned_loss=0.01944, audio_tagging_loss=0.009761, over 3035417.12 frames. ], batch size: 56, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:05:23,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1061046.6666666667, ans=0.1
2023-11-20 11:05:27,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1061046.6666666667, ans=0.125
2023-11-20 11:05:28,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1061046.6666666667, ans=0.0
2023-11-20 11:06:00,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1061246.6666666667, ans=0.0
2023-11-20 11:06:12,330 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159200
2023-11-20 11:06:24,449 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2900, loss[loss=0.08422, simple_loss=0.1021, pruned_loss=0.02335, audio_tagging_loss=0.009817, over 16178.00 frames. ], tot_loss[loss=0.07967, simple_loss=0.1007, pruned_loss=0.01952, audio_tagging_loss=0.009801, over 3042550.95 frames. ], batch size: 61, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:06:28,810 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.15 vs. limit=15.0
2023-11-20 11:06:37,435 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.449e+01 8.069e+01 8.700e+01 9.440e+01 1.245e+02, threshold=1.740e+02, percent-clipped=0.0
2023-11-20 11:06:46,686 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=12.0
2023-11-20 11:07:00,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1061513.3333333333, ans=0.125
2023-11-20 11:07:06,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1061580.0, ans=0.125
2023-11-20 11:07:10,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1061580.0, ans=0.2
2023-11-20 11:07:13,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1061580.0, ans=0.125
2023-11-20 11:07:18,177 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159250
2023-11-20 11:07:29,983 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 2950, loss[loss=0.08947, simple_loss=0.1249, pruned_loss=0.01994, audio_tagging_loss=0.007063, over 15269.00 frames. ], tot_loss[loss=0.08045, simple_loss=0.1019, pruned_loss=0.01981, audio_tagging_loss=0.009689, over 3051364.20 frames. ], batch size: 55, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:07:33,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1061713.3333333333, ans=0.1
2023-11-20 11:07:56,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1061846.6666666667, ans=0.125
2023-11-20 11:08:05,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1061846.6666666667, ans=0.5
2023-11-20 11:08:10,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1061913.3333333333, ans=0.0
2023-11-20 11:08:18,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1061913.3333333333, ans=0.125
2023-11-20 11:08:23,494 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159300
2023-11-20 11:08:34,628 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3000, loss[loss=0.0632, simple_loss=0.08148, pruned_loss=0.01145, audio_tagging_loss=0.01101, over 13707.00 frames. ], tot_loss[loss=0.08, simple_loss=0.1012, pruned_loss=0.01959, audio_tagging_loss=0.009829, over 3046371.07 frames. ], batch size: 53, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:08:34,629 INFO [train_asr.py:1285] (2/4) Computing validation loss
2023-11-20 11:08:54,811 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.0087, 3.9101, 5.1401, 3.5328], device='cuda:2')
2023-11-20 11:09:14,373 INFO [train_asr.py:1294] (2/4) Epoch 14, validation: loss=0.06185, simple_loss=0.05368, pruned_loss=0.005702, audio_tagging_loss=0.02931, over 4681554.00 frames.
2023-11-20 11:09:14,374 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB
2023-11-20 11:09:26,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1062113.3333333333, ans=0.2
2023-11-20 11:09:27,211 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.003e+01 8.131e+01 8.854e+01 9.762e+01 1.260e+02, threshold=1.771e+02, percent-clipped=0.0
2023-11-20 11:09:47,968 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.74 vs. limit=15.0
2023-11-20 11:10:01,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1062246.6666666667, ans=0.0
2023-11-20 11:10:01,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1062246.6666666667, ans=0.0
2023-11-20 11:10:07,370 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159350
2023-11-20 11:10:19,213 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3050, loss[loss=0.09496, simple_loss=0.1172, pruned_loss=0.02596, audio_tagging_loss=0.0104, over 15701.00 frames. ], tot_loss[loss=0.08068, simple_loss=0.1022, pruned_loss=0.01974, audio_tagging_loss=0.009828, over 3047868.92 frames. ], batch size: 58, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:10:26,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1062380.0, ans=0.1
2023-11-20 11:10:26,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1062380.0, ans=0.2
2023-11-20 11:10:40,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1062446.6666666667, ans=0.125
2023-11-20 11:10:47,362 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.53 vs. limit=5.0
2023-11-20 11:10:56,748 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 11:11:05,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1062580.0, ans=0.0
2023-11-20 11:11:06,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1062580.0, ans=0.2
2023-11-20 11:11:12,503 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159400
2023-11-20 11:11:23,994 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3100, loss[loss=0.08834, simple_loss=0.1133, pruned_loss=0.0228, audio_tagging_loss=0.008866, over 15080.00 frames. ], tot_loss[loss=0.08007, simple_loss=0.1012, pruned_loss=0.01957, audio_tagging_loss=0.009887, over 3045528.52 frames. ], batch size: 55, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:11:26,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1062713.3333333333, ans=0.125
2023-11-20 11:11:33,379 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0
2023-11-20 11:11:37,687 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.763e+01 7.933e+01 8.636e+01 9.301e+01 1.154e+02, threshold=1.727e+02, percent-clipped=0.0
2023-11-20 11:11:39,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1062780.0, ans=0.125
2023-11-20 11:12:00,915 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.32 vs. limit=22.5
2023-11-20 11:12:18,301 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159450
2023-11-20 11:12:29,852 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3150, loss[loss=0.08332, simple_loss=0.1078, pruned_loss=0.02185, audio_tagging_loss=0.007589, over 16802.00 frames. ], tot_loss[loss=0.08008, simple_loss=0.101, pruned_loss=0.01955, audio_tagging_loss=0.01005, over 3040488.38 frames. ], batch size: 63, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:12:31,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1063046.6666666667, ans=0.0
2023-11-20 11:12:44,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1063113.3333333333, ans=0.125
2023-11-20 11:13:17,634 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:13:24,108 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159500
2023-11-20 11:13:32,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1063313.3333333333, ans=0.2
2023-11-20 11:13:35,645 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3200, loss[loss=0.07987, simple_loss=0.1018, pruned_loss=0.0178, audio_tagging_loss=0.01117, over 16177.00 frames. ], tot_loss[loss=0.08039, simple_loss=0.1014, pruned_loss=0.01962, audio_tagging_loss=0.01009, over 3044143.92 frames. ], batch size: 59, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:13:44,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1063380.0, ans=0.0
2023-11-20 11:13:47,901 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.344e+01 9.152e+01 9.986e+01 1.362e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-20 11:13:51,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1063446.6666666667, ans=0.0
2023-11-20 11:14:11,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1063513.3333333333, ans=0.07
2023-11-20 11:14:19,388 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.83 vs. limit=10.0
2023-11-20 11:14:23,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1063580.0, ans=0.125
2023-11-20 11:14:23,822 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=15.0
2023-11-20 11:14:27,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1063646.6666666667, ans=0.0
2023-11-20 11:14:29,395 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159550
2023-11-20 11:14:40,184 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3250, loss[loss=0.0685, simple_loss=0.08539, pruned_loss=0.0133, audio_tagging_loss=0.0125, over 17323.00 frames. ], tot_loss[loss=0.0799, simple_loss=0.1006, pruned_loss=0.0194, audio_tagging_loss=0.01021, over 3049483.92 frames. ], batch size: 65, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:14:47,823 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0
2023-11-20 11:14:56,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1063780.0, ans=0.2
2023-11-20 11:15:34,377 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159600
2023-11-20 11:15:34,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1063980.0, ans=0.0
2023-11-20 11:15:45,716 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3300, loss[loss=0.1042, simple_loss=0.1376, pruned_loss=0.02636, audio_tagging_loss=0.00905, over 15357.00 frames. ], tot_loss[loss=0.07969, simple_loss=0.1003, pruned_loss=0.01929, audio_tagging_loss=0.01028, over 3050888.64 frames. ], batch size: 54, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:15:50,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1064046.6666666667, ans=0.0
2023-11-20 11:15:55,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1064046.6666666667, ans=0.125
2023-11-20 11:15:58,820 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.658e+01 7.969e+01 8.807e+01 9.518e+01 1.189e+02, threshold=1.761e+02, percent-clipped=0.0
2023-11-20 11:16:03,872 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.41 vs. limit=10.0
2023-11-20 11:16:25,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1064246.6666666667, ans=0.125
2023-11-20 11:16:39,574 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159650
2023-11-20 11:16:39,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1064313.3333333333, ans=0.1
2023-11-20 11:16:51,798 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3350, loss[loss=0.06216, simple_loss=0.07801, pruned_loss=0.01312, audio_tagging_loss=0.01004, over 15081.00 frames. ], tot_loss[loss=0.08044, simple_loss=0.1014, pruned_loss=0.0196, audio_tagging_loss=0.01014, over 3051372.57 frames. ], batch size: 60, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:16:54,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1064380.0, ans=0.125
2023-11-20 11:17:01,462 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.63 vs. limit=12.0
2023-11-20 11:17:12,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1064446.6666666667, ans=0.0
2023-11-20 11:17:34,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1064580.0, ans=0.1
2023-11-20 11:17:39,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1064580.0, ans=0.0
2023-11-20 11:17:45,121 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159700
2023-11-20 11:17:48,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1064646.6666666667, ans=0.0
2023-11-20 11:17:55,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1064713.3333333333, ans=0.0
2023-11-20 11:17:56,249 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3400, loss[loss=0.07444, simple_loss=0.09984, pruned_loss=0.01373, audio_tagging_loss=0.01078, over 15994.00 frames. ], tot_loss[loss=0.0795, simple_loss=0.1, pruned_loss=0.01937, audio_tagging_loss=0.01012, over 3049734.91 frames. ], batch size: 59, lr: 4.97e-03, grad_scale: 16.0
2023-11-20 11:17:56,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1064713.3333333333, ans=0.125
2023-11-20 11:18:01,921 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.87 vs. limit=15.0
2023-11-20 11:18:08,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1064780.0, ans=0.1
2023-11-20 11:18:10,597 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.407e+01 8.209e+01 8.921e+01 9.607e+01 2.745e+02, threshold=1.784e+02, percent-clipped=1.0
2023-11-20 11:18:36,890 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.14 vs. limit=22.5
2023-11-20 11:18:40,512 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.58 vs. limit=15.0
2023-11-20 11:18:43,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1064913.3333333333, ans=0.125
2023-11-20 11:18:49,733 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159750
2023-11-20 11:19:01,449 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3450, loss[loss=0.06174, simple_loss=0.07292, pruned_loss=0.01136, audio_tagging_loss=0.01392, over 15361.00 frames. ], tot_loss[loss=0.07939, simple_loss=0.09992, pruned_loss=0.01933, audio_tagging_loss=0.01009, over 3048633.61 frames. ], batch size: 59, lr: 4.97e-03, grad_scale: 8.0
2023-11-20 11:19:07,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1065046.6666666667, ans=0.0
2023-11-20 11:19:32,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1065180.0, ans=0.07
2023-11-20 11:19:51,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1065246.6666666667, ans=0.125
2023-11-20 11:19:54,742 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159800
2023-11-20 11:20:07,002 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3500, loss[loss=0.06434, simple_loss=0.07541, pruned_loss=0.0147, audio_tagging_loss=0.01193, over 15517.00 frames. ], tot_loss[loss=0.07948, simple_loss=0.1001, pruned_loss=0.01944, audio_tagging_loss=0.009986, over 3052339.83 frames. ], batch size: 61, lr: 4.97e-03, grad_scale: 8.0
2023-11-20 11:20:10,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1065380.0, ans=0.125
2023-11-20 11:20:11,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=1065380.0, ans=0.95
2023-11-20 11:20:22,598 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.721e+01 8.437e+01 9.163e+01 1.016e+02 1.154e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-20 11:20:40,564 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 11:20:48,127 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.30 vs. limit=10.0
2023-11-20 11:20:51,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1065580.0, ans=0.125
2023-11-20 11:21:00,585 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159850
2023-11-20 11:21:11,686 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3550, loss[loss=0.07646, simple_loss=0.1002, pruned_loss=0.01777, audio_tagging_loss=0.008612, over 15217.00 frames. ], tot_loss[loss=0.07998, simple_loss=0.1008, pruned_loss=0.01972, audio_tagging_loss=0.009847, over 3046021.40 frames. ], batch size: 57, lr: 4.97e-03, grad_scale: 8.0
2023-11-20 11:21:13,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1065713.3333333333, ans=0.025
2023-11-20 11:21:17,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1065713.3333333333, ans=0.2
2023-11-20 11:21:21,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1065713.3333333333, ans=0.1
2023-11-20 11:21:22,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1065713.3333333333, ans=0.125
2023-11-20 11:21:29,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1065780.0, ans=0.07
2023-11-20 11:22:04,638 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159900
2023-11-20 11:22:05,176 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.57 vs. limit=15.0
2023-11-20 11:22:16,490 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3600, loss[loss=0.0776, simple_loss=0.0871, pruned_loss=0.02303, audio_tagging_loss=0.01102, over 15658.00 frames. ], tot_loss[loss=0.07897, simple_loss=0.09946, pruned_loss=0.01934, audio_tagging_loss=0.009898, over 3044367.33 frames. ], batch size: 59, lr: 4.97e-03, grad_scale: 16.0
2023-11-20 11:22:19,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1066046.6666666667, ans=0.1
2023-11-20 11:22:24,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1066046.6666666667, ans=0.125
2023-11-20 11:22:28,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1066113.3333333333, ans=0.0
2023-11-20 11:22:29,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1066113.3333333333, ans=0.125
2023-11-20 11:22:31,772 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.755e+01 8.370e+01 9.104e+01 9.925e+01 1.510e+02, threshold=1.821e+02, percent-clipped=0.0
2023-11-20 11:22:32,436 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.88 vs. limit=22.5
2023-11-20 11:22:39,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1066113.3333333333, ans=0.0
2023-11-20 11:22:39,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1066113.3333333333, ans=0.125
2023-11-20 11:22:43,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1066180.0, ans=0.0
2023-11-20 11:23:05,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1066246.6666666667, ans=0.125
2023-11-20 11:23:09,257 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.75 vs. limit=10.0
2023-11-20 11:23:09,960 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 159950
2023-11-20 11:23:16,319 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. limit=6.0
2023-11-20 11:23:21,957 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3650, loss[loss=0.06858, simple_loss=0.0912, pruned_loss=0.01508, audio_tagging_loss=0.007903, over 14658.00 frames. ], tot_loss[loss=0.07914, simple_loss=0.09974, pruned_loss=0.01942, audio_tagging_loss=0.00985, over 3039793.08 frames. ], batch size: 57, lr: 4.97e-03, grad_scale: 16.0
2023-11-20 11:23:23,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1066380.0, ans=0.125
2023-11-20 11:23:25,920 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.61 vs. limit=10.0
2023-11-20 11:23:29,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1066380.0, ans=0.0
2023-11-20 11:24:02,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0
2023-11-20 11:24:09,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1066580.0, ans=0.125
2023-11-20 11:24:10,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1066580.0, ans=0.125
2023-11-20 11:24:16,573 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160000
2023-11-20 11:24:17,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1066646.6666666667, ans=0.125
2023-11-20 11:24:31,860 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3700, loss[loss=0.08998, simple_loss=0.1061, pruned_loss=0.02931, audio_tagging_loss=0.007598, over 14541.00 frames. ], tot_loss[loss=0.07868, simple_loss=0.09896, pruned_loss=0.01928, audio_tagging_loss=0.009922, over 3040118.44 frames. ], batch size: 57, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:24:47,119 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.803e+01 8.291e+01 8.874e+01 9.793e+01 1.503e+02, threshold=1.775e+02, percent-clipped=0.0
2023-11-20 11:24:50,145 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0
2023-11-20 11:25:00,877 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.69 vs. limit=22.5
2023-11-20 11:25:03,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1066846.6666666667, ans=0.0
2023-11-20 11:25:25,350 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160050
2023-11-20 11:25:29,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1066980.0, ans=0.0
2023-11-20 11:25:34,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1066980.0, ans=0.125
2023-11-20 11:25:34,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1066980.0, ans=0.125
2023-11-20 11:25:36,927 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3750, loss[loss=0.1236, simple_loss=0.1597, pruned_loss=0.03824, audio_tagging_loss=0.0055, over 15273.00 frames. ], tot_loss[loss=0.0792, simple_loss=0.09959, pruned_loss=0.01946, audio_tagging_loss=0.009954, over 3043247.42 frames. ], batch size: 56, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:25:45,123 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0
2023-11-20 11:26:01,317 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.92 vs. limit=22.5
2023-11-20 11:26:22,703 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 11:26:22,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1067246.6666666667, ans=10.0
2023-11-20 11:26:30,181 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160100
2023-11-20 11:26:41,888 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3800, loss[loss=0.09258, simple_loss=0.1141, pruned_loss=0.02479, audio_tagging_loss=0.01072, over 14757.00 frames. ], tot_loss[loss=0.07988, simple_loss=0.1005, pruned_loss=0.01966, audio_tagging_loss=0.009954, over 3048953.37 frames. ], batch size: 55, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:26:48,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1067380.0, ans=0.125
2023-11-20 11:26:50,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1067380.0, ans=0.125
2023-11-20 11:26:54,909 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0
2023-11-20 11:26:57,806 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.245e+01 9.019e+01 9.669e+01 1.480e+02, threshold=1.804e+02, percent-clipped=0.0
2023-11-20 11:27:32,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1067580.0, ans=0.0
2023-11-20 11:27:36,410 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160150
2023-11-20 11:27:44,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1067646.6666666667, ans=0.2
2023-11-20 11:27:48,139 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3850, loss[loss=0.07169, simple_loss=0.09214, pruned_loss=0.01548, audio_tagging_loss=0.01013, over 15475.00 frames. ], tot_loss[loss=0.07966, simple_loss=0.1004, pruned_loss=0.01955, audio_tagging_loss=0.009902, over 3044604.55 frames. ], batch size: 59, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:28:01,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1067780.0, ans=0.04949747468305833
2023-11-20 11:28:10,294 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.04 vs. limit=15.0
2023-11-20 11:28:24,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1067846.6666666667, ans=0.125
2023-11-20 11:28:33,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1067913.3333333333, ans=0.125
2023-11-20 11:28:41,437 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160200
2023-11-20 11:28:49,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1067980.0, ans=0.0
2023-11-20 11:28:53,427 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3900, loss[loss=0.1103, simple_loss=0.1471, pruned_loss=0.03162, audio_tagging_loss=0.005166, over 15048.00 frames. ], tot_loss[loss=0.07983, simple_loss=0.1008, pruned_loss=0.01955, audio_tagging_loss=0.009901, over 3032661.44 frames. ], batch size: 54, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:29:08,877 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 8.101e+01 8.668e+01 9.712e+01 1.300e+02, threshold=1.734e+02, percent-clipped=0.0
2023-11-20 11:29:17,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1068113.3333333333, ans=0.0
2023-11-20 11:29:44,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1068313.3333333333, ans=0.0
2023-11-20 11:29:45,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1068313.3333333333, ans=0.125
2023-11-20 11:29:45,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1068313.3333333333, ans=0.125
2023-11-20 11:29:46,796 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160250
2023-11-20 11:29:50,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1068313.3333333333, ans=0.2
2023-11-20 11:29:51,198 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0
2023-11-20 11:29:58,771 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 3950, loss[loss=0.08062, simple_loss=0.1057, pruned_loss=0.01833, audio_tagging_loss=0.009453, over 16407.00 frames. ], tot_loss[loss=0.08014, simple_loss=0.1007, pruned_loss=0.01968, audio_tagging_loss=0.01012, over 3034112.87 frames. ], batch size: 61, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:30:04,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1068380.0, ans=0.125
2023-11-20 11:30:36,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1068513.3333333333, ans=0.125
2023-11-20 11:30:47,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1068580.0, ans=0.125
2023-11-20 11:30:52,510 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160300
2023-11-20 11:31:04,149 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4000, loss[loss=0.101, simple_loss=0.1261, pruned_loss=0.02699, audio_tagging_loss=0.0109, over 14339.00 frames. ], tot_loss[loss=0.0805, simple_loss=0.1011, pruned_loss=0.01972, audio_tagging_loss=0.01025, over 3038066.00 frames. ], batch size: 56, lr: 4.96e-03, grad_scale: 32.0
2023-11-20 11:31:16,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1068780.0, ans=0.125
2023-11-20 11:31:20,247 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.967e+01 8.155e+01 8.816e+01 9.659e+01 1.219e+02, threshold=1.763e+02, percent-clipped=0.0
2023-11-20 11:31:35,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1068846.6666666667, ans=0.015
2023-11-20 11:31:57,392 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160350
2023-11-20 11:32:00,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1068980.0, ans=0.125
2023-11-20 11:32:09,820 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4050, loss[loss=0.06798, simple_loss=0.09165, pruned_loss=0.01374, audio_tagging_loss=0.008419, over 14776.00 frames. ], tot_loss[loss=0.08053, simple_loss=0.1011, pruned_loss=0.01978, audio_tagging_loss=0.01021, over 3034045.40 frames. ], batch size: 56, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:32:13,607 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 11:32:18,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1069046.6666666667, ans=0.125
2023-11-20 11:32:21,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1069113.3333333333, ans=0.1
2023-11-20 11:32:25,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1069113.3333333333, ans=0.05
2023-11-20 11:32:36,643 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.29 vs. limit=15.0
2023-11-20 11:33:02,901 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160400
2023-11-20 11:33:14,086 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4100, loss[loss=0.0463, simple_loss=0.05246, pruned_loss=0.006654, audio_tagging_loss=0.01341, over 14303.00 frames. ], tot_loss[loss=0.0803, simple_loss=0.101, pruned_loss=0.01958, audio_tagging_loss=0.01023, over 3038417.21 frames. ], batch size: 56, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:33:15,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1069380.0, ans=0.125
2023-11-20 11:33:31,219 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.500e+01 8.137e+01 8.868e+01 9.537e+01 1.552e+02, threshold=1.774e+02, percent-clipped=0.0
2023-11-20 11:33:34,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1069446.6666666667, ans=0.1
2023-11-20 11:33:42,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1069513.3333333333, ans=0.1
2023-11-20 11:33:56,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1069580.0, ans=0.1
2023-11-20 11:34:07,315 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160450
2023-11-20 11:34:19,051 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4150, loss[loss=0.05833, simple_loss=0.06723, pruned_loss=0.01292, audio_tagging_loss=0.01179, over 14781.00 frames. ], tot_loss[loss=0.07972, simple_loss=0.1004, pruned_loss=0.01937, audio_tagging_loss=0.01016, over 3046765.85 frames. ], batch size: 56, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:34:26,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1069713.3333333333, ans=0.125
2023-11-20 11:34:55,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1069846.6666666667, ans=0.125
2023-11-20 11:34:57,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1069913.3333333333, ans=0.2
2023-11-20 11:35:04,747 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.13 vs. limit=15.0
2023-11-20 11:35:06,445 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 11:35:11,527 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160500
2023-11-20 11:35:22,621 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4200, loss[loss=0.09147, simple_loss=0.1077, pruned_loss=0.03121, audio_tagging_loss=0.006404, over 15273.00 frames. ], tot_loss[loss=0.07983, simple_loss=0.1008, pruned_loss=0.01945, audio_tagging_loss=0.009972, over 3046049.98 frames. ], batch size: 56, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:35:26,208 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=22.5
2023-11-20 11:35:27,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1070046.6666666667, ans=0.025
2023-11-20 11:35:39,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1070113.3333333333, ans=0.125
2023-11-20 11:35:40,174 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.812e+01 8.053e+01 8.867e+01 9.480e+01 1.332e+02, threshold=1.773e+02, percent-clipped=0.0
2023-11-20 11:35:56,009 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.89 vs. limit=15.0
2023-11-20 11:35:57,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1070180.0, ans=0.0
2023-11-20 11:36:01,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1070246.6666666667, ans=0.125
2023-11-20 11:36:04,378 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0
2023-11-20 11:36:16,918 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160550
2023-11-20 11:36:18,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1070313.3333333333, ans=0.0
2023-11-20 11:36:25,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1070313.3333333333, ans=0.125
2023-11-20 11:36:28,438 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4250, loss[loss=0.07416, simple_loss=0.08999, pruned_loss=0.01606, audio_tagging_loss=0.01311, over 14557.00 frames. ], tot_loss[loss=0.08055, simple_loss=0.1015, pruned_loss=0.01988, audio_tagging_loss=0.00994, over 3043774.04 frames. ], batch size: 55, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:36:49,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1070446.6666666667, ans=0.09899494936611666
2023-11-20 11:37:22,890 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160600
2023-11-20 11:37:30,922 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.20 vs. limit=15.0
2023-11-20 11:37:34,830 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4300, loss[loss=0.1042, simple_loss=0.1299, pruned_loss=0.02892, audio_tagging_loss=0.01029, over 15901.00 frames. ], tot_loss[loss=0.08089, simple_loss=0.1021, pruned_loss=0.01999, audio_tagging_loss=0.009862, over 3050388.33 frames. ], batch size: 57, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:37:41,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1070713.3333333333, ans=0.2
2023-11-20 11:37:50,503 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.383e+01 9.324e+01 1.005e+02 1.336e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-20 11:38:07,320 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=12.0
2023-11-20 11:38:28,429 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160650
2023-11-20 11:38:39,431 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4350, loss[loss=0.09562, simple_loss=0.1149, pruned_loss=0.02861, audio_tagging_loss=0.009583, over 15173.00 frames. ], tot_loss[loss=0.08173, simple_loss=0.1035, pruned_loss=0.02023, audio_tagging_loss=0.009754, over 3045333.01 frames. ], batch size: 56, lr: 4.95e-03, grad_scale: 16.0
2023-11-20 11:38:43,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1071046.6666666667, ans=0.125
2023-11-20 11:38:52,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1071113.3333333333, ans=0.0
2023-11-20 11:38:52,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1071113.3333333333, ans=0.125
2023-11-20 11:39:01,754 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0
2023-11-20 11:39:32,572 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160700
2023-11-20 11:39:34,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1071313.3333333333, ans=0.1
2023-11-20 11:39:45,069 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4400, loss[loss=0.08678, simple_loss=0.106, pruned_loss=0.0241, audio_tagging_loss=0.009672, over 14214.00 frames. ], tot_loss[loss=0.08109, simple_loss=0.1024, pruned_loss=0.02009, audio_tagging_loss=0.009786, over 3039852.51 frames. ], batch size: 56, lr: 4.95e-03, grad_scale: 32.0
2023-11-20 11:39:48,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1071380.0, ans=0.125
2023-11-20 11:40:02,275 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.159e+01 8.593e+01 9.338e+01 1.252e+02, threshold=1.719e+02, percent-clipped=0.0
2023-11-20 11:40:13,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1071513.3333333333, ans=0.1
2023-11-20 11:40:21,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1071513.3333333333, ans=0.125
2023-11-20 11:40:27,296 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.54 vs. limit=12.0
2023-11-20 11:40:38,708 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160750
2023-11-20 11:40:50,810 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4450, loss[loss=0.05919, simple_loss=0.06402, pruned_loss=0.01496, audio_tagging_loss=0.01221, over 14829.00 frames. ], tot_loss[loss=0.08038, simple_loss=0.1013, pruned_loss=0.01992, audio_tagging_loss=0.009781, over 3040128.58 frames. ], batch size: 56, lr: 4.95e-03, grad_scale: 32.0
2023-11-20 11:41:03,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1071780.0, ans=0.0
2023-11-20 11:41:20,359 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.71 vs. limit=22.5
2023-11-20 11:41:20,563 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.90 vs. limit=22.5
2023-11-20 11:41:44,379 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160800
2023-11-20 11:41:48,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1071980.0, ans=0.125
2023-11-20 11:41:56,232 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4500, loss[loss=0.09716, simple_loss=0.1182, pruned_loss=0.02707, audio_tagging_loss=0.01099, over 15980.00 frames. ], tot_loss[loss=0.07964, simple_loss=0.1001, pruned_loss=0.01976, audio_tagging_loss=0.009813, over 3039037.45 frames. ], batch size: 56, lr: 4.95e-03, grad_scale: 32.0
2023-11-20 11:42:12,847 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.604e+01 9.143e+01 9.908e+01 1.250e+02, threshold=1.829e+02, percent-clipped=0.0
2023-11-20 11:42:32,907 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=12.0
2023-11-20 11:42:41,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1072246.6666666667, ans=0.125
2023-11-20 11:42:44,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1072246.6666666667, ans=0.125
2023-11-20 11:42:50,257 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160850
2023-11-20 11:43:01,955 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4550, loss[loss=0.07254, simple_loss=0.09923, pruned_loss=0.01304, audio_tagging_loss=0.009893, over 15686.00 frames. ], tot_loss[loss=0.07916, simple_loss=0.0995, pruned_loss=0.0195, audio_tagging_loss=0.009906, over 3039133.26 frames. ], batch size: 57, lr: 4.95e-03, grad_scale: 32.0
2023-11-20 11:43:02,715 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=15.0
2023-11-20 11:43:26,801 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.98 vs. limit=15.0
2023-11-20 11:43:36,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1072513.3333333333, ans=0.0
2023-11-20 11:43:36,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1072513.3333333333, ans=0.125
2023-11-20 11:43:41,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1072580.0, ans=0.125
2023-11-20 11:43:48,869 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.84 vs. limit=15.0
2023-11-20 11:43:51,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1072580.0, ans=0.125
2023-11-20 11:43:53,059 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 11:43:55,641 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160900 2023-11-20 11:43:55,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1072646.6666666667, ans=0.1 2023-11-20 11:44:08,157 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4600, loss[loss=0.07722, simple_loss=0.1149, pruned_loss=0.01473, audio_tagging_loss=0.005054, over 15208.00 frames. ], tot_loss[loss=0.07934, simple_loss=0.09999, pruned_loss=0.01942, audio_tagging_loss=0.00993, over 3047939.30 frames. ], batch size: 54, lr: 4.95e-03, grad_scale: 32.0 2023-11-20 11:44:21,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1072780.0, ans=0.125 2023-11-20 11:44:24,867 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.602e+01 8.263e+01 8.597e+01 9.248e+01 1.165e+02, threshold=1.719e+02, percent-clipped=0.0 2023-11-20 11:44:41,221 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.59 vs. limit=15.0 2023-11-20 11:44:48,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1072913.3333333333, ans=0.125 2023-11-20 11:44:51,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1072913.3333333333, ans=0.04949747468305833 2023-11-20 11:45:01,656 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 160950 2023-11-20 11:45:12,499 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4650, loss[loss=0.06127, simple_loss=0.07153, pruned_loss=0.01594, audio_tagging_loss=0.009558, over 14108.00 frames. ], tot_loss[loss=0.07947, simple_loss=0.1004, pruned_loss=0.01931, audio_tagging_loss=0.009973, over 3050153.21 frames. ], batch size: 55, lr: 4.95e-03, grad_scale: 32.0 2023-11-20 11:45:21,690 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.75 vs. limit=22.5 2023-11-20 11:45:28,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1073113.3333333333, ans=0.0 2023-11-20 11:46:05,356 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161000 2023-11-20 11:46:10,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1073313.3333333333, ans=0.0 2023-11-20 11:46:16,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1073380.0, ans=0.1 2023-11-20 11:46:17,316 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4700, loss[loss=0.05607, simple_loss=0.06246, pruned_loss=0.01158, audio_tagging_loss=0.01326, over 15013.00 frames. ], tot_loss[loss=0.07954, simple_loss=0.1003, pruned_loss=0.01925, audio_tagging_loss=0.01015, over 3056140.15 frames. 
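
[Annotation] The "Exclude cut" WARNINGs in this log (11:35:06 above, and again at 11:43:53, 12:06:35, 12:15:58) come from a length sanity check: a 1-second AudioSet clip yields 100 feature frames, the convolutional front end subsamples that to 23 frames, and the placeholder transcript encodes to 24 BPE tokens, so the transducer loss used here has no valid alignment (it needs at least as many frames as tokens) and the cut is dropped. A minimal sketch of such a filter; `subsampled_frames` and `keep_cut` are illustrative names, not the actual train_asr.py code:

```python
import logging

def subsampled_frames(num_frames: int) -> int:
    # Rough model of a conv front end with overall subsampling factor 4;
    # the exact formula depends on the encoder (here 100 -> 23, as logged).
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(cut, sp) -> bool:
    """Drop cuts the transducer loss cannot align; `sp` is a trained
    sentencepiece.SentencePieceProcessor."""
    T = subsampled_frames(cut.num_frames)
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    if T < len(tokens):
        logging.warning(
            f"Exclude cut with ID {cut.id} from training. "
            f"Number of frames (after subsampling): {T}. "
            f"Number of tokens: {len(tokens)}"
        )
        return False
    return True

# With lhotse this would typically be applied before building the sampler:
# train_cuts = train_cuts.filter(lambda c: keep_cut(c, sp))
```
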
], batch size: 57, lr: 4.95e-03, grad_scale: 32.0 2023-11-20 11:46:20,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1073380.0, ans=0.125 2023-11-20 11:46:21,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1073380.0, ans=0.1 2023-11-20 11:46:34,699 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.595e+01 8.445e+01 9.172e+01 1.004e+02 1.426e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-20 11:46:50,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1073513.3333333333, ans=0.0 2023-11-20 11:46:50,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1073513.3333333333, ans=0.0 2023-11-20 11:46:53,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1073513.3333333333, ans=0.0 2023-11-20 11:46:57,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1073580.0, ans=0.125 2023-11-20 11:47:10,720 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161050 2023-11-20 11:47:21,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=22.5 2023-11-20 11:47:22,432 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4750, loss[loss=0.07576, simple_loss=0.09042, pruned_loss=0.01644, audio_tagging_loss=0.01412, over 15725.00 frames. ], tot_loss[loss=0.07923, simple_loss=0.09973, pruned_loss=0.01918, audio_tagging_loss=0.01018, over 3046656.18 frames. ], batch size: 61, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:47:34,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1073780.0, ans=0.025 2023-11-20 11:47:49,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1073846.6666666667, ans=0.2 2023-11-20 11:47:51,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1073846.6666666667, ans=0.1 2023-11-20 11:47:59,241 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2023-11-20 11:48:15,301 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161100 2023-11-20 11:48:26,999 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4800, loss[loss=0.08064, simple_loss=0.09226, pruned_loss=0.02232, audio_tagging_loss=0.01219, over 15140.00 frames. ], tot_loss[loss=0.07978, simple_loss=0.1003, pruned_loss=0.01941, audio_tagging_loss=0.01022, over 3040870.87 frames. 
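
[Annotation] The optim.py:476 lines throughout this log describe gradient clipping with an adaptive threshold: in every entry the printed threshold equals Clipping_scale times the median quartile of recent grad norms (e.g. 2.0 x 9.172e+01 = 1.834e+02 in the entry above), and percent-clipped turns nonzero only when the top of the window crosses that threshold (at 11:50:56 below, max 1.955e+02 vs. threshold 1.797e+02 gives percent-clipped=1.0). A self-contained sketch of that idea; icefall's ScaledAdam implements this internally and differs in detail:

```python
import torch
from collections import deque

class MedianGradClipper:
    """Keep a window of recent gradient norms, clip at
    clipping_scale * median, and track how often clipping fires.
    Illustrative reimplementation, not the optim.py code."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.num_steps = 0
        self.num_clipped = 0

    def __call__(self, parameters) -> float:
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([g.detach().norm() for g in grads])).item()
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median  # the logged threshold
        self.num_steps += 1
        if norm > threshold:
            self.num_clipped += 1
            for g in grads:
                g.mul_(threshold / norm)
        return norm
```
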
], batch size: 59, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:48:38,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1074113.3333333333, ans=0.1 2023-11-20 11:48:45,974 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.077e+01 9.114e+01 9.869e+01 1.463e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-20 11:48:56,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1074180.0, ans=0.125 2023-11-20 11:48:58,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1074180.0, ans=0.0 2023-11-20 11:49:19,520 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.31 vs. limit=10.0 2023-11-20 11:49:19,964 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161150 2023-11-20 11:49:27,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1074313.3333333333, ans=0.05 2023-11-20 11:49:31,595 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4850, loss[loss=0.09244, simple_loss=0.1138, pruned_loss=0.02589, audio_tagging_loss=0.009622, over 16266.00 frames. ], tot_loss[loss=0.0808, simple_loss=0.1017, pruned_loss=0.01973, audio_tagging_loss=0.01021, over 3044864.15 frames. ], batch size: 60, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:49:42,273 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.73 vs. limit=22.5 2023-11-20 11:50:04,077 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.51 vs. limit=15.0 2023-11-20 11:50:13,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1074580.0, ans=0.2 2023-11-20 11:50:17,680 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=22.5 2023-11-20 11:50:23,217 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0 2023-11-20 11:50:25,128 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161200 2023-11-20 11:50:34,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1074646.6666666667, ans=0.0 2023-11-20 11:50:36,493 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4900, loss[loss=0.06283, simple_loss=0.07236, pruned_loss=0.01562, audio_tagging_loss=0.01103, over 15165.00 frames. ], tot_loss[loss=0.07977, simple_loss=0.1004, pruned_loss=0.01938, audio_tagging_loss=0.01022, over 3040490.77 frames. ], batch size: 61, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:50:42,999 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.71 vs. 
limit=15.0 2023-11-20 11:50:51,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1074780.0, ans=0.2 2023-11-20 11:50:56,968 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.282e+01 8.231e+01 8.987e+01 9.531e+01 1.955e+02, threshold=1.797e+02, percent-clipped=1.0 2023-11-20 11:51:03,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1074846.6666666667, ans=0.0 2023-11-20 11:51:05,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1074846.6666666667, ans=0.125 2023-11-20 11:51:17,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1074913.3333333333, ans=0.0 2023-11-20 11:51:30,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1074980.0, ans=10.0 2023-11-20 11:51:30,877 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161250 2023-11-20 11:51:34,065 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.82 vs. limit=15.0 2023-11-20 11:51:40,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1074980.0, ans=0.125 2023-11-20 11:51:40,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1074980.0, ans=0.1 2023-11-20 11:51:43,156 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 4950, loss[loss=0.07572, simple_loss=0.09459, pruned_loss=0.01861, audio_tagging_loss=0.009815, over 15031.00 frames. ], tot_loss[loss=0.07959, simple_loss=0.1003, pruned_loss=0.01942, audio_tagging_loss=0.009998, over 3040585.96 frames. ], batch size: 56, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:51:50,942 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1075046.6666666667, ans=0.125 2023-11-20 11:51:51,233 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.06 vs. limit=22.5 2023-11-20 11:51:51,555 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=15.0 2023-11-20 11:51:52,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1075046.6666666667, ans=0.2 2023-11-20 11:51:55,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1075113.3333333333, ans=0.125 2023-11-20 11:51:58,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1075113.3333333333, ans=0.125 2023-11-20 11:52:14,291 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.28 vs. 
limit=15.0 2023-11-20 11:52:18,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1075180.0, ans=0.125 2023-11-20 11:52:36,401 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161300 2023-11-20 11:52:39,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1075313.3333333333, ans=0.0 2023-11-20 11:52:42,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1075313.3333333333, ans=0.04949747468305833 2023-11-20 11:52:47,464 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5000, loss[loss=0.08352, simple_loss=0.09951, pruned_loss=0.02657, audio_tagging_loss=0.007191, over 15029.00 frames. ], tot_loss[loss=0.07953, simple_loss=0.1002, pruned_loss=0.01947, audio_tagging_loss=0.009983, over 3036973.57 frames. ], batch size: 58, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 11:53:07,295 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.854e+01 7.908e+01 8.687e+01 9.464e+01 1.260e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 11:53:11,432 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 11:53:18,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1075513.3333333333, ans=0.125 2023-11-20 11:53:30,044 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2023-11-20 11:53:37,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1075580.0, ans=0.0 2023-11-20 11:53:41,378 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161350 2023-11-20 11:53:45,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1075646.6666666667, ans=0.1 2023-11-20 11:53:52,376 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5050, loss[loss=0.06851, simple_loss=0.08263, pruned_loss=0.01599, audio_tagging_loss=0.0112, over 13966.00 frames. ], tot_loss[loss=0.08027, simple_loss=0.1016, pruned_loss=0.01969, audio_tagging_loss=0.009782, over 3035521.62 frames. ], batch size: 54, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 11:53:53,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1075713.3333333333, ans=0.0 2023-11-20 11:53:56,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1075713.3333333333, ans=0.125 2023-11-20 11:54:08,048 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.84 vs. limit=6.0 2023-11-20 11:54:09,056 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.95 vs. 
limit=12.0 2023-11-20 11:54:13,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1075780.0, ans=0.125 2023-11-20 11:54:14,200 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.02 vs. limit=15.0 2023-11-20 11:54:46,723 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161400 2023-11-20 11:54:50,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1075980.0, ans=0.1 2023-11-20 11:54:55,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1075980.0, ans=0.0 2023-11-20 11:54:58,763 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5100, loss[loss=0.06302, simple_loss=0.07798, pruned_loss=0.01272, audio_tagging_loss=0.01131, over 15518.00 frames. ], tot_loss[loss=0.08023, simple_loss=0.1017, pruned_loss=0.0196, audio_tagging_loss=0.009798, over 3038223.15 frames. ], batch size: 58, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 11:55:09,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1076046.6666666667, ans=0.0 2023-11-20 11:55:12,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1076113.3333333333, ans=0.0 2023-11-20 11:55:14,901 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=15.0 2023-11-20 11:55:17,830 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.894e+01 8.531e+01 9.239e+01 1.028e+02 1.398e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-20 11:55:36,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1076246.6666666667, ans=0.125 2023-11-20 11:55:39,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1076246.6666666667, ans=0.0 2023-11-20 11:55:52,370 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161450 2023-11-20 11:56:04,121 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5150, loss[loss=0.08827, simple_loss=0.1146, pruned_loss=0.02195, audio_tagging_loss=0.009047, over 15470.00 frames. ], tot_loss[loss=0.07978, simple_loss=0.101, pruned_loss=0.0195, audio_tagging_loss=0.009783, over 3040718.84 frames. ], batch size: 55, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 11:56:36,687 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.66 vs. 
limit=12.0 2023-11-20 11:56:43,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1076580.0, ans=0.125 2023-11-20 11:56:48,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1076580.0, ans=22.5 2023-11-20 11:56:53,722 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 11:56:54,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1076646.6666666667, ans=0.1 2023-11-20 11:56:57,361 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161500 2023-11-20 11:57:07,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1076713.3333333333, ans=0.0 2023-11-20 11:57:09,009 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5200, loss[loss=0.08077, simple_loss=0.1061, pruned_loss=0.01834, audio_tagging_loss=0.009381, over 14860.00 frames. ], tot_loss[loss=0.07993, simple_loss=0.1013, pruned_loss=0.01949, audio_tagging_loss=0.009768, over 3031309.59 frames. ], batch size: 56, lr: 4.94e-03, grad_scale: 32.0 2023-11-20 11:57:09,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1076713.3333333333, ans=0.1 2023-11-20 11:57:28,475 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.173e+01 8.712e+01 9.541e+01 1.479e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 11:57:30,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1076780.0, ans=0.1 2023-11-20 11:57:44,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1076846.6666666667, ans=0.07 2023-11-20 11:58:01,838 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161550 2023-11-20 11:58:08,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1076980.0, ans=0.0 2023-11-20 11:58:14,023 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5250, loss[loss=0.08138, simple_loss=0.09676, pruned_loss=0.02126, audio_tagging_loss=0.01174, over 14698.00 frames. ], tot_loss[loss=0.07981, simple_loss=0.1012, pruned_loss=0.01952, audio_tagging_loss=0.009691, over 3038448.34 frames. ], batch size: 56, lr: 4.94e-03, grad_scale: 32.0 2023-11-20 11:58:20,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1077046.6666666667, ans=0.125 2023-11-20 11:58:54,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1077246.6666666667, ans=0.0 2023-11-20 11:59:03,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1077246.6666666667, ans=0.0 2023-11-20 11:59:06,734 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161600 2023-11-20 11:59:18,096 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5300, loss[loss=0.0584, simple_loss=0.07127, pruned_loss=0.009118, audio_tagging_loss=0.01364, over 14365.00 frames. ], tot_loss[loss=0.07975, simple_loss=0.1012, pruned_loss=0.01946, audio_tagging_loss=0.009689, over 3036546.47 frames. 
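
[Annotation] The scaling.py:213 ScheduledFloat lines dump hyperparameters (dropout_p, skip rates, balancer probs, scale/abs constraints) that are functions of batch_count rather than constants; by batch_count ~1.07e6 they have all settled on their final values (ans=0.0, 0.1, 0.125, 0.2, ...). A plausible minimal implementation, piecewise-linear between breakpoints; the breakpoint values below are assumptions for illustration only:

```python
class ScheduledFloat:
    """Value that interpolates linearly between (batch_count, value)
    breakpoints and is constant outside them; a sketch of what the
    'ScheduledFloat: ... ans=' lines are reporting."""

    def __init__(self, *points):
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# e.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches
# (assumed schedule), consistent with ans=0.1 at batch_count ~1.07e6:
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
assert abs(dropout_p(1069446.0) - 0.1) < 1e-9
```
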
], batch size: 55, lr: 4.94e-03, grad_scale: 32.0 2023-11-20 11:59:25,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1077380.0, ans=0.05 2023-11-20 11:59:38,929 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.714e+01 8.511e+01 9.160e+01 9.855e+01 1.370e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-20 11:59:41,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1077446.6666666667, ans=0.1 2023-11-20 11:59:48,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=1077513.3333333333, ans=0.2 2023-11-20 12:00:04,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1077580.0, ans=0.0 2023-11-20 12:00:09,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1077646.6666666667, ans=0.125 2023-11-20 12:00:11,324 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161650 2023-11-20 12:00:12,042 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.34 vs. limit=15.0 2023-11-20 12:00:16,063 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=12.0 2023-11-20 12:00:21,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1077646.6666666667, ans=0.125 2023-11-20 12:00:23,446 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5350, loss[loss=0.08061, simple_loss=0.1048, pruned_loss=0.01931, audio_tagging_loss=0.00889, over 15747.00 frames. ], tot_loss[loss=0.07962, simple_loss=0.1009, pruned_loss=0.01943, audio_tagging_loss=0.009752, over 3038445.03 frames. ], batch size: 58, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 12:00:43,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1077780.0, ans=0.0 2023-11-20 12:00:53,315 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:00:54,806 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.08 vs. limit=15.0 2023-11-20 12:01:05,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1077913.3333333333, ans=0.2 2023-11-20 12:01:14,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1077980.0, ans=0.035 2023-11-20 12:01:16,609 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161700 2023-11-20 12:01:28,391 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5400, loss[loss=0.07803, simple_loss=0.09188, pruned_loss=0.01997, audio_tagging_loss=0.01212, over 15004.00 frames. ], tot_loss[loss=0.08001, simple_loss=0.1014, pruned_loss=0.01957, audio_tagging_loss=0.009731, over 3031676.47 frames. 
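
[Annotation] In each train_asr.py:1262 progress line, loss[...] is the current batch (weighted by its frame count) while tot_loss[...] is a decayed, frame-weighted running aggregate. With batches of roughly 15k frames, the "over ~3.04e6 frames" plateau is what a decay factor of 1 - 1/200 would produce (15e3 x 200 = 3.0e6); that constant is inferred from the log, not read from the code. A sketch of such a tracker:

```python
class RunningLoss:
    """Exponentially decayed, frame-weighted loss average, modelling
    the tot_loss[..., over N frames.] fields (decay is an assumption)."""

    def __init__(self, decay: float = 1.0 - 1.0 / 200):
        self.decay = decay
        self.weighted_loss = 0.0
        self.frames = 0.0

    def update(self, loss: float, num_frames: float) -> None:
        self.weighted_loss = self.decay * self.weighted_loss + loss * num_frames
        self.frames = self.decay * self.frames + num_frames  # plateaus ~3.0e6

    @property
    def value(self) -> float:
        return self.weighted_loss / max(self.frames, 1.0)
```
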
], batch size: 56, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 12:01:31,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1078046.6666666667, ans=0.125 2023-11-20 12:01:34,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1078046.6666666667, ans=0.125 2023-11-20 12:01:48,638 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.745e+01 8.017e+01 8.531e+01 9.206e+01 1.102e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-20 12:01:59,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1078180.0, ans=0.04949747468305833 2023-11-20 12:01:59,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1078180.0, ans=0.2 2023-11-20 12:02:20,724 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161750 2023-11-20 12:02:25,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1078313.3333333333, ans=0.125 2023-11-20 12:02:32,319 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5450, loss[loss=0.0979, simple_loss=0.1209, pruned_loss=0.02555, audio_tagging_loss=0.01193, over 15173.00 frames. ], tot_loss[loss=0.08052, simple_loss=0.1016, pruned_loss=0.01984, audio_tagging_loss=0.009879, over 3034290.65 frames. ], batch size: 56, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 12:02:48,725 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0 2023-11-20 12:03:07,771 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.74 vs. limit=6.0 2023-11-20 12:03:17,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1078580.0, ans=0.125 2023-11-20 12:03:22,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1078646.6666666667, ans=0.125 2023-11-20 12:03:23,199 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=12.0 2023-11-20 12:03:25,014 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161800 2023-11-20 12:03:29,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1078646.6666666667, ans=0.125 2023-11-20 12:03:37,497 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5500, loss[loss=0.09458, simple_loss=0.1196, pruned_loss=0.02544, audio_tagging_loss=0.009358, over 15252.00 frames. ], tot_loss[loss=0.08005, simple_loss=0.1007, pruned_loss=0.01965, audio_tagging_loss=0.01004, over 3036219.63 frames. 
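
[Annotation] The four loss fields are not independent: every tot_loss entry in this log satisfies loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss to the printed precision, i.e. a pruned-transducer objective with the simple (linear) loss down-weighted by 0.5 plus the audio-tagging distillation term at full weight. Checking the batch 5450 entry above:

```python
# Batch 5450 tot_loss fields, copied from the log:
simple_loss, pruned_loss, audio_tagging_loss = 0.1016, 0.01984, 0.009879
loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
print(round(loss, 5))  # -> 0.08052, the logged total
```
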
], batch size: 58, lr: 4.94e-03, grad_scale: 8.0 2023-11-20 12:03:39,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1078713.3333333333, ans=0.1 2023-11-20 12:03:59,052 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.875e+01 8.226e+01 8.739e+01 9.564e+01 1.235e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-20 12:04:03,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1078846.6666666667, ans=0.05 2023-11-20 12:04:13,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1078846.6666666667, ans=0.2 2023-11-20 12:04:26,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1078913.3333333333, ans=0.1 2023-11-20 12:04:29,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1078980.0, ans=0.125 2023-11-20 12:04:30,683 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161850 2023-11-20 12:04:32,428 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.11 vs. limit=12.0 2023-11-20 12:04:42,390 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5550, loss[loss=0.09647, simple_loss=0.1337, pruned_loss=0.02295, audio_tagging_loss=0.006664, over 15322.00 frames. ], tot_loss[loss=0.08056, simple_loss=0.1014, pruned_loss=0.0198, audio_tagging_loss=0.01009, over 3032007.75 frames. ], batch size: 57, lr: 4.94e-03, grad_scale: 8.0 2023-11-20 12:04:42,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1079046.6666666667, ans=0.125 2023-11-20 12:04:51,843 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:04:51,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1079046.6666666667, ans=0.0 2023-11-20 12:04:53,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1079046.6666666667, ans=0.09899494936611666 2023-11-20 12:04:53,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=1079046.6666666667, ans=15.0 2023-11-20 12:04:57,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1079113.3333333333, ans=0.125 2023-11-20 12:05:07,054 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.70 vs. limit=15.0 2023-11-20 12:05:20,702 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. 
limit=15.0 2023-11-20 12:05:23,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1079246.6666666667, ans=0.125 2023-11-20 12:05:27,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1079246.6666666667, ans=0.125 2023-11-20 12:05:35,622 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161900 2023-11-20 12:05:37,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1079313.3333333333, ans=0.0 2023-11-20 12:05:44,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1079313.3333333333, ans=0.07 2023-11-20 12:05:47,148 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5600, loss[loss=0.07101, simple_loss=0.08647, pruned_loss=0.01541, audio_tagging_loss=0.01236, over 14670.00 frames. ], tot_loss[loss=0.08129, simple_loss=0.1026, pruned_loss=0.01989, audio_tagging_loss=0.01012, over 3039414.52 frames. ], batch size: 57, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 12:05:47,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1079380.0, ans=0.125 2023-11-20 12:06:02,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1079446.6666666667, ans=0.0 2023-11-20 12:06:09,449 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 8.052e+01 8.920e+01 9.702e+01 1.303e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-20 12:06:21,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1079513.3333333333, ans=0.04949747468305833 2023-11-20 12:06:35,272 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 12:06:40,257 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 161950 2023-11-20 12:06:47,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1079646.6666666667, ans=0.125 2023-11-20 12:06:51,223 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5650, loss[loss=0.07294, simple_loss=0.09097, pruned_loss=0.01679, audio_tagging_loss=0.01067, over 15082.00 frames. ], tot_loss[loss=0.08068, simple_loss=0.1016, pruned_loss=0.01971, audio_tagging_loss=0.01016, over 3040468.24 frames. ], batch size: 55, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:07:22,796 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.70 vs. limit=15.0 2023-11-20 12:07:31,473 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.63 vs. 
limit=15.0 2023-11-20 12:07:39,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1079913.3333333333, ans=0.0 2023-11-20 12:07:45,016 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162000 2023-11-20 12:07:54,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1079980.0, ans=0.125 2023-11-20 12:07:56,386 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.29 vs. limit=15.0 2023-11-20 12:07:56,860 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5700, loss[loss=0.08749, simple_loss=0.1115, pruned_loss=0.02471, audio_tagging_loss=0.007056, over 15522.00 frames. ], tot_loss[loss=0.08026, simple_loss=0.1009, pruned_loss=0.01961, audio_tagging_loss=0.01018, over 3035132.92 frames. ], batch size: 57, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:08:18,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1080113.3333333333, ans=0.125 2023-11-20 12:08:18,954 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 5.645e+01 7.906e+01 8.669e+01 9.570e+01 1.200e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-20 12:08:25,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1080180.0, ans=0.125 2023-11-20 12:08:50,299 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162050 2023-11-20 12:08:58,454 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:09:02,108 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5750, loss[loss=0.08867, simple_loss=0.1131, pruned_loss=0.02139, audio_tagging_loss=0.01073, over 14330.00 frames. ], tot_loss[loss=0.08047, simple_loss=0.1013, pruned_loss=0.01981, audio_tagging_loss=0.01, over 3037815.34 frames. ], batch size: 53, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:09:07,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1080380.0, ans=0.0 2023-11-20 12:09:33,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1080513.3333333333, ans=0.125 2023-11-20 12:09:55,129 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162100 2023-11-20 12:10:06,167 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5800, loss[loss=0.06185, simple_loss=0.0607, pruned_loss=0.01385, audio_tagging_loss=0.01764, over 14743.00 frames. ], tot_loss[loss=0.07953, simple_loss=0.1001, pruned_loss=0.01946, audio_tagging_loss=0.01002, over 3038617.12 frames. ], batch size: 58, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:10:23,604 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.14 vs. limit=22.5 2023-11-20 12:10:28,886 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.720e+01 8.138e+01 8.619e+01 9.335e+01 1.829e+02, threshold=1.724e+02, percent-clipped=1.0 2023-11-20 12:10:59,965 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162150 2023-11-20 12:11:11,080 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5850, loss[loss=0.08656, simple_loss=0.1156, pruned_loss=0.01851, audio_tagging_loss=0.01023, over 14086.00 frames. 
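
[Annotation] grad_scale in the progress lines steps between 8.0, 16.0 and 32.0 (32 -> 16 around batch 5350, 16 -> 8 at 5500, back to 16 by 5600 and 32 by 6000): the signature of dynamic loss scaling for fp16 training, where the scale is halved when a batch produces inf/nan gradients and doubled again after a run of clean steps. A generic sketch with torch.cuda.amp.GradScaler; the growth interval is an assumption (PyTorch's default is 2000 clean steps per doubling), and icefall wraps this in its own training loop:

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,       # matches the grad_scale around these batches
    growth_factor=2.0,     # 8 -> 16 -> 32 doublings
    backoff_factor=0.5,    # 32 -> 16 -> 8 halvings on overflow
    growth_interval=2000,  # assumed
)

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)        # assumed: model returns a scalar loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)         # silently skipped on inf/nan grads
    scaler.update()                # halves or (eventually) doubles the scale
    return scaler.get_scale()      # the grad_scale printed in the log
```
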
], tot_loss[loss=0.0795, simple_loss=0.1003, pruned_loss=0.01936, audio_tagging_loss=0.009971, over 3040002.48 frames. ], batch size: 52, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:11:13,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1081046.6666666667, ans=0.0 2023-11-20 12:11:15,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1081046.6666666667, ans=0.2 2023-11-20 12:11:18,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1081046.6666666667, ans=0.125 2023-11-20 12:12:03,721 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0 2023-11-20 12:12:04,352 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162200 2023-11-20 12:12:16,986 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5900, loss[loss=0.08231, simple_loss=0.1014, pruned_loss=0.01949, audio_tagging_loss=0.01212, over 14618.00 frames. ], tot_loss[loss=0.0795, simple_loss=0.1003, pruned_loss=0.01942, audio_tagging_loss=0.009913, over 3038207.03 frames. ], batch size: 57, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:12:27,241 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0 2023-11-20 12:12:38,401 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.547e+01 7.908e+01 8.564e+01 9.521e+01 1.124e+02, threshold=1.713e+02, percent-clipped=0.0 2023-11-20 12:12:39,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1081446.6666666667, ans=0.2 2023-11-20 12:13:10,407 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162250 2023-11-20 12:13:12,509 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.23 vs. limit=15.0 2023-11-20 12:13:13,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1081646.6666666667, ans=0.125 2023-11-20 12:13:21,354 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 5950, loss[loss=0.1026, simple_loss=0.1362, pruned_loss=0.02432, audio_tagging_loss=0.01019, over 16725.00 frames. ], tot_loss[loss=0.07983, simple_loss=0.1009, pruned_loss=0.01957, audio_tagging_loss=0.009818, over 3039646.50 frames. ], batch size: 58, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:13:32,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1081713.3333333333, ans=0.0 2023-11-20 12:13:37,542 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.82 vs. limit=22.5 2023-11-20 12:13:40,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1081780.0, ans=0.0 2023-11-20 12:14:06,609 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.44 vs. 
limit=22.5 2023-11-20 12:14:14,701 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162300 2023-11-20 12:14:24,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1081980.0, ans=0.125 2023-11-20 12:14:26,362 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6000, loss[loss=0.06122, simple_loss=0.06783, pruned_loss=0.01376, audio_tagging_loss=0.01354, over 14483.00 frames. ], tot_loss[loss=0.07981, simple_loss=0.1008, pruned_loss=0.01964, audio_tagging_loss=0.009743, over 3040558.32 frames. ], batch size: 56, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:14:26,363 INFO [train_asr.py:1285] (2/4) Computing validation loss 2023-11-20 12:15:09,616 INFO [train_asr.py:1294] (2/4) Epoch 14, validation: loss=0.06225, simple_loss=0.05354, pruned_loss=0.005677, audio_tagging_loss=0.0298, over 4681554.00 frames. 2023-11-20 12:15:09,617 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB 2023-11-20 12:15:17,647 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0 2023-11-20 12:15:21,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1082113.3333333333, ans=0.2 2023-11-20 12:15:31,314 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.652e+01 8.215e+01 8.669e+01 9.712e+01 1.545e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-20 12:15:31,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1082113.3333333333, ans=0.0 2023-11-20 12:15:58,701 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 12:16:02,676 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162350 2023-11-20 12:16:11,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1082313.3333333333, ans=0.5 2023-11-20 12:16:13,776 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6050, loss[loss=0.087, simple_loss=0.1079, pruned_loss=0.02258, audio_tagging_loss=0.01048, over 15068.00 frames. ], tot_loss[loss=0.07977, simple_loss=0.1009, pruned_loss=0.0197, audio_tagging_loss=0.009638, over 3036465.00 frames. 
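
[Annotation] At batch 6000 the loop pauses for a validation pass above: a full sweep over the held-out set (4,681,554 frames) with gradients disabled, after which training resumes and the CUDA allocator's high-water mark is reported. The validation total also obeys the same combination noted earlier (0.5 x 0.05354 + 0.005677 + 0.0298 = 0.06225). A minimal sketch; the model interface is assumed, not the actual train_asr.py signature:

```python
import torch

def compute_validation_loss(model, valid_loader) -> float:
    """Frame-weighted validation loss; model(batch) -> (loss, num_frames)
    is an assumed interface."""
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = model(batch)
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    return tot_loss / max(tot_frames, 1.0)

# The memory line maps to the allocator's high-water mark:
if torch.cuda.is_available():
    mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")
```
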
], batch size: 54, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:16:20,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1082380.0, ans=0.125 2023-11-20 12:16:59,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1082580.0, ans=0.07 2023-11-20 12:17:07,529 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162400 2023-11-20 12:17:10,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1082646.6666666667, ans=0.2 2023-11-20 12:17:19,033 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6100, loss[loss=0.08165, simple_loss=0.1056, pruned_loss=0.0171, audio_tagging_loss=0.01174, over 14482.00 frames. ], tot_loss[loss=0.07952, simple_loss=0.1007, pruned_loss=0.01948, audio_tagging_loss=0.009685, over 3040277.92 frames. ], batch size: 53, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:17:19,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.12 vs. limit=22.5 2023-11-20 12:17:41,812 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.187e+01 8.098e+01 8.908e+01 9.804e+01 1.147e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-20 12:17:53,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1082846.6666666667, ans=0.0 2023-11-20 12:18:09,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1082980.0, ans=0.0 2023-11-20 12:18:12,864 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162450 2023-11-20 12:18:24,005 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6150, loss[loss=0.06871, simple_loss=0.0768, pruned_loss=0.01863, audio_tagging_loss=0.01168, over 14947.00 frames. ], tot_loss[loss=0.07908, simple_loss=0.09994, pruned_loss=0.01932, audio_tagging_loss=0.00979, over 3035137.23 frames. ], batch size: 59, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:18:45,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1083113.3333333333, ans=0.125 2023-11-20 12:18:47,341 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.30 vs. limit=15.0 2023-11-20 12:18:58,400 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.12 vs. limit=22.5 2023-11-20 12:19:16,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1083313.3333333333, ans=0.125 2023-11-20 12:19:17,355 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162500 2023-11-20 12:19:26,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1083313.3333333333, ans=0.125 2023-11-20 12:19:29,105 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6200, loss[loss=0.07281, simple_loss=0.0914, pruned_loss=0.0141, audio_tagging_loss=0.01301, over 15456.00 frames. ], tot_loss[loss=0.07887, simple_loss=0.09957, pruned_loss=0.01918, audio_tagging_loss=0.009903, over 3039601.71 frames. 
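
[Annotation] The scaling.py:1022 Whitening lines are diagnostics from modules that penalize feature covariances for drifting away from white: each prints a measured anisotropy metric against its whitening_limit, which is itself scheduled (see the whitening_limit ScheduledFloat entries, e.g. ans=15.0 at 12:04:53). The exact metric lives in scaling.py; the version below is an illustrative stand-in that shares its fixed point: 1.0 for an isotropic covariance, growing as the spectrum becomes lopsided, with a penalty applied only when the metric exceeds the limit (e.g. metric=9.40 vs. limit=12.0 means no penalty):

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """Anisotropy of the channel covariance of x (..., channels);
    equals 1.0 when the covariance is a multiple of the identity."""
    x = x.reshape(-1, x.shape[-1])        # (frames, channels)
    cov = (x.T @ x) / x.shape[0]          # channel covariance
    n = cov.shape[0]
    return (n * (cov * cov).sum() / cov.trace() ** 2).item()

x = torch.randn(1000, 256)                # ~white features
print(whitening_metric(x))                # close to 1.0
```
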
], batch size: 59, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:19:38,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1083380.0, ans=0.2 2023-11-20 12:19:51,158 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+01 8.194e+01 8.727e+01 9.528e+01 1.309e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-20 12:19:53,382 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.21 vs. limit=15.0 2023-11-20 12:20:21,993 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162550 2023-11-20 12:20:33,738 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6250, loss[loss=0.07122, simple_loss=0.08286, pruned_loss=0.01506, audio_tagging_loss=0.01473, over 14306.00 frames. ], tot_loss[loss=0.07824, simple_loss=0.09852, pruned_loss=0.01888, audio_tagging_loss=0.0101, over 3039380.70 frames. ], batch size: 54, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:20:33,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1083713.3333333333, ans=0.1 2023-11-20 12:20:49,606 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0 2023-11-20 12:20:49,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=1083780.0, ans=15.0 2023-11-20 12:21:00,142 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.30 vs. limit=12.0 2023-11-20 12:21:26,689 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162600 2023-11-20 12:21:37,966 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6300, loss[loss=0.08934, simple_loss=0.1231, pruned_loss=0.02028, audio_tagging_loss=0.00752, over 15262.00 frames. ], tot_loss[loss=0.07915, simple_loss=0.1, pruned_loss=0.01898, audio_tagging_loss=0.01016, over 3048040.73 frames. ], batch size: 55, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:21:43,147 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.55 vs. limit=15.0 2023-11-20 12:21:45,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1084046.6666666667, ans=0.125 2023-11-20 12:21:51,569 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.97 vs. 
limit=6.0 2023-11-20 12:22:00,273 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.702e+01 8.158e+01 9.011e+01 9.819e+01 1.577e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-20 12:22:11,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1084180.0, ans=0.125 2023-11-20 12:22:11,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1084180.0, ans=0.125 2023-11-20 12:22:32,226 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162650 2023-11-20 12:22:43,819 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6350, loss[loss=0.06603, simple_loss=0.07789, pruned_loss=0.01578, audio_tagging_loss=0.0113, over 15141.00 frames. ], tot_loss[loss=0.07834, simple_loss=0.09886, pruned_loss=0.01867, audio_tagging_loss=0.01024, over 3049230.73 frames. ], batch size: 58, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:22:58,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1084446.6666666667, ans=0.5 2023-11-20 12:23:01,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1084446.6666666667, ans=0.0 2023-11-20 12:23:19,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1084513.3333333333, ans=0.1 2023-11-20 12:23:24,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1084580.0, ans=0.125 2023-11-20 12:23:32,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1084580.0, ans=0.0 2023-11-20 12:23:35,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1084646.6666666667, ans=0.125 2023-11-20 12:23:36,505 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162700 2023-11-20 12:23:48,110 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6400, loss[loss=0.1138, simple_loss=0.1443, pruned_loss=0.03194, audio_tagging_loss=0.009724, over 15565.00 frames. ], tot_loss[loss=0.07875, simple_loss=0.09927, pruned_loss=0.01882, audio_tagging_loss=0.01029, over 3046155.99 frames. ], batch size: 53, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:24:04,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1084780.0, ans=0.1 2023-11-20 12:24:06,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1084780.0, ans=0.0 2023-11-20 12:24:10,365 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.846e+01 8.159e+01 8.686e+01 9.396e+01 1.221e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 12:24:40,970 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162750 2023-11-20 12:24:48,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1084980.0, ans=0.2 2023-11-20 12:24:52,683 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6450, loss[loss=0.07252, simple_loss=0.08199, pruned_loss=0.01667, audio_tagging_loss=0.01486, over 15468.00 frames. 
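The lr field decays very slowly here (4.93e-03 down to 4.89e-03 over a few thousand batches). That behaviour matches a power-law schedule in both the global batch index and the epoch, in the style of icefall's Eden scheduler; the sketch below reproduces the logged values under assumed constants (base_lr=0.045, lr_batches=7500, lr_epochs=3.5, and an effective epoch of ~13 for this stretch):

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Roughly flat early on, then ~batch^-0.5 decay in batches times
    # ~epoch^-0.5 decay in epochs.
    return (base_lr
            * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
            * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

# Batch index ~162400, as printed in the entries above:
print(round(eden_lr(0.045, 162400, 13.0), 5))  # 0.00493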
], tot_loss[loss=0.07854, simple_loss=0.09881, pruned_loss=0.01873, audio_tagging_loss=0.0104, over 3043373.26 frames. ], batch size: 57, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:25:04,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1085113.3333333333, ans=0.07 2023-11-20 12:25:11,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1085113.3333333333, ans=0.125 2023-11-20 12:25:45,794 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162800 2023-11-20 12:25:57,218 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6500, loss[loss=0.05965, simple_loss=0.06655, pruned_loss=0.01368, audio_tagging_loss=0.0127, over 14518.00 frames. ], tot_loss[loss=0.07883, simple_loss=0.0992, pruned_loss=0.01895, audio_tagging_loss=0.01029, over 3046456.21 frames. ], batch size: 54, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:26:00,428 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=12.0 2023-11-20 12:26:10,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1085446.6666666667, ans=0.125 2023-11-20 12:26:14,672 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.83 vs. limit=22.5 2023-11-20 12:26:15,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1085446.6666666667, ans=0.125 2023-11-20 12:26:18,887 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.899e+01 8.195e+01 8.750e+01 9.288e+01 1.237e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 12:26:47,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1085646.6666666667, ans=0.5 2023-11-20 12:26:49,745 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162850 2023-11-20 12:26:52,806 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.06 vs. limit=22.5 2023-11-20 12:27:01,708 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6550, loss[loss=0.07912, simple_loss=0.1055, pruned_loss=0.01907, audio_tagging_loss=0.007285, over 15402.00 frames. ], tot_loss[loss=0.07871, simple_loss=0.09917, pruned_loss=0.01893, audio_tagging_loss=0.01019, over 3049737.27 frames. ], batch size: 59, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:27:12,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1085713.3333333333, ans=0.1 2023-11-20 12:27:15,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1085780.0, ans=0.125 2023-11-20 12:27:25,273 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.81 vs. 
limit=15.0 2023-11-20 12:27:45,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1085913.3333333333, ans=0.0 2023-11-20 12:27:47,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1085913.3333333333, ans=0.125 2023-11-20 12:27:54,759 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162900 2023-11-20 12:27:56,697 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=15.0 2023-11-20 12:27:58,941 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.23 vs. limit=10.0 2023-11-20 12:28:06,264 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6600, loss[loss=0.06512, simple_loss=0.08025, pruned_loss=0.0142, audio_tagging_loss=0.01079, over 15459.00 frames. ], tot_loss[loss=0.07795, simple_loss=0.09819, pruned_loss=0.01867, audio_tagging_loss=0.01018, over 3042996.23 frames. ], batch size: 59, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:28:27,643 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.865e+01 8.102e+01 8.649e+01 9.372e+01 1.515e+02, threshold=1.730e+02, percent-clipped=0.0 2023-11-20 12:28:42,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1086180.0, ans=0.1 2023-11-20 12:28:59,069 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 162950 2023-11-20 12:29:10,577 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6650, loss[loss=0.05962, simple_loss=0.07119, pruned_loss=0.01242, audio_tagging_loss=0.01161, over 15212.00 frames. ], tot_loss[loss=0.07776, simple_loss=0.09797, pruned_loss=0.01864, audio_tagging_loss=0.01013, over 3041359.01 frames. ], batch size: 57, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:29:13,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1086380.0, ans=0.1 2023-11-20 12:29:16,248 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.13 vs. 
limit=15.0 2023-11-20 12:29:31,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1086446.6666666667, ans=0.125 2023-11-20 12:29:35,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1086513.3333333333, ans=0.0 2023-11-20 12:29:40,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1086513.3333333333, ans=0.125 2023-11-20 12:30:01,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1086646.6666666667, ans=0.125 2023-11-20 12:30:03,793 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163000 2023-11-20 12:30:13,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1086646.6666666667, ans=0.125 2023-11-20 12:30:14,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1086713.3333333333, ans=0.2 2023-11-20 12:30:15,834 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6700, loss[loss=0.0715, simple_loss=0.09127, pruned_loss=0.01598, audio_tagging_loss=0.009882, over 14911.00 frames. ], tot_loss[loss=0.07862, simple_loss=0.09924, pruned_loss=0.01903, audio_tagging_loss=0.00997, over 3045591.06 frames. ], batch size: 56, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:30:18,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1086713.3333333333, ans=0.1 2023-11-20 12:30:37,683 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.914e+01 8.088e+01 8.794e+01 9.537e+01 1.389e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 12:30:39,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1086780.0, ans=0.2 2023-11-20 12:30:50,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1086846.6666666667, ans=0.0 2023-11-20 12:31:07,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1086980.0, ans=0.125 2023-11-20 12:31:08,497 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163050 2023-11-20 12:31:08,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1086980.0, ans=0.0 2023-11-20 12:31:18,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1086980.0, ans=0.125 2023-11-20 12:31:20,550 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6750, loss[loss=0.101, simple_loss=0.1334, pruned_loss=0.02396, audio_tagging_loss=0.01034, over 15601.00 frames. ], tot_loss[loss=0.0784, simple_loss=0.09874, pruned_loss=0.01897, audio_tagging_loss=0.01006, over 3043688.99 frames. ], batch size: 57, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:31:30,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1087046.6666666667, ans=0.0 2023-11-20 12:31:35,431 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.42 vs. 
limit=15.0 2023-11-20 12:31:40,243 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.16 vs. limit=15.0 2023-11-20 12:31:44,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1087180.0, ans=0.125 2023-11-20 12:31:45,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1087180.0, ans=0.0 2023-11-20 12:31:58,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1087246.6666666667, ans=0.125 2023-11-20 12:32:13,372 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163100 2023-11-20 12:32:16,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1087313.3333333333, ans=0.0 2023-11-20 12:32:24,826 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6800, loss[loss=0.07854, simple_loss=0.1016, pruned_loss=0.0194, audio_tagging_loss=0.008333, over 14728.00 frames. ], tot_loss[loss=0.07814, simple_loss=0.09838, pruned_loss=0.01904, audio_tagging_loss=0.009915, over 3036408.55 frames. ], batch size: 55, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:32:45,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1087446.6666666667, ans=0.2 2023-11-20 12:32:45,883 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.532e+01 7.924e+01 8.642e+01 9.401e+01 1.270e+02, threshold=1.728e+02, percent-clipped=0.0 2023-11-20 12:33:17,165 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163150 2023-11-20 12:33:28,577 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6850, loss[loss=0.07361, simple_loss=0.09774, pruned_loss=0.0166, audio_tagging_loss=0.008133, over 14819.00 frames. ], tot_loss[loss=0.07844, simple_loss=0.09889, pruned_loss=0.01912, audio_tagging_loss=0.009875, over 3031979.49 frames. ], batch size: 54, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:33:50,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1087780.0, ans=0.125 2023-11-20 12:33:56,740 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2023-11-20 12:34:02,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1087846.6666666667, ans=0.125 2023-11-20 12:34:21,363 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163200 2023-11-20 12:34:32,687 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6900, loss[loss=0.07881, simple_loss=0.1019, pruned_loss=0.01918, audio_tagging_loss=0.008664, over 14434.00 frames. ], tot_loss[loss=0.07848, simple_loss=0.09927, pruned_loss=0.01904, audio_tagging_loss=0.009799, over 3040119.29 frames. ], batch size: 53, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:34:39,491 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.31 vs. 
limit=15.0 2023-11-20 12:34:42,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1088046.6666666667, ans=0.2 2023-11-20 12:34:44,332 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.60 vs. limit=6.0 2023-11-20 12:34:55,156 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.594e+01 8.124e+01 8.683e+01 9.436e+01 1.192e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 12:35:06,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1088180.0, ans=0.125 2023-11-20 12:35:06,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1088180.0, ans=0.0 2023-11-20 12:35:24,863 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 12:35:26,134 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163250 2023-11-20 12:35:38,415 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 6950, loss[loss=0.06295, simple_loss=0.0763, pruned_loss=0.01231, audio_tagging_loss=0.01248, over 15707.00 frames. ], tot_loss[loss=0.07895, simple_loss=0.1, pruned_loss=0.01914, audio_tagging_loss=0.009787, over 3039607.44 frames. ], batch size: 60, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:35:47,576 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.03 vs. limit=15.0 2023-11-20 12:35:49,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1088446.6666666667, ans=0.2 2023-11-20 12:35:51,987 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:36:31,631 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163300 2023-11-20 12:36:35,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1088646.6666666667, ans=0.1 2023-11-20 12:36:42,510 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7000, loss[loss=0.08962, simple_loss=0.1111, pruned_loss=0.0247, audio_tagging_loss=0.009377, over 16606.00 frames. ], tot_loss[loss=0.0793, simple_loss=0.1005, pruned_loss=0.01926, audio_tagging_loss=0.009798, over 3044866.02 frames. 
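The WARNING above drops an AudioSet cut whose encoder output would be shorter than its (dummy) token sequence: 100 input frames survive as only 23 frames after the ~4x convolutional subsampling, versus 24 BPE tokens, and a transducer cannot align fewer frames than tokens. A sketch of that filter; the exact subsampling arithmetic is assumed from the usual icefall convolutional front end:

def frames_after_subsampling(num_frames: int) -> int:
    # Two stride-2 stages; reproduces the 100 -> 23 relation in the warning.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Alignment needs at least one encoder frame per output token.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> excluded, as logged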
], batch size: 61, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:36:48,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1088713.3333333333, ans=0.125 2023-11-20 12:36:54,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1088780.0, ans=0.2 2023-11-20 12:36:55,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1088780.0, ans=0.0 2023-11-20 12:36:56,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1088780.0, ans=0.125 2023-11-20 12:37:04,379 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.412e+01 8.025e+01 8.662e+01 9.457e+01 1.125e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-20 12:37:07,517 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.36 vs. limit=15.0 2023-11-20 12:37:17,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1088846.6666666667, ans=0.0 2023-11-20 12:37:19,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.98 vs. limit=22.5 2023-11-20 12:37:24,773 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.28 vs. limit=15.0 2023-11-20 12:37:35,893 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163350 2023-11-20 12:37:46,687 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7050, loss[loss=0.08607, simple_loss=0.09607, pruned_loss=0.02595, audio_tagging_loss=0.01209, over 15412.00 frames. ], tot_loss[loss=0.08014, simple_loss=0.1014, pruned_loss=0.01959, audio_tagging_loss=0.009871, over 3050250.10 frames. ], batch size: 61, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:37:47,521 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0 2023-11-20 12:37:58,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1089113.3333333333, ans=0.2 2023-11-20 12:38:01,446 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:38:14,978 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.86 vs. limit=15.0 2023-11-20 12:38:29,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1089246.6666666667, ans=0.0 2023-11-20 12:38:33,562 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.92 vs. limit=15.0 2023-11-20 12:38:39,742 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163400 2023-11-20 12:38:51,916 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7100, loss[loss=0.08895, simple_loss=0.1237, pruned_loss=0.0197, audio_tagging_loss=0.00739, over 16555.00 frames. ], tot_loss[loss=0.07927, simple_loss=0.09995, pruned_loss=0.0193, audio_tagging_loss=0.009986, over 3054568.28 frames. 
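Each Clipping_scale=2.0 line prints five quantiles (min, Q1, median, Q3, max) of recently observed gradient norms together with the resulting clipping threshold, and the threshold tracks clipping_scale times the median: 2.0 * 8.662e+01 ~= 1.732e+02 in the entry above. A sketch of that rule; the real optimizer keeps a running buffer of norms and also reports the fraction of batches actually clipped:

import statistics

def clip_threshold(recent_grad_norms: list, clipping_scale: float = 2.0) -> float:
    # Median-based threshold: robust to the occasional huge gradient norm.
    return clipping_scale * statistics.median(recent_grad_norms)

# The five quantiles from the entry above:
norms = [64.12, 80.25, 86.62, 94.57, 112.5]
print(clip_threshold(norms))  # 173.24 ~= the logged threshold 1.732e+02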
], batch size: 59, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:39:01,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1089380.0, ans=0.0 2023-11-20 12:39:14,019 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.595e+01 8.043e+01 8.663e+01 9.375e+01 1.240e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-20 12:39:19,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1089513.3333333333, ans=0.0 2023-11-20 12:39:45,228 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163450 2023-11-20 12:39:51,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1089646.6666666667, ans=0.0 2023-11-20 12:39:55,999 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7150, loss[loss=0.09359, simple_loss=0.116, pruned_loss=0.02607, audio_tagging_loss=0.009516, over 14694.00 frames. ], tot_loss[loss=0.08005, simple_loss=0.1008, pruned_loss=0.01965, audio_tagging_loss=0.01001, over 3053010.41 frames. ], batch size: 55, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:39:58,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1089713.3333333333, ans=0.2 2023-11-20 12:40:03,794 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.82 vs. limit=15.0 2023-11-20 12:40:05,702 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1089713.3333333333, ans=0.125 2023-11-20 12:40:07,190 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.22 vs. limit=15.0 2023-11-20 12:40:27,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1089846.6666666667, ans=0.0 2023-11-20 12:40:41,721 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.72 vs. limit=15.0 2023-11-20 12:40:48,643 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163500 2023-11-20 12:41:00,077 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7200, loss[loss=0.08393, simple_loss=0.1087, pruned_loss=0.01855, audio_tagging_loss=0.01102, over 15971.00 frames. ], tot_loss[loss=0.07981, simple_loss=0.1006, pruned_loss=0.01943, audio_tagging_loss=0.01007, over 3050361.88 frames. 
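The scaling.py:213 lines dump ScheduledFloat values: module hyper-parameters (dropout probabilities, skip rates, balancer limits) that are annealed as a function of the global batch count rather than held fixed. A piecewise-linear schedule is enough to reproduce the behaviour; the breakpoints below are illustrative, not read from this run:

def scheduled_float(batch_count: float, schedule: list) -> float:
    # schedule: [(batch_count, value), ...] sorted by batch_count;
    # linear interpolation between breakpoints, clamped at both ends.
    x0, y0 = schedule[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in schedule[1:]:
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0

# A dropout annealed from 0.3 to 0.1 over the first 20k batches has long
# since reached its final value at batch_count ~1.09e6, as logged above:
print(scheduled_float(1089380.0, [(0.0, 0.3), (20000.0, 0.1)]))  # 0.1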
], batch size: 59, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:41:04,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1090046.6666666667, ans=0.125 2023-11-20 12:41:23,956 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.741e+01 8.075e+01 8.632e+01 9.241e+01 3.399e+02, threshold=1.726e+02, percent-clipped=1.0 2023-11-20 12:41:31,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1090180.0, ans=0.125 2023-11-20 12:41:36,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1090180.0, ans=0.125 2023-11-20 12:41:42,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1090246.6666666667, ans=0.125 2023-11-20 12:41:53,248 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163550 2023-11-20 12:42:03,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1090380.0, ans=0.0 2023-11-20 12:42:04,798 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7250, loss[loss=0.08512, simple_loss=0.1039, pruned_loss=0.02448, audio_tagging_loss=0.0087, over 14769.00 frames. ], tot_loss[loss=0.07985, simple_loss=0.1005, pruned_loss=0.01937, audio_tagging_loss=0.01024, over 3049231.51 frames. ], batch size: 56, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:42:18,792 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2023-11-20 12:42:37,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1090513.3333333333, ans=0.0 2023-11-20 12:42:44,508 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.21 vs. limit=15.0 2023-11-20 12:42:50,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1090580.0, ans=0.0 2023-11-20 12:42:57,796 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163600 2023-11-20 12:43:07,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1090646.6666666667, ans=0.0 2023-11-20 12:43:09,581 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7300, loss[loss=0.07927, simple_loss=0.1052, pruned_loss=0.02069, audio_tagging_loss=0.005996, over 14375.00 frames. ], tot_loss[loss=0.08026, simple_loss=0.1014, pruned_loss=0.01954, audio_tagging_loss=0.01004, over 3047123.66 frames. 
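The Whitening lines compare a per-module statistic against a scheduled limit, e.g. metric=11.84 vs. limit=15.0 above: the metric is ~1.0 when the feature covariance is already a multiple of the identity and grows as channels become correlated or unevenly scaled, and a penalty gradient is applied only when it exceeds the limit. A sketch of one plausible definition of that statistic (assumed, and simplified to a single group):

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels). Returns ~1.0 for "white" features
    # (covariance proportional to I), larger for correlated channels.
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    num_channels = x.shape[1]
    return float(num_channels * (cov ** 2).mean() / (cov.diag().mean() ** 2))

print(whitening_metric(torch.randn(4000, 512)))  # close to 1.0 for white noise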
], batch size: 54, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:43:18,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1090713.3333333333, ans=0.0 2023-11-20 12:43:26,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1090780.0, ans=0.1 2023-11-20 12:43:26,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1090780.0, ans=0.125 2023-11-20 12:43:28,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1090780.0, ans=0.1 2023-11-20 12:43:31,933 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.842e+01 8.253e+01 8.937e+01 9.591e+01 1.343e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 12:43:40,655 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2023-11-20 12:43:41,350 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:44:01,932 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163650 2023-11-20 12:44:04,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1090980.0, ans=0.125 2023-11-20 12:44:07,999 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.20 vs. limit=15.0 2023-11-20 12:44:13,422 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7350, loss[loss=0.09239, simple_loss=0.1193, pruned_loss=0.02532, audio_tagging_loss=0.007411, over 15632.00 frames. ], tot_loss[loss=0.08075, simple_loss=0.102, pruned_loss=0.01988, audio_tagging_loss=0.009869, over 3051187.73 frames. ], batch size: 58, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:44:18,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1091046.6666666667, ans=0.0 2023-11-20 12:44:28,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1091113.3333333333, ans=0.125 2023-11-20 12:44:47,632 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.80 vs. limit=15.0 2023-11-20 12:44:51,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1091246.6666666667, ans=0.125 2023-11-20 12:44:55,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1091246.6666666667, ans=0.0 2023-11-20 12:44:55,134 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.73 vs. 
limit=15.0 2023-11-20 12:44:59,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1091246.6666666667, ans=0.025 2023-11-20 12:45:01,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1091246.6666666667, ans=0.1 2023-11-20 12:45:04,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1091313.3333333333, ans=0.125 2023-11-20 12:45:05,636 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163700 2023-11-20 12:45:14,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1091313.3333333333, ans=0.0 2023-11-20 12:45:16,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1091380.0, ans=0.1 2023-11-20 12:45:17,047 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7400, loss[loss=0.08886, simple_loss=0.1132, pruned_loss=0.02251, audio_tagging_loss=0.009759, over 14528.00 frames. ], tot_loss[loss=0.08074, simple_loss=0.1021, pruned_loss=0.01989, audio_tagging_loss=0.009794, over 3049078.93 frames. ], batch size: 54, lr: 4.91e-03, grad_scale: 16.0 2023-11-20 12:45:29,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1091446.6666666667, ans=0.125 2023-11-20 12:45:38,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1091446.6666666667, ans=0.125 2023-11-20 12:45:41,937 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.771e+01 8.379e+01 9.065e+01 9.616e+01 1.278e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-20 12:45:58,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1091580.0, ans=0.0 2023-11-20 12:46:02,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1091580.0, ans=0.0 2023-11-20 12:46:10,365 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163750 2023-11-20 12:46:16,542 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.79 vs. limit=15.0 2023-11-20 12:46:21,970 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7450, loss[loss=0.08654, simple_loss=0.1163, pruned_loss=0.02076, audio_tagging_loss=0.007647, over 15874.00 frames. ], tot_loss[loss=0.07986, simple_loss=0.1007, pruned_loss=0.01967, audio_tagging_loss=0.009862, over 3042910.56 frames. ], batch size: 55, lr: 4.91e-03, grad_scale: 16.0 2023-11-20 12:46:38,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1091780.0, ans=0.125 2023-11-20 12:46:39,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1091780.0, ans=0.0 2023-11-20 12:46:40,995 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.86 vs. 
limit=12.0 2023-11-20 12:46:59,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1091913.3333333333, ans=0.1 2023-11-20 12:47:10,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1091913.3333333333, ans=0.09899494936611666 2023-11-20 12:47:14,959 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163800 2023-11-20 12:47:27,448 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7500, loss[loss=0.07986, simple_loss=0.1047, pruned_loss=0.019, audio_tagging_loss=0.008504, over 14052.00 frames. ], tot_loss[loss=0.07836, simple_loss=0.09848, pruned_loss=0.01913, audio_tagging_loss=0.009989, over 3047350.26 frames. ], batch size: 52, lr: 4.91e-03, grad_scale: 16.0 2023-11-20 12:47:31,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1092046.6666666667, ans=0.0 2023-11-20 12:47:34,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1092046.6666666667, ans=0.125 2023-11-20 12:47:51,399 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.921e+01 8.205e+01 8.885e+01 9.811e+01 1.439e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-20 12:47:52,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1092180.0, ans=0.125 2023-11-20 12:48:19,233 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163850 2023-11-20 12:48:19,758 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.53 vs. limit=15.0 2023-11-20 12:48:30,687 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7550, loss[loss=0.08381, simple_loss=0.1008, pruned_loss=0.02131, audio_tagging_loss=0.01208, over 14764.00 frames. ], tot_loss[loss=0.07805, simple_loss=0.09818, pruned_loss=0.01904, audio_tagging_loss=0.009925, over 3047427.61 frames. ], batch size: 57, lr: 4.91e-03, grad_scale: 16.0 2023-11-20 12:48:40,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1092380.0, ans=6.0 2023-11-20 12:48:41,322 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:48:45,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1092446.6666666667, ans=0.0 2023-11-20 12:49:10,600 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:49:23,461 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163900 2023-11-20 12:49:31,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1092646.6666666667, ans=0.0 2023-11-20 12:49:34,860 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7600, loss[loss=0.09044, simple_loss=0.12, pruned_loss=0.02081, audio_tagging_loss=0.009637, over 15076.00 frames. ], tot_loss[loss=0.07799, simple_loss=0.09803, pruned_loss=0.019, audio_tagging_loss=0.00998, over 3050813.95 frames. 
], batch size: 57, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:49:43,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1092713.3333333333, ans=0.07 2023-11-20 12:49:54,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1092780.0, ans=0.125 2023-11-20 12:49:55,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1092780.0, ans=0.125 2023-11-20 12:49:59,227 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.781e+01 8.009e+01 8.543e+01 9.202e+01 1.104e+02, threshold=1.709e+02, percent-clipped=0.0 2023-11-20 12:50:04,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1092846.6666666667, ans=0.0 2023-11-20 12:50:20,956 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. limit=6.0 2023-11-20 12:50:21,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1092913.3333333333, ans=0.0 2023-11-20 12:50:24,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1092913.3333333333, ans=0.125 2023-11-20 12:50:27,557 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 163950 2023-11-20 12:50:32,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1092980.0, ans=0.125 2023-11-20 12:50:39,682 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7650, loss[loss=0.06976, simple_loss=0.09548, pruned_loss=0.01151, audio_tagging_loss=0.01051, over 17155.00 frames. ], tot_loss[loss=0.07746, simple_loss=0.09756, pruned_loss=0.01874, audio_tagging_loss=0.009942, over 3051669.18 frames. ], batch size: 64, lr: 4.90e-03, grad_scale: 32.0 2023-11-20 12:50:50,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1093113.3333333333, ans=0.1 2023-11-20 12:50:56,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1093113.3333333333, ans=0.04949747468305833 2023-11-20 12:50:57,891 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=15.0 2023-11-20 12:51:04,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1093180.0, ans=0.0 2023-11-20 12:51:07,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1093180.0, ans=0.1 2023-11-20 12:51:08,113 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. 
limit=15.0 2023-11-20 12:51:29,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1093313.3333333333, ans=0.125 2023-11-20 12:51:31,779 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164000 2023-11-20 12:51:47,518 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7700, loss[loss=0.06949, simple_loss=0.0875, pruned_loss=0.01662, audio_tagging_loss=0.009118, over 15536.00 frames. ], tot_loss[loss=0.07756, simple_loss=0.09796, pruned_loss=0.01867, audio_tagging_loss=0.009907, over 3049899.29 frames. ], batch size: 59, lr: 4.90e-03, grad_scale: 32.0 2023-11-20 12:51:52,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1093380.0, ans=0.0 2023-11-20 12:52:05,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1093446.6666666667, ans=0.125 2023-11-20 12:52:07,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1093446.6666666667, ans=0.125 2023-11-20 12:52:10,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1093446.6666666667, ans=0.5 2023-11-20 12:52:11,271 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.534e+01 7.870e+01 8.763e+01 9.438e+01 1.322e+02, threshold=1.753e+02, percent-clipped=0.0 2023-11-20 12:52:17,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1093513.3333333333, ans=0.2 2023-11-20 12:52:32,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1093580.0, ans=0.125 2023-11-20 12:52:32,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1093580.0, ans=0.2 2023-11-20 12:52:36,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1093580.0, ans=0.04949747468305833 2023-11-20 12:52:36,872 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.16 vs. limit=15.0 2023-11-20 12:52:39,895 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164050 2023-11-20 12:52:51,387 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7750, loss[loss=0.1031, simple_loss=0.1404, pruned_loss=0.02501, audio_tagging_loss=0.007879, over 14370.00 frames. ], tot_loss[loss=0.07851, simple_loss=0.09911, pruned_loss=0.01908, audio_tagging_loss=0.00988, over 3049348.63 frames. ], batch size: 54, lr: 4.90e-03, grad_scale: 32.0 2023-11-20 12:52:58,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1093713.3333333333, ans=0.0 2023-11-20 12:53:09,929 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0 2023-11-20 12:53:44,128 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164100 2023-11-20 12:53:55,677 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7800, loss[loss=0.0869, simple_loss=0.1009, pruned_loss=0.02547, audio_tagging_loss=0.011, over 14572.00 frames. 
], tot_loss[loss=0.07864, simple_loss=0.09924, pruned_loss=0.01907, audio_tagging_loss=0.009955, over 3042745.51 frames. ], batch size: 55, lr: 4.90e-03, grad_scale: 32.0 2023-11-20 12:54:02,920 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=15.0 2023-11-20 12:54:20,234 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.543e+01 8.289e+01 9.083e+01 1.010e+02 1.614e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-20 12:54:40,005 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.57 vs. limit=15.0 2023-11-20 12:54:46,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1094313.3333333333, ans=0.2 2023-11-20 12:54:48,377 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164150 2023-11-20 12:54:59,848 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7850, loss[loss=0.08759, simple_loss=0.1094, pruned_loss=0.02334, audio_tagging_loss=0.009538, over 15802.00 frames. ], tot_loss[loss=0.07897, simple_loss=0.09943, pruned_loss=0.01924, audio_tagging_loss=0.01002, over 3037539.61 frames. ], batch size: 56, lr: 4.90e-03, grad_scale: 32.0 2023-11-20 12:55:36,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1094513.3333333333, ans=0.1 2023-11-20 12:55:53,669 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164200 2023-11-20 12:56:05,323 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7900, loss[loss=0.07463, simple_loss=0.08881, pruned_loss=0.01876, audio_tagging_loss=0.01147, over 14384.00 frames. ], tot_loss[loss=0.0793, simple_loss=0.09989, pruned_loss=0.01926, audio_tagging_loss=0.01009, over 3042765.90 frames. ], batch size: 54, lr: 4.90e-03, grad_scale: 32.0 2023-11-20 12:56:12,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1094713.3333333333, ans=0.2 2023-11-20 12:56:28,981 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.813e+01 8.207e+01 9.087e+01 9.691e+01 1.187e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-20 12:56:56,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1094980.0, ans=0.04949747468305833 2023-11-20 12:56:58,022 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164250 2023-11-20 12:57:09,133 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 7950, loss[loss=0.0962, simple_loss=0.1274, pruned_loss=0.02428, audio_tagging_loss=0.008216, over 15065.00 frames. ], tot_loss[loss=0.07994, simple_loss=0.1003, pruned_loss=0.0196, audio_tagging_loss=0.01018, over 3037791.43 frames. ], batch size: 53, lr: 4.90e-03, grad_scale: 32.0 2023-11-20 12:57:19,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1095046.6666666667, ans=0.125 2023-11-20 12:57:26,257 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 12:57:51,254 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.76 vs. limit=15.0 2023-11-20 12:57:58,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1095246.6666666667, ans=0.1 2023-11-20 12:58:02,406 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164300 2023-11-20 12:58:06,614 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.22 vs. limit=22.5 2023-11-20 12:58:13,261 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8000, loss[loss=0.08098, simple_loss=0.0942, pruned_loss=0.02136, audio_tagging_loss=0.01252, over 14556.00 frames. ], tot_loss[loss=0.08034, simple_loss=0.1007, pruned_loss=0.01979, audio_tagging_loss=0.01019, over 3046513.58 frames. ], batch size: 55, lr: 4.90e-03, grad_scale: 32.0 2023-11-20 12:58:15,670 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.61 vs. limit=22.5 2023-11-20 12:58:26,133 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.30 vs. limit=12.0 2023-11-20 12:58:39,390 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.429e+01 8.050e+01 8.646e+01 9.454e+01 1.422e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-20 12:58:47,570 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.23 vs. limit=15.0 2023-11-20 12:58:53,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1095580.0, ans=0.05 2023-11-20 12:59:01,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1095580.0, ans=0.125 2023-11-20 12:59:06,821 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164350 2023-11-20 12:59:17,714 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8050, loss[loss=0.08215, simple_loss=0.09058, pruned_loss=0.02294, audio_tagging_loss=0.01391, over 15212.00 frames. ], tot_loss[loss=0.07985, simple_loss=0.1, pruned_loss=0.01955, audio_tagging_loss=0.01028, over 3048499.87 frames. ], batch size: 60, lr: 4.90e-03, grad_scale: 16.0 2023-11-20 12:59:25,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1095713.3333333333, ans=10.0 2023-11-20 12:59:54,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1095846.6666666667, ans=0.0 2023-11-20 12:59:54,424 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. 
limit=15.0 2023-11-20 13:00:10,889 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164400 2023-11-20 13:00:17,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1095980.0, ans=0.125 2023-11-20 13:00:22,245 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8100, loss[loss=0.08757, simple_loss=0.1095, pruned_loss=0.02231, audio_tagging_loss=0.01053, over 14938.00 frames. ], tot_loss[loss=0.08016, simple_loss=0.1008, pruned_loss=0.01964, audio_tagging_loss=0.01014, over 3044229.67 frames. ], batch size: 55, lr: 4.90e-03, grad_scale: 8.0 2023-11-20 13:00:50,970 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.681e+01 9.657e+01 1.040e+02 1.994e+02, threshold=1.931e+02, percent-clipped=2.0 2023-11-20 13:00:56,616 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.20 vs. limit=15.0 2023-11-20 13:00:57,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1096180.0, ans=0.0 2023-11-20 13:01:02,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1096246.6666666667, ans=0.0 2023-11-20 13:01:15,031 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164450 2023-11-20 13:01:15,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1096313.3333333333, ans=0.125 2023-11-20 13:01:26,618 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8150, loss[loss=0.07301, simple_loss=0.1004, pruned_loss=0.01773, audio_tagging_loss=0.005073, over 15230.00 frames. ], tot_loss[loss=0.08009, simple_loss=0.1009, pruned_loss=0.01962, audio_tagging_loss=0.009996, over 3049256.19 frames. ], batch size: 56, lr: 4.90e-03, grad_scale: 8.0 2023-11-20 13:01:27,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1096380.0, ans=0.1 2023-11-20 13:01:34,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1096380.0, ans=0.125 2023-11-20 13:01:43,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1096446.6666666667, ans=0.125 2023-11-20 13:01:50,168 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.94 vs. 
limit=10.0 2023-11-20 13:01:51,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1096446.6666666667, ans=0.125 2023-11-20 13:01:53,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=1096513.3333333333, ans=0.02 2023-11-20 13:01:54,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1096513.3333333333, ans=0.2 2023-11-20 13:02:08,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1096580.0, ans=0.0 2023-11-20 13:02:13,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1096580.0, ans=0.035 2023-11-20 13:02:14,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1096580.0, ans=0.125 2023-11-20 13:02:18,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1096646.6666666667, ans=0.0 2023-11-20 13:02:19,703 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164500 2023-11-20 13:02:23,245 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.35 vs. limit=10.0 2023-11-20 13:02:31,755 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8200, loss[loss=0.1134, simple_loss=0.1349, pruned_loss=0.03493, audio_tagging_loss=0.01099, over 15205.00 frames. ], tot_loss[loss=0.07984, simple_loss=0.1009, pruned_loss=0.01955, audio_tagging_loss=0.009854, over 3048957.35 frames. ], batch size: 57, lr: 4.90e-03, grad_scale: 8.0 2023-11-20 13:02:33,014 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 13:03:00,024 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.486e+01 8.378e+01 8.960e+01 9.917e+01 5.109e+02, threshold=1.792e+02, percent-clipped=1.0 2023-11-20 13:03:20,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1096913.3333333333, ans=0.0 2023-11-20 13:03:21,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1096913.3333333333, ans=0.09899494936611666 2023-11-20 13:03:24,760 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164550 2023-11-20 13:03:36,293 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8250, loss[loss=0.06107, simple_loss=0.07834, pruned_loss=0.01045, audio_tagging_loss=0.01145, over 15012.00 frames. ], tot_loss[loss=0.07946, simple_loss=0.1003, pruned_loss=0.01942, audio_tagging_loss=0.009893, over 3047426.23 frames. 
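grad_scale is the dynamic fp16 loss scale, and it moves 32 -> 16 -> 8 -> 16 across this stretch of the log: it is halved whenever a step produces inf/nan gradients (e.g. the drop to 8.0 around batch 8100 above) and grown back after a run of clean steps. A pure-Python sketch of that update rule; the constants mirror common GradScaler defaults and are assumed here:

def update_grad_scale(scale: float, found_inf: bool, good_steps: int,
                      growth_interval: int = 2000, factor: float = 2.0):
    # Halve after an overflowing step; double after `growth_interval`
    # consecutive clean steps. Returns (new_scale, new_good_steps).
    if found_inf:
        return scale / factor, 0
    good_steps += 1
    if good_steps >= growth_interval:
        return scale * factor, 0
    return scale, good_steps

scale, good = 32.0, 0
scale, good = update_grad_scale(scale, found_inf=True, good_steps=good)  # 16.0
scale, good = update_grad_scale(scale, found_inf=True, good_steps=good)  # 8.0
print(scale)  # 8.0, matching the grad_scale printed around batch 8100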
], batch size: 56, lr: 4.90e-03, grad_scale: 8.0 2023-11-20 13:03:44,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1097046.6666666667, ans=0.0 2023-11-20 13:03:52,362 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.82 vs. limit=15.0 2023-11-20 13:03:53,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1097113.3333333333, ans=0.0 2023-11-20 13:04:28,727 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164600 2023-11-20 13:04:40,225 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8300, loss[loss=0.06121, simple_loss=0.06775, pruned_loss=0.01485, audio_tagging_loss=0.01249, over 15076.00 frames. ], tot_loss[loss=0.07865, simple_loss=0.09943, pruned_loss=0.01911, audio_tagging_loss=0.009821, over 3050082.59 frames. ], batch size: 58, lr: 4.90e-03, grad_scale: 8.0 2023-11-20 13:04:42,833 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 13:05:01,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1097446.6666666667, ans=0.125 2023-11-20 13:05:05,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1097513.3333333333, ans=0.125 2023-11-20 13:05:05,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1097513.3333333333, ans=0.125 2023-11-20 13:05:08,984 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.931e+01 8.247e+01 8.781e+01 9.736e+01 1.742e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 13:05:10,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1097513.3333333333, ans=0.125 2023-11-20 13:05:32,909 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164650 2023-11-20 13:05:37,941 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.50 vs. limit=15.0 2023-11-20 13:05:42,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1097646.6666666667, ans=0.2 2023-11-20 13:05:44,410 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8350, loss[loss=0.06895, simple_loss=0.0908, pruned_loss=0.01315, audio_tagging_loss=0.01039, over 15370.00 frames. ], tot_loss[loss=0.07869, simple_loss=0.09972, pruned_loss=0.01904, audio_tagging_loss=0.009785, over 3062571.57 frames. 
2023-11-20 13:06:05,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1097780.0, ans=0.125
2023-11-20 13:06:15,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1097846.6666666667, ans=0.125
2023-11-20 13:06:24,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1097913.3333333333, ans=0.1
2023-11-20 13:06:31,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1097913.3333333333, ans=0.0
2023-11-20 13:06:36,786 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164700
2023-11-20 13:06:40,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1097980.0, ans=0.125
2023-11-20 13:06:48,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1098046.6666666667, ans=0.1
2023-11-20 13:06:48,998 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8400, loss[loss=0.0768, simple_loss=0.09951, pruned_loss=0.01711, audio_tagging_loss=0.009936, over 14561.00 frames. ], tot_loss[loss=0.07868, simple_loss=0.0997, pruned_loss=0.0191, audio_tagging_loss=0.009727, over 3057720.48 frames. ], batch size: 56, lr: 4.89e-03, grad_scale: 16.0
2023-11-20 13:07:02,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1098113.3333333333, ans=0.125
2023-11-20 13:07:10,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1098113.3333333333, ans=0.125
2023-11-20 13:07:17,345 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.406e+01 7.789e+01 8.682e+01 9.296e+01 1.321e+02, threshold=1.736e+02, percent-clipped=0.0
2023-11-20 13:07:34,012 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.36 vs. limit=15.0
2023-11-20 13:07:41,787 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164750
2023-11-20 13:07:45,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1098313.3333333333, ans=0.0
2023-11-20 13:07:53,344 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8450, loss[loss=0.06647, simple_loss=0.07698, pruned_loss=0.01542, audio_tagging_loss=0.01257, over 14526.00 frames. ], tot_loss[loss=0.07859, simple_loss=0.09963, pruned_loss=0.01895, audio_tagging_loss=0.009821, over 3055599.00 frames. ], batch size: 57, lr: 4.89e-03, grad_scale: 16.0
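The WARNING records in this section exclude one-second AudioSet cuts whose encoder output (23 frames after subsampling) is shorter than their 24-token placeholder transcript, which a transducer loss cannot align. A hedged sketch of such a filter follows; the exact subsampling arithmetic is an assumption chosen to match the logged numbers.

def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    # A transducer needs at least one encoder frame per target token.
    # The "-7" models convolutional context lost in the frontend; it is an
    # assumed detail, picked so that 100 input frames map to 23 (as logged).
    frames_after = (num_frames - 7) // subsampling_factor
    return frames_after >= num_tokens

print(keep_cut(100, 24))  # False -> the cut is excluded, as in the log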
2023-11-20 13:08:01,096 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.150e-01
2023-11-20 13:08:06,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1098446.6666666667, ans=0.125
2023-11-20 13:08:15,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1098446.6666666667, ans=0.125
2023-11-20 13:08:17,673 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 13:08:23,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1098513.3333333333, ans=0.0
2023-11-20 13:08:33,030 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.75 vs. limit=15.0
2023-11-20 13:08:41,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1098580.0, ans=0.125
2023-11-20 13:08:46,030 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164800
2023-11-20 13:08:57,710 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8500, loss[loss=0.07929, simple_loss=0.09408, pruned_loss=0.02273, audio_tagging_loss=0.009518, over 13798.00 frames. ], tot_loss[loss=0.07846, simple_loss=0.09938, pruned_loss=0.01892, audio_tagging_loss=0.009855, over 3057874.57 frames. ], batch size: 53, lr: 4.89e-03, grad_scale: 16.0
2023-11-20 13:09:04,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1098713.3333333333, ans=0.2
2023-11-20 13:09:14,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1098780.0, ans=0.125
2023-11-20 13:09:16,158 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.88 vs. limit=10.0
2023-11-20 13:09:25,691 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.462e+01 8.485e+01 9.217e+01 9.923e+01 2.190e+02, threshold=1.843e+02, percent-clipped=1.0
2023-11-20 13:09:50,083 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164850
2023-11-20 13:10:01,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1099046.6666666667, ans=0.125
2023-11-20 13:10:02,329 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8550, loss[loss=0.07957, simple_loss=0.09098, pruned_loss=0.02318, audio_tagging_loss=0.01089, over 14923.00 frames. ], tot_loss[loss=0.07854, simple_loss=0.09954, pruned_loss=0.01884, audio_tagging_loss=0.009929, over 3057308.36 frames. ], batch size: 57, lr: 4.89e-03, grad_scale: 16.0
2023-11-20 13:10:03,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1099046.6666666667, ans=0.2
2023-11-20 13:10:27,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1099180.0, ans=0.0
2023-11-20 13:10:50,090 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0
2023-11-20 13:10:55,024 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164900
2023-11-20 13:10:58,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1099313.3333333333, ans=0.125
2023-11-20 13:11:06,509 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8600, loss[loss=0.04521, simple_loss=0.04558, pruned_loss=0.006797, audio_tagging_loss=0.01562, over 13636.00 frames. ], tot_loss[loss=0.0784, simple_loss=0.09884, pruned_loss=0.01891, audio_tagging_loss=0.01007, over 3047750.96 frames. ], batch size: 54, lr: 4.89e-03, grad_scale: 16.0
2023-11-20 13:11:17,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1099446.6666666667, ans=0.125
2023-11-20 13:11:26,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1099446.6666666667, ans=0.125
2023-11-20 13:11:26,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1099446.6666666667, ans=0.125
2023-11-20 13:11:33,753 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.330e+01 8.002e+01 8.794e+01 9.640e+01 1.342e+02, threshold=1.759e+02, percent-clipped=0.0
2023-11-20 13:11:46,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1099580.0, ans=0.0
2023-11-20 13:11:50,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1099580.0, ans=0.0
2023-11-20 13:11:58,961 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 164950
2023-11-20 13:12:01,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1099646.6666666667, ans=0.0
2023-11-20 13:12:09,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1099713.3333333333, ans=0.125
2023-11-20 13:12:10,622 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8650, loss[loss=0.07139, simple_loss=0.08134, pruned_loss=0.02141, audio_tagging_loss=0.009312, over 13819.00 frames. ], tot_loss[loss=0.07892, simple_loss=0.09943, pruned_loss=0.0191, audio_tagging_loss=0.01011, over 3050381.57 frames. ], batch size: 54, lr: 4.89e-03, grad_scale: 16.0
2023-11-20 13:12:12,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1099713.3333333333, ans=0.125
2023-11-20 13:12:14,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1099713.3333333333, ans=0.125
2023-11-20 13:12:17,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1099713.3333333333, ans=0.125
2023-11-20 13:12:19,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1099713.3333333333, ans=0.1
2023-11-20 13:12:23,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1099780.0, ans=0.05
2023-11-20 13:12:33,507 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0
2023-11-20 13:12:48,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1099913.3333333333, ans=0.0
2023-11-20 13:13:03,806 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165000
2023-11-20 13:13:04,380 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.23 vs. limit=15.0
2023-11-20 13:13:12,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1099980.0, ans=0.125
2023-11-20 13:13:14,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1100046.6666666667, ans=0.0
2023-11-20 13:13:15,589 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8700, loss[loss=0.07097, simple_loss=0.08525, pruned_loss=0.01724, audio_tagging_loss=0.01111, over 14672.00 frames. ], tot_loss[loss=0.07938, simple_loss=0.09993, pruned_loss=0.01928, audio_tagging_loss=0.01013, over 3046107.63 frames. ], batch size: 54, lr: 4.89e-03, grad_scale: 16.0
2023-11-20 13:13:23,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1100046.6666666667, ans=0.09899494936611666
2023-11-20 13:13:30,964 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=15.0
2023-11-20 13:13:35,906 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0
2023-11-20 13:13:42,282 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=12.0
2023-11-20 13:13:42,534 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.55 vs. limit=5.0
2023-11-20 13:13:44,008 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.858e+01 8.298e+01 8.856e+01 9.572e+01 1.265e+02, threshold=1.771e+02, percent-clipped=0.0
2023-11-20 13:14:03,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1100246.6666666667, ans=0.1
2023-11-20 13:14:08,460 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165050
2023-11-20 13:14:12,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1100313.3333333333, ans=0.0
2023-11-20 13:14:20,653 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8750, loss[loss=0.05578, simple_loss=0.07608, pruned_loss=0.009839, audio_tagging_loss=0.007904, over 16040.00 frames. ], tot_loss[loss=0.07967, simple_loss=0.1001, pruned_loss=0.01939, audio_tagging_loss=0.01021, over 3047113.58 frames. ], batch size: 60, lr: 4.89e-03, grad_scale: 16.0
2023-11-20 13:15:06,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1100580.0, ans=0.125
2023-11-20 13:15:08,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1100580.0, ans=0.0
2023-11-20 13:15:13,148 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165100
2023-11-20 13:15:24,010 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8800, loss[loss=0.08404, simple_loss=0.1097, pruned_loss=0.0186, audio_tagging_loss=0.01057, over 15251.00 frames. ], tot_loss[loss=0.07997, simple_loss=0.1003, pruned_loss=0.01953, audio_tagging_loss=0.01029, over 3047430.81 frames. ], batch size: 56, lr: 4.89e-03, grad_scale: 32.0
2023-11-20 13:15:24,838 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.12 vs. limit=6.0
2023-11-20 13:15:26,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1100713.3333333333, ans=0.125
2023-11-20 13:15:49,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1100846.6666666667, ans=0.0
2023-11-20 13:15:51,938 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.572e+01 8.177e+01 8.747e+01 9.582e+01 1.210e+02, threshold=1.749e+02, percent-clipped=0.0
2023-11-20 13:15:57,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1100846.6666666667, ans=0.2
2023-11-20 13:16:11,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1100913.3333333333, ans=0.1
2023-11-20 13:16:16,767 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165150
2023-11-20 13:16:22,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1100980.0, ans=0.0
2023-11-20 13:16:23,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1100980.0, ans=0.1
2023-11-20 13:16:27,594 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8850, loss[loss=0.0715, simple_loss=0.08971, pruned_loss=0.01775, audio_tagging_loss=0.008887, over 14899.00 frames. ], tot_loss[loss=0.08029, simple_loss=0.1008, pruned_loss=0.01966, audio_tagging_loss=0.01021, over 3051542.51 frames. ], batch size: 57, lr: 4.89e-03, grad_scale: 32.0
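The optim.py records summarize recent gradient norms as quartiles together with a clipping threshold and the fraction of clipped batches. A hedged sketch of how such statistics might be produced follows; the window of norms and the threshold rule are assumptions.

import torch

# Hedged sketch of the statistics behind "Clipping_scale=2.0, grad-norm
# quartiles ... threshold=..., percent-clipped=...": keep a window of recent
# global gradient norms, report min/quartiles/max, and clip against
# clipping_scale times a reference quantile (here, the median).
norms = torch.tensor([64.9, 78.1, 83.4, 88.6, 91.2, 95.0, 99.3, 132.1])
quartiles = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
threshold = 2.0 * quartiles[2]
percent_clipped = 100.0 * (norms > threshold).float().mean()
print([f"{float(q):.3e}" for q in quartiles],
      float(threshold), float(percent_clipped))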
], tot_loss[loss=0.08029, simple_loss=0.1008, pruned_loss=0.01966, audio_tagging_loss=0.01021, over 3051542.51 frames. ], batch size: 57, lr: 4.89e-03, grad_scale: 32.0 2023-11-20 13:16:29,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1101046.6666666667, ans=0.2 2023-11-20 13:16:30,757 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.77 vs. limit=22.5 2023-11-20 13:16:40,503 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 13:16:48,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1101113.3333333333, ans=0.025 2023-11-20 13:17:21,049 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165200 2023-11-20 13:17:29,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1101313.3333333333, ans=0.125 2023-11-20 13:17:32,355 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8900, loss[loss=0.0797, simple_loss=0.09997, pruned_loss=0.02014, audio_tagging_loss=0.009574, over 14894.00 frames. ], tot_loss[loss=0.08011, simple_loss=0.1009, pruned_loss=0.01964, audio_tagging_loss=0.01003, over 3049867.04 frames. ], batch size: 57, lr: 4.89e-03, grad_scale: 32.0 2023-11-20 13:17:57,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1101513.3333333333, ans=0.05 2023-11-20 13:18:00,911 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 8.398e+01 8.901e+01 9.799e+01 1.599e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-20 13:18:01,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1101513.3333333333, ans=0.125 2023-11-20 13:18:03,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1101513.3333333333, ans=0.125 2023-11-20 13:18:23,952 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0 2023-11-20 13:18:25,638 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165250 2023-11-20 13:18:27,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1101646.6666666667, ans=0.125 2023-11-20 13:18:31,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1101646.6666666667, ans=0.125 2023-11-20 13:18:34,450 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.57 vs. 
limit=15.0 2023-11-20 13:18:37,249 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 8950, loss[loss=0.06459, simple_loss=0.08675, pruned_loss=0.01401, audio_tagging_loss=0.007205, over 15401.00 frames. ], tot_loss[loss=0.07981, simple_loss=0.1007, pruned_loss=0.01959, audio_tagging_loss=0.009869, over 3047767.66 frames. ], batch size: 58, lr: 4.89e-03, grad_scale: 32.0 2023-11-20 13:18:50,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1101780.0, ans=0.125 2023-11-20 13:18:51,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1101780.0, ans=0.125 2023-11-20 13:18:59,952 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.84 vs. limit=15.0 2023-11-20 13:19:20,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1101913.3333333333, ans=0.2 2023-11-20 13:19:29,853 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165300 2023-11-20 13:19:37,412 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-20 13:19:40,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1102046.6666666667, ans=0.125 2023-11-20 13:19:41,539 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9000, loss[loss=0.08036, simple_loss=0.113, pruned_loss=0.01664, audio_tagging_loss=0.007226, over 14717.00 frames. ], tot_loss[loss=0.08017, simple_loss=0.1015, pruned_loss=0.01961, audio_tagging_loss=0.009818, over 3044351.39 frames. ], batch size: 57, lr: 4.88e-03, grad_scale: 32.0 2023-11-20 13:19:41,539 INFO [train_asr.py:1285] (2/4) Computing validation loss 2023-11-20 13:20:06,742 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.4828, 6.2062, 6.4443, 6.1026], device='cuda:2') 2023-11-20 13:20:23,251 INFO [train_asr.py:1294] (2/4) Epoch 14, validation: loss=0.06237, simple_loss=0.05346, pruned_loss=0.005661, audio_tagging_loss=0.02998, over 4681554.00 frames. 2023-11-20 13:20:23,252 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB 2023-11-20 13:20:25,374 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 13:20:27,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1102046.6666666667, ans=0.0 2023-11-20 13:20:29,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1102046.6666666667, ans=0.05 2023-11-20 13:20:35,695 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.09 vs. 
limit=15.0 2023-11-20 13:20:36,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1102113.3333333333, ans=0.125 2023-11-20 13:20:38,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1102113.3333333333, ans=0.125 2023-11-20 13:20:50,978 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.532e+01 8.335e+01 8.996e+01 9.763e+01 1.376e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-20 13:21:16,389 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165350 2023-11-20 13:21:20,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1102313.3333333333, ans=0.0 2023-11-20 13:21:27,331 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9050, loss[loss=0.05384, simple_loss=0.06353, pruned_loss=0.01199, audio_tagging_loss=0.01009, over 14670.00 frames. ], tot_loss[loss=0.08028, simple_loss=0.1019, pruned_loss=0.01964, audio_tagging_loss=0.009685, over 3046060.86 frames. ], batch size: 59, lr: 4.88e-03, grad_scale: 32.0 2023-11-20 13:21:29,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1102380.0, ans=0.0 2023-11-20 13:21:40,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1102446.6666666667, ans=0.125 2023-11-20 13:21:54,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1102513.3333333333, ans=0.125 2023-11-20 13:22:05,861 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 13:22:16,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1102580.0, ans=0.1 2023-11-20 13:22:20,861 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165400 2023-11-20 13:22:32,187 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9100, loss[loss=0.08091, simple_loss=0.1105, pruned_loss=0.01569, audio_tagging_loss=0.009977, over 16060.00 frames. ], tot_loss[loss=0.07968, simple_loss=0.101, pruned_loss=0.01949, audio_tagging_loss=0.009702, over 3049629.50 frames. 
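Every so often the trainer pauses to compute a validation loss over the dev set and logs peak GPU memory, as in the "Computing validation loss" block above. A minimal sketch of such a pass follows, with model and valid_loader as placeholders; the real code's interfaces may differ.

import torch

# Hedged sketch of the periodic validation pass. `model` is assumed to
# return a (summed loss, frame count) pair per batch, which is not
# necessarily the actual interface.
def validate(model, valid_loader, device) -> float:
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = model(batch)
            tot_loss += float(loss)
            tot_frames += float(num_frames)
    model.train()
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={tot_loss / tot_frames:.5f}; "
          f"maximum memory allocated so far is {peak_mb}MB")
    return tot_loss / tot_frames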
2023-11-20 13:22:32,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1102713.3333333333, ans=0.125
2023-11-20 13:22:34,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1102713.3333333333, ans=0.125
2023-11-20 13:22:34,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1102713.3333333333, ans=0.0
2023-11-20 13:22:47,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1102780.0, ans=0.0
2023-11-20 13:23:01,056 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.047e+01 8.709e+01 9.562e+01 1.643e+02, threshold=1.742e+02, percent-clipped=0.0
2023-11-20 13:23:01,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1102846.6666666667, ans=0.0
2023-11-20 13:23:25,755 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165450
2023-11-20 13:23:36,772 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9150, loss[loss=0.1021, simple_loss=0.1287, pruned_loss=0.02571, audio_tagging_loss=0.01199, over 16341.00 frames. ], tot_loss[loss=0.0794, simple_loss=0.1005, pruned_loss=0.01941, audio_tagging_loss=0.009736, over 3051821.85 frames. ], batch size: 59, lr: 4.88e-03, grad_scale: 32.0
2023-11-20 13:23:36,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1103046.6666666667, ans=0.125
2023-11-20 13:23:39,342 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.95 vs. limit=12.0
2023-11-20 13:23:41,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1103046.6666666667, ans=0.125
2023-11-20 13:24:03,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1103180.0, ans=0.0
2023-11-20 13:24:15,114 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.903e-02
2023-11-20 13:24:21,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1103246.6666666667, ans=0.125
2023-11-20 13:24:30,214 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165500
2023-11-20 13:24:41,877 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9200, loss[loss=0.0807, simple_loss=0.0995, pruned_loss=0.0234, audio_tagging_loss=0.007552, over 14453.00 frames. ], tot_loss[loss=0.0792, simple_loss=0.1005, pruned_loss=0.01926, audio_tagging_loss=0.009669, over 3055402.31 frames. ], batch size: 53, lr: 4.88e-03, grad_scale: 32.0
2023-11-20 13:24:56,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1103446.6666666667, ans=0.0
2023-11-20 13:25:06,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1103513.3333333333, ans=0.125
2023-11-20 13:25:10,022 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.657e+01 8.175e+01 8.950e+01 9.913e+01 1.287e+02, threshold=1.790e+02, percent-clipped=0.0
2023-11-20 13:25:31,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1103580.0, ans=0.125
2023-11-20 13:25:35,329 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165550
2023-11-20 13:25:46,411 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9250, loss[loss=0.07424, simple_loss=0.09644, pruned_loss=0.01647, audio_tagging_loss=0.009547, over 14962.00 frames. ], tot_loss[loss=0.07864, simple_loss=0.09958, pruned_loss=0.01912, audio_tagging_loss=0.009723, over 3058004.28 frames. ], batch size: 58, lr: 4.88e-03, grad_scale: 16.0
2023-11-20 13:25:55,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1103713.3333333333, ans=0.125
2023-11-20 13:25:56,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1103713.3333333333, ans=0.025
2023-11-20 13:26:06,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1103780.0, ans=0.125
2023-11-20 13:26:23,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1103846.6666666667, ans=0.1
2023-11-20 13:26:38,972 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165600
2023-11-20 13:26:46,757 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.29 vs. limit=22.5
2023-11-20 13:26:50,983 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9300, loss[loss=0.07989, simple_loss=0.1061, pruned_loss=0.01894, audio_tagging_loss=0.007923, over 16210.00 frames. ], tot_loss[loss=0.0786, simple_loss=0.09965, pruned_loss=0.01906, audio_tagging_loss=0.009715, over 3058670.40 frames. ], batch size: 59, lr: 4.88e-03, grad_scale: 16.0
2023-11-20 13:26:53,695 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 13:27:21,186 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.442e+01 7.830e+01 8.462e+01 9.599e+01 1.223e+02, threshold=1.692e+02, percent-clipped=0.0
2023-11-20 13:27:26,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1104180.0, ans=0.125
2023-11-20 13:27:44,382 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165650
2023-11-20 13:27:53,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1104313.3333333333, ans=0.125
2023-11-20 13:27:55,256 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9350, loss[loss=0.05094, simple_loss=0.06059, pruned_loss=0.01217, audio_tagging_loss=0.00847, over 14163.00 frames. ], tot_loss[loss=0.07885, simple_loss=0.09959, pruned_loss=0.01921, audio_tagging_loss=0.009845, over 3051422.35 frames. ], batch size: 55, lr: 4.88e-03, grad_scale: 16.0
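The Whitening records compare a per-module statistic of the feature covariance against a limit; when the statistic exceeds the limit, a corrective penalty is applied. One plausible such statistic (an assumption, not necessarily the exact one used here) is the ratio mean(λ²)/mean(λ)² over the covariance eigenvalues λ, which equals 1 exactly when the covariance is a multiple of the identity, i.e. when the features are white.

import torch

# Hedged sketch of a whitening diagnostic. The trace identities avoid an
# explicit eigendecomposition: trace(C @ C) = sum(eig^2), trace(C) = sum(eig).
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    return float((cov @ cov).trace() * d / cov.trace() ** 2)

print(whitening_metric(torch.randn(4000, 192)))                          # ~1: white
print(whitening_metric(torch.randn(4000, 192) @ torch.randn(192, 192)))  # >> 1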
2023-11-20 13:28:02,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1104380.0, ans=0.125
2023-11-20 13:28:22,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1104513.3333333333, ans=0.04949747468305833
2023-11-20 13:28:31,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1104513.3333333333, ans=0.125
2023-11-20 13:28:35,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1104580.0, ans=0.0
2023-11-20 13:28:40,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1104580.0, ans=0.125
2023-11-20 13:28:47,708 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165700
2023-11-20 13:28:59,934 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9400, loss[loss=0.07928, simple_loss=0.09934, pruned_loss=0.0186, audio_tagging_loss=0.01101, over 15169.00 frames. ], tot_loss[loss=0.07873, simple_loss=0.09943, pruned_loss=0.01902, audio_tagging_loss=0.009996, over 3052070.86 frames. ], batch size: 57, lr: 4.88e-03, grad_scale: 16.0
2023-11-20 13:29:09,628 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-20 13:29:14,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1104780.0, ans=0.125
2023-11-20 13:29:19,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1104780.0, ans=0.125
2023-11-20 13:29:29,609 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.579e+01 8.038e+01 8.701e+01 9.410e+01 1.188e+02, threshold=1.740e+02, percent-clipped=0.0
2023-11-20 13:29:30,490 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.68 vs. limit=15.0
2023-11-20 13:29:40,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1104913.3333333333, ans=0.1
2023-11-20 13:29:42,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1104913.3333333333, ans=0.125
2023-11-20 13:29:44,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1104913.3333333333, ans=0.2
2023-11-20 13:29:52,929 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165750
2023-11-20 13:29:59,303 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.13 vs. limit=15.0
2023-11-20 13:30:02,356 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 13:30:04,859 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9450, loss[loss=0.08918, simple_loss=0.1141, pruned_loss=0.02389, audio_tagging_loss=0.008236, over 15733.00 frames. ], tot_loss[loss=0.07932, simple_loss=0.1003, pruned_loss=0.0192, audio_tagging_loss=0.009992, over 3054587.64 frames. ], batch size: 61, lr: 4.88e-03, grad_scale: 16.0
2023-11-20 13:30:10,333 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.96 vs. limit=15.0
2023-11-20 13:30:12,383 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0
2023-11-20 13:30:26,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1105113.3333333333, ans=0.125
2023-11-20 13:30:57,425 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165800
2023-11-20 13:31:09,005 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9500, loss[loss=0.08624, simple_loss=0.1035, pruned_loss=0.02113, audio_tagging_loss=0.01334, over 15887.00 frames. ], tot_loss[loss=0.07866, simple_loss=0.0992, pruned_loss=0.01895, audio_tagging_loss=0.01011, over 3046913.55 frames. ], batch size: 57, lr: 4.88e-03, grad_scale: 16.0
2023-11-20 13:31:37,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1105513.3333333333, ans=0.125
2023-11-20 13:31:39,026 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.167e+01 8.714e+01 9.690e+01 1.183e+02, threshold=1.743e+02, percent-clipped=0.0
2023-11-20 13:31:48,992 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 13:32:01,610 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165850
2023-11-20 13:32:13,249 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9550, loss[loss=0.1007, simple_loss=0.1188, pruned_loss=0.0289, audio_tagging_loss=0.01241, over 15739.00 frames. ], tot_loss[loss=0.07854, simple_loss=0.09891, pruned_loss=0.01891, audio_tagging_loss=0.01017, over 3045958.61 frames. ], batch size: 56, lr: 4.88e-03, grad_scale: 16.0
2023-11-20 13:32:17,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1105713.3333333333, ans=0.0
2023-11-20 13:32:17,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1105713.3333333333, ans=0.0
2023-11-20 13:32:21,669 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.68 vs. limit=22.5
2023-11-20 13:32:27,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1105780.0, ans=0.1
2023-11-20 13:32:56,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1105913.3333333333, ans=0.1
2023-11-20 13:32:57,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1105913.3333333333, ans=0.125
2023-11-20 13:33:02,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1105913.3333333333, ans=0.125
2023-11-20 13:33:06,227 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165900
2023-11-20 13:33:06,492 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 13:33:11,835 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=22.5
2023-11-20 13:33:17,801 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9600, loss[loss=0.08496, simple_loss=0.1076, pruned_loss=0.02215, audio_tagging_loss=0.009027, over 14230.00 frames. ], tot_loss[loss=0.07894, simple_loss=0.09954, pruned_loss=0.01899, audio_tagging_loss=0.01018, over 3056241.60 frames. ], batch size: 54, lr: 4.88e-03, grad_scale: 32.0
2023-11-20 13:33:21,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1106046.6666666667, ans=0.125
2023-11-20 13:33:22,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1106046.6666666667, ans=0.125
2023-11-20 13:33:39,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1106113.3333333333, ans=0.09899494936611666
2023-11-20 13:33:47,773 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0
2023-11-20 13:33:48,242 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.832e+01 8.385e+01 9.134e+01 1.022e+02 1.365e+02, threshold=1.827e+02, percent-clipped=0.0
2023-11-20 13:34:10,114 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 165950
2023-11-20 13:34:11,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1106313.3333333333, ans=0.125
2023-11-20 13:34:12,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1106313.3333333333, ans=0.09899494936611666
2023-11-20 13:34:15,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1106313.3333333333, ans=0.125
2023-11-20 13:34:16,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1106313.3333333333, ans=0.1
2023-11-20 13:34:18,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1106313.3333333333, ans=0.0
2023-11-20 13:34:21,509 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9650, loss[loss=0.06599, simple_loss=0.08428, pruned_loss=0.01376, audio_tagging_loss=0.01009, over 14419.00 frames. ], tot_loss[loss=0.0785, simple_loss=0.09914, pruned_loss=0.01885, audio_tagging_loss=0.01008, over 3055011.82 frames. ], batch size: 53, lr: 4.88e-03, grad_scale: 16.0
2023-11-20 13:34:25,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1106380.0, ans=0.125
2023-11-20 13:34:50,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1106513.3333333333, ans=0.125
2023-11-20 13:34:53,198 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.48 vs. limit=15.0
2023-11-20 13:35:02,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1106580.0, ans=0.2
2023-11-20 13:35:13,356 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0
2023-11-20 13:35:14,149 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166000
2023-11-20 13:35:19,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1106646.6666666667, ans=0.0
2023-11-20 13:35:23,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1106646.6666666667, ans=0.125
2023-11-20 13:35:25,858 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9700, loss[loss=0.1046, simple_loss=0.1408, pruned_loss=0.02692, audio_tagging_loss=0.007256, over 15708.00 frames. ], tot_loss[loss=0.07885, simple_loss=0.09979, pruned_loss=0.01901, audio_tagging_loss=0.009949, over 3054816.00 frames. ], batch size: 56, lr: 4.87e-03, grad_scale: 16.0
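The lr field decays very slowly across these records (4.90e-03 down to 4.87e-03 over roughly two thousand batches), consistent with a schedule that discounts by both step and epoch. The sketch below is an Eden-style formula with assumed constants; taking the epoch argument as 13 lands near the logged 4.89e-03, but the exact indexing and settings of this run are not established by the log alone.

# Hedged sketch of an Eden-style learning-rate schedule. All constants, and
# the epoch indexing, are assumptions for illustration.
def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * step_factor * epoch_factor

print(f"{eden_lr(0.045, 165000, 13):.2e}")  # ~4.89e-03, close to the log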
2023-11-20 13:35:33,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1106713.3333333333, ans=0.125
2023-11-20 13:35:53,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1106846.6666666667, ans=0.125
2023-11-20 13:35:57,241 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.103e+01 9.034e+01 9.824e+01 1.276e+02, threshold=1.807e+02, percent-clipped=0.0
2023-11-20 13:36:08,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1106913.3333333333, ans=0.1
2023-11-20 13:36:13,309 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. limit=6.0
2023-11-20 13:36:18,853 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166050
2023-11-20 13:36:21,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1106980.0, ans=0.125
2023-11-20 13:36:31,005 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9750, loss[loss=0.08313, simple_loss=0.115, pruned_loss=0.01865, audio_tagging_loss=0.006981, over 14419.00 frames. ], tot_loss[loss=0.07804, simple_loss=0.09884, pruned_loss=0.01876, audio_tagging_loss=0.00986, over 3052015.34 frames. ], batch size: 53, lr: 4.87e-03, grad_scale: 16.0
2023-11-20 13:36:31,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1107046.6666666667, ans=0.125
2023-11-20 13:36:35,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1107046.6666666667, ans=0.0
2023-11-20 13:36:35,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1107046.6666666667, ans=0.125
2023-11-20 13:36:40,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1107046.6666666667, ans=0.2
2023-11-20 13:37:10,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1107246.6666666667, ans=0.125
2023-11-20 13:37:24,255 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166100
2023-11-20 13:37:26,085 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=12.0
2023-11-20 13:37:35,955 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9800, loss[loss=0.07652, simple_loss=0.09825, pruned_loss=0.01764, audio_tagging_loss=0.009756, over 15796.00 frames. ], tot_loss[loss=0.07811, simple_loss=0.09872, pruned_loss=0.01892, audio_tagging_loss=0.009834, over 3052355.16 frames. ], batch size: 60, lr: 4.87e-03, grad_scale: 8.0
2023-11-20 13:37:45,153 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.67 vs. limit=15.0
2023-11-20 13:37:50,784 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0
2023-11-20 13:38:02,961 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.44 vs. limit=15.0
2023-11-20 13:38:07,947 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.608e+01 8.297e+01 9.086e+01 9.730e+01 1.369e+02, threshold=1.817e+02, percent-clipped=0.0
2023-11-20 13:38:10,568 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.51 vs. limit=22.5
2023-11-20 13:38:25,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1107580.0, ans=0.125
2023-11-20 13:38:28,903 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166150
2023-11-20 13:38:32,576 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 13:38:40,588 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9850, loss[loss=0.1178, simple_loss=0.1571, pruned_loss=0.03201, audio_tagging_loss=0.007272, over 15588.00 frames. ], tot_loss[loss=0.07854, simple_loss=0.09949, pruned_loss=0.01906, audio_tagging_loss=0.009734, over 3057560.87 frames. ], batch size: 55, lr: 4.87e-03, grad_scale: 8.0
2023-11-20 13:38:49,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1107713.3333333333, ans=0.125
2023-11-20 13:38:56,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1107780.0, ans=0.0
2023-11-20 13:38:57,787 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.47 vs. limit=15.0
2023-11-20 13:39:07,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1107846.6666666667, ans=0.0
2023-11-20 13:39:17,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1107846.6666666667, ans=0.05
2023-11-20 13:39:18,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1107913.3333333333, ans=0.125
2023-11-20 13:39:20,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1107913.3333333333, ans=0.125
2023-11-20 13:39:21,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1107913.3333333333, ans=0.1
2023-11-20 13:39:29,390 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.87 vs. limit=15.0
2023-11-20 13:39:30,680 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0
2023-11-20 13:39:31,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1107980.0, ans=0.125
2023-11-20 13:39:33,706 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166200
2023-11-20 13:39:45,708 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9900, loss[loss=0.0815, simple_loss=0.103, pruned_loss=0.02064, audio_tagging_loss=0.00935, over 16096.00 frames. ], tot_loss[loss=0.07924, simple_loss=0.1006, pruned_loss=0.01926, audio_tagging_loss=0.009686, over 3060520.11 frames. ], batch size: 62, lr: 4.87e-03, grad_scale: 8.0
2023-11-20 13:39:48,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1108046.6666666667, ans=0.0
2023-11-20 13:40:00,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1108113.3333333333, ans=0.0
2023-11-20 13:40:04,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1108113.3333333333, ans=0.2
2023-11-20 13:40:13,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1108180.0, ans=0.07
2023-11-20 13:40:18,350 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.742e+01 8.087e+01 8.695e+01 9.650e+01 1.416e+02, threshold=1.739e+02, percent-clipped=0.0
2023-11-20 13:40:27,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1108246.6666666667, ans=0.0
2023-11-20 13:40:33,836 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.20 vs. limit=15.0
2023-11-20 13:40:38,860 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166250
2023-11-20 13:40:51,328 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 9950, loss[loss=0.07174, simple_loss=0.08604, pruned_loss=0.02065, audio_tagging_loss=0.008067, over 14700.00 frames. ], tot_loss[loss=0.07958, simple_loss=0.1013, pruned_loss=0.01926, audio_tagging_loss=0.009669, over 3053624.96 frames. ], batch size: 54, lr: 4.87e-03, grad_scale: 8.0
2023-11-20 13:40:54,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1108380.0, ans=0.125
2023-11-20 13:40:56,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1108380.0, ans=0.125
2023-11-20 13:40:56,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1108380.0, ans=0.0
2023-11-20 13:41:44,162 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166300
2023-11-20 13:41:44,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1108646.6666666667, ans=0.125
2023-11-20 13:41:55,001 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10000, loss[loss=0.08303, simple_loss=0.1138, pruned_loss=0.01911, audio_tagging_loss=0.007014, over 15219.00 frames. ], tot_loss[loss=0.07932, simple_loss=0.1006, pruned_loss=0.01916, audio_tagging_loss=0.009858, over 3059639.09 frames. ], batch size: 55, lr: 4.87e-03, grad_scale: 8.0
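The grad_scale field moves in powers of two (8.0 to 16.0 to 32.0 earlier in this section, back down to 8.0 around batch 9800), the signature of a dynamic fp16 loss scaler: it doubles after a run of overflow-free steps and halves when gradients overflow. A minimal usage sketch follows; the initial scale matches the first value seen in this section, while the growth interval is an illustrative assumption.

import torch

# Hedged sketch: a dynamic loss scaler for fp16 training, as suggested by the
# doubling/halving "grad_scale" values in the records above.
scaler = torch.cuda.amp.GradScaler(
    init_scale=8.0,        # matches the first grad_scale in this section
    growth_factor=2.0,     # double after enough stable steps
    backoff_factor=0.5,    # halve on overflow
    growth_interval=2000,  # assumed number of overflow-free steps
)
print(scaler.get_scale())  # 8.0 (on a CUDA machine with the scaler enabled)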
], batch size: 55, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:42:23,678 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.88 vs. limit=15.0 2023-11-20 13:42:24,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1108846.6666666667, ans=0.125 2023-11-20 13:42:25,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1108846.6666666667, ans=0.07 2023-11-20 13:42:29,452 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.616e+01 8.105e+01 8.776e+01 9.451e+01 1.209e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-20 13:42:48,563 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166350 2023-11-20 13:42:59,235 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10050, loss[loss=0.07776, simple_loss=0.08827, pruned_loss=0.0223, audio_tagging_loss=0.01132, over 14375.00 frames. ], tot_loss[loss=0.07917, simple_loss=0.1002, pruned_loss=0.01926, audio_tagging_loss=0.009812, over 3055830.42 frames. ], batch size: 56, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:43:02,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1109046.6666666667, ans=0.1 2023-11-20 13:43:08,083 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.74 vs. limit=10.0 2023-11-20 13:43:31,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1109180.0, ans=0.2 2023-11-20 13:43:32,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1109180.0, ans=0.05 2023-11-20 13:43:43,555 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.78 vs. limit=15.0 2023-11-20 13:43:52,166 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166400 2023-11-20 13:43:57,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1109313.3333333333, ans=0.0 2023-11-20 13:44:03,881 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10100, loss[loss=0.06967, simple_loss=0.09406, pruned_loss=0.0126, audio_tagging_loss=0.01004, over 15807.00 frames. ], tot_loss[loss=0.08017, simple_loss=0.1016, pruned_loss=0.01956, audio_tagging_loss=0.009827, over 3060675.59 frames. ], batch size: 58, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:44:09,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1109380.0, ans=0.0 2023-11-20 13:44:26,779 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.92 vs. 
limit=15.0 2023-11-20 13:44:37,151 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.486e+01 8.086e+01 8.697e+01 9.512e+01 1.226e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 13:44:40,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1109513.3333333333, ans=0.125 2023-11-20 13:44:45,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1109580.0, ans=0.1 2023-11-20 13:44:45,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1109580.0, ans=6.0 2023-11-20 13:44:56,190 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 13:44:57,528 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166450 2023-11-20 13:44:57,774 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 13:45:00,669 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.06 vs. limit=12.0 2023-11-20 13:45:08,349 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10150, loss[loss=0.06792, simple_loss=0.07407, pruned_loss=0.01672, audio_tagging_loss=0.01416, over 14405.00 frames. ], tot_loss[loss=0.07979, simple_loss=0.101, pruned_loss=0.01943, audio_tagging_loss=0.009882, over 3063812.48 frames. ], batch size: 54, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:45:09,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1109713.3333333333, ans=0.125 2023-11-20 13:45:18,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1109713.3333333333, ans=0.125 2023-11-20 13:45:23,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1109780.0, ans=0.0 2023-11-20 13:45:29,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1109780.0, ans=0.0 2023-11-20 13:45:36,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1109846.6666666667, ans=0.125 2023-11-20 13:45:37,505 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
2023-11-20 13:45:58,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1109980.0, ans=0.0
2023-11-20 13:46:01,528 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166500
2023-11-20 13:46:04,646 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.51 vs. limit=22.5
2023-11-20 13:46:05,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=1109980.0, ans=0.05
2023-11-20 13:46:12,421 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10200, loss[loss=0.06988, simple_loss=0.08811, pruned_loss=0.0168, audio_tagging_loss=0.00903, over 14731.00 frames. ], tot_loss[loss=0.07967, simple_loss=0.1006, pruned_loss=0.01939, audio_tagging_loss=0.009997, over 3065299.49 frames. ], batch size: 55, lr: 4.87e-03, grad_scale: 8.0
2023-11-20 13:46:21,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1110046.6666666667, ans=0.1
2023-11-20 13:46:23,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1110046.6666666667, ans=0.125
2023-11-20 13:46:36,411 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 13:46:46,113 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.131e+01 8.850e+01 9.665e+01 1.277e+02, threshold=1.770e+02, percent-clipped=0.0
2023-11-20 13:46:55,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1110246.6666666667, ans=0.0
2023-11-20 13:47:05,122 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166550
2023-11-20 13:47:13,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1110313.3333333333, ans=0.2
2023-11-20 13:47:15,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1110380.0, ans=0.1
2023-11-20 13:47:16,465 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10250, loss[loss=0.06865, simple_loss=0.08266, pruned_loss=0.0147, audio_tagging_loss=0.01261, over 13882.00 frames. ], tot_loss[loss=0.07959, simple_loss=0.1005, pruned_loss=0.01929, audio_tagging_loss=0.01004, over 3057955.67 frames. ], batch size: 55, lr: 4.87e-03, grad_scale: 8.0
2023-11-20 13:47:21,327 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.56 vs. limit=15.0
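The WARNING entries above drop AudioSet cuts whose transcript is the dummy placeholder: after the encoder's roughly 4x subsampling, a 100-frame cut retains only 23 frames, fewer than its 24 BPE tokens, so a transducer alignment is impossible. A hedged sketch of that kind of filter (the helper name and the exact subsampling formula are illustrative, not taken from train_asr.py):

```python
def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    # Rough frame count after the encoder front-end; the real formula may
    # differ slightly, but it maps 100 input frames to 23 here.
    frames_after = (num_frames - 7) // subsampling_factor
    if frames_after < num_tokens:
        print(
            f"Exclude cut: frames before={num_frames}, "
            f"after={frames_after}, tokens={num_tokens}"
        )
        return False
    return True

assert keep_cut(1500, 24)      # a normal-length utterance passes
assert not keep_cut(100, 24)   # 23 frames < 24 tokens -> excluded
```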
2023-11-20 13:47:36,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1110446.6666666667, ans=0.125
2023-11-20 13:47:38,938 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.78 vs. limit=15.0
2023-11-20 13:47:41,391 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.46 vs. limit=12.0
2023-11-20 13:47:53,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1110513.3333333333, ans=0.0
2023-11-20 13:48:07,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1110646.6666666667, ans=0.125
2023-11-20 13:48:09,985 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166600
2023-11-20 13:48:13,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1110646.6666666667, ans=0.125
2023-11-20 13:48:17,404 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 13:48:21,982 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10300, loss[loss=0.04308, simple_loss=0.04493, pruned_loss=0.0061, audio_tagging_loss=0.01451, over 15826.00 frames. ], tot_loss[loss=0.07872, simple_loss=0.09938, pruned_loss=0.0189, audio_tagging_loss=0.01012, over 3056837.64 frames. ], batch size: 63, lr: 4.87e-03, grad_scale: 8.0
2023-11-20 13:48:30,825 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.99 vs. limit=10.0
2023-11-20 13:48:35,662 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.98 vs. limit=15.0
2023-11-20 13:48:55,300 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.311e+01 8.084e+01 8.693e+01 9.702e+01 1.335e+02, threshold=1.739e+02, percent-clipped=0.0
2023-11-20 13:49:15,239 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166650
2023-11-20 13:49:26,763 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10350, loss[loss=0.08287, simple_loss=0.1051, pruned_loss=0.01851, audio_tagging_loss=0.01182, over 15792.00 frames. ], tot_loss[loss=0.07894, simple_loss=0.09923, pruned_loss=0.01915, audio_tagging_loss=0.01018, over 3054410.30 frames. ], batch size: 58, lr: 4.86e-03, grad_scale: 8.0
2023-11-20 13:49:42,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1111113.3333333333, ans=0.125
2023-11-20 13:49:43,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1111113.3333333333, ans=0.125
2023-11-20 13:49:50,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1111113.3333333333, ans=0.0
2023-11-20 13:49:58,114 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 13:50:02,161 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.35 vs. limit=10.0
2023-11-20 13:50:04,545 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0
2023-11-20 13:50:11,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1111246.6666666667, ans=0.125
2023-11-20 13:50:19,533 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166700
2023-11-20 13:50:28,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1111313.3333333333, ans=0.035
2023-11-20 13:50:31,165 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10400, loss[loss=0.06962, simple_loss=0.0787, pruned_loss=0.01852, audio_tagging_loss=0.01174, over 14192.00 frames. ], tot_loss[loss=0.07965, simple_loss=0.1001, pruned_loss=0.01937, audio_tagging_loss=0.01025, over 3056080.31 frames. ], batch size: 56, lr: 4.86e-03, grad_scale: 16.0
2023-11-20 13:50:36,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1111380.0, ans=0.07
2023-11-20 13:50:38,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1111380.0, ans=0.0
2023-11-20 13:50:38,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1111380.0, ans=0.125
2023-11-20 13:50:42,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1111380.0, ans=0.125
2023-11-20 13:50:45,923 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.41 vs. limit=12.0
2023-11-20 13:51:05,052 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.857e+01 8.019e+01 8.655e+01 9.452e+01 1.304e+02, threshold=1.731e+02, percent-clipped=0.0
2023-11-20 13:51:11,542 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 13:51:19,971 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0
2023-11-20 13:51:24,386 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166750
2023-11-20 13:51:28,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1111646.6666666667, ans=0.07
2023-11-20 13:51:36,032 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10450, loss[loss=0.06824, simple_loss=0.08376, pruned_loss=0.01682, audio_tagging_loss=0.009542, over 15448.00 frames. ], tot_loss[loss=0.07896, simple_loss=0.09926, pruned_loss=0.01908, audio_tagging_loss=0.01024, over 3058395.77 frames. ], batch size: 57, lr: 4.86e-03, grad_scale: 16.0
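Most scaling.py:213 entries track ScheduledFloat hyperparameters (dropout rates, balancer probabilities, skip rates) whose value `ans` is a function of batch_count alone. A minimal piecewise-linear sketch of such a schedule; the breakpoints below are illustrative, and the real ScheduledFloat in icefall's scaling.py carries more machinery:

```python
class ScheduledFloat:
    """Float hyperparameter interpolated linearly between (batch_count, value) breakpoints."""

    def __init__(self, *points, name: str = "param"):
        self.points = sorted(points)  # e.g. (0.0, 0.3), (20000.0, 0.1)
        self.name = name

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# At batch_count ~1.1M the schedule is far past its last breakpoint, so the
# logged ans sits at the final value, as in the dropout_p=0.1 lines above.
dropout = ScheduledFloat((0.0, 0.3), (20000.0, 0.1), name="feed_forward1.out_proj.dropout_p")
print(f"ScheduledFloat: name={dropout.name}, batch_count=1109046.67, ans={dropout.value(1109046.67)}")
```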
2023-11-20 13:52:02,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1111846.6666666667, ans=0.125
2023-11-20 13:52:10,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1111846.6666666667, ans=0.0
2023-11-20 13:52:24,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1111913.3333333333, ans=0.1
2023-11-20 13:52:29,656 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166800
2023-11-20 13:52:32,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1111980.0, ans=0.125
2023-11-20 13:52:41,522 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10500, loss[loss=0.07539, simple_loss=0.103, pruned_loss=0.01374, audio_tagging_loss=0.01013, over 15318.00 frames. ], tot_loss[loss=0.07823, simple_loss=0.09867, pruned_loss=0.0188, audio_tagging_loss=0.0101, over 3051800.89 frames. ], batch size: 56, lr: 4.86e-03, grad_scale: 16.0
2023-11-20 13:52:45,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1112046.6666666667, ans=0.125
2023-11-20 13:53:14,864 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 8.112e+01 8.724e+01 9.287e+01 1.188e+02, threshold=1.745e+02, percent-clipped=0.0
2023-11-20 13:53:31,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1112246.6666666667, ans=0.2
2023-11-20 13:53:34,607 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166850
2023-11-20 13:53:40,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1112313.3333333333, ans=0.1
2023-11-20 13:53:45,948 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10550, loss[loss=0.1013, simple_loss=0.1243, pruned_loss=0.03117, audio_tagging_loss=0.007995, over 15657.00 frames. ], tot_loss[loss=0.07802, simple_loss=0.09854, pruned_loss=0.01873, audio_tagging_loss=0.01002, over 3052716.96 frames. ], batch size: 60, lr: 4.86e-03, grad_scale: 16.0
2023-11-20 13:54:18,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1112513.3333333333, ans=0.1
2023-11-20 13:54:18,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1112513.3333333333, ans=0.07
2023-11-20 13:54:31,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1112580.0, ans=0.125
2023-11-20 13:54:38,997 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166900
2023-11-20 13:54:44,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1112646.6666666667, ans=0.0
2023-11-20 13:54:49,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1112713.3333333333, ans=0.1
2023-11-20 13:54:50,576 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10600, loss[loss=0.06892, simple_loss=0.08626, pruned_loss=0.01548, audio_tagging_loss=0.01031, over 15330.00 frames. ], tot_loss[loss=0.07809, simple_loss=0.0987, pruned_loss=0.01883, audio_tagging_loss=0.009906, over 3041340.40 frames. ], batch size: 57, lr: 4.86e-03, grad_scale: 16.0
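The train_asr.py:1262 totals in this section decompose consistently as loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; for batch 10500 above, 0.5*0.09867 + 0.0188 + 0.0101 = 0.07823. A small check of that relation, with the 0.5 and 1.0 weights inferred from the logged numbers rather than read from any config:

```python
def combined_loss(simple_loss: float, pruned_loss: float, audio_tagging_loss: float,
                  simple_scale: float = 0.5, tagging_scale: float = 1.0) -> float:
    # Weights are an inference from the logged totals, not an authoritative
    # copy of the training script.
    return simple_scale * simple_loss + pruned_loss + tagging_scale * audio_tagging_loss

# Reproduces tot_loss for Epoch 14, batch 10500 to within rounding:
assert abs(combined_loss(0.09867, 0.0188, 0.0101) - 0.07823) < 1e-4
```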
2023-11-20 13:54:56,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1112713.3333333333, ans=0.2
2023-11-20 13:55:13,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1112780.0, ans=0.05
2023-11-20 13:55:18,218 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.83 vs. limit=10.0
2023-11-20 13:55:24,051 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.162e+01 8.206e+01 8.903e+01 9.867e+01 1.464e+02, threshold=1.781e+02, percent-clipped=0.0
2023-11-20 13:55:43,410 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 166950
2023-11-20 13:55:55,862 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10650, loss[loss=0.05701, simple_loss=0.06945, pruned_loss=0.01243, audio_tagging_loss=0.009848, over 13906.00 frames. ], tot_loss[loss=0.07805, simple_loss=0.09857, pruned_loss=0.01889, audio_tagging_loss=0.009868, over 3043548.64 frames. ], batch size: 53, lr: 4.86e-03, grad_scale: 16.0
2023-11-20 13:55:57,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1113046.6666666667, ans=0.2
2023-11-20 13:56:12,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1113113.3333333333, ans=0.1
2023-11-20 13:56:14,168 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.86 vs. limit=12.0
2023-11-20 13:56:31,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1113180.0, ans=0.0
2023-11-20 13:56:37,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1113246.6666666667, ans=0.125
2023-11-20 13:56:43,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1113246.6666666667, ans=0.2
2023-11-20 13:56:48,719 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167000
2023-11-20 13:56:49,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1113313.3333333333, ans=0.125
2023-11-20 13:57:00,515 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10700, loss[loss=0.05717, simple_loss=0.0649, pruned_loss=0.01333, audio_tagging_loss=0.01139, over 14606.00 frames. ], tot_loss[loss=0.07777, simple_loss=0.09806, pruned_loss=0.01883, audio_tagging_loss=0.009916, over 3040104.38 frames. ], batch size: 56, lr: 4.86e-03, grad_scale: 16.0
2023-11-20 13:57:24,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1113513.3333333333, ans=0.05
2023-11-20 13:57:34,217 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.075e+01 8.061e+01 8.803e+01 9.456e+01 1.141e+02, threshold=1.761e+02, percent-clipped=0.0
2023-11-20 13:57:40,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1113580.0, ans=15.0
2023-11-20 13:57:53,749 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167050
2023-11-20 13:58:05,325 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10750, loss[loss=0.07923, simple_loss=0.1102, pruned_loss=0.01671, audio_tagging_loss=0.007405, over 15116.00 frames. ], tot_loss[loss=0.07838, simple_loss=0.09935, pruned_loss=0.01891, audio_tagging_loss=0.009796, over 3047264.99 frames. ], batch size: 57, lr: 4.86e-03, grad_scale: 16.0
2023-11-20 13:58:17,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1113780.0, ans=0.125
2023-11-20 13:58:25,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1113780.0, ans=0.0
2023-11-20 13:58:43,479 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.42 vs. limit=5.0
2023-11-20 13:58:57,900 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167100
2023-11-20 13:58:58,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1113980.0, ans=0.2
2023-11-20 13:59:09,690 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10800, loss[loss=0.07655, simple_loss=0.1, pruned_loss=0.01794, audio_tagging_loss=0.008599, over 14419.00 frames. ], tot_loss[loss=0.07885, simple_loss=0.1002, pruned_loss=0.01904, audio_tagging_loss=0.009731, over 3049019.01 frames. ], batch size: 55, lr: 4.86e-03, grad_scale: 32.0
2023-11-20 13:59:43,621 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.204e+01 8.350e+01 8.974e+01 9.650e+01 1.251e+02, threshold=1.795e+02, percent-clipped=0.0
2023-11-20 13:59:59,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1114246.6666666667, ans=0.125
2023-11-20 14:00:03,049 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167150
2023-11-20 14:00:14,909 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10850, loss[loss=0.07061, simple_loss=0.08889, pruned_loss=0.01327, audio_tagging_loss=0.01289, over 16942.00 frames. ], tot_loss[loss=0.07849, simple_loss=0.09938, pruned_loss=0.0189, audio_tagging_loss=0.009901, over 3047415.68 frames. ], batch size: 62, lr: 4.86e-03, grad_scale: 32.0
2023-11-20 14:00:42,581 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.47 vs. limit=15.0
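The scaling.py:1022 Whitening lines compare a per-module statistic of the activation covariance against a limit; whitening pressure is only applied once the metric exceeds that limit. One plausible reading of the metric is mean(eig^2)/mean(eig)^2 over the covariance eigenvalues, which equals 1.0 for perfectly white features and grows with anisotropy. This is an assumption about what the logged quantity measures, not a copy of scaling.py:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """mean(eig^2) / mean(eig)^2 of the feature covariance (assumed metric)."""
    x = x.reshape(-1, x.shape[-1])
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    # trace(C^2) is the sum of squared eigenvalues; trace(C) is their sum.
    return float((cov @ cov).trace() * d / cov.trace() ** 2)

white = torch.randn(10000, 256)
print(whitening_metric(white))                                   # ~1.0, nothing to penalize
print(whitening_metric(white * torch.linspace(0.1, 3.0, 256)))   # >> 1.0, would trip a limit
```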
2023-11-20 14:00:51,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1114513.3333333333, ans=0.95
2023-11-20 14:01:04,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1114580.0, ans=0.1
2023-11-20 14:01:08,157 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167200
2023-11-20 14:01:14,733 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 14:01:20,243 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10900, loss[loss=0.07226, simple_loss=0.08513, pruned_loss=0.01618, audio_tagging_loss=0.01352, over 14865.00 frames. ], tot_loss[loss=0.07858, simple_loss=0.09962, pruned_loss=0.01891, audio_tagging_loss=0.009865, over 3043196.22 frames. ], batch size: 55, lr: 4.86e-03, grad_scale: 32.0
2023-11-20 14:01:26,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1114713.3333333333, ans=0.0
2023-11-20 14:01:37,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1114780.0, ans=0.125
2023-11-20 14:01:48,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1114846.6666666667, ans=0.125
2023-11-20 14:01:49,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1114846.6666666667, ans=0.05
2023-11-20 14:01:52,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1114846.6666666667, ans=0.2
2023-11-20 14:01:53,711 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.172e+01 8.152e+01 8.794e+01 9.597e+01 1.232e+02, threshold=1.759e+02, percent-clipped=0.0
2023-11-20 14:02:13,353 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167250
2023-11-20 14:02:21,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1114980.0, ans=0.125
2023-11-20 14:02:24,251 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 10950, loss[loss=0.07963, simple_loss=0.09588, pruned_loss=0.02131, audio_tagging_loss=0.01037, over 16389.00 frames. ], tot_loss[loss=0.0783, simple_loss=0.09933, pruned_loss=0.01872, audio_tagging_loss=0.009916, over 3045329.46 frames. ], batch size: 61, lr: 4.86e-03, grad_scale: 32.0
2023-11-20 14:02:33,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1115046.6666666667, ans=0.1
2023-11-20 14:02:50,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1115180.0, ans=0.2
2023-11-20 14:02:58,923 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.17 vs. limit=22.5
2023-11-20 14:03:12,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1115246.6666666667, ans=0.125
2023-11-20 14:03:17,722 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167300
2023-11-20 14:03:21,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1115313.3333333333, ans=0.0
2023-11-20 14:03:29,242 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11000, loss[loss=0.1013, simple_loss=0.1323, pruned_loss=0.02668, audio_tagging_loss=0.008435, over 15724.00 frames. ], tot_loss[loss=0.07835, simple_loss=0.09917, pruned_loss=0.01878, audio_tagging_loss=0.009988, over 3044097.43 frames. ], batch size: 57, lr: 4.86e-03, grad_scale: 32.0
2023-11-20 14:03:38,513 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 14:03:44,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1115446.6666666667, ans=0.0
2023-11-20 14:03:49,234 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.86 vs. limit=22.5
2023-11-20 14:03:57,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1115513.3333333333, ans=0.125
2023-11-20 14:04:02,036 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.939e+01 8.121e+01 8.892e+01 9.815e+01 1.453e+02, threshold=1.778e+02, percent-clipped=0.0
2023-11-20 14:04:06,808 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.29 vs. limit=15.0
2023-11-20 14:04:15,546 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0
2023-11-20 14:04:16,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0
2023-11-20 14:04:22,139 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167350
2023-11-20 14:04:23,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1115646.6666666667, ans=0.1
2023-11-20 14:04:33,140 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11050, loss[loss=0.06746, simple_loss=0.08763, pruned_loss=0.01503, audio_tagging_loss=0.008617, over 14849.00 frames. ], tot_loss[loss=0.07841, simple_loss=0.09917, pruned_loss=0.01881, audio_tagging_loss=0.01001, over 3046349.95 frames. ], batch size: 58, lr: 4.85e-03, grad_scale: 32.0
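grad_scale in the totals doubles from 8.0 to 16.0 (around batch 10400) and to 32.0 (around batch 10800), and later in this section briefly drops back to 16.0: the usual signature of dynamic loss scaling for fp16 training, which grows the scale after a run of overflow-free steps and backs off on overflow. A generic torch.cuda.amp sketch with similar behavior (the growth interval is illustrative, and the trainer's actual AMP handling may differ):

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=8.0,        # matches the grad_scale first logged in this section
    growth_factor=2.0,     # double after `growth_interval` overflow-free steps
    backoff_factor=0.5,    # halve when gradients overflow
    growth_interval=2000,  # illustrative value, not taken from the real config
)

# Inside the training loop (sketch):
# with torch.cuda.amp.autocast():
#     loss = model(batch)
# scaler.scale(loss).backward()
# scaler.step(optimizer)
# scaler.update()
# print(f"grad_scale: {scaler.get_scale()}")
```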
2023-11-20 14:04:47,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1115780.0, ans=0.125
2023-11-20 14:04:56,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1115780.0, ans=0.2
2023-11-20 14:05:02,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1115846.6666666667, ans=0.1
2023-11-20 14:05:04,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1115846.6666666667, ans=0.125
2023-11-20 14:05:15,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1115913.3333333333, ans=0.1
2023-11-20 14:05:20,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1115913.3333333333, ans=0.2
2023-11-20 14:05:25,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1115980.0, ans=0.0
2023-11-20 14:05:26,821 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167400
2023-11-20 14:05:28,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1115980.0, ans=0.2
2023-11-20 14:05:38,012 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11100, loss[loss=0.09088, simple_loss=0.115, pruned_loss=0.02382, audio_tagging_loss=0.009562, over 15121.00 frames. ], tot_loss[loss=0.07915, simple_loss=0.1, pruned_loss=0.01914, audio_tagging_loss=0.01, over 3049030.20 frames. ], batch size: 58, lr: 4.85e-03, grad_scale: 32.0
2023-11-20 14:05:56,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1116113.3333333333, ans=0.95
2023-11-20 14:06:02,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1116113.3333333333, ans=0.125
2023-11-20 14:06:11,987 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.063e+01 8.381e+01 8.919e+01 9.708e+01 1.297e+02, threshold=1.784e+02, percent-clipped=0.0
2023-11-20 14:06:15,980 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.29 vs. limit=15.0
2023-11-20 14:06:18,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1116246.6666666667, ans=0.125
2023-11-20 14:06:31,722 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167450
2023-11-20 14:06:41,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1116380.0, ans=0.125
2023-11-20 14:06:42,800 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11150, loss[loss=0.06395, simple_loss=0.07493, pruned_loss=0.01551, audio_tagging_loss=0.01097, over 16226.00 frames. ], tot_loss[loss=0.07909, simple_loss=0.09968, pruned_loss=0.01901, audio_tagging_loss=0.01025, over 3054200.82 frames. ], batch size: 62, lr: 4.85e-03, grad_scale: 32.0
2023-11-20 14:06:48,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1116380.0, ans=0.2
2023-11-20 14:07:05,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1116446.6666666667, ans=0.125
2023-11-20 14:07:06,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1116446.6666666667, ans=0.0
2023-11-20 14:07:09,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1116513.3333333333, ans=0.125
2023-11-20 14:07:09,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1116513.3333333333, ans=0.04949747468305833
2023-11-20 14:07:23,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1116580.0, ans=0.0
2023-11-20 14:07:34,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1116646.6666666667, ans=0.125
2023-11-20 14:07:35,357 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167500
2023-11-20 14:07:38,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1116646.6666666667, ans=0.125
2023-11-20 14:07:47,524 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11200, loss[loss=0.08267, simple_loss=0.1127, pruned_loss=0.01504, audio_tagging_loss=0.01126, over 15417.00 frames. ], tot_loss[loss=0.07905, simple_loss=0.09963, pruned_loss=0.01891, audio_tagging_loss=0.01033, over 3057385.12 frames. ], batch size: 56, lr: 4.85e-03, grad_scale: 32.0
2023-11-20 14:07:53,676 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0
2023-11-20 14:07:59,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1116780.0, ans=0.0
2023-11-20 14:08:05,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1116780.0, ans=0.0
2023-11-20 14:08:10,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1116780.0, ans=0.07
2023-11-20 14:08:20,248 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.196e+01 8.773e+01 9.585e+01 1.271e+02, threshold=1.755e+02, percent-clipped=0.0
2023-11-20 14:08:20,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1116846.6666666667, ans=0.0
2023-11-20 14:08:27,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1116913.3333333333, ans=0.07
2023-11-20 14:08:28,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1116913.3333333333, ans=15.0
2023-11-20 14:08:38,262 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.21 vs. limit=8.0
2023-11-20 14:08:40,465 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167550
2023-11-20 14:08:43,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1116980.0, ans=0.0
2023-11-20 14:08:44,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1116980.0, ans=0.0
2023-11-20 14:08:51,246 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11250, loss[loss=0.08803, simple_loss=0.125, pruned_loss=0.01806, audio_tagging_loss=0.007471, over 15482.00 frames. ], tot_loss[loss=0.07886, simple_loss=0.09954, pruned_loss=0.01888, audio_tagging_loss=0.01021, over 3055799.20 frames. ], batch size: 54, lr: 4.85e-03, grad_scale: 16.0
2023-11-20 14:09:24,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1117180.0, ans=0.1
2023-11-20 14:09:26,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1117180.0, ans=0.125
2023-11-20 14:09:44,049 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167600
2023-11-20 14:09:55,738 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11300, loss[loss=0.08328, simple_loss=0.1026, pruned_loss=0.01948, audio_tagging_loss=0.01251, over 14497.00 frames. ], tot_loss[loss=0.07973, simple_loss=0.1004, pruned_loss=0.01945, audio_tagging_loss=0.01005, over 3055522.97 frames. ], batch size: 55, lr: 4.85e-03, grad_scale: 16.0
2023-11-20 14:10:25,397 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.91 vs. limit=22.5
2023-11-20 14:10:30,803 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.679e+01 8.103e+01 8.654e+01 9.341e+01 1.359e+02, threshold=1.731e+02, percent-clipped=0.0
2023-11-20 14:10:48,675 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167650
2023-11-20 14:10:52,976 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=12.0
2023-11-20 14:10:54,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1117646.6666666667, ans=0.0
2023-11-20 14:11:00,294 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11350, loss[loss=0.07648, simple_loss=0.09557, pruned_loss=0.01798, audio_tagging_loss=0.01072, over 15616.00 frames. ], tot_loss[loss=0.07958, simple_loss=0.1004, pruned_loss=0.01938, audio_tagging_loss=0.009987, over 3059533.20 frames. ], batch size: 58, lr: 4.85e-03, grad_scale: 16.0
2023-11-20 14:11:05,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1117713.3333333333, ans=0.1
2023-11-20 14:11:11,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1117713.3333333333, ans=0.0
2023-11-20 14:11:24,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1117780.0, ans=0.125
2023-11-20 14:11:34,599 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.42 vs. limit=15.0
2023-11-20 14:11:52,959 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167700
2023-11-20 14:11:57,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1117980.0, ans=0.125
2023-11-20 14:12:04,781 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11400, loss[loss=0.0995, simple_loss=0.1261, pruned_loss=0.02718, audio_tagging_loss=0.009273, over 14963.00 frames. ], tot_loss[loss=0.08012, simple_loss=0.1014, pruned_loss=0.01954, audio_tagging_loss=0.009883, over 3056506.84 frames. ], batch size: 57, lr: 4.85e-03, grad_scale: 16.0
2023-11-20 14:12:10,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1118046.6666666667, ans=10.0
2023-11-20 14:12:21,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1118113.3333333333, ans=0.0
2023-11-20 14:12:30,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1118180.0, ans=0.125
2023-11-20 14:12:37,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1118180.0, ans=0.0
2023-11-20 14:12:39,412 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.780e+01 8.090e+01 8.832e+01 9.724e+01 2.021e+02, threshold=1.766e+02, percent-clipped=1.0
2023-11-20 14:12:39,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1118180.0, ans=0.2
2023-11-20 14:12:42,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1118246.6666666667, ans=0.05
2023-11-20 14:12:57,885 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167750
2023-11-20 14:13:02,992 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0
2023-11-20 14:13:09,447 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11450, loss[loss=0.06121, simple_loss=0.07899, pruned_loss=0.01224, audio_tagging_loss=0.009472, over 14572.00 frames. ], tot_loss[loss=0.07931, simple_loss=0.1003, pruned_loss=0.01938, audio_tagging_loss=0.009785, over 3054021.38 frames. ], batch size: 55, lr: 4.85e-03, grad_scale: 16.0
2023-11-20 14:13:11,260 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.71 vs. limit=10.0
2023-11-20 14:13:12,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1118380.0, ans=0.125
2023-11-20 14:13:29,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1118446.6666666667, ans=0.125
2023-11-20 14:14:01,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1118646.6666666667, ans=0.125
2023-11-20 14:14:02,156 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167800
2023-11-20 14:14:14,032 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11500, loss[loss=0.0852, simple_loss=0.101, pruned_loss=0.02416, audio_tagging_loss=0.01055, over 14722.00 frames. ], tot_loss[loss=0.07981, simple_loss=0.101, pruned_loss=0.0196, audio_tagging_loss=0.00973, over 3052065.72 frames. ], batch size: 56, lr: 4.85e-03, grad_scale: 16.0
2023-11-20 14:14:20,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1118713.3333333333, ans=0.125
2023-11-20 14:14:22,027 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=15.0
2023-11-20 14:14:41,237 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.07 vs. limit=15.0
2023-11-20 14:14:45,977 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.42 vs. limit=15.0
2023-11-20 14:14:48,908 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.104e+01 8.299e+01 8.769e+01 9.853e+01 1.208e+02, threshold=1.754e+02, percent-clipped=0.0
2023-11-20 14:14:53,599 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.57 vs. limit=15.0
2023-11-20 14:15:03,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1118913.3333333333, ans=0.125
2023-11-20 14:15:06,559 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0
2023-11-20 14:15:07,071 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167850
2023-11-20 14:15:19,111 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11550, loss[loss=0.08513, simple_loss=0.1001, pruned_loss=0.0253, audio_tagging_loss=0.009798, over 16031.00 frames. ], tot_loss[loss=0.07956, simple_loss=0.1007, pruned_loss=0.01945, audio_tagging_loss=0.009759, over 3050821.71 frames. ], batch size: 60, lr: 4.85e-03, grad_scale: 16.0
2023-11-20 14:15:30,090 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.47 vs. limit=15.0
2023-11-20 14:15:55,720 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 14:16:00,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1119246.6666666667, ans=0.125
2023-11-20 14:16:02,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1119246.6666666667, ans=0.1
2023-11-20 14:16:02,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1119246.6666666667, ans=0.125
2023-11-20 14:16:11,582 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167900
2023-11-20 14:16:14,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1119313.3333333333, ans=0.125
2023-11-20 14:16:23,316 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11600, loss[loss=0.09985, simple_loss=0.1199, pruned_loss=0.02997, audio_tagging_loss=0.009944, over 15669.00 frames. ], tot_loss[loss=0.07946, simple_loss=0.1002, pruned_loss=0.01953, audio_tagging_loss=0.009811, over 3048961.49 frames. ], batch size: 56, lr: 4.85e-03, grad_scale: 32.0
2023-11-20 14:16:41,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1119446.6666666667, ans=0.125
2023-11-20 14:16:42,004 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. limit=6.0
2023-11-20 14:16:50,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1119513.3333333333, ans=0.0
2023-11-20 14:16:57,526 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.656e+01 8.039e+01 8.649e+01 9.262e+01 1.367e+02, threshold=1.730e+02, percent-clipped=0.0
2023-11-20 14:17:04,969 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.88 vs. limit=15.0
2023-11-20 14:17:08,004 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=15.0
2023-11-20 14:17:14,183 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.84 vs. limit=15.0
2023-11-20 14:17:15,931 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 167950
2023-11-20 14:17:19,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1119646.6666666667, ans=0.125
2023-11-20 14:17:26,996 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11650, loss[loss=0.08546, simple_loss=0.1082, pruned_loss=0.02116, audio_tagging_loss=0.01022, over 14781.00 frames. ], tot_loss[loss=0.07991, simple_loss=0.1007, pruned_loss=0.01962, audio_tagging_loss=0.009911, over 3047812.91 frames. ], batch size: 55, lr: 4.85e-03, grad_scale: 32.0
2023-11-20 14:17:33,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1119713.3333333333, ans=0.0
2023-11-20 14:17:41,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1119780.0, ans=0.1
2023-11-20 14:17:58,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1119846.6666666667, ans=0.95
2023-11-20 14:18:09,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1119913.3333333333, ans=0.125
2023-11-20 14:18:20,057 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168000
2023-11-20 14:18:30,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1119980.0, ans=0.035
2023-11-20 14:18:34,837 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11700, loss[loss=0.07548, simple_loss=0.1001, pruned_loss=0.01763, audio_tagging_loss=0.007786, over 15575.00 frames. ], tot_loss[loss=0.07982, simple_loss=0.1005, pruned_loss=0.01964, audio_tagging_loss=0.009934, over 3050916.25 frames. ], batch size: 58, lr: 4.85e-03, grad_scale: 32.0
2023-11-20 14:18:45,990 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.23 vs. limit=22.5
2023-11-20 14:18:55,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1120113.3333333333, ans=0.0
2023-11-20 14:19:09,481 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.797e+01 8.107e+01 8.645e+01 9.352e+01 1.111e+02, threshold=1.729e+02, percent-clipped=0.0
2023-11-20 14:19:11,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1120180.0, ans=0.125
2023-11-20 14:19:14,860 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 14:19:18,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1120246.6666666667, ans=0.125
2023-11-20 14:19:27,350 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168050
2023-11-20 14:19:32,145 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.98 vs. limit=22.5
2023-11-20 14:19:39,554 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11750, loss[loss=0.06712, simple_loss=0.08259, pruned_loss=0.01727, audio_tagging_loss=0.00855, over 15687.00 frames. ], tot_loss[loss=0.08008, simple_loss=0.1007, pruned_loss=0.0198, audio_tagging_loss=0.009949, over 3050557.72 frames. ], batch size: 61, lr: 4.84e-03, grad_scale: 32.0
2023-11-20 14:19:47,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1120380.0, ans=0.125
2023-11-20 14:19:54,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1120446.6666666667, ans=0.1
2023-11-20 14:20:12,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1120513.3333333333, ans=0.0
2023-11-20 14:20:32,712 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168100
2023-11-20 14:20:33,290 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=12.0
2023-11-20 14:20:43,431 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11800, loss[loss=0.09071, simple_loss=0.1159, pruned_loss=0.02467, audio_tagging_loss=0.008071, over 15881.00 frames. ], tot_loss[loss=0.07938, simple_loss=0.09959, pruned_loss=0.01959, audio_tagging_loss=0.009998, over 3043391.36 frames. ], batch size: 56, lr: 4.84e-03, grad_scale: 32.0
2023-11-20 14:20:59,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1120780.0, ans=0.0
2023-11-20 14:21:08,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1120846.6666666667, ans=0.0
2023-11-20 14:21:10,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1120846.6666666667, ans=0.0
2023-11-20 14:21:19,049 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.990e+01 8.087e+01 8.933e+01 9.931e+01 1.196e+02, threshold=1.787e+02, percent-clipped=0.0
2023-11-20 14:21:20,694 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 14:21:31,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1120913.3333333333, ans=0.125
2023-11-20 14:21:31,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1120913.3333333333, ans=0.125
2023-11-20 14:21:36,622 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168150
2023-11-20 14:21:41,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1120980.0, ans=0.1
2023-11-20 14:21:47,562 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11850, loss[loss=0.09377, simple_loss=0.1218, pruned_loss=0.02316, audio_tagging_loss=0.009729, over 16568.00 frames. ], tot_loss[loss=0.07964, simple_loss=0.1001, pruned_loss=0.01953, audio_tagging_loss=0.01005, over 3047504.65 frames. ], batch size: 59, lr: 4.84e-03, grad_scale: 32.0
2023-11-20 14:21:47,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1121046.6666666667, ans=0.035
2023-11-20 14:22:22,328 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0
2023-11-20 14:22:40,207 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168200
2023-11-20 14:22:51,474 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11900, loss[loss=0.08353, simple_loss=0.1063, pruned_loss=0.01668, audio_tagging_loss=0.01372, over 15310.00 frames. ], tot_loss[loss=0.07936, simple_loss=0.09968, pruned_loss=0.01935, audio_tagging_loss=0.01017, over 3053378.03 frames. ], batch size: 57, lr: 4.84e-03, grad_scale: 32.0
2023-11-20 14:23:07,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1121446.6666666667, ans=0.125
2023-11-20 14:23:14,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1121446.6666666667, ans=0.125
2023-11-20 14:23:25,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1121513.3333333333, ans=0.95
2023-11-20 14:23:27,237 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.121e+01 8.163e+01 8.778e+01 9.504e+01 1.300e+02, threshold=1.756e+02, percent-clipped=0.0
2023-11-20 14:23:30,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1121580.0, ans=0.125
2023-11-20 14:23:42,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1121646.6666666667, ans=0.035
2023-11-20 14:23:45,125 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168250
2023-11-20 14:23:56,555 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 11950, loss[loss=0.07281, simple_loss=0.08736, pruned_loss=0.01691, audio_tagging_loss=0.01222, over 14436.00 frames. ], tot_loss[loss=0.07938, simple_loss=0.09973, pruned_loss=0.01926, audio_tagging_loss=0.01025, over 3045647.37 frames. ], batch size: 57, lr: 4.84e-03, grad_scale: 32.0
2023-11-20 14:24:18,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1121780.0, ans=0.05
2023-11-20 14:24:25,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1121846.6666666667, ans=0.2
2023-11-20 14:24:26,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1121846.6666666667, ans=0.125
2023-11-20 14:24:37,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1121913.3333333333, ans=0.0
2023-11-20 14:24:42,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1121913.3333333333, ans=0.125
2023-11-20 14:24:48,378 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168300
2023-11-20 14:24:52,373 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.66 vs. limit=15.0
2023-11-20 14:24:59,006 INFO [train_asr.py:1262] (2/4) Epoch 14, batch 12000, loss[loss=0.0979, simple_loss=0.1209, pruned_loss=0.02845, audio_tagging_loss=0.009023, over 14629.00 frames. ], tot_loss[loss=0.0804, simple_loss=0.1009, pruned_loss=0.01962, audio_tagging_loss=0.01034, over 3047567.65 frames. ], batch size: 55, lr: 4.84e-03, grad_scale: 32.0
2023-11-20 14:24:59,007 INFO [train_asr.py:1285] (2/4) Computing validation loss
2023-11-20 14:25:19,249 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.2475, 0.8620, 3.3392, 3.2698, 3.3261, 3.1850, 3.0530, 3.0097], device='cuda:2')
2023-11-20 14:25:41,043 INFO [train_asr.py:1294] (2/4) Epoch 14, validation: loss=0.06236, simple_loss=0.05348, pruned_loss=0.005638, audio_tagging_loss=0.02999, over 4681554.00 frames.
2023-11-20 14:25:41,044 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB
2023-11-20 14:25:43,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1122046.6666666667, ans=10.0
2023-11-20 14:25:44,183 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0
2023-11-20 14:25:51,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1122113.3333333333, ans=0.125
2023-11-20 14:25:58,963 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0
2023-11-20 14:26:00,377 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0
2023-11-20 14:26:46,226 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 0, loss[loss=0.08921, simple_loss=0.09392, pruned_loss=0.01872, audio_tagging_loss=0.02353, over 16005.00 frames. ], tot_loss[loss=0.08921, simple_loss=0.09392, pruned_loss=0.01872, audio_tagging_loss=0.02353, over 16005.00 frames. ], batch size: 60, lr: 4.68e-03, grad_scale: 32.0
2023-11-20 14:26:46,227 INFO [train_asr.py:1285] (2/4) Computing validation loss
2023-11-20 14:27:18,296 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3139, 4.9784, 4.6815, 5.1521], device='cuda:2')
2023-11-20 14:27:21,782 INFO [train_asr.py:1294] (2/4) Epoch 15, validation: loss=0.06153, simple_loss=0.05347, pruned_loss=0.005654, audio_tagging_loss=0.02914, over 4681554.00 frames.
2023-11-20 14:27:21,783 INFO [train_asr.py:1295] (2/4) Maximum memory allocated so far is 25622MB
2023-11-20 14:27:26,690 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.292e+01 9.006e+01 9.902e+01 1.226e+02, threshold=1.801e+02, percent-clipped=0.0
2023-11-20 14:27:39,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1122266.6666666667, ans=0.125
2023-11-20 14:27:41,131 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0
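The zipformer.py:1873 diagnostics printed while computing validation loss are per-head entropies of the attention weight distributions: values near log(seq_len) indicate near-uniform attention, while small values (e.g. the 0.8620 head above) indicate sharply peaked attention. A sketch of that kind of diagnostic, not zipformer's exact code:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """Mean entropy (nats) of attention distributions, one value per head.
    attn: (num_heads, query_len, key_len), each row summing to 1."""
    eps = 1e-20  # avoid log(0) for exactly-zero weights
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (num_heads, query_len)
    return ent.mean(dim=-1)

attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_weights_entropy(attn))  # near-uniform heads approach log(50) ~= 3.9
```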
limit=6.0 2023-11-20 14:27:44,722 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168350 2023-11-20 14:27:52,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1122333.3333333333, ans=0.125 2023-11-20 14:27:59,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1122400.0, ans=0.1 2023-11-20 14:28:05,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1122400.0, ans=0.125 2023-11-20 14:28:07,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1122400.0, ans=0.0 2023-11-20 14:28:17,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1122466.6666666667, ans=0.125 2023-11-20 14:28:25,986 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 50, loss[loss=0.07667, simple_loss=0.08952, pruned_loss=0.01232, audio_tagging_loss=0.01959, over 15095.00 frames. ], tot_loss[loss=0.0875, simple_loss=0.09852, pruned_loss=0.01887, audio_tagging_loss=0.01937, over 681732.90 frames. ], batch size: 57, lr: 4.67e-03, grad_scale: 32.0 2023-11-20 14:28:50,224 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168400 2023-11-20 14:29:02,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1122666.6666666667, ans=0.0 2023-11-20 14:29:12,963 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.14 vs. limit=12.0 2023-11-20 14:29:16,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1122733.3333333333, ans=0.0 2023-11-20 14:29:24,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1122800.0, ans=0.07 2023-11-20 14:29:32,373 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 100, loss[loss=0.06209, simple_loss=0.07145, pruned_loss=0.008757, audio_tagging_loss=0.01761, over 15750.00 frames. ], tot_loss[loss=0.08682, simple_loss=0.09823, pruned_loss=0.01893, audio_tagging_loss=0.01877, over 1198527.59 frames. ], batch size: 62, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:29:36,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1122866.6666666667, ans=0.0 2023-11-20 14:29:37,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1122866.6666666667, ans=0.0 2023-11-20 14:29:39,162 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 8.769e+01 9.395e+01 1.004e+02 1.341e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-20 14:29:56,010 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168450 2023-11-20 14:30:37,468 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 150, loss[loss=0.09037, simple_loss=0.1145, pruned_loss=0.02179, audio_tagging_loss=0.01132, over 15322.00 frames. ], tot_loss[loss=0.08584, simple_loss=0.1007, pruned_loss=0.01912, audio_tagging_loss=0.01635, over 1611629.59 frames. ], batch size: 57, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:30:59,850 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.05 vs. 
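Across the epoch boundary above, the logged lr steps from 4.84e-03 (epoch 14) to 4.68e-03 at "Epoch 15, batch 0" and then 4.67e-03: the schedule decays in both the global batch index and the epoch count. A sketch of an Eden-style schedule (as used in icefall) that reproduces values in this range; the hyperparameters below (base_lr=0.045, lr_batches=7500, lr_epochs=3.5) are assumptions for illustration, not read from this run:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Decays smoothly in the global batch index and in the epoch count.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # With batch ~168300 and epoch ~14, this lands near the logged 4.68e-03:
    print(eden_lr(0.045, 168300, 14))  # ~4.7e-03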
limit=15.0 2023-11-20 14:31:00,986 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168500 2023-11-20 14:31:02,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1123333.3333333333, ans=0.0 2023-11-20 14:31:03,121 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2023-11-20 14:31:07,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1123333.3333333333, ans=0.125 2023-11-20 14:31:09,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1123333.3333333333, ans=0.125 2023-11-20 14:31:40,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1123466.6666666667, ans=0.0 2023-11-20 14:31:42,007 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=12.0 2023-11-20 14:31:42,722 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 200, loss[loss=0.06696, simple_loss=0.08019, pruned_loss=0.01442, audio_tagging_loss=0.01244, over 15065.00 frames. ], tot_loss[loss=0.0827, simple_loss=0.09958, pruned_loss=0.01845, audio_tagging_loss=0.01446, over 1932252.67 frames. ], batch size: 57, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:31:43,422 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.55 vs. limit=12.0 2023-11-20 14:31:48,819 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.970e+01 8.233e+01 8.956e+01 9.883e+01 1.318e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-20 14:32:00,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1123600.0, ans=0.125 2023-11-20 14:32:00,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1123600.0, ans=0.125 2023-11-20 14:32:06,134 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168550 2023-11-20 14:32:39,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1123800.0, ans=0.125 2023-11-20 14:32:48,645 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 250, loss[loss=0.07956, simple_loss=0.1044, pruned_loss=0.01761, audio_tagging_loss=0.009732, over 15005.00 frames. ], tot_loss[loss=0.082, simple_loss=0.09975, pruned_loss=0.01894, audio_tagging_loss=0.01319, over 2179970.17 frames. 
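The Whitening lines compare a per-module statistic against a scheduled limit (e.g. "metric=12.05 vs. limit=15.0"), presumably applying a corrective penalty only when the metric exceeds the limit. A hedged sketch of one such metric: a covariance-flatness score that equals 1.0 when the within-group feature covariance is a multiple of the identity and grows as the spectrum becomes lopsided. This illustrates the idea, not necessarily the exact formula in scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """x: (num_frames, num_channels). Returns >= 1.0; equals 1.0 when the
        within-group covariance is a multiple of the identity."""
        (_, num_channels) = x.shape
        c = num_channels // num_groups
        xg = x.reshape(-1, num_groups, c).transpose(0, 1)   # (groups, frames, c)
        xg = xg - xg.mean(dim=1, keepdim=True)
        cov = torch.matmul(xg.transpose(1, 2), xg) / xg.shape[1]   # (groups, c, c)
        tr_cov = cov.diagonal(dim1=1, dim2=2).sum(dim=1)           # trace(C)
        tr_cov2 = (cov * cov.transpose(1, 2)).sum(dim=(1, 2))      # trace(C @ C)
        metric = (c * tr_cov2 / tr_cov.clamp(min=1e-20) ** 2).mean()
        return float(metric)

    x = torch.randn(1000, 256)
    print(whitening_metric(x))  # modestly above 1.0 (sampling noise) for white input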
], batch size: 57, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:32:53,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1123866.6666666667, ans=0.0 2023-11-20 14:32:59,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1123866.6666666667, ans=0.0 2023-11-20 14:33:11,796 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168600 2023-11-20 14:33:16,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1124000.0, ans=0.125 2023-11-20 14:33:31,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=1124066.6666666667, ans=15.0 2023-11-20 14:33:44,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1124133.3333333333, ans=0.125 2023-11-20 14:33:45,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1124133.3333333333, ans=0.125 2023-11-20 14:33:54,388 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 300, loss[loss=0.05719, simple_loss=0.07129, pruned_loss=0.01049, audio_tagging_loss=0.01106, over 15578.00 frames. ], tot_loss[loss=0.08166, simple_loss=0.1004, pruned_loss=0.01921, audio_tagging_loss=0.01224, over 2369243.82 frames. ], batch size: 59, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:33:54,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1124200.0, ans=0.0 2023-11-20 14:33:54,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1124200.0, ans=0.1 2023-11-20 14:34:00,423 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.977e+01 8.500e+01 9.120e+01 9.945e+01 1.401e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-20 14:34:17,721 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168650 2023-11-20 14:34:19,583 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0 2023-11-20 14:34:23,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1124333.3333333333, ans=0.2 2023-11-20 14:34:54,636 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 14:34:59,593 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 350, loss[loss=0.09362, simple_loss=0.1151, pruned_loss=0.02658, audio_tagging_loss=0.009499, over 14572.00 frames. ], tot_loss[loss=0.08161, simple_loss=0.1014, pruned_loss=0.01932, audio_tagging_loss=0.0116, over 2518824.88 frames. ], batch size: 58, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:35:24,796 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168700 2023-11-20 14:35:29,701 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. 
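Each "Clipping_scale=2.0, grad-norm quartiles ..." line prints what appear to be min/25%/median/75%/max of recent gradient norms, and the printed threshold is consistently twice the median (in the batch-300 record above, 2 x 9.120e+01 = 1.824e+02). A sketch of that style of adaptive clipping, assuming the threshold really is clipping_scale times the median over a window of recent norms; this is the behaviour suggested by the log, not a transcription of optim.py:

    from collections import deque
    import statistics
    import torch

    class MedianGradClipper:
        """Clip gradients to clipping_scale * median of recent grad norms."""
        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)  # recent grad norms; quartiles
                                               # and percent-clipped come from here

        def clip_(self, params) -> float:
            params = [p for p in params if p.grad is not None]
            norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
            self.norms.append(norm)
            threshold = self.clipping_scale * statistics.median(self.norms)
            if norm > threshold:  # a clipped batch counts toward percent-clipped
                for p in params:
                    p.grad.mul_(threshold / norm)
            return threshold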
limit=6.0 2023-11-20 14:35:33,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1124666.6666666667, ans=0.125 2023-11-20 14:36:05,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=1124866.6666666667, ans=0.02 2023-11-20 14:36:06,974 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 400, loss[loss=0.06615, simple_loss=0.08985, pruned_loss=0.01098, audio_tagging_loss=0.01025, over 15314.00 frames. ], tot_loss[loss=0.08068, simple_loss=0.1007, pruned_loss=0.01919, audio_tagging_loss=0.01113, over 2641794.94 frames. ], batch size: 56, lr: 4.67e-03, grad_scale: 32.0 2023-11-20 14:36:07,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1124866.6666666667, ans=0.0 2023-11-20 14:36:13,872 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.381e+01 8.162e+01 9.229e+01 1.065e+02 1.239e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-20 14:36:30,692 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168750 2023-11-20 14:36:40,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1125000.0, ans=0.125 2023-11-20 14:36:52,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1125066.6666666667, ans=0.0 2023-11-20 14:37:07,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1125133.3333333333, ans=0.0 2023-11-20 14:37:09,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1125133.3333333333, ans=0.125 2023-11-20 14:37:12,768 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 450, loss[loss=0.07546, simple_loss=0.09817, pruned_loss=0.01739, audio_tagging_loss=0.008982, over 16150.00 frames. ], tot_loss[loss=0.0804, simple_loss=0.1007, pruned_loss=0.01921, audio_tagging_loss=0.01082, over 2734537.68 frames. ], batch size: 59, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:37:19,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1125200.0, ans=0.125 2023-11-20 14:37:25,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1125266.6666666667, ans=0.2 2023-11-20 14:37:34,861 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168800 2023-11-20 14:37:48,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1125333.3333333333, ans=0.0 2023-11-20 14:38:04,521 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.11 vs. limit=15.0 2023-11-20 14:38:10,653 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.74 vs. limit=15.0 2023-11-20 14:38:17,233 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 500, loss[loss=0.07176, simple_loss=0.09357, pruned_loss=0.01446, audio_tagging_loss=0.01052, over 15086.00 frames. ], tot_loss[loss=0.0796, simple_loss=0.1, pruned_loss=0.01902, audio_tagging_loss=0.01057, over 2793123.70 frames. 
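The great majority of lines here are ScheduledFloat prints: a named hyperparameter whose current value ("ans") is a function of batch_count, so that dropout- and skip-rate-style constants anneal as training proceeds. A minimal sketch of such a schedule as a piecewise-linear function of the batch count; the breakpoints below are invented for illustration and this is not scaling.py's actual code:

    class ScheduledFloatSketch:
        """Piecewise-linear float schedule over batch_count."""
        def __init__(self, *points):
            # points: (batch_count, value) pairs.
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    skip_rate = ScheduledFloatSketch((0.0, 0.5), (4000.0, 0.05))
    print(skip_rate.value(1121446.67))  # 0.05: long past the last breakpoint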
], batch size: 55, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:38:22,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1125533.3333333333, ans=0.125 2023-11-20 14:38:24,585 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.837e+01 8.018e+01 8.483e+01 9.528e+01 1.143e+02, threshold=1.697e+02, percent-clipped=0.0 2023-11-20 14:38:40,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1125600.0, ans=0.07 2023-11-20 14:38:41,273 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168850 2023-11-20 14:38:43,186 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.43 vs. limit=22.5 2023-11-20 14:38:44,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1125666.6666666667, ans=0.125 2023-11-20 14:38:45,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1125666.6666666667, ans=0.2 2023-11-20 14:38:45,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1125666.6666666667, ans=0.025 2023-11-20 14:39:02,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1125733.3333333333, ans=0.125 2023-11-20 14:39:09,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1125800.0, ans=0.125 2023-11-20 14:39:10,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1125800.0, ans=0.04949747468305833 2023-11-20 14:39:21,996 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 550, loss[loss=0.07466, simple_loss=0.0856, pruned_loss=0.02145, audio_tagging_loss=0.01041, over 15307.00 frames. ], tot_loss[loss=0.07895, simple_loss=0.09896, pruned_loss=0.0189, audio_tagging_loss=0.01057, over 2845392.14 frames. ], batch size: 60, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:39:29,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1125866.6666666667, ans=0.125 2023-11-20 14:39:32,775 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 14:39:45,554 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168900 2023-11-20 14:39:50,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1126000.0, ans=0.1 2023-11-20 14:39:55,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1126000.0, ans=0.1 2023-11-20 14:40:01,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1126066.6666666667, ans=0.125 2023-11-20 14:40:07,015 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.03 vs. 
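In each training record, loss[...] describes the current batch while tot_loss[...] is an aggregate "over N frames" whose frame count grows batch by batch; it restarts with the batch's own statistics at "Epoch 15, batch 0" (16005.00 frames) and climbs from there, consistent with a frame-weighted running average that is reset periodically. A sketch of that bookkeeping, assuming plain frame weighting:

    class FrameWeightedAverage:
        """Frame-weighted running average of per-batch losses (a sketch of the
        tot_loss bookkeeping suggested by the log, not the actual tracker)."""
        def __init__(self):
            self.weighted_sum = 0.0   # sum of loss_i * frames_i
            self.frames = 0.0         # frame counts may be fractional, as logged

        def update(self, loss: float, frames: float) -> float:
            self.weighted_sum += loss * frames
            self.frames += frames
            return self.weighted_sum / self.frames

    tot = FrameWeightedAverage()
    # Epoch 15, batch 0: first update, so tot_loss equals the batch loss.
    print(tot.update(0.08921, 16005.0))  # 0.08921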
limit=22.5 2023-11-20 14:40:23,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1126133.3333333333, ans=0.125 2023-11-20 14:40:27,399 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 600, loss[loss=0.07685, simple_loss=0.09969, pruned_loss=0.01814, audio_tagging_loss=0.008869, over 15248.00 frames. ], tot_loss[loss=0.07881, simple_loss=0.09916, pruned_loss=0.01885, audio_tagging_loss=0.01038, over 2894867.35 frames. ], batch size: 55, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:40:35,043 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.140e+01 8.992e+01 9.843e+01 1.226e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-20 14:40:49,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1126266.6666666667, ans=0.0 2023-11-20 14:40:50,134 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 168950 2023-11-20 14:40:51,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1126333.3333333333, ans=0.125 2023-11-20 14:40:55,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1126333.3333333333, ans=0.0 2023-11-20 14:40:58,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1126333.3333333333, ans=0.2 2023-11-20 14:41:32,767 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 650, loss[loss=0.08604, simple_loss=0.1109, pruned_loss=0.01999, audio_tagging_loss=0.01061, over 15234.00 frames. ], tot_loss[loss=0.07837, simple_loss=0.09875, pruned_loss=0.0187, audio_tagging_loss=0.01029, over 2927330.32 frames. ], batch size: 56, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:41:36,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1126533.3333333333, ans=0.07 2023-11-20 14:41:42,237 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.78 vs. limit=10.0 2023-11-20 14:41:43,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1126533.3333333333, ans=0.1 2023-11-20 14:41:57,351 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169000 2023-11-20 14:42:11,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1126666.6666666667, ans=0.0 2023-11-20 14:42:13,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1126733.3333333333, ans=0.0 2023-11-20 14:42:36,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1126800.0, ans=0.2 2023-11-20 14:42:38,531 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 700, loss[loss=0.06252, simple_loss=0.07945, pruned_loss=0.01457, audio_tagging_loss=0.00822, over 14938.00 frames. ], tot_loss[loss=0.07909, simple_loss=0.1001, pruned_loss=0.01891, audio_tagging_loss=0.01012, over 2958643.84 frames. 
], batch size: 56, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:42:47,989 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.688e+01 8.098e+01 8.725e+01 9.382e+01 1.189e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-20 14:43:03,411 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169050 2023-11-20 14:43:04,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1127000.0, ans=0.125 2023-11-20 14:43:08,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1127000.0, ans=0.125 2023-11-20 14:43:12,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1127000.0, ans=0.0 2023-11-20 14:43:17,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1127066.6666666667, ans=0.5 2023-11-20 14:43:41,922 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=15.0 2023-11-20 14:43:44,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1127200.0, ans=0.125 2023-11-20 14:43:45,360 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 750, loss[loss=0.06872, simple_loss=0.08354, pruned_loss=0.01785, audio_tagging_loss=0.009096, over 13799.00 frames. ], tot_loss[loss=0.07901, simple_loss=0.09981, pruned_loss=0.01892, audio_tagging_loss=0.01018, over 2973528.99 frames. ], batch size: 52, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:43:50,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1127200.0, ans=0.0 2023-11-20 14:43:50,229 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0 2023-11-20 14:43:59,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1127266.6666666667, ans=0.1 2023-11-20 14:44:06,590 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. limit=15.0 2023-11-20 14:44:08,673 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169100 2023-11-20 14:44:12,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1127333.3333333333, ans=0.125 2023-11-20 14:44:25,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1127400.0, ans=0.125 2023-11-20 14:44:28,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1127400.0, ans=0.125 2023-11-20 14:44:50,614 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 800, loss[loss=0.08957, simple_loss=0.1211, pruned_loss=0.02055, audio_tagging_loss=0.008475, over 15570.00 frames. ], tot_loss[loss=0.07923, simple_loss=0.1, pruned_loss=0.01895, audio_tagging_loss=0.01026, over 2997099.15 frames. 
], batch size: 57, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:44:50,970 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 14:44:57,930 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.496e+01 8.097e+01 8.575e+01 9.313e+01 1.221e+02, threshold=1.715e+02, percent-clipped=0.0 2023-11-20 14:44:58,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1127533.3333333333, ans=0.07 2023-11-20 14:44:58,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1127533.3333333333, ans=0.125 2023-11-20 14:45:09,268 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.50 vs. limit=12.0 2023-11-20 14:45:13,830 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169150 2023-11-20 14:45:23,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1127666.6666666667, ans=10.0 2023-11-20 14:45:36,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1127733.3333333333, ans=0.5 2023-11-20 14:45:53,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1127800.0, ans=0.0 2023-11-20 14:45:56,205 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 850, loss[loss=0.08705, simple_loss=0.1066, pruned_loss=0.02134, audio_tagging_loss=0.01241, over 15441.00 frames. ], tot_loss[loss=0.07919, simple_loss=0.09999, pruned_loss=0.01889, audio_tagging_loss=0.01031, over 3007751.55 frames. ], batch size: 57, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:46:04,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1127866.6666666667, ans=0.0 2023-11-20 14:46:21,148 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169200 2023-11-20 14:46:24,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1128000.0, ans=0.0 2023-11-20 14:46:27,251 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.75 vs. limit=22.5 2023-11-20 14:46:42,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1128066.6666666667, ans=0.0 2023-11-20 14:47:02,629 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 900, loss[loss=0.09206, simple_loss=0.114, pruned_loss=0.02666, audio_tagging_loss=0.008382, over 15140.00 frames. ], tot_loss[loss=0.07976, simple_loss=0.1005, pruned_loss=0.01912, audio_tagging_loss=0.01037, over 3015072.63 frames. 
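Across these records grad_scale moves between 32.0 and 16.0 (it is 16.0 from around epoch-15 batch 100 through batch 750 and 32.0 again from batch 800), the signature of dynamic fp16 loss scaling. A sketch of a standard torch.cuda.amp step with that behaviour; it assumes a CUDA device, and the model and optimizer are placeholders rather than this recipe's:

    import torch

    model = torch.nn.Linear(80, 500).cuda()           # placeholder model
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

    def train_step(x: torch.Tensor, y: torch.Tensor) -> float:
        opt.zero_grad()
        with torch.cuda.amp.autocast():
            loss = torch.nn.functional.mse_loss(model(x), y)
        scaler.scale(loss).backward()   # scale up so fp16 grads don't underflow
        scaler.step(opt)                # unscales grads; skips the step on inf/nan
        scaler.update()                 # halves the scale on overflow, grows it
                                        # again after enough clean steps
        return scaler.get_scale()       # e.g. 32.0 -> 16.0 after an overflow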
], batch size: 57, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:47:11,336 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.656e+01 8.130e+01 8.827e+01 9.752e+01 1.444e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-20 14:47:15,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1128266.6666666667, ans=0.0 2023-11-20 14:47:26,439 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169250 2023-11-20 14:47:47,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1128400.0, ans=0.0 2023-11-20 14:48:07,410 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 950, loss[loss=0.08366, simple_loss=0.1064, pruned_loss=0.02058, audio_tagging_loss=0.009887, over 15604.00 frames. ], tot_loss[loss=0.07909, simple_loss=0.1001, pruned_loss=0.01883, audio_tagging_loss=0.01021, over 3025250.91 frames. ], batch size: 59, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:48:16,853 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2023-11-20 14:48:22,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1128600.0, ans=0.0 2023-11-20 14:48:30,285 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169300 2023-11-20 14:48:38,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1128666.6666666667, ans=0.125 2023-11-20 14:48:42,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1128666.6666666667, ans=0.1 2023-11-20 14:48:50,126 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.93 vs. limit=22.5 2023-11-20 14:48:50,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1128733.3333333333, ans=0.2 2023-11-20 14:48:51,410 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=22.5 2023-11-20 14:48:52,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1128733.3333333333, ans=0.2 2023-11-20 14:49:11,803 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1000, loss[loss=0.07095, simple_loss=0.08831, pruned_loss=0.01608, audio_tagging_loss=0.01071, over 15518.00 frames. ], tot_loss[loss=0.07927, simple_loss=0.1007, pruned_loss=0.01907, audio_tagging_loss=0.009857, over 3031449.99 frames. 
], batch size: 58, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:49:20,006 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.688e+01 8.251e+01 8.894e+01 9.437e+01 1.345e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-20 14:49:24,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=1128933.3333333333, ans=6.0 2023-11-20 14:49:30,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1128933.3333333333, ans=0.0 2023-11-20 14:49:35,960 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169350 2023-11-20 14:49:36,477 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=12.0 2023-11-20 14:49:40,313 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 14:50:07,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1129133.3333333333, ans=0.0 2023-11-20 14:50:17,309 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1050, loss[loss=0.06985, simple_loss=0.08658, pruned_loss=0.01746, audio_tagging_loss=0.0091, over 14955.00 frames. ], tot_loss[loss=0.08043, simple_loss=0.1024, pruned_loss=0.01951, audio_tagging_loss=0.009709, over 3034410.62 frames. ], batch size: 57, lr: 4.66e-03, grad_scale: 16.0 2023-11-20 14:50:26,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1129200.0, ans=0.0 2023-11-20 14:50:40,833 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169400 2023-11-20 14:50:56,590 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.50 vs. limit=15.0 2023-11-20 14:51:03,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1129400.0, ans=0.125 2023-11-20 14:51:07,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1129400.0, ans=0.04949747468305833 2023-11-20 14:51:18,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1129466.6666666667, ans=0.0 2023-11-20 14:51:23,876 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1100, loss[loss=0.05941, simple_loss=0.07796, pruned_loss=0.01044, audio_tagging_loss=0.009988, over 15410.00 frames. ], tot_loss[loss=0.07912, simple_loss=0.1008, pruned_loss=0.01899, audio_tagging_loss=0.009744, over 3029723.94 frames. ], batch size: 61, lr: 4.66e-03, grad_scale: 16.0 2023-11-20 14:51:26,391 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 14:51:30,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1129533.3333333333, ans=0.125 2023-11-20 14:51:32,468 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.387e+01 8.402e+01 8.962e+01 9.739e+01 1.697e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-20 14:51:35,532 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.44 vs. limit=15.0 2023-11-20 14:51:42,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1129600.0, ans=0.125 2023-11-20 14:51:47,041 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169450 2023-11-20 14:51:49,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1129666.6666666667, ans=0.0 2023-11-20 14:51:52,356 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.20 vs. limit=12.0 2023-11-20 14:51:53,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1129666.6666666667, ans=0.2 2023-11-20 14:51:55,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1129666.6666666667, ans=0.125 2023-11-20 14:51:55,792 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.86 vs. limit=22.5 2023-11-20 14:52:29,264 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1150, loss[loss=0.0649, simple_loss=0.08559, pruned_loss=0.01163, audio_tagging_loss=0.01047, over 15377.00 frames. ], tot_loss[loss=0.07914, simple_loss=0.1009, pruned_loss=0.01908, audio_tagging_loss=0.009611, over 3032844.44 frames. ], batch size: 56, lr: 4.66e-03, grad_scale: 16.0 2023-11-20 14:52:39,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.77 vs. limit=15.0 2023-11-20 14:52:40,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1129866.6666666667, ans=0.2 2023-11-20 14:52:48,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1129933.3333333333, ans=0.0 2023-11-20 14:52:52,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1129933.3333333333, ans=0.2 2023-11-20 14:52:53,518 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169500 2023-11-20 14:52:53,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1129933.3333333333, ans=0.1 2023-11-20 14:53:02,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1130000.0, ans=0.0 2023-11-20 14:53:35,185 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1200, loss[loss=0.07899, simple_loss=0.09406, pruned_loss=0.02, audio_tagging_loss=0.01196, over 14021.00 frames. 
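The WARNING lines exclude AudioSet cuts whose transcript is the dummy placeholder: with 100 input frames, the encoder's subsampling leaves 23 frames, fewer than the 24 BPE tokens, so a transducer loss would be infeasible (it needs at least one encoder frame per output token). A sketch of such a filter; the exact subsampling arithmetic below is an assumption that happens to reproduce the logged 100 -> 23:

    def frames_after_subsampling(num_frames: int) -> int:
        # One common Conformer/Zipformer-style frontend reduction; assumed here
        # because it maps the logged 100 frames to the logged 23.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Transducer training needs at least as many encoder frames as tokens.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False -> "Exclude cut ..." warning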
], tot_loss[loss=0.0788, simple_loss=0.1004, pruned_loss=0.01904, audio_tagging_loss=0.009569, over 3030993.66 frames. ], batch size: 53, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:53:44,426 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.469e+01 8.178e+01 8.897e+01 9.679e+01 1.493e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-20 14:53:46,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1130200.0, ans=0.125 2023-11-20 14:53:58,869 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169550 2023-11-20 14:54:01,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1130333.3333333333, ans=0.07 2023-11-20 14:54:06,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1130333.3333333333, ans=0.125 2023-11-20 14:54:11,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1130333.3333333333, ans=0.125 2023-11-20 14:54:40,148 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1250, loss[loss=0.09815, simple_loss=0.1234, pruned_loss=0.02876, audio_tagging_loss=0.007708, over 15390.00 frames. ], tot_loss[loss=0.07968, simple_loss=0.1013, pruned_loss=0.01945, audio_tagging_loss=0.009588, over 3029795.00 frames. ], batch size: 56, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:54:51,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1130533.3333333333, ans=0.2 2023-11-20 14:55:02,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1130600.0, ans=0.125 2023-11-20 14:55:03,161 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169600 2023-11-20 14:55:06,504 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 14:55:23,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1130733.3333333333, ans=0.0 2023-11-20 14:55:27,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1130733.3333333333, ans=0.125 2023-11-20 14:55:29,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1130733.3333333333, ans=0.125 2023-11-20 14:55:37,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1130800.0, ans=0.0 2023-11-20 14:55:44,763 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1300, loss[loss=0.0874, simple_loss=0.1023, pruned_loss=0.0248, audio_tagging_loss=0.01144, over 14901.00 frames. ], tot_loss[loss=0.07949, simple_loss=0.1011, pruned_loss=0.01933, audio_tagging_loss=0.009625, over 3030313.84 frames. ], batch size: 57, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:55:45,610 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.78 vs. 
limit=10.0 2023-11-20 14:55:53,518 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.559e+01 8.274e+01 8.667e+01 1.016e+02 1.258e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-20 14:56:04,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1130933.3333333333, ans=0.0 2023-11-20 14:56:08,384 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169650 2023-11-20 14:56:10,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1131000.0, ans=0.0 2023-11-20 14:56:16,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1131000.0, ans=0.1 2023-11-20 14:56:19,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1131000.0, ans=0.0 2023-11-20 14:56:49,816 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1350, loss[loss=0.07215, simple_loss=0.08885, pruned_loss=0.01727, audio_tagging_loss=0.01046, over 15612.00 frames. ], tot_loss[loss=0.07931, simple_loss=0.1008, pruned_loss=0.01923, audio_tagging_loss=0.009646, over 3037144.37 frames. ], batch size: 58, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:56:58,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1131200.0, ans=0.125 2023-11-20 14:57:02,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1131266.6666666667, ans=0.125 2023-11-20 14:57:13,644 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169700 2023-11-20 14:57:16,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1131333.3333333333, ans=0.125 2023-11-20 14:57:18,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1131333.3333333333, ans=0.0 2023-11-20 14:57:26,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1131333.3333333333, ans=0.0 2023-11-20 14:57:29,743 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.34 vs. limit=15.0 2023-11-20 14:57:36,948 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 14:57:54,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1131533.3333333333, ans=0.0 2023-11-20 14:57:55,488 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1400, loss[loss=0.06958, simple_loss=0.08904, pruned_loss=0.01542, audio_tagging_loss=0.009637, over 15330.00 frames. ], tot_loss[loss=0.0798, simple_loss=0.1017, pruned_loss=0.01933, audio_tagging_loss=0.009599, over 3040876.87 frames. 
], batch size: 61, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:57:55,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1131533.3333333333, ans=0.0 2023-11-20 14:58:03,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1131533.3333333333, ans=0.1 2023-11-20 14:58:04,235 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.814e+01 7.998e+01 8.583e+01 9.280e+01 1.349e+02, threshold=1.717e+02, percent-clipped=0.0 2023-11-20 14:58:15,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1131600.0, ans=0.125 2023-11-20 14:58:19,169 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169750 2023-11-20 14:58:19,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1131600.0, ans=0.1 2023-11-20 14:58:29,146 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.88 vs. limit=15.0 2023-11-20 14:58:38,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1131733.3333333333, ans=0.0 2023-11-20 14:58:53,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1131800.0, ans=0.125 2023-11-20 14:59:00,603 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1450, loss[loss=0.06576, simple_loss=0.07767, pruned_loss=0.0139, audio_tagging_loss=0.01302, over 15412.00 frames. ], tot_loss[loss=0.07995, simple_loss=0.1017, pruned_loss=0.01934, audio_tagging_loss=0.009778, over 3036447.57 frames. ], batch size: 60, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:59:07,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1131866.6666666667, ans=0.0 2023-11-20 14:59:09,157 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.08 vs. limit=15.0 2023-11-20 14:59:24,562 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169800 2023-11-20 15:00:06,397 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1500, loss[loss=0.07734, simple_loss=0.09091, pruned_loss=0.0206, audio_tagging_loss=0.01128, over 14422.00 frames. ], tot_loss[loss=0.07959, simple_loss=0.1007, pruned_loss=0.01933, audio_tagging_loss=0.009917, over 3036870.27 frames. ], batch size: 56, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 15:00:17,045 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.270e+01 9.018e+01 9.743e+01 1.216e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-20 15:00:17,831 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.54 vs. 
limit=22.5 2023-11-20 15:00:25,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1132266.6666666667, ans=0.125 2023-11-20 15:00:29,958 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169850 2023-11-20 15:00:36,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1132333.3333333333, ans=0.1 2023-11-20 15:00:49,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1132400.0, ans=0.125 2023-11-20 15:00:49,693 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0 2023-11-20 15:00:51,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1132400.0, ans=0.125 2023-11-20 15:01:01,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1132466.6666666667, ans=0.5 2023-11-20 15:01:11,538 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1550, loss[loss=0.07107, simple_loss=0.08435, pruned_loss=0.01604, audio_tagging_loss=0.01286, over 15566.00 frames. ], tot_loss[loss=0.07889, simple_loss=0.09954, pruned_loss=0.01902, audio_tagging_loss=0.01009, over 3039282.21 frames. ], batch size: 58, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 15:01:24,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1132600.0, ans=0.0 2023-11-20 15:01:34,432 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169900 2023-11-20 15:01:38,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1132666.6666666667, ans=0.07 2023-11-20 15:01:39,672 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 15:02:15,845 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1600, loss[loss=0.08163, simple_loss=0.112, pruned_loss=0.0184, audio_tagging_loss=0.007247, over 16731.00 frames. ], tot_loss[loss=0.07848, simple_loss=0.09893, pruned_loss=0.01887, audio_tagging_loss=0.01014, over 3035974.05 frames. ], batch size: 61, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 15:02:25,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1132866.6666666667, ans=0.0 2023-11-20 15:02:26,262 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.675e+01 8.383e+01 8.914e+01 9.693e+01 2.648e+02, threshold=1.783e+02, percent-clipped=1.0 2023-11-20 15:02:39,772 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 169950 2023-11-20 15:03:11,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1133133.3333333333, ans=0.1 2023-11-20 15:03:20,768 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1650, loss[loss=0.07698, simple_loss=0.0974, pruned_loss=0.02035, audio_tagging_loss=0.007936, over 14832.00 frames. ], tot_loss[loss=0.07907, simple_loss=0.09986, pruned_loss=0.0191, audio_tagging_loss=0.01005, over 3036000.23 frames. 
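The occasional "WithLoss: name=..., loss-sum=0.000e+00" lines suggest an auxiliary penalty attached to the attention weights: the activations pass through unchanged in the forward pass, while the penalty (here summing to zero, i.e. currently inactive) receives gradient in the backward pass. A hedged sketch of that pass-through pattern with a custom autograd function; this is one plausible mechanism, not a transcription of scaling.py:

    import torch

    class WithLossSketch(torch.autograd.Function):
        """Return x unchanged, but feed gradient into an auxiliary loss."""
        @staticmethod
        def forward(ctx, x: torch.Tensor, aux_loss: torch.Tensor, scale: float):
            ctx.aux_shape = aux_loss.shape
            ctx.scale = scale
            return x.clone()

        @staticmethod
        def backward(ctx, x_grad: torch.Tensor):
            # Pass x's gradient through untouched; give the auxiliary loss a
            # constant gradient of `scale` so it is minimized as a side term.
            aux_grad = torch.full(ctx.aux_shape, ctx.scale,
                                  device=x_grad.device, dtype=x_grad.dtype)
            return x_grad, aux_grad, None

    x = torch.randn(3, 4, requires_grad=True)
    aux = (x ** 2).mean()                   # toy auxiliary penalty
    y = WithLossSketch.apply(x, aux, 1.0)   # forward value is exactly x
    y.sum().backward()                      # x.grad now includes d(aux)/dx too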
], batch size: 56, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 15:03:22,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1133200.0, ans=0.0 2023-11-20 15:03:22,738 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.91 vs. limit=22.5 2023-11-20 15:03:37,933 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.64 vs. limit=22.5 2023-11-20 15:03:44,530 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170000 2023-11-20 15:03:49,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1133333.3333333333, ans=0.125 2023-11-20 15:04:19,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1133466.6666666667, ans=0.2 2023-11-20 15:04:20,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1133466.6666666667, ans=0.0 2023-11-20 15:04:25,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1133533.3333333333, ans=0.125 2023-11-20 15:04:26,885 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1700, loss[loss=0.09381, simple_loss=0.1195, pruned_loss=0.02332, audio_tagging_loss=0.01076, over 14109.00 frames. ], tot_loss[loss=0.07916, simple_loss=0.1001, pruned_loss=0.01899, audio_tagging_loss=0.0101, over 3040108.04 frames. ], batch size: 53, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 15:04:36,554 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 7.994e+01 8.673e+01 9.340e+01 1.265e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-20 15:04:43,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1133600.0, ans=0.0 2023-11-20 15:04:48,877 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170050 2023-11-20 15:05:15,025 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.08 vs. limit=15.0 2023-11-20 15:05:22,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1133800.0, ans=0.2 2023-11-20 15:05:25,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1133800.0, ans=0.125 2023-11-20 15:05:27,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1133800.0, ans=0.0 2023-11-20 15:05:30,950 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1750, loss[loss=0.06945, simple_loss=0.07827, pruned_loss=0.01728, audio_tagging_loss=0.01303, over 15179.00 frames. ], tot_loss[loss=0.07878, simple_loss=0.09934, pruned_loss=0.01895, audio_tagging_loss=0.01016, over 3033265.48 frames. 
], batch size: 58, lr: 4.65e-03, grad_scale: 32.0
2023-11-20 15:05:33,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1133866.6666666667, ans=0.2
2023-11-20 15:05:40,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1133866.6666666667, ans=0.0
2023-11-20 15:05:43,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=1133933.3333333333, ans=0.02
2023-11-20 15:05:46,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1133933.3333333333, ans=0.0
2023-11-20 15:05:49,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1133933.3333333333, ans=0.1
2023-11-20 15:05:54,417 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170100
2023-11-20 15:06:01,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1134000.0, ans=0.125
2023-11-20 15:06:13,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1134066.6666666667, ans=0.125
2023-11-20 15:06:14,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1134066.6666666667, ans=0.0
2023-11-20 15:06:34,897 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1800, loss[loss=0.07238, simple_loss=0.09568, pruned_loss=0.01684, audio_tagging_loss=0.007694, over 14557.00 frames. ], tot_loss[loss=0.0783, simple_loss=0.09894, pruned_loss=0.01883, audio_tagging_loss=0.01, over 3039930.33 frames. ], batch size: 53, lr: 4.65e-03, grad_scale: 32.0
2023-11-20 15:06:38,702 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.53 vs. limit=15.0
2023-11-20 15:06:42,580 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.39 vs. limit=22.5
2023-11-20 15:06:46,178 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 5.918e+01 8.061e+01 8.642e+01 9.411e+01 1.208e+02, threshold=1.728e+02, percent-clipped=0.0
2023-11-20 15:06:50,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1134266.6666666667, ans=0.0
2023-11-20 15:06:59,365 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170150
2023-11-20 15:07:09,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1134333.3333333333, ans=0.125
2023-11-20 15:07:23,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1134400.0, ans=0.0
2023-11-20 15:07:39,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1134533.3333333333, ans=0.2
2023-11-20 15:07:40,581 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1850, loss[loss=0.1035, simple_loss=0.1347, pruned_loss=0.02609, audio_tagging_loss=0.01008, over 16106.00 frames. ], tot_loss[loss=0.07778, simple_loss=0.09849, pruned_loss=0.01856, audio_tagging_loss=0.00997, over 3044905.50 frames. ], batch size: 58, lr: 4.65e-03, grad_scale: 16.0
2023-11-20 15:07:54,393 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.96 vs. limit=15.0
2023-11-20 15:08:01,891 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.44 vs. limit=15.0
2023-11-20 15:08:02,924 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.00 vs. limit=15.0
2023-11-20 15:08:03,569 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170200
2023-11-20 15:08:05,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1134666.6666666667, ans=0.1
2023-11-20 15:08:06,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1134666.6666666667, ans=0.05
2023-11-20 15:08:10,410 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.05 vs. limit=15.0
2023-11-20 15:08:20,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1134733.3333333333, ans=0.125
2023-11-20 15:08:44,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1134866.6666666667, ans=15.0
2023-11-20 15:08:45,404 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1900, loss[loss=0.08388, simple_loss=0.1161, pruned_loss=0.01896, audio_tagging_loss=0.006876, over 14672.00 frames. ], tot_loss[loss=0.07831, simple_loss=0.09948, pruned_loss=0.01879, audio_tagging_loss=0.009782, over 3053636.93 frames. ], batch size: 56, lr: 4.65e-03, grad_scale: 16.0
2023-11-20 15:08:49,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1134866.6666666667, ans=0.125
2023-11-20 15:08:56,320 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.854e+01 8.016e+01 8.806e+01 9.698e+01 1.880e+02, threshold=1.761e+02, percent-clipped=1.0
2023-11-20 15:08:59,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1134933.3333333333, ans=0.0
2023-11-20 15:09:08,240 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170250
2023-11-20 15:09:08,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1134933.3333333333, ans=0.95
2023-11-20 15:09:08,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1134933.3333333333, ans=0.125
2023-11-20 15:09:12,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1135000.0, ans=0.125
2023-11-20 15:09:18,988 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.79 vs. limit=22.5
2023-11-20 15:09:34,917 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0
2023-11-20 15:09:45,048 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0
2023-11-20 15:09:49,319 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 1950, loss[loss=0.05288, simple_loss=0.05526, pruned_loss=0.0113, audio_tagging_loss=0.01395, over 14078.00 frames. ], tot_loss[loss=0.07765, simple_loss=0.0986, pruned_loss=0.01856, audio_tagging_loss=0.009791, over 3049938.22 frames. ], batch size: 56, lr: 4.65e-03, grad_scale: 16.0
2023-11-20 15:09:49,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1135200.0, ans=0.125
2023-11-20 15:09:51,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1135200.0, ans=0.025
2023-11-20 15:10:13,356 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170300
2023-11-20 15:10:13,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1135266.6666666667, ans=0.125
2023-11-20 15:10:53,735 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2000, loss[loss=0.09031, simple_loss=0.1253, pruned_loss=0.01917, audio_tagging_loss=0.008517, over 14320.00 frames. ], tot_loss[loss=0.07805, simple_loss=0.09903, pruned_loss=0.01869, audio_tagging_loss=0.009849, over 3042280.36 frames. ], batch size: 52, lr: 4.65e-03, grad_scale: 32.0
2023-11-20 15:10:58,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1135533.3333333333, ans=0.0
2023-11-20 15:11:01,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1135533.3333333333, ans=0.0
2023-11-20 15:11:04,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1135533.3333333333, ans=0.07
2023-11-20 15:11:05,438 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.656e+01 7.869e+01 8.530e+01 9.315e+01 1.202e+02, threshold=1.706e+02, percent-clipped=0.0
2023-11-20 15:11:14,740 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0
2023-11-20 15:11:16,605 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170350
2023-11-20 15:11:22,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1135666.6666666667, ans=0.125
2023-11-20 15:11:26,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.11 vs. limit=22.5
2023-11-20 15:11:50,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1135800.0, ans=0.125
2023-11-20 15:11:58,377 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2050, loss[loss=0.07982, simple_loss=0.1016, pruned_loss=0.01878, audio_tagging_loss=0.01024, over 15060.00 frames. ], tot_loss[loss=0.0777, simple_loss=0.09852, pruned_loss=0.01855, audio_tagging_loss=0.009891, over 3044805.17 frames. ], batch size: 57, lr: 4.65e-03, grad_scale: 32.0
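
Note: every tot_loss[...] record above is consistent with the recombination loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; e.g. at batch 1800, 0.5 * 0.09894 + 0.01883 + 0.01 = 0.0783, the logged total. A minimal Python sketch of that bookkeeping (the 0.5 weight is inferred from the logged numbers, not read from train_asr.py):

    # Sketch only: recombine the per-batch components into the single
    # logged "loss" value. SIMPLE_LOSS_SCALE is inferred from the records
    # above, not taken from the training code.
    SIMPLE_LOSS_SCALE = 0.5

    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss):
        return SIMPLE_LOSS_SCALE * simple_loss + pruned_loss + audio_tagging_loss

    # Batch 1800 from the log: 0.5 * 0.09894 + 0.01883 + 0.01 == 0.0783
    assert abs(combined_loss(0.09894, 0.01883, 0.01) - 0.0783) < 5e-5
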
2023-11-20 15:12:21,269 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170400
2023-11-20 15:12:31,198 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0
2023-11-20 15:13:02,587 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2100, loss[loss=0.109, simple_loss=0.142, pruned_loss=0.02984, audio_tagging_loss=0.008187, over 14683.00 frames. ], tot_loss[loss=0.07828, simple_loss=0.09915, pruned_loss=0.01882, audio_tagging_loss=0.00988, over 3038505.54 frames. ], batch size: 53, lr: 4.65e-03, grad_scale: 32.0
2023-11-20 15:13:06,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1136200.0, ans=0.0
2023-11-20 15:13:14,256 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.146e+01 8.226e+01 8.979e+01 1.003e+02 1.386e+02, threshold=1.796e+02, percent-clipped=0.0
2023-11-20 15:13:21,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1136266.6666666667, ans=0.0
2023-11-20 15:13:26,666 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170450
2023-11-20 15:13:57,209 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.59 vs. limit=10.0
2023-11-20 15:14:07,085 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2150, loss[loss=0.08331, simple_loss=0.1141, pruned_loss=0.01615, audio_tagging_loss=0.01009, over 15835.00 frames. ], tot_loss[loss=0.07852, simple_loss=0.0994, pruned_loss=0.01903, audio_tagging_loss=0.009788, over 3035203.28 frames. ], batch size: 56, lr: 4.65e-03, grad_scale: 32.0
2023-11-20 15:14:12,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1136533.3333333333, ans=0.2
2023-11-20 15:14:30,500 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170500
2023-11-20 15:14:38,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1136666.6666666667, ans=0.125
2023-11-20 15:14:41,542 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 15:14:45,071 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 15:14:54,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1136733.3333333333, ans=0.125
2023-11-20 15:15:12,321 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2200, loss[loss=0.07017, simple_loss=0.09219, pruned_loss=0.01593, audio_tagging_loss=0.008146, over 15044.00 frames. ], tot_loss[loss=0.07809, simple_loss=0.09899, pruned_loss=0.01877, audio_tagging_loss=0.009827, over 3034230.35 frames. ], batch size: 57, lr: 4.65e-03, grad_scale: 32.0
2023-11-20 15:15:19,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1136866.6666666667, ans=0.125
2023-11-20 15:15:23,633 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.421e+01 8.880e+01 9.731e+01 1.234e+02, threshold=1.776e+02, percent-clipped=0.0
2023-11-20 15:15:26,843 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.00 vs. limit=15.0
2023-11-20 15:15:34,872 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170550
2023-11-20 15:15:49,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1137066.6666666667, ans=0.0
2023-11-20 15:15:52,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1137066.6666666667, ans=0.1
2023-11-20 15:16:13,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1137133.3333333333, ans=0.0
2023-11-20 15:16:16,545 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2250, loss[loss=0.0899, simple_loss=0.1273, pruned_loss=0.01838, audio_tagging_loss=0.007885, over 15197.00 frames. ], tot_loss[loss=0.07814, simple_loss=0.09918, pruned_loss=0.01869, audio_tagging_loss=0.009853, over 3033113.45 frames. ], batch size: 55, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:16:16,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1137200.0, ans=0.0
2023-11-20 15:16:39,906 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170600
2023-11-20 15:16:40,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1137266.6666666667, ans=0.125
2023-11-20 15:16:42,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1137333.3333333333, ans=0.125
2023-11-20 15:17:19,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1137466.6666666667, ans=0.125
2023-11-20 15:17:21,560 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2300, loss[loss=0.09313, simple_loss=0.1245, pruned_loss=0.02308, audio_tagging_loss=0.007778, over 15713.00 frames. ], tot_loss[loss=0.07911, simple_loss=0.1007, pruned_loss=0.01895, audio_tagging_loss=0.009813, over 3026507.63 frames. ], batch size: 57, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:17:33,184 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.682e+01 8.112e+01 8.584e+01 9.317e+01 1.375e+02, threshold=1.717e+02, percent-clipped=0.0
2023-11-20 15:17:44,833 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.28 vs. limit=15.0
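
Note: in each optim.py Clipping_scale record, the reported threshold equals Clipping_scale times the logged median grad norm (2.0 * 8.880e+01 = 1.776e+02 in the record above), which suggests the clipping threshold tracks a running median of recent gradient norms. A hypothetical sketch of such a rule (names and details are assumptions, not the actual optim.py implementation):

    import torch

    def clip_by_scaled_median(params, recent_norms, clipping_scale=2.0):
        # params: list of parameters; recent_norms: running list of float norms.
        # max_norm=inf makes clip_grad_norm_ measure the total norm without clipping.
        norm = float(torch.nn.utils.clip_grad_norm_(params, max_norm=float("inf")))
        recent_norms.append(norm)
        threshold = clipping_scale * sorted(recent_norms)[len(recent_norms) // 2]
        if norm > threshold:  # such batches would show up in "percent-clipped"
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return norm, threshold
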
2023-11-20 15:17:45,544 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170650
2023-11-20 15:17:51,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1137666.6666666667, ans=0.125
2023-11-20 15:17:51,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1137666.6666666667, ans=0.125
2023-11-20 15:18:02,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1137733.3333333333, ans=0.0
2023-11-20 15:18:09,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1137733.3333333333, ans=0.1
2023-11-20 15:18:14,477 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0
2023-11-20 15:18:16,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1137800.0, ans=0.2
2023-11-20 15:18:18,354 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 15:18:26,352 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2350, loss[loss=0.06077, simple_loss=0.07044, pruned_loss=0.01335, audio_tagging_loss=0.01219, over 15194.00 frames. ], tot_loss[loss=0.08002, simple_loss=0.1015, pruned_loss=0.01935, audio_tagging_loss=0.009943, over 3037130.75 frames. ], batch size: 59, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:18:36,105 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.57 vs. limit=10.0
2023-11-20 15:18:36,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1137866.6666666667, ans=0.0
2023-11-20 15:18:45,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1137933.3333333333, ans=0.0
2023-11-20 15:18:48,989 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170700
2023-11-20 15:19:03,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1138066.6666666667, ans=0.0
2023-11-20 15:19:30,687 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2400, loss[loss=0.09626, simple_loss=0.1217, pruned_loss=0.02404, audio_tagging_loss=0.01136, over 15817.00 frames. ], tot_loss[loss=0.08048, simple_loss=0.1018, pruned_loss=0.01943, audio_tagging_loss=0.01015, over 3039006.22 frames. ], batch size: 56, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:19:31,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1138200.0, ans=0.09899494936611666
2023-11-20 15:19:38,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1138200.0, ans=0.125
2023-11-20 15:19:42,449 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.62 vs. limit=22.5
2023-11-20 15:19:42,895 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.480e+01 8.080e+01 8.821e+01 9.568e+01 1.388e+02, threshold=1.764e+02, percent-clipped=0.0
2023-11-20 15:19:45,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1138266.6666666667, ans=0.0
2023-11-20 15:19:54,173 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170750
2023-11-20 15:19:59,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1138333.3333333333, ans=0.125
2023-11-20 15:20:04,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1138333.3333333333, ans=0.125
2023-11-20 15:20:09,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1138400.0, ans=0.125
2023-11-20 15:20:10,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1138400.0, ans=0.125
2023-11-20 15:20:23,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1138466.6666666667, ans=0.0
2023-11-20 15:20:32,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1138466.6666666667, ans=0.125
2023-11-20 15:20:35,641 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2450, loss[loss=0.0777, simple_loss=0.1022, pruned_loss=0.01748, audio_tagging_loss=0.009135, over 15414.00 frames. ], tot_loss[loss=0.08009, simple_loss=0.1013, pruned_loss=0.0192, audio_tagging_loss=0.01023, over 3041405.66 frames. ], batch size: 60, lr: 4.64e-03, grad_scale: 16.0
2023-11-20 15:20:35,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1138533.3333333333, ans=0.0
2023-11-20 15:20:44,245 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.33 vs. limit=10.0
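
Note: the ScheduledFloat records report per-module hyperparameters (dropout probabilities, skip rates, balancer limits) whose current value, ans, is looked up from the global batch_count. A minimal sketch, assuming a piecewise-linear schedule over (batch_count, value) breakpoints; the breakpoints below are illustrative, not taken from scaling.py:

    def scheduled_float(batch_count, points):
        # points: [(batch_count, value), ...] sorted by batch_count.
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:  # linear interpolation within the segment
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return points[-1][1]

    # A skip rate that decays to 0.0 stays at 0.0 this late in training,
    # matching the many "ans=0.0" entries above (breakpoints made up):
    assert scheduled_float(1138400.0, [(0.0, 0.5), (4000.0, 0.05), (50000.0, 0.0)]) == 0.0
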
2023-11-20 15:20:55,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1138600.0, ans=0.125
2023-11-20 15:20:59,041 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170800
2023-11-20 15:20:59,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1138600.0, ans=0.125
2023-11-20 15:21:08,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1138666.6666666667, ans=0.125
2023-11-20 15:21:11,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1138666.6666666667, ans=0.1
2023-11-20 15:21:30,779 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.07 vs. limit=12.0
2023-11-20 15:21:41,379 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2500, loss[loss=0.0879, simple_loss=0.1103, pruned_loss=0.02223, audio_tagging_loss=0.0105, over 15933.00 frames. ], tot_loss[loss=0.07926, simple_loss=0.1001, pruned_loss=0.01889, audio_tagging_loss=0.01035, over 3043295.81 frames. ], batch size: 59, lr: 4.64e-03, grad_scale: 16.0
2023-11-20 15:21:45,559 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=22.5
2023-11-20 15:21:50,556 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0
2023-11-20 15:21:54,770 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.766e+01 8.090e+01 8.633e+01 9.562e+01 1.495e+02, threshold=1.727e+02, percent-clipped=0.0
2023-11-20 15:22:04,113 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170850
2023-11-20 15:22:15,957 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 15:22:25,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1139066.6666666667, ans=10.0
2023-11-20 15:22:30,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1139066.6666666667, ans=0.0
2023-11-20 15:22:36,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1139133.3333333333, ans=0.0
2023-11-20 15:22:37,124 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.49 vs. limit=15.0
2023-11-20 15:22:45,301 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2550, loss[loss=0.07117, simple_loss=0.08202, pruned_loss=0.01914, audio_tagging_loss=0.01102, over 15124.00 frames. ], tot_loss[loss=0.07828, simple_loss=0.09895, pruned_loss=0.01861, audio_tagging_loss=0.01019, over 3044166.61 frames. ], batch size: 59, lr: 4.64e-03, grad_scale: 16.0
2023-11-20 15:22:49,946 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.582e-01
2023-11-20 15:22:50,252 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.64 vs. limit=15.0
2023-11-20 15:22:58,816 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.94 vs. limit=22.5
2023-11-20 15:23:08,689 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170900
2023-11-20 15:23:10,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1139333.3333333333, ans=0.125
2023-11-20 15:23:14,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1139333.3333333333, ans=0.125
2023-11-20 15:23:20,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1139333.3333333333, ans=0.125
2023-11-20 15:23:42,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1139466.6666666667, ans=0.0
2023-11-20 15:23:50,108 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2600, loss[loss=0.06949, simple_loss=0.08643, pruned_loss=0.01375, audio_tagging_loss=0.01252, over 15303.00 frames. ], tot_loss[loss=0.07799, simple_loss=0.09884, pruned_loss=0.01853, audio_tagging_loss=0.01005, over 3041078.84 frames. ], batch size: 59, lr: 4.64e-03, grad_scale: 16.0
2023-11-20 15:23:50,570 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 15:23:53,315 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.95 vs. limit=22.5
2023-11-20 15:23:54,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1139533.3333333333, ans=10.0
2023-11-20 15:23:54,793 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.36 vs. limit=10.0
2023-11-20 15:24:03,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1139600.0, ans=0.125
2023-11-20 15:24:04,209 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.595e+01 8.100e+01 8.871e+01 9.785e+01 4.234e+02, threshold=1.774e+02, percent-clipped=0.0
2023-11-20 15:24:11,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1139600.0, ans=0.0
2023-11-20 15:24:13,749 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 170950
2023-11-20 15:24:42,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1139800.0, ans=0.125
2023-11-20 15:24:55,152 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2650, loss[loss=0.08859, simple_loss=0.1193, pruned_loss=0.02149, audio_tagging_loss=0.007453, over 14969.00 frames. ], tot_loss[loss=0.07827, simple_loss=0.09909, pruned_loss=0.01877, audio_tagging_loss=0.009954, over 3036558.70 frames. ], batch size: 56, lr: 4.64e-03, grad_scale: 16.0
2023-11-20 15:25:18,204 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 171000
2023-11-20 15:25:59,761 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.28 vs. limit=12.0
2023-11-20 15:26:00,163 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2700, loss[loss=0.09686, simple_loss=0.1129, pruned_loss=0.02861, audio_tagging_loss=0.01178, over 16678.00 frames. ], tot_loss[loss=0.07841, simple_loss=0.09951, pruned_loss=0.01875, audio_tagging_loss=0.009914, over 3041115.56 frames. ], batch size: 63, lr: 4.64e-03, grad_scale: 16.0
2023-11-20 15:26:14,313 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.575e+01 8.137e+01 8.718e+01 9.635e+01 1.314e+02, threshold=1.744e+02, percent-clipped=1.0
2023-11-20 15:26:23,570 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 171050
2023-11-20 15:26:24,304 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0
2023-11-20 15:26:36,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1140333.3333333333, ans=0.1
2023-11-20 15:27:04,568 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2750, loss[loss=0.05076, simple_loss=0.05178, pruned_loss=0.0097, audio_tagging_loss=0.01516, over 15380.00 frames. ], tot_loss[loss=0.07749, simple_loss=0.09805, pruned_loss=0.01843, audio_tagging_loss=0.01003, over 3040665.69 frames. ], batch size: 59, lr: 4.64e-03, grad_scale: 16.0
2023-11-20 15:27:16,987 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0
2023-11-20 15:27:22,320 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. limit=6.0
2023-11-20 15:27:28,523 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 171100
2023-11-20 15:27:33,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1140666.6666666667, ans=0.125
2023-11-20 15:27:42,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1140733.3333333333, ans=0.1
2023-11-20 15:28:00,069 WARNING [train_asr.py:1506] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 15:28:08,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1140866.6666666667, ans=0.0
2023-11-20 15:28:09,481 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2800, loss[loss=0.07013, simple_loss=0.08598, pruned_loss=0.01541, audio_tagging_loss=0.01173, over 16456.00 frames. ], tot_loss[loss=0.0773, simple_loss=0.09777, pruned_loss=0.01833, audio_tagging_loss=0.01008, over 3049909.23 frames. ], batch size: 62, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:28:20,635 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.59 vs. limit=22.5
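
Note: both WARNING records drop one-second AudioSet clips whose 100 input frames become 23 frames after subsampling, fewer than their 24 BPE tokens, so the transducer loss could not align them. The 100 -> 23 mapping matches the length formula ((n - 7) // 2 + 1) // 2 of a common Conv2d subsampling front end, though the exact check at train_asr.py:1506 may differ. A sketch of such a filter:

    def subsampled_frames(num_frames):
        # Assumed subsampling arithmetic; reproduces the logged 100 -> 23.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames, num_tokens):
        # Drop cuts whose encoder output is too short to align the tokens.
        return subsampled_frames(num_frames) >= num_tokens

    assert subsampled_frames(100) == 23
    assert not keep_cut(100, 24)  # the excluded dummy-text cuts: 23 < 24
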
2023-11-20 15:28:23,481 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.091e+01 8.017e+01 8.655e+01 9.428e+01 1.274e+02, threshold=1.731e+02, percent-clipped=0.0
2023-11-20 15:28:32,320 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 171150
2023-11-20 15:28:49,958 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.26 vs. limit=15.0
2023-11-20 15:28:55,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1141066.6666666667, ans=0.0
2023-11-20 15:29:04,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1141133.3333333333, ans=0.0
2023-11-20 15:29:13,870 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2850, loss[loss=0.06892, simple_loss=0.09249, pruned_loss=0.01323, audio_tagging_loss=0.009436, over 14691.00 frames. ], tot_loss[loss=0.07723, simple_loss=0.09806, pruned_loss=0.01834, audio_tagging_loss=0.009862, over 3038864.73 frames. ], batch size: 54, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:29:27,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1141266.6666666667, ans=0.125
2023-11-20 15:29:30,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1141266.6666666667, ans=0.125
2023-11-20 15:29:37,362 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 171200
2023-11-20 15:29:37,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1141266.6666666667, ans=0.125
2023-11-20 15:30:18,052 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2900, loss[loss=0.07788, simple_loss=0.0969, pruned_loss=0.01898, audio_tagging_loss=0.01044, over 15059.00 frames. ], tot_loss[loss=0.07736, simple_loss=0.09798, pruned_loss=0.01848, audio_tagging_loss=0.009896, over 3030730.18 frames. ], batch size: 59, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:30:19,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1141533.3333333333, ans=0.0
2023-11-20 15:30:32,506 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.283e+01 7.914e+01 8.602e+01 9.219e+01 1.779e+02, threshold=1.720e+02, percent-clipped=1.0
2023-11-20 15:30:42,575 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 171250
2023-11-20 15:31:06,535 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.01 vs. limit=15.0
2023-11-20 15:31:18,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1141800.0, ans=0.125
2023-11-20 15:31:21,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1141800.0, ans=0.1
2023-11-20 15:31:23,105 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 2950, loss[loss=0.09201, simple_loss=0.1263, pruned_loss=0.0214, audio_tagging_loss=0.007458, over 14776.00 frames. ], tot_loss[loss=0.0775, simple_loss=0.09806, pruned_loss=0.01851, audio_tagging_loss=0.009959, over 3034163.74 frames. ], batch size: 53, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:31:26,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1141866.6666666667, ans=0.125
2023-11-20 15:31:31,283 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0
2023-11-20 15:31:45,003 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.56 vs. limit=10.0
2023-11-20 15:31:46,773 INFO [model.py:792] (2/4) Freeze_encoder: False; Current batch idx: 171300
2023-11-20 15:31:50,965 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0
2023-11-20 15:31:51,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1142000.0, ans=0.04949747468305833
2023-11-20 15:31:55,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1142000.0, ans=0.125
2023-11-20 15:32:15,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1142133.3333333333, ans=0.5
2023-11-20 15:32:24,047 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.95 vs. limit=10.0
2023-11-20 15:32:25,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1142133.3333333333, ans=0.1
2023-11-20 15:32:28,334 INFO [train_asr.py:1262] (2/4) Epoch 15, batch 3000, loss[loss=0.07792, simple_loss=0.09105, pruned_loss=0.02249, audio_tagging_loss=0.009904, over 14667.00 frames. ], tot_loss[loss=0.07811, simple_loss=0.09878, pruned_loss=0.01872, audio_tagging_loss=0.01, over 3042278.40 frames. ], batch size: 56, lr: 4.63e-03, grad_scale: 32.0
2023-11-20 15:32:28,334 INFO [train_asr.py:1285] (2/4) Computing validation loss
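
Note: grad_scale in the batch records oscillates between 32.0 and 16.0 (32.0 at batch 2400, 16.0 from batch 2450 through 2750, back to 32.0 by batch 2800), the signature of dynamic fp16 loss scaling: halve on overflow, grow back after a run of finite steps. A sketch with assumed constants (the trainer's actual scaler and growth interval are not shown in this log):

    class LossScaler:
        # Illustrative dynamic loss scaler; not the trainer's actual class.
        def __init__(self, scale=32.0, growth_interval=350):
            self.scale = scale
            self.growth_interval = growth_interval
            self.good_steps = 0

        def update(self, found_inf):
            if found_inf:              # overflow: halve and restart the count
                self.scale /= 2.0
                self.good_steps = 0
            else:
                self.good_steps += 1
                if self.good_steps >= self.growth_interval:
                    self.scale *= 2.0  # grow back after a clean run
                    self.good_steps = 0
            return self.scale
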